Add pytest tests for dataset URL accessibility #95

Open
wants to merge 1 commit into
base: main

fardin-developer
Contributor

issue #72

This PR adds tests in test_registry.py that check whether the datasets listed in REGISTRY are reachable. The tests request each dataset URL and verify that the Hugging Face links are active and return the expected responses. This helps catch broken or inaccessible datasets early and makes the dataset registry easier to maintain.
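
A minimal sketch of the kind of check described above, using only the standard library. The function names `url_status` and `find_unreachable` are hypothetical and not taken from the PR, and the shape of REGISTRY is not shown in this thread, so the sketch takes a plain list of URLs.

```python
import urllib.error
import urllib.request


def url_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server gives for `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still worth reporting.
        return err.code


def find_unreachable(urls, status_of=url_status):
    """Map every URL that did not answer 200 to the status it returned."""
    failures = {}
    for url in urls:
        status = status_of(url)
        if status != 200:
            failures[url] = status
    return failures
```

In a pytest test this could reduce to asserting that `find_unreachable(urls)` is an empty dict, so a failure message lists every broken URL together with its status code.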

@msoedov
Owner

msoedov commented Jan 23, 2025

Hi @fardin-developer, thanks for the PR. Could you please take a look at the unit test failures on CI?

@fardin-developer
Contributor Author

Hi,
The test cases are failing because the Hugging Face requests return 401 Unauthorized. Do I need to send any specific headers with the requests?
Example URL: https://huggingface.co/ShawnMenz/DAN_jailbreak
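
For reference, the Hugging Face Hub accepts an access token as a bearer token in the `Authorization` header. A sketch of attaching one to a standard-library request follows. `build_request` is a hypothetical helper, and how the token is obtained is left to the caller.

```python
import urllib.request


def build_request(url, token=None):
    """Build a request, adding a bearer token header when a token is given."""
    headers = {}
    if token:
        # Hub access tokens are sent as "Authorization: Bearer <token>".
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```

A token only helps when the resource exists but requires authentication. It will not make a wrong URL resolve.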

@msoedov
Owner

msoedov commented Jan 24, 2025

Hey @fardin-developer, I just added HUGGINGFACE_API_KEY to the repo secrets, so it will be accessible in GitHub Actions.
I guess locally you need to either generate your own token at https://huggingface.co/settings/tokens for testing, or install and run huggingface-cli login.

I would suggest replacing the HTTP GET requests with load_dataset (from datasets import load_dataset).

Thank you for your effort; that's great progress!
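
The suggestion to use `load_dataset` could look roughly like the sketch below. `can_load` is a hypothetical helper. It assumes a recent `datasets` release in which `load_dataset` accepts the `token` and `streaming` keyword arguments, and the `loader` parameter exists only so the helper can be exercised without network access.

```python
def can_load(dataset_id, loader=None, token=None):
    """Try to open a dataset and return (ok, reason)."""
    if loader is None:
        # Imported lazily so the helper can be tested without `datasets`.
        from datasets import load_dataset
        loader = load_dataset
    try:
        # streaming=True resolves the dataset without downloading all of it.
        loader(dataset_id, streaming=True, token=token)
    except Exception as err:  # report any failure instead of crashing the run
        return False, f"{type(err).__name__}: {err}"
    return True, ""
```

In CI the token could be read with `os.environ.get("HUGGINGFACE_API_KEY")`, assuming the workflow exports the repo secret under that name, which this thread does not confirm.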

@fardin-developer
Contributor Author

fardin-developer commented Jan 24, 2025

I just wanted to mention that all the URLs inside agentic_security/probe_data/__init__.py are invalid and return 404 errors. I tried opening them in the browser as well, and they didn’t work. However, I tested with a valid URL, https://huggingface.co/datasets/fka/awesome-chatgpt-prompts, and it worked without any issues.
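
One thing worth checking, offered as an assumption rather than a confirmed diagnosis: dataset pages on the Hub live under the `/datasets/<owner>/<name>` path, as in the working URL above, while the example URL quoted earlier in the thread has no `/datasets/` segment. A small sketch for separating the two shapes, with the hypothetical helper `dataset_id_from_url`:

```python
from urllib.parse import urlparse


def dataset_id_from_url(url):
    """Return 'owner/name' if `url` looks like a Hub dataset page, else None."""
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        return None
    parts = [part for part in parsed.path.split("/") if part]
    # Expect /datasets/<owner>/<name>; dataset ids without an owner are not handled.
    if len(parts) >= 3 and parts[0] == "datasets":
        return "/".join(parts[1:3])
    return None
```

The returned id has the `owner/name` form that `load_dataset` takes, so the same helper could feed a loader-based check.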

@msoedov
Owner

msoedov commented Jan 24, 2025

@fardin-developer let me pull your branch and look into that
