title | emoji | colorFrom | colorTo | sdk | sdk_version | app_file | pinned | short_description |
---|---|---|---|---|---|---|---|---|
Datasets Convertor | 👁 | purple | indigo | gradio | 5.20.0 | app.py | false | Supports Parquet, CSV, JSONL, and XLS |
Datasets Converter is a tool for migrating datasets between platforms, including GitHub, Hugging Face, Kaggle, and Google Colab notebooks. It simplifies dataset transfers so that data can be brought into AI/ML workflows with less manual work.
- Convert and migrate datasets between:
  - GitHub → Hugging Face
  - GitHub → Kaggle
  - GitHub → Google Colab
- Supports multiple dataset formats (CSV, JSON/JSONL, Parquet, XLS, etc.).
- Automated metadata handling and versioning.
- CLI and API support for easy automation.
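The format conversions listed above can be illustrated with a minimal sketch that uses only the Python standard library. It covers CSV ↔ JSONL only; Parquet and XLS would need third-party libraries (for example pandas with a Parquet engine such as pyarrow). The functions `csv_to_jsonl` and `jsonl_to_csv` are hypothetical and are not part of this repository's code.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text (with a header row) into JSON Lines text.

    Each CSV row becomes one JSON object keyed by the header names.
    All values stay strings, because the csv module does no type inference.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in reader)


def jsonl_to_csv(jsonl_text: str) -> str:
    """Convert JSON Lines text back into CSV text with a header row.

    Columns are the union of keys across all records, in first-seen order;
    keys missing from a record are written as empty cells.
    """
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    sample = "text,label\nhello,1\nworld,0\n"
    jsonl = csv_to_jsonl(sample)
    print(jsonl, end="")
    # Round trip: JSONL back to CSV reproduces the original text.
    assert jsonl_to_csv(jsonl) == sample
```

Because `csv` reads every field as a string, a round trip through JSONL keeps values such as `"1"` as strings; a real converter would likely need schema or type handling on top of this.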
Clone the repository and install dependencies:

```bash
git clone https://github.com/canstralian/Datasets-Convertor.git
cd Datasets-Convertor
pip install -r requirements.txt
```
## Usage

### CLI Usage

```bash
python convert.py --source github --destination huggingface --repo "https://github.com/user/repo"
```
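The CLI flags shown above could be parsed roughly as follows. This is a hypothetical sketch, not the repository's actual `convert.py`: the platform names in `SOURCES` and `DESTINATIONS` (beyond `github` and `huggingface`) and the helpers `build_parser`, `parse_github_repo`, and `main` are assumptions for illustration.

```python
import argparse
from typing import Optional, Sequence
from urllib.parse import urlparse

# Hypothetical platform names; the README only shows "github" and "huggingface".
SOURCES = ["github"]
DESTINATIONS = ["huggingface", "kaggle", "colab"]


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags (an assumption, not the real convert.py)."""
    parser = argparse.ArgumentParser(description="Migrate a dataset between platforms.")
    parser.add_argument("--source", required=True, choices=SOURCES)
    parser.add_argument("--destination", required=True, choices=DESTINATIONS)
    parser.add_argument("--repo", required=True, help="URL of the source GitHub repository")
    return parser


def parse_github_repo(url: str) -> tuple[str, str]:
    """Split https://github.com/<owner>/<repo>[.git] into (owner, repo)."""
    parsed = urlparse(url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"expected https://github.com/<owner>/<repo>, got: {url}")
    owner, repo = parts[0], parts[1]
    # Accept clone URLs that end in ".git".
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def main(argv: Optional[Sequence[str]] = None) -> str:
    """Parse CLI arguments and describe the migration that would be performed."""
    args = build_parser().parse_args(argv)
    owner, repo = parse_github_repo(args.repo)
    return f"{args.source} -> {args.destination}: {owner}/{repo}"
```

Calling `main(["--source", "github", "--destination", "huggingface", "--repo", "https://github.com/user/repo"])` returns `"github -> huggingface: user/repo"`. Restricting `--source` and `--destination` with `choices` lets argparse reject unknown platforms before any network work starts.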
### API Usage (FastAPI)

Start the API:

```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```

Example request:

```bash
curl -X POST "http://localhost:8000/convert" -H "Content-Type: application/json" \
  -d '{"source": "github", "destination": "huggingface", "repo": "https://github.com/user/repo"}'
```
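The same request as the curl example can be sent from Python with only the standard library. This is a sketch that assumes the server started by the `uvicorn` command above is listening on `localhost:8000` and that `/convert` returns JSON; the helper names `build_convert_request` and `convert` are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Assumes the FastAPI server from the uvicorn command above is running locally.
API_URL = "http://localhost:8000/convert"


def build_convert_request(source: str, destination: str, repo: str,
                          url: str = API_URL) -> urllib.request.Request:
    """Build a POST request with the same JSON body as the curl example."""
    body = json.dumps({"source": source, "destination": destination, "repo": repo})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def convert(source: str, destination: str, repo: str, url: str = API_URL) -> Any:
    """Send the request and decode the JSON response (its shape is not documented)."""
    req = build_convert_request(source, destination, repo, url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `convert("github", "huggingface", "https://github.com/user/repo")` sends the same payload as the curl command. `urlopen` raises `urllib.error.HTTPError` for error status codes, so failures surface as exceptions rather than as silently decoded error bodies.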
## Roadmap

- Add support for more dataset sources (Google Drive, S3, etc.).
- Enhance error handling and logging.
- Implement dataset validation and transformation features.
## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

## License

This project is licensed under the Apache License 2.0.