
canstralian/DataMigrationMaster


---
title: Github To Huggingface Dataset Migration Tool 🚀
emoji: 📈
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: true
python_version: 3.11.6
hf_oauth: true
hf_oauth_scopes:
  - write-repos
  - manage-repos
license: apache-2.0
thumbnail:
short_description: A web-based tool to migrate and analyze datasets with ease 🌟
---

GitHub to Hugging Face Dataset Migration Tool 🐙2️⃣🤗

This web-based tool lets you migrate datasets from GitHub to the Hugging Face Datasets Hub and analyze them along the way. It provides a user-friendly interface for importing GitHub repositories, analyzing their contents, and exporting them to the Hub with automatic dataset card generation and validation.
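Dataset cards on the Hugging Face Hub are README.md files that start with a YAML metadata block followed by free-form Markdown. As a rough illustration of what automatic card generation involves, here is a minimal sketch; the build_dataset_card helper and the fields it fills in (license, tags, source URL, file list) are hypothetical and not taken from this tool's code.

```python
from typing import List


def build_dataset_card(
    name: str,
    source_url: str,
    license_id: str,
    tags: List[str],
    files: List[str],
) -> str:
    """Return README.md text for a Hub dataset repo (hypothetical helper)."""
    # YAML metadata block between '---' lines at the top of README.md.
    header = ["---", f"license: {license_id}", "tags:"]
    header += [f"- {tag}" for tag in tags]
    header.append("---")

    # Markdown body: title, provenance, and the files that were migrated.
    body = [f"# {name}", "", f"Migrated from {source_url}.", "", "## Files", ""]
    body += [f"- `{path}`" for path in files]

    return "\n".join(header + [""] + body) + "\n"


if __name__ == "__main__":
    # Example values below are placeholders, not a real repository.
    print(build_dataset_card(
        name="example-dataset",
        source_url="https://github.com/example-owner/example-dataset",
        license_id="apache-2.0",
        tags=["tabular"],
        files=["data/train.csv"],
    ))
```

Keeping the metadata header separate from the Markdown body makes it easy to validate the YAML part on its own before uploading.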

πŸ† Key Features:

  • πŸš€ Import GitHub Repositories: Easily import GitHub repositories containing datasets by providing their URLs.
  • πŸ” Dataset Analysis: Analyze the repository's structure, identify potential dataset files (CSV, JSON, etc.), and extract relevant metadata.
  • 🀝 Hugging Face Hub Integration: Seamlessly export datasets to the Hugging Face Datasets Hub with built-in validation and dataset card generation.
  • 🧠 AI-Powered Analysis: Leverage AI to generate summaries of the dataset, analyze data quality, and identify potential issues.
  • πŸ“Š Comprehensive Reports: Generate detailed reports on code quality, community engagement, technical metrics, and AI-powered insights.

This tool is designed to make it easier for researchers, developers, and data scientists to:

  • Share their datasets with the Hugging Face community.
  • Discover and access datasets from GitHub.
  • Analyze and understand datasets before using them.
  • Improve the quality and accessibility of their datasets.

💡 How to Use:

  1. 🌐 Provide GitHub Repository URL: Enter the URL of the GitHub repository containing the dataset you want to migrate.
  2. 📈 Analyze and Review: The tool will analyze the repository and extract relevant metadata. Review and edit the metadata as needed.
  3. 🚀 Export to Hugging Face: Click the "Export to Hugging Face" button to initiate the migration process.
  4. 📝 Generate Report: Download a comprehensive analysis report to gain insights into the dataset.

Additional Information:

  • Technology Stack: This tool is built using Gradio, Python, and Hugging Face Transformers.
  • AI Provider: AI-powered analysis is provided by Anthropic.
  • Open Source: The code for this tool is available on GitHub.

πŸ— We welcome contributions and feedback from the community to make this tool even better!

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

About

Dataset Migration and Analysis Tool
