Anuvaad Parallel Corpus for Indian languages #81

Closed
GokulNC opened this issue Nov 6, 2021 · 1 comment · Fixed by #83

Comments

@GokulNC
GokulNC commented Nov 6, 2021

https://github.com/project-anuvaad/anuvaad-parallel-corpus

Has significant overlap with AI4Bharat corpus. Needs deduplication.
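
To make the deduplication step concrete, here is a minimal sketch of dropping sentence pairs that already exist in another corpus. It is not taken from the Anuvaad or AI4Bharat tooling: the function names (`normalize`, `drop_overlap`) and the normalization choices (NFKC, lowercasing, whitespace collapsing) are assumptions made for illustration, and it only catches exact matches after normalization, not near-duplicates.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: NFKC, lowercase, collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def pair_key(src: str, tgt: str) -> tuple:
    """Key used to decide whether two sentence pairs count as the same."""
    return (normalize(src), normalize(tgt))


def drop_overlap(new_pairs, existing_pairs):
    """Return pairs from new_pairs that are absent from existing_pairs.

    Duplicates inside new_pairs are also dropped; the first occurrence
    is kept and the original (un-normalized) text is preserved.
    """
    seen = {pair_key(s, t) for s, t in existing_pairs}
    kept = []
    for s, t in new_pairs:
        key = pair_key(s, t)
        if key not in seen:
            seen.add(key)
            kept.append((s, t))
    return kept
```

Matching on the (source, target) pair rather than on the source sentence alone keeps alternative translations of the same sentence; whether that is desirable here is a judgment call.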

@thammegowda
Owner

@GokulNC Thanks for creating this issue.
We can and should definitely add these datasets to the next version.

If possible, could you please send a pull request?
Here is an example of how to list datasets: https://github.com/thammegowda/mtdata/blob/master/mtdata/index/ai4bharat.py

2 participants