Downloadable Training Data #3

Closed
djl11 opened this issue Apr 12, 2021 · 4 comments
Comments

@djl11

djl11 commented Apr 12, 2021

It would be very useful to have the training data produced by the generate_data_parallel.py script available for download, for both the pile and packed cases.

I appreciate this may be a large amount of data, and therefore difficult to host, so there is no expectation, of course!
But it would save people from having to run the costly data generation process locally just to experiment with the training.

@Steve-Tod
Collaborator

Hi, I've uploaded a link to the data we generated here. Hopefully it will be helpful.

@djl11
Author

djl11 commented Apr 13, 2021

Great, thanks a lot!

djl11 closed this as completed Apr 13, 2021
@djl11
Author

djl11 commented Apr 22, 2021

Hey, quick heads up. The links in the README table are mismatched: the pile link leads to the packed data, and vice versa. There is also a small typo in the word "this" just before the table. Both should be easy to fix! :)

@Steve-Tod
Collaborator

Thank you for reminding me!


2 participants