how to process arXiv tex files without downloading? #51
Comments
Hi @irene622, thanks for your question! The
You can then call RedPajama-Data/data_prep/arxiv/arxiv_cleaner.py Lines 131 to 177 in d174968
Let me know if this helps!
I downloaded the arXiv TeX files myself, without running scripts/arxiv-kickoff-download.sh.
My data structure is
I want to preprocess my LaTeX data, so I run
bash scripts/arxiv-kickoff-cleaning.sh
where arxiv-kickoff-cleaning.sh is the following:
arxiv-kickoff-cleaning.sh runs with no error, but the result files, arxiv_1.jsonl and arxiv_2.jsonl, have no content.
What are DATA_DIR and TARGET_DIR?
Is there a way to run the cleaning on LaTeX files directly?
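For anyone with the same setup, here is a minimal standalone sketch of turning a local directory of .tex files into a JSONL file. It is not the repository's arxiv_cleaner.py and does not reproduce its cleaning rules. The function names `strip_tex_comments` and `tex_dir_to_jsonl`, the `text`/`meta` record fields, and the comment-stripping regex are all assumptions made for illustration.

```python
import json
import re
from pathlib import Path


def strip_tex_comments(text: str) -> str:
    """Remove LaTeX comments: an unescaped % up to the end of the line.

    Simplified on purpose: it does not handle verbatim environments
    or a literal backslash directly before a percent sign.
    """
    return re.sub(r"(?<!\\)%.*", "", text)


def tex_dir_to_jsonl(data_dir: str, target_file: str) -> int:
    """Write one JSON record per non-empty .tex file; return the record count.

    Hypothetical helper: the record layout below is an assumption,
    not a schema confirmed by the repository.
    """
    target = Path(target_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with target.open("w", encoding="utf-8") as out:
        # rglob finds .tex files in nested paper directories as well.
        for tex_path in sorted(Path(data_dir).rglob("*.tex")):
            raw = tex_path.read_text(encoding="utf-8", errors="ignore")
            cleaned = strip_tex_comments(raw).strip()
            if not cleaned:
                continue  # skip files that contain only comments or whitespace
            record = {"text": cleaned, "meta": {"source_file": str(tex_path)}}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Calling `tex_dir_to_jsonl("path/to/tex", "out/arxiv_local.jsonl")` (both paths are placeholders) returns the number of records written. A return value of 0 means no .tex files were found under the input directory, which is a quick way to rule out a path mismatch when the output files come back empty.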