
Put temporary OCR files in a folder #105

Open
Shreeshrii opened this issue May 21, 2018 · 1 comment

Comments

@Shreeshrii

When running do_ocr_jpg.py, the OCR output files are left in the main directory, as well as in a couple of folders:

~/OCR4wikisource$ ls *.txt
all_text_for_2015.253393.Hanuman-Chalisa.pdf.txt       text_for_page_00004.txt  text_for_page_00011.txt  text_for_page_00018.txt  text_for_page_00025.txt  text_for_page_00032.txt
all_text_for_Hanuman Chalisa.pdf.txt                   text_for_page_00005.txt  text_for_page_00012.txt  text_for_page_00019.txt  text_for_page_00026.txt  text_for_page_00033.txt
all_text_for_Mudgala Purana (Pothi or Oblong).pdf.txt  text_for_page_00006.txt  text_for_page_00013.txt  text_for_page_00020.txt  text_for_page_00027.txt  text_for_page_00034.txt
missing_files.txt                                      text_for_page_00007.txt  text_for_page_00014.txt  text_for_page_00021.txt  text_for_page_00028.txt
text_for_page_00001.txt                                text_for_page_00008.txt  text_for_page_00015.txt  text_for_page_00022.txt  text_for_page_00029.txt
text_for_page_00002.txt                                text_for_page_00009.txt  text_for_page_00016.txt  text_for_page_00023.txt  text_for_page_00030.txt
text_for_page_00003.txt                                text_for_page_00010.txt  text_for_page_00017.txt  text_for_page_00024.txt  text_for_page_00031.txt

I suggest that, instead of leaving these files in the root ~/OCR4wikisource folder, they be kept in a subfolder under it.
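A minimal Python sketch of this suggestion follows. The file-name patterns come from the `ls` listing; the helper name `move_ocr_outputs` and the `ocr_output/<pdf name without extension>/` layout are hypothetical choices, not something do_ocr_jpg.py currently does. Because the `text_for_page_*.txt` names do not include the PDF name, the sketch assumes every such file in the folder belongs to the run that just finished.

```python
import glob
import os
import shutil


def move_ocr_outputs(base_dir, pdf_name):
    """Move one run's OCR text files from base_dir into a per-PDF subfolder.

    The layout base_dir/ocr_output/<pdf stem>/ is a hypothetical choice.
    """
    stem = os.path.splitext(pdf_name)[0]
    dest = os.path.join(base_dir, "ocr_output", stem)
    os.makedirs(dest, exist_ok=True)

    # Per-page files (assumed to belong to the current run) plus the combined
    # file for this PDF only; files for other PDFs are left where they are.
    # glob.escape stops characters such as '[' in names being read as wildcards.
    patterns = [
        "text_for_page_*.txt",
        "all_text_for_" + glob.escape(pdf_name) + ".txt",
    ]
    moved = []
    for pattern in patterns:
        # Only top-level matches in base_dir; the new subfolder is not searched.
        for path in sorted(glob.glob(os.path.join(glob.escape(base_dir), pattern))):
            shutil.move(path, os.path.join(dest, os.path.basename(path)))
            moved.append(os.path.basename(path))
    return dest, moved
```

For example, `move_ocr_outputs(os.path.expanduser("~/OCR4wikisource"), "Hanuman Chalisa.pdf")` would move the page files and `all_text_for_Hanuman Chalisa.pdf.txt` into `ocr_output/Hanuman Chalisa/`, leaving `missing_files.txt` and the other PDFs' combined files in place.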

@Shreeshrii
Author

The mv command needed a change. The following works:

command = "mv folder*.log currentfile.pdf doc_data.txt pg*.pdf page* txt* text* " + '"' + temp_folder + '"'
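Wrapping `temp_folder` in double quotes suggests the folder name can contain spaces. A possible refinement, sketched below under that assumption, is to quote it with the standard library's `shlex.quote`, which also covers names containing `$`, backquotes, or double quotes that would still be interpreted inside `"..."`. The glob patterns are copied from the command above and left unquoted so the shell expands them; the helper names `build_mv_command` and `run_mv` are hypothetical.

```python
import shlex
import subprocess

# Glob patterns from the mv command above; left unquoted so the shell expands them.
PATTERNS = ["folder*.log", "currentfile.pdf", "doc_data.txt", "pg*.pdf", "page*", "txt*", "text*"]


def build_mv_command(temp_folder):
    """Return the mv command with temp_folder safely quoted for a POSIX shell."""
    return "mv " + " ".join(PATTERNS) + " " + shlex.quote(temp_folder)


def run_mv(temp_folder, cwd="."):
    # shell=True is required so the shell performs the glob expansion.
    # temp_folder must already exist as a directory (mv has several sources),
    # and a pattern that matches nothing is passed literally, so mv reports
    # an error for it and exits non-zero.
    return subprocess.run(build_mv_command(temp_folder), shell=True, cwd=cwd)
```

For a folder named `Hanuman Chalisa`, `build_mv_command` produces the same command as above, except that the folder name is wrapped in single quotes instead of double quotes.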
