Spaces in input pdf filename not handled correctly #102

Open
Shreeshrii opened this issue May 21, 2018 · 3 comments
Comments

@Shreeshrii

While testing do_ocr_jpg.py v2 I came across a problem related to spaces in the original file name.

I made the following change to the copy statement, wrapping the target name in single quotes:

command = "cp *.txt 'text-for-"+ original_filename + "'"
logger.info("Making a copy of all text files to 'text-for-"+ original_filename + "'")
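Hand-written single quotes still break if the file name itself contains a single quote. As a minimal sketch, assuming the script builds a shell command string and runs it through a shell, the standard library's `shlex.quote` can do the quoting instead. The helper name `build_copy_command` is hypothetical and is not part of do_ocr_jpg.py.

```python
import shlex


def build_copy_command(original_filename):
    """Build the 'cp' command string with a safely quoted target.

    Hypothetical helper for illustration; the real script may assemble
    its command differently.
    """
    target = "text-for-" + original_filename
    # shlex.quote leaves safe names unchanged and wraps names that contain
    # spaces, parentheses, quotes, etc. so the shell sees a single argument.
    # The *.txt glob is left unquoted on purpose so the shell still expands it.
    return "cp *.txt " + shlex.quote(target)


if __name__ == "__main__":
    print(build_copy_command("Hanuman Chalisa.pdf"))
    print(build_copy_command("Mudgala Purana (Pothi or Oblong).pdf"))
```

The same quoting would apply to any other command that embeds the original file name, such as the `mv` into the temp folder.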

The file I tested with:

https://ia800107.us.archive.org/3/items/Hanuman_Chalisa/Hanuman%20Chalisa.pdf

It is a 2-page PDF in Devanagari script.

@Shreeshrii
Author

Shreeshrii commented May 21, 2018

With version 3 of the script:

Moving all temp files to OCR-Hanuman Chalisa.pdf-temp-2018-05-21-08-05-14

INFO:__main__:Running mv folder*.log currentfile.pdf  doc_data.txt pg*.pdf page* txt* *.jpg  "OCR-Hanuman Chalisa.pdf-temp-2018-05-21-08-05-14"
mv: cannot stat ‘page_00001.jpg’: No such file or directory
mv: cannot stat ‘page_00002.jpg’: No such file or directory
INFO:__main__:Merged all OCRed files to  all_text_for_Hanuman Chalisa.pdf.txt
INFO:__main__:Making a copy of all text files to text-for-Hanuman Chalisa.pdf
INFO:__main__:Running cp *.txt text-for-Hanuman Chalisa.pdf
cp: target ‘Chalisa.pdf’ is not a directory

The output folders are not created. All files stay in the main directory.

@tshrinivasan
Owner

tshrinivasan commented May 21, 2018 via email

@Shreeshrii
Author

Shreeshrii commented May 21, 2018

Errors from another file, with v2 of the script:

mv: cannot stat 'page_01087.jpg': No such file or directory
mv: cannot stat 'page_01088.jpg': No such file or directory
mv: cannot stat 'page_01089.jpg': No such file or directory
INFO:__main__:Merged all OCRed files to  all_text_for_Mudgala Purana (Pothi or Oblong).pdf.txt
INFO:__main__:Making a copy of all text files to text-for-Mudgala Purana (Pothi or Oblong).pdf
INFO:__main__:Running cp *.txt text-for-Mudgala Purana (Pothi or Oblong).pdf
sh: 1: Syntax error: "(" unexpected
INFO:__main__:

Done. Check the text files start with text_for_page_

Edit: It looks like the error sh: 1: Syntax error: "(" unexpected has also been reported previously.
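Both the "target is not a directory" and the "(" syntax errors come from the shell splitting or parsing the unquoted file name. One way to sidestep this, sketched here under the assumption that the copy step only needs to place every .txt file into a "text-for-<name>" folder, is to skip the shell entirely and use Python's own file functions. The function name `copy_text_files` and its return value are hypothetical, not taken from the script.

```python
import glob
import os
import shutil


def copy_text_files(source_dir, original_filename):
    """Copy every .txt file in source_dir into 'text-for-<original_filename>'.

    Hypothetical replacement for the 'cp *.txt ...' shell call. No shell is
    involved, so spaces and parentheses in the name need no quoting.
    """
    target_dir = os.path.join(source_dir, "text-for-" + original_filename)
    os.makedirs(target_dir, exist_ok=True)

    copied = []
    # glob.escape protects special characters in the directory part of the pattern.
    pattern = os.path.join(glob.escape(source_dir), "*.txt")
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):  # skip directories that happen to end in .txt
            shutil.copy(path, target_dir)
            copied.append(os.path.basename(path))
    return target_dir, copied


if __name__ == "__main__":
    folder, names = copy_text_files(".", "Mudgala Purana (Pothi or Oblong).pdf")
    print("Copied", len(names), "text files to", folder)
```

If the script has to keep calling external tools, passing the arguments as a list to `subprocess.run` (without `shell=True`) avoids the same parsing problem, although the `*.txt` wildcard then has to be expanded in Python with `glob`.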
