Mediawiki_uploader.py not running if there is a missing page #79

Closed
ravidreams opened this issue Feb 28, 2016 · 2 comments

Comments

@ravidreams

Mediawiki_uploader.py does not run if a page is missing.

I tried creating the missing page manually and also tried the following commands:

touch page_00001.txt
touch page_00001.upload

This is a recurring problem for many files. Google won't OCR these pages, and the process gets stuck when we try running do_ocr.py again.

Logged in to https://ta.wikisource.org
INFO:root:Checking for bot access rights
INFO:root:The user Ravidreamsbot has bot access.
INFO:root:
Done. Uploaded all text files to wiki source

mv: cannot stat ‘all_text_for_’: No such file or directory
mv: cannot stat ‘OCR_’: No such file or directory
mv: cannot stat ‘upload-*’: No such file or directory
mv: cannot stat ‘செந்தமிழ்ப்_பெட்டகம்-2.pdf’: No such file or directory

@tha-uzhavan

I think the issue arose because of Internet connectivity. When I reran do_ocr.py, all pages were converted successfully and the text was uploaded to ta.wikisource.

@tshrinivasan
Owner

When Google cannot OCR some of the files, run the following command:

python create_dummy_files.py

This will create dummy text files for the incomplete PDF files.

Then run this again:
python do_ocr.py

to complete all the pending work.
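
For anyone wondering what the dummy-file step amounts to, here is a minimal sketch of the idea. It is not the actual create_dummy_files.py. The page_NNNNN.txt naming is taken from the touch commands earlier in this thread; the function names and the total_pages parameter are assumptions made for illustration only.

```python
import re
from pathlib import Path

# Assumed naming scheme, based on the "page_00001.txt" example in this thread.
PAGE_RE = re.compile(r"page_(\d{5})\.txt")


def find_missing_pages(directory, total_pages):
    """Return the page numbers in 1..total_pages that have no page_NNNNN.txt."""
    present = set()
    for path in Path(directory).iterdir():
        match = PAGE_RE.fullmatch(path.name)
        if match:
            present.add(int(match.group(1)))
    return [n for n in range(1, total_pages + 1) if n not in present]


def create_dummy_pages(directory, total_pages):
    """Create an empty placeholder text file for every missing page."""
    missing = find_missing_pages(directory, total_pages)
    for n in missing:
        # touch() creates an empty file and leaves existing files untouched.
        (Path(directory) / f"page_{n:05d}.txt").touch()
    return missing
```

Calling create_dummy_pages(".", total_pages) in the working directory would fill the gaps with empty files, so a later run does not stop on a page that Google could not OCR. Whether the real script also needs matching .upload files is not something this sketch assumes.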
