Monitor #27

Merged
merged 5 commits into from Jan 5, 2016

jimregan commented May 18, 2015

Zdenko writes:

I found an older patch [1] from an interesting Android app, Text Fairy (OCR Text Scanner) [2], [3]. I put it on a separate branch (monitor) and split the original patch into 3 commits for testing/cherry-picking:
adds monitor/ETEXT_DESC to GetHOCRText
extends ETEXT_DESC with a PROGRESS_FUNC field and changes the percentage progress values to start at 0% instead of 30%
extends hOCR output with a row attribute. I skipped the part where the patch hard-codes the font (size?) to 15...

[1] https://www.mail-archive.com/tesseract-ocr@googlegroups.com/msg08089.html
[2] https://github.com/renard314/textfairy
[3] https://play.google.com/store/apps/details?id=com.renard.ocr
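
To make the monitor idea concrete, here is a minimal, self-contained sketch of a recognition loop that reports progress starting at 0% and can be cancelled through a callback. It does not use the real ETEXT_DESC: `SketchMonitor`, `RecognizeWords`, `progress_func` and `cancel_func` are hypothetical names invented for illustration, and the actual field names and callback signatures should be taken from the Tesseract headers.

```cpp
#include <cassert>

// Hypothetical stand-in for a monitor struct. This is NOT the real
// ETEXT_DESC; field names and callback signatures are assumptions.
struct SketchMonitor {
  int progress = 0;                           // last reported percentage, 0..100
  bool (*progress_func)(int) = nullptr;       // assumed progress callback
  bool (*cancel_func)(void*, int) = nullptr;  // assumed cancel callback
  void* cancel_this = nullptr;                // opaque pointer passed to cancel_func
};

// Walks over total_words words, reporting word-recognition progress from 0%.
// Returns the number of words handled before completion or cancellation.
int RecognizeWords(int total_words, SketchMonitor* monitor) {
  assert(total_words > 0);
  for (int done = 0; done < total_words; ++done) {
    if (monitor != nullptr) {
      // Progress covers word recognition only, so it starts at 0.
      monitor->progress = 100 * done / total_words;
      if (monitor->progress_func != nullptr) {
        monitor->progress_func(monitor->progress);
      }
      // Give the caller a chance to stop before the next word.
      if (monitor->cancel_func != nullptr &&
          monitor->cancel_func(monitor->cancel_this, done)) {
        return done;
      }
    }
    // ... recognition of word number `done` would happen here ...
  }
  if (monitor != nullptr) {
    monitor->progress = 100;
    if (monitor->progress_func != nullptr) {
      monitor->progress_func(monitor->progress);
    }
  }
  return total_words;
}
```

Passing a null monitor keeps the loop usable by callers that do not need progress or cancellation.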

jimregan commented May 18, 2015

n-way history is in #26

@@ -587,6 +587,16 @@ class TESS_API TessBaseAPI {
* Make a HTML-formatted string with hOCR markup from the internal
* data structures.
* page_number is 0-based but will appear in the output as 1-based.
* monitor can be used to
* cancel the regocnition

jimregan May 18, 2015

Typo: 'recognition'

theraysmith commented May 18, 2015

I like the spirit of the changes, except that I think it would be better to add some progress code to layout analysis than to start the progress at 0 for word recognition.
The time taken by layout analysis is heavily dependent on the PageSegMode, so it would be better if it (Tesseract::SegmentPage/Tesseract::AutoPageSeg/ColumnFinder::FindBlocks) could call the callbacks and return a number that becomes the base percentage for the recognition phase, instead of using an arbitrary 30 or 0 or, even worse, making it depend on the existence of a callback function.
That is a bigger change of course.
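
The base-percentage suggestion amounts to simple arithmetic, sketched below under assumptions: layout analysis hands back some percentage `base_percent`, and word recognition then fills the remaining range up to 100. The function name `OverallProgress` and its parameters are hypothetical and do not exist in Tesseract.

```cpp
#include <cassert>

// Hypothetical helper: maps word-recognition progress onto the range
// [base_percent, 100], where base_percent is whatever layout analysis
// reported when it finished.
int OverallProgress(int base_percent, int words_done, int total_words) {
  assert(base_percent >= 0 && base_percent <= 100);
  if (total_words <= 0) {
    return base_percent;  // nothing to recognize, stay at the base value
  }
  return base_percent + (100 - base_percent) * words_done / total_words;
}
```

With a base of 30, five of ten words recognized maps to 65%; with a base of 0, as in this pull request, the same point maps to 50%.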

zdenop commented May 19, 2015

If there is no room to implement a progress monitor for layout analysis in the 3.04 release, I would suggest merging this change with an explanation that it is only a progress monitor for word recognition and does not cover layout analysis...

zdenop added a commit that referenced this pull request Jan 5, 2016
Monitor
zdenop merged commit c53add7 into master Jan 5, 2016
zdenop deleted the monitor branch Jan 5, 2016