It doesn't seem like the Tesseract trained data set is configurable (i.e. 'fast' vs 'best'), and as far as I can tell, you are using 'fast'. Is that the case?
There may also be corruption somewhere in the trained data you have (at least for eng), as I just noticed a totally nonsensical series of characters in the conversion of a single basic word in a multi-line subtitle. Something like...
The brown fox jumps over the lazy
qj2]a%sLo1
In the Docker image, the trained data is downloaded from https://github.com/tesseract-ocr/tessdata/. Going by the README, it looks like a version between 'best' and 'fast'.
The eng.traineddata from tessdata_best, coupled with the latest version of PgsToSrt (1.4.5 at time of writing) should give you pretty much perfect results @cmjordan42.
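If it helps, here is a rough sketch of fetching that file. It assumes the tessdata_best repository serves raw files from its `main` branch, and the `tessdata` output directory and both function names are just placeholders I made up, not anything PgsToSrt defines.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: tessdata_best serves raw files from its "main" branch.
BASE_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/main"


def traineddata_url(lang: str = "eng") -> str:
    """Build the download URL for one language's 'best' model (hypothetical helper)."""
    return f"{BASE_URL}/{lang}.traineddata"


def fetch_traineddata(lang: str = "eng", dest_dir: str = "tessdata") -> Path:
    """Download <lang>.traineddata into dest_dir and return the local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"{lang}.traineddata"
    # urlretrieve follows GitHub's redirect to the raw file contents.
    urlretrieve(traineddata_url(lang), str(target))
    return target


if __name__ == "__main__":
    print(fetch_traineddata("eng"))
```

After that, point PgsToSrt at the directory containing the downloaded file; check its README for the exact option, since I have not verified it here.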