Duplicate and incomplete data for German fraktur #49
Comments
frk is the ISO 639-3 code for Frankish.
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files FYI, the source of the 'Language' column in the tables is the old Google Code download page. Ray uploaded the official traineddata files to that old page, Zdenko added a few 3rd party files.
I should have explained my question better. Why is German fraktur called Frankish? Neither the characters nor the words nor the fonts used belong to the Frankish language. And without hints from others I'd never have thought of using
It seems frk was trained on a modern German corpus and a small number of fonts.
@stweil, maybe you want to close this issue?
Do you think that
Is 'frk' only for German Fraktur?
I expect that the @theraysmith, it would be really interesting to know more details of the process which leads to that and also the other word lists. They look like extracts from random web sites. I don't think that good word lists for Fraktur can be produced like that.
Although frk is Frankish in ISO 639-3, the data is actually for German Fraktur. See: tesseract-ocr/tessdata_best#68 tesseract-ocr/tessdata#49 tesseract-ocr/langdata#61
Both deu_frak.traineddata and frk.traineddata try to support German fraktur.
deu_frak is not part of the official tesseract-ocr/langdata, but comes from paalberti/tesseract-dan-fraktur. It does not support the new LSTM recognizer introduced by Tesseract 4, but currently gives better results for fraktur texts than frk (which supports LSTM).
frk can be improved a lot by adding missing characters (primarily the long s, but also paragraph and dollar sign and maybe more) and based on latest corrections for langdata. With an improved frk, deu_frak would no longer be needed.
It is unclear who invented the name frk for Frankish. Maybe it should be renamed.
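As a rough way to check the missing-character point against any text file from the training data (a word list or training text, for example), something like the following could be used. This is only a sketch: the function name and the character table are made up for illustration, and reading "paragraph" as the section sign § is an assumption.

```python
# Characters named in the issue as missing from frk.
# Assumption: "paragraph" means the section sign (U+00A7).
REQUIRED = {
    "\u017f": "long s",
    "\u00a7": "section sign",
    "$": "dollar sign",
}


def missing_characters(text, required=REQUIRED):
    """Return the names of required characters that never occur in text."""
    present = set(text)
    return [name for char, name in required.items() if char not in present]
```

Calling missing_characters() on the contents of a file read with encoding="utf-8" returns the names of the characters that never appear in it; an empty list means all three occur at least once.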