Devanagari + Latin + Cyrillic #1

Open
gasyoun opened this issue Feb 28, 2021 · 1 comment


gasyoun commented Feb 28, 2021

Since 2007 I have been submitting errors to https://www.sanskrit-lexicon.uni-koeln.de/, the main source of Sanskrit dictionaries on the net.
In 2014 I launched https://github.com/sanskrit-lexicon/ to make error submission public, and we keep adding new dictionaries. Now I want to
add a few Sanskrit-Russian dictionaries, but they intermix languages, and Google OCR fails on that even more than ABBYY FineReader 12 (published in 2013; all later versions have even weaker algorithms and dirtier output). In 2013 (see http://samskrtam.ru/hellwigs-devanagari-ocr/) I wrote about why Hellwig’s Devanagari OCR (1.0.0.9 beta) failed for batch recognition of Sanskrit.

Knauer's whole dictionary can be seen at http://samskrtam.ru/sanskrit-lexicon/knauer/ . The original book was scanned at 600 dpi and the print is clear. Still, the output is worse than what I got 7-10 years ago with desktop software (where I was able to train and edit patterns myself).

01002

Output:

а са епсі, и, также, даже, ибо, а, же,
въ стихахъ иногда expl., съ арі (иногда и безъ него) далѣе, также, съ Thйуаs (и безъ) еще; са — са и — и (какъ — такъ и, съ одной - съ другой стороны, хотя — но), при отрицаніи ни — ни; са — па са — tu N. 3, 16 хотя — но не — а; сäiva (ca eva) = са или нѣсколько выразительнѣе; cа, обыкн. сапа (са па) при вопросит. мѣстоим. и нарѣч. = нибудь. — [те, лат. que, гот. -1].

Issues:

  • initial

gasyoun commented Mar 7, 2021

  1. Devanagari lost, recognized as Лha
  2. italics lost
  3. hyphen at end of line lost

gurutva

Лha гуру-тва, ср. (-твам), 1) тяжесть, — вѣсъ, важность, —
достоинство, уваженіе внушаемое лѣтами и нравственнымъ достоинствомъ, — достоинство наставника; — 2) тя
гости, горе.
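
One way to triage output like the samples above is to flag tokens that mix scripts inside a single word (for example `Лha`, where a Cyrillic letter is followed by Latin ones). The following is only a minimal sketch, not part of any OCR tool mentioned here: the function names `script_of` and `mixed_script_tokens` are hypothetical, and the script is guessed from the first word of each character's Unicode name, which is a coarse heuristic.

```python
import unicodedata

# Scripts expected in these dictionaries (an assumption for this sketch).
SCRIPTS = ("LATIN", "CYRILLIC", "DEVANAGARI")

def script_of(ch):
    """Return a coarse script label for one letter, or None for non-letters."""
    if not ch.isalpha():
        return None  # digits, punctuation and combining marks are ignored
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"

def mixed_script_tokens(text):
    """Return (token, scripts) pairs for whitespace-separated tokens
    whose letters come from more than one script."""
    flagged = []
    for token in text.split():
        scripts = {s for s in map(script_of, token) if s}
        if len(scripts) > 1:
            flagged.append((token, sorted(scripts)))
    return flagged

sample = "Лha гуру-тва"
print(mixed_script_tokens(sample))
# [('Лha', ['CYRILLIC', 'LATIN'])]
```

This only catches words where the scripts are mixed within one token. A word recognized entirely in the wrong script would pass unflagged, so the check is a way to find suspicious spots for manual review, not a measure of overall accuracy.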

gurutva2
