Add vulgar fraction for 1/2 #69

Open
Shreeshrii opened this issue Apr 27, 2017 · 10 comments
Comments

@Shreeshrii
Contributor

Shreeshrii commented Apr 27, 2017

@theraysmith

Please see tesseract-ocr/tesseract#841 (comment)

Out of the box, Tesseract already performs pretty well, but 150 years ago, house numbers in New York sometimes included ½, so I have to include this character in the desired_characters file:

https://cloud.githubusercontent.com/assets/1194896/25436113/477a23b6-2a60-11e7-967f-c4b97b21e3a9.png

I could not find any font that has 1/2 in this vertical format, with a straight line between the 1 and the 2.
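For reference, the character in question is the precomposed code point U+00BD. A minimal Python sketch for identifying it and for checking whether it already occurs in the text of a character list follows. The helper names `describe` and `missing_chars` are made up for illustration, and the sketch only treats the file content as a plain string; it does not assume anything about the actual layout of a desired_characters file.

```python
import unicodedata

HALF = "\u00bd"  # the single code point for the fraction one half


def describe(ch: str) -> str:
    """Return the code point, Unicode name and numeric value of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} = {unicodedata.numeric(ch)}"


def missing_chars(wanted: str, charset_text: str) -> list[str]:
    """List the characters of `wanted` that never occur in `charset_text`."""
    present = set(charset_text)
    return [ch for ch in wanted if ch not in present]


print(describe(HALF))
```

Calling `missing_chars` with the contents of the file and the fractions of interest would show which ones still have to be added.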

@stweil
Contributor

stweil commented Apr 27, 2017

This is not limited to English; it applies to other Latin-based languages as well.

@Shreeshrii Shreeshrii changed the title Add vulgar fraction for 1/2 to english Add vulgar fraction for 1/2 Apr 27, 2017
@Shreeshrii
Contributor Author

@stweil

  1. Do you know of any font similar to the one in the image, with 1/2 in vertical format?

  2. Do other fractions (1/4, 3/4, 1/3, etc.) also need to be supported?

@stweil
Contributor

stweil commented Apr 27, 2017

  1. I saw that you did not find a matching font. Nor did I in a short search, but I have that on my list now.
  2. I'm afraid, yes, although I assume that 1/4 occurs less often than 1/2, and other fractions are even more rare. Collecting examples of such cases is also on my list of things to be done. Maybe someone has an old book with cooking recipes - I expect we can find more fractions there than in listings of house numbers.
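To get an overview of which other fractions exist as single characters, one option is to scan the Unicode database for names that start with "VULGAR FRACTION". The sketch below does that with Python's standard `unicodedata` module. The function names and the scanned code point range are choices made for this example, and `to_ascii` relies on NFKC compatibility normalization, which rewrites ½ as 1, FRACTION SLASH (U+2044), 2.

```python
import unicodedata


def vulgar_fractions() -> list[tuple[str, str]]:
    """Collect (character, name) pairs for the precomposed vulgar fractions."""
    found = []
    # Latin-1 Supplement through Number Forms; range chosen for this sketch.
    for cp in range(0x00A0, 0x2200):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if name.startswith("VULGAR FRACTION"):
            found.append((ch, name))
    return found


def to_ascii(ch: str) -> str:
    """Rewrite a precomposed fraction as plain digits and '/', e.g. for post-processing."""
    return unicodedata.normalize("NFKC", ch).replace("\u2044", "/")


for ch, name in vulgar_fractions():
    print(f"U+{ord(ch):04X} {ch} {to_ascii(ch)} {name}")
```

The resulting list could serve as a checklist when collecting real-world examples of each fraction.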

@amitdo

amitdo commented Apr 27, 2017

Pango, which is what we use to render the images with text2image, supports MathML.

@stweil
Contributor

stweil commented Apr 27, 2017

Now we only need a Tesseract which can detect formulae in images and generate hOCR with MathML for those formulae. :-)

@amitdo

amitdo commented Apr 27, 2017

Now we only need a Tesseract which can detect formulae in images

https://github.com/tesseract-ocr/tesseract/blob/master/ccmain/equationdetect.h

@Shreeshrii
Contributor Author

tesseract-ocr/tesseract#2274 (comment)

It is possible to fine-tune a model to recognize fractions. See the comment linked above.

@Shreeshrii
Contributor Author

Also, with a tool such as https://www.calligraphr.com/en/ it is possible to create a TTF with the desired form of the characters and then use it to generate synthetic data. This should work well for Latin-script languages that do not have many ligatures or combining marks.
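As a rough illustration of the synthetic-data idea, here is a small Python sketch that produces training text lines with house numbers that sometimes end in a fraction. Everything in it is an assumption made for the example: the street names are placeholders, and `make_lines` and its parameters are hypothetical, not part of any Tesseract tooling. The text it produces could then be rendered with text2image using a font that draws the stacked form.

```python
import random

FRACTIONS = ["\u00bd", "\u00bc", "\u00be"]  # one half, one quarter, three quarters
STREETS = ["Broadway", "Canal St", "Bowery", "Pearl St"]  # placeholder street names


def make_lines(count: int, fraction_rate: float = 0.5, seed: int = 0) -> list[str]:
    """Build `count` address lines; about `fraction_rate` of them carry a fraction."""
    rng = random.Random(seed)  # seeded so the same text can be regenerated
    lines = []
    for _ in range(count):
        number = str(rng.randint(1, 999))
        if rng.random() < fraction_rate:
            number += rng.choice(FRACTIONS)
        lines.append(f"{number} {rng.choice(STREETS)}")
    return lines


print("\n".join(make_lines(5)))
```

Raising `fraction_rate` makes the rare characters appear more often, which may help when fine-tuning on a glyph the base model has seldom seen.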

@Shreeshrii
Contributor Author

Shreeshrii commented Apr 10, 2019

A font that has fractions with the numbers stacked vertically and a horizontal bar in between:
https://www.myfonts.com/fonts/russian-fonts/rf-rostin/

@Shreeshrii
Contributor Author

https://graphicdesign.stackexchange.com/questions/71097/fractions-in-indesign-typing-not-%C2%BD-alt-0189 has a short list:

A list of some Google Fonts (all free) that you can use (thanks to @RadLexus):

Coda – by Vernon Adams
Telex – by Huerta Tipografica
Arbutus Slab – by Karolina Lach
Unica One – by Eduardo Tunni
Concert One
Cherry Swash – by Nataliya Kasatkina
Economica – by Vicente Lamonaca
Special Elite – by Astigmatic
