About OCR generation #37

Open
LC-John opened this issue Dec 14, 2021 · 2 comments
Labels
good first issue (Good for newcomers), TODO (Features to update!)

Comments

@LC-John
LC-John commented Dec 14, 2021

Hi, guys!
I am trying to reuse the OCR transformation module in TextFlint, but I find it rather simplistic.
Here are the OCR rules, quoted from the source code:

mapping = {
    '0': ['8', '9', 'o', 'O', 'D'],
    '1': ['4', '7', 'l', 'I'],
    '2': ['z', 'Z'],
    '5': ['8'],
    '6': ['b'],
    '8': ['s', 'S', '@', '&'],
    '9': ['g'],
    'o': ['u'],
    'r': ['k'],
    'C': ['G'],
    'O': ['D', 'U'],
    'E': ['B']
}

These rules do not even cover the whole alphabet, and there are certainly more confusions to model, e.g., "w" => "vv" and "m" => "rn".
I found a dataset (https://github.com/jie-mei/MiBio-OCR-dataset) that contains real-world OCR errors.
The files in that dataset are quite annoying to parse, but I believe it could be beneficial to this work!
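
To illustrate the multi-character rules mentioned above ("w" => "vv", "m" => "rn"), here is a minimal sketch of how they could sit next to the single-character ones. This is not TextFlint's actual implementation: `EXTRA_RULES`, `apply_ocr_noise`, and the `prob`/`seed` parameters are names made up for the example, and `MAPPING` is only an excerpt of the rules quoted above.

```python
import random

# Excerpt of the single-character rules quoted above.
MAPPING = {
    '0': ['8', '9', 'o', 'O', 'D'],
    '1': ['4', '7', 'l', 'I'],
}

# Hypothetical extra rules: the replacement may be longer than one character.
EXTRA_RULES = {
    'w': ['vv'],
    'm': ['rn'],
}


def apply_ocr_noise(text, rules, prob=0.3, seed=None):
    """Replace each character that has a rule with probability `prob`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        candidates = rules.get(ch)
        if candidates and rng.random() < prob:
            # Pick one of the plausible OCR confusions for this character.
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return ''.join(out)


# Combine both rule sets; with prob=1.0 every character that has a rule is replaced.
rules = {**MAPPING, **EXTRA_RULES}
print(apply_ocr_noise("warm 10", rules, prob=1.0, seed=0))
```

Because the output is built as a list of strings rather than edited in place, a one-to-many replacement such as "m" => "rn" needs no special handling. Many-to-one confusions (e.g. "rn" => "m") would need substring matching instead, which this sketch does not cover.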

@BeyonderXX
Contributor

Hi,

Thanks for the suggestion!
We will update the function soon!

Xiao Wang

@BeyonderXX added the TODO (Features to update!) and good first issue (Good for newcomers) labels on Dec 15, 2021
@aflah02
aflah02 commented Mar 20, 2022

Hey @BeyonderXX, is anyone working on this right now? I'd be interested in taking a look, if that works.

As a start, I could parse this file into a dictionary and merge it into the current mapping, if that seems fine?
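
Roughly what I have in mind for the merge step, as a sketch only. I'm assuming the file can be turned into (correct, error) pairs; the parsing itself is left out because it depends on the file format. `merge_rules` is a hypothetical helper name, and the pairs below are illustrative placeholders, not entries taken from the dataset.

```python
def merge_rules(mapping, pairs):
    """Return a new mapping with (correct, error) pairs added, skipping duplicates."""
    # Copy the lists so the original mapping is left untouched.
    merged = {key: list(values) for key, values in mapping.items()}
    for correct, error in pairs:
        if correct == error:
            continue  # not an OCR error, nothing to add
        bucket = merged.setdefault(correct, [])
        if error not in bucket:
            bucket.append(error)
    return merged


# Excerpt of the current mapping from the issue.
current = {'0': ['8', '9', 'o', 'O', 'D'], 'o': ['u']}

# Placeholder pairs standing in for whatever the parser would produce.
parsed_pairs = [('w', 'vv'), ('m', 'rn'), ('o', 'u'), ('o', '0')]

merged = merge_rules(current, parsed_pairs)
print(merged)
```

Existing entries keep their order and new confusions are appended, so the current behaviour for digits and the few covered letters would not change unless the dataset adds candidates for them.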
