Chinese_OCR_synthetic_data

This program generates synthetic datasets for Chinese OCR.

It uses Augmenter to augment the characters rendered in the output images, applying rotation, skew, shear and distortion. To use other characters, change the characters.txt file. The main function is in the synthetic_data.py file.
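To give a feel for what rotation and shear do to a rendered character, here is a minimal sketch that uses only Pillow. It is not the code in Aug_Operations.py or utils.py: the function name `random_augment`, its parameter ranges, and the dark block standing in for a glyph are all assumptions made for illustration, and skew and distortion are left out.

```python
import random

from PIL import Image, ImageDraw


def random_augment(img, rng, max_angle=10.0, max_shear=0.2, fill=255):
    """Rotate, then horizontally shear, a grayscale image. The size is kept.

    Hypothetical helper for illustration; the ranges are assumed values.
    """
    angle = rng.uniform(-max_angle, max_angle)
    shear = rng.uniform(-max_shear, max_shear)

    # Rotate about the image centre; exposed corners are filled with `fill`.
    out = img.rotate(angle, resample=Image.BICUBIC, fillcolor=fill)

    # Affine shear: output pixel (x, y) samples input (x + shear*y + c, y).
    # The offset c keeps the middle row of the image in place.
    w, h = out.size
    c = -shear * h / 2.0
    out = out.transform(
        (w, h),
        Image.AFFINE,
        (1, shear, c, 0, 1, 0),
        resample=Image.BICUBIC,
        fillcolor=fill,
    )
    return out


# Stand-in for a rendered character: a dark block on a white canvas.
canvas = Image.new("L", (64, 64), 255)
ImageDraw.Draw(canvas).rectangle([20, 16, 44, 48], fill=0)

augmented = random_augment(canvas, random.Random(0))
```

Passing a seeded `random.Random` keeps the sampled angle and shear reproducible, which helps when comparing parameter ranges by eye.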

Python packages you may need:

tqdm
PIL(pillow)
pathlib
cv2(opencv)
numpy
codecs
glob
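A quick way to see which of these are missing is to ask Python whether each module can be found. The sketch below is only a convenience check and not part of the repository; the helper name `missing_packages` is hypothetical, and the install names `pillow` and `opencv-python` are the usual PyPI names for the PIL and cv2 modules. codecs and glob ship with the standard library, as does pathlib on Python 3.4 and later.

```python
import importlib.util

# Module name -> name to install it under, or None for standard-library modules.
REQUIRED = {
    "tqdm": "tqdm",
    "PIL": "pillow",
    "cv2": "opencv-python",
    "numpy": "numpy",
    "pathlib": None,
    "codecs": None,
    "glob": None,
}


def missing_packages(required=REQUIRED):
    """Return the install names of the modules that cannot be imported."""
    missing = []
    for module, install_name in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(install_name or module)
    return missing


print("Missing packages:", missing_packages() or "none")
```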

The deformation parameters can be found and changed in utils.py. characters.txt holds all of the Chinese characters; replace that file to train on a different character set. The main function in synthetic_data.py can be modified as needed.
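When swapping in a different characters.txt, it can help to confirm what the file actually contains before generating images. The sketch below assumes nothing about the file layout beyond UTF-8 text: it collects every non-whitespace character, in order and without duplicates. `load_characters` and `random_label` are hypothetical helpers, not functions from synthetic_data.py.

```python
import codecs
import random


def load_characters(path, encoding="utf-8-sig"):
    """Read a character file and return its unique characters in file order.

    "utf-8-sig" reads plain UTF-8 and also strips a leading byte-order mark.
    """
    with codecs.open(path, "r", encoding=encoding) as f:
        text = f.read()

    chars, seen = [], set()
    for ch in text:
        # Skip newlines, spaces and characters already collected.
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        chars.append(ch)
    return chars


def random_label(chars, length, rng):
    """Draw `length` characters at random to form one text label."""
    return "".join(rng.choice(chars) for _ in range(length))
```

For example, `random_label(load_characters("characters.txt"), 8, random.Random(0))` returns an 8-character string drawn from the file.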

Folders and files in the repository:

background
fonts
test_ocrdataset
Aug_Operations.py
Aug_Operations.pyc
README.md
characters.txt
characters_top_5.txt
synthetic_data.py
utils.py
utils.pyc

Repository: zzmcdc/Chinese_OCR_synthetic_data