Training data sources for the text recognition model #280

Closed
fourierer opened this issue Nov 14, 2023 · 3 comments

Comments

@fourierer

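Hello, and thank you for your work. What data was the text recognition model ch_rec_server_crnn_res34.pth trained on? I loaded your open-source model and finetuned the classification layer on data from my own scenario. The resulting model performs well in that specific scenario, but it has lost the original general-purpose text recognition ability. What kind of data was the original model trained on?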

@WenmuZhou
Owner

Public datasets plus generated (synthetic) datasets. See PaddleOCR for the details.

@fourierer
Author

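> Public datasets plus generated (synthetic) datasets. See PaddleOCR for the details.

Thanks! I'd like to ask a bit more about the data proportions and the training procedure. By public datasets, do you mean the 3.6-million-sample (360w) open-source dataset? Roughly how much synthetic data was used? And were the two mixed together for training, or did you first train on the synthetic data and then finetune on the 3.6M open-source dataset, or something along those lines? Your model performs very well, and I have not been able to reproduce its results on my side.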
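
For readers comparing the two options raised above, here is a minimal sketch of the "mixed together" variant: each batch draws a fixed fraction of samples from the public data and the rest from the synthetic data. The thread does not state the actual ratio or procedure, so the function name `mixed_batches`, the `real_fraction` value, and the file names are all hypothetical and only illustrate the idea. This is not the maintainer's or PaddleOCR's training code.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[str, str]  # (image_path, text_label); an assumed record layout


def mixed_batches(
    real: Sequence[Sample],
    synth: Sequence[Sample],
    batch_size: int,
    real_fraction: float,  # assumed knob; the thread gives no real ratio
    num_batches: int,
    seed: int = 0,
) -> Iterator[List[Sample]]:
    """Yield batches with a fixed share of real vs. synthetic samples."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(batch_size * real_fraction)
    n_synth = batch_size - n_real
    if (n_real and not real) or (n_synth and not synth):
        raise ValueError("a dataset with a non-zero share must not be empty")

    rng = random.Random(seed)
    for _ in range(num_batches):
        # Sample with replacement so the smaller dataset is never exhausted.
        batch = rng.choices(real, k=n_real) + rng.choices(synth, k=n_synth)
        rng.shuffle(batch)  # avoid a fixed real-then-synthetic order
        yield batch


# Illustrative usage with made-up file names.
real_data = [(f"real_{i}.jpg", "label") for i in range(10)]
synth_data = [(f"synth_{i}.jpg", "label") for i in range(10)]
for batch in mixed_batches(real_data, synth_data, 8, 0.75, num_batches=2):
    print([path for path, _ in batch])
```

The sequential alternative mentioned above (train on synthetic data, then finetune on the public set) would instead run two separate training phases, each reading from a single dataset.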

@WenmuZhou
Owner

I don't remember either. Try looking through the PaddleOCR issues.
