GitHub - uk0/LLMOCR: Uses an LLM together with OCR to summarize OCR-recognized content and return the corresponding structured data

LLM + OCR

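Later, plugins could be added to handle unusual tabular data, and OpenAI models could be used as well; for now this is a proof of concept (POC).
The main goal is to make it easy to recognize simple, small images, for example when text on an image needs to be organized or copied. It can also recognize some unclear content.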
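
Because the goal is structured data rather than free text, the LLM reply usually has to be parsed before it can be returned. A minimal sketch, assuming the model is asked to answer in JSON (this README does not state the output format); `extract_json` is a hypothetical helper, not a function from app.py:

```python
import json


def extract_json(reply):
    """Return the first JSON object found in an LLM reply, or None.

    Models often wrap JSON in extra prose, so try to decode a complete
    object starting at each '{' until one succeeds.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            # Not a valid object at this position; try the next brace.
            start = reply.find("{", start + 1)
    return None
```

Returning `None` instead of raising lets the caller fall back to the raw OCR text when the model does not produce valid JSON.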

Quick Start

PP-OCR-V4.0
Ollama (gemma2:2b-instruct-q8_0)
flask
chrome plugin
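
How these components are wired together is not spelled out here. A minimal sketch, assuming PP-OCR yields a list of text lines and that Ollama serves its REST API at the default local address `http://localhost:11434/api/generate`; the prompt wording and the function names are illustrative and are not taken from app.py:

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(ocr_lines):
    """Join OCR text lines into one prompt asking for a structured summary.

    The wording is hypothetical, not the prompt used by app.py.
    """
    text = "\n".join(line.strip() for line in ocr_lines if line.strip())
    return (
        "The following text was extracted from an image by OCR and may "
        "contain recognition errors. Correct obvious errors and return "
        "the content as JSON.\n\n" + text
    )


def build_payload(model, prompt):
    """Build the request body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(model, ocr_lines, url=OLLAMA_URL, timeout=60):
    """Send the OCR text to Ollama and return the generated text."""
    body = json.dumps(build_payload(model, build_prompt(ocr_lines))).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

With an Ollama server running and the model pulled, `ask_ollama("gemma2:2b-instruct-q8_0", lines)` would return the model's reply as a string.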

pip install -r requirements.txt
python app.py

install chrome plugin

open chrome://extensions/
switch developer mode on
load unpacked extension

find an image and right-click it to open it with WiseRead (Analyze Image)

The result is shown in a box on the right side of the Chrome tab

Model

llama3.1:8b-instruct-q8_0 (best results)
gemma2:9b-instruct-q8_0
Qwen1.5-MoE-A2.7B-Chat:latest
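
Only some of these models may be installed on a given machine. A small sketch of picking one, assuming the order of the list doubles as a preference order (only the first entry is stated to work best) and that the installed model names are obtained separately, for example from `ollama list`; `pick_model` is a hypothetical helper:

```python
# Model names copied from the list above; using list order as the
# preference order is an assumption.
PREFERRED_MODELS = [
    "llama3.1:8b-instruct-q8_0",
    "gemma2:9b-instruct-q8_0",
    "Qwen1.5-MoE-A2.7B-Chat:latest",
]


def pick_model(installed, preferred=PREFERRED_MODELS):
    """Return the first preferred model that is installed, or None."""
    available = set(installed)
    for name in preferred:
        if name in available:
            return name
    return None
```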

TODO

Optimize the prompts (Doing...)
Use RAG to improve the results and make them more stable (Doing...)

