A vision-based RAG (Retrieval-Augmented Generation) system for studying PDFs with charts and images, using the ColPali vision language model.
- PDF processing with visual awareness of images, charts, and tables
- No OCR required - works directly with the visual content
- Vector storage using ChromaDB (local, free, open-source)
- Simple and intuitive web interface
- Compatible with Ollama for completely local LLM generation
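As a rough illustration of what OCR-free retrieval involves: ColPali-style models embed each page image as a set of patch vectors and compare it to the query's token vectors by late interaction (MaxSim). The sketch below shows only that scoring step on toy vectors. The function names and data are hypothetical and are not taken from this repository's code.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, keep the similarity of
    its best-matching page patch vector, then sum over all query vectors."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],
) -> List[Tuple[str, float]]:
    """Return (page_id, score) pairs sorted from most to least relevant."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data only: real embeddings come from the model and are much larger.
TOY_QUERY = [[1.0, 0.0], [0.0, 1.0]]
TOY_PAGES = {
    "page-1": [[1.0, 0.0], [0.5, 0.5]],
    "page-2": [[0.0, 0.2], [0.1, 0.1]],
}
```

With the toy data, `rank_pages(TOY_QUERY, TOY_PAGES)` places `page-1` first, because both query vectors find a closer patch on that page.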
The repository is being updated to work with Python 3.13. Some dependencies still need to be adapted before they work correctly with this version.
- Clone this repository:
    git clone https://github.com/tofunori/vision-study-rag.git
    cd vision-study-rag
- Run the simplified setup script:
    setup_simplified.bat
- Run the application:
    run_app.bat
- Open your browser at http://localhost:8501
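If the page does not load, it can help to confirm that something is actually listening on port 8501 before debugging further. This is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, not part of this repository.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8501, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `is_port_open()` with the defaults checks the address given above. A `False` result suggests the application has not started or is running on a different port.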
This project is under active development. The current version includes:
- Basic Streamlit interface
- Simplified setup for Python 3.13
- Roadmap for full implementation
Future updates will add:
- Full ColPali integration
- Vector database connectivity
- PDF processing
- LLM integration with Ollama
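The planned Ollama step could look roughly like the sketch below, which sends a prompt to Ollama's local `/api/generate` endpoint using only the standard library. It assumes Ollama is running on its default port (11434). The model name `llava`, the helper names, and the idea of passing retrieved pages as short text notes or base64-encoded images are all illustrative assumptions, not this project's implementation.

```python
import json
import urllib.request
from typing import List, Optional

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(question: str, page_notes: List[str]) -> str:
    """Combine retrieved page notes (hypothetical text summaries) with the question."""
    context = "\n".join(f"[{i + 1}] {note}" for i, note in enumerate(page_notes))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_request(
    prompt: str,
    model: str = "llava",  # placeholder; use any model pulled locally
    images_b64: Optional[List[str]] = None,
) -> urllib.request.Request:
    """Build a non-streaming generate request; images are optional and only
    meaningful for multimodal models."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if images_b64:
        payload["images"] = list(images_b64)
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llava", timeout: float = 120.0) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Keeping `build_prompt` and `build_request` separate from `generate` means the prompt and payload can be inspected without a running Ollama server.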
Contributions are welcome! Please feel free to submit a Pull Request.