Gemini Document Processor

A document processing tool that uses Google's Gemini AI to generate Thai-language summaries from PDF and EPUB files, with image extraction and Obsidian integration.

Features

Core Functionality

AI-Powered Summarization: Uses Google's Gemini models (gemini-2.0-flash, gemini-2.5-flash-preview, gemini-1.5-pro)
Multiple Document Formats: Processes both PDF and EPUB files
Thai-Focused Summaries: Optimized for creating comprehensive Thai language summaries

Advanced Processing

Smart Chunking: Processes documents in manageable chunks for better AI performance
Image Extraction: Extracts and filters images from documents with size thresholds
Robust Error Handling: Retries failed API calls and falls back to another model when retries are exhausted
Timeout Management: Configurable timeouts for both API calls and chunk processing
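
The retry-and-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `summarize_with_fallback`, the `call_model` callback, and the `retries_per_model` and `backoff_seconds` parameters are all hypothetical. The real Gemini client call is left to the caller.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def summarize_with_fallback(
    chunk_text: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    retries_per_model: int = 2,
    backoff_seconds: float = 0.0,
) -> Tuple[str, str]:
    """Try each model in order, retrying before falling back to the next.

    `call_model(model_name, text)` is a hypothetical callback that wraps
    the real API request. Returns (model_name, summary) on success and
    raises RuntimeError if every attempt on every model fails.
    """
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return model, call_model(model, chunk_text)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                # Wait a little longer after each failed attempt.
                time.sleep(backoff_seconds * (attempt + 1))
    raise RuntimeError("all models failed") from last_error
```

Passing the model list in priority order (for example the faster model first and gemini-1.5-pro last) keeps the fallback policy in one place.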

User Experience

Web Interface: Clean, tabbed web application for document processing
Real-time Progress Tracking: Live updates during processing
Job Status Monitoring: Track failed chunks and retry problematic sections
Parallel Processing: Multi-threaded image extraction for improved performance
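
As an illustration of multi-threaded image handling combined with size thresholds, here is a small sketch using Python's standard `ThreadPoolExecutor`. The `ImageInfo` type, the `filter_images` function, and the default threshold of 100 pixels are assumptions made for this example. The project's real extraction code may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, NamedTuple


class ImageInfo(NamedTuple):
    """Hypothetical record describing one extracted image."""
    name: str
    width: int
    height: int


def keep_image(image: ImageInfo, min_width: int, min_height: int) -> bool:
    """Return True when the image meets both size thresholds."""
    return image.width >= min_width and image.height >= min_height


def filter_images(
    images: Iterable[ImageInfo],
    min_width: int = 100,
    min_height: int = 100,
    workers: int = 4,
) -> List[ImageInfo]:
    """Check images in a thread pool and keep those above the thresholds."""
    items = list(images)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so flags line up with items.
        flags = list(pool.map(lambda img: keep_image(img, min_width, min_height), items))
    return [img for img, keep in zip(items, flags) if keep]
```

The size check itself is cheap. Threads pay off mainly when each task also decodes or writes the image to disk, since that work is I/O-bound.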

Obsidian Integration

Direct Export: Create markdown files directly in your Obsidian vault
Metadata Support: Includes YAML frontmatter with tags and other metadata
Customizable Tags: Define your own Obsidian tags for processed documents
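
To show what YAML frontmatter with tags might look like, the sketch below builds a frontmatter block as plain text. The field names `title`, `author`, and `tags` and the function `build_frontmatter` are illustrative assumptions. The exact keys written by this project are not documented here.

```python
from typing import Sequence


def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_frontmatter(title: str, tags: Sequence[str], author: str = "") -> str:
    """Build a YAML frontmatter block for a note (hypothetical field names)."""
    lines = ["---", f"title: {_quote(title)}"]
    if author:
        lines.append(f"author: {_quote(author)}")
    lines.append("tags:")
    lines.extend(f"  - {tag}" for tag in tags)
    lines.append("---")
    return "\n".join(lines) + "\n"
```

The resulting string can be placed at the top of the exported Markdown file, directly before the summary text.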

Installation

Clone this repository:

```
git clone https://github.com/kidpeterpan/gemini-document-processor.git
cd gemini-document-processor
```

Install the required dependencies:
```
pip install -r requirements.txt
```

Get a Google Gemini API key from Google AI Studio.

Usage

Starting the Web Interface

Run the web server:

```
python document_gui.py
```

Then open your web browser and navigate to: http://127.0.0.1:8081/

Web Interface Features

The interface is organized into three tabs:

Basic Settings:
- Upload PDF or EPUB files
- Select Gemini model:
  - gemini-2.0-flash (Faster)
  - gemini-2.5-flash-preview (More accurate)
  - gemini-1.5-pro (Backup option)
- Adjust chunk size (pages per processing unit)
- Enter your Gemini API key
- Toggle image extraction

Obsidian Integration:
- Enable automatic export to Obsidian
- Verify and set Obsidian vault path
- Configure tags, author, cover URL, and review ratings
- Automatic path validation

Advanced Settings:
- Configure timeout settings:
  - Chunk processing timeout (60-1800 seconds)
  - API request timeout (30-300 seconds)
- Set retry attempts for API calls
- Configure image size thresholds
- Select image format (PNG/JPG)
- Adjust worker thread count (1-16)
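
The numeric ranges listed under Advanced Settings can be enforced with a small clamping helper. The sketch below is one way to do it. The setting keys `chunk_timeout`, `api_timeout`, and `workers` and the function `clamp_settings` are hypothetical names, and only the ranges themselves come from the list above.

```python
from typing import Any, Dict, Tuple

# Allowed ranges taken from the Advanced Settings list; key names are assumed.
LIMITS: Dict[str, Tuple[int, int]] = {
    "chunk_timeout": (60, 1800),  # seconds
    "api_timeout": (30, 300),     # seconds
    "workers": (1, 16),           # worker threads
}


def clamp_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of settings with known numeric values forced into range."""
    clamped = dict(settings)
    for key, (low, high) in LIMITS.items():
        if key in clamped:
            clamped[key] = max(low, min(high, int(clamped[key])))
    return clamped
```

Keys that are not listed in `LIMITS` pass through unchanged, so the helper can be applied to a full settings dictionary.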

Job Status and Monitoring

Real-time Progress: View detailed progress during processing
Log Viewer: See all processing events as they happen
Failed Chunks: Identify and retry problematic sections
Result Management: Download or view generated summaries
Obsidian Export: Track files exported to your Obsidian vault

How It Works

Document Loading: The application loads PDF or EPUB files and extracts text content
Chunking: Content is divided into manageable chunks (by page for PDFs, by chapter for EPUBs)
Image Extraction: Images are extracted with size filtering and saved separately
AI Processing: Each chunk is sent to Gemini API with timeout handling and retries
Error Recovery: Failed chunks are tracked and can be retried with more robust settings
Summary Creation: Results are compiled into a well-formatted Markdown document
Integration: Summary and images are saved locally and (optionally) to Obsidian
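
The chunking and error-recovery steps above can be sketched as follows. This is a simplified illustration under stated assumptions: `chunk_pages`, `process_chunks`, and the `summarize` callback are hypothetical names, pages are represented as plain strings, and the actual Gemini request is abstracted behind the callback.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def chunk_pages(pages: Sequence[str], pages_per_chunk: int) -> List[List[str]]:
    """Group page texts into consecutive chunks of at most pages_per_chunk."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        list(pages[start:start + pages_per_chunk])
        for start in range(0, len(pages), pages_per_chunk)
    ]


def process_chunks(
    chunks: Sequence[Sequence[str]],
    summarize: Callable[[str], str],
) -> Tuple[Dict[int, str], List[int]]:
    """Summarize each chunk and record the indices of chunks that failed.

    `summarize` is a hypothetical callback standing in for the API call.
    """
    summaries: Dict[int, str] = {}
    failed: List[int] = []
    for index, chunk in enumerate(chunks):
        try:
            summaries[index] = summarize("\n".join(chunk))
        except Exception:
            # Keep going; failed chunks can be retried later by index.
            failed.append(index)
    return summaries, failed
```

Tracking failures by chunk index means a later retry only needs to resend those chunks, and the final document can still be assembled in the original order.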

Troubleshooting

Common Issues

API Errors: Check your API key and internet connection
Processing Timeouts: Increase the chunk and API timeout values in Advanced Settings
Failed Chunks: Use the "Retry Failed Chunks" button on the job status page
Obsidian Integration: Ensure your Obsidian vault path is correct and contains a .obsidian folder
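
The vault check mentioned in the last item can be reproduced by hand with a few lines of Python. The function name `is_obsidian_vault` is an assumption for this sketch, and the application's own validation may apply additional checks.

```python
from pathlib import Path


def is_obsidian_vault(path: str) -> bool:
    """Return True if path is a directory that contains a .obsidian folder."""
    vault = Path(path).expanduser()
    return vault.is_dir() and (vault / ".obsidian").is_dir()
```

Running this against the path you entered in the Obsidian Integration tab shows whether the folder structure is the cause of a failed export.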

Error Logs

For detailed error information, check the application logs in your terminal or command prompt.

Project Structure

document_gui.py - Web interface and job management
document_processor.py - Core processing logic for documents
epub_processor.py - EPUB-specific processing functionality
templates/ - HTML templates for web interface
uploads/ - Temporary storage for uploaded files and processing results

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

This project uses the following technologies:
