A comprehensive document processing system with AI-powered analysis, storage, and retrieval capabilities.
- Clone the repository:
git clone https://github.com/yourusername/document-processing-system.git
cd document-processing-system
- Copy the example environment file, then edit .env with your settings:
cp .env.example .env
- Start all services:
make dev
Once the services are up, they are available at:
- Node-RED: http://localhost:1880
- OCR API: http://localhost:8081
- MinIO Console: http://localhost:9001
- Kibana: http://localhost:5601
- Elasticsearch: http://localhost:9200
- Tika Server: http://localhost:9998
- See make help for more.
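After make dev returns, the containers can take a while to accept connections. The Python sketch below polls each endpoint listed above until it answers. It is not part of the project's tooling: the SERVICES map assumes the default ports, and the function names, attempt count, and delay are illustrative choices.

```python
import http.client
import time
import urllib.error
import urllib.request

# Endpoints from the list above (default ports; adjust if you changed .env).
SERVICES = {
    "Node-RED": "http://localhost:1880",
    "OCR API": "http://localhost:8081",
    "MinIO Console": "http://localhost:9001",
    "Kibana": "http://localhost:5601",
    "Elasticsearch": "http://localhost:9200",
    "Tika Server": "http://localhost:9998",
}


def is_up(url, timeout=2.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status (e.g. 401 or 404) still means the server is listening.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


def wait_for_services(services, attempts=30, delay=2.0, probe=is_up):
    """Poll each service until it responds or the attempts run out.

    Returns a dict mapping service name -> True (reachable) / False.
    """
    status = {name: False for name in services}
    for attempt in range(attempts):
        for name, url in services.items():
            if not status[name]:  # only re-probe services that are still down
                status[name] = probe(url)
        if all(status.values()):
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return status
```

For example, wait_for_services(SERVICES) returns a dict whose False entries point at services worth inspecting with docker-compose logs -f. Any HTTP status counts as "up" here, because some web consoles answer with an authentication error before you log in.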
- Upload a PDF via the Node-RED HTTP endpoint (/start-ocr) or another input (see the Node-RED flow definition).
- Node-RED receives the file and sends it to the OCR API (see the Node-RED OCR automation flow).
- The OCR API receives the PDF, extracts text using OCRmyPDF, and returns the result as JSON (see the OCR API implementation and the OCR API Dockerfile).
- Test the OCR API from the command line with a sample PDF (see the test script).
- All services are orchestrated via Docker Compose (see docker-compose.yml and the environment variables).
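The OCR step described above can be sketched in Python by calling the ocrmypdf command-line tool and reading back the plain text it writes with --sidecar. This is an illustrative sketch, not the project's app.py. The function names, the eng language default, the use of --skip-text, and the JSON fields (filename, text) are assumptions. The run parameter lets you substitute a fake runner when OCRmyPDF is not installed.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_ocrmypdf_command(input_pdf, output_pdf, sidecar_txt, language="eng"):
    """Assemble an ocrmypdf call that also writes the recognized text to a sidecar file."""
    return [
        "ocrmypdf",
        "--sidecar", str(sidecar_txt),  # plain-text output of the OCR layer
        "-l", language,                 # Tesseract language code(s)
        "--skip-text",                  # skip OCR on pages that already contain text
        str(input_pdf),
        str(output_pdf),
    ]


def ocr_pdf_to_json(pdf_bytes, filename="upload.pdf", run=subprocess.run):
    """OCR a PDF given as bytes and return a JSON string with the extracted text.

    The response fields ("filename", "text") are a hypothetical shape,
    not necessarily what the project's OCR API returns.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        src = tmp_dir / "input.pdf"
        dst = tmp_dir / "output.pdf"
        sidecar = tmp_dir / "output.txt"
        src.write_bytes(pdf_bytes)
        # check=True raises CalledProcessError if ocrmypdf exits non-zero.
        run(build_ocrmypdf_command(src, dst, sidecar), check=True)
        text = sidecar.read_text(encoding="utf-8")
    return json.dumps({"filename": filename, "text": text})
```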
- Start all services:
make dev
- Upload a PDF for OCR via Node-RED:
  - POST a PDF to http://localhost:1880/start-ocr
  - Example with curl:
    curl -X POST -F "file=@/path/to/your.pdf" http://localhost:1880/start-ocr
- Test the OCR API directly:
  - Use the test script:
    ./scripts/test_ocr_api.sh /path/to/your.pdf
  - Or use the web form at http://localhost:8081/
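To script uploads from Python instead of curl, the sketch below builds the same multipart/form-data request that curl -F "file=@..." sends, using only the standard library. It assumes the Node-RED flow reads the upload from a form field named file (as in the curl example) and returns a text body. The function names and the 120-second timeout are illustrative.

```python
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(field, filename, data, content_type="application/pdf"):
    """Build a multipart/form-data body equivalent to curl -F "field=@filename"."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(pdf_path, url="http://localhost:1880/start-ocr"):
    """Prepare (but do not send) a POST that uploads a PDF in the "file" field."""
    path = Path(pdf_path)
    body, content_type = encode_multipart("file", path.name, path.read_bytes())
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )


def upload_pdf(pdf_path, url="http://localhost:1880/start-ocr", timeout=120):
    """Send the PDF and return the raw response body as text."""
    request = build_upload_request(pdf_path, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, print(upload_pdf("/path/to/your.pdf")) sends the file to the default /start-ocr endpoint and prints whatever the flow returns.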
- Node-RED flows.json
- OCR API Python app.py
- OCR API Dockerfile
- OCR Bash Test Script
- docker-compose.yml
- Makefile
- Environment Variables
.
├── config/ # Service configurations
│ ├── elasticsearch/ # Elasticsearch config
│ ├── grafana/ # Grafana dashboards and config
│ ├── node-red/ # Node-RED flows and settings
│ └── prometheus/ # Prometheus config
├── data/ # Persistent data
│ ├── elasticsearch/ # ES data
│ ├── grafana/ # Grafana data
│ ├── minio/ # MinIO data
│ └── redis/ # Redis data
├── docker/ocr-api/ # OCR API implementation
├── scripts/ # Utility scripts
├── docker-compose.yml # Main compose file
├── Makefile # Project commands
└── .env # Environment variables
Create a .env file with the following variables (see .env.example for more):
OCR_HOST=ocr
OCR_PORT=8081
# ... other variables ...
- Use make help to see all available commands.
- To update services:
git pull origin main
make dev
- To view logs:
docker-compose logs -f
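Scripts running on the host sometimes need the same settings as the containers. The minimal .env reader below handles plain KEY=VALUE lines such as OCR_HOST=ocr and OCR_PORT=8081 and builds an OCR API base URL from them. It is a sketch, not a full dotenv implementation, and the function names and the localhost/8081 defaults are assumptions. A Compose service name like ocr (assuming that is the OCR container's service name) resolves only inside the Compose network; from the host, use http://localhost:8081 as listed in the service URLs.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Deliberately small subset of .env syntax: no variable expansion,
    no multi-line values, no 'export' prefix.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def ocr_api_url(env, default_host="localhost", default_port="8081"):
    """Build the OCR API base URL from OCR_HOST / OCR_PORT."""
    host = env.get("OCR_HOST", default_host)
    port = env.get("OCR_PORT", default_port)
    return f"http://{host}:{port}"


def load_env(path=".env"):
    """Read and parse an .env file from disk."""
    return parse_env(Path(path).read_text(encoding="utf-8"))
```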
This project is licensed under the MIT License - see the LICENSE file for details.
For support, open an issue in the GitHub repository or contact the maintainers.