A FastAPI-based service that converts PDF files to EPUB format with smart chapter detection and text formatting.
- Convert PDF files to EPUB format
- Smart chapter detection
- Automatic text formatting and cleanup
- REST API interface
- Support for metadata (title, author)
- Async processing
- Temporary file handling
- Modular and maintainable codebase
- Python 3.8+
- FastAPI
- Uvicorn
- pdfminer.six
- ebooklib
- python-multipart
- Clone the repository:
git clone https://github.com/eulixir/pdf2epub.git
cd pdf2epub
- Install dependencies:
pip install -r requirements.txt
- Start the server:
make run
- Access the API documentation at:
http://localhost:8000/docs
- Convert a PDF file using the API:
curl -X POST \
  -F "file=@/path/to/your/file.pdf" \
  -F "title=My Document" \
  -F "author=John Doe" \
  -F "output_format=epub" \
  http://localhost:8000/convert/
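
The same request can be sent from Python. The following is a minimal sketch that uses only the standard library; the helper names `build_multipart` and `convert_pdf` are hypothetical and not part of this project. The form field names mirror the curl call above. The response format is not documented here, so the sketch simply returns the raw response bytes.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="application/pdf"):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain form field (title, author, output_format).
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The file part carries a filename and its own Content-Type.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path, title, author, base_url="http://localhost:8000"):
    """POST a PDF to /convert/ and return the raw response body (assumed bytes)."""
    with open(pdf_path, "rb") as fh:
        pdf_bytes = fh.read()
    body, header = build_multipart(
        {"title": title, "author": author, "output_format": "epub"},
        "file",
        os.path.basename(pdf_path),
        pdf_bytes,
    )
    request = urllib.request.Request(
        f"{base_url}/convert/",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, `convert_pdf("/path/to/your/file.pdf", "My Document", "John Doe")` sends the same fields as the curl command.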
GET /health/
- Returns service status
POST /convert/
- Converts PDF to EPUB
- Parameters:
  - file: PDF file to convert
  - title: Document title
  - author: Document author
  - output_format: Output format (default: epub)
app/
├── domain/
│   └── services/
│       ├── convert.py            # Main conversion orchestration
│       ├── extract_pdf.py        # PDF text extraction
│       ├── text_processor.py     # Text cleaning and formatting
│       ├── chapter_processor.py  # Chapter detection and splitting
│       └── epub.py               # EPUB file generation
├── infra/
│   └── controllers/
│       ├── convert.py            # API endpoints for conversion
│       └── healthcheck.py        # Health check endpoint
└── main.py                       # Application entry point
The project follows a clean architecture approach with clear separation of concerns:
- Domain Services: Core business logic organized into specialized modules
  - convert.py: Orchestrates the conversion process
  - extract_pdf.py: Handles PDF text extraction
  - text_processor.py: Manages text cleaning and formatting
  - chapter_processor.py: Detects and splits chapters
  - epub.py: Handles EPUB file generation
- Infrastructure: API endpoints and external interfaces
  - convert.py: REST API endpoints for file conversion
  - healthcheck.py: Service health monitoring
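
To illustrate how the text cleaning and chapter detection stages might divide the work, here is a minimal sketch. It is not the project's actual code: the function names `clean_text` and `split_chapters`, the heading regex, and the "Front Matter" / "Untitled" labels are all assumptions chosen for the example. The real modules may use different heuristics.

```python
import re
from typing import List, Tuple

# Assumed heading pattern: a line starting with "Chapter" followed by an
# Arabic or Roman numeral, e.g. "Chapter 1 The Start" or "CHAPTER II".
CHAPTER_RE = re.compile(
    r"^[ \t]*chapter[ \t]+(\d+|[ivxlcdm]+)\b[^\n]*$",
    re.IGNORECASE | re.MULTILINE,
)


def clean_text(raw: str) -> str:
    """Normalize text extracted from a PDF (hypothetical cleanup rules)."""
    text = raw.replace("\r\n", "\n")
    # Rejoin words hyphenated across a line break: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def split_chapters(text: str) -> List[Tuple[str, str]]:
    """Split text into (title, body) pairs at each detected chapter heading."""
    matches = list(CHAPTER_RE.finditer(text))
    if not matches:
        # No headings found: treat the whole document as one chapter.
        return [("Untitled", text.strip())]
    chapters = []
    # Anything before the first heading is kept as front matter.
    preface = text[: matches[0].start()].strip()
    if preface:
        chapters.append(("Front Matter", preface))
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(0).strip(), text[match.end():end].strip()))
    return chapters
```

Calling `split_chapters(clean_text(raw_text))` yields a list of chapters that a later stage could write out as separate EPUB sections. A regex heuristic like this will misfire on some documents (for example, a body line that happens to begin with "Chapter I"), so treat it as a starting point only.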
- Install development dependencies:
pip install -r requirements.txt
- Run linter:
make lint
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
MIT License