PDF to EPUB Converter

A FastAPI-based service that converts PDF files to EPUB format with smart chapter detection and text formatting.

Features

- Convert PDF files to EPUB format
- Smart chapter detection
- Automatic text formatting and cleanup
- REST API interface
- Support for metadata (title, author)
- Async processing
- Temporary file handling
- Modular and maintainable codebase

Requirements

- Python 3.8+
- FastAPI
- Uvicorn
- pdfminer.six
- ebooklib
- python-multipart

Installation

Clone the repository:

git clone https://github.com/eulixir/pdf2epub.git
cd pdf2epub

Install dependencies:

pip install -r requirements.txt

Usage

Start the server:

make run

Access the API documentation at:

http://localhost:8000/docs

Convert a PDF file using the API:

curl -X POST \
  -F "file=@/path/to/your/file.pdf" \
  -F "title=My Document" \
  -F "author=John Doe" \
  -F "output_format=epub" \
  http://localhost:8000/convert/
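The same request can also be sent from Python. The sketch below uses only the standard library (urllib.request) and builds the multipart/form-data body by hand so that each part mirrors one -F flag in the curl call. The helper names build_convert_request and save_epub are hypothetical. save_epub also assumes that /convert/ returns the EPUB file itself in the response body, which this README does not state.

```python
import urllib.request
import uuid
from pathlib import Path


def build_convert_request(
    pdf_bytes: bytes,
    filename: str,
    title: str,
    author: str,
    output_format: str = "epub",
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a multipart/form-data POST to /convert/ mirroring the curl example."""
    # Random 32-hex-char boundary; a collision with the file content is very unlikely.
    boundary = uuid.uuid4().hex
    parts = []
    # Text fields, equivalent to curl's -F "name=value".
    for name, value in (("title", title), ("author", author), ("output_format", output_format)):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # File field, equivalent to curl's -F "file=@/path/to/your/file.pdf".
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n".encode("utf-8")
        + pdf_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return urllib.request.Request(
        f"{base_url}/convert/",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def save_epub(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the response body to out_path.

    Assumption: /convert/ returns the EPUB bytes directly in the response body.
    """
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())


# Example usage (requires the server to be running):
# pdf = Path("/path/to/your/file.pdf")
# req = build_convert_request(pdf.read_bytes(), pdf.name, "My Document", "John Doe")
# save_epub(req, "output.epub")
```

Building the body by hand keeps the example dependency-free; in a project that already depends on an HTTP client library, its built-in multipart support would be the simpler choice.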

API Endpoints

Health Check

GET /health/
- Returns service status
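Scripts that start the server and then call the API can poll this endpoint before sending work. The following is a minimal standard-library sketch. It assumes only that a healthy service answers GET /health/ with HTTP 200, since the response body format is not documented here. The function name wait_until_healthy and its timing defaults are hypothetical.

```python
import time
import urllib.request


def wait_until_healthy(
    base_url: str = "http://localhost:8000",
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll GET /health/ until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health/", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError (e.g. connection refused), HTTPError and socket timeouts
            # are all OSError subclasses: the server is not ready yet, so retry.
            pass
        time.sleep(interval)
    return False
```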

Convert PDF

POST /convert/
- Converts PDF to EPUB
- Parameters (sent as multipart/form-data fields):
  - file: PDF file to convert
  - title: Document title
  - author: Document author
  - output_format: Output format (default: epub)

Project Structure

app/
├── domain/
│   └── services/
│       ├── convert.py            # Main conversion orchestration
│       ├── extract_pdf.py        # PDF text extraction
│       ├── text_processor.py     # Text cleaning and formatting
│       ├── chapter_processor.py  # Chapter detection and splitting
│       └── epub.py               # EPUB file generation
├── infra/
│   └── controllers/
│       ├── convert.py            # API endpoints for conversion
│       └── healthcheck.py        # Health check endpoint
└── main.py                       # Application entry point
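The module layout above suggests a linear pipeline: extract text, clean it, split it into chapters, then write the EPUB. The sketch below shows one way such orchestration could be wired so that each stage can be stubbed and tested in isolation. ConversionPipeline, the Chapter tuple, the stage signatures and the single-chapter fallback are all hypothetical and are not taken from convert.py.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (chapter title, chapter body); a hypothetical intermediate representation.
Chapter = Tuple[str, str]


@dataclass
class ConversionPipeline:
    """Hypothetical orchestrator mirroring the module layout above.

    Each stage is a plain callable, so the orchestration logic (the role of
    domain/services/convert.py) does not depend on pdfminer.six or ebooklib
    directly, and every stage can be swapped or stubbed in tests.
    """

    extract: Callable[[bytes], str]                    # role of extract_pdf.py
    clean: Callable[[str], str]                        # role of text_processor.py
    split: Callable[[str], List[Chapter]]              # role of chapter_processor.py
    build: Callable[[List[Chapter], str, str], bytes]  # role of epub.py

    def run(self, pdf_bytes: bytes, title: str, author: str) -> bytes:
        raw_text = self.extract(pdf_bytes)
        text = self.clean(raw_text)
        chapters = self.split(text)
        if not chapters:
            # No headings detected: fall back to a single chapter named after the book.
            chapters = [(title, text)]
        return self.build(chapters, title, author)
```

Injecting the stages keeps the controller layer thin: a FastAPI endpoint would only need to read the upload, call run, and return the bytes.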

Code Organization

The project follows a clean architecture approach with clear separation of concerns:

- Domain services (app/domain/services/): Core business logic organized into specialized modules
  - convert.py: Orchestrates the conversion process
  - extract_pdf.py: Handles PDF text extraction
  - text_processor.py: Manages text cleaning and formatting
  - chapter_processor.py: Detects and splits chapters
  - epub.py: Handles EPUB file generation
- Infrastructure (app/infra/controllers/): API endpoints and external interfaces
  - convert.py: REST API endpoints for file conversion
  - healthcheck.py: Service health monitoring
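Cleanup and chapter detection for PDF-extracted text are usually heuristic. The sketch below illustrates one common approach, not necessarily the one text_processor.py and chapter_processor.py use. It re-joins words hyphenated across line breaks, unwraps hard line breaks inside paragraphs, and treats short lines such as "Chapter 1" or "CHAPTER IV" as chapter headings. The regular expression, the 60-character limit and the "Front Matter" label are assumptions.

```python
import re
from typing import List, Optional, Tuple

# Assumed heading format: "Chapter 1", "CHAPTER IV", "Chapter 3: The Return".
CHAPTER_RE = re.compile(r"^chapter\s+(\d+|[ivxlcdm]+)\b.*$", re.IGNORECASE)
MAX_HEADING_LEN = 60  # longer lines starting with "Chapter 2 ..." are treated as prose


def is_heading(line: str) -> bool:
    return len(line) <= MAX_HEADING_LEN and CHAPTER_RE.match(line) is not None


def clean_text(raw: str) -> str:
    """Heuristically undo common PDF extraction artifacts."""
    text = raw.replace("\r\n", "\n").replace("\f", "\n\n")  # form feed = page break
    # Re-join lowercase words hyphenated across a line break: "conver-\nsion" -> "conversion".
    text = re.sub(r"([a-z])-\n[ \t]*([a-z])", r"\1\2", text)

    paragraphs: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Unwrap the collected lines into one paragraph and collapse whitespace runs.
        if current:
            paragraphs.append(re.sub(r"\s+", " ", " ".join(current)))
            current.clear()

    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the paragraph
        elif is_heading(stripped):
            flush()
            paragraphs.append(stripped)  # keep headings as standalone paragraphs
        else:
            current.append(stripped)
    flush()
    return "\n\n".join(paragraphs)


def split_chapters(text: str) -> List[Tuple[str, str]]:
    """Split cleaned text into (heading, body) pairs at each chapter heading."""
    chapters: List[Tuple[str, str]] = []
    heading: Optional[str] = None
    body: List[str] = []
    for para in text.split("\n\n"):
        if not para:
            continue
        if is_heading(para):
            if heading is not None or body:
                # Text before the first heading becomes a "Front Matter" chapter.
                chapters.append((heading or "Front Matter", "\n\n".join(body)))
            heading, body = para, []
        else:
            body.append(para)
    if heading is not None or body:
        chapters.append((heading or "Front Matter", "\n\n".join(body)))
    return chapters
```

On a document without explicit "Chapter" lines, split_chapters returns a single "Front Matter" entry, so a fuller implementation might combine this with other signals such as the PDF outline or font sizes (pdfminer.six layout analysis exposes per-character font name and size through LTChar).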

Development

Install development dependencies:

pip install -r requirements.txt

Run linter:

make lint

Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request

License

MIT License
