AutoAudioBook

AutoAudioBook ingests PDF or DOCX books, extracts the text, creates a reviewable annotation artifact with Gemini inline TTS tags, previews chunking, and can generate per-chunk WAV audio with Gemini 3.1 Flash TTS. I run this on an Ubuntu 24.04 VM with two cores and 4 GB of RAM on a Beelink ME Pro.

This project was developed in VS Code with GitHub Copilot using GPT-5.4.

Install on Ubuntu 24.04

Clone the repository:

git clone https://github.com/tronba/AutoAudioBook.git
cd AutoAudioBook

Run the installer:

sudo bash install_ubuntu_24.sh

Open the app in a browser:

http://<server-ip>:8000

The installer will:

install Ubuntu packages
create or reuse .venv
install Python dependencies
securely prompt for the Gemini API key
save the key to /etc/autoaudiobook/autoaudiobook.env with restricted permissions
install and enable the systemd service

To manage the service:

sudo systemctl status autoaudiobook
sudo systemctl restart autoaudiobook
sudo journalctl -u autoaudiobook -n 100 --no-pager

What it does

Imports PDF or DOCX books
Generates a reviewable annotated DOCX with inline TTS tags
Lets you upload an approved annotation for audio generation
Previews chunking before synthesis
Generates WAV audio with Gemini TTS

Input file notes

DOCX input files should mark chapter headings with the Word style Heading 1
Text before the first Chapter 1 marker is included in Chapter 1 by default
That opening text can be split out instead if you enable the separate pre-chapter text option during generation
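
The pre-chapter behaviour described above can be sketched in Python. This is only an illustration, not the code in app.py: it assumes each paragraph has already been reduced to a hypothetical (is_chapter_heading, text) pair, and the function name group_chapters, the separate_prechapter flag, and the "Pre-chapter text" title are all made up for the example.

```python
def group_chapters(paragraphs, separate_prechapter=False):
    """Group (is_chapter_heading, text) pairs into chapters.

    Hypothetical helper: text before the first heading is merged into
    the first chapter unless separate_prechapter is True.
    """
    chapters = []  # each item: {"title": str, "body": [str, ...]}
    leading = []   # text seen before the first chapter heading

    for is_heading, text in paragraphs:
        if is_heading:
            chapters.append({"title": text, "body": []})
        elif chapters:
            chapters[-1]["body"].append(text)
        else:
            leading.append(text)

    if leading:
        if separate_prechapter or not chapters:
            # Keep the opening text as its own section.
            chapters.insert(0, {"title": "Pre-chapter text", "body": leading})
        else:
            # Default: fold the opening text into the first chapter.
            chapters[0]["body"] = leading + chapters[0]["body"]
    return chapters


sample = [
    (False, "Preface"),
    (True, "Chapter 1"),
    (False, "A"),
    (True, "Chapter 2"),
    (False, "B"),
]
merged = group_chapters(sample)
split = group_chapters(sample, separate_prechapter=True)
print(len(merged), len(split))
```

With the default, the preface ends up at the start of Chapter 1; with the flag set, it becomes a separate leading section.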

Tag configuration

Editable inline tag vocabulary lives in tts_tags.toml.

[[expressive_tags]]
tag = "[angry]"
min_mode = "expressive"

Each tag belongs to either expressive_tags or vocalization_tags, and each entry includes a min_mode of conservative, balanced, or expressive.

Storage layout

storage/app.db - SQLite database
storage/uploads/ - original uploaded source files
storage/extracted/ - normalized extracted JSON
storage/annotated/ - draft and approved annotation DOCX files
storage/audio/ - reserved for future chunk and chapter audio output

Gemini configuration

Set these environment variables on the server:

GEMINI_API_KEY
GEMINI_TEXT_MODEL - optional, defaults to gemini-2.5-flash
GEMINI_TTS_MODEL - optional, defaults to gemini-3.1-flash-tts-preview
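
For reference, resolving these variables with their documented defaults might look like the Python sketch below. The function name load_gemini_settings and the returned dictionary keys are assumptions made for the example; app.py may read the environment differently.

```python
import os


def load_gemini_settings(env=None) -> dict:
    """Resolve Gemini settings from environment variables.

    Hypothetical helper: only the variable names and default model
    names come from the section above.
    """
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return {
        "api_key": api_key,
        "text_model": env.get("GEMINI_TEXT_MODEL", "gemini-2.5-flash"),
        "tts_model": env.get("GEMINI_TTS_MODEL", "gemini-3.1-flash-tts-preview"),
    }


# Demo with an explicit mapping so the real environment is not needed.
settings = load_gemini_settings({"GEMINI_API_KEY": "example-key"})
print(settings["text_model"], settings["tts_model"])
```

Passing a plain dictionary instead of os.environ keeps the sketch easy to test without touching the real API key.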
