Split a PDF e-book into multiple PDF files by the chapters you select, for easier downstream use (for example, feeding them to ChatGPT or NotebookLM).
A desktop application that allows users to select a PDF file, view its table of contents, and chunk the PDF into multiple smaller PDF files based on selected chapters. The application runs locally and prioritizes ease of use for PDF processing.
- File Selection: Provides a button to open a file dialog for PDF selection
- ToC Extraction: Automatically extracts the Table of Contents (ToC) from the PDF
- Hierarchical Display: Shows the PDF ToC in a tree-like structure for easy selection
- Smart Selection:
  - When a parent chapter is selected, all its child chapters are automatically included in the same chunk
  - When a parent chapter is not selected, child chapters can be individually selected
- Automatic Chunking: Automatically determines page ranges and creates new PDF files based on selected chapters
- Friendly Naming: Chunked files are named using the original filename plus the chapter title
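To make the automatic chunking feature concrete, here is a minimal sketch of one way page ranges could be derived from a table of contents. It is an illustration, not the project's actual code: the function name `chapter_page_ranges` is hypothetical, entries use the same three fields (level, title, 1-based page) that PyMuPDF's `Document.get_toc()` returns, and it assumes each chapter ends on the page before the next entry at the same or a shallower level.

```python
from typing import List, Tuple

# Each ToC entry is (level, title, start_page) with a 1-based start page.
TocEntry = Tuple[int, str, int]


def chapter_page_ranges(toc: List[TocEntry], page_count: int) -> List[Tuple[str, int, int]]:
    """Return (title, first_page, last_page) for every entry (1-based, inclusive)."""
    ranges = []
    for i, (level, title, start) in enumerate(toc):
        end = page_count  # default: the chapter runs to the end of the document
        for next_level, _next_title, next_start in toc[i + 1:]:
            if next_level <= level:
                # The next entry at the same or a shallower level closes this chapter.
                # Assumption: chapters start on a fresh page; clamp so end >= start.
                end = max(start, next_start - 1)
                break
        ranges.append((title, start, end))
    return ranges


toc = [
    (1, "Chapter 1", 1),
    (2, "Section 1.1", 3),
    (2, "Section 1.2", 6),
    (1, "Chapter 2", 10),
]
for title, first, last in chapter_page_ranges(toc, page_count=15):
    print(f"{title}: pages {first}-{last}")
```

Because a parent's range extends to the next entry at its own level, it naturally covers all of its child chapters, which matches the parent-includes-children behaviour described in the feature list.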
- Programming Language: Python
- PDF Processing: PyMuPDF (fitz)
- GUI Framework: PySide6 (Qt for Python)
- Python 3.8 or higher
- Supported Operating Systems: Windows, macOS, Linux
1. Clone or download this project to your local machine
2. Install dependencies:
   pip install -r requirements.txt
1. Launch the application:
   python pdf_chunker_gui.py
2. Click the "Select PDF" button to choose a PDF file to process
3. Check the chapters you want to split in the ToC list:
   - Checking a parent chapter will automatically include all its child chapters
   - When a parent chapter is not checked, child chapters can be individually checked
4. Click the "Start Chunking" button
5. Select an output directory
6. Wait for the process to complete; the system will display a list of created PDF chunk files
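The checking rules in the steps above can be sketched as a small function that reduces the checked ToC entries to the ones that actually become chunks. This is only an illustration under stated assumptions: the name `effective_selection` and the index-based representation of checked entries are hypothetical, and the application's GUI may track selection differently.

```python
from typing import List, Set, Tuple

TocEntry = Tuple[int, str, int]  # (level, title, 1-based start page)


def effective_selection(toc: List[TocEntry], checked: Set[int]) -> List[int]:
    """Return indices of checked entries that are not covered by a checked ancestor."""
    result = []
    covering_level = None  # level of the checked ancestor currently in effect
    for index, (level, _title, _page) in enumerate(toc):
        if covering_level is not None and level > covering_level:
            continue  # descendant of a checked chapter: already part of its chunk
        covering_level = None  # we have left the checked chapter's subtree
        if index in checked:
            result.append(index)
            covering_level = level
    return result


toc = [
    (1, "Chapter 1", 1),
    (2, "Section 1.1", 3),
    (2, "Section 1.2", 6),
    (1, "Chapter 2", 10),
    (2, "Section 2.1", 11),
    (2, "Section 2.2", 14),
]
# Chapter 1 and Section 1.1 are both checked, plus Section 2.1 on its own.
chosen = effective_selection(toc, checked={0, 1, 4})
print([toc[i][1] for i in chosen])  # ['Chapter 1', 'Section 2.1']
```

Section 1.1 is dropped because its parent is checked and already contains it, while Section 2.1 survives because Chapter 2 is unchecked.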
1. Go to the GitHub Releases page and download the latest .dmg file
2. Open the .dmg file and drag the PDFChunker application to your Applications folder
3. Launch PDFChunker from the Applications folder or Launchpad
4. Follow steps 2-6 as described above
You can create a standalone macOS application (.app bundle) using PyInstaller. This allows users to run the application without needing to install Python or any dependencies.
1. Install PyInstaller: If you haven't already, install PyInstaller:
   pip install pyinstaller
2. Navigate to the project directory: Open your terminal and change to the project's root directory:
   cd path/to/your/chunk_pdf
3. Run PyInstaller: Use the following command to build the application. This command creates a single executable file within an .app bundle, suitable for GUI applications.
   pyinstaller --name "PDFChunker" --onefile --windowed --icon="path/to/your/icon.icns" pdf_chunker_gui.py
   - --name "PDFChunker": Sets the name of your application.
   - --onefile: Bundles everything into a single executable inside the .app.
   - --windowed: Prevents a terminal console window from appearing when the GUI app runs.
   - --icon="path/to/your/icon.icns": (Optional) Specifies the path to your custom application icon (.icns file). If you don't have one, you can omit this or create one.
   - pdf_chunker_gui.py: The main script for your application.
4. Find the application: After PyInstaller finishes, you will find the PDFChunker.app (or the name you specified) inside the dist directory within your project folder.
5. Distribute: You can then distribute this .app file. For wider distribution, consider code signing and notarization for macOS. The generated .app file should not be committed to the Git repository; instead, use GitHub Releases to distribute it.
Note on .gitignore: Ensure that PyInstaller's build artifacts are ignored by Git. The .gitignore file in this project should already include:
build/
dist/
*.spec
- pdf_chunker.py: Core logic class for handling PDF loading, ToC extraction, and chunking functionality
- pdf_chunker_gui.py: GUI implementation using PySide6 to create the user interface
- test_chunker.py: Test script for testing core logic functionality
- create_test_pdf.py: Script for creating test PDF files
- requirements.txt: List of dependencies
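For orientation, the core operations attributed to pdf_chunker.py (loading, ToC extraction, chunking) might look roughly like the sketch below when written against PyMuPDF. This is not the contents of that file: the function names `page_span_to_indices`, `load_toc`, and `write_chunk` are hypothetical, and only `fitz.open`, `Document.needs_pass`, `Document.get_toc`, `Document.insert_pdf`, `Document.save`, and `Document.close` are PyMuPDF calls.

```python
from typing import List, Tuple


def page_span_to_indices(first_page: int, last_page: int) -> Tuple[int, int]:
    """Convert a 1-based inclusive page span to the 0-based indices PyMuPDF expects."""
    if first_page < 1 or last_page < first_page:
        raise ValueError(f"invalid page span: {first_page}-{last_page}")
    return first_page - 1, last_page - 1


def load_toc(pdf_path: str) -> List[Tuple[int, str, int]]:
    """Return (level, title, 1-based page) entries; raise ValueError if unusable."""
    import fitz  # PyMuPDF, imported here so the helper above works without it

    doc = fitz.open(pdf_path)
    try:
        if doc.needs_pass:
            raise ValueError("PDF is password-protected")
        toc = doc.get_toc()  # list of [level, title, page]
    finally:
        doc.close()
    if not toc:
        raise ValueError("PDF has no table of contents")
    return [(level, title, page) for level, title, page in toc]


def write_chunk(pdf_path: str, first_page: int, last_page: int, out_path: str) -> None:
    """Copy pages first_page..last_page (1-based, inclusive) into a new PDF file."""
    import fitz

    start, stop = page_span_to_indices(first_page, last_page)
    src = fitz.open(pdf_path)
    out = fitz.open()  # calling open() with no arguments creates an empty PDF
    try:
        out.insert_pdf(src, from_page=start, to_page=stop)
        out.save(out_path)
    finally:
        out.close()
        src.close()
```

A call such as `write_chunk("book.pdf", 10, 15, "book_Chapter 2.pdf")` would then copy pages 10 through 15 into a new file; the file names here are placeholders.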
The application handles the following situations:
- No ToC: If the PDF has no table of contents, a warning message is displayed
- Encrypted/Unreadable PDF: If the PDF cannot be opened, an error message is displayed
- File I/O Errors: Handles potential errors when saving chunked files
- Filename Sanitization: Automatically cleans invalid characters in chapter titles to ensure valid filenames
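As an illustration of the filename sanitization item, the sketch below cleans a chapter title and combines it with the original filename. The exact rules the application uses are not documented here, so treat the details as assumptions: the names `sanitize_title` and `chunk_filename`, the underscore as replacement and separator, and the 100-character limit are all hypothetical choices.

```python
import re
from pathlib import Path

# Characters that Windows rejects in file names (a superset of what macOS and
# Linux forbid), plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_title(title: str, max_length: int = 100) -> str:
    """Replace invalid characters, collapse whitespace, and trim the result."""
    cleaned = _INVALID_CHARS.sub("_", title)
    cleaned = re.sub(r"\s+", " ", cleaned)
    # Leading or trailing spaces and dots cause trouble on Windows, so strip them.
    cleaned = cleaned[:max_length].strip(" .")
    return cleaned or "untitled"


def chunk_filename(source_pdf: str, chapter_title: str) -> str:
    """Build '<original name>_<chapter title>.pdf' (the separator is an assumption)."""
    return f"{Path(source_pdf).stem}_{sanitize_title(chapter_title)}.pdf"


print(chunk_filename("books/deep_learning.pdf", "Chapter 3: Probability/Information Theory?"))
# deep_learning_Chapter 3_ Probability_Information Theory_.pdf
```

Replacing invalid characters instead of deleting them keeps titles such as "A/B" and "AB" from collapsing into the same filename.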
This project is licensed under the MIT License.
Feel free to submit issue reports, feature requests, or contribute code directly.