PDF Chunking Tool

Split a PDF e-book into multiple PDF files by the chapters you select, for easier downstream use (for example, feeding them to ChatGPT or NotebookLM).

A desktop application that allows users to select a PDF file, view its table of contents, and chunk the PDF into multiple smaller PDF files based on selected chapters. The application runs locally and prioritizes ease of use for PDF processing.

Features

File Selection: Provides a button to open a file dialog for PDF selection
ToC Extraction: Automatically extracts the Table of Contents (ToC) from the PDF
Hierarchical Display: Shows the PDF ToC in a tree-like structure for easy selection
Smart Selection:
- When a parent chapter is selected, all its child chapters are automatically included in the same chunk
- When a parent chapter is not selected, child chapters can be individually selected
Automatic Chunking: Automatically determines page ranges and creates new PDF files based on selected chapters
Friendly Naming: Chunked files are named using the original filename plus the chapter title
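The selection rules and page-range logic above can be sketched in plain Python. This is a minimal sketch, not the code in pdf_chunker.py. It assumes the ToC is a list of [level, title, page] entries with valid 1-based page numbers, which is the format PyMuPDF's `Document.get_toc()` returns. It also assumes a chapter runs from its start page to the page before the next entry at the same or a higher level (or to the last page). The function name `resolve_chunks` is hypothetical.

```python
from typing import List, Sequence, Tuple


def resolve_chunks(
    toc: Sequence[Sequence],
    selected: Sequence[int],
    page_count: int,
) -> List[Tuple[str, int, int]]:
    """Turn selected ToC indices into (title, first_page, last_page) chunks.

    Pages are 1-based and inclusive. A selected entry whose ancestor is also
    selected is skipped, because the ancestor's chunk already contains it.
    """
    chosen = set(selected)
    chunks: List[Tuple[str, int, int]] = []
    for i in sorted(chosen):
        level, title, start = toc[i][:3]

        # Walk backwards; each entry with a strictly lower level than every
        # entry seen so far is an ancestor of entry i.
        covered = False
        min_level = level
        for j in range(i - 1, -1, -1):
            if toc[j][0] < min_level:
                min_level = toc[j][0]
                if j in chosen:
                    covered = True
                    break
        if covered:
            continue

        # The chunk ends just before the next entry at the same or a higher level.
        end = page_count
        for k in range(i + 1, len(toc)):
            if toc[k][0] <= level:
                end = max(start, toc[k][2] - 1)
                break
        chunks.append((title, start, end))
    return chunks


if __name__ == "__main__":
    toc = [[1, "Intro", 1], [1, "Part A", 3], [2, "A.1", 3], [2, "A.2", 6], [1, "Part B", 9]]
    # Selecting "Part A" and its child "A.1" yields one chunk for Part A.
    print(resolve_chunks(toc, [1, 2, 4], page_count=12))
    # → [('Part A', 3, 8), ('Part B', 9, 12)]
```

Dropping children of a selected parent mirrors the "parent includes children" rule; when only children are checked, each one becomes its own chunk.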

Screenshot

Technology Stack

Programming Language: Python
PDF Processing: PyMuPDF (fitz)
GUI Framework: PySide6 (Qt for Python)

Installation

Requirements

Python 3.8 or higher
Supported Operating Systems: Windows, macOS, Linux

Setup

1. Clone or download this project to your local machine.
2. Install the dependencies:
```
pip install -r requirements.txt
```
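To confirm the environment meets the requirements above before launching the GUI, a small check script can help. This is a hypothetical helper, not part of the repository; it assumes the import names `fitz` (PyMuPDF) and `PySide6`, as listed in the Technology Stack.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names for the dependencies listed in this README.
REQUIRED_MODULES = ("fitz", "PySide6")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names in `names` that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)) -> bool:
    """True if the running interpreter meets the minimum Python version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If any module is reported missing, rerun the `pip install` step in the same environment you use to launch the application.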

Usage

Method 1: Running from Source Code

1. Launch the application:
```
python pdf_chunker_gui.py
```
2. Click the "Select PDF" button to choose a PDF file to process.
3. In the ToC list, check the chapters you want to split out:
   - Checking a parent chapter automatically includes all of its child chapters.
   - When a parent chapter is not checked, its child chapters can be checked individually.
4. Click the "Start Chunking" button.
5. Select an output directory.
6. Wait for the process to complete; the application then displays a list of the PDF chunk files it created.
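The chunking that runs after you choose an output directory can be approximated with a short PyMuPDF script. This is a minimal sketch, not the application's implementation. It assumes chunks arrive as (title, first_page, last_page) tuples with 1-based, inclusive pages and titles that are already sanitized. The "<original name>_<chapter title>.pdf" naming pattern and the function names are hypothetical. `fitz.open()`, `Document.insert_pdf(from_page=..., to_page=...)` (0-based, inclusive) and `Document.save()` are PyMuPDF calls.

```python
from pathlib import Path
from typing import List, Tuple


def plan_outputs(
    pdf_path: str,
    chunks: List[Tuple[str, int, int]],
    out_dir: str,
) -> List[Tuple[Path, int, int]]:
    """Map 1-based (title, first, last) chunks to output paths and 0-based ranges.

    Assumes titles are already safe for filenames; the "<stem>_<title>.pdf"
    pattern is an assumption, not the tool's confirmed format.
    """
    stem = Path(pdf_path).stem
    return [
        (Path(out_dir) / f"{stem}_{title}.pdf", first - 1, last - 1)
        for title, first, last in chunks
    ]


def write_chunks(pdf_path: str, chunks, out_dir: str) -> List[Path]:
    """Write each chunk as a new PDF with PyMuPDF and return the created paths."""
    import fitz  # PyMuPDF; imported here so plan_outputs works without it

    created = []
    with fitz.open(pdf_path) as src:
        for path, from_page, to_page in plan_outputs(pdf_path, chunks, out_dir):
            part = fitz.open()  # new, empty PDF document
            part.insert_pdf(src, from_page=from_page, to_page=to_page)
            part.save(str(path))
            part.close()
            created.append(path)
    return created
```

Separating the pure planning step from the PyMuPDF calls keeps the page arithmetic testable without any PDF on disk.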

Method 2: Using Pre-compiled Version (macOS)

1. Go to the GitHub Releases page and download the latest .dmg file.
2. Open the .dmg file and drag the PDFChunker application to your Applications folder.
3. Launch PDFChunker from the Applications folder or Launchpad.
4. Follow steps 2-6 of Method 1 above.

Creating a Standalone Application (macOS)

You can create a standalone macOS application (.app bundle) using PyInstaller. This allows users to run the application without needing to install Python or any dependencies.

1. Install PyInstaller, if you haven't already:
```
pip install pyinstaller
```
2. Navigate to the project directory: open a terminal and change to the project's root directory:
```
cd path/to/your/chunk_pdf
```
3. Run PyInstaller: build the application with the following command, which packages it as a single executable inside an .app bundle, suitable for GUI applications.
```
pyinstaller --name "PDFChunker" --onefile --windowed --icon="path/to/your/icon.icns" pdf_chunker_gui.py
```
- --name "PDFChunker": Sets the name of the application.
- --onefile: Bundles everything into a single executable inside the .app. Recent PyInstaller releases warn that combining --onefile with --windowed on macOS is deprecated; if you see this warning, drop --onefile to build a standard one-folder .app bundle.
- --windowed: Prevents a terminal console window from appearing when the GUI app runs.
- --icon="path/to/your/icon.icns": (Optional) Path to a custom application icon (.icns file). If you don't have one, omit this option.
- pdf_chunker_gui.py: The application's main script.
4. Find the application: after PyInstaller finishes, PDFChunker.app (or the name you specified) is in the dist directory inside your project folder.
5. Distribute: you can distribute this .app file directly; for wider distribution, consider code signing and notarizing it for macOS. Do not commit the generated .app to the Git repository; publish it through GitHub Releases instead.
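The same build can also be driven from Python, because `PyInstaller.__main__.run()` accepts the command-line arguments as a list. A minimal sketch; the function names `pyinstaller_args` and `build` are hypothetical, and the flags simply mirror the command above.

```python
from typing import List, Optional


def pyinstaller_args(
    script: str = "pdf_chunker_gui.py",
    name: str = "PDFChunker",
    icon: Optional[str] = None,
) -> List[str]:
    """Build the argument list equivalent to the pyinstaller command above."""
    args = [script, "--name", name, "--onefile", "--windowed"]
    if icon:  # the icon is optional, as in the CLI example
        args.append(f"--icon={icon}")
    return args


def build(icon: Optional[str] = None) -> None:
    """Run PyInstaller in-process with the arguments above."""
    import PyInstaller.__main__  # requires `pip install pyinstaller`

    PyInstaller.__main__.run(pyinstaller_args(icon=icon))
```

Calling `build("path/to/your/icon.icns")` from the project root should produce the same dist/ output as the shell command.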

Note on .gitignore: Make sure Git ignores PyInstaller's build artifacts. This project's .gitignore should already include:

```
build/
dist/
*.spec
```

File Description

pdf_chunker.py: Core logic class for handling PDF loading, ToC extraction, and chunking functionality
pdf_chunker_gui.py: GUI implementation using PySide6 to create the user interface
test_chunker.py: Tests for the core chunking logic
create_test_pdf.py: Script for creating test PDF files
requirements.txt: List of dependencies

Error Handling

The application handles the following situations:

No ToC: If the PDF has no table of contents, a warning message is displayed
Encrypted/Unreadable PDF: If the PDF cannot be opened, an error message is displayed
File I/O Errors: Handles potential errors when saving chunked files
Filename Sanitization: Automatically cleans invalid characters in chapter titles to ensure valid filenames
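The filename sanitization and the "no ToC / encrypted PDF" checks above could look roughly like this. It is a sketch under assumptions, not the code in pdf_chunker.py. The set of invalid characters (the Windows-reserved set plus control characters), the 100-character limit, the "untitled" fallback and both function names are hypothetical. `Document.needs_pass` and `Document.get_toc()` are PyMuPDF attributes.

```python
import re

# Characters invalid in Windows filenames ("/" is also invalid on macOS/Linux),
# plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(title: str, replacement: str = "_", max_len: int = 100) -> str:
    """Make a chapter title safe to use as part of a filename."""
    cleaned = _INVALID.sub(replacement, title)
    cleaned = re.sub(r"\s+", " ", cleaned)[:max_len]
    # Trim surrounding whitespace and trailing dots, which Windows rejects.
    cleaned = cleaned.strip().rstrip(". ")
    return cleaned or "untitled"


def load_toc(pdf_path: str):
    """Open a PDF and return (doc, toc); raise ValueError for the cases above."""
    import fitz  # PyMuPDF

    try:
        doc = fitz.open(pdf_path)
    except Exception as exc:  # exception types differ across PyMuPDF versions
        raise ValueError(f"Cannot open PDF: {exc}") from exc
    if doc.needs_pass:
        doc.close()
        raise ValueError("PDF is encrypted and needs a password")
    toc = doc.get_toc()
    if not toc:
        doc.close()
        raise ValueError("PDF has no table of contents")
    return doc, toc
```

In the GUI, each ValueError would map to the corresponding warning or error dialog.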

License

This project is licensed under the MIT License.

Contributing

Feel free to submit issue reports, feature requests, or contribute code directly.
