wangty6/local-pdf-reader

Local PDF Reader MCP

A Model Context Protocol (MCP) server that enables Claude to read and analyze local PDF files. Supports text extraction, figure extraction, and intelligent navigation for large documents.

Features

  • Full Text Extraction - Extract all text content from PDFs
  • Figure Extraction - Extract images and vector graphics with captions
  • Smart Navigation - Get PDF structure/TOC and read specific sections
  • Large PDF Support - Auto-chunking for sections over 10k words

Installation

Prerequisites

pip install pymupdf pypdf mcp

Install MCP

From local file:

claude mcp add /path/to/local-pdf-reader.mcpb

From GitHub release:

claude mcp add https://github.com/YOUR_USERNAME/local-pdf-reader-mcp/releases/download/v1.0/local-pdf-reader.mcpb

Tools

read_pdf_text

Extract all text from a PDF file.

read_pdf_text(file_path="/path/to/document.pdf")

read_pdf_figures

Extract figures and images with captions.

read_pdf_figures(file_path="/path/to/document.pdf")
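
The README does not say how captions are paired with figures. As a rough illustration only, the sketch below finds caption-like lines ("Figure 1: ..." or "Fig. 2. ...") in text that has already been extracted from a page. The function name `find_captions` and the caption pattern are assumptions made for this example, not part of this server's API.

```python
import re

# Hypothetical caption pattern: "Figure 3: ..." or "Fig. 3. ..." at the start of a line.
CAPTION_RE = re.compile(
    r"^(?:Figure|Fig\.)\s+(\d+)[.:]\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def find_captions(page_text: str) -> list[tuple[int, str]]:
    """Return (figure_number, caption_text) pairs found in the page text."""
    return [
        (int(match.group(1)), match.group(2).strip())
        for match in CAPTION_RE.finditer(page_text)
    ]


sample = "Some body text\nFigure 1: Overview of the system\nMore text\nFig. 2. Results by year\n"
captions = find_captions(sample)
```

A real extractor would probably also use the position of each image on the page to decide which caption belongs to it; this sketch only covers the text-matching part.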

get_pdf_structure

Get the table of contents with page ranges and word counts. Recommended for large PDFs.

get_pdf_structure(file_path="/path/to/document.pdf")

Returns:

  • Total pages and word count
  • Section hierarchy with IDs, titles, page ranges
  • Chunking info for large sections (>10k words)
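
To make the chunking info concrete, here is a minimal sketch of how a section's word count could be turned into a chunking decision using the 10k-word threshold mentioned above. Only `needs_chunking` appears in this README; the other field names (`word_count`, `num_chunks`), both function names, and the word-based splitting strategy are assumptions for illustration, not the server's actual implementation.

```python
MAX_WORDS = 10_000  # threshold stated in this README


def summarize_section(section_id: int, title: str, text: str,
                      max_words: int = MAX_WORDS) -> dict:
    """Build a hypothetical structure entry for one section."""
    word_count = len(text.split())
    # Ceiling division; a section always has at least one chunk.
    num_chunks = max(1, -(-word_count // max_words))
    return {
        "id": section_id,
        "title": title,
        "word_count": word_count,
        "needs_chunking": word_count > max_words,
        "num_chunks": num_chunks,
    }


def split_into_chunks(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


long_text = "word " * 25_000
info = summarize_section(5, "Results", long_text)
chunks = split_into_chunks(long_text)
```

Splitting on whitespace ignores paragraph boundaries, so a real implementation might prefer to break at paragraph or page ends near the limit.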

read_pdf_section

Read a specific section by ID or title.

# By section ID
read_pdf_section(file_path="/path/to/doc.pdf", section_id=0)

# By title (fuzzy match)
read_pdf_section(file_path="/path/to/doc.pdf", section_title="Introduction")

# Read chunk 2 of a large section
read_pdf_section(file_path="/path/to/doc.pdf", section_id=5, chunk=2)

Usage Guide

Small PDFs (< 20 pages): Use read_pdf_text directly.

Large PDFs (20 pages or more):

  1. Call get_pdf_structure to get the TOC
  2. Use read_pdf_section to read relevant sections
  3. For sections marked needs_chunking: true, pass the chunk parameter to read_pdf_section and read the section one chunk at a time
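
The steps above can be sketched as a small planning function that turns a structure result into a list of tool calls. This is an illustration under stated assumptions: the structure shape (`total_pages`, `sections`, `num_chunks`), the function name `plan_reads`, and the 1-based chunk numbering are all hypothetical, since the README does not specify them. Only the tool names, their parameters, and `needs_chunking` come from this document.

```python
def plan_reads(file_path: str, structure: dict, wanted_ids: set[int],
               first_chunk: int = 1) -> list[tuple[str, dict]]:
    """Return (tool_name, arguments) pairs to issue, in order."""
    # Small PDFs: read everything in one call.
    if structure["total_pages"] < 20:
        return [("read_pdf_text", {"file_path": file_path})]

    calls = []
    for section in structure["sections"]:
        if section["id"] not in wanted_ids:
            continue
        base_args = {"file_path": file_path, "section_id": section["id"]}
        if section.get("needs_chunking"):
            # Assumed numbering: chunks run from first_chunk upward.
            last_chunk = first_chunk + section["num_chunks"]
            for chunk in range(first_chunk, last_chunk):
                calls.append(("read_pdf_section", {**base_args, "chunk": chunk}))
        else:
            calls.append(("read_pdf_section", base_args))
    return calls


structure = {
    "total_pages": 120,
    "sections": [
        {"id": 0, "title": "Introduction", "needs_chunking": False},
        {"id": 5, "title": "Results", "needs_chunking": True, "num_chunks": 3},
    ],
}
calls = plan_reads("/path/to/doc.pdf", structure, wanted_ids={0, 5})
```

If the server numbers chunks from 0 instead, pass `first_chunk=0`.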

Extract figures: Use read_pdf_figures to get all images and charts.

License

MIT
