pdfsail is an open-source PDF processing library designed to simplify manipulating, parsing, and converting PDF files. Whether you need to convert a PDF to another format or extract information from one, pdfsail aims to make it straightforward.
- PDF Conversion: Convert PDF files to images or text.
- Text Extraction: Extract text, metadata, images, etc., from PDFs.
- Page Operations: Merge, split, rotate, crop pages, and more.
- Cross-platform: Supports all major operating systems (Windows, macOS, Linux).
- High Performance: Optimized to run smoothly even with large files.
Install via Python's package manager, pip:
pip install pdfsail
Here are a few simple examples to get you started with pdfsail.
from pdfsail import PDFToImage
# Convert PDF to images (one image per page)
pdf_to_image = PDFToImage("example.pdf")
pdf_to_image.convert("output_folder/")
from pdfsail import PDFTextExtractor
# Extract text from PDF
pdf_text = PDFTextExtractor("example.pdf")
text = pdf_text.extract_text()
print(text)
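For more than one file, the extraction call above can be wrapped in a small batch helper. The following is a minimal sketch, not part of pdfsail: `extract_folder` and `pdfsail_extract` are hypothetical names, and the sketch assumes `extract_text()` returns a `str`, as the `print(text)` example suggests.

```python
from pathlib import Path


def extract_folder(src_dir, dst_dir, extract):
    """Write one .txt file per PDF found in src_dir; return the paths written.

    `extract` is any callable that maps a PDF path (str) to its text.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort so the processing order is stable across runs.
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        target = dst / (pdf.stem + ".txt")
        target.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(target)
    return written


def pdfsail_extract(path):
    """Hypothetical adapter around the PDFTextExtractor usage shown above."""
    # Imported lazily so extract_folder can be used without pdfsail installed.
    from pdfsail import PDFTextExtractor
    return PDFTextExtractor(path).extract_text()
```

With pdfsail installed, `extract_folder("pdfs/", "texts/", pdfsail_extract)` would write `texts/<name>.txt` for each `pdfs/<name>.pdf`. Passing the extractor as a parameter keeps the file handling independent of the library.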
from pdfsail import PDFMerger
# Merge two PDFs into a single output file
pdf_merger = PDFMerger()
pdf_merger.merge(["file1.pdf", "file2.pdf"], "output_merged.pdf")
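When merging files from user input, it can help to check the inputs before handing them to the merger. The sketch below is one possible approach, not part of pdfsail: `looks_like_pdf` and `merge_checked` are hypothetical helpers. The check relies only on the fact that a conforming PDF file begins with the bytes `%PDF-`. How pdfsail itself reports invalid inputs is not covered here.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # header bytes at the start of a conforming PDF file


def looks_like_pdf(path):
    """Return True if the path is an existing file that starts with the PDF header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC


def merge_checked(inputs, output):
    """Hypothetical wrapper: validate every input, then call PDFMerger.merge."""
    bad = [str(p) for p in inputs if not looks_like_pdf(p)]
    if bad:
        raise ValueError(f"not readable PDF files: {bad}")
    # Imported lazily so the validation helper works without pdfsail installed.
    from pdfsail import PDFMerger
    PDFMerger().merge([str(p) for p in inputs], str(output))
```

For example, `merge_checked(["file1.pdf", "file2.pdf"], "output_merged.pdf")` would raise `ValueError` naming any missing or non-PDF input before the merge is attempted.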
- Python: Main development language.
- PyPDF2: For basic PDF operations.
- Pillow: For converting PDF pages to images.
- PDFMiner: For text extraction.
- Project Documentation: Detailed API documentation and usage examples.
- GitHub Issues: Report bugs or request new features.
We welcome contributions in any form! If you have suggestions or find a bug, feel free to submit a Pull Request or open an Issue.
- Fork the project and clone it to your local machine.
- Create a new branch.
- Commit your changes.
- Submit a Pull Request.
The pdfsail project is licensed under the MIT License.