Name		Name	Last commit message	Last commit date
parent directory ..
Images		Images
README.md		README.md
requirements.txt		requirements.txt
text_extractor.py		text_extractor.py

README.md

Extract Text from PDF using Python Module

Modules Required

PyPDF2 - It is used in Python for PDF related operations

AIM

To build a Python Script using the PyPDF2 Module which can extract text from a PDF file.

COMPILATION STEPS

Import PyPDF2 Module to:-
- Read the pdf into the program to further manipulate it
- Count the number of pages in the PDF
- Extract the text from a single PDF page
Initialize an empty string which will store the text being extracted from the PDF file
A for loop is made to parse through each page
- The extractText() function is used to extract text from the parsed PDF page
- The extracted text is added to the emptry string initialized using simple string concatenation
After parsing is done, the string in which the extracted text is stored is written in a new file named extracted_text.txt using basic File Handling in Python

PDF FILE WITH TEXT

OUTPUT OF EXTRACTED TEXT SHOWN IN NEW TEXT FILE extracted_text.txt

PyPDF2:

A Pure-Python library built as a PDF toolkit. To know more: PyPDF2 Docs

File handling

Python has some inbuilt methods to handles files and perform operations like reading and writing. read about them : File Handling Docs

Author

@Sakalya Mitra

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Text Extractor from PDF

Text Extractor from PDF

README.md

Extract Text from PDF using Python Module

Modules Required

AIM

COMPILATION STEPS

PDF FILE WITH TEXT

OUTPUT OF EXTRACTED TEXT SHOWN IN NEW TEXT FILE extracted_text.txt

PyPDF2:

File handling

Author

Files

Text Extractor from PDF

Directory actions

More options

Directory actions

More options

Latest commit

History

Text Extractor from PDF

Folders and files

parent directory

README.md

Extract Text from PDF using Python Module

Modules Required

AIM

COMPILATION STEPS

PDF FILE WITH TEXT

OUTPUT OF EXTRACTED TEXT SHOWN IN NEW TEXT FILE extracted_text.txt

PyPDF2:

File handling

Author