#### Load Data From PDF

In [None]:
from langchain_community.document_loaders import PyPDFLoader

def load_pdf(file_path: str) -> list:
    """
    Load a PDF file and return its content as a list of documents.

    Args:
        file_path (str): The path to the PDF file.

    Returns:
        list: One LangChain Document per page, with the page text in
        `page_content` and file/page details in `metadata`.
    """
    loader = PyPDFLoader(file_path)
    documents = loader.load()

    return documents

In [23]:
documents = load_pdf("../data/Passport1.pdf")
print(documents)

[Document(metadata={'producer': 'FREE PDFill PDF and Image Writer', 'creator': 'PScript5.dll Version 5.2.2', 'creationdate': '2024-01-31T19:54:07+05:30', 'title': 'Microsoft Word - Instructions for Online TD - Local Applicants -english.doc', 'author': 'ewis', 'moddate': '2024-01-31T19:54:08+05:30', 'source': '../data/Passport1.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='Department of Immigration and Emigration \nOnline Submission of Travel Document Applications –  Local Applicants \n  \n \nInstructions to Online Apply for a Passport \n(Please read the instructions carefully) \n \n1.  How to apply? \n1.1. You can apply for either urgent service or normal service.  \n1.2. If you select the urgent service, your travel docum ent will be issued after three days of capturing \nyour fingerprints. \n1.3. If you select the normal service, your travel docum ent will be issued after thirty days of capturing \nyour fingerprints. \n \n2.  Eligibility \n2.1 The eligibility c

In [24]:
type(documents)

list

In [25]:
print(documents[0].page_content)

Department of Immigration and Emigration 
Online Submission of Travel Document Applications –  Local Applicants 
  
 
Instructions to Online Apply for a Passport 
(Please read the instructions carefully) 
 
1.  How to apply? 
1.1. You can apply for either urgent service or normal service.  
1.2. If you select the urgent service, your travel docum ent will be issued after three days of capturing 
your fingerprints. 
1.3. If you select the normal service, your travel docum ent will be issued after thirty days of capturing 
your fingerprints. 
 
2.  Eligibility 
2.1 The eligibility criteria in order to avail this service are as follows: 
2.1.1  Your age shall be 16 years or above as at th e date of submitting your application. 
2.1.2  You shall possess your valid passport, if an y.  
2.1.3  Your NIC /Passport/Full Name shall not be bl acklisted. 
2.1.4  You shall not have a travel ban imposed by t he Courts of Law. 
2.1.5 You shall not have a travel ban imposed by th e Tri-forces or any o

In [26]:
len(documents)

4

In [27]:
len(documents[0].page_content)

2015
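The two cells above check the page count and the length of the first page. A small helper can report the same figures for every page at once. This is a minimal sketch that only assumes each item has a `page_content` string and a `metadata` dict with `page` and `page_label` keys, as in the printed output above. `page_stats` is a hypothetical name, and the `FakeDoc` class is a stand-in so the snippet runs without LangChain or the PDF.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes used below.

    In the notebook, pass the real LangChain Document objects instead.
    """
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def page_stats(documents: Iterable[Any]) -> List[Tuple[int, str, int]]:
    """Return (page index, page label, character count) for each document."""
    stats = []
    for doc in documents:
        page = doc.metadata.get("page", -1)               # 0-based index from the loader
        label = str(doc.metadata.get("page_label", ""))   # page number as printed in the PDF
        stats.append((page, label, len(doc.page_content)))
    return stats


# Usage with stand-in documents; with real data call page_stats(documents).
docs = [
    FakeDoc("Department of Immigration", {"page": 0, "page_label": "1"}),
    FakeDoc("Eligibility", {"page": 1, "page_label": "2"}),
]
for page, label, n_chars in page_stats(docs):
    print(f"page {page} (label {label}): {n_chars} characters")
```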

- The list returned by `load_pdf()` can be passed directly to the `split_text()` function defined below.

#### Let's Split the Text into Smaller Chunks

In [29]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing import List
from langchain_core.documents import Document

def split_text(documents: List[Document], chunk_size: int = 1000, chunk_overlap: int = 100) -> List[Document]:
    """
    Split a list of LangChain Document objects into smaller chunks.

    Args:
        documents (List[Document]): The documents to split.
        chunk_size (int): Maximum length of each chunk, in characters.
        chunk_overlap (int): Maximum number of characters shared by consecutive chunks.

    Returns:
        List[Document]: The chunks, each keeping the metadata of the page it came from.
    """
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(documents)
    return chunks
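`RecursiveCharacterTextSplitter` tries a list of separators in order (by default `"\n\n"`, `"\n"`, `" "`, then `""` for single characters) and only moves to a finer separator when a piece is still longer than `chunk_size`. It then merges the small pieces back into chunks of up to `chunk_size` characters, starting each new chunk with some trailing text from the previous one, bounded by `chunk_overlap`. The pure-Python sketch below imitates that idea so the effect of the two parameters is easy to see. It is a simplified approximation written for illustration, not LangChain's implementation, and `recursive_split` and `_merge` are hypothetical names.

```python
from typing import List, Sequence


def _merge(pieces: List[str], sep: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Greedily join pieces with `sep` into chunks of at most `chunk_size`
    characters; each new chunk starts with trailing pieces of the previous
    one whose joined length is at most `chunk_overlap`."""
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        if current and len(sep.join(current + [piece])) > chunk_size:
            chunks.append(sep.join(current))
            # Drop pieces from the front until what remains fits the overlap
            # budget and leaves room for the incoming piece.
            while current and (
                len(sep.join(current)) > chunk_overlap
                or len(sep.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(sep.join(current))
    return chunks


def recursive_split(
    text: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 100,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split on the coarsest separator present in `text`, recursing with the
    finer separators into any piece that is still longer than `chunk_size`."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        return [text]  # nothing left to split on
    # Choose the first separator that occurs; "" (single characters) always does.
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            break
    finer = separators[i + 1:]
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]

    result: List[str] = []
    pending: List[str] = []  # short pieces waiting to be merged
    for piece in pieces:
        if len(piece) <= chunk_size:
            pending.append(piece)
            continue
        if pending:
            result.extend(_merge(pending, sep, chunk_size, chunk_overlap))
            pending = []
        result.extend(recursive_split(piece, chunk_size, chunk_overlap, finer))
    if pending:
        result.extend(_merge(pending, sep, chunk_size, chunk_overlap))
    return result


print(recursive_split("aaa bbb ccc ddd", chunk_size=7, chunk_overlap=3))
# ['aaa bbb', 'bbb ccc', 'ccc ddd']
```

Running `recursive_split(documents[0].page_content)` gives a rough idea of how the first page is cut, but the exact boundaries may differ from LangChain's output; for example, LangChain also strips leading and trailing whitespace from each chunk by default.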

In [30]:
chunks = split_text(documents)
chunks

[Document(metadata={'producer': 'FREE PDFill PDF and Image Writer', 'creator': 'PScript5.dll Version 5.2.2', 'creationdate': '2024-01-31T19:54:07+05:30', 'title': 'Microsoft Word - Instructions for Online TD - Local Applicants -english.doc', 'author': 'ewis', 'moddate': '2024-01-31T19:54:08+05:30', 'source': '../data/Passport1.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='Department of Immigration and Emigration \nOnline Submission of Travel Document Applications –  Local Applicants \n  \n \nInstructions to Online Apply for a Passport \n(Please read the instructions carefully) \n \n1.  How to apply? \n1.1. You can apply for either urgent service or normal service.  \n1.2. If you select the urgent service, your travel docum ent will be issued after three days of capturing \nyour fingerprints. \n1.3. If you select the normal service, your travel docum ent will be issued after thirty days of capturing \nyour fingerprints. \n \n2.  Eligibility \n2.1 The eligibility c

In [31]:
type(chunks)

list

In [33]:
for chunk in chunks:
    print(chunk.page_content)
    print("----------------------------------------------")

Department of Immigration and Emigration 
Online Submission of Travel Document Applications –  Local Applicants 
  
 
Instructions to Online Apply for a Passport 
(Please read the instructions carefully) 
 
1.  How to apply? 
1.1. You can apply for either urgent service or normal service.  
1.2. If you select the urgent service, your travel docum ent will be issued after three days of capturing 
your fingerprints. 
1.3. If you select the normal service, your travel docum ent will be issued after thirty days of capturing 
your fingerprints. 
 
2.  Eligibility 
2.1 The eligibility criteria in order to avail this service are as follows: 
2.1.1  Your age shall be 16 years or above as at th e date of submitting your application. 
2.1.2  You shall possess your valid passport, if an y.  
2.1.3  Your NIC /Passport/Full Name shall not be bl acklisted. 
2.1.4  You shall not have a travel ban imposed by t he Courts of Law.
----------------------------------------------
2.1.4  You shall not have a
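The loop output above shows `chunk_overlap` at work: the line `2.1.4  You shall not have a travel ban imposed by t he Courts of Law.` closes the first chunk, and the second chunk begins with the same `2.1.4` line. One way to measure that shared text is to find the longest suffix of one chunk that is also a prefix of the next. The helper below (`shared_overlap` is a hypothetical name) works on plain strings, so in the notebook it can be applied to the `page_content` of neighbouring chunks. Treat very short matches (a single character or word) as coincidence rather than real overlap.

```python
def shared_overlap(first: str, second: str) -> str:
    """Return the longest suffix of `first` that is also a prefix of `second`."""
    for size in range(min(len(first), len(second)), 0, -1):
        if first.endswith(second[:size]):
            return second[:size]
    return ""  # the two strings share no boundary text


print(repr(shared_overlap("alpha beta gamma", "beta gamma delta")))
# 'beta gamma'

# With the notebook's chunks (assumes `chunks` from the cells above):
# for left, right in zip(chunks, chunks[1:]):
#     print(len(shared_overlap(left.page_content, right.page_content)))
```

Consecutive chunks that come from different pages should report little or no overlap, because `split_documents` splits each page's Document separately.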

In [35]:
print(chunks[0].page_content)

Department of Immigration and Emigration 
Online Submission of Travel Document Applications –  Local Applicants 
  
 
Instructions to Online Apply for a Passport 
(Please read the instructions carefully) 
 
1.  How to apply? 
1.1. You can apply for either urgent service or normal service.  
1.2. If you select the urgent service, your travel docum ent will be issued after three days of capturing 
your fingerprints. 
1.3. If you select the normal service, your travel docum ent will be issued after thirty days of capturing 
your fingerprints. 
 
2.  Eligibility 
2.1 The eligibility criteria in order to avail this service are as follows: 
2.1.1  Your age shall be 16 years or above as at th e date of submitting your application. 
2.1.2  You shall possess your valid passport, if an y.  
2.1.3  Your NIC /Passport/Full Name shall not be bl acklisted. 
2.1.4  You shall not have a travel ban imposed by t he Courts of Law.
