<a href="https://colab.research.google.com/github/vitchierath/Gen_Ai_miniprojects/blob/main/multiplepdf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -U langchain-community


In [None]:
import PyPDF2
from pathlib import Path
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Integrations live in langchain-community; the old langchain.* import paths are deprecated
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

def extract_text_from_pdfs(pdf_directory):
    texts = []
    pdf_files = Path(pdf_directory).glob("*.pdf")

    for pdf_file in pdf_files:
        try:
            with open(pdf_file, "rb") as file:
                pdf_reader = PyPDF2.PdfReader(file)
                text = ""
                for page in pdf_reader.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"
                if text.strip():  # skip PDFs that yield no extractable text
                    texts.append(text)
        except Exception as e:
            print(f"Error processing {pdf_file}: {e}")

    return texts

def create_vector_store(texts):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )

    documents = text_splitter.create_documents(texts)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = Chroma.from_documents(documents, embeddings)

    return vectorstore

def setup_qa_pipeline():
    model_name = "distilbert-base-cased-distilled-squad"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)
    qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)
    return qa_pipeline

def answer_question(question, vectorstore, qa_pipeline):
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
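    # Caution: HuggingFacePipeline is designed for text-generation style pipelines.
    # An extractive "question-answering" pipeline expects separate question and
    # context inputs, so this chain is likely to fail once documents are retrieved.
    qa_chain = RetrievalQA.from_chain_type(
        llm=HuggingFacePipeline(pipeline=qa_pipeline),
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True
    )

    # Calling the chain object directly is deprecated; use invoke()
    result = qa_chain.invoke({"query": question})
    return result["result"], result["source_documents"]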

def main(pdf_directory, question):
    # Extract text from PDFs
    texts = extract_text_from_pdfs(pdf_directory)
    if not texts:
        return "No valid PDF content found.", []

    # Create vector store
    vectorstore = create_vector_store(texts)

    # Setup QA pipeline
    qa_pipeline = setup_qa_pipeline()

    # Answer the question
    answer, source_documents = answer_question(question, vectorstore, qa_pipeline)

    return answer, source_documents

if __name__ == "__main__":
    # Example usage
    pdf_directory = "./pdfs"  # Directory containing PDF files
    question = "What is the main topic discussed in the PDFs?"

    answer, sources = main(pdf_directory, question)
    print(f"Question: {question}")
    print(f"Answer: {answer}")
    print("\nSources:")
    for i, doc in enumerate(sources, 1):
        print(f"{i}. {doc.page_content[:200]}...")

Question: What is the main topic discussed in the PDFs?
Answer: No valid PDF content found.

Sources:
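
The message above means `extract_text_from_pdfs` returned an empty list. In a fresh session the likely cause is that `./pdfs` is missing or holds no files matching `*.pdf`, since `Path.glob` on a missing directory typically yields nothing rather than raising. A minimal sketch of a stricter directory check follows; `find_pdfs` is a hypothetical helper, not part of the notebook, and it matches the `.pdf` suffix case-insensitively as an added assumption.

```python
from pathlib import Path
import tempfile


def find_pdfs(directory):
    """Return sorted PDF paths in a directory, raising if the directory is missing."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Directory {directory} does not exist.")
    # Compare the suffix in lower case so "report.PDF" is also picked up
    return sorted(p for p in root.iterdir() if p.suffix.lower() == ".pdf")


# Demonstration on a throwaway directory with one PDF-named file and one text file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.PDF").write_bytes(b"%PDF-1.4")
    (Path(tmp) / "notes.txt").write_text("not a pdf")
    print([p.name for p in find_pdfs(tmp)])
```

Raising early separates "the directory is wrong" from "the PDFs contain no extractable text", which otherwise produce the same message.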


In [None]:
import sys

# Detect if running in Google Colab
IN_COLAB = 'google.colab' in sys.modules

# Install dependencies first so the imports below can succeed
if IN_COLAB:
    !pip install PyPDF2 langchain langchain-community langchain-huggingface chromadb sentence-transformers transformers torch huggingface_hub ipywidgets
else:
    print("Ensure dependencies are installed: PyPDF2, langchain, langchain-community, langchain-huggingface, chromadb, sentence-transformers, transformers, torch, huggingface_hub, ipywidgets")
    print("Install with: pip install PyPDF2 langchain langchain-community langchain-huggingface chromadb sentence-transformers transformers torch huggingface_hub ipywidgets")

import io
import uuid
from pathlib import Path

import PyPDF2
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

if IN_COLAB:
    from google.colab import files, drive
else:
    try:
        import ipywidgets as widgets
        from IPython.display import display
    except ImportError:
        print("ipywidgets not installed. File upload in Jupyter requires ipywidgets.")
        print("Install with: pip install ipywidgets")

def mount_google_drive():
    """Mount Google Drive in Colab."""
    if not IN_COLAB:
        return False
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully.")
        return True
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        return False

def extract_text_from_pdf_file(pdf_file, is_uploaded=False):
    """Extract text from a single PDF file (from path or uploaded bytes)."""
    try:
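        if is_uploaded:
            pdf_reader = PyPDF2.PdfReader(io.BytesIO(pdf_file))
        else:
            # Pass the path itself: a reader created inside a `with open(...)`
            # block would read pages from an already-closed file handle.
            pdf_reader = PyPDF2.PdfReader(str(pdf_file))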

        text = ""
        for page in pdf_reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
        return text
    except Exception as e:
        print(f"Error processing {pdf_file if not is_uploaded else 'uploaded file'}: {e}")
        return ""

def upload_files_colab():
    """Handle file uploads in Google Colab."""
    print("Please upload your PDF files:")
    uploaded = files.upload()
    return uploaded

def upload_files_jupyter():
    """Handle file uploads in Jupyter using ipywidgets."""
    uploader = widgets.FileUpload(accept='.pdf', multiple=True)
    display(uploader)
    print("Upload PDFs and then run the next cell to process them.")
    return uploader

def process_uploaded_files(uploaded):
    """Process uploaded files from Colab or Jupyter."""
    if IN_COLAB:
        # files.upload() returns {filename: bytes}
        items = list(uploaded.items())
    elif isinstance(uploaded.value, dict):
        # ipywidgets 7.x: {filename: {'content': bytes, ...}}
        items = [(name, data['content']) for name, data in uploaded.value.items()]
    else:
        # ipywidgets 8.x: tuple of dicts with 'name' and 'content' keys
        items = [(item['name'], bytes(item['content'])) for item in uploaded.value]

    texts = []
    for filename, file_content in items:
        if not filename.lower().endswith('.pdf'):
            print(f"Skipping {filename}: Not a PDF file")
            continue
        text = extract_text_from_pdf_file(file_content, is_uploaded=True)
        if text.strip():
            texts.append(text)
        else:
            print(f"Warning: No text extracted from {filename}")
    return texts

def extract_text_from_pdfs(source, is_directory=False):
    """Extract text from PDFs (directory or uploaded files)."""
    texts = []

    if is_directory:
        pdf_directory = Path(source)
        if not pdf_directory.exists():
            raise FileNotFoundError(f"Directory {source} does not exist.")

        for pdf_file in pdf_directory.glob("*.pdf"):
            text = extract_text_from_pdf_file(pdf_file, is_uploaded=False)
            if text.strip():
                texts.append(text)
            else:
                print(f"Warning: No text extracted from {pdf_file}")
    else:
        texts = process_uploaded_files(source)

    return texts

def create_vector_store(texts):
    """Create a vector store from extracted texts."""
    if not texts:
        raise ValueError("No valid text extracted from PDFs")

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )

    documents = text_splitter.create_documents(texts)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    vectorstore = Chroma.from_documents(documents, embeddings, persist_directory=f"/tmp/chroma_{uuid.uuid4()}")

    return vectorstore

def setup_qa_pipeline():
    """Set up the question-answering pipeline."""
    model_name = "distilbert-base-cased-distilled-squad"
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForQuestionAnswering.from_pretrained(model_name)
        qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer, device=-1)
        return qa_pipeline
    except Exception as e:
        print(f"Error setting up QA pipeline: {e}")
        raise

def answer_question(question, vectorstore, qa_pipeline):
    """Answer the question using the vector store and QA pipeline."""
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
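    # Caution: HuggingFacePipeline is designed for text-generation style pipelines.
    # An extractive "question-answering" pipeline expects separate question and
    # context inputs, which is the likely cause of the error in this cell's output.
    qa_chain = RetrievalQA.from_chain_type(
        llm=HuggingFacePipeline(pipeline=qa_pipeline),
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True
    )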

    result = qa_chain.invoke({"query": question})
    return result["result"], result["source_documents"]

def main(question):
    """Main function to process PDFs and answer a question."""
    if IN_COLAB:
        print("Choose PDF input method:")
        print("1. Upload PDF files")
        print("2. Load PDFs from Google Drive directory")
        choice = input("Enter 1 or 2: ").strip()
    else:
        print("Choose PDF input method:")
        print("1. Upload PDF files")
        print("2. Load PDFs from local directory")
        choice = input("Enter 1 or 2: ").strip()

    texts = []
    if choice == "1":
        if IN_COLAB:
            uploaded = upload_files_colab()
        else:
            uploader = upload_files_jupyter()
            print("Waiting for uploads... After uploading, run the code again or process manually.")
            return None  # Jupyter requires manual handling
        texts = extract_text_from_pdfs(uploaded, is_directory=False)
    elif choice == "2":
        if IN_COLAB:
            if mount_google_drive():
                pdf_directory = input("Enter the Google Drive directory path (e.g., /content/drive/MyDrive/pdfs): ").strip()
                try:
                    texts = extract_text_from_pdfs(pdf_directory, is_directory=True)
                except FileNotFoundError as e:
                    return str(e), []
            else:
                return "Failed to mount Google Drive.", []
        else:
            pdf_directory = input("Enter the local directory path (e.g., ./pdfs): ").strip()
            try:
                texts = extract_text_from_pdfs(pdf_directory, is_directory=True)
            except FileNotFoundError as e:
                return str(e), []
    else:
        return "Invalid choice. Please select 1 or 2.", []

    if not texts:
        return "No valid PDF content found. Please upload PDFs or specify a valid directory.", []

    # Create vector store
    vectorstore = create_vector_store(texts)

    # Setup QA pipeline
    qa_pipeline = setup_qa_pipeline()

    # Answer the question
    answer, source_documents = answer_question(question, vectorstore, qa_pipeline)

    return answer, source_documents

if __name__ == "__main__":
    question = "What is the main topic discussed in the PDFs?"

    try:
        result = main(question)
        if result is None:
            print("Please process uploaded files manually in Jupyter.")
        else:
            answer, sources = result
            print(f"Question: {question}")
            print(f"Answer: {answer}")
            print("\nSources:")
            for i, doc in enumerate(sources, 1):
                print(f"{i}. {doc.page_content[:200]}...")
    except Exception as e:
        print(f"Error: {e}")

Choose PDF input method:
1. Upload PDF files
2. Load PDFs from Google Drive directory
Enter 1 or 2: 1
Please upload your PDF files:


Saving Death Stranding Vol 2 - Hitori Nojima.pdf to Death Stranding Vol 2 - Hitori Nojima (1).pdf
Saving Death Stranding Vol 1 - Hitori Nojima.pdf to Death Stranding Vol 1 - Hitori Nojima (1).pdf


Device set to use cpu


Error: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

simple-minded beings favor ed small family units , so that
even if a breakthrough occur red, it was unlikely to be shared
with others. This isolation, more than any other factor,
seems to have led to their decline.
“Homo sapie ns, meanwhile, conceived religion, with
which large numbers of individuals could be bound together
in service to a common cause. Strength in numbers also
made their communities more resistant to famine and other
calamities. In other words, Homo sapiens grew stronger
through interpersonal connections. By creating what came
to be called ‘society .’ The meta-level law we talk about
could be referred to as ﬁction. While each Beach belongs to
an individual, what uniﬁes them all is a common ﬁction.”
After listeni ng to Heartman’s explanation intently,
Deadman turned back towar d the monitor showing th



In [None]:
!pip install langchain_huggingface


In [None]:
import os
import PyPDF2
import sys
import io
import uuid
from pathlib import Path
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma  # langchain.vectorstores is deprecated
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

# Detect if running in Google Colab
IN_COLAB = 'google.colab' in sys.modules

# Check for dependencies
try:
    if IN_COLAB:
        from google.colab import files, drive
    else:
        import ipywidgets as widgets
        from IPython.display import display
except ImportError:
    print("ipywidgets not installed. File upload in Jupyter requires ipywidgets.")
    print("Install with: pip install ipywidgets")
    sys.exit(1)

required_packages = [
    'PyPDF2', 'langchain', 'langchain_huggingface', 'chromadb',
    'sentence_transformers', 'transformers', 'torch', 'huggingface_hub'
]
missing_packages = []
for pkg in required_packages:
    try:
        __import__(pkg)
    except ImportError:
        missing_packages.append(pkg)

if missing_packages:
    print("Missing required packages:", ", ".join(missing_packages))
    print("Install with:")
    print("pip install", " ".join(missing_packages))
    if IN_COLAB:
        print("Or in Colab, run: !pip install", " ".join(missing_packages))
    sys.exit(1)


def mount_google_drive():
    """Mount Google Drive in Colab."""
    if not IN_COLAB:
        return False
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully.")
        return True
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        return False

def extract_text_from_pdf_file(pdf_file, is_uploaded=False):
    """Extract text from a single PDF file."""
    try:
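        if is_uploaded:
            pdf_reader = PyPDF2.PdfReader(io.BytesIO(pdf_file))
        else:
            # Pass the path itself: a reader created inside a `with open(...)`
            # block would read pages from an already-closed file handle.
            pdf_reader = PyPDF2.PdfReader(str(pdf_file))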

        text = ""
        for page in pdf_reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
        return text
    except Exception as e:
        print(f"Error processing {pdf_file if not is_uploaded else 'uploaded file'}: {e}")
        return ""

def upload_files_colab():
    """Handle file uploads in Google Colab."""
    if not IN_COLAB:
        raise EnvironmentError("File upload is only supported in Google Colab.")
    print("Please upload your PDF files:")
    uploaded = files.upload()
    return uploaded

def upload_files_jupyter():
    """Handle file uploads in Jupyter using ipywidgets."""
    if IN_COLAB:
        raise EnvironmentError("Jupyter upload not applicable in Colab.")
    uploader = widgets.FileUpload(accept='.pdf', multiple=True)
    display(uploader)
    print("Upload PDFs and then enter 'process' to continue.")
    return uploader

def process_uploaded_files(uploaded):
    """Process uploaded files from Colab or Jupyter."""
    if IN_COLAB:
        # files.upload() returns {filename: bytes}
        items = list(uploaded.items())
    elif isinstance(uploaded.value, dict):
        # ipywidgets 7.x: {filename: {'content': bytes, ...}}
        items = [(name, data['content']) for name, data in uploaded.value.items()]
    else:
        # ipywidgets 8.x: tuple of dicts with 'name' and 'content' keys
        items = [(item['name'], bytes(item['content'])) for item in uploaded.value]

    texts = []
    for filename, file_content in items:
        if not filename.lower().endswith('.pdf'):
            print(f"Skipping {filename}: Not a PDF file")
            continue
        text = extract_text_from_pdf_file(file_content, is_uploaded=True)
        if text.strip():
            texts.append(text)
        else:
            print(f"Warning: No text extracted from {filename}")
    return texts

def extract_text_from_pdfs(source, is_directory=False):
    """Extract text from PDFs (directory or uploaded files)."""
    texts = []

    if is_directory:
        pdf_directory = Path(source)
        if not pdf_directory.exists():
            raise FileNotFoundError(f"Directory {source} does not exist.")

        for pdf_file in pdf_directory.glob("*.pdf"):
            text = extract_text_from_pdf_file(pdf_file, is_uploaded=False)
            if text.strip():
                texts.append(text)
            else:
                print(f"Warning: No text extracted from {pdf_file}")
    else:
        texts = process_uploaded_files(source)

    return texts

def create_vector_store(texts):
    """Create a vector store from extracted texts."""
    if not texts:
        raise ValueError("No valid text extracted from PDFs")

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )

    documents = text_splitter.create_documents(texts)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    vectorstore = Chroma.from_documents(documents, embeddings, persist_directory=f"/tmp/chroma_{uuid.uuid4()}")

    return vectorstore

def setup_qa_pipeline():
    """Set up the question-answering pipeline."""
    model_name = "distilbert-base-cased-distilled-squad"
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForQuestionAnswering.from_pretrained(model_name)
        qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer, device=-1)
        return qa_pipeline
    except Exception as e:
        print(f"Error setting up QA pipeline: {e}")
        raise

def setup_summarization_pipeline():
    """Set up the summarization pipeline."""
    model_name = "facebook/bart-large-cnn"
    try:
        summarizer = pipeline("summarization", model=model_name, device=-1)
        return summarizer
    except Exception as e:
        print(f"Error setting up summarization pipeline: {e}")
        raise

def generate_summary(texts, summarizer):
    """Generate a summary of the PDF content."""
    # Combine all texts into a single string
    full_text = " ".join(texts)
    if not full_text.strip():
        return "No valid text found in the PDFs to summarize."

    # Split into chunks for summarization (BART has token limits)
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100
    )
    chunks = text_splitter.split_text(full_text)

    # Summarize each chunk and combine
    summaries = []
    for chunk in chunks:
        try:
            summary = summarizer(chunk, max_length=100, min_length=30, do_sample=False)[0]['summary_text']
            summaries.append(summary)
        except Exception as e:
            print(f"Error summarizing chunk: {e}")
            summaries.append("Failed to summarize this chunk.")

    # Combine summaries into a final summary
    final_summary = " ".join(summaries)
    if len(final_summary) > 1000:
        # Summarize the combined summaries if too long
        try:
            # truncation=True keeps the input within the model's maximum length
            final_summary = summarizer(final_summary, max_length=300, min_length=100, do_sample=False, truncation=True)[0]['summary_text']
        except Exception as e:
            print(f"Error summarizing combined summaries: {e}")
            final_summary = final_summary[:1000] + "... (truncated)"

    return final_summary

def answer_question(question, vectorstore, qa_pipeline):
    """Answer a question using the vector store and QA pipeline."""
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
    docs = retriever.invoke(question)  # get_relevant_documents() is deprecated

    context = " ".join([doc.page_content for doc in docs])
    if not context.strip():
        return "No relevant content found in the PDFs.", []

    try:
        result = qa_pipeline(question=question, context=context)
        answer = result['answer']
    except Exception as e:
        print(f"Error in QA pipeline: {e}")
        answer = "Failed to generate an answer due to pipeline error."

    return answer, docs

def main():
    """Main function to process PDFs, generate a summary, and answer questions."""
    if IN_COLAB:
        print("Choose PDF input method:")
        print("1. Upload PDF files")
        print("2. Load PDFs from Google Drive directory")

        choice = input("Enter 1, 2: ").strip()
    else:
        print("Choose PDF input method:")
        print("1. Upload PDF files (Jupyter only)")
        print("2. Load PDFs from local directory")

        choice = input("Enter 1 or 2: ").strip()

    texts = []
    if choice == "1":
        if IN_COLAB:
            uploaded = upload_files_colab()
        else:
            try:
                uploader = upload_files_jupyter()
                user_input = input("Enter 'process' to proceed: ").strip().lower()
                if user_input != 'process':
                    return "Aborted. Please rerun and enter 'process' after uploading.", []
                uploaded = uploader
            except EnvironmentError:
                return "File upload is only supported in Jupyter or Colab environments.", []
        texts = extract_text_from_pdfs(uploaded, is_directory=False)
    elif choice == "2":
        if IN_COLAB:
            if mount_google_drive():
                pdf_directory = input("Enter the Google Drive directory path (e.g., /content/drive/MyDrive/pdfs): ").strip()
                try:
                    texts = extract_text_from_pdfs(pdf_directory, is_directory=True)
                except FileNotFoundError as e:
                    return str(e), []
            else:
                return "Failed to mount Google Drive.", []
        else:
            pdf_directory = input("Enter the local directory path (e.g., ./pdfs): ").strip()
            try:
                texts = extract_text_from_pdfs(pdf_directory, is_directory=True)
            except FileNotFoundError as e:
                return str(e), []
    else:
        return "Invalid choice. Please select 1 or 2.", []

    if not texts:
        return "No valid PDF content found. Please upload PDFs or specify a valid directory.", []

    # Generate summary
    print("\nGenerating summary of content...")
    summarizer = setup_summarization_pipeline()
    summary = generate_summary(texts, summarizer)
    print("Summary:")
    print(summary)

    # Create vector store for QA
    vectorstore = create_vector_store(texts)

    # Setup QA pipeline
    qa_pipeline = setup_qa_pipeline()

    # Chat-like question loop
    print("\nAsk follow-up questions (type 'exit' to stop):")
    while True:
        question = input("You: ").strip()
        if question.lower() == 'exit':
            print("Me: Goodbye!")
            break
        if not question:
            print("Me: Please enter a valid question.")
            continue

        answer, sources = answer_question(question, vectorstore, qa_pipeline)
        print(f"Me: {answer}")
        print("\nSources:")
        for i, doc in enumerate(sources, 1):
            print(f"{i}. {doc.page_content[:200]}...")

if __name__ == "__main__":
    try:
        result = main()
        if isinstance(result, tuple):
            answer, sources = result
            print(f"Error: {answer}")
            print("\nSources:")
            for i, doc in enumerate(sources, 1):
                print(f"{i}. {doc.page_content[:200]}...")
    except Exception as e:
        print(f"Error: {e}")

Choose PDF input method:
1. Upload PDF files
2. Load PDFs from Google Drive directory
Enter 1, 2: 1
Please upload your PDF files:


Saving Cloud Deployment Models U6 SN.pdf to Cloud Deployment Models U6 SN (1).pdf
Saving Virtualization SN U6.2.pdf to Virtualization SN U6.2 (1).pdf

Generating summary of content...



Device set to use cpu
Your max_length is set to 150, but your input_length is only 50. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=25)
Your max_length is set to 150, but your input_length is only 144. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=72)


Error summarizing combined summaries: index out of range in self
Summary:
Cloud defines how cloud services are implemented and accessed. Cloud provides fou r deployment models. They are: Private Cloud, Public Cloud, Hybrid Cloud and Community Cloud. Software runs on cloud servers or external infrastructure. Data is stored o n the remote data center or service provider. VMWare Private Cloud, OpenStack (Open source), Microsoft Azure Stack, Red Hat OpenShift, IBM Cloud Private. Advantages include enhanced security and privacy. Disadvantages include high costs for setup and maintenance. VMWare Private Cloud is suitable for large enterprise, banks, healthcare, and government organizations. Public Cloud is dedicated to general public (available to public via internet) This model is maintained and operated by 3rd party cloud service providers. Examples are Amazon AWS, Google Cloud, Microsoft Azure, etc. Cost effective (Minimal investment) No capital expenses - pay-per-use pricing. Scalability
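
The `index out of range in self` message above typically appears when an input is longer than the model's maximum sequence length, which fits the final pass in `generate_summary`: it sends every chunk summary to BART in one call. One way to avoid that is to reduce the summaries in bounded pieces and repeat until a single piece remains. The sketch below assumes a word count is a good enough proxy for tokens; `reduce_summaries`, `split_by_words`, the 400-word limit, and `fake_summarizer` are illustrative assumptions, and the fake summarizer exists only so the example runs without a model.

```python
def split_by_words(text, max_words):
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def reduce_summaries(text, summarize_fn, max_words=400, max_rounds=5):
    """Re-summarize in bounded pieces until the text fits in a single piece."""
    for _ in range(max_rounds):
        pieces = split_by_words(text, max_words)
        if len(pieces) <= 1:
            break
        # Each call sees at most max_words words, so no single input is oversized
        text = " ".join(summarize_fn(piece) for piece in pieces)
    return text


def fake_summarizer(piece):
    """Stand-in for a real summarizer: keeps the first 10 words of the piece."""
    return " ".join(piece.split()[:10])


long_text = " ".join(f"word{i}" for i in range(2000))
short = reduce_summaries(long_text, fake_summarizer, max_words=400)
print(len(short.split()))
```

With the notebook's pipeline, `summarize_fn` could be a small wrapper such as `lambda p: summarizer(p, max_length=100, min_length=30, do_sample=False)[0]["summary_text"]`. The `max_rounds` cap guards against a summarizer that fails to shorten its input.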

Device set to use cpu



Ask follow-up questions (type 'exit' to stop):
You: what is virtualization
Me: Running single OS / dual booting with synchronous

Sources:
1. | Virtualization SN   | 
5 
 VIRTUAL MACHINES (VM) / GUEST OS  
• A software -based simulation of a physical computer  is called as 
virtual machine(s) . Multiple VMs can run parallelly on a single 
p...
2. complex.  
▪ No quick failover options.  
• Physical space requirements  - Data centers need large space to 
accommodate many machines.  
VIRTUALIZATION  
• Process of creating a virtual version of so...
3. reducing  and energy saving techniques . 
Virtualization Machine  
• Virtualization is actually a process. So, machines made using 
virtualization is called as virtual machines or VMs.  
• Process of ...
You: deployment model?
Me: pay per use model

Sources:
1. | Cloud Deployment Mode ls  | 
1 
  
 
CLOUD DEPLOYMENT MODELS  
• This defines how cloud services are implemented and accessed  
• Cloud provides fou r deployment models. They are 

KeyboardInterrupt: Interrupted by user
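
The answers in the session above are short fragments ("pay per use model") because `distilbert-base-cased-distilled-squad` is an extractive model: it selects a span from the retrieved context rather than composing a definition. The `question-answering` pipeline returns a `score` alongside `answer`, so one option is to abstain when the score is low. The sketch below is an assumption-laden illustration: the function name, the 0.3 threshold, and the sample result dictionaries are made up for the example and were not measured on these PDFs.

```python
def answer_or_abstain(result, min_score=0.3):
    """Return the extracted answer, or a fallback when the model's score is low."""
    if result.get("score", 0.0) < min_score:
        return "I'm not sure; the retrieved passages may not contain the answer."
    return result["answer"]


# Hypothetical pipeline results, shaped like the pipeline's output dictionary
confident = {"answer": "Private Cloud", "score": 0.82, "start": 10, "end": 23}
doubtful = {"answer": "pay per use model", "score": 0.05, "start": 40, "end": 57}

print(answer_or_abstain(confident))
print(answer_or_abstain(doubtful))
```

A suitable threshold depends on the documents and questions, so it is worth inspecting the scores on a few known questions before fixing a value.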