## Document Chains Demo
Document Chains allow you to process and analyze large amounts of text data efficiently. They provide a structured approach to working with documents, enabling you to retrieve, filter, refine, and rank them based on specific criteria.

Each chain type (**Stuff, Refine, Map Reduce, or Map Re-rank**) combines the retrieved documents in a different way, trading off the number of LLM calls against how much text the model sees at once.

In [None]:
import os
import getpass
import textwrap

from langchain_google_genai import GoogleGenerativeAI
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
# We will cover docstores and splitters in more detail when we get to retrieval
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

from dotenv import load_dotenv

In [None]:
load_dotenv()

In [3]:
model = GoogleGenerativeAI(
  model="gemini-1.5-pro-latest",
  temperature=0.5,
  google_api_key=os.getenv("GOOGLE_API_KEY"), 
)

### Stuff Chain
The Stuff approach puts all the relevant documents into a single prompt, which LangChain's StuffDocumentsChain then sends to the LLM. It needs only one LLM call and the model sees all the information at once, but it works only when the combined text fits in the model's context window.
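To make the idea concrete, here is a minimal sketch of "stuffing" in plain Python, with no LangChain involved. The helper name `stuff_documents`, the sample pages, and the template are all made up for illustration; the real chain also handles document metadata and prompt formatting that this sketch ignores.

```python
def stuff_documents(pages, template, separator="\n\n"):
    """Join every page into one string and fill the template's {text} slot."""
    combined = separator.join(page.strip() for page in pages)
    return template.format(text=combined)


# Hypothetical stand-in pages; in the notebook these come from the PDF loader.
pages = ["Skills: Python, SQL", "Experience: 5 years as a data engineer"]
template = "Resume:\n-----\n{text}\n-----\nSummarize the key skills."

# One prompt containing every page, ready for a single LLM call.
stuffed_prompt = stuff_documents(pages, template)
print(stuffed_prompt)
```

Because everything ends up in one prompt, the length of `stuffed_prompt` grows with the number of pages, which is why this approach suits short documents.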

In [4]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("../pdf/CV (1).pdf")
docs = loader.load()

In [None]:
# enumerate() numbers the pages starting from 1
for cnt, doc in enumerate(docs, start=1):
  print("---- Document #", cnt)
  print(doc.page_content.strip())

In [6]:
prompt_template = """
You are given a resume as the text below.
-----
{text}
-----
Question: Please respond with the Key Skills and Experience summary of the person. 
Key Skills:
Experience Summary: 
"""

In [None]:
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

stuff_chain = load_summarize_chain(model, chain_type="stuff", prompt=prompt)
print(stuff_chain.llm_chain.prompt.template)

output_summary = stuff_chain.run(docs)

In [None]:
print(output_summary)

### Refine Chain

The Refine Documents Chain uses an iterative process to generate a response by analyzing each input document and updating its answer accordingly.

It passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to obtain a new answer for each document.

This chain is ideal for tasks that involve analyzing more documents than can fit in the model's context window, as it passes only a single document to the LLM at a time.
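The iterative process described above can be sketched in plain Python. This is only an illustration under stated assumptions: `refine_documents` and `fake_llm` are hypothetical names, the prompt wording is invented, and `fake_llm` is a stub that stands in for a real model call.

```python
def refine_documents(pages, llm, question):
    """Fold pages into a running answer, making one LLM call per page."""
    answer = ""
    for page in pages:
        if not answer:
            # First page: there is no earlier answer to refine yet.
            prompt = f"{question}\n\nText:\n{page}"
        else:
            # Later pages: send the latest answer plus the new page only.
            prompt = (
                f"{question}\n\nExisting answer:\n{answer}\n\n"
                f"Refine it using this new text:\n{page}"
            )
        answer = llm(prompt)
    return answer


calls = []  # records every prompt sent to the stub


def fake_llm(prompt):
    """Hypothetical stub that stands in for a real model call."""
    calls.append(prompt)
    return f"summary v{len(calls)}"


pages = ["page one", "page two", "page three"]
final_answer = refine_documents(pages, fake_llm, "Summarize the resume.")
print(final_answer)
```

Each prompt holds one page and one intermediate answer, so prompt size stays roughly constant, at the cost of one sequential LLM call per page.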

In [None]:
refine_chain = load_summarize_chain(model, chain_type="refine")
print(refine_chain.refine_llm_chain.prompt.template)

In [None]:
output_summary = refine_chain.run(docs)
output_summary

### Map-Reduce Chain

MapReduceDocumentsChain is designed to process large amounts of data efficiently.

It applies an LLM chain to each document individually (the Map step), producing a new document for each one. The new documents can then be compressed before they are passed to the combine documents chain (the Reduce step).

This compression step is performed recursively, as many times as needed.
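A rough sketch of the map, compress, and combine steps in plain Python follows. It is not LangChain's implementation: `map_reduce_documents`, `fake_llm`, the pairwise grouping, and the character budget `max_chars` are assumptions chosen to keep the example small, and the loop simply repeats the compression until the summaries fit the budget.

```python
def map_reduce_documents(pages, llm, max_chars=200):
    """Summarize each page, shrink the summaries, then combine them."""
    # Map step: one independent LLM call per page.
    summaries = [llm(f"Summarize:\n{page}") for page in pages]

    # Compression step: merge summaries in pairs until they fit the budget.
    while len(summaries) > 1 and sum(len(s) for s in summaries) > max_chars:
        groups = [summaries[i:i + 2] for i in range(0, len(summaries), 2)]
        summaries = [llm("Combine:\n" + "\n".join(group)) for group in groups]

    # Reduce step: a final call over whatever summaries remain.
    return llm("Write a final summary of:\n" + "\n".join(summaries))


calls = []  # records every prompt sent to the stub


def fake_llm(prompt):
    """Hypothetical stub that stands in for a real model call."""
    calls.append(prompt)
    return f"s{len(calls)}"


pages = ["page one", "page two", "page three", "page four"]
result = map_reduce_documents(pages, fake_llm, max_chars=5)
print(result, len(calls))
```

The map calls do not depend on each other, so a real implementation can run them in parallel, which is not possible with the sequential Refine approach.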

In [12]:
map_reduce_chain = load_summarize_chain(model, chain_type="map_reduce", verbose=True)

In [None]:
print(map_reduce_chain.llm_chain.prompt.template)

In [None]:
# Use only the first 20 documents to keep the run short.
output_summary = map_reduce_chain.run(docs[:20])
print(output_summary)