## Summarization with LangChain
Summarization is a key component of many tasks involving large language models (LLMs). It enables the distillation of extensive text into a concise and meaningful set of points. Whether dealing with a brief document or an entire book, choosing the right summarization approach depends on the text's length and complexity.

Here are five approaches to summarization, ranging from beginner-friendly methods to more advanced techniques. While these aren't exhaustive, they offer a strong starting point. If you come across a unique approach that works well, consider sharing it with the broader community!

### Different Ways of Summarizing in LangChain
1. #### Summarizing a Few Sentences – Basic Prompts :
    - Use straightforward prompts to condense short passages into clear summaries. This is ideal for quick insights or simple tasks.
2. #### Summarizing a Few Paragraphs – Prompt Templates :
    - Employ structured prompt templates that guide the LLM to extract key points from multiple paragraphs. This approach is great for slightly larger texts that require thematic organization.
3. #### Summarizing a Few Pages – Map-Reduce :
    - Break the text into chunks and summarize each individually (the "map" step), then combine those summaries into a cohesive output (the "reduce" step). This method excels when summarizing medium-length documents like reports or articles.
4. #### Summarizing Chunk by Chunk – Refine Chain :
    - Build the summary in stages, processing one chunk at a time.
    - This is similar to map-reduce, with one key difference: for every chunk after the first, the summary produced so far is passed to the LLM along with the new chunk.
5. #### Summarizing Text of Unknown Length – Agents :   TBD
    - Leverage intelligent agents that dynamically adjust their approach based on the input's length and complexity. Agents can handle diverse text sizes, making them the most versatile option for summarization.

In [79]:
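import os
from dotenv import load_dotenv

# Load variables from a local .env file into the environment
load_dotenv()

# Fail early if the key is missing (assigning None to os.environ raises a TypeError)
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise RuntimeError('OPENAI_API_KEY is not set')
os.environ['OPENAI_API_KEY'] = api_key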

## Basic Prompt Summarization 
##### Used with chat models: conversational chatbots built from a system message, human messages, and AI messages

In [80]:
from langchain_openai import ChatOpenAI
from langchain.prompts.chat import (ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

In [13]:
speech="""
People across the country, involved in government, political, and social activities, are dedicating their time to make the ‘Viksit Bharat Sankalp Yatra’ (Developed India Resolution Journey) successful. Therefore, as a Member of Parliament, it was my responsibility to also contribute my time to this program. So, today, I have come here just as a Member of Parliament and your ‘sevak’, ready to participate in this program, much like you.

In our country, governments have come and gone, numerous schemes have been formulated, discussions have taken place, and big promises have been made. However, my experience and observations led me to believe that the most critical aspect that requires attention is ensuring that the government’s plans reach the intended beneficiaries without any hassles. If there is a ‘Pradhan Mantri Awas Yojana’ (Prime Minister’s housing scheme), then those who are living in jhuggis and slums should get their houses. And he should not need to make rounds of the government offices for this purpose. The government should reach him. Since you have assigned this responsibility to me, about four crore families have got their ‘pucca’ houses. However, I have encountered cases where someone is left out of the government benefits. Therefore, I have decided to tour the country again, to listen to people’s experiences with government schemes, to understand whether they received the intended benefits, and to ensure that the programs are reaching everyone as planned without paying any bribes. We will get the real picture if we visit them again. Therefore, this ‘Viksit Bharat Sankalp Yatra’ is, in a way, my own examination. I want to hear from you and the people across the country whether what I envisioned and the work I have been doing aligns with reality and whether it has reached those for whom it was meant.

It is crucial to check whether the work that was supposed to happen has indeed taken place. I recently met some individuals who utilized the Ayushman card to get treatment for serious illnesses. One person met with a severe accident, and after using the card, he could afford the necessary operation, and now he is recovering well. When I asked him, he said: “How could I afford this treatment? Now that there is the Ayushman card, I mustered courage and underwent an operation. Now I am perfectly fine.”  Such stories are blessings to me.

The bureaucrats, who prepare good schemes, expedite the paperwork and even allocate funds, also feel satisfied that 50 or 100 people who were supposed to get the funds have got it. The funds meant for a thousand villages have been released. But their job satisfaction peaks when they hear that their work has directly impacted someone’s life positively. When they see the tangible results of their efforts, their enthusiasm multiplies. They feel satisfied. Therefore, ‘Viksit Bharat Sankalp Yatra’ has had a positive impact on government officers. It has made them more enthusiastic about their work, especially when they witness the tangible benefits reaching the people. Officers now feel satisfied with their work, saying, “I made a good plan, I created a file, and the intended beneficiaries received the benefits.” When they find that the money has reached a poor widow under the Jeevan Jyoti scheme and it was a great help to her during her crisis, they realise that they have done a good job. When a government officer listens to such stories, he feels very satisfied.

There are very few who understand the power and impact of the ‘Viksit Bharat Sankalp Yatra’. When I hear people connected to bureaucratic circles talking about it, expressing their satisfaction, it resonates with me. I’ve heard stories where someone suddenly received 2 lakh rupees after the death of her husband, and a sister mentioned how the arrival of gas in her home transformed her lives. The most significant aspect is when someone says that the line between rich and poor has vanished. While the slogan ‘Garibi Hatao’ (Remove Poverty) is one thing, but the real change happens when a person says, “As soon as the gas stove came to my house, the distinction between poverty and affluence disappeared.
"""

In [14]:
llm=ChatOpenAI(model_name='gpt-3.5-turbo')

# Create the ChatPromptTemplate using individual message templates
chat_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template("You are an expert assistant with expertise in summarizing speeches."),
        HumanMessagePromptTemplate.from_template("Please provide a short and concise summary of the following speech:\nTEXT: {speech}"),
    ]
)

In [15]:
##total tokens
llm.get_num_tokens(speech)


866

In [16]:
## Get summary
formatted_prompt = chat_prompt.format_prompt(speech=speech)

# Get the response (invoke replaces the deprecated direct call llm(...))
response = llm.invoke(formatted_prompt.to_messages())
print(response.content)


The speech highlights the importance of ensuring government schemes reach their intended beneficiaries smoothly without any hassles. The speaker, a Member of Parliament, discusses the success of the 'Viksit Bharat Sankalp Yatra' in reaching people and providing tangible benefits. The yatra has positively impacted government officers, making them more enthusiastic about their work when they see the direct impact on people's lives. The speaker emphasizes the satisfaction derived from witnessing successful implementation of schemes, such as the Ayushman card and housing programs, and how it bridges the gap between rich and poor. The yatra serves as an examination for the speaker to ensure alignment between their vision and the reality of reaching those in need.


## Prompt template text summarization
Prompt templates offer a flexible way to incorporate text dynamically into your prompts. Similar to Python's f-strings, they are specifically designed to facilitate interaction with language models.
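To make the f-string comparison concrete, here is a minimal plain-Python sketch of what a prompt template does: it declares named placeholders and fills them in at call time. This is only an analogue built on `str.format`, not LangChain's implementation; the helper names `template_variables` and `render` are hypothetical and exist only for this illustration.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return the sorted placeholder names found in a format-style template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


def render(template: str, **variables: str) -> str:
    """Fill the template, failing loudly if a placeholder has no value."""
    missing = [name for name in template_variables(template) if name not in variables]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return template.format(**variables)


# Hypothetical template, shaped like the one used in the cells below
example_template = (
    "Write a summary of the following speech:\n"
    "Speech: `{speech}`\n"
    "Translate the summary to {language}."
)

print(template_variables(example_template))
print(render(example_template, speech="A short speech.", language="Hindi"))
```

The useful property over a bare f-string is that the template is a reusable value: it can be defined once, inspected for the variables it expects, and filled with different inputs later.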

In [17]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

In [18]:
generic_template='''
Write a summary of the following speech:
Speech : `{speech}`
Translate the precise summary to {language}.

'''
prompt=PromptTemplate(
    input_variables=['speech','language'],
    template=generic_template
)

In [19]:
complete_prompt=prompt.format(speech=speech,language='Hindi')
complete_prompt

'\nWrite a summary of the following speech:\nSpeech : `\nPeople across the country, involved in government, political, and social activities, are dedicating their time to make the ‘Viksit Bharat Sankalp Yatra’ (Developed India Resolution Journey) successful. Therefore, as a Member of Parliament, it was my responsibility to also contribute my time to this program. So, today, I have come here just as a Member of Parliament and your ‘sevak’, ready to participate in this program, much like you.\n\nIn our country, governments have come and gone, numerous schemes have been formulated, discussions have taken place, and big promises have been made. However, my experience and observations led me to believe that the most critical aspect that requires attention is ensuring that the government’s plans reach the intended beneficiaries without any hassles. If there is a ‘Pradhan Mantri Awas Yojana’ (Prime Minister’s housing scheme), then those who are living in jhuggis and slums should get their hou

In [20]:
llm_chain=LLMChain(llm=llm,prompt=prompt)
summary=llm_chain.invoke({'speech':speech,'language':'hindi'})


In [21]:
summary['text']

"भारत विकसित संकल्प यात्रा को सफल बनाने के लिए लोग राष्ट्रभर में सरकार, राजनीतिक और सामाजिक गतिविधियों में लगे हुए हैं। संसद के सदस्य के नाते, मेरी जिम्मेदारी भी इस कार्यक्रम में समय योगदान करना था। इसलिए, आज मैं एक सांसद और आपका 'सेवक' के रूप में यहाँ आया हूं, इस कार्यक्रम में आपके साथ हूं। मेरे अनुभव और अवलोकन से मुझे लगता है कि सरकार की योजनाएं सही लोगों तक बिना किसी परेशानी के पहुंचना सबसे महत्वपूर्ण है। मैंने देखा है कि चार करोड़ परिवारों को उनके 'पुक्का' मकान मिल गए हैं। लेकिन कुछ लोग सरकारी लाभों से बाहर रह गए हैं। इसलिए, मैंने देश भर में फिर से यात्रा करने का निर्णय लिया है, लोगों के साथ सरकारी योजनाओं के अनुभव सुनने के लिए, समझने के लिए कि क्या उन्हें इच्छित लाभ मिला है और यह सुनिश्चित करने के लिए कि कार्यक्रम किसी भी रिश्वत के बिना सभी तक पहुँच रहा है।\n\nजब आपको लोगों के अनुभव सुनने को मिलता है, तो आपको किसी नए उत्साह की जरुरत नहीं पड़ती। अधिकारी अच्छी योजनाएं तैयार करते हैं, कागजात तेजी से प्रस्तुत करते हैं और धन का आवंटन भी करते हैं। मगर जब उन्हें यह पता चलता है कि उनका का

## Stuff Document Chain Summarization

When a document is small enough to fit into the LLM's context window, we can use the stuff documents chain, which "stuffs" (passes) the entire document into the LLM along with the prompt.
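A rough way to decide whether "stuffing" is viable is to compare an estimated token count against the model's context window, leaving room for the prompt and the generated summary. The sketch below is only an approximation under stated assumptions: the four-characters-per-token ratio is a rule of thumb for English text, and the window size and reserved budgets are hypothetical placeholders, not values taken from any particular model. In the notebook itself, `llm.get_num_tokens(text)` gives a more accurate count.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; the 4 chars/token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    text: str,
    context_window: int,
    reserved_for_output: int = 500,  # hypothetical budget for the summary
    prompt_overhead: int = 50,       # hypothetical budget for the instructions
) -> bool:
    """Return True if the text likely fits in one call alongside the prompt."""
    budget = context_window - reserved_for_output - prompt_overhead
    return estimate_tokens(text) <= budget


# Hypothetical 4096-token window: a short text fits, a long one does not
print(fits_in_context("word " * 500, context_window=4096))
print(fits_in_context("word " * 10000, context_window=4096))
```

When the check fails, the chunk-based chains described later in this notebook are the usual fallback.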

In [22]:
from PyPDF2 import PdfReader
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [23]:
# provide the path of  pdf file/files.
pdfreader = PdfReader('apjspeech.pdf')

In [24]:
# read text from pdf, skipping pages with no extractable text
text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        text += content

In [26]:
template = '''Write a concise and short summary of the following speech.
Speech: `{text}`
'''
prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [31]:
docs = [Document(page_content=text)]
docs

[Document(metadata={}, page_content=' Cisco Conﬁden+al Distinguished Guests Ladies and Gentlemen. It is both an honour and a pleasure for me to be in your midst in the historic capital city of Sofia. Bulgaria is an ancient land, with a long history and a glorious cultural heritage. We know of its literary, philosophical and cultural tradition. The statue of Sofia, in the Centre of this beautiful city, holding the symbols of fame and wisdom in her hands and wearing the crown of the Goddess of fate truly characterizes the aspirations of all mankind. India and Bulgaria have a long-standing friendship. Both have ancient civilizations and our histories have involved contacts with numerous other societies. We have been able to benefit from this by assimilating in our cultures various facets of other civilizations. This has enabled the development of India\'s and Bulgaria\'s multi-culturalism drawn from different religious, ethnic and linguistic groups. This is an abiding gift from our previo

In [32]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
output_summary = chain.invoke(docs)
output_summary

{'input_documents': [Document(metadata={}, page_content=' Cisco Conﬁden+al Distinguished Guests Ladies and Gentlemen. It is both an honour and a pleasure for me to be in your midst in the historic capital city of Sofia. Bulgaria is an ancient land, with a long history and a glorious cultural heritage. We know of its literary, philosophical and cultural tradition. The statue of Sofia, in the Centre of this beautiful city, holding the symbols of fame and wisdom in her hands and wearing the crown of the Goddess of fate truly characterizes the aspirations of all mankind. India and Bulgaria have a long-standing friendship. Both have ancient civilizations and our histories have involved contacts with numerous other societies. We have been able to benefit from this by assimilating in our cultures various facets of other civilizations. This has enabled the development of India\'s and Bulgaria\'s multi-culturalism drawn from different religious, ethnic and linguistic groups. This is an abiding 

In [35]:
output_summary['output_text']

'The speech highlights the long-standing friendship between India and Bulgaria, emphasizing the need for enhanced cooperation in various sectors such as science, technology, trade, and economics. The speaker also addresses the importance of democratic systems, economic reforms, and the fight against terrorism on a global scale. Cultural ties between the two countries are acknowledged, with a focus on mutual exchange and collaboration in areas such as literature, music, and sports. The speech concludes by expressing optimism for the future of the Indo-Bulgarian partnership and gratitude for the warm hospitality extended by Bulgaria.'

## Map reduce document chain

#### Summarizing large documents that do not fit in the LLM's context window, where the stuff documents chain cannot be used

When working with multiple pages of text for summarization, you might encounter token limit constraints. While token limits aren't always an issue, it's important to understand how to manage them effectively when they arise.

One useful approach is the "Map Reduce" chain type. This method involves dividing the text into smaller, manageable chunks that stay within the token limit. Each chunk is summarized individually, and then a final summary is created by combining these smaller summaries. This ensures efficient processing without exceeding token limits.
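The control flow described above can be sketched in plain Python, independent of LangChain. This is a minimal illustration, not the library's implementation: `split_into_chunks` is a naive fixed-width splitter standing in for a real text splitter, and `summarize` is any callable that maps text to a shorter text (in practice an LLM call; here a hypothetical stand-in so the example runs offline).

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Naive fixed-width splitter with optional character overlap."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],
    chunk_size: int = 4000,
    overlap: int = 20,
) -> str:
    """Summarize each chunk independently, then summarize the summaries."""
    chunks = split_into_chunks(text, chunk_size, overlap)
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    return summarize("\n".join(partial_summaries))              # reduce step


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    return text.strip().split(".")[0].strip() + "."


sample = "India and Bulgaria are friends. They trade. " * 50
print(map_reduce_summarize(sample, first_sentence, chunk_size=200, overlap=0))
```

Because the map step treats every chunk independently, those calls can run in parallel; the trade-off is that each chunk is summarized without knowledge of the others.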

In [37]:
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter  ## to split the large document into chunks


In [38]:
# provide the path of pdf file/files.
pdfreader = PdfReader('apjSpeech.pdf')
# read text from pdf, skipping pages with no extractable text
text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        text += content

In [40]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [43]:
## Splitting the text
text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=20)
chunks = text_splitter.create_documents([text])
len(chunks)

2

In [46]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
summary = chain.invoke(chunks)


In [48]:
summary['output_text']


'The speaker expresses gratitude for being in Sofia, Bulgaria and emphasizes the long-standing friendship between India and Bulgaria. They discuss the importance of cooperation in politics, trade, science, and technology, as well as the need for joint projects and economic reforms to enhance bilateral relations. The speaker also highlights the importance of education and human capital in strengthening cooperation between the two countries. They stress the importance of global cooperation in addressing poverty, unemployment, and disease, and the need for an equitable world order in international affairs. India and Bulgaria share common concerns about terrorism and have a history of cultural exchange, with both countries looking to expand cooperation in areas such as pharmaceuticals, food processing, and power. The relationship between the two countries is strong and based on mutual respect and friendship.'

This summary is a good starting point, but I personally prefer information presented in bullet points. To achieve this, I’ll use tailored prompts (similar to the ones used earlier) to guide the model to generate the output in the desired format.

The map_prompt will remain unchanged (I’m including it here for reference), but I’ll modify the combine_prompt to align with my preference.

## Map Reduce With Custom Prompts
- a separate prompt applied to each individual chunk (the map step)
- a final prompt applied to the combined chunk summaries (the combine step)

In [49]:
chunks_prompt="""
Please summarize the below speech:
Speech: `{text}`
Summary:
"""
map_prompt_template=PromptTemplate(input_variables=['text'],
                                    template=chunks_prompt)

In [50]:
final_combine_prompt='''
Provide a final summary of the entire speech with these important points.
Add a Generic Motivational Title,
Start the precise summary with an introduction and provide the
summary in numbered points for the speech.
Speech: `{text}`
'''
final_combine_prompt_template=PromptTemplate(input_variables=['text'],
                                             template=final_combine_prompt)

In [51]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=final_combine_prompt_template,
    verbose=False
)
# invoke replaces the deprecated run(); the summary is under 'output_text'
output = summary_chain.invoke(chunks)['output_text']

In [66]:
print(output)

Title: "Building Bridges: Strengthening India-Bulgaria Relations"

Introduction:
In a speech highlighting the longstanding friendship between India and Bulgaria, the speaker emphasizes the shared cultural heritage and history of cooperation between the two nations. They discuss the importance of identifying core competencies and collaborating on joint projects in various fields, as well as the economic reforms and successes of both countries.

Summary:
1. Emphasis on the importance of education and human capital in strengthening cooperation.
2. Call for increased trade and business between India and Bulgaria.
3. Stress on global cooperation to address poverty, unemployment, and disease.
4. Advocacy for an equitable world order in international affairs.
5. Focus on combating terrorism and promoting peace and security through international collaboration.
6. Highlighting close cultural ties between India and Bulgaria, including mutual influences in literature and sports.
7. Suggestions fo

## Refine chain
- similar to map-reduce, with one difference
- each chunk is passed to the LLM together with the summary produced from the previous chunks
- e.g. input for chunk 2 = chunk 2 + summary of chunk 1
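The sequential flow in the list above can be sketched in plain Python as a simple fold over the chunks. This is a minimal illustration rather than LangChain's implementation: `initial` and `refine` are hypothetical callables standing in for the two LLM prompts (one that summarizes the first chunk, one that updates an existing summary with a new chunk).

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    initial: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> str:
    """Summarize the first chunk, then update that summary chunk by chunk."""
    if not chunks:
        return ""
    summary = initial(chunks[0])
    for chunk in chunks[1:]:
        # Each step sees the running summary plus exactly one new chunk
        summary = refine(summary, chunk)
    return summary


# Hypothetical stand-ins for LLM calls so the sketch runs offline
def keep_first_words(chunk: str) -> str:
    return " ".join(chunk.split()[:3])


def append_first_word(summary: str, chunk: str) -> str:
    return f"{summary} | {chunk.split()[0]}"


print(refine_summarize(
    ["India and Bulgaria share history", "Trade should grow", "Terrorism is a threat"],
    keep_first_words,
    append_first_word,
))
```

Unlike map-reduce, each call depends on the previous one, so the steps cannot run in parallel; in exchange, later chunks are summarized with the context accumulated so far.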

In [53]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=True
)
# invoke replaces the deprecated run(); the summary is under 'output_text'
output_summary = chain.invoke(chunks)['output_text']



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Cisco Conﬁden+al Distinguished Guests Ladies and Gentlemen. It is both an honour and a pleasure for me to be in your midst in the historic capital city of Sofia. Bulgaria is an ancient land, with a long history and a glorious cultural heritage. We know of its literary, philosophical and cultural tradition. The statue of Sofia, in the Centre of this beautiful city, holding the symbols of fame and wisdom in her hands and wearing the crown of the Goddess of fate truly characterizes the aspirations of all mankind. India and Bulgaria have a long-standing friendship. Both have ancient civilizations and our histories have involved contacts with numerous other societies. We have been able to benefit from this by assimilating in our cultures various facets of other civilizations. This has enabled the development of I

In [54]:
output_summary

'The speaker expresses gratitude for being in Sofia, Bulgaria and highlights the long-standing friendship between India and Bulgaria. They discuss the importance of strengthening bilateral cooperation in various areas such as science, technology, and trade, as well as the shared values of democracy and economic reforms in both countries. The speaker expresses optimism for the future of the Indo-Bulgarian partnership and the potential for increased economic and commercial ties. Additionally, the speaker stresses the importance of addressing global issues such as poverty, unemployment, and disease through international cooperation. They also emphasize the need for concerted action to combat international terrorism, which poses a serious threat to global peace and security. The cultural ties between the two countries are highlighted, with mutual influences in literature, music, and sports. The speaker looks forward to further collaboration in areas such as pharmaceuticals, food processing