# Project: Summarizing with Langchain and  OpenAI

#### Summarizing using a basic prompt

In [1]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

### A) Basic prompts

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage, SystemMessage, HumanMessage

In [3]:
text ="""
Mojo combines the usability of Python with the performance of C, unlocking unparalleled 
programmability of AI hardware and extensibility of AI models.
Mojo allows you to write Python or scale all the way down to the metal. 
You can program the multitude of low-level AI hardware and no C++ or CUDA is required.

Mojo was designed by the creator of the Swift programming language to bridge the gap 
between research and production by combining Python syntax and ecosystem with systems 
programming and metaprogramming features. Mojo was just released, but the goal is for 
it to become a superset of Python over time.
"""
messages = [
SystemMessage(content='You are an expert copyrwriter with expertise in summarizing documents'),
HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT: {text}') 
]

In [4]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [5]:
llm.get_num_tokens(text)

128

In [6]:
summary_output = llm(messages)

In [7]:
print(summary_output.content)

Mojo is a programming language that combines the usability of Python with the performance of C. It allows for easy programming of AI hardware and models, without the need for C++ or CUDA. Designed by the creator of Swift, Mojo aims to bridge the gap between research and production by combining Python syntax with systems programming features. The goal is for Mojo to eventually become a superset of Python.


### B) Summarizing using prompt Templates

In [8]:
from langchain import PromptTemplate
from langchain.chains import LLMChain
template='''
Write a  concise and short summary of the following text:
TEXT: '{text}'
Translate the ummary to {language}.
'''

prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [9]:
llm.get_num_tokens(prompt.format(text=text, language='English'))

151

In [10]:
chain = LLMChain(llm=llm,prompt=prompt)
summary = chain.run({'text': text, 'language':'germany'})

In [11]:
print(summary)

Mojo kombiniert die Benutzerfreundlichkeit von Python mit der Leistung von C und ermöglicht eine beispiellose Programmierbarkeit von KI-Hardware und Erweiterbarkeit von KI-Modellen. Mit Mojo können Sie in Python schreiben oder bis auf die Hardwareebene skalieren. Sie können die Vielzahl von Low-Level-KI-Hardware programmieren, ohne dass C++ oder CUDA erforderlich sind. Mojo wurde vom Schöpfer der Swift-Programmiersprache entwickelt, um die Lücke zwischen Forschung und Produktion zu schließen, indem Python-Syntax und -Ökosystem mit Systemprogrammierung und Metaprogrammierungsmerkmalen kombiniert werden. Mojo wurde gerade veröffentlicht, aber das Ziel ist es, im Laufe der Zeit zu einer Python-Superset zu werden.


### Summarizing using stuffDocumentsChain

In [12]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [13]:
with open ('files/sj.txt', encoding='utf-8') as f:
    text = f.read()

#text

In [14]:
docs = [Document(page_content=text)]
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [15]:
template='''
Write a  concise and short summary of the following text:
TEXT: '{text}'
'''
prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [16]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)

In [17]:
output_summary = chain.run(docs)

In [18]:
print(output_summary)

The text is a speech given by Steve Jobs at a commencement ceremony. He shares three stories from his life, emphasizing the importance of connecting the dots, following one's passion, and embracing the reality of death. He encourages the graduates to trust in their own intuition, not be limited by others' opinions, and to stay hungry and foolish in their pursuit of success.


### Summarizing Large Documents Using map_reduce


In [19]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [20]:
with open ('files/sj.txt', encoding='utf-8') as f:
    text = f.read()

#text

In [21]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [22]:
llm.get_num_tokens(text)

2653

In [23]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=50)
chunks = text_splitter .create_documents([text])

In [24]:
len(chunks)

2

In [25]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)

In [26]:
output_summary = chain.run(docs)

In [27]:
print(output_summary)

Steve Jobs gives a speech to a graduating class, sharing three stories from his life. He talks about how dropping out of college and taking a calligraphy class influenced the design of the Macintosh computer. He reflects on being fired from Apple and starting new ventures. He also discusses his experience with cancer and emphasizes the importance of living each day fully. Jobs encourages the graduates to follow their passions, trust their intuition, and stay hungry and foolish in their pursuits.


In [28]:
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [29]:
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

### map_reduce with Custom Prompts

In [30]:
map_prompt='''
Write a short and concise summary of the following:
Text: {text}
CONCISE SUMMARY
'''

map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [31]:
combine_prompt= '''
Write a concise summary of the following text that covers key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPGH that gives an overview of the topic FOLLOWED 
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: '{text}'
'''
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=['text'])

In [32]:
summary_chain = load_summarize_chain(
    llm = llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)
output = summary_chain.run(chunks)

In [33]:
print(output)

Title: Steve Jobs' Speech on Life Lessons and Following Your Passion

Introduction:
In this speech, Steve Jobs shares three stories from his life, highlighting the importance of following one's passion and living life to the fullest. He discusses how dropping out of college and getting fired from Apple ultimately led him to success, and how his experience with cancer reminded him of the brevity of life.

Key Points:
- Dropping out of college led Jobs to take a calligraphy class, which later influenced the design of the Macintosh computer.
- Getting fired from Apple opened doors for Jobs to start new ventures and find success.
- Jobs emphasizes the importance of following one's passion, not settling, and living each day to the fullest.
- The text discusses the inevitability of death and encourages individuals to live their own lives, not influenced by others' opinions.
- Jobs shares a personal anecdote about The Whole Earth Catalog and the message of "Stay Hungry. Stay Foolish."
- The s

### Summary Using Refine chain

In [34]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

In [35]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [36]:
# !pip install unstructured -q
# !pip install pdf2image -q
# !pip install pikepdf -q
# !pip install pdfminer -q
# !pip install opencv-python
# !pip install "unstructured[all-docs]"

In [37]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\TWARIK\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [38]:
loader = UnstructuredPDFLoader('files/dvlottery.pdf')
data = loader.load()

  from .autonotebook import tqdm as notebook_tqdm
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\TWARIK\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!

**********************************************************************
  Resource [93mpunkt[0m not found.
  Please use the NLTK Downloader to obtain the resource:

  [31m>>> import nltk
  >>> nltk.download('punkt')
  [0m
  For more information see: https://www.nltk.org/data.html

  Attempted to load [93mtokenizers/punkt/english.pickle[0m

  Searched in:
    - 'C:\\Users\\TWARIK/nltk_data'
    - 'C:\\Users\\TWARIK\\Desktop\\LangChain2\\myenv\\nltk_data'
    - 'C:\\Users\\TWARIK\\Desktop\\LangChain2\\myenv\\share\\nltk_data'
    - 'C:\\Users\\TWARIK\\Desktop\\LangChain2\\myenv\\lib\\nltk_data'
    - 'C:\\Users\\TWARIK\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
*************************************************************

TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

NameError: name 'data' is not defined

In [6]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [7]:
chain = load_summarize_chain(
    llm = llm,
    chain_type='refine',
    verbose=False
)
output_summary = chain.run(chunks)

NameError: name 'chunks' is not defined

In [None]:
print(output_summary)

### Summary Using refine with Custom Prompts

from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
with open ('files/sj.txt', encoding='utf-8') as f:
    text = f.read()
#text

In [39]:
prompt_template = """Write a concise summary of the following extracting the key information:
Text: {text}
CONCISE SUMMARY:"""

initial_prompt = PromptTemplate(template=prompt_template, input_variables=['text'])

refine_template = '''
Your job is to produce a final summary.
I have provided an existing summary up to a certain point: {existing_answer}.
PLease refine the existing summary with some more content below.
----------
{text}
----------
Start the final Summary with an INTRODUCTION PARAGRAPGH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.

'''
refine_prompt = PromptTemplate(
    template=refine_template, 
    input_variables=['existing_answer', 'text']
)

In [40]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False
)

In [41]:
output_summary = chain.run(chunks)

In [42]:
print(output_summary)

Introduction:
In his commencement speech, the speaker shares three personal stories that highlight the importance of following one's passion, embracing failure, and recognizing the brevity of life. These stories include dropping out of college and taking a calligraphy class that later influenced the design of the Macintosh computer, getting fired from Apple and finding success through starting new companies, and facing a life-threatening illness and surviving after surgery.

Key Points:
- The speaker emphasizes the significance of following one's passion and not wasting time living someone else's life.
- He encourages the audience to have the courage to listen to their own inner voice and intuition.
- The speaker shares the story of The Whole Earth Catalog, a publication that inspired him in his youth and taught him the value of staying hungry and foolish.
- The importance of embracing failure and seeing it as an opportunity for growth and success is highlighted through the speaker's e

### Summarizing using langchain Agents

In [43]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [44]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [45]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
wikipedia = WikipediaAPIWrapper()

In [46]:
tools = [
    Tool(
        name='wikipedia',
        func=wikipedia.run,
        description='Useful for when you need to get infromation from wikipeia about a topic')
]

In [47]:
agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)

In [49]:
output = agent_executor.run('Can you please provide a short summary of George Washington?')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mI should use Wikipedia to find a short summary of George Washington.
Action: Wikipedia
Action Input: George Washington[0m
Observation: Wikipedia is not a valid tool, try one of [wikipedia].
Thought:[32;1m[1;3mI should try using Wikipedia instead.
Action: Wikipedia
Action Input: George Washington[0m
Observation: Wikipedia is not a valid tool, try one of [wikipedia].
Thought:[32;1m[1;3mI should try using Wikipedia instead.
Action: Wikipedia
Action Input: George Washington[0m
Observation: Wikipedia is not a valid tool, try one of [wikipedia].
Thought:

RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-O1VDHELu8wvPIxzLU1OlHLLA on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by adding a payment method to your account at https://platform.openai.com/account/billing.', 'type': 'requests', 'param': None, 'code': 'rate_limit_exceeded'}}

In [None]:
print(output)