In [1]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

# Project: Summarization

## Summarizing Using a Basic Prompt

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [3]:
text = """
In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map values to other values, rather than a sequence of imperative statements which update the running state of the program.
In functional programming, functions are treated as first-class citizens, meaning that they can be bound to names (including local identifiers), passed as arguments, and returned from other functions, just as any other data type can. This allows programs to be written in a declarative and composable style, where small functions are combined in a modular manner.
Functional programming is sometimes treated as synonymous with purely functional programming, a subset of functional programming which treats all functions as deterministic mathematical functions, or pure functions. When a pure function is called with some given arguments, it will always return the same result, and cannot be affected by any mutable state or other side effects. This is in contrast with impure procedures, common in imperative programming, which can have side effects (such as modifying the program's state or taking input from a user). Proponents of purely functional programming claim that by restricting side effects, programs can have fewer bugs, be easier to debug and test, and be more suited to formal verification.[1][2]
Functional programming has its roots in academia, evolving from the lambda calculus, a formal system of computation based only on functions. Functional programming has historically been less popular than imperative programming, but many functional languages are seeing use today in industry and education, including Common Lisp, Scheme,[3][4][5][6] Clojure, Wolfram Language,[7][8] Racket,[9] Erlang,[10][11][12] Elixir,[13] OCaml,[14][15] Haskell,[16][17] and F#.[18][19] Functional programming is also key to some languages that have found success in specific domains, like JavaScript in the Web,[20] R in statistics,[21][22] J, K and Q in financial analysis, and XQuery/XSLT for XML.[23][24] Domain-specific declarative languages like SQL and Lex/Yacc use some elements of functional programming, such as not allowing mutable values.[25] In addition, many other programming languages support programming in a functional style or have implemented features from functional programming, such as C++11, C#,[26] Kotlin,[27] Perl,[28] PHP,[29] Python,[30] Go,[31] Rust,[32] Raku,[33] Scala,[34] and Java (since Java 8).[35]
"""

messages = [
    SystemMessage(content="You are an expert copywriter with expertise in summarizing documents"),
    HumanMessage(content=f"Please provide a short and concise summary of the following text: \n TEXT: {text}")
]

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [4]:
llm.get_num_tokens(text)

530

In [5]:
summary_output = llm(messages)

In [6]:
print(summary_output.content)

Functional programming is a programming paradigm that focuses on constructing programs using functions. It is a declarative style of programming where functions are treated as first-class citizens, meaning they can be assigned to names, passed as arguments, and returned from other functions. Purely functional programming is a subset of functional programming that treats all functions as deterministic mathematical functions, without any side effects. Functional programming has its roots in academia and has historically been less popular than imperative programming, but many functional languages are now used in industry and education. It is also key to languages used in specific domains, such as JavaScript in web development and R in statistics. Many other programming languages also support functional programming or have implemented features from it.


## Summarizing Using Prompt Templates

In [7]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

In [8]:
template = """
Write a concise and short summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
"""

prompt = PromptTemplate(
    input_variables=["text", "language"],
    template=template
)

In [9]:
llm.get_num_tokens(prompt.format(text=text, language="English"))

551

In [10]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.run({"text": text, "language": "portuguese"})
print(summary)

A programação funcional é um paradigma de programação em que os programas são construídos aplicando e compondo funções. Nesse paradigma, as funções são tratadas como cidadãos de primeira classe, o que permite que sejam atribuídas a nomes, passadas como argumentos e retornadas de outras funções. A programação funcional é sinônimo de programação funcional pura, que trata todas as funções como funções matemáticas determinísticas e puras. Isso significa que uma função pura sempre retornará o mesmo resultado quando chamada com os mesmos argumentos e não será afetada por nenhum estado mutável ou efeito colateral. A programação funcional tem suas raízes na academia e tem sido cada vez mais utilizada na indústria e na educação. Muitas linguagens funcionais são usadas atualmente, como Common Lisp, Scheme, Clojure, Wolfram Language, Racket, Erlang, Elixir, OCaml, Haskell e F#. Além disso, muitas outras linguagens de programação suportam a programação em estilo funcional ou implementaram recursos
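
The template above is filled much like an ordinary Python format string: each `{name}` placeholder is replaced by the matching keyword argument. A minimal sketch of that substitution using only `str.format`, not LangChain itself (the `render_prompt` helper and the sample text are hypothetical):

```python
# Stand-in for prompt formatting: plain str.format substitution.
TEMPLATE = """
Write a concise and short summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
"""


def render_prompt(template: str, **variables: str) -> str:
    """Fill every {placeholder}; raises KeyError if a variable is missing."""
    return template.format(**variables)


rendered = render_prompt(
    TEMPLATE,
    text="Functions are first-class values.",
    language="Portuguese",
)
print(rendered)
```

Counting the tokens of the rendered string, rather than of `text` alone, gives the size of what is actually sent to the model.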

## Summarizing Using StuffDocumentsChain

In [11]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [17]:
def chunk_data(data, chunk_size=256):
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=0)
    chunks = text_splitter.split_documents(data)
    return chunks

with open("../files/mysticism_and_logic.txt") as f:
    text = f.read()
        
docs = chunk_data([Document(page_content=text)], chunk_size=512)[:3]
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [18]:
type(docs[0])

langchain.schema.document.Document

In [19]:
prompt_template = '''
Write a concise and short summary of the following text.
TEXT: `{text}`
'''

prompt = PromptTemplate(
    input_variables=["text"],
    template=prompt_template
)

In [20]:
chain = load_summarize_chain(
    llm,
    chain_type="stuff",
    prompt = prompt,
    verbose=True
)

output_summary = chain.run({"input_documents": docs})



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise and short summary of the following text.
TEXT: `Mysticism and Logic

Metaphysics, or the attempt to conceive the world as a whole by means of thought, has been developed, from the first, by the union and conflict of two very different human impulses, the one urging men towards mysticism, the other urging them towards science. Some men have achieved greatness through one of these impulses alone, others through the other alone: in Hume, for example, the scientific impulse reigns quite unchecked, while in Blake a strong hostility to science co-exists with profound mystic

insight. But the greatest men who have been philosophers have felt the need both of science and of mysticism: the attempt to harmonise the two was what made their life, and what always must, for all its arduous uncertainty, make philosophy, to some minds, a greater thing 
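
The `stuff` strategy places every document into a single prompt and makes one model call, which is why the formatted prompt above shows the chunks back to back. A rough sketch of the idea in plain Python, assuming a blank-line separator between documents and using a hypothetical `fake_llm` stand-in instead of a real model:

```python
from typing import Callable, List

PROMPT = "Write a concise and short summary of the following text.\nTEXT: `{text}`"


def stuff_summarize(
    docs: List[str], llm: Callable[[str], str], separator: str = "\n\n"
) -> str:
    """Join all documents into one prompt and call the model exactly once."""
    combined = separator.join(docs)
    return llm(PROMPT.format(text=combined))


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in: records the prompt and returns a placeholder.
    calls.append(prompt)
    return f"summary of {len(prompt)} characters"


result = stuff_summarize(["first chunk", "second chunk", "third chunk"], fake_llm)
print(len(calls), result)
```

Because everything goes into one call, this approach only works while the combined text fits in the model's context window.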

## Summarizing Large Documents Using map_reduce

In [21]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

In [31]:
def chunk_data(data, chunk_size=256, chunk_overlap=0):
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks = text_splitter.split_documents(data)
    return chunks

with open("../files/mysticism_and_logic.txt") as f:
    text = f.read()
        
docs = chunk_data([Document(page_content=text)], chunk_size=10000, chunk_overlap=50)
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [32]:
print(llm.get_num_tokens(text))
print(len(docs))

11438
6
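
The chunk count depends on `chunk_size` and `chunk_overlap`, both measured in characters by default. The sketch below is a deliberately naive fixed-window splitter, not the separator-aware algorithm that `RecursiveCharacterTextSplitter` uses; it only illustrates how size and overlap interact (the `naive_chunks` name is hypothetical):

```python
from typing import List


def naive_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Slice text into windows of chunk_size characters.

    Each window starts chunk_overlap characters before the previous one ends.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = naive_chunks(sample, chunk_size=10, chunk_overlap=2)
print(pieces)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

A real splitter prefers to break at paragraph and sentence boundaries, so its chunks are usually shorter than `chunk_size` and the count is higher than a simple division would suggest.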


In [33]:
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    verbose=False
)
output_summary = chain.run(docs)

In [34]:
print(output_summary)

The essay explores the blending of mysticism and science in the philosophies of Heraclitus and Plato, arguing that the harmonization of these two impulses makes philosophy a greater pursuit than either science or religion. It discusses the relationship between philosophy and science in Plato's teachings, critiques Bergson's advocacy of intuition over intellect, and argues against the practical importance of philosophy for animals and most people. The author criticizes the philosophy of evolution for its focus on progress and ethical dualism, and suggests that ethical considerations should be eliminated from philosophy. The essay concludes by advocating for a scientific philosophy that is humble, piecemeal, and capable of accepting the world without imposing human demands.


In [36]:
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [38]:
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
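
`map_reduce` summarizes each chunk independently (the map step) and then summarizes the joined partial summaries (the reduce step); the two templates printed above are the default prompts for those two steps. A minimal sketch of that flow in plain Python, with a hypothetical `fake_llm` in place of the model. It leaves out the extra collapsing pass that LangChain can run when the partial summaries are themselves too long:

```python
from typing import Callable, List

MAP_PROMPT = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
COMBINE_PROMPT = MAP_PROMPT  # the two defaults printed above are identical


def map_reduce_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    # Map: one independent call per chunk.
    partial = [llm(MAP_PROMPT.format(text=chunk)) for chunk in chunks]
    # Reduce: one final call over the joined partial summaries.
    return llm(COMBINE_PROMPT.format(text="\n\n".join(partial)))


prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in that numbers each call so the flow is visible.
    prompts_seen.append(prompt)
    return f"<summary {len(prompts_seen)}>"


final = map_reduce_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
print(final, len(prompts_seen))
```

For n chunks this makes n + 1 calls. The map calls do not depend on each other, so they could run in parallel.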

## map_reduce with Custom Prompts

In [39]:
map_prompt = """
Write a short and concise summary of the following:
Text: `{text}`
CONCISE SUMMARY: 
"""

map_prompt_template = PromptTemplate(
    input_variables=["text"],
    template=map_prompt
)

In [40]:
combine_prompt = """
Write a concise summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: `{text}`
""" 

combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])

In [41]:
summary_chain = load_summarize_chain(
    llm = llm,
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=True
)

output = summary_chain.run(docs)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a short and concise summary of the following:
Text: `Mysticism and Logic
Metaphysics, or the attempt to conceive the world as a whole by means of thought, has been developed, from the first, by the union and conflict of two very different human impulses, the one urging men towards mysticism, the other urging them towards science. Some men have achieved greatness through one of these impulses alone, others through the other alone: in Hume, for example, the scientific impulse reigns quite unchecked, while in Blake a strong hostility to science co-exists with profound mystic insight. But the greatest men who have been philosophers have felt the need both of science and of mysticism: the attempt to harmonise the two was what made their life, and what always must, for all its arduous uncertainty, make philosophy, to some minds, a greater thing tha

In [42]:
print(output)

Title: The Relationship Between Mysticism and Science in Philosophy

Introduction:
The text explores the relationship between mysticism and science in the field of philosophy, highlighting the importance of both in understanding the world. It discusses the teachings of Plato and Heraclitus as examples of philosophers who recognized the significance of blending scientific observation with mystical insight.

Key Points:
- Plato's allegory of the cave illustrates his belief in a higher knowledge beyond the senses, emphasizing the role of mysticism in his philosophy.
- The combination of mysticism and science is considered the highest achievement in philosophy.
- The author acknowledges the wisdom to be gained from the mystical way of feeling, suggesting that mysticism can inspire and enhance scientific inquiry.
- Reason is seen as a controlling force that tests and confirms or refutes beliefs, while intuition is considered a form of instinct reliable only in familiar situations.
- The tex

## Summarizing Using the refine Chain

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [4]:
loader = UnstructuredPDFLoader("../files/Albert Camus - The Myth of Sisyphus_ And Other Essays (1991, Vintage) - libgen.li.pdf")
data = loader.load()


In [6]:
# print(data[0].page_content)

In [7]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [8]:
len(chunks)

34

In [9]:
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [10]:
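def print_embedding_cost(texts):
    # Note: no embeddings are created in this notebook. Despite its name, this
    # helper counts the tokens in the chunks and prices them at a hard-coded
    # rate of 0.002 USD per 1K tokens (an assumption; check current pricing).
    import tiktoken
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f"Total Tokens: {total_tokens}")
    print(f"Embedding Cost in USD: {total_tokens / 1000 * 0.002:.6f}")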
    
print_embedding_cost(chunks)

Total Tokens: 69458
Embedding Cost in USD: 0.138916
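
The figure above is simply the token count multiplied by a per-1K-token rate. A small sketch of that arithmetic, reusing the 69458 tokens and the 0.002 USD rate from the cell above; the rate is hard-coded there and is treated here as an assumption (the `estimate_cost_usd` name is hypothetical):

```python
def estimate_cost_usd(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Cost of sending total_tokens at a flat per-1K-token rate."""
    return total_tokens / 1000 * usd_per_1k_tokens


cost = estimate_cost_usd(69458, 0.002)
print(f"{cost:.6f}")  # 0.138916
```

This counts each chunk once as input. A chain that resends a running summary with every chunk, and pays for output tokens as well, will likely cost more than this estimate.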


In [11]:
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    verbose=False
)
output_summary = chain.run(chunks)

In [13]:
print(output_summary)

"The Myth of Sisyphus and Other Essays" by Albert Camus explores the concept of the absurd and its relationship to suicide. Camus argues that the question of whether life is worth living is the most urgent philosophical problem. He discusses the causes of suicide and the feeling of absurdity that can lead to it. Camus suggests that living in a world devoid of illusions and meaning can make one feel like an alien, leading to a longing for death. He explores the connection between the absurd and suicide and questions whether embracing the absurdity of existence should lead to the choice of suicide. The text also delves into the contradictions and obscurities surrounding the relationship between one's opinion about life and the act of suicide. Camus emphasizes the importance of examining the absurdity of existence and the role of hope in escaping it. The book explores the concept of absurdity in various aspects of life, including intelligence, the art of living, and art itself. The additi
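
The `refine` strategy walks through the chunks in order: the first chunk produces an initial summary, and each later chunk is sent together with the summary so far so the model can fold it in. A minimal sketch of that loop in plain Python; the prompt wording and the `fake_llm` stand-in are hypothetical, not LangChain's defaults:

```python
from typing import Callable, List

INITIAL_PROMPT = "Summarize the following text:\n{text}"
REFINE_PROMPT = (
    "Existing summary: {existing_answer}\n"
    "Refine it using this additional context:\n{text}"
)


def refine_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    if not chunks:
        raise ValueError("need at least one chunk")
    # First chunk: build the initial summary.
    summary = llm(INITIAL_PROMPT.format(text=chunks[0]))
    # Remaining chunks: each call sees the previous summary plus one new chunk.
    for chunk in chunks[1:]:
        summary = llm(REFINE_PROMPT.format(existing_answer=summary, text=chunk))
    return summary


history: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in that returns a version label per call.
    history.append(prompt)
    return f"v{len(history)}"


final = refine_summarize(["part one", "part two", "part three"], fake_llm)
print(final, len(history))
```

Each call depends on the previous result, so the calls are strictly sequential: one per chunk, and they cannot be parallelized the way the map step of `map_reduce` can.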

## refine with Custom Prompts

In [15]:
prompt_template = """
Write a concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:
"""
initial_prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

refine_template = """
    Your job is to produce a final summary.
    I have provided an existing summary up to a certain point: {existing_answer}.
    Please refine the existing summary with some more context below.
    -------------
    {text}
    -------------
    Start the final summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
    by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
"""
refine_prompt = PromptTemplate(
    template=refine_template,
    input_variables=["existing_answer", "text"]
    )

chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False
)
output_summary = chain.run(chunks)

In [16]:
print(output_summary)

"The Myth of Sisyphus and Other Essays" by Albert Camus is a profound exploration of the human condition and the search for meaning in a seemingly meaningless world. Camus challenges traditional notions of success and victory, urging individuals to confront the absurdity of existence and find purpose in the face of mortality. The book delves into the relationship between philosophy and fiction, highlighting the role of creation and art in the face of the absurd. Camus emphasizes the importance of living authentically and embracing the constant tension between the finite and the infinite. The essays in the collection provide a comprehensive exploration of the human experience and the search for meaning, challenging readers to confront the absurdity of existence and find solace and meaning in the act of creation. Ultimately, Camus encourages individuals to embrace the absurd and live fully in the face of the unknown, finding purpose and solace in the act of creating. The book offers a th

## Summarizing Using LangChain Agents

In [17]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [18]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [19]:
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
wikipedia = WikipediaAPIWrapper()

In [20]:
tools = [
    Tool(
        name="Wikipedia",
        func=wikipedia.run,
        description="Useful for when you need to get information from wikipedia about a topic"
    )
]

In [21]:
agent_executor = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

In [27]:
output = agent_executor.run("Can you please provide a short summary of what is alchemy?")



> Entering new AgentExecutor chain...
I should look up the definition of alchemy to provide an accurate summary.
Action: Wikipedia
Action Input: Alchemy
Observation: Page: Alchemy
Summary: Alchemy (from Arabic: al-kīmiyā; from Ancient Greek: χυμεία, khumeía) is an ancient branch of natural philosophy, a philosophical and protoscientific tradition that was historically practiced in China, India, the Muslim world, and Europe. In its Western form, alchemy is first attested in a number of pseudepigraphical texts written in Greco-Roman Egypt during the first few centuries AD.Alchemists attempted to purify, mature, and perfect certain materials. Common aims were chrysopoeia, the transmutation of "base metals" (e.g., lead) into "noble metals" (particularly gold); the creation of an elixir of immortality; and the creation of panaceas able to cure any disease. The perfection of the human body and soul was thought to result from the alchemical magnum opus 

In [28]:
print(output)

Alchemy is an ancient branch of natural philosophy and a philosophical and protoscientific tradition that was historically practiced in China, India, the Muslim world, and Europe. Alchemists attempted to purify, mature, and perfect certain materials, with common aims including transmuting base metals into noble metals, creating an elixir of immortality, and developing panaceas. Alchemy also involved the concept of creating the philosophers' stone. Islamic and European alchemists developed laboratory techniques, theories, and terms, and the tradition of alchemy played a significant role in the development of early modern science. Modern discussions of alchemy examine its practical applications and esoteric spiritual aspects.
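
In the trace above, the model's reply names a tool on an `Action:` line and its argument on an `Action Input:` line; the executor runs that tool and feeds the result back as an `Observation:`. A simplified sketch of one such step in plain Python. The regex, the `parse_action` and `run_step` helpers, and the stub tool are hypothetical and much less robust than LangChain's own output parser:

```python
import re
from typing import Callable, Dict, Tuple

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_action(llm_output: str) -> Tuple[str, str]:
    """Extract (tool name, tool input) from a ReAct-style model reply."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError("no action found in model output")
    return match.group(1).strip(), match.group(2).strip()


def run_step(llm_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run the requested tool and format its result as an observation."""
    name, tool_input = parse_action(llm_output)
    return "Observation: " + tools[name](tool_input)


reply = (
    "I should look up the definition of alchemy to provide an accurate summary.\n"
    "Action: Wikipedia\n"
    "Action Input: Alchemy"
)
# Stub tool standing in for the Wikipedia lookup.
tools = {"Wikipedia": lambda query: f"(stub page for {query})"}
print(parse_action(reply))
print(run_step(reply, tools))
```

A full agent repeats this step in a loop, appending each observation to the prompt, until the model produces a final answer instead of another action.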
