In [5]:
from dotenv import load_dotenv
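# dotenv_path must name the .env file itself (assumed here to be ../.env).
# Passing only the directory '../' loads nothing, so load_dotenv returns False.
load_dotenv(dotenv_path='../.env')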

False

### Summarize a document

In [21]:
input_doc = "https://2os.com/insights/recoveries-strategy-refresh/"

In [7]:
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.chains.summarize import load_summarize_chain

loader = WebBaseLoader(input_doc)
docs = loader.load()

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
chain = load_summarize_chain(llm, chain_type="stuff")

chain.run(docs)

'This white paper discusses the importance of optimizing recoveries (collecting on charged-off loans) for financial institutions during economic downturns. It highlights the need for a robust inventory management strategy and the use of various component operations such as internal staffing, 3rd party agencies, legal firms, and debt buyers. The paper also emphasizes the need to measure recoveries performance based on revenues rather than expenses and recommends strategies such as digital-first recoveries, next-generation models, and collaboration with debt settlement companies. The authors estimate that implementing these strategies can result in a 15-30% increase in recoveries revenue during a downturn. The paper concludes by urging financial institutions to take action and refresh their recoveries strategies to unlock significant value.'

Alternatively, we can build the chain explicitly with `StuffDocumentsChain`, which should return an almost identical response.

In [8]:
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)

# Define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(
    llm_chain=llm_chain, document_variable_name="text"
)

docs = loader.load()
print(stuff_chain.run(docs))

This white paper discusses the importance of optimizing recoveries (collecting on charged-off loans) for financial institutions. It highlights how recoveries are often overlooked during benign economic periods but become crucial during downturns. The paper provides recommendations for updating recoveries strategies, including inventory management, digital transformation, and operational improvements. It also emphasizes the need for a data-driven approach and collaboration with debt settlement companies. The authors estimate that implementing these strategies can lead to a 15-30% increase in recoveries revenue during a downturn. The paper concludes by urging financial institutions to take action and refresh their recoveries strategies to unlock significant value.
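
To make the "stuff" strategy concrete, here is a minimal sketch in plain Python of what it amounts to: every document's text is concatenated and placed into the single `{text}` slot of the prompt, so the model is called exactly once. The helper `build_stuff_prompt` and its `separator` argument are hypothetical names for illustration, and the blank-line separator is an assumption; this is not LangChain's implementation.

```python
from typing import List

# Same wording as the prompt_template used with StuffDocumentsChain above.
PROMPT_TEMPLATE = (
    "Write a concise summary of the following:\n"
    '"{text}"\n'
    "CONCISE SUMMARY:"
)


def build_stuff_prompt(page_contents: List[str], separator: str = "\n\n") -> str:
    """Join all document texts and place them in the single {text} slot.

    `build_stuff_prompt` and `separator` are hypothetical names for this sketch.
    """
    stuffed_text = separator.join(page_contents)
    return PROMPT_TEMPLATE.format(text=stuffed_text)


stuff_prompt = build_stuff_prompt(["First page text.", "Second page text."])
print(stuff_prompt)
```

Because everything goes into one prompt, this approach only works while the combined text fits in the model's context window, which is presumably why the 16k variant of the model is used here.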


Alternatively, we can use the "map-reduce" approach to summarize the document.

In [9]:
# Only the two document chains are needed for the cells below
from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain

llm = ChatOpenAI(temperature=0)

# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

In [10]:
# Reduce
reduce_template = """The following is a set of summaries:
{doc_summaries}
Take these and distill them into a final, consolidated summary of the main themes. 
Helpful Answer:"""

reduce_prompt = PromptTemplate.from_template(reduce_template)

In [11]:
# Run chain
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="doc_summaries"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is the final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # Used to collapse the documents first if they exceed the context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=4000,
)

In [None]:
# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=False,
)

In [13]:
print(map_reduce_chain.run(docs))

The main themes of the provided documents revolve around the importance of optimizing recoveries strategies, the evolution of recoveries operations, and the recommended strategies for future economic downturns. These themes highlight the significance of inventory management, digital transformation, next-generation models, litigation usage, debt sales testing, and collaboration with debt settlement companies. The documents emphasize the potential impact of these strategies on recoveries revenue and stress the need for action in the current economic conditions. Additionally, a data-driven and segmented approach to recoveries is emphasized, with 2nd Order Solutions being highlighted as an expert in digital-first recoveries.


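This takes much longer than the `stuff` approach above, since map-reduce makes one LLM call per document plus at least one reduce call instead of a single call. The summary also looks slightly inferior to the earlier ones, likely in part because the map prompt asks for the main themes rather than a concise summary.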
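
The extra cost can be seen in a minimal sketch of the map-reduce control flow in plain Python. It assumes a stand-in `llm` callable and measures size in characters rather than tokens; `group_by_budget`, `map_reduce_summarize`, and `fake_llm` are hypothetical names, and this is not how LangChain implements the chains.

```python
from typing import Callable, List


def group_by_budget(texts: List[str], max_chars: int) -> List[List[str]]:
    """Pack consecutive texts into groups whose total length fits max_chars."""
    groups: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        if current and size + len(text) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        groups.append(current)
    return groups


def map_reduce_summarize(
    chunks: List[str], llm: Callable[[str], str], max_chars: int = 4000
) -> str:
    # Map step: one model call per chunk.
    summaries = [llm("Identify the main themes:\n" + chunk) for chunk in chunks]
    # Collapse step: while the summaries are too long for one prompt,
    # merge them group by group.
    while sum(len(s) for s in summaries) > max_chars:
        groups = group_by_budget(summaries, max_chars)
        if len(groups) == len(summaries):
            break  # nothing can be merged any further
        summaries = [
            llm("Consolidate these summaries:\n" + "\n".join(group))
            for group in groups
        ]
    # Final reduce step: one more call over the remaining summaries.
    return llm("Consolidate these summaries:\n" + "\n".join(summaries))


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a short stub."""
    calls.append(prompt)
    return prompt[:20]


final_summary = map_reduce_summarize(["chunk one", "chunk two", "chunk three"], fake_llm)
print(len(calls))  # 3 map calls + 1 reduce call
```

With three chunks the sketch makes four model calls, where the "stuff" approach makes one.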

### Create a knowledge base from this document

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=10000,
    chunk_overlap=100,
    length_function=len)

documents = text_splitter.split_documents(docs)

In [15]:
len(documents)

2
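
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window splitter. This is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line boundaries, which this version ignores. The function `split_with_overlap` and the 15,000-character sample text are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# A hypothetical 15,000-character text with the settings used above
print(len(split_with_overlap("x" * 15000, chunk_size=10000, chunk_overlap=100)))
```

The overlap means the last characters of one chunk reappear at the start of the next, so a sentence cut at a chunk boundary is still fully present in at least one chunk when it is shorter than the overlap.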

In [20]:
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [24]:
input_doc.split('/')[-2]

'recoveries-strategy-refresh'

In [25]:
from langchain.vectorstores import Chroma

# we will create a local (vector) database
persist_directory = f"../vector_db/{input_doc.split('/')[-2]}"
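
Note that `input_doc.split('/')[-2]` depends on the URL ending with a trailing slash; without it, the same expression would return `'insights'`. A slightly more defensive sketch using only the standard library is shown below. The helper name `slug_from_url` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the last path segment of a URL, with or without a trailing slash."""
    path = urlparse(url).path.rstrip("/")
    return PurePosixPath(path).name


slug = slug_from_url("https://2os.com/insights/recoveries-strategy-refresh/")
print(slug)
```

The result could then be used in place of the split expression, for example `f"../vector_db/{slug}"`.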

In [26]:
# create the vector db
vectordb = Chroma.from_documents(documents=documents, embedding=embeddings, persist_directory=persist_directory)

In [27]:
vectordb.persist()

In [28]:
# create a retriever
doc_retriever = vectordb.as_retriever()
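
Conceptually, a retriever embeds the query and returns the `k` most similar chunks from the index. The toy sketch below shows that idea with hand-written two-dimensional vectors and cosine similarity; the names `cosine`, `top_k`, and `toy_index` are hypothetical, and a real vector store uses learned embeddings and an optimized index. It also shows why asking for 4 results from an index that holds only 2 chunks simply yields 2.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[str]:
    """Return the texts of the k most similar entries, capped at the index size."""
    k = min(k, len(index))
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


toy_index = [("chunk A", [1.0, 0.0]), ("chunk B", [0.0, 1.0])]
results = top_k([0.9, 0.1], toy_index, k=4)
print(results)
```

Since this knowledge base contains only two chunks, passing `search_kwargs={"k": 2}` to `as_retriever` should request a matching number of results.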

In [32]:
# let's test it
from langchain.chains import RetrievalQA

doc_qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=doc_retriever)
query = "Which observations from the last recession have informed the author's valuations in this article?"
doc_qa.run(query)

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


"The observations from the last recession that have informed the author's valuations in this article are:\n\n1. The significant increase in charged-off volumes overwhelmed agencies.\n2. Unworked paper eventually flooded the debt buyer market, resulting in a cratering of sales prices.\n3. Liquidation rates on most Recoveries strategies dropped, but to a lesser degree.\n4. Legal liquidation rates were surprisingly resilient with just a slight deterioration in performance."

In [33]:
from langchain.agents import Tool, ZeroShotAgent

tools = [
    Tool(
        name="Document QA Agent",
        func=doc_qa.run,
        description="useful for when you need to answer questions about a document. Input should be a fully formed question."
    )
]

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools, 
    prefix=prefix, 
    suffix=suffix, 
    input_variables=["input", "chat_history", "agent_scratchpad"]
)

In [34]:
from langchain.memory import ConversationBufferMemory
from langchain.chains.llm import LLMChain  # same import path as in the earlier cell

llm_chain = LLMChain(llm=llm, prompt=prompt)

memory = ConversationBufferMemory(memory_key="chat_history")

agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)

In [35]:
from langchain.agents import AgentExecutor

agent_chain = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, memory=memory)

In [36]:
agent_chain.run(input="Which observations from the last recession have informed the author's valuations in this article?")

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


"The observations from the last recession that have informed the author's valuations in this article are: 1) The significant increase in charged-off volumes overwhelmed agencies. 2) Unworked paper eventually flooded the debt buyer market, resulting in a cratering of sales prices. 3) Liquidation rates on most Recoveries strategies dropped, but to a lesser degree. 4) Legal liquidation rates were surprisingly resilient with just a slight deterioration in performance."

In [37]:
agent_chain.run(input="Who wrote this article?")

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


'The authors of this article are Matt Jarrell and Dave Wasik.'

In [38]:
agent_chain.run(input="Can you provide brief bio for both of them?")

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


'Matt Jarrell is the Head of Client and Enterprise Analytics at Quanta Credit Services and a Senior Advisor at 2nd Order Solutions. Dave Wasik is a Partner at 2nd Order Solutions and a former senior executive at Capital One.'
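
The follow-up question can refer to "both of them" because the memory inserts the earlier turns into the `{chat_history}` slot of the prompt. The sketch below imitates that mechanism in plain Python; `SimpleBufferMemory` and `build_suffix` are hypothetical names, the `Human:`/`AI:` labels are an assumption about the rendered format, and this is not the `ConversationBufferMemory` implementation.

```python
from typing import List, Tuple

# Same layout as the agent's suffix template, with the three input variables.
SUFFIX = "Begin!\n\n{chat_history}\nQuestion: {input}\n{agent_scratchpad}"


class SimpleBufferMemory:
    """Keeps every (human, ai) turn and renders them as one text block."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save_turn(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def render(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


def build_suffix(memory: SimpleBufferMemory, question: str) -> str:
    """Fill the suffix with the stored history and the new question."""
    return SUFFIX.format(
        chat_history=memory.render(), input=question, agent_scratchpad=""
    )


memory_sketch = SimpleBufferMemory()
memory_sketch.save_turn(
    "Who wrote this article?",
    "The authors of this article are Matt Jarrell and Dave Wasik.",
)
followup_prompt = build_suffix(memory_sketch, "Can you provide brief bio for both of them?")
print(followup_prompt)
```

The printed prompt contains the previous answer naming the two authors, which is the context the model needs to resolve "both of them".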