- This notebook is based on what I learned from the course at https://learn.deeplearning.ai/langchain-chat-with-your-data

In [1]:
import openai
import os
import sys
sys.path.append("../doc")

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.environ["OPENAI_API_KEY"]

# 1. Document Loading
- Goal: load structured and unstructured data into a common document format (text plus metadata).
- LangChain ships 80+ document loaders for different source types.
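Whatever the source type, a loader ends up producing a list of documents that each carry `page_content` and `metadata`. The sketch below only illustrates that shape; `SimpleDocument`, `load_pages` and the file name `docs/example.pdf` are hypothetical names of mine, not LangChain's classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_pages(source: str, page_texts: List[str]) -> List[SimpleDocument]:
    """Wrap each page's text in a document that records where it came from."""
    return [
        SimpleDocument(page_content=text, metadata={"source": source, "page": i})
        for i, text in enumerate(page_texts)
    ]


pages_demo = load_pages("docs/example.pdf", ["first page text", "second page text"])
print(len(pages_demo))         # 2
print(pages_demo[1].metadata)  # {'source': 'docs/example.pdf', 'page': 1}
```

The page index starts at 0, which matches the `'page': 2` metadata on the third PDF page in the cells that follow.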

In [2]:
from langchain.document_loaders import PyPDFLoader

In [3]:
loader = PyPDFLoader('docs/ltcma-full-report.pdf')

In [4]:
pages = loader.load()

In [5]:
len(pages)

124

In [7]:
page = pages[2]

In [8]:
print(page.page_content[:500])

J.P. Morgan Asset Management  3
Foreword 
By nearly any measure, the early 2020s have been a period of extraordinary challenge. The worst pandemic 
in over a century triggered a short but severe recession and enduring supply disruptions. A generous fiscal 
response, facilitated by unusually easy monetary policy, fueled the highest levels of inflation since the early 
1980s. Russia’s brutal invasion of Ukraine created a devastating humanitarian crisis and further supply 
disruptions and inflation


In [9]:
page.metadata

{'source': 'docs/ltcma-full-report.pdf', 'page': 2}

In [10]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

In [11]:
url="https://www.youtube.com/watch?v=opoHpa67oFU"
save_dir="docs/youtube/"
loader = GenericLoader(
    YoutubeAudioLoader([url], save_dir),
    OpenAIWhisperParser()
)
# If this raises an error, check the dependencies: pip install yt_dlp pydub, and brew install ffmpeg

In [None]:
docs = loader.load()

[youtube] Extracting URL: https://www.youtube.com/watch?v=opoHpa67oFU
[youtube] opoHpa67oFU: Downloading webpage
[youtube] opoHpa67oFU: Downloading ios player API JSON
[youtube] opoHpa67oFU: Downloading android player API JSON
[youtube] opoHpa67oFU: Downloading m3u8 information
[info] opoHpa67oFU: Downloading 1 format(s): 140
[download] Destination: docs/youtube//1,000,000 ELO CHESS BOT!!!.m4a
[download] 100% of   24.04MiB in 00:00:01 at 14.93MiB/s    
[FixupM4a] Correcting container of "docs/youtube//1,000,000 ELO CHESS BOT!!!.m4a"
[ExtractAudio] Not converting audio docs/youtube//1,000,000 ELO CHESS BOT!!!.m4a; file is already in target format m4a
Transcribing part 1!
Transcribing part 2!


In [None]:
docs[0].page_content[0:600]

"Ladies and gentlemen, one of the most interesting elements of the game of chess is the fact that AI is better than any human will ever be, ever. I mean, that's not an exaggeration, AI has completely taken over in terms of being excellent at chess. A human being can start at massive, massive favorite in terms of odds and material, computer will still wipe the floor with them. And AI is really fascinating when they play against each other, or when they are nerfed and there are different types of bot forms. But in this video, I'm gonna give you a major piece of news, chess.com, among all the othe"

In [12]:
from langchain.document_loaders import WebBaseLoader


In [None]:
loader = WebBaseLoader("https://medium.com/towards-data-science/understanding-variational-autoencoders-vaes-f70510919f73")

In [None]:
docs = loader.load()

In [None]:
docs[0].page_content[0:600]

'Understanding Variational Autoencoders (VAEs) | by Joseph Rocca | Towards Data ScienceOpen in appSign upSign InWriteSign upSign InUnderstanding Variational Autoencoders (VAEs)Building, step by step, the reasoning that leads to VAEs.Joseph Rocca·FollowPublished inTowards Data Science·23 min read·Sep 24, 2019--96ListenShareCredit: Free-Photos on PixabayThis post was co-written with Baptiste Rocca.IntroductionIn the last few years, deep learning based generative models have gained more and more interest due to (and implying) some amazing improvements in the field. Relying on huge amount of data, '

# 2. Document Splitting

In [13]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter, Language

In [14]:
chunk_size = 26
chunk_overlap = 4

In [15]:
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

In [16]:
text1 = 'abcdefghijklmnopqrstuvwxyz'

In [17]:
r_splitter.split_text(text1)

['abcdefghijklmnopqrstuvwxyz']

In [18]:
text2 = "abcdefghijklmnopqrstuvwxyzabcdefg"

In [19]:
r_splitter.split_text(text2)

['abcdefghijklmnopqrstuvwxyz', 'wxyzabcdefg']
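For text with no separators, the result above is what a plain sliding window would give: take `chunk_size` characters, then step forward by `chunk_size - chunk_overlap`. The sketch below is my own simplification (`sliding_window_split` is a hypothetical helper); the real `RecursiveCharacterTextSplitter` also tries separators first, so the two only agree on separator-free input like this.

```python
def sliding_window_split(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(sliding_window_split("abcdefghijklmnopqrstuvwxyz", 26, 4))
# ['abcdefghijklmnopqrstuvwxyz']
print(sliding_window_split("abcdefghijklmnopqrstuvwxyzabcdefg", 26, 4))
# ['abcdefghijklmnopqrstuvwxyz', 'wxyzabcdefg']
```

The second chunk starts with `wxyz`, the last 4 characters of the first chunk, which is the overlap.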

In [20]:
text3 = "a b c d e f g h i j k l m n o p q r s t u v w x y z"

In [21]:
r_splitter.split_text(text3)

['a b c d e f g h i j k l m', 'l m n o p q r s t u v w x', 'w x y z']

In [22]:

c_splitter = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
c_splitter.split_text(text3)

['a b c d e f g h i j k l m n o p q r s t u v w x y z']

In [23]:

c_splitter = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separator = " "
)
c_splitter.split_text(text3)

['a b c d e f g h i j k l m', 'l m n o p q r s t u v w x', 'w x y z']

In [24]:
some_text = """When writing documents, writers will use document structure to group content.\n \
This can convey to the reader, which idea's are related. For example, closely related ideas \
are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. \n\n  \
Paragraphs are often delimited with a carriage return or two carriage returns. \
Carriage returns are the "backslash n" you see embedded in this string. \
Sentences have a period at the end, but also, have a space.\
and words are separated by space."""

In [25]:
c_splitter = CharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0,
)
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0, 
    separators=["\n\n", "\n", " ", ""]
)

In [26]:
c_splitter.split_text(some_text)

["When writing documents, writers will use document structure to group content.\n This can convey to the reader, which idea's are related. For example, closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document.",
 'Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this string. Sentences have a period at the end, but also, have a space.and words are separated by space.']

In [27]:
r_splitter.split_text(some_text )

["When writing documents, writers will use document structure to group content.\n This can convey to the reader, which idea's are related. For example, closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document.",
 'Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this string. Sentences have a period at the end, but also, have a space.and words are separated by space.']

In [28]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/ltcma-full-report.pdf")
pages = loader.load()

In [29]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=150,
    length_function=len
)

In [30]:
docs = text_splitter.split_documents(pages)

In [31]:
len(docs)

570

In [32]:
len(pages)

124

In [33]:
docs[5].page_content[-200:]

's diversification benefits weaken\n104 Portfolio implications \nStriking a balance: Strategic patience, tactical flexibility\nAssumption matrices\n112 U.S. dollar \n2023 Estimates and correlations\n114 Euro'

In [34]:
docs[6].page_content[:200]

'Striking a balance: Strategic patience, tactical flexibility\nAssumption matrices\n112 U.S. dollar \n2023 Estimates and correlations\n114 Euro \n2023 Estimates and correlations\n116 Sterling \n2023 Estimates'

In [35]:
from langchain.text_splitter import TokenTextSplitter

In [36]:
text_splitter = TokenTextSplitter(chunk_size=1, chunk_overlap=0)

In [37]:
text1 = "foo bar bazzyfoo"

In [38]:
text_splitter.split_text(text1)

['foo', ' bar', ' b', 'az', 'zy', 'foo']

In [39]:
from langchain.text_splitter import Language

In [40]:
PYTHON_CODE = """
def hello_world():
    print("Hello, World!")

# Call the function
hello_world()
"""
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
python_docs = python_splitter.create_documents([PYTHON_CODE])
python_docs

[Document(page_content='def hello_world():\n    print("Hello, World!")', metadata={}),
 Document(page_content='# Call the function\nhello_world()', metadata={})]

# 3. Vectorstores and Embeddings

In [41]:
from langchain.embeddings.openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings()

In [42]:
sentence1 = "i like dogs"
sentence2 = "i hate dogs"
sentence3 = "i like canines"
sentence4 = "the weather is ugly outside"

In [43]:
embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)
embedding4 = embedding.embed_query(sentence4)

In [44]:
import numpy as np

In [45]:
np.dot(embedding1, embedding2)

0.9140029516454701

In [46]:
np.dot(embedding1, embedding3)

0.9632050183193249

In [47]:
np.dot(embedding2, embedding3)

0.8856142325237184
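The dot products above work as similarity scores only if the vectors have unit length. As far as I know OpenAI's embeddings are returned normalized, but that is an assumption worth checking for other embedding models. A minimal sketch of the general form, cosine similarity, on toy vectors (`cosine_similarity` is my own helper):

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


v1 = [3.0, 4.0]
v2 = [6.0, 8.0]   # same direction as v1, twice as long
v3 = [4.0, -3.0]  # perpendicular to v1

print(cosine_similarity(v1, v2))  # 1.0
print(cosine_similarity(v1, v3))  # 0.0
```

For unit-length vectors both norms are 1, so this reduces to the plain dot product used in the cells above.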

In [48]:
from langchain.vectorstores import Chroma

In [49]:
persist_directory = 'docs/chroma/'

In [50]:
vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory=persist_directory
)

In [51]:
vectordb._collection.count()

1140

In [52]:
len(docs)

570

In [55]:
question = "What will be the inflation rate next year?"

In [56]:
out = vectordb.similarity_search(question, k=3)

In [57]:
out[0]

Document(page_content='J.P. Morgan Asset Management  27Transition effects: In most countries, adding to inflation\nFinally, we consider the impact of the starting point for inflation \nrelative to its long-term trend. At publishing time, the monthly \nrunning rate for inflation had backed off from its peak earlier \nin 2022. However, with higher wage growth, higher inflation \nexpectations and the lagged impact of higher home prices, \ninflation in most countries remains significantly above both \ncentral bank targets and our estimates of long-run trend \ninflation. Despite public concern about recent inflation, we expect that \ninflation rates will moderate quite quickly in 2023 and 2024. \nIndeed, the current much more hawkish attitudes and actions \nof central banks suggest that inflation could fall sharply to trend \nrates, undershoot them and then revert to them in the early \nyears of the forecast. \nThe full details of these dynamics are, of course, well beyond', metadata={'page

In [58]:
out[1]

Document(page_content='J.P. Morgan Asset Management  27Transition effects: In most countries, adding to inflation\nFinally, we consider the impact of the starting point for inflation \nrelative to its long-term trend. At publishing time, the monthly \nrunning rate for inflation had backed off from its peak earlier \nin 2022. However, with higher wage growth, higher inflation \nexpectations and the lagged impact of higher home prices, \ninflation in most countries remains significantly above both \ncentral bank targets and our estimates of long-run trend \ninflation. Despite public concern about recent inflation, we expect that \ninflation rates will moderate quite quickly in 2023 and 2024. \nIndeed, the current much more hawkish attitudes and actions \nof central banks suggest that inflation could fall sharply to trend \nrates, undershoot them and then revert to them in the early \nyears of the forecast. \nThe full details of these dynamics are, of course, well beyond', metadata={'page

In [59]:
out[2]

Document(page_content='expect long-term inflation will rise uncontrollably and little \nevidence signaling tolerance of inflation by central banks.\nTo be sure, risks to inflation are considerably more two-sided \ntoday, possibly pointing to more volatility in inflation in the years \nahead. As we discuss in our macroeconomics section, the \nlonger-term disinflationary forces of technology adoption and \nglobalization may have slowed, but they have not disappeared. \nMeanwhile, central banks have clearly rediscovered their \ninflation-fighting zeal, with renewed commitment to achieving \ninflation targets over the next two to three years.Back to basics', metadata={'page': 8, 'source': 'docs/ltcma-full-report.pdf'})
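The first two hits above have identical text, and the collection count (1140) is exactly twice `len(docs)` (570). My guess is that the chunks were indexed twice, for example by running the indexing cell against a persist directory that already held data; that is an assumption, not something the output proves. A minimal sketch of dropping repeated hits after the search (`dedupe_by_content` is a hypothetical helper, and `SimpleNamespace` stands in for the real `Document` objects):

```python
from types import SimpleNamespace


def dedupe_by_content(results):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in results:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


hits = [
    SimpleNamespace(page_content="inflation chunk"),
    SimpleNamespace(page_content="inflation chunk"),
    SimpleNamespace(page_content="central banks chunk"),
]
unique_hits = dedupe_by_content(hits)
print([d.page_content for d in unique_hits])
# ['inflation chunk', 'central banks chunk']
```

This only hides the symptom at query time; rebuilding the index from an empty directory would remove the duplicates at the source.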

# 4. Retrieval 
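- Maximum marginal relevance (MMR): return results that are relevant to the query but also different from each other.
- Flow: query -> fetch the top `fetch_k` most similar chunks -> pick `k` of them, trading off relevance against diversity.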
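A rough sketch of that selection step on toy 2-D vectors. This is my own simplified version: `mmr_select`, `cosine` and the `lambda_mult` weight of 0.5 are assumptions for illustration, not the library's code.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Return indices of k documents chosen by maximum marginal relevance."""
    sims = [cosine(query_vec, v) for v in doc_vecs]
    # Step 1: keep only the fetch_k documents most similar to the query.
    candidates = sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)[:fetch_k]
    selected = []
    # Step 2: greedily add the candidate with the best relevance/redundancy balance.
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * sims[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs_demo = [
    [1.0, 0.10],   # very close to the query
    [1.0, 0.12],   # near-duplicate of the first document
    [0.6, -0.8],   # less similar, but different from the other two
]
plain_top2 = sorted(range(3), key=lambda i: cosine(query, docs_demo[i]), reverse=True)[:2]
print(plain_top2)                                      # [0, 1]
print(mmr_select(query, docs_demo, k=2, fetch_k=3))    # [0, 2]
```

Plain similarity returns the two near-duplicates, while MMR swaps the second one for the less similar but more distinct document.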

In [60]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'

In [61]:
print(vectordb._collection.count())

1140


In [62]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]

In [63]:
smalldb = Chroma.from_texts(texts, embedding=embedding)

In [64]:
question = "Tell me about all-white mushrooms with large fruiting bodies"

In [65]:
smalldb.similarity_search(question, k=2)

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.', metadata={}),
 Document(page_content='The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).', metadata={})]

In [66]:
smalldb.max_marginal_relevance_search(question,k=2, fetch_k=3)

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.', metadata={}),
 Document(page_content='A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.', metadata={})]

In [67]:
question = "what did they say about inflation?"
docs_ss = vectordb.similarity_search(question, k=3)

In [68]:
docs_ss[0].page_content

'expect long-term inflation will rise uncontrollably and little \nevidence signaling tolerance of inflation by central banks.\nTo be sure, risks to inflation are considerably more two-sided \ntoday, possibly pointing to more volatility in inflation in the years \nahead. As we discuss in our macroeconomics section, the \nlonger-term disinflationary forces of technology adoption and \nglobalization may have slowed, but they have not disappeared. \nMeanwhile, central banks have clearly rediscovered their \ninflation-fighting zeal, with renewed commitment to achieving \ninflation targets over the next two to three years.Back to basics'

In [69]:
docs_ss[1].page_content

'expect long-term inflation will rise uncontrollably and little \nevidence signaling tolerance of inflation by central banks.\nTo be sure, risks to inflation are considerably more two-sided \ntoday, possibly pointing to more volatility in inflation in the years \nahead. As we discuss in our macroeconomics section, the \nlonger-term disinflationary forces of technology adoption and \nglobalization may have slowed, but they have not disappeared. \nMeanwhile, central banks have clearly rediscovered their \ninflation-fighting zeal, with renewed commitment to achieving \ninflation targets over the next two to three years.Back to basics'

In [70]:
docs_ss[2].page_content

'• The major question hanging over the outlook is whether the world has \nmoved into a high inflation regime.\n• While many economies are overheating today and inflation expectations \nhave moved up, we think many secular forces that have depressed inflation \nin recent decades remain in place.\n• Additionally, we think most central banks will pursue their price stability \ngoals assiduously over the medium term.\n• As a result, our inflation forecasts have moved up modestly but \nnot dramatically.'

In [71]:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

In [72]:
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The file the chunk is from, should be `docs/ltcma-full-report.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the report",
        type="integer",
    ),
]

In [73]:
document_content_description = "report"
llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)

In [74]:
question = "what did they say about mortgage rates?"

In [75]:
out = retriever.get_relevant_documents(question)



query='mortgage rates' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='source', value='docs/ltcma-full-report.pdf') limit=None


In [76]:
out[0].page_content

'refinancing meant credit spreads outperformed equities. \nPrevailing spreads in both U.S. investment grade (IG) and \nhigh yield (HY) are near our equilibrium spread forecasts of \n160bps and 480bps, respectively, leading to return forecasts \nup 270bps, to 5.50%, for U.S. IG and up 290bps, to 6.80%, \nfor U.S. HY. Back to basics'

In [77]:
out[0].metadata

{'page': 10, 'source': 'docs/ltcma-full-report.pdf'}
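The structured query printed above has two parts: a text query (`mortgage rates`) and a metadata filter (`source` equals the PDF path). Every chunk in this store comes from the same PDF, so that filter cannot narrow anything; a filter on `page` would. A minimal sketch of what an equality filter does, using plain dicts with made-up chunk text (`filter_eq` is a hypothetical helper):

```python
def filter_eq(chunks, attribute, value):
    """Keep only chunks whose metadata[attribute] equals value."""
    return [c for c in chunks if c["metadata"].get(attribute) == value]


chunks_demo = [
    {"page_content": "text about credit spreads",
     "metadata": {"source": "docs/ltcma-full-report.pdf", "page": 10}},
    {"page_content": "text about emerging markets",
     "metadata": {"source": "docs/ltcma-full-report.pdf", "page": 71}},
]

by_source = filter_eq(chunks_demo, "source", "docs/ltcma-full-report.pdf")
by_page = filter_eq(chunks_demo, "page", 10)
print(len(by_source), len(by_page))  # 2 1
```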

In [78]:
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [80]:
# Load PDF
# loader = PyPDFLoader("docs/ltcma-full-report.pdf")
# pages = loader.load()
all_page_text=[p.page_content for p in pages]
joined_page_text=" ".join(all_page_text)

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000,chunk_overlap = 150)

# splits = text_splitter.split_text(joined_page_text)


In [81]:
# Retrieve
svm_retriever = SVMRetriever.from_documents(docs,embedding)
tfidf_retriever = TFIDFRetriever.from_documents(docs)

In [82]:
question = "What is their view on mortgage rates?"
docs_svm=svm_retriever.get_relevant_documents(question)
docs_svm[0]

Document(page_content='predominant material for home construction, also informs our \nview. Even if the U.S. housing market cooled under pressure \nfrom rising rates, we would expect to see ongoing investor \ninterest in forests as a premium-priced source of tradable \ncarbon credits, especially for corporates seeking to meet \ncompliance-based commitments to reduce their carbon \nfootprints. Taken together, these factors create an opportunity \nfor new capital to serve as a resource for timberland \ndevelopment. \nThe U.S. continues to be the largest investible region for \ntimberland, and local housing demand remains strong, \nboosting timberland’s appeal as an inflation-linked asset \nclass. Although affordability headwinds have recently \ndampened new home construction forecasts, we expect to \nsee housing starts stay well above the levels recorded in the \nwake of the global financial crisis (GFC). \nElsewhere, expanding household formation and global \npopulation growth continue 

In [83]:
question = "What is their view on mortgage rates?"
docs_tfidf=tfidf_retriever.get_relevant_documents(question)
docs_tfidf[0]

Document(page_content='J.P. Morgan Asset Management  65We are not explicitly forecasting items such as an economic \nequilibrium rate (R*, an unobservable real cash rate that is \nneither expansionary nor contractionary). While our view on \nwhere R* will settle is very important, central bank policymakers \nset nominal cash rates and will differ from R* depending on \ntheir monetary policy stance. For example, over the last decade, \ncentral banks have spent much more time below what would be \nconsidered the economic neutral rate than above it.\nWe are also not explicitly forecasting a terminal rate, the \nhighest cash rate a central bank will achieve, although, of \ncourse, our view on terminal rates will play an important role in \ndetermining our forecasts for average yields and the slope of \nthe curve.\nNormalization and sensitivity analysis \nUnlike our previous set of assumptions of the last 10 years, we \nno longer see normalization as a drag on long-term returns.', metadata=
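The TF-IDF hit above talks about cash rates rather than mortgage rates, which fits how TF-IDF works: it scores shared words, not meaning. Below is a simplified sketch of that ranking. It uses whitespace tokens and a smoothed IDF of my choosing, so the scores will not match the library's; `tfidf_rank` and the toy texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_rank(query, texts):
    """Return document indices ordered by TF-IDF cosine similarity to the query."""
    tokenized = [t.lower().split() for t in texts]
    n = len(texts)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    def vectorize(tokens):
        counts = Counter(tokens)
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q_vec = vectorize(query.lower().split())
    scores = [cosine(q_vec, vectorize(tokens)) for tokens in tokenized]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


texts_demo = [
    "mortgage rates rose sharply",
    "timberland housing demand remains strong",
    "central bank cash rates",
]
print(tfidf_rank("mortgage rates", texts_demo))  # [0, 2, 1]
```

The third text ranks above the second only because it shares the word "rates" with the query.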

# 5. Question Answering

In [84]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo-0301


In [85]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [86]:
print(vectordb._collection.count())

1140


In [87]:
question = "What is their view on emerging market?"
docs = vectordb.similarity_search(question, k=3)
docs

[Document(page_content='72 2023 Long-Term Capital Market Assumptions\nEmerging markets\nAs in previous publications, we expect EM equites to \noutperform DM equities over our investment horizon. \nEmerging market stocks earn revenues in regions where \nnominal GDP is growing faster, supporting their earnings \nstreams. This year, we update our methodology, taking a closer \nlook at the nominal GDP growth pass-through to revenue \ngrowth. In some cases, we acknowledge, high inflation might \nsupport nominal GDP but hurt revenue growth in other ways. \nThis dynamic is particularly relevant in today’s high inflation \nenvironment, especially for countries that face persistent \ninflation issues and steady long-term currency depreciation \n(India, South Africa and Brazil, for example). In such cases, \nwe lower our revenue growth projections. Even with these \nadjustments, revenue expectations in emerging markets', metadata={'page': 71, 'source': 'docs/ltcma-full-report.pdf'}),
 Document(p

In [88]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)

In [89]:
from langchain.chains import RetrievalQA

In [90]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [91]:
result = qa_chain({"query": question})

In [92]:
result

{'query': 'What is their view on emerging market?',
 'result': 'The publication expects emerging market equities to outperform DM equities over their investment horizon. Emerging market stocks earn revenues in regions where nominal GDP is growing faster, supporting their earnings streams. However, they acknowledge that high inflation might support nominal GDP but hurt revenue growth in other ways, especially for countries that face persistent inflation issues and steady long-term currency depreciation. They also see little change in their fair value assumptions for emerging market debt, although the market pricing has changed significantly over the past year.'}

In [93]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)


In [94]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [95]:
result = qa_chain({"query": question})


In [96]:
result

{'query': 'What is their view on emerging market?',
 'result': 'They expect EM equities to outperform DM equities over their investment horizon, and they see little change in their fair value assumptions for emerging market debt, although the market pricing has changed significantly over the past year. Thanks for asking!',
 'source_documents': [Document(page_content='72 2023 Long-Term Capital Market Assumptions\nEmerging markets\nAs in previous publications, we expect EM equites to \noutperform DM equities over our investment horizon. \nEmerging market stocks earn revenues in regions where \nnominal GDP is growing faster, supporting their earnings \nstreams. This year, we update our methodology, taking a closer \nlook at the nominal GDP growth pass-through to revenue \ngrowth. In some cases, we acknowledge, high inflation might \nsupport nominal GDP but hurt revenue growth in other ways. \nThis dynamic is particularly relevant in today’s high inflation \nenvironment, especially for cou

In [97]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result

{'query': 'What is their view on emerging market?',
 'result': 'They expect EM equities to outperform DM equities over their investment horizon. They believe that emerging market stocks earn revenues in regions where nominal GDP is growing faster, supporting their earnings streams. However, they acknowledge that high inflation might support nominal GDP but hurt revenue growth in other ways, especially for countries that face persistent inflation issues and steady long-term currency depreciation. In such cases, they lower their revenue growth projections. They see little change in their fair value assumptions for emerging market debt, although the market pricing has changed significantly over the past year. Within EM sovereign debt, the world has become increasingly heterogeneous.'}

In [98]:
qa_chain_rf = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_rf({"query": question})
result

{'query': 'What is their view on emerging market?',
 'result': 'The authors have a positive view on emerging market equities, expecting them to outperform developed market equities. However, they note that the emerging market debt market has become increasingly heterogeneous, with some regions looking healthy while others are fighting inflationary pressures and political unrest. This has led to an optically high starting yield and spread, particularly for countries going through restructuring or viewed as approaching financial distress. While the authors see little change in their fair value assumptions for emerging market debt, they caution that these high yields may be more reflective of near-term default risk than long-term drivers of return through spread compression. Therefore, while the authors have a positive view on emerging market equities, they are cautious about emerging market debt due to the heterogeneous nature of the market and the high yields being reflective of near-te
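The three chain types differ mainly in how the retrieved chunks reach the model. The sketch below is my reading of the idea, not the library's implementation: `CountingLLM` is a hypothetical stand-in that just records prompts, and the prompt wording is invented.

```python
class CountingLLM:
    """Hypothetical stand-in for a chat model that records each prompt."""

    def __init__(self):
        self.prompts = []

    def __call__(self, prompt):
        self.prompts.append(prompt)
        return f"answer {len(self.prompts)}"


def stuff(llm, question, chunks):
    """Put every chunk into one prompt and ask once."""
    context = "\n\n".join(chunks)
    return llm(f"Context:\n{context}\nQuestion: {question}")


def map_reduce(llm, question, chunks):
    """Answer per chunk, then ask once more to combine the partial answers."""
    partials = [llm(f"Context:\n{c}\nQuestion: {question}") for c in chunks]
    joined = "\n".join(partials)
    return llm(f"Partial answers:\n{joined}\nCombine them to answer: {question}")


def refine(llm, question, chunks):
    """Answer from the first chunk, then revise with each later chunk in turn."""
    answer = llm(f"Context:\n{chunks[0]}\nQuestion: {question}")
    for c in chunks[1:]:
        answer = llm(f"Existing answer: {answer}\nNew context:\n{c}\nRefine for: {question}")
    return answer


def count_calls(strategy, chunks):
    llm_stub = CountingLLM()
    strategy(llm_stub, "What is their view?", chunks)
    return len(llm_stub.prompts)


chunks_demo = ["chunk one", "chunk two", "chunk three"]
print(count_calls(stuff, chunks_demo))       # 1
print(count_calls(map_reduce, chunks_demo))  # 4
print(count_calls(refine, chunks_demo))      # 3
```

More model calls means more cost and latency, which is the price for not having to fit every chunk into one prompt.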

# 6. Chat

In [99]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [100]:
question = "What is their view on emerging market?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3

In [101]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)
llm.predict("Hello world!")

'Hello there! How can I assist you today?'

In [102]:
# Build prompt
from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,)

# Run chain
from langchain.chains import RetrievalQA
question = "Is investing in emerging market good?"
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectordb.as_retriever(),
                                       return_source_documents=True,
                                       chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})


result = qa_chain({"query": question})
result["result"]

'Based on the information provided, the report expects EM equities to outperform DM equities over the investment horizon, but there are concerns about high inflation and currency depreciation in some countries that may affect revenue growth projections. As for emerging market debt, the world has become increasingly heterogeneous, with some regions looking healthy while others are still fighting inflationary pressures and political unrest. The report sees little change in fair value assumptions for emerging market debt, but notes that there are optically high starting yields and spreads. Overall, the report provides some insights into the opportunities and risks of investing in emerging markets, but it is up to individual investors to make their own decisions based on their risk tolerance and investment goals. Thanks for asking!'

In [103]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

In [104]:
from langchain.chains import ConversationalRetrievalChain
retriever=vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)

In [105]:
question = "Is investing in emerging market good?"
result = qa({"question": question})

In [106]:
result["answer"]

'Based on the information provided, the report expects EM equities to outperform DM equities over their investment horizon. However, it also notes that there are risks and challenges in emerging markets, such as high inflation and political unrest, that could affect revenue growth and debt. Therefore, investing in emerging markets may have potential benefits, but it also involves risks that should be carefully considered. It is recommended to consult with a financial advisor before making any investment decisions.'

In [107]:
question = "Why is it so?"
result = qa({"question": question})

In [108]:
result["answer"]

"The given context does not provide a direct answer to the user's question. However, it does provide some information on emerging market local currencies and credit. It suggests that emerging market local currencies are historically attractive due to the strength in the US dollar and the rise in local yields through tighter monetary policy. However, timing is crucial, and continued dollar strength could create further local, periodic market stress. In emerging market credit, the composition of the J.P. Morgan Broad Diversified Core Index is expected to remain stable, and long-term default and recovery rates are expected to remain close to historical levels. \n\nRegarding the user's question, investing in emerging markets can offer attractive risk-adjusted returns, but it also involves risks such as political instability, currency fluctuations, and liquidity risks. Therefore, it is recommended to consult with a financial advisor before making any investment decisions to understand the p

In [110]:
question = "Your answer is too vague. I need more concrete response. Can you do that?"
result = qa({"question": question})
result["answer"]

'Yes, according to the context provided, investing in emerging markets can offer historically attractive levels due to the strength of the US dollar and the rise in local yields through tighter monetary policy. However, timing is important due to ongoing tightening of financial conditions. Continued dollar strength could create further local, periodic market stress, despite the much-improved valuation levels today. In terms of emerging market credit, the composition of the J.P. Morgan Broad Diversified Core Index (CEMBI CORE) is expected to remain stable, and long-term default and recovery rates are expected to remain close to historical levels. \n\nIn the near term, tilts to credit sectors and international equity may offer attractive risk-adjusted returns, while lower volatility alternatives and short-duration fixed income help to manage risk and preserve liquidity. However, the potential for a more positively correlated market environment requires vigilance around the distribution o
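A follow-up like "Why is it so?" has no searchable content of its own. As I understand it, the conversational chain first asks the model to rewrite the follow-up into a standalone question using the chat history, then retrieves with that rewritten question. A minimal sketch of that flow with stubbed pieces (`conversational_answer`, `stub_llm` and `stub_retrieve` are hypothetical, and the prompt wording is invented):

```python
def conversational_answer(llm, retrieve, chat_history, question):
    """Condense the follow-up with history, retrieve, answer, then store the turn."""
    if chat_history:
        transcript = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
        standalone = llm(
            "Rephrase the follow-up as a standalone question.\n"
            f"{transcript}\nFollow-up: {question}"
        )
    else:
        standalone = question  # nothing to condense on the first turn
    context = "\n\n".join(retrieve(standalone))
    answer = llm(f"Context:\n{context}\nQuestion: {standalone}")
    chat_history.append((question, answer))
    return answer


prompts = []
queries = []


def stub_llm(prompt):
    prompts.append(prompt)
    return f"reply-{len(prompts)}"


def stub_retrieve(query):
    queries.append(query)
    return ["chunk A", "chunk B"]


history = []
conversational_answer(stub_llm, stub_retrieve, history, "Is investing in emerging market good?")
conversational_answer(stub_llm, stub_retrieve, history, "Why is it so?")
print(len(prompts))  # 3
print(queries)       # ['Is investing in emerging market good?', 'reply-2']
```

The second retrieval uses the model's rewritten question, not the raw follow-up, so a weak rewrite leads to off-topic chunks like the ones in the answers above.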