#### Question answering from documents using an LLM

In [1]:
import os
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]

In [2]:
from langchain.embeddings import GooglePalmEmbeddings
embedding = GooglePalmEmbeddings(google_api_key=GOOGLE_API_KEY)

In [3]:
from langchain.vectorstores import Chroma
persist_directory = "./vector_database/chroma/"

In [4]:
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [5]:
print(vectordb._collection.count())

209


In [6]:
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3

In [7]:
from langchain.chat_models import ChatGooglePalm
llm = ChatGooglePalm(temperature=0, google_api_key=GOOGLE_API_KEY)

#### RetrievalQA chain

In [8]:
from langchain.chains import RetrievalQA

In [9]:
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectordb.as_retriever())

In [10]:
result = qa_chain({"query": question})

In [11]:
result

{'query': 'What are major topics for this class?',
 'result': 'The major topics for this class are:\n\n* Supervised learning: This is the type of machine learning where you have a set of data with labeled outputs, and you train a model to predict the output for new data.\n* Unsupervised learning: This is the type of machine learning where you have a set of data without labeled outputs, and you train a model to find patterns in the data.\n* Reinforcement learning: This is the type of machine learning where an agent learns to take actions in an environment in order to maximize a reward.\n* Natural language processing: This is the field of computer science that deals with the interaction between computers and human (natural) languages.\n* Computer vision: This is the field of computer science that deals with the extraction of meaningful information from digital images or videos.\n* Speech recognition: This is the field of computer science that deals with the automatic recognition of human

#### Building Prompt
- The prompt takes (a) the retrieved documents as context and (b) the question, and passes both to the language model.
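To make the role of the two placeholders concrete, here is a minimal sketch in plain Python that fills a template by hand with `str.format`. The helper `fill_prompt` and the sample chunks are hypothetical and only illustrate the idea; this is not how `PromptTemplate` is implemented.

```python
# Minimal sketch: fill a QA prompt by hand with str.format.
# `fill_prompt` and the sample strings are hypothetical, for illustration only.
TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def fill_prompt(chunks, question):
    """Join the retrieved text chunks and substitute them into the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


prompt = fill_prompt(
    ["Chunk one of the lecture transcript.", "Chunk two of the lecture transcript."],
    "Is probability a class topic?",
)
print(prompt)
```

The printed string is what a language model would receive: the instruction, the context, the question, and a trailing cue for the answer.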

In [12]:
from langchain.prompts import PromptTemplate

In [13]:
# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

In [14]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [15]:
# new question
question = "Is probability a class topic?"

In [16]:
result = qa_chain({"query":question})

In [17]:
result.keys()

dict_keys(['query', 'result', 'source_documents'])

In [18]:
result["result"]

'Yes, probability is a class topic. In the first lecture, the instructor mentions that they will be covering topics such as "probability distributions, Bayes\' theorem, and hypothesis testing." These are all important concepts in probability theory, and they will be essential for understanding the material in the rest of the class.\n\nProbability is a branch of mathematics that deals with the likelihood of events occurring. It is used in a wide variety of fields, including statistics, machine learning, and finance. Probability theory is a powerful tool that can be used to make predictions about the future, and it is an essential part of many scientific and engineering disciplines.\n\nIf you are interested in learning more about probability, there are many resources available online and in libraries. You can also find many helpful tutorials and videos on YouTube.'

In [19]:
result["source_documents"][0]

Document(page_content="person's face physically attractive. There's a learning algorithm on op tical illusions, and \nso on.  \nAnd it goes on, so lots of fun projects. A nd take a look, then come up with your own \nideas. But whatever you find cool and interest ing, I hope you'll be able to make machine \nlearning a project out of it. Yeah, question?  \nStudent : Are these gro up projects?  \nInstructor (Andrew Ng): Oh, yes, thank you.  \nStudent : So how many people can be in a group?  \nInstructor (Andrew Ng): Right. So projects can be done in  groups of up to three people. \nSo as part of forming study groups, later t oday as you get to know your classmates, I \ndefinitely also encourage you to grab two ot her people and form a group of up to three \npeople for your project, okay? And just start brainstorming ideas for now amongst \nyourselves. You can also come and talk to me or the TAs if you want to brainstorm ideas \nwith us.  \nOkay. So one more organizational ques tion. I'm c

#### If there are too many documents, they may not all fit in the context window.

#### RetrievalQA chain types: the default is the "stuff" technique (chain_type="stuff"), which places all retrieved documents into a single prompt
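The stuff idea and its size limit can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: `stuff_context` is a hypothetical helper, and a whitespace word count stands in for a real tokenizer.

```python
# Sketch of the "stuff" idea: put every retrieved chunk into one context string,
# and check a rough size budget first. Word count is a stand-in for real tokens.
def stuff_context(chunks, max_words):
    """Return all chunks joined together, or raise if they exceed the budget."""
    context = "\n\n".join(chunks)
    n_words = len(context.split())
    if n_words > max_words:
        raise ValueError(f"context has {n_words} words, budget is {max_words}")
    return context


chunks = ["alpha beta gamma", "delta epsilon"]
print(stuff_context(chunks, max_words=10))  # 5 words, fits the budget

try:
    stuff_context(chunks, max_words=4)  # 5 words, exceeds the budget
except ValueError as err:
    print("does not fit:", err)
```

When the budget is exceeded, the alternatives are to retrieve fewer documents or to switch to a chain type that processes documents separately.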
#### Map Reduce technique
- In this technique, each retrieved document is first sent to the language model on its own to obtain an individual answer. These individual answers are then combined into a final answer with one more call to the language model. This requires many more calls to the language model than the stuff technique.
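The two phases can be sketched with a stub in place of the model. Everything here is hypothetical (`map_reduce_answer`, `fake_llm`, the prompt wording); it shows only the control flow and the number of calls, not LangChain's actual prompts.

```python
# Sketch of map-reduce question answering with a stub model.
calls = []  # records every prompt sent to the stub


def fake_llm(prompt):
    """Hypothetical stand-in for a language model call."""
    calls.append(prompt)
    return f"answer {len(calls)}"


def map_reduce_answer(llm, chunks, question):
    # Map step: one model call per chunk, each seeing only that chunk.
    partials = [llm(f"Context:\n{c}\n\nQuestion: {question}") for c in chunks]
    # Reduce step: one final call that combines the partial answers.
    combined = "\n".join(partials)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}")


chunks = ["chunk A", "chunk B", "chunk C"]
final = map_reduce_answer(fake_llm, chunks, "Is probability a class topic?")
print(final)       # answer 4
print(len(calls))  # 4 calls: three map calls plus one reduce call
```

With N retrieved documents this pattern makes N + 1 model calls, and no single call sees more than one document.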

In [20]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

In [21]:
result = qa_chain_mr({"query": question})

Token indices sequence length is longer than the specified maximum sequence length for this model (1420 > 1024). Running this sequence through the model will result in indexing errors


In [22]:
result["result"]

"Yes, probability is a class topic. It is a branch of mathematics that deals with the likelihood of events occurring. Probability is used in many different fields, including statistics, machine learning, and finance.\n\nIn this class, we will learn about the basic concepts of probability, such as probability distributions, random variables, and conditional probability. We will also learn about some of the more advanced topics in probability, such as Bayes' theorem and Markov chains.\n\nProbability is a very important topic, and it is used in many different areas of our lives. By understanding probability, we can make better decisions and understand the world around us better.\n\nHere are some examples of how probability is used in the real world:\n\n* In statistics, probability is used to estimate the likelihood of events occurring. For example, a statistician might use probability to estimate the likelihood of a certain candidate winning an election.\n* In machine learning, probabilit

#### Refine technique
- The refine chain processes the documents sequentially, updating its answer with each new document, so it tends to carry over more information than the map-reduce chain.
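A minimal sketch of the sequential pattern, again with a stub model. The names `refine_answer` and `fake_llm` and the prompt wording are assumptions for illustration, not the prompts LangChain uses.

```python
# Sketch of the refine pattern: answer from the first chunk, then revise the
# answer once per remaining chunk, always passing the previous answer along.
calls = []  # records every prompt sent to the stub


def fake_llm(prompt):
    """Hypothetical stand-in for a language model call."""
    calls.append(prompt)
    return f"answer {len(calls)}"


def refine_answer(llm, chunks, question):
    if not chunks:
        raise ValueError("need at least one chunk")
    # Initial answer from the first chunk only.
    answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {question}")
    # Each later call sees the previous answer plus one new chunk.
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{chunk}\n\n"
            f"Refine the existing answer to the question: {question}"
        )
    return answer


chunks = ["chunk A", "chunk B", "chunk C"]
final = refine_answer(fake_llm, chunks, "Is probability a class topic?")
print(final)       # answer 3
print(len(calls))  # 3 calls, one per chunk
```

Because each call depends on the previous answer, the calls cannot run in parallel, unlike the map step of map-reduce.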

In [23]:
qa_chain_re = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_re({"query": question})
result["result"]

'Based on the new context, it is clear that probability is a class topic. The instructor mentions that "we\'ll ask you to implement a few programs, a few machine learning algorithms as part of the homeworks. And most of them will involve probability." This suggests that probability is an important topic in the class and that students will be expected to have a strong understanding of it in order to succeed.\n\nProbability is the study of chance. It is used in many different fields, including statistics, gambling, and machine learning. In machine learning, probability is used to estimate the likelihood of an event occurring. This information can then be used to make predictions or decisions.\n\nThere are many different ways to calculate probability. One common method is to use a probability distribution. A probability distribution is a mathematical function that describes the probability of a variable taking on different values. For example, the probability distribution of a coin toss i

#### Follow-up question

In [24]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [25]:
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]

"Yes, probability is a class topic. In fact, it is one of the most important topics in machine learning. Probability is used to model uncertainty, which is a key concept in machine learning. Without probability, it would be very difficult to build accurate machine learning models.\n\nProbability is also used to calculate the confidence of a machine learning model's predictions. This is important because it allows us to understand how likely a model is to be correct.\n\nOverall, probability is a very important topic in machine learning. It is used to model uncertainty, calculate the confidence of a model's predictions, and build accurate machine learning models."

In [26]:
question = "Why are those prerequisites needed?"
result = qa_chain({"query": question})
result["result"]

'The prerequisites are needed to ensure that the user has a basic understanding of the concepts that will be covered in the lecture. This will help the user to follow the lecture and to understand the material. Additionally, the prerequisites will help the user to apply the concepts that they learn in the lecture to their own work.'

#### This answer is incorrect: the chain has no notion of state or memory, so it does not preserve the conversation history and cannot resolve what "those prerequisites" refers to.
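One way to picture what is missing is a sketch that carries earlier turns along with the follow-up question. The helper `with_history` is hypothetical, and the stored answer is an abbreviated stand-in for the model output. LangChain provides its own tools for this, such as `ConversationalRetrievalChain` and `ConversationBufferMemory`; the code below is not their implementation.

```python
# Sketch: keep earlier (question, answer) turns and prepend them to a
# follow-up question so that the question can be understood on its own.
def with_history(history, question):
    """Prefix a follow-up question with the earlier conversation turns."""
    if not history:
        return question
    turns = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)
    return f"Chat history:\n{turns}\n\nFollow-up question: {question}"


history = []
# Abbreviated stand-in for the first exchange.
history.append(("Is probability a class topic?", "Yes, probability is a class topic."))

follow_up = with_history(history, "Why are those prerequisites needed?")
print(follow_up)
```

With the history included, the model has the earlier exchange available when it interprets "those prerequisites".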