# Document Question Answering

An example of using Chroma and LangChain to answer questions over a set of documents.

In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.chains import VectorDBQA
#from langchain.document_loaders import TextLoader
from langchain.document_loaders import UnstructuredAPIFileLoader
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.document_loaders import JSONLoader

## Load documents

Load the documents to do question answering over. If you want to run this over your own documents, this is the section to replace.

In [2]:
# loader = TextLoader('state_of_the_union.txt')
# documents = loader.load()
%pip install unstructured > /dev/null

Note: you may need to restart the kernel to use updated packages.


In [None]:
from langchain.document_loaders import UnstructuredMarkdownLoader
markdown_path = "qa-test.md"
loader = UnstructuredMarkdownLoader(markdown_path)
documents = loader.load()
documents

## Split documents

Split the documents into small chunks so that we can find the chunks most relevant to a query and pass only those to the LLM.

In [4]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
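To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking using only the standard library. The helper `split_text` is hypothetical and is not how `RecursiveCharacterTextSplitter` is implemented: the real splitter first tries to break on separators such as paragraph breaks and newlines, which this sketch ignores.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Hypothetical helper: cut text into character windows of at most chunk_size,
    where each window repeats the last chunk_overlap characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters of made-up text
no_overlap = split_text(sample, chunk_size=10, chunk_overlap=0)
with_overlap = split_text(sample, chunk_size=10, chunk_overlap=5)
print(len(no_overlap), len(with_overlap))
```

With `chunk_overlap=0`, as in the cell above, no text is repeated between chunks. A non-zero overlap produces more chunks but reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.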

## Initialize ChromaDB

Create an embedding for each chunk and insert it into the Chroma vector database.

In [5]:
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(texts, embeddings)
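The idea behind the vector database is that each chunk becomes a vector, and a query is answered by finding the chunks whose vectors are closest to the query's vector. The sketch below illustrates that idea with a toy word-count "embedding" and cosine similarity. The functions `embed`, `cosine`, and `top_k` are hypothetical stand-ins, and the sample chunks are made up; real embeddings from `OpenAIEmbeddings` are dense vectors produced by a model, and Chroma handles storage and search itself.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks for illustration only.
chunks = [
    "the robot arm lifts the cube",
    "students present the robot to younger students",
    "lunch is served at noon",
]
best = top_k("who lifts the cube", chunks, k=1)
print(best)
```

Only the top-ranked chunks are passed on to the LLM, which is why the chunk size chosen in the previous step affects both retrieval quality and prompt length.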

## Create the chain

Initialize the chain we will use for question answering.

In [17]:
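# VectorDBQA is deprecated in LangChain; RetrievalQA (imported above) is its replacement.
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=vectordb.as_retriever())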
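The `"stuff"` chain type places all retrieved chunks into a single prompt and sends that prompt to the LLM in one call. The sketch below shows roughly what that looks like. The helper `build_stuff_prompt` and the prompt wording are assumptions for illustration, not LangChain's actual template, and the chunk strings are placeholders.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical helper: join every retrieved chunk into one context block
    and append the question, mimicking the idea of the "stuff" strategy."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for whatever the retriever returns.
prompt = build_stuff_prompt("who is the first speaker?", ["chunk A", "chunk B"])
print(prompt)
```

Because every retrieved chunk ends up in the same prompt, this approach is simple but only works while the combined chunks fit in the model's context window.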

## Ask questions!

Now we can use the chain to ask questions!

In [7]:
query = "who is the first speaker?"
qa.run(query)

' Speaker Valdez'

In [8]:
query = "which VEX products are the most used by all speakers? And how many for each?"
qa.run(query)

' VEX IQ and VRC are the most used products among the speakers, with 230 teams using VRC and 48 teams using VEX IQ.'

In [9]:
query = "what help do respondents ask for relating to their understanding of STEM?"
qa.run(query)

" The respondents ask for help with resources that make it easier to share their message about STEM education, and for packages that are ready to use and don't have to be pieced together. They also ask for help with troubleshooting and understanding coding and robotics."

In [10]:
query = "how did teachers describe some examples of transfer learning using VEX?"
qa.run(query)

' Teachers described transfer learning using Vex as having older students design or program a robot and then present it to younger students, allowing them to be the teacher and problem solver. This builds confidence and helps younger kids get invested in the process.'

In [15]:
query = "which aspects of VEX robots do teachers need more help with?"
qa.run(query)

' Teaching the basics of programming, mechanical engineering, and iterative design.'