
# GPT-Index + Langchain Document QA


## FROZEN: further updates in streamlit_app.py. This notebook marks a proof-of-concept milestone of one page

### Set up env

In [2]:
%pip install gpt_index html2text langchain && pip freeze > requirements.txt

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [10]:
from gpt_index import GPTListIndex, SimpleWebPageReader, BeautifulSoupWebReader, GPTSimpleVectorIndex
from IPython.display import Markdown, display
from langchain.agents import load_tools, Tool, initialize_agent, ZeroShotAgent, AgentExecutor
from langchain.llms import OpenAI
from langchain import OpenAI, LLMChain
from pathlib import Path
import hashlib

### Add Document QA

In [11]:
Input_URL = "https://experiencewelcome.zendesk.com/hc/en-us/articles/12461508447124-Hubspot-Integration-Documentation"
hashed_input_url = hashlib.md5(Input_URL.encode("ascii")).hexdigest()
index_filepath = f"./doc_qa_{hashed_input_url}.json"
try:
    index = GPTSimpleVectorIndex.load_from_disk(index_filepath)
except FileNotFoundError:
    documents = SimpleWebPageReader(html_to_text=True).load_data([Input_URL])
    tmp_index = GPTSimpleVectorIndex(documents)
    tmp_index.save_to_disk(index_filepath)
    index = GPTSimpleVectorIndex.load_from_disk(index_filepath)

> Adding chunk: Please stand by, while we are checking your bro...
> [build_index_from_documents] Total LLM token usage: 0 tokens
> [build_index_from_documents] Total embedding token usage: 66 tokens


### Create langchain agent

In [14]:
def querying_db(query: str):
    response = index.query(query, verbose=True)

tools = [
    Tool(
        name="QueryingDB",
        func=querying_db,
        description="Returns most relevant answer from document for query string",
    )
]
llm = OpenAI(temperature=0.1)

query_string = "What kinds of things can I do with the Welcome Hubspot integration?"

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
result = agent.run(query_string)
print(result)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m I need to find out what the Welcome Hubspot integration can do.
Action: QueryingDB
Action Input: What can I do with the Welcome Hubspot integration?[0m> Top 1 nodes:
> [Node ba91841f-a386-41c0-8ada-6574d265fb81] [Similarity score:                     0.70809] Please stand by, while we are checking your browser...

Redirecting...

# Please turn JavaScript ...
> Searching in chunk: Please stand by, while we are checking your bro...
> Initial response: 
Without prior knowledge, it is not possible to answer this question.
> [query] Total LLM token usage: 119 tokens
> [query] Total embedding token usage: 11 tokens

Observation: [36;1m[1;3mNone[0m
Thought:[32;1m[1;3m I need to find out more about the Welcome Hubspot integration
Action: QueryingDB
Action Input: What is the Welcome Hubspot integration?[0m> Top 1 nodes:
> [Node ba91841f-a386-41c0-8ada-6574d265fb81] [Similarity score:                     0.728363] Please stand 