## Sign up for APIs

- Get an API key from OpenAI and add it to the `.env` file as `OPENAI_API_KEY`
- Get an API key from https://serpapi.com/ and add it to the `.env` file as `SERPAPI_API_KEY`

In [1]:
from dotenv import load_dotenv
load_dotenv() 

True
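
A `True` from `load_dotenv()` does not by itself confirm that both keys are present. A minimal sketch of a check using only the standard library; the helper name `missing_keys` is hypothetical and not part of the notebook:

```python
import os

# The two variables this notebook expects to find in the environment.
REQUIRED_KEYS = ["OPENAI_API_KEY", "SERPAPI_API_KEY"]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Prints only the names of missing variables, never their values.
print("Missing keys:", missing_keys(REQUIRED_KEYS))
```

An empty list means both keys were found.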

In [2]:
from langchain.utilities import ApifyWrapper
from langchain.document_loaders.base import Document
from langchain.indexes import VectorstoreIndexCreator
from langchain.document_loaders import ApifyDatasetLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma




In [3]:
from langchain.agents import load_tools, Tool
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

In [4]:
from langchain_visualizer.jupyter import visualize
import asyncio
import tqdm as notebook_tqdm

In [5]:
llm_openai_davinci_3 = OpenAI(temperature=0)

In [6]:

llm_agent = llm_openai_davinci_3
llm_eval = llm_openai_davinci_3

Let's use an agent that can search the internet and do maths to answer a simple question about our conference.

In [7]:
question = "When is the PyCon DE & PyData Berlin 2023 conference? How many days are between that date and today?"

In [8]:
tools = load_tools(["serpapi", "llm-math"], llm=llm_agent)
agent = initialize_agent(
    tools,
    llm_agent,
    agent="zero-shot-react-description",
    verbose=True,
)

output = agent.run(question)


> Entering new AgentExecutor chain...
I need to find the date of the conference and then calculate the number of days between that date and today.
Action: Search
Action Input: "PyCon DE & PyData Berlin 2023"
PyConDE & PyData Berlin 2023, Berlin Germany. Where Pythonistas in Germany can meet to learn about new and upcoming Python libraries, tools, software and ...
I need to find the exact date of the conference
Action: Search
Action Input: "PyCon DE & PyData Berlin 2023 date"
PyCon DE & PyData Berlin 2023 · From 17 April through 19 April , 2023 · Explore events · More events at Berlin, Germany.
I need to calculate the number of days between the conference date and today
Action: Calculator
Action Input: 17 April 2023 - today's date



ValueError: invalid syntax (<expr>, line 1). Please try again with a valid numerical expression
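
The Calculator tool expects a numerical expression, and `17 April 2023 - today's date` is not one. The date arithmetic itself is simple in plain Python. A minimal sketch, assuming the conference start date from the search result and fixing "today" to 19 April 2023 so the example is reproducible; `days_between` is a hypothetical helper:

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of days from `start` to `end`; negative if `end` is earlier."""
    return (end - start).days


conference_start = date(2023, 4, 17)  # from the search result above
today = date(2023, 4, 19)             # assumption: fixed date for the example

print(days_between(today, conference_start))  # -2
```

A negative result means the conference start is already in the past.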

What is actually happening under the hood? Let's look at it in the trace visualiser.

In [None]:
async def search_agent_demo():
    return agent.run(
        question
    )

visualize(search_agent_demo)

2023-04-19 10:22.53.868678 [info     ] Trace: http://0.0.0.0:8935/traces/01GYC9R0NBYMHRQMF6Q41J62HP


> Entering new AgentExecutor chain...
Rendering http://127.0.0.1:8935/traces/01GYC9R0NBYMHRQMF6Q41J62HP in notebook


I need to find the date of the conference and then calculate the number of days between that date and today.
Action: Search
Action Input: "PyCon DE & PyData Berlin 2023"
PyConDE & PyData Berlin 2023, Berlin Germany. Where Pythonistas in Germany can meet to learn about new and upcoming Python libraries, tools, software and ...
I need to find the exact date of the conference
Action: Search
Action Input: "PyCon DE & PyData Berlin 2023 date"
PyCon DE & PyData Berlin 2023 · From 17 April through 19 April , 2023 · Explore events · More events at Berlin, Germany.
I need to calculate the number of days between the conference date and today
Action: Calculator
Action Input: 17 April 2023 - today's date

Exception in thread Thread-4 (visualize):
Traceback (most recent call last):
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/myenv/lib/python3.10/site-packages/langchain/chains/llm_math/base.py", line 59, in _evaluate_expression
    numexpr.evaluate(
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/myenv/lib/python3.10/site-packages/numexpr/necompiler.py", line 817, in evaluate
    _names_cache[expr_key] = getExprNames(ex, context)
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/myenv/lib/python3.10/site-packages/numexpr/necompiler.py", line 704, in getExprNames
    ex = stringToExpression(text, {}, context)
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/myenv/lib/python3.10/site-packages/numexpr/necompiler.py", line 274, in stringToExpression
    c = compile(s, '<expr>', 'eval', flags)
  File "<expr>", line 1
    17 April 2023 - today()
       ^^^^^
SyntaxError: invalid syntax

During handling 
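
The traceback shows the expression string being compiled as Python code before numexpr evaluates it. That step can be reproduced with the built-in `compile`. This is a rough pre-check only: an expression that compiles may still fail later during evaluation. `is_python_expression` is a hypothetical helper name.

```python
def is_python_expression(expression: str) -> bool:
    """Return True if `expression` parses as a single Python expression."""
    try:
        compile(expression, "<expr>", "eval")
    except SyntaxError:
        return False
    return True


# The string the agent sent to the Calculator does not parse.
print(is_python_expression("17 April 2023 - today()"))
# A plain arithmetic expression does.
print(is_python_expression("17 - 19"))
```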

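The search actually returned the conference dates; the run failed because the Calculator tool cannot evaluate a non-numeric expression such as `17 April 2023 - today's date`. Still, we do not control what Google returns.
Let's build our own index of the conference website; maybe that will help.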

## Create Index Database from PyCon Website. Only run once

In [12]:
# For a production-ready document index, consider Haystack by deepset.

# Only run once to create the index db.
# Sign up at Apify.com and add APIFY_API_TOKEN to the .env file.
# Costs 50 cents to scrape 704 pages.
# apify = ApifyWrapper()
# loader = apify.call_actor(
#     actor_id="apify/website-content-crawler",
#     run_input={"startUrls": [{"url": "https://2023.pycon.de/"}]},
#     dataset_mapping_function=lambda item: Document(
#         page_content=item["text"] or "", metadata={"source": item["url"]}
#     ),  
# )

# persist_directory = 'db'
# embedding = OpenAIEmbeddings()

# loader = ApifyDatasetLoader(
#     dataset_id="jjA7bt8A2W30ekM5t",
#     dataset_mapping_function=lambda dataset_item: Document(
#         page_content=dataset_item["text"], metadata={"source": dataset_item["url"]}
#     ),
# )
# documents = loader.load()
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
# split_docs = text_splitter.split_documents(documents)



# vectordb = Chroma.from_documents(documents=split_docs, embedding=embedding, persist_directory=persist_directory)

# vectordb.persist()
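
The commented-out cell splits pages into chunks of 1000 characters with 200 characters of overlap. The sketch below shows what overlap means using a plain fixed-width window. It is not the algorithm `RecursiveCharacterTextSplitter` uses, which also tries to split on separators such as paragraph and line breaks; `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so a sentence
    that straddles a boundary still appears whole in one of the chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


# Tiny sizes so the overlap is visible by eye.
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```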

## Load already created index

In [10]:
persist_directory = 'db'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3})
result = retriever.get_relevant_documents("When is the PyCon DE & PyData Berlin 2023 conference?")




In [11]:
str(result[0].page_content)

'🎉 🤗 April 17-19 🤗 🎉 \nPyCon DE & PyData Berlin \n2023 \nThanks to the European Union and Europäischen Fonds für regionale Entwicklung for their support.\nDieses Vorhaben wurde als Teil der Reaktion der Union auf die Covid-19-Pandemie finanziert.\nSee you soon!\nCheck the Blog and FAQ for the most recent infos. \nAbout \nWhat can I expect from the conference? \nOver the span of 3 days, attendees will have the opportunity to participate in workshops, attend live keynote sessions and talks, as well as get to know fellow members of the Python and PyData Communities. We aim to be an accessible, community-driven conference, with novice to advanced level presentations. Tutorials and talks bring attendees the latest project features along with cutting-edge use cases. The conference is organised by experts for other experts providing an excellent quality level of content. Newcomers explictly welcome!\nCheck out the previous 2022 conference\nPyConDE videos\nPyData Berlin videos\nOur amazing ven
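
The retriever is created with `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against redundancy among the documents already picked. A toy sketch of that selection rule on hand-made 2-D vectors follows; the function names, the vectors, and `lambda_mult=0.5` are assumptions for illustration, and the real retriever works on OpenAI embeddings inside the vector store.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k: int, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to `k` document indices by maximal marginal relevance.

    score = lambda_mult * sim(query, doc)
            - (1 - lambda_mult) * max sim(doc, already selected)
    """
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, float("-inf")
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:  # ties keep the earlier candidate
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Documents 0 and 1 are exact duplicates; document 2 is different.
docs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.5]
print(mmr_select(query, docs, k=2))                   # duplicate is skipped
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

With `lambda_mult=1.0` the redundancy term vanishes and the rule reduces to a plain similarity ranking.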

The retrieved results look very good: the top chunk contains the conference dates. There is hope!
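
An agent tool is a function from a string to a string, so the retrieved documents have to be flattened into one piece of text before the agent can read them. A minimal sketch with a stand-in retriever; `FakeDoc`, `fake_retrieve`, `make_index_tool`, and the `max_chars` cap are hypothetical and only there to make the example run without an index:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only `page_content` is used."""
    page_content: str


def make_index_tool(
    retrieve: Callable[[str], List[FakeDoc]],
    max_chars: int = 2000,
) -> Callable[[str], str]:
    """Wrap a retriever as a string-in, string-out tool function."""

    def tool(query: str) -> str:
        joined = ";".join(doc.page_content for doc in retrieve(query))
        # Assumption: cap the text so one observation cannot flood the prompt.
        return joined[:max_chars]

    return tool


def fake_retrieve(query: str) -> List[FakeDoc]:
    """Pretend retriever that ignores the query and returns two chunks."""
    return [FakeDoc("April 17-19"), FakeDoc("PyCon DE & PyData Berlin 2023")]


index_tool = make_index_tool(fake_retrieve)
print(index_tool("When is the conference?"))
```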

In [None]:
tools = load_tools(["serpapi", "llm-math"], llm=llm_agent)
tools.append(
    Tool(
        name="PyCon and PyData DE 2023 Website Index",
        func=lambda q: ";".join([x.page_content for x in retriever.get_relevant_documents(q)]),
        description="Useful when you want to answer questions about PyCon and PyData DE 2023.",
    ),
)
index_agent = initialize_agent(tools, llm_agent, agent="zero-shot-react-description", verbose=True)
async def search_agent_with_index_demo():
    return index_agent.run(
        question
    )

visualize(search_agent_with_index_demo)

2023-04-19 00:45.48.740027 [info     ] Trace: http://0.0.0.0:8935/traces/01GYB8QB22VE8KCR1ZHD5NZ1MN


> Entering new AgentExecutor chain...
Rendering http://127.0.0.1:8935/traces/01GYB8QB22VE8KCR1ZHD5NZ1MN in notebook


I need to find out when the conference is and then calculate the number of days between that date and today.
Action: PyCon and PyData DE 2023 Website Index
Action Input: PyCon DE & PyData Berlin 2023
PyConDE & PyData Berlin 2023 Tickets 
Read more 
Jan 13, 2023 by Organisers 
Community Voting on Submissions 
Read more 
Dec 14, 2022 by Organisers 
Financial Aid Programme 
Read more 
Dec 11, 2022 by Organisers 
Call For Proposals 
Read more 
Nov 18, 2022 by Organisers 
Call for Sponsors 
Read more 
Nov 1, 2022 by ORGANISERS 
Call for Conference Chairs, Committee Members & Volunteers 
Read more 
Register now;🎉 🤗 April 17-19 🤗 🎉 
PyCon DE & PyData Berlin 
2023 
Thanks to the European Union and Europäischen Fonds für regionale Entwicklung for their support.
Dieses Vorhaben wurde als Teil der Reaktion der Union auf die Covid-19-Pandemie finanziert.
See you soon!
Check the Blog and FAQ for the most recent infos. 
About 
What can I expect from the conference? 



I now know the number of days between today and the conference.
Final Answer: The PyCon DE & PyData Berlin 2023 conference is 739 days away from today (April 8, 2021).

> Finished chain.
The PyCon DE & PyData Berlin 2023 conference is 739 days away from today (April 8, 2021).


Yay! We have the correct conference date. There is just a minor problem with today's date: the model assumed it was April 8, 2021. Maybe we just need to give it a better tool for finding today's date, and then we are done?



In [None]:

tools = load_tools(["serpapi", "llm-math"], llm=llm_agent)
tools.append(
    Tool(
        name="PyCon and PyData DE 2023 Website Index",
        func=lambda q: ";".join([x.page_content for x in retriever.get_relevant_documents(q)]),
        description="Useful when you want to answer questions about PyCon and PyData DE 2023.",
    ),
    
)

date_finder = load_tools(["llm-math"], llm=llm_agent)[0]
date_finder.name = 'A date finder.'
date_finder.description = 'Useful for when you need to calculate any date.'
tools.append(date_finder)

print(tools)

index_agent_with_date_finder = initialize_agent(tools, llm_agent, agent="zero-shot-react-description", verbose=True)
async def search_agent_with_index_and_date_finder_demo_():
    return index_agent_with_date_finder.run(
        question
    )

visualize(search_agent_with_index_and_date_finder_demo_)

[Tool(name='Search', description='A search engine. Useful for when you need to answer questions about current events. Input should be a search query.', args_schema=None, return_direct=False, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x104bc2250>, func=<bound method get_overridden_call.<locals>.overridden_call of SerpAPIWrapper(search_engine=<class 'serpapi.google_search.GoogleSearch'>, params={'engine': 'google', 'google_domain': 'google.com', 'gl': 'us', 'hl': 'en'}, serpapi_api_key='<redacted>', aiosession=None)>, coroutine=<bound method SerpAPIWrapper.arun of SerpAPIWrapper(search_engine=<class 'serpapi.google_search.GoogleSearch'>, params={'engine': 'google', 'google_domain': 'google.com', 'gl': 'us', 'hl': 'en'}, serpapi_api_key='<redacted>', aiosession=None)>), Tool(name='Calculator', description='Useful for when you need to



> Entering new AgentExecutor chain...


I need to find the date of the conference and then calculate the number of days between that date and today.
Action: PyCon and PyData DE 2023 Website Index
Action Input: Date of PyCon DE & PyData Berlin 2023
PyConDE & PyData Berlin 2023 Tickets 
Read more 
Jan 13, 2023 by Organisers 
Community Voting on Submissions 
Read more 
Dec 14, 2022 by Organisers 
Financial Aid Programme 
Read more 
Dec 11, 2022 by Organisers 
Call For Proposals 
Read more 
Nov 18, 2022 by Organisers 
Call for Sponsors 
Read more 
Nov 1, 2022 by ORGANISERS 
Call for Conference Chairs, Committee Members & Volunteers 
Read more 
Register now;🎉 🤗 April 17-19 🤗 🎉 
PyCon DE & PyData Berlin 
2023 
Thanks to the European Union and Europäischen Fonds für regionale Entwicklung for their support.
Dieses Vorhaben wurde als Teil der Reaktion der Union auf die Covid-19-Pandemie finanziert.
See you soon!
Check the Blog and FAQ for the most recent infos. 
About 
What can I expect from the conf

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/lib/python3.9/site-packages/langchain/chains/llm_math/base.py", line 59, in _evaluate_expression
    numexpr.evaluate(
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/lib/python3.9/site-packages/numexpr/necompiler.py", line 817, in evaluate
    _names_cache[expr_key] = getExprNames(ex, context)
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/lib/python3.9/site-packages/numexpr/necompiler.py", line 704, in getExprNames
    ex = stringToExpression(text, {}, context)
  File "/Users/levkonstantinovskiy/Documents/GitHub/langchain-agent-qa/lib/python3.9/site-packages/numexpr/necompiler.py", line 289, in stringToExpression
    ex = eval(c, names)
  File "<expr>", line 1, in <module>
AttributeError: 'VariableNode' object has no attribute 'today'

During handling of the above exception, another exception occurred:



Now it messed up the conference date again. However, today's date is now correct.
Weird!

It is not even clear how to fix this dark magic... :(
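
One possible direction, not tried in this notebook, is to stop asking the LLM for date arithmetic and give the agent a deterministic function instead. A sketch assuming the agent can be prompted to pass an ISO date such as `2023-04-17`; `days_until_tool` is a hypothetical name, and a function with this string-in, string-out shape could be passed as `func` to `Tool` in the same way as the index lambda above.

```python
from datetime import date
from typing import Optional


def days_until_tool(iso_date: str, today: Optional[date] = None) -> str:
    """Return the number of days from today to `iso_date` as a string.

    `today` is injectable so the function can be tested with a fixed date;
    by default the system clock is used.
    """
    today = date.today() if today is None else today
    cleaned = iso_date.strip().strip("\"'")  # agents often quote their input
    try:
        target = date.fromisoformat(cleaned)
    except ValueError:
        # Return a message instead of raising, so the agent can try again.
        return "Invalid date. Please pass a date as YYYY-MM-DD."
    return str((target - today).days)


# Fixed "today" so the example is reproducible.
print(days_until_tool("2023-04-17", today=date(2023, 4, 19)))  # -2
```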

## Self-Evaluation

If we know the correct answer, GPT is really good at grading predictions and doing root cause analysis.

In [15]:
outputs = ["-2", "2", "blue sky", output]

correct_answer = "On 17 April. There are -2 days"
predictions = [{"input": question, "answer": correct_answer, "output": output} for output in outputs]
dataset = [{"input": d['input'], "answer": d['answer'] } for d in predictions ] 


from langchain.evaluation.qa import QAEvalChain
from langchain.prompts import PromptTemplate

eval_template = """You are a teacher grading a quiz.
You are given a question, the context the question is about, and the student's answer. You are asked to score the student's answer as either CORRECT or INCORRECT, based on the context.
Write out in a step by step manner your reasoning to be sure that your conclusion is correct. Avoid simply stating the correct answer at the outset.

Example Format:
QUESTION: question here
CORRECT ANSWER: correct answer
STUDENT ANSWER: student's answer here
EXPLANATION: step by step reasoning here
GRADE: Grade from 0 to 10, where 0 is the lowest (very low similarity) and 10 is the highest (very high similarity)

Grade the student answers based ONLY on their factual accuracy. Ignore differences in punctuation and phrasing between the student answer and true answer. It is OK if the student answer contains more information than the true answer, as long as it does not contain any conflicting statements. Begin! 



QUESTION: {query}
CORRECT ANSWER: {answer}
STUDENT ANSWER: {result}
EXPLANATION:"""
eval_prompt_template = PromptTemplate(
    input_variables=["query", "answer",  "result"], template=eval_template
)
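
The three placeholders in the template are filled per example: `{query}` with the question, `{answer}` with the reference answer, and `{result}` with the prediction being graded. A sketch of that mapping with plain `str.format` on a shortened template; it illustrates how the keys are expected to line up and is an assumption about, not a copy of, what `QAEvalChain` does internally. `build_grader_prompts` is a hypothetical helper.

```python
SHORT_TEMPLATE = (
    "QUESTION: {query}\n"
    "CORRECT ANSWER: {answer}\n"
    "STUDENT ANSWER: {result}\n"
    "EXPLANATION:"
)


def build_grader_prompts(
    examples,
    predictions,
    question_key="input",
    answer_key="answer",
    prediction_key="output",
    template=SHORT_TEMPLATE,
):
    """Pair each example with its prediction and fill the grading template."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    return [
        template.format(
            query=example[question_key],
            answer=example[answer_key],
            result=prediction[prediction_key],
        )
        for example, prediction in zip(examples, predictions)
    ]


examples = [{"input": "How many days?", "answer": "-2"}]
predictions = [{"output": "blue sky"}]
print(build_grader_prompts(examples, predictions)[0])
```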


In [16]:

eval_chain = QAEvalChain.from_llm(llm_eval, prompt=eval_prompt_template)
graded_outputs = eval_chain.evaluate(dataset, predictions, question_key="input", prediction_key="output")




{'text': " The student's answer is correct, as the PyCon DE & PyData Berlin 2023 conference is on 17 April, and there are -2 days between that date and today.\nGRADE: 10"}

Fair grade for its own answer.

In [20]:
print(graded_outputs[3]['text'])

 
1. Check the date of the conference: The correct answer is 17 April, while the student answer is 16 January. 
2. Check the number of days between the conference and today: The correct answer is -2 days, while the student answer is 45017 days. 
GRADE: 0
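
Since the grader is asked to end with a `GRADE:` line, the numeric score can be pulled out of the text for aggregation. A minimal sketch, assuming the model follows the requested format; `parse_grade` is a hypothetical helper and returns `None` when no usable grade is found:

```python
import re
from typing import Optional

GRADE_PATTERN = re.compile(r"GRADE:\s*(\d+)")


def parse_grade(text: str) -> Optional[int]:
    """Extract the last 'GRADE: N' value from grader output, if it is 0-10."""
    matches = GRADE_PATTERN.findall(text)
    if not matches:
        return None
    grade = int(matches[-1])
    return grade if 0 <= grade <= 10 else None


sample = "1. Check the date of the conference: ...\nGRADE: 0"
print(parse_grade(sample))  # 0
```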


For more ways to evaluate QA, see https://github.com/PineappleExpress808/auto-evaluator