<a href="https://colab.research.google.com/github/taisazero/langchain-crashcourse/blob/main/notebooks/3-retrieval_aug_gen.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Lab Activity 2: Retrieval Augmented Generation

In [None]:
!pip install -q lancedb

In [20]:
import lancedb
from langchain.vectorstores import LanceDB
from langchain.embeddings import OpenAIEmbeddings
from dotenv import load_dotenv, find_dotenv

# Load the API keys from the nearest .env file.
_ = load_dotenv(find_dotenv())

In [None]:
!pip install gdown

In [None]:
import gdown

# Download the shared Drive folder into the current directory.
gdown.download_folder(
    "https://drive.google.com/drive/folders/1SkI0ttpMNTVHp6ear6cTLDooRmtqvmVo?usp=sharing",
    quiet=True,
    output="./",
)

In [None]:
db = lancedb.connect('.lancedb')
table = db.open_table('hf_docs')
# TODO: Create an OpenAIEmbeddings object and a LanceDB vectorstore instantiated with the table and OpenAIEmbeddings.
embedding_fn = OpenAIEmbeddings(chunk_size=200)
vectorstore = LanceDB(table, embedding_fn)
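
To make the retrieval step less of a black box, here is a minimal sketch of what a vector-store retriever does conceptually: score every stored embedding against the query embedding and keep the top `k`. It assumes cosine similarity and uses made-up 3-dimensional vectors; the document names, the vectors, and the `top_k` helper are all hypothetical, and the metric LanceDB actually uses depends on how the table is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=5):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy embeddings (hypothetical); real OpenAI embeddings have far more dimensions.
toy_docs = {
    "peft_guide": [0.9, 0.1, 0.0],
    "trainer_guide": [0.1, 0.9, 0.0],
    "tokenizer_guide": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]
print(top_k(toy_query, toy_docs, k=2))
```

In the lab itself this work is done by `vectorstore.as_retriever(...)`; the sketch only illustrates the idea behind `k`.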

## Experiment: Retrieval-Augmented Generation

TODO: Write a Python snippet that builds a `RetrievalQA` chain around a `ChatOpenAI` llm with a temperature of 0, using the vectorstore's retriever, and prompt it with the question:

> "Can you describe PEFT from the transformers library?"

Store this question in a variable called `query`.

Then, print the answer returned by the chain.

In [None]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# TODO: Create a RetrievalQA object with the ChatOpenAI model, the vectorstore's retriever, and a chain_type of "stuff".
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0, model='gpt-3.5-turbo-16k'),
    chain_type="stuff",  # "stuff" places all retrieved documents into a single prompt
    retriever=vectorstore.as_retriever(search_kwargs=dict(k=5)),  # retrieve the top 5 documents
    verbose=True,
    return_source_documents=True,
)

In [None]:
# TODO: Ask a question about the transformers library.
query = "Can you describe PEFT from the transformers library?"
answer = qa({"query": query})

In [None]:
answer

In [None]:
print(answer['result'])
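
The `"stuff"` chain type used above places all retrieved documents into a single prompt together with the question. The sketch below illustrates that idea in plain Python. The prompt wording and the `build_stuff_prompt` helper are assumptions made for illustration, not LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate every retrieved document into one context block (hypothetical template)."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder strings standing in for the text of retrieved documents.
retrieved = ["First retrieved passage.", "Second retrieved passage."]
print(build_stuff_prompt("Can you describe PEFT from the transformers library?", retrieved))
```

Because every document goes into one prompt, the value of `k` and the length of each document determine how quickly the model's context window fills up.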

## Experiment: Vanilla Generation

TODO: Write a Python snippet that creates an llm using the ChatOpenAI object with a temperature of 0 and prompts it with `query`, a string that asks the language model a question related to the Hugging Face libraries. Then, print the output of the llm.

In [None]:
# TODO: Write your code here.
llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo-16k')
print(llm.predict(query))

## Observations and Insights
Write your observations and insights here. What do you think about the quality and correctness of the generated text from the two experiments?

## More Prompts
Now try querying the vanilla llm and the `RetrievalQA` chain with different questions related to the [**Hugging Face** libraries](https://huggingface.co/docs/transformers/index). What do you observe? Two prompts are given below to try out first. Add your own prompts in new cells and try them out.

In [None]:
query = "Can you give me a crash course on using the HF Trainer to train a GPT-style model?"
answer = qa({"query": query})
print(answer['result'])

In [None]:
query = "Can you give me a crash course on using the HF Trainer to train a GPT-style model? Provide code snippets for each step."
answer = qa({"query": query})
print(answer['result'])

In [None]:
query = "Can you give me a crash course on using the HF Trainer to train a GPT-style model?"
print(llm.predict(query))

In [None]:
# Assign the new prompt to `query`; otherwise the previous query would be sent again.
query = "Can you give me a crash course on using the HF Trainer to train a GPT-style model? Provide code snippets for each step."
print(llm.predict(query))

## Increasing the Temperature
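
As background for this experiment, here is a minimal sketch of the standard way temperature is applied when sampling: the logits are divided by the temperature before the softmax, so a higher temperature flattens the distribution over next tokens. The logits are made-up numbers and `softmax_with_temperature` is a hypothetical helper; how the OpenAI API applies temperature internally is not shown here. A temperature of exactly 0 cannot be plugged into this formula and is usually treated as always picking the most likely token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

The probability of the top-scoring token shrinks as the temperature grows, which is why outputs become more varied.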

In [None]:
llm = ChatOpenAI(temperature=0.8, model='gpt-3.5-turbo-16k')
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs=dict(k=5)),
    verbose=True,
    return_source_documents=True,
)

In [None]:
#TODO: Add your queries/questions here.
query = "Can you give me a crash course on using the HF Trainer to train a GPT-style model? Provide code snippets for each step."
# Retrieval-augmented answer
answer = qa({"query": query})
print(answer['result'])
# Vanilla answer for comparison
print(llm.predict(query))

Now try querying the vanilla llm and the `RetrievalQA` chain with the same questions after we increased the temperature. What do you notice?

In [None]:
#TODO: Add the queries here.

## Observations and Insights
Write your observations and insights here.
Food for thought:
* What was the impact of the instruction/prompt on the quality of the generated text?
* What did you observe when increasing the temperature of the llm in the retrieval-augmented generation setting?
* What do you think are the advantages and disadvantages of using retrieval-augmented generation?

## Using Internet Search Engines to Augment Retrieval

Hypothesis: Retrieving documents through a web search engine will be more useful than the local documentation index for questions that require a lot of background knowledge from multiple sources, such as "what is the best way to train a language model?", "how do I ... do in HF" and "crash course" requests. This approach lets us retrieve more relevant documents from the internet and cite them in our answer.

In [None]:
!pip install -q google-api-python-client
!pip install -q chromadb

## Setup

1. To create an API key: Go to https://console.cloud.google.com/welcome/new and sign in/up using your personal Google account. 
   - Navigate to the APIs & Services→Credentials panel in Cloud Console. 
   - Select Create credentials, then select API key from the drop-down menu. 
   - The API key created dialog box displays your newly created key. 
   - You now have an API_KEY. Store it in your `.env` file as `GOOGLE_API_KEY`.

2. Set up a Custom Search Engine so you can search the entire web 
   - Create a custom search engine at this [link](http://www.google.com/cse/). 
   - Under Sites to search, select search the entire web. Also enable SafeSearch and fill in your search engine name (it can be anything).
   - Click on customize.
   - Under Search engine ID you’ll find the search-engine-ID. Copy that and add it to your `.env` file as `GOOGLE_CSE_ID`.

3. Enable the Custom Search API 
   - Go to this [URL](https://console.cloud.google.com/apis/library/customsearch.googleapis.com) & click on Enable.
   - Alternatively, navigate to the APIs & Services→Dashboard panel in Cloud Console. 
   - Click Enable APIs and Services. 
   - Search for Custom Search API and click on it. 
   - Click Enable. 

**Note**: Adapted from [GoogleSearchAPIWrapper Docs](https://api.python.langchain.com/en/latest/utilities/langchain.utilities.google_search.GoogleSearchAPIWrapper.html)
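
Before moving on, it can help to confirm that both values from the steps above are visible to Python. The sketch below is one way to check; `missing_env_vars` is a hypothetical helper, and it assumes the `.env` file has already been loaded (for example with `load_dotenv`).

```python
import os

# The two variable names the setup steps ask you to add to your .env file.
REQUIRED_VARS = ("GOOGLE_API_KEY", "GOOGLE_CSE_ID")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Google search credentials are set.")
```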

In [21]:
#TODO: Reload the environment variables so the newly added Google keys are picked up
_ = load_dotenv(find_dotenv())

In [22]:
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.vectorstores import Chroma
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI

# Vectorstore
vectorstore = Chroma(embedding_function=OpenAIEmbeddings(), persist_directory="./chromadb_oai")

# LLM
llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo-16k', max_tokens=512)

search = GoogleSearchAPIWrapper()

In [23]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm,
    search=search,
)

In [None]:
from langchain.chains import RetrievalQAWithSourcesChain
query = "How can I finetune a Llama2 7b model on Google Colab?"
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)
result = qa_chain({"question": query})
result

In [None]:
from langchain.chains import RetrievalQAWithSourcesChain
query = "What are some recent news articles about OpenAI?"
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)
result = qa_chain({"question": query})
result
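
The raw `result` dictionary can be hard to read. The sketch below formats it for display. It assumes the chain's output has an `"answer"` key and a `"sources"` key holding a comma-separated string; `format_answer_with_sources` is a hypothetical helper and the sample dictionary is made up, so adjust it to whatever your `result` actually contains.

```python
def format_answer_with_sources(result):
    """Render an answer followed by a bulleted list of its sources."""
    answer = result.get("answer", "").strip()
    raw_sources = result.get("sources", "")
    # Assumption: sources arrive as one comma-separated string.
    sources = [s.strip() for s in raw_sources.split(",") if s.strip()]
    lines = [answer, "", "Sources:"]
    lines += [f"- {s}" for s in sources] or ["- (none returned)"]
    return "\n".join(lines)

# Made-up example output, only to show the formatting.
sample_result = {
    "answer": "An example answer.\n",
    "sources": "https://example.com/a, https://example.com/b",
}
print(format_answer_with_sources(sample_result))
```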

## Observations and Insights
Write any observations here.
Food for thought:
* Compare and contrast this approach with the previous Hugging Face question-answering chain. What are the limitations of each approach? Can the previous approach search the entire web? Can this approach generate precise answers?
* Suppose we want to retrieve more relevant documents from the internet and also generate complete answers to the user's question, so that the user does not have to read the retrieved documents. What are some things we can do to make this approach do so?

In [None]:
%%writefile ret_aug_app.py
from chainlit.config import config
import chainlit as cl
from langchain.memory import ConversationBufferMemory
from langchain.agents import AgentExecutor
from langchain.agents.initialize import initialize_agent
from langchain.agents import load_tools
from chainlit import on_message, on_chat_start
from langchain.chat_models import ChatOpenAI
from langchain.agents import AgentType
from langchain.tools import Tool
import lancedb
from langchain.vectorstores import LanceDB
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA


#TODO: load environment variables from your .env file
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

@cl.on_chat_start
async def start():
    # TODO: initialize LLM (we use ChatOpenAI because we'll later define a `chat` agent), 
    # Hint: How can you enable live generation of the agent's messages?
    # BEGIN Writing
    llm = ChatOpenAI(
        temperature= 0,
        model_name='gpt-3.5-turbo-16k',
        max_tokens= 750,
        streaming=True
    )
    #TODO: create a memory, load tools, and initialize an agent with the type `CHAT_CONVERSATIONAL_REACT_DESCRIPTION`
    # Tip: Set `return_messages` to `True` in the memory so that the chat history is returned as a list of messages, which the chat agent expects
    conversational_memory = ConversationBufferMemory(
        llm=llm,
        input_key='input',
        memory_key='chat_history',
        max_token_limit=250,
        return_messages=True,
    )
    #TODO: load the following tools ['llm-math', 'terminal', 'python_repl', 'serpapi']
    tools = load_tools(['llm-math', 'terminal', 'python_repl', 'serpapi'], llm=llm)
    #TODO: connect to `.lancedb` and open the `hf_docs` table, then create a LanceDB vectorstore with the table 
    # and the OpenAIEmbeddings object with chunk_size 200.
    db = lancedb.connect('.lancedb')
    table = db.open_table('hf_docs')
    embedding_fn = OpenAIEmbeddings(chunk_size=200)
    vectorstore = LanceDB(table, embedding_fn)
    #TODO: initialize your retrievalQA chain using the LanceDB `vectorstore` and the `llm` you initialized above
    # pass in chain_type of "stuff", and retrieve the top 5 documents.
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff", retriever=vectorstore.as_retriever(search_kwargs=dict(k=5)),
        verbose=True
    )
    # TODO: Create a tool that uses the `qa` chain we just initialized
    # Hint: Use the Tool.from_function method
    hf_docs_qa_tool = Tool.from_function(
        qa.run, name="huggingface documentation search",
        description="Use when you need to search the Hugging Face documentation for an answer to a question."
    )
    # We add the tool to the tools list
    tools.append(hf_docs_qa_tool)
    #TODO: initialize agent
    # Hint: There is a parameter that sets the maximum number of iterations;
    # it limits the number of messages the agent generates and hence the amount spent on OpenAI credits.
    # Hint: CHAT_CONVERSATIONAL_REACT_DESCRIPTION needs the memory to return its messages.
    # Hint: How could you enable the agent to adapt to parsing issues it encounters?
    agent = initialize_agent(
        llm=llm,
        memory=conversational_memory,
        tools=tools,
        verbose=True,
        agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION, 
        handle_parsing_errors=True,
        max_iterations=3,
    )
    
    # END Writing
    # Store the agent executor in the user session so that on_message can retrieve it
    cl.user_session.set("agent", agent)

@cl.on_message
async def on_message(message):
    agent = cl.user_session.get("agent")
    response = await cl.make_async(agent.run)(
        input=message, callbacks=[cl.LangchainCallbackHandler()]
    )
    await cl.Message(content=response).send()


Now, let's serve the chainlit web interface with our agent. Run the cell below to start the chainlit web interface. You should see a link to the web interface. Click on the link to open the web interface in a new tab.

In [None]:
# Note: if this cell reports that the port is already in use, change the port number to something else, e.g. 8002.
!chainlit run ret_aug_app.py --port 8001

## Colab Setup
If you are using Colab, do not run the cell above; run the cell below instead. It will output a link and an IP address. Open the link to access your chainlit UI application. The page will first prompt you for the IP address; enter it and click on connect. You should now be able to see the chainlit UI.

In [None]:
!npm install localtunnel
!chainlit run /content/ret_aug_app.py &>/content/logs.txt & npx localtunnel --port 8000 & curl ipv4.icanhazip.com

## Additional Tasks
If you finish early, you can try to complete the following tasks:
* Add a prompt to the retrieval QA chain that instructs the model to formulate search-engine-friendly questions from the user query. Use the generated questions to query the search engine and retrieve documents. Try different prompts and instructions.
* Add your best retrieval QA chain as a tool to your agent from the last lab activity. Hint: Look into defining custom tools in the [langchain documentation](https://python.langchain.com/docs/modules/agents/tools/custom_tools).
* Re-run the experiments with Llama2, compare and contrast the outputs based on correctness, helpfulness and conciseness.
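
As a possible starting point for the first task, here is a sketch of a prompt that asks the model for search queries, plus a parser for a numbered-list reply. The prompt wording, `build_search_query_prompt`, and `parse_numbered_queries` are all hypothetical; the assumption is that you pass the prompt to your llm yourself and that it replies with one query per numbered line.

```python
import re

# Hypothetical instruction template; experiment with the wording.
SEARCH_QUERY_PROMPT = (
    "You are helping a user search the web.\n"
    "Rewrite the request below as {n} short search engine queries, "
    "one per line, numbered 1 to {n}.\n\n"
    "Request: {user_query}"
)

def build_search_query_prompt(user_query, n=3):
    """Fill the template with the user's request."""
    return SEARCH_QUERY_PROMPT.format(n=n, user_query=user_query)

def parse_numbered_queries(llm_output):
    """Extract the text of lines that look like '1. query' or '2) query'."""
    queries = []
    for line in llm_output.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            queries.append(match.group(1).strip())
    return queries

print(build_search_query_prompt("How can I finetune a Llama2 7b model on Google Colab?"))
# A made-up reply, only to exercise the parser.
print(parse_numbered_queries("1. finetune llama 2 7b colab\n2) llama 2 qlora tutorial"))
```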

## References
* https://lancedb.github.io/lancedb/notebooks/code_qa_bot/
* https://github.com/langchain-ai/chat-langchain/tree/master