<a href="https://colab.research.google.com/github/vanderbilt-data-science/ai_summer/blob/main/week3_langchain_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to Langchain

These examples are mostly from LangChain's [QuickStart Guide](https://python.langchain.com/en/latest/getting_started/getting_started.html)

## Import Libraries

If you are running this in Google Colab, you will need to install the libraries using the cell below:

In [1]:
# Download required packages to Colab
!pip install -q langchain
!pip install -q openai
!pip install -q chromadb
!pip install -q tiktoken
!pip install -q duckduckgo-search


In [2]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, RetrievalQA
from langchain import ConversationChain
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.tools import DuckDuckGoSearchRun
from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper
import os
from getpass import getpass

### OpenAI API key
Get an OpenAI [API token](https://platform.openai.com/docs/api-reference) and save it somewhere secure. The below cell uses the package `getpass` to allow you to enter the API key securely (like you would a password).

*Note: this cell will not finish running until you enter the API key*

In [3]:
openai_api_key = getpass()
os.environ["OPENAI_API_KEY"] = openai_api_key




## Get predictions from language model

### Choose your model
LangChain has a list of available [models](https://python.langchain.com/en/latest/modules/models.html). Choose the one that is best for your application. In this case, we are using OpenAI because we have the API key for it.

In [4]:
llm = OpenAI()

In [6]:
text = "What would be a good company name for a company that makes colorful socks?"
print(llm(text))



Brightly Toes Socks


## Manage Prompts for LLM

LangChain allows you to format prompts to better control user input using the [PromptTemplate](https://python.langchain.com/en/latest/modules/prompts/prompt_templates/getting_started.html) class.

In [7]:
prompt = PromptTemplate(
    input_variables = ["product"],
    template = "What is a good name for a company that makes {product}?"
)

You can use the `format` method to fill in a value for the parameter in brackets.

In [8]:
new_prompt = prompt.format(product="modern clothing")
print(new_prompt)

What is a good name for a company that makes modern clothing?
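For intuition, filling in a template is conceptually close to Python's built-in `str.format`. The sketch below is plain Python, not LangChain's implementation; `fill_template` is a hypothetical helper written only for this illustration. It shows one reason a template class is handy: it can check that every placeholder has a value before the prompt is sent to a model.

```python
# Plain-Python sketch of template filling; NOT LangChain's implementation.
# `fill_template` is a hypothetical helper used only for illustration.
from string import Formatter


def fill_template(template: str, **values: str) -> str:
    """Fill {placeholders} in `template`, failing loudly if a value is missing."""
    # Collect the placeholder names that appear in the template.
    expected = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = expected - set(values)
    if missing:
        raise KeyError(f"Missing values for: {sorted(missing)}")
    return template.format(**values)


template = "What is a good name for a company that makes {product}?"
filled = fill_template(template, product="modern clothing")
print(filled)
```

Calling `fill_template(template)` with no `product` raises a `KeyError` instead of silently producing a broken prompt.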


In [9]:
print(llm(new_prompt))



Urban Boutique.


## Using chains to create a workflow

[Chains](https://python.langchain.com/en/latest/modules/chains/getting_started.html) allow you to combine models and prompts for efficient workflows. This allows you to experiment with different models and prompts, all while using the same chain workflow. The base chain is LLMChain.

In [10]:
chain = LLMChain(llm = llm, prompt = prompt)

In [11]:
chain.run("colorful socks")

'\n\nWackyWearz Socks.'

Use `chain.apply()` to apply a list of inputs to the model.

In [12]:
input_list = [
    {"product": "socks"},
    {"product": "computer"},
    {"product": "shoes"}
]

chain.apply(input_list)

[{'text': '\n\nSockItAway.'},
 {'text': '\n\nTechsters Inc.'},
 {'text': '\n\nFootFashions.'}]
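To make the idea of "one workflow, swappable parts" concrete, here is a minimal offline sketch. It assumes nothing about LangChain's internals: `ToyChain` and `fake_llm` are hypothetical stand-ins, and the "model" simply echoes the prompt so the example runs without an API key.

```python
# Toy stand-in for a prompt + model chain. `ToyChain` and `fake_llm` are
# hypothetical names for illustration; this is not how LLMChain is implemented.
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Pretend model: echoes the prompt so the example runs offline."""
    return f"[reply to: {prompt}]"


class ToyChain:
    def __init__(self, llm: Callable[[str], str], template: str) -> None:
        self.llm = llm
        self.template = template

    def run(self, **inputs: str) -> str:
        # Step 1: fill the template. Step 2: send the prompt to the model.
        return self.llm(self.template.format(**inputs))

    def apply(self, input_list: List[Dict[str, str]]) -> List[Dict[str, str]]:
        # Run the same workflow once per input dictionary.
        return [{"text": self.run(**inputs)} for inputs in input_list]


chain = ToyChain(fake_llm, "What is a good name for a company that makes {product}?")
input_list = [{"product": "socks"}, {"product": "computer"}, {"product": "shoes"}]
results = chain.apply(input_list)
for result in results:
    print(result["text"])
```

Swapping in a different model function or a different template changes the output without touching the chain logic, which is the point the section makes about experimenting with models and prompts.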

### Breakout Exercise (10 mins)
Change the Prompt Template to reflect a different question. Use the LLMChain to run a prediction. 

Next, if you have time, create a list of inputs and use the `apply` method to apply a list of inputs to your model. 

Some example questions if you are having trouble thinking of more:

- "What is a good title for a nonfiction book about {topic}?"
- "Give me a recommendation for a {genre} movie."


In [13]:
## YOUR CODE HERE 

## Chain for Memory and State Changes

Use special chains (such as the ConversationChain) to allow your model to remember past inputs. Note that each chain has its own set of methods to send input (in this case, `predict` instead of `run`).

In [14]:
conversation = ConversationChain(llm = llm, verbose = True)

In [15]:
conversation.predict(input = "Hi there!")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi there!
AI:

> Finished chain.


' Hi! How can I help you?'

In [16]:
conversation.predict(input = "My name is Myranda.")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI:  Hi! How can I help you?
Human: My name is Myranda.
AI:

> Finished chain.


' Nice to meet you, Myranda! Is there anything I can do for you today?'
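The verbose output above shows that "memory" here amounts to replaying earlier turns inside each new prompt. The sketch below imitates that behavior in plain Python. It is an illustration under assumptions, not the ConversationChain source: `ToyConversation` and `scripted_llm` are hypothetical, and the preamble is shortened.

```python
# Plain-Python imitation of a conversation buffer. `ToyConversation` and
# `scripted_llm` are hypothetical; this is not ConversationChain's code.
from typing import Callable, List


def scripted_llm(prompt: str) -> str:
    """Pretend model that returns a fixed reply so the example runs offline."""
    return "Hi! How can I help you?"


class ToyConversation:
    PREAMBLE = "The following is a friendly conversation between a human and an AI."

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.history: List[str] = []  # alternating "Human: ..." / "AI: ..." lines

    def build_prompt(self, user_input: str) -> str:
        # Every prompt replays the full history before the new human turn.
        lines = [self.PREAMBLE, "", "Current conversation:", *self.history,
                 f"Human: {user_input}", "AI:"]
        return "\n".join(lines)

    def predict(self, user_input: str) -> str:
        reply = self.llm(self.build_prompt(user_input))
        # Store both sides of the exchange so the next prompt includes them.
        self.history += [f"Human: {user_input}", f"AI: {reply}"]
        return reply


conversation_sketch = ToyConversation(scripted_llm)
conversation_sketch.predict("Hi there!")
conversation_sketch.predict("My name is Myranda.")
third_prompt = conversation_sketch.build_prompt("What is my name?")
print(third_prompt)
```

Because the whole history is resent on every turn, the prompt grows with the conversation, which is why long chats eventually run into the model's context limit.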

## Other Types of Chains 
There are many chains supported on Langchain for many different tasks. We have just used the general LLMChain and ConversationChain. Read over the complete list of supported chains [here](https://python.langchain.com/en/latest/modules/chains/how_to_guides.html).

If you can't find the chain you're looking for, try making your own [custom chain](https://python.langchain.com/en/latest/modules/chains/generic/custom_chain.html)!

### Breakout Exercise (10 mins)
In your breakout rooms, look over the list of supported chains on Langchain and read their documentation. Do you find any of these chains particularly relevant to your project/interests? Share with your group and discuss what you could use these chains for! Or, describe how you would create a custom chain for your task.

If you would like (and have time), try experimenting with different chains in the code cell below.

In [17]:
## Optional: Experiment with a chain in this cell
## Note: some chains require additional inputs, such as a database

## Agents to Dynamically Call Chains

Agents allow the model to decide which tools it needs to respond to user input. The example below combines an agent with the DuckDuckGo search tool and the math tool llm-math. The model can then use these tools to produce its response.

### Tools and Agents

Choose your tools (see available tools [here](https://python.langchain.com/en/latest/modules/agents/tools.html)). In this example we will use ddg-search (to access the DuckDuckGo search engine) and llm-math (for mathematical operations). The `load_tools` function loads the tools by name; it also takes the LLM because some tools, such as llm-math, use a language model themselves.

In [40]:
# Language model already loaded in previous cells
# Now we add tools to the llm

tools = load_tools(["ddg-search", "llm-math"], llm = llm)

Next, we initialize our Agent by passing the `initialize_agent` function the tools, model, and agent type. You can read more about [agent types](https://python.langchain.com/en/latest/modules/agents/agents/agent_types.html) and their purposes, but for now we will use a general purpose one called ZERO_SHOT_REACT_DESCRIPTION. This Agent decides which tools to use based on the tools' built-in descriptions.

In [41]:
# initialize agent with tools, llm, and agent type
agent = initialize_agent(tools, llm, agent = AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose = True)

In [43]:
agent.run("What was the high temperature in Nashville yesterday in Fahrenheit? What is that number raised to the 0.23 power?")



> Entering new AgentExecutor chain...
I should use DuckDuckGo Search to find out the temperature in Nashville.
Action: DuckDuckGo Search
Action Input: High temperature in Nashville yesterday in Fahrenheit
Observation: Nashville Temperature Yesterday. Maximum temperature yesterday: 76 °F (at 12:53 pm) Minimum temperature yesterday: 67 °F (at 3:53 am) Average temperature yesterday: 71 °F. High & Low Weather Summary for the Past Weeks Temperature Humidity Pressure; High: 86 °F (May 7, 1:53 pm) 94% (May 7, 8:23 pm) Current Weather. 6:26 PM. 71° F. RealFeel® Sun 70°. RealFeel Shade™ 70°. Air Quality Fair. Wind NW 5 mph. Wind Gusts 11 mph. Mostly cloudy More Details. The high temperature in Nashville has hit the century mark for the first time in 10 years according to the National Weather Service. ... At 2:44 p.m., Nashville's temperature reached 101°. This ... Temperatures this morning are starting off in the mid to upper 40's which is 20 degrees war

'The temperature in Nashville yesterday raised to the 0.23 power is 2.7076163393689234.'
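The trace above follows a think, act, observe pattern: the model names a tool and an input, the tool runs, and the result is handed back. The sketch below imitates that loop offline. Everything in it is an assumption made for illustration: the model turns are scripted rather than generated, `fake_search` and `power_tool` are hypothetical stand-ins for the real tools, and a real agent would feed each observation back to the model before the next turn. It also serves as a sanity check, since the final answer above is consistent with 76 raised to the 0.23 power.

```python
# Offline imitation of an agent loop. The model turns are scripted and the
# tools are hypothetical stand-ins; this is not LangChain's AgentExecutor.
import re
from typing import Callable, Dict, List, Tuple


def fake_search(query: str) -> str:
    """Stand-in for a web search; returns a canned snippet."""
    return "Maximum temperature yesterday: 76 °F"


def power_tool(expression: str) -> str:
    """Stand-in calculator that only understands 'base ^ exponent'."""
    base, exponent = (float(part) for part in expression.split("^"))
    return str(base ** exponent)


TOOLS: Dict[str, Callable[[str], str]] = {"Search": fake_search, "Calculator": power_tool}

# Scripted model output, one string per turn.
SCRIPT = [
    "Thought: I need the temperature.\nAction: Search\n"
    "Action Input: high temperature in Nashville yesterday",
    "Thought: Now I raise it to the power.\nAction: Calculator\nAction Input: 76 ^ 0.23",
    "Thought: I know the answer.\nFinal Answer: 76 °F raised to the 0.23 power is about 2.71",
]


def run_agent(script: List[str], tools: Dict[str, Callable[[str], str]]) -> Tuple[str, List[str]]:
    transcript: List[str] = []
    for turn in script:
        final = re.search(r"Final Answer:\s*(.*)", turn)
        if final:
            return final.group(1), transcript
        # Parse which tool the "model" chose and what input it wants to send.
        action = re.search(r"Action:\s*(.*)", turn).group(1).strip()
        action_input = re.search(r"Action Input:\s*(.*)", turn).group(1).strip()
        transcript.append(f"Observation: {tools[action](action_input)}")
    raise RuntimeError("script ended without a final answer")


answer, transcript = run_agent(SCRIPT, TOOLS)
print(*transcript, answer, sep="\n")
```

Comparing the calculator observation with the number in the agent's final answer is a quick way to confirm which temperature the agent actually used.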

### EXTRA Breakout Exercise (10 mins)
Using the agent we created, come up with a new question to ask the model. What decisions did the agent make? Did it arrive at the right answer? Be prepared to share interesting results!

If you have time, play around with the tools you give the agent by adding new tools or taking away previous tools. How do its answers change? 

## Indexes

Besides sending a prompt as input to a model, we can also give our model documents for analysis. However, sending an entire document as prompt quickly uses up the maximum context length. To solve this, we use indexes. 

[Indexes](https://python.langchain.com/en/latest/modules/indexes/getting_started.html) are a more memory-efficient way of storing documents that allow Langchain to look up relevant information from the document and only pass that snippet to the model. There are four main components of an Index: **Document Loader** (loading the document), **Text Splitter** (splitting text into smaller chunks), **Vector Store** (storing texts as vectors), and **Retriever** (retrieving the relevant text). There are several ways to implement this which we will further explore later. For now, let's look at an example.  

In [22]:
# Download example text file to Colab
!wget https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/state_of_the_union.txt

--2023-05-18 21:40:42--  https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/state_of_the_union.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39027 (38K) [text/plain]
Saving to: ‘state_of_the_union.txt’


2023-05-18 21:40:42 (10.8 MB/s) - ‘state_of_the_union.txt’ saved [39027/39027]



### Load Documents
Langchain has many types of Document Loaders, including ones for text files, PDFs, and Python files. In this example, we are working with a text file, so we will use the TextLoader. 

If you look at the cell toward the beginning of the notebook where we imported libraries, you will see we have already imported the TextLoader with the line `from langchain.document_loaders import TextLoader`. 

In [23]:
loader = TextLoader('./state_of_the_union.txt', encoding='utf8')

### Quick Start: VectorStoreIndexCreator

Langchain includes the class `VectorstoreIndexCreator`, which does exactly what the name suggests: it creates the index vector store for you. This class combines the Text Splitter, Vector Store, and Retriever all in one step. Just send it your document loader, and it will return the index created from that document.

In [29]:
index = VectorstoreIndexCreator().from_loaders([loader])



We will explore exactly what this class does later (and customize our own index), but for now, let's look at how we can use the index we've created. The index object returned by `from_loaders` has a method called `query` that lets us pass a prompt to the model.

In [30]:
prompt = "What did the president say about Ketanji Brown Jackson"
index.query(prompt)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

### More Detail: Text Splitter, Vector Store, and Retriever

#### First Step: Document Loader

First, load your documents as you did previously.

In [31]:
loader = TextLoader('./state_of_the_union.txt', encoding='utf8')

#### Second Step: Text Splitter

Next, we need to split up the text into smaller chunks, which will later be mapped to an embedding.

In [32]:
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
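To see what "splitting into chunks" means in practice, here is a simplified sketch of a separator-based splitter. It is an approximation written for this notebook, not the CharacterTextSplitter source: `split_text` is a hypothetical function, it ignores overlap, and it keeps an oversized paragraph whole rather than cutting it.

```python
# Simplified sketch of separator-based splitting; `split_text` is hypothetical
# and is not the CharacterTextSplitter implementation (no overlap handling).
from typing import List


def split_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[str]:
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Keep growing the current chunk while it still fits
            # (a single oversized piece is kept whole).
            current = candidate
        else:
            # The next piece would overflow, so close this chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "\n\n".join(["a" * 40, "b" * 40, "c" * 40])
chunks = split_text(sample, chunk_size=100)
print([len(chunk) for chunk in chunks])
```

Splitting on paragraph boundaries instead of at fixed character offsets keeps sentences intact, which tends to make each chunk more meaningful when it is embedded on its own.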

#### Third Step: Vector Store

Now you create the Vector Store. We are using one called ChromaDB because it is free to use and runs locally without an API key. There are many other [Vector Stores](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html) worth looking up, however.

The vector store does not search the raw text directly. Instead, it indexes each chunk by its embedding, a numeric vector produced by an embedding model. In this case, we want to use the OpenAI embeddings. Chroma computes and stores the embeddings when we pass it the split texts we've created along with the embedding function.

In [33]:
embeddings = OpenAIEmbeddings()

In [34]:
from langchain.vectorstores import Chroma
db = Chroma.from_documents(texts, embeddings)



#### Fourth Step: Retriever

Now, we need to expose our database as a retriever so that our model can find relevant information efficiently.

In [35]:
retriever = db.as_retriever()
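Conceptually, a retriever embeds the query and returns the stored chunks whose vectors are most similar to it. The sketch below shows that idea with a deliberately crude embedding (word counts) and cosine similarity. The functions `embed`, `cosine`, and `retrieve` are hypothetical and far simpler than what Chroma and the OpenAI embeddings do; only the ranking idea carries over.

```python
# Toy similarity search. `embed`, `cosine`, and `retrieve` are hypothetical;
# real embeddings are dense vectors produced by a model, not word counts.
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Crude embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    query_vector = embed(query)
    # Rank every chunk by similarity to the query and keep the top k.
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "The president nominated a judge to the Supreme Court",
    "Inflation and the economy were discussed at length",
    "Funding for public schools will increase",
]
top = retrieve("who did the president nominate to the court", toy_chunks)
print(top[0])
```

Only the top-ranked chunks are passed on to the model, which is how an index keeps the prompt well under the context limit even for long documents.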

#### Fifth Step: Create a chain to use our Index!

Now we need to choose a chain to interact with our index. A useful chain for asking questions over a Vector Store is the RetrievalQA chain. 

In [36]:
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

In [37]:
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

As you can see, `VectorstoreIndexCreator` wrapped up a lot of the hard work for us! Each of the individual parts shown above (splitter, embeddings, vector store, retriever, and chain) can be customized.
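One of those customizable parts is the `chain_type="stuff"` argument used in the RetrievalQA cell. The "stuff" approach places all retrieved chunks into a single prompt together with the question. The sketch below illustrates only that idea: `build_stuff_prompt` is a hypothetical helper, and the prompt wording is an assumption rather than LangChain's exact template.

```python
# Sketch of the "stuff" idea: put every retrieved chunk into one prompt.
# `build_stuff_prompt` and the prompt wording are assumptions for illustration.
from typing import List


def build_stuff_prompt(question: str, retrieved_chunks: List[str]) -> str:
    # Join the chunks into a single context block, separated by blank lines.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


stuffed_prompt = build_stuff_prompt(
    "What did the president say about the nominee?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(stuffed_prompt)
```

This approach is simple, but it only works while the combined chunks fit within the model's context length.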

## Finally: AI-assisted Programming with Langchain

We have covered a lot in this notebook! You could easily spend hours reading all the documentation on Langchain. This is where we could use some AI-assisted programming! Unfortunately, most of the LLMs (like Bard, BingGPT, or ChatGPT) don't have recent enough information (as of writing this notebook...) to know about Langchain.

*But*... what if we could *use Langchain to solve this problem*?? 

**Class Question**: What could be a good workflow to create a bot that could answer questions about Langchain? Tell me your ideas!!

### REVEAL ANSWER

How about this:
1. Use Langchain to create an Index of Langchain documentation.
2. Implement a QA chain to ask questions about the documentation.

*In fact, we already did this! (And so did Langchain...)*

### Introducing: Langchain Assistant

We at the DSI have created a bot to answer Langchain questions, and it is hosted on Hugging Face: [try it out](https://huggingface.co/spaces/vanderbilt-dsi/langchain-assistant).

Because Langchain is so new and very fast-paced, Langchain developers have also created a chatbot for Langchain documentation (and yes, it's also built with Langchain). To access their bot, go to the [Langchain Documentation](https://python.langchain.com/en/latest/index.html) and click the parrot emoji at the bottom right (or type CTRL-K/CMD-K).

### Breakout Exercise (10 mins)
Try out either the DSI's Langchain Assistant or Langchain's own version on their website. What tasks would you like to do with Langchain? Does the bot produce code/answers that are helpful? Does one bot perform better than the other? (Don't worry, you won't hurt our feelings.)

*Note: these bots work best with a prompt along the lines of "write Python code that uses Langchain to {do something}"*

If you have time, look at the underlying code for one of these bots ([DSI app](https://huggingface.co/spaces/vanderbilt-dsi/langchain-assistant/blob/main/app.py) or Chat Langchain's [document loader](https://github.com/hwchase17/chat-langchain/blob/master/ingest.py) and [data query](https://github.com/hwchase17/chat-langchain/blob/master/query_data.py)). What do you notice about it? Do you recognize any of the concepts we've talked about at AI Summer?