# Building a RAG application from scratch using LangChain

Let's start by loading the environment variables we need to use.

In [28]:
# import packages
import os
from dotenv import load_dotenv
import config as cfg
load_dotenv()

# set keys and fixed variables
OPENAI_API_KEY = cfg.openai['key']
PINECONE_API_KEY = cfg.pinecone['key']
# YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=cdiD-9MMpb0"

## LangChain

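[LangChain](https://python.langchain.com/docs/get_started/introduction) is a framework for developing applications powered by language models. It enables applications that:
* Are context-aware: they connect a language model to sources of context, such as documents or prompt instructions
* Reason: they rely on a language model to decide how to answer based on the provided context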

For example, to build a RAG system, we can do the following easily:
1. Loading - to load documents to add context
2. Indexing - break down the information into an LLM consumable form
3. Storing - store loaded documents for later use
4. Querying - query from the indexed documents
5. Evaluation - objective measurement of accuracy, speed and faithfulness of the responses
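
As a rough illustration of the indexing step, here is a minimal sketch of splitting a long text into overlapping chunks. The function name `split_into_chunks` and the `chunk_size` and `overlap` values are hypothetical choices for this example; this notebook does not use them, and LangChain ships its own text splitters.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "word " * 100  # 500 characters of filler text
chunks = split_into_chunks(sample, chunk_size=200, overlap=50)
print(len(chunks), [len(chunk) for chunk in chunks])
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one of the neighbouring chunks, at the cost of storing some text twice.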

Using LCEL (LangChain Expression Language), we can compose complex chains from basic components with the `|` operator.
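
To build intuition for this kind of composition, here is a toy sketch in plain Python. It is not LangChain's implementation: the `Step` class and the three stand-in steps are hypothetical and only show how overloading `|` can turn small components into a pipeline.

```python
class Step:
    """Toy stand-in for a chain component: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for a prompt, a model, and an output parser.
toy_prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}")
toy_model = Step(lambda text: {"content": text.upper()})  # fake "model": just upper-cases
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke({
    "context": "Mary's sister is Susana",
    "question": "Who is Mary's sister?",
})
print(result)
```

Each step only needs to accept the previous step's output, which is why components can be swapped without touching the rest of the pipeline.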


## How do we set up a chain in LangChain?

We can now chain the prompt with the model and the output parser.

<img src='images/chain2.png' width="1200">

#### Prompt

In [29]:
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt.format(context="Mary's sister is Susana", question="Who is Mary's sister?")

'Human: \nAnswer the question based on the context below. If you can\'t \nanswer the question, reply "I don\'t know".\n\nContext: Mary\'s sister is Susana\n\nQuestion: Who is Mary\'s sister?\n'

#### Model

In [30]:
from langchain_openai.chat_models import ChatOpenAI
model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")

#### Parser

In [31]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

In [32]:
chain = prompt | model | parser
chain.invoke({
    "context": "Mary's sister is Susana",
    "question": "Who is Mary's sister?"
})

'Susana'

#### Same input into ChatGPT directly
https://chat.openai.com/share/ea21e649-7f6c-49be-aafe-b993339dbce4

## How does RAG (retrieval-augmented generation) work?
1. Embed the question and the document chunks
2. Search for the chunk embeddings most similar to the question embedding
3. Send the most relevant chunks to the model as context

Given a particular question, we need to find the relevant chunks from the transcription to send to the model. Here is where the idea of **embeddings** comes into play.

To provide the model with the most relevant chunks, we can compute the similarity between the embedding of the question and the embeddings of the transcription chunks. We can then select the chunks with the highest similarity to the question and use them as the context for the model:

<img src='images/system3.png' width="1200">

Let's generate embeddings for an arbitrary query:

### Embedding

[Cohere's Embed Playground](https://dashboard.cohere.com/playground/embed)

An embedding is a mathematical representation of the semantic meaning of a word, sentence, or document. It's a projection of a concept in a high-dimensional space. Embeddings have a simple characteristic: the projections of related concepts lie close to each other, while concepts with different meanings lie far apart. You can use Cohere's Embed Playground to visualize embeddings in two dimensions.

<img src='images/cohere.png' width="1200">
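
One common way to measure how "close" two embeddings are is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings come from a model and have many more dimensions, and the source does not state which similarity measure a given vector store uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (assumptions, not real model output).
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "truck": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

sim_car_truck = cosine_similarity(toy_embeddings["car"], toy_embeddings["truck"])
sim_car_banana = cosine_similarity(toy_embeddings["car"], toy_embeddings["banana"])
print(f"car vs truck:  {sim_car_truck:.3f}")
print(f"car vs banana: {sim_car_banana:.3f}")
```

With these toy values, the related pair scores far higher than the unrelated pair, which is the property a similarity search relies on.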

To learn in detail about embeddings (definitions, calculations, embedding models, how they are fed into transformer models, and more), read Vicki Boykis' [explanatory paper on embeddings](https://vickiboykis.com/what_are_embeddings/).

#### Setup Embeddings

In [33]:
from langchain_openai.embeddings import OpenAIEmbeddings

# embeddings = OpenAIEmbeddings(openai_api_key = cfg.openai['key'])
embeddings = OpenAIEmbeddings(openai_api_key = OPENAI_API_KEY)
embedded_query = embeddings.embed_query("What is ServiceNow like?")

print(f"Embedding length: {len(embedded_query)}")
print(embedded_query[:10])

Embedding length: 1536
[0.008223684762386727, -0.013549694284807609, 0.0005700972575568463, 0.00029737736099228585, -0.03381981806661732, 0.004217274153541164, -0.019467482953493903, -0.027970052269590436, -0.011971617492755784, -0.01582158033966224]


### Set up the documents

In [18]:
from langchain_community.document_loaders import DirectoryLoader

directory = DirectoryLoader('autofocus')

In [19]:
documents = directory.load()

In [47]:
documents[1]

Document(page_content="Now honestly you can't go more than about 60 seconds in New York City without seeing a Cadillac escalade or something like that. And it's not that they're popular to drive or buy. It's just that they're popular to ride in like as Uber blacks, limousines, rideshare services, like these things are everywhere. And I've been saying they should make an electric escalade because all it just matches up too well. Like what do you expect from an escalade when you get in that thing? You're like okay it's an escalade. This should be big. It should be luxurious. It should be smooth. It should be quiet. And it'll probably be pretty expensive. And what do you expect out of a brand new electric vehicle like an SUV? Well it'll probably be big and heavy but also it should be smooth. It should be quiet. And it'll probably be expensive. Like it just matches too well. And so guess what? I just got back from the city where GM's stood there unveiling of the first electric escalade. It

## Setting up a Vector Store

We need an efficient way to store document chunks, their embeddings, and perform similarity searches at scale. To do this, we'll use a **vector store**.

A vector store is a database of embeddings that specializes in fast similarity searches. 

<img src='images/system4.png' width="1200">
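
Conceptually, a vector store keeps texts next to their vectors and ranks them against a query vector. The class below is a brute-force toy written for this explanation, with hypothetical names and hand-made 2-dimensional vectors. It is not how `DocArrayInMemorySearch` or Pinecone are implemented; production stores typically rely on index structures so they do not have to compare the query against every stored vector.

```python
import math
from dataclasses import dataclass, field


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class ToyVectorStore:
    """Keeps texts and vectors side by side and searches by brute force."""
    texts: list = field(default_factory=list)
    vectors: list = field(default_factory=list)

    def add(self, text, vector):
        self.texts.append(text)
        self.vectors.append(vector)

    def similarity_search_with_score(self, query_vector, k=2):
        # Score every stored vector, then keep the k highest-scoring texts.
        scored = [(text, _cosine(query_vector, vec)) for text, vec in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add("The Kia EV9 is a three-row electric SUV.", [0.9, 0.1])
store.add("Lotus revealed a new electric sedan.", [0.6, 0.6])
store.add("Minivans are practical for families.", [0.1, 0.9])

results = store.similarity_search_with_score([1.0, 0.0], k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```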

To understand how a vector store works, let's create one in memory from the loaded documents and run a similarity search against it:

In [34]:
from langchain_community.vectorstores import DocArrayInMemorySearch

In [35]:
vectorstore_autofocus = DocArrayInMemorySearch.from_documents(documents, embeddings)

In [36]:
vectorstore_autofocus.similarity_search_with_score(query="Which EVs are best?", k=10)

[(Document(page_content="Okay, so here's the thing about a minivan. Everyone can agree on one thing, which is a minivan is very practical, very livable. If you need to go on a trip or if you have a lot of people you're taking with you, a big family, car seats, minivans are like the most livable thing, but nobody really wants a minivan. That's beyond us. Like you get one out of practicality, but you're not really desiring a minivan. But what if you could get a minivan in disguise? What if you could get a practical 3-row SUV that has all the features of a minivan, but isn't a minivan? Now we're talking. So this here, this is the Kia EV9. It's actually two different specs of the EV9, but I've been living with this thing for about a week now, driving it every day, and I'm telling you right now, it's not missing anything. 3-row, fully electric SUV, full size, lots of space, and all the things that you think of when you think, okay, minivan, here's what I need, here's the space I need, the f

In [25]:
retriever_autofocus = vectorstore_autofocus.as_retriever()
retriever_autofocus.invoke("Which cars does Tesla make?")

[Document(page_content="All right, today was a fun day because I got to spend some time with two of Lotus' new electric cars. Yeah, that Lotus, the one that's known for making ultra light sports cars and things like that, the ones that made the chassis for the original Tesla Roadster, it's kind of the antithesis of what we know Lotus to be, Lotus is typically making the lightest weight stuff, but this is, it just shows how obvious the electrification of the future of cars is. They're going right to Electric, not even Hybrids. So there's two out, there's an SUV that's been out on the road for a couple months now, and then there's a car, a sedan that is brand new, just got revealed. So they both look really sick, and the names are both really tough, but the sedan is called the Emea, EME YA. I got to sit in it, look around it, take a first look, that's the Super Sporty one, but then the SUV I got to drive. It's a little bit taller, but that's called the Electra, and that's one that's the 

#### Retriever Setup

In [38]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup = RunnableParallel(context=retriever_autofocus, question=RunnablePassthrough())
setup.invoke("Which cars does Kia make?")

{'context': [Document(page_content="Okay, so here's the thing about a minivan. Everyone can agree on one thing, which is a minivan is very practical, very livable. If you need to go on a trip or if you have a lot of people you're taking with you, a big family, car seats, minivans are like the most livable thing, but nobody really wants a minivan. That's beyond us. Like you get one out of practicality, but you're not really desiring a minivan. But what if you could get a minivan in disguise? What if you could get a practical 3-row SUV that has all the features of a minivan, but isn't a minivan? Now we're talking. So this here, this is the Kia EV9. It's actually two different specs of the EV9, but I've been living with this thing for about a week now, driving it every day, and I'm telling you right now, it's not missing anything. 3-row, fully electric SUV, full size, lots of space, and all the things that you think of when you think, okay, minivan, here's what I need, here's the space I 
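
The shape of that output, a dictionary with one key per branch, can be mimicked in plain Python. This is a simplified sketch: `run_parallel_map`, `toy_retriever`, and `passthrough` are hypothetical helpers, the toy runs its branches one after another, and the keyword matching stands in for a real embedding-based retriever.

```python
def run_parallel_map(branches, user_input):
    """Call every branch with the same input and collect the results by key."""
    return {key: branch(user_input) for key, branch in branches.items()}


def _words(text):
    # Lower-case the text and strip basic punctuation from each word.
    return {word.strip(".,?!").lower() for word in text.split()}


def toy_retriever(question):
    """Hypothetical retriever: returns documents that share a word with the question."""
    corpus = [
        "The Kia EV9 is a three-row electric SUV.",
        "Lotus revealed an electric sedan.",
    ]
    return [doc for doc in corpus if _words(question) & _words(doc)]


def passthrough(value):
    """Returns its input unchanged, so the question reaches the prompt as-is."""
    return value


prompt_inputs = run_parallel_map(
    {"context": toy_retriever, "question": passthrough},
    "Which cars does Kia make?",
)
print(prompt_inputs)
```

The resulting dictionary has exactly the `context` and `question` keys that the prompt template expects.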

The next section adds this setup map to the chain and runs it.

## Connecting the vector store to the chain

We can use the vector store to find the most relevant chunks from the transcription to send to the model. Here is how we can connect the vector store to the chain:

<img src='images/chain4.png' width="1200">

This requires a [Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/). The retriever runs a similarity search in the vector store and passes the most similar documents to the next step in the chain.

We already got a retriever from the vector store and wrapped it in the `setup` map above, so we can pipe that map into the prompt, model, and parser:

In [48]:
chain = setup | prompt | model | parser
chain.invoke("Which cars does Tesla make?")

'Tesla makes the Model 3, Model S, Model X, Model Y, and Cybertruck.'

Let's invoke the chain using another example:

In [41]:
answer = chain.invoke("Which EV is the best amongst those that Kia makes? And why? Answer with 3 bullets")

In [42]:
print(answer)

- The Kia EV9 is the best EV among those that Kia makes because it offers practicality, livability, and a full-size three-row SUV design.
- It features a 99.8 kilowatt-hour battery, providing an EPA estimated 270 miles of range, making it a compelling option.
- The Kia EV9 has a competitive starting price range, starting at $55,000 for the rear-wheel-drive version, making it an affordable choice for a fully electric three-row SUV.


In [45]:
answer = chain.invoke("Who is Meredith Machovoe?")

In [46]:
print(answer)

I don't know.


#### Pinecone

In [36]:
from pinecone import Pinecone

In [41]:
# connect to Pinecone, reusing the key loaded at the top of the notebook
pc = Pinecone(api_key=PINECONE_API_KEY, environment='us-east-1')

In [42]:
index = pc.Index("auto-focus")

In [43]:
import os
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY

In [None]:
from langchain_pinecone import PineconeVectorStore

index_name = "auto-focus"

pinecone = PineconeVectorStore.from_documents(
    documents, embeddings, index_name=index_name
)

In [None]:
pinecone.similarity_search("What is Tesla doing?")[:3]

In [None]:
chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)

answer = chain.invoke("Tell me about Mercedes' work on EVs. Answer in 3 bullet points.")

In [None]:
print(answer)

- Mercedes has been working on extremely efficient electric cars, such as the Vision EQXX, which achieved 1000 kilometers of range on one charge.
- They have specially designed tires with lower rolling resistance and aerodynamically efficient wheels to maximize efficiency.
- Mercedes has been focusing on weight savings, utilizing materials like recyclable materials and carbon fiber to reduce weight and increase efficiency.
