### Building Gen AI App using Langchain
##### This LLM app should answer questions using content from a specific website.
```https://docs.smith.langchain.com/administration/tutorials/manage_spend```


In [2]:
###  1. Load all the keys 

import os 
from dotenv import load_dotenv
# to load all the environment variables 
load_dotenv()                   
os.environ['OPENAI_API_KEY']=os.getenv("OPENAI_API_KEY")

# For LangSmith tracing
os.environ['LANGCHAIN_API_KEY']=os.getenv("LANGCHAIN_API_KEY")

# Tracing must be enabled for runs to be logged to LangSmith
os.environ['LANGCHAIN_TRACING_V2']="true"

os.environ['LANGCHAIN_PROJECT']=os.getenv("LANGCHAIN_PROJECT")
project_name=os.getenv("LANGCHAIN_PROJECT")
print(project_name)


GenAIAppWithOpenAI
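
A note on the cell above: `os.environ[...] = os.getenv(...)` raises a `TypeError` when the variable is missing, because `os.getenv` returns `None` and environment values must be strings. The sketch below shows one way to check for missing keys first. The helper name `missing_env_vars` and the `REQUIRED` list are illustrative assumptions, not part of the original notebook.

```python
import os

# Hypothetical list of the variables this notebook reads from the .env file.
REQUIRED = ["OPENAI_API_KEY", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Report missing keys up front instead of failing later with a TypeError.
missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv()` gives a readable message that names the key to add to the `.env` file.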


In [None]:
### 2. Install the required libraries from the requirements.txt file using "pip install -r requirements.txt"
    # ipykernel
    # python-dotenv
    # langchain
    # langchain-community
    # langchain-openai
    # pypdf
    # bs4
    # faiss-cpu

In [44]:
### 3. Data ingestion: scrape the data from the website
# langchain_community (installed via the requirements.txt file) provides the document loaders
# URL = https://docs.langchain.com/langsmith/billing
# WebBaseLoader() uses bs4 internally to parse the page

from langchain_community.document_loaders import WebBaseLoader
url="https://docs.langchain.com/langsmith/billing"
loader=WebBaseLoader(url)

# Fetch the page and load it as a list of Document objects
docs=loader.load()
docs



[Document(metadata={'source': 'https://docs.langchain.com/langsmith/billing', 'title': 'Manage billing in your account - Docs by LangChain', 'language': 'en'}, page_content='Manage billing in your account - Docs by LangChainSkip to main contentDocs by LangChain home pageLangSmithSearch...⌘KAsk AIGitHubTry LangSmithTry LangSmithSearch...NavigationAccount administrationManage billing in your accountGet startedObservabilityEvaluationPrompt engineeringDeploymentAgent BuilderPlatform setupReferenceOverviewPlansCreate an account and API keyAccount administrationOverviewSet up a workspaceManage organizations using the APIManage billingSet up resource tagsUser managementAdditional resourcesPolly (Beta)Data managementAccess control & AuthenticationScalability & resilienceFAQsRegions FAQPricing FAQLangSmith statusOn this pageSet up billing for your accountDeveloper Plan: set up billing on your personal organizationPlus Plan: set up billing on a shared organizationUpdate your information (Paid pl

In [45]:
### 4. Split the docs from the previous step into chunks using a text splitter

# Import RecursiveCharacterTextSplitter
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Create the splitter: chunks of up to 1000 characters, with up to 100 characters of overlap between neighbours
text_splitter=RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=100)
# Split the loaded documents into chunks
documents=text_splitter.split_documents(docs)
documents


[Document(metadata={'source': 'https://docs.langchain.com/langsmith/billing', 'title': 'Manage billing in your account - Docs by LangChain', 'language': 'en'}, page_content='Manage billing in your account - Docs by LangChainSkip to main contentDocs by LangChain home pageLangSmithSearch...⌘KAsk AIGitHubTry LangSmithTry LangSmithSearch...NavigationAccount administrationManage billing in your accountGet startedObservabilityEvaluationPrompt engineeringDeploymentAgent BuilderPlatform setupReferenceOverviewPlansCreate an account and API keyAccount administrationOverviewSet up a workspaceManage organizations using the APIManage billingSet up resource tagsUser managementAdditional resourcesPolly (Beta)Data managementAccess control & AuthenticationScalability & resilienceFAQsRegions FAQPricing FAQLangSmith statusOn this pageSet up billing for your accountDeveloper Plan: set up billing on your personal organizationPlus Plan: set up billing on a shared organizationUpdate your information (Paid pl
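
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification, not the library's algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, lines, and words, so its chunk boundaries will differ. The function name `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one. The overlap is there so that a sentence cut at a boundary still appears whole in at least one chunk.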

In [46]:
### 5. Create the OpenAIEmbeddings model that converts the documents (text) into vectors

from langchain_openai import OpenAIEmbeddings
embeddings=OpenAIEmbeddings()

In [47]:
### 6. Store the embeddings in a FAISS vector store (install faiss-cpu via the requirements.txt file)

from langchain_community.vectorstores import FAISS
vectorstoredb=FAISS.from_documents(documents,embeddings)
vectorstoredb


<langchain_community.vectorstores.faiss.FAISS at 0x16071317a70>

In [48]:
### 7. Query vectorstoredb for the most similar chunks
query="LangSmith has two trace tiers: base traces and extended traces."
results=vectorstoredb.similarity_search(query)
results[0].page_content

'For organizations with multiple workspaces only: For simplicity, LangSmith incorporates the free traces into the cost calculation of the first workspace only. In actuality, the free traces can be “consumed” by any workspace. Therefore, although workspace-level spend limits are approximate for multi-workspace organizations, the organization-level spend limit is absolute.\n\u200bConfigure trace tier distrubution\nLangSmith has two trace tiers: base traces and extended traces. Base traces have the base retention and are short-lived (14 days), while extended traces have extended retention and are long-lived (400 days). For more information, refer to the data retention conceptual docs.'
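
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below reproduces that idea with hand-written 2-dimensional toy vectors and Euclidean (L2) distance. The vectors, the texts, and the helper names are illustrative assumptions; real embeddings have far more dimensions, and FAISS uses optimized index structures rather than a full sort.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, indexed, k=4):
    """Return the texts of the k (text, vector) entries closest to query_vec."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

# Toy "vector store": each entry pairs a chunk label with a made-up embedding.
toy_index = [
    ("trace tiers", [0.9, 0.1]),
    ("billing setup", [0.1, 0.9]),
    ("spend limits", [0.6, 0.4]),
]

print(top_k([1.0, 0.0], toy_index, k=2))
# → ['trace tiers', 'spend limits']
```

In the notebook, `results[0]` corresponds to the first element of such a ranking: the chunk nearest to the query.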

In [50]:
### 8. Document chain and retrieval chain: pass the retrieved context to the LLM so that it answers from that context

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# 1. Prompt template for documents
document_prompt = PromptTemplate(
    input_variables=["context", "input"],
    template="""
Use the following context to answer the question.
If the answer is not in the context, say "I don't know."

context:
{context}

input:
{input}

Answer:
"""
)

# 2. LLM
llm = ChatOpenAI(temperature=0)

# 3. Output parser
output_parser = StrOutputParser()

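# 4. Document chain: fills the prompt, calls the LLM, and parses the reply into a string.
# Note: RunnablePassthrough() forwards the whole input dict unchanged, so each prompt
# variable below receives the full {"input": ..., "context": ...} dict. To pass only the
# matching field, operator.itemgetter("input") / itemgetter("context") can be used instead.
document_chain = (
    {
        "input": RunnablePassthrough(),
        "context": RunnablePassthrough()
    }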
    | document_prompt
    | llm
    | output_parser
)

# 5. Invoke
from langchain_core.documents import Document
document_chain.invoke({
    
    "input": "LangSmith has two trace tiers: base traces and extended traces.",
    "context": [Document(page_content="LangSmith has two trace tiers: base traces and extended traces. Base traces have the base retention and are short-lived (14 days), while extended traces have extended retention and are long-lived (400 days). For more information, refer to the data retention conceptual docs.")]

})




'Base traces have the base retention and are short-lived (14 days), while extended traces have extended retention and are long-lived (400 days).'
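
For intuition about what the prompt step is meant to produce, the following sketch fills a shortened version of the template by hand. `Doc` is a hypothetical stand-in for `langchain_core.documents.Document`, and `format_docs` and `build_prompt` are illustrative helpers, not LangChain APIs.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (only page_content is modelled)."""
    page_content: str

TEMPLATE = "context:\n{context}\n\ninput:\n{input}\n\nAnswer:\n"

def format_docs(docs):
    """Join the text of all documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

def build_prompt(payload):
    """Fill the template with the question and the formatted context."""
    return TEMPLATE.format(
        context=format_docs(payload["context"]),
        input=payload["input"],
    )

print(build_prompt({
    "input": "How long are base traces retained?",
    "context": [Doc("Base traces are short-lived (14 days).")],
}))
```

Joining only the `page_content` fields keeps metadata and object representations out of the prompt, so the model sees plain text.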

In [51]:
document_chain

{
  input: RunnablePassthrough(),
  context: RunnablePassthrough()
}
| PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\nUse the following context to answer the question.\nIf the answer is not in the context, say "I don\'t know."\n\ncontext:\n{context}\n\ninput:\n{input}\n\nAnswer:\n')
| ChatOpenAI(profile={'max_input_tokens': 16385, 'max_output_tokens': 4096, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': False, 'structured_output': False, 'image_url_inputs': False, 'pdf_inputs': False, 'pdf_tool_message': False, 'image_tool_message': False, 'tool_choice': True}, client=<openai.resources.chat.completions.completions.Completions object at 0x000001607132CBF0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001607132F110>, root_client=<openai.OpenAI object 

In [52]:
# However, we want the documents to come from a retriever built on the vector store we just set up. That way, the retriever can
# dynamically select the most relevant documents and pass them in for a given question.

### Retriever (Input --> Retriever --> vectorstoredb)
# Pass the input to the retriever, which fetches the matching documents from vectorstoredb
vectorstoredb

<langchain_community.vectorstores.faiss.FAISS at 0x16071317a70>

In [53]:
# Create the retriever and the retrieval chain
retriever=vectorstoredb.as_retriever()
from langchain_core.runnables import RunnablePassthrough
retrieval_chain = (
    {
        "context": retriever,
        "input": RunnablePassthrough()
    }
    | document_chain
)

retrieval_chain

{
  context: VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000016071317A70>, search_kwargs={}),
  input: RunnablePassthrough()
}
| {
    input: RunnablePassthrough(),
    context: RunnablePassthrough()
  }
| PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\nUse the following context to answer the question.\nIf the answer is not in the context, say "I don\'t know."\n\ncontext:\n{context}\n\ninput:\n{input}\n\nAnswer:\n')
| ChatOpenAI(profile={'max_input_tokens': 16385, 'max_output_tokens': 4096, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': False, 'structured_output': False, 'image_url_inputs': False, 'pdf_inputs': False, 'pdf_tool_message': False, 'image_tool_message': False, 'tool_choice': True}, client=<openai.resou
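
The dict at the head of the chain runs every entry on the same input and collects the results by key. That is why a plain question string becomes `{"context": <retrieved documents>, "input": <question>}`. Below is a plain-Python sketch of that behaviour; `run_parallel`, `passthrough`, and the keyword-matching `toy_retriever` are hypothetical stand-ins, not LangChain APIs.

```python
def run_parallel(steps, value):
    """Call every step with the same input and collect the results by key."""
    return {key: step(value) for key, step in steps.items()}

def passthrough(value):
    """Return the input unchanged (the role RunnablePassthrough plays)."""
    return value

def toy_retriever(question):
    """Hypothetical retriever: return corpus entries sharing a word with the question."""
    corpus = [
        "LangSmith has two trace tiers",
        "Set up billing for your account",
    ]
    words = set(question.lower().split())
    return [text for text in corpus if words & set(text.lower().split())]

result = run_parallel({"context": toy_retriever, "input": passthrough}, "trace tiers")
print(result)
# → {'context': ['LangSmith has two trace tiers'], 'input': 'trace tiers'}
```

The resulting dict is what the next stage of the chain receives as its input.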

In [56]:
### 9. Get the response from the LLM

response=retrieval_chain.invoke("LangSmith has two trace tiers: base traces and extended traces.")
print(response)

LangSmith has two trace tiers: base traces and extended traces. Base traces have the base retention and are short-lived (14 days), while extended traces have extended retention and are long-lived (400 days).


In [57]:
print(type(response))


<class 'str'>
