## Simple Gen AI App Using LangChain

In [13]:
import os
from dotenv import load_dotenv
load_dotenv()

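# load_dotenv() has already read .env; re-export the keys explicitly.
# Skip unset keys: assigning None to os.environ raises a TypeError.
for key in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT"):
    value = os.getenv(key)
    if value is not None:
        os.environ[key] = value
os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable LangSmith tracing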

In [14]:
## Data Ingestion: scrape the page content from the website
from langchain_community.document_loaders import WebBaseLoader

In [15]:
loader = WebBaseLoader("https://python.langchain.com/docs/introduction/")
docs = loader.load()
docs

[Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='\n\n\n\n\nIntroduction | 🦜️🔗 LangChain\n\n\n\n\n\n\nSkip to main contentIntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1💬SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a simple LLM application with chat models and prompt templatesBuild a ChatbotBuild a Retrieval Augmented Generation (RAG) App: Part 2Build an Extraction ChainBuild an AgentTaggingBuild a Retrieval Augmented Generation (RAG) App: Part 1Build a semantic search engineBuild a Question/Answering system over SQL dataSummarize TextHow-to guidesHow-to guidesHow to use tools in a chainHow to use a vectorstore as a retrieverHow to add memor

## Workflow for Text Processing and Vectorization

1. **Load Data**  
   Load the raw data from its source; in this notebook, a web page fetched with `WebBaseLoader`.

2. **Documents (Docs)**  
   The loader returns the content as a list of `Document` objects, each holding `page_content` and `metadata`.

3. **Divide Text into Chunks**  
   Split the text into manageable chunks for efficient processing.

4. **Convert Text into Vectors using Embedding**  
   Use an embedding model to transform the text chunks into numerical vectors.

5. **Store Vectors in a Vector Database (Vector Store DB)**  
   Save the generated vectors in a vector database for future retrieval or analysis.


In [None]:
## Load Data --> Docs --> Divide Documents into Chunks --> Convert Text into Vectors using Embedding --> Vector Store DB
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunks of up to 1000 characters; consecutive chunks share up to 200 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
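
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal plain-Python sketch of a fixed-width sliding window. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, so its chunks will generally differ. The helper name `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample input, small enough to inspect by eye
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, which is the role the 200-character overlap plays in the cell above: text near a chunk boundary still appears with some of its surrounding context.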

In [17]:
documents

[Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='Introduction | 🦜️🔗 LangChain'),
 Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='Skip to main contentIntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1💬SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a simple LLM application with chat models and prompt templatesBuild a ChatbotBuild a Retrieval Augmented Generation (RAG) App: Part 2Build an Extraction ChainBuild an AgentTaggingBuild a 

In [18]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [19]:
from langchain_community.vectorstores import FAISS

# Embed every chunk and index the resulting vectors in an in-memory FAISS store
vectorstoredb = FAISS.from_documents(documents, embeddings)

In [20]:
vectorstoredb

<langchain_community.vectorstores.faiss.FAISS at 0x1ecfeec2a10>

In [23]:
## Query From a vector DB
#query = "LangChain simplifies every stage of the LLM application lifecycle"
#result = vectorstoredb.similarity_search(query)
#result[0].page_content
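
The commented-out query above relies on nearest-neighbour search over the stored vectors. The sketch below shows that idea by brute force in plain Python. The three-dimensional vectors are hand-made stand-ins for real embeddings, the names `l2_distance`, `similarity_search`, and `toy_store` are hypothetical, and the choice of Euclidean (L2) distance is an assumption about the store's default metric.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(query_vec, store, k=4):
    """Return the k texts whose vectors lie closest to query_vec."""
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy (text, vector) pairs; real embeddings have hundreds or thousands of dimensions
toy_store = [
    ("LangChain simplifies the LLM application lifecycle", [0.9, 0.1, 0.0]),
    ("FAISS is a library for similarity search", [0.1, 0.9, 0.0]),
    ("Bananas are rich in potassium", [0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.2, 0.0]  # pretend embedding of the query
results = similarity_search(query_vec, toy_store, k=2)
print(results[0])
```

A real vector store does the same ranking with an optimized index instead of sorting every entry, which matters once there are many chunks.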

In [21]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")

In [22]:
## Retrieval Chain, Documents Chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """
Answer the following question based only on the provided context:
<context>
{context}
</context>    
    """
)

documents_chain = create_stuff_documents_chain(llm,prompt)
documents_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nAnswer the following question based only on the provided context:\n<context>\n{context}\n</context>    \n    '), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x000001ECF1BC6F90>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x000001ECF1780FD0>, root_client=<openai.OpenAI object at 0x000001ECF1B6AA90>, root_async_client=<openai.AsyncOpenAI object at 0x000001ECF3430950>, model_name='gpt-4o', model_kwargs={}, openai_api_key=SecretStr('**********'))
| StrOutputParser(), kwargs={}, config={'run_name': 'stuf
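
Note that the prompt above only has a `{context}` placeholder, so the question itself never reaches the model. The plain-Python sketch below illustrates the "stuff" idea, joining all document texts into one context string, with a template that also carries the question. The `Question: {input}` line, the `"\n\n"` separator, and the helper names are assumptions for illustration, not the library's implementation.

```python
# Hypothetical template: the notebook's prompt plus a slot for the question
TEMPLATE = (
    "Answer the following question based only on the provided context:\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {input}"
)


def stuff_documents(page_contents, separator="\n\n"):
    """Concatenate every document's text into a single context string."""
    return separator.join(page_contents)


def build_prompt(question, page_contents):
    """Fill the template with the stuffed context and the user's question."""
    return TEMPLATE.format(context=stuff_documents(page_contents), input=question)


# Two page_content strings taken from the retrieved documents in this notebook
page_contents = [
    "LangChain is a framework for developing applications powered by large language models (LLMs).",
    "LangChain simplifies every stage of the LLM application lifecycle:",
]
prompt_text = build_prompt("What is LangChain?", page_contents)
print(prompt_text)
```

With a template shaped like this, the same `{context}` and `{input}` keys would line up with the inputs the retrieval chain later passes along.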

In [25]:
#from langchain_core.documents import Document

#documents_chain.invoke({
#    "input":"LangChain simplifies every stage of the LLM application lifecycle",
#    "context":[Document(page_content="LangChain simplifies every stage of the LLM application lifecycle")]
#})

However, we want the documents to come from a retriever built on the vector store rather than being passed in by hand. That way, the retriever dynamically selects the most relevant documents for a given question and passes them to the documents chain.
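
The data flow described here can be sketched in plain Python: look up documents for the question, hand them to an answering step, and return the input, the context, and the answer together. Everything in the sketch is a hypothetical stand-in (`make_retrieval_chain`, the keyword-overlap `toy_retrieve`, and `toy_answer`, which replaces the LLM call); it mirrors the shape of the result, not the library's internals.

```python
def make_retrieval_chain(retrieve, answer_from_docs):
    """Compose a retriever and an answering step into one callable."""
    def invoke(inputs):
        context = retrieve(inputs["input"])  # step 1: fetch relevant docs
        answer = answer_from_docs({**inputs, "context": context})  # step 2: answer
        return {**inputs, "context": context, "answer": answer}
    return invoke


# Hypothetical two-document corpus
corpus = [
    "LangChain simplifies every stage of the LLM application lifecycle",
    "Bananas are rich in potassium",
]


def toy_retrieve(question, k=1):
    """Rank corpus entries by the number of lowercase words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_answer(inputs):
    """Stand-in for the LLM: echo the top retrieved document."""
    return f"Based on {len(inputs['context'])} document(s): {inputs['context'][0]}"


chain = make_retrieval_chain(toy_retrieve, toy_answer)
response = chain({"input": "How does LangChain help with the LLM application lifecycle?"})
print(response["answer"])
```

The returned dictionary has `input`, `context`, and `answer` keys, which is why the later cells can read both `response['answer']` and `response['context']`.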

In [26]:
### Input --> Retriever --> vectorstoredb
vectorstoredb

<langchain_community.vectorstores.faiss.FAISS at 0x1ecfeec2a10>

In [27]:
from langchain.chains import create_retrieval_chain

# Expose the vector store as a retriever, then feed its results to the documents chain
retriever = vectorstoredb.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, documents_chain)

In [28]:
retrieval_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001ECFEEC2A10>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nAnswer the following question based only on the provided context:\n<context>\n{context}\n</context>    \n    '), additional_kwargs={})])
          

In [29]:
## Get the Response from the LLM
response = retrieval_chain.invoke({"input":"LangChain simplifies every stage of the LLM application lifecycle"})
response['answer']

'LangChain is a framework designed for developing applications powered by large language models (LLMs). It simplifies the lifecycle of LLM applications by providing open-source libraries and tools for building, integrating, and deploying these applications. Key components include `langchain-core` for base abstractions, integration packages for connecting with platforms like OpenAI, and `LangGraph` for creating stateful multi-actor applications. LangSmith is a developer platform within the framework for debugging, testing, and monitoring LLM applications to ensure they are optimized and production-ready.'

In [30]:
response['context']

[Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='LangChain is a framework for developing applications powered by large language models (LLMs).\nLangChain simplifies every stage of the LLM application lifecycle:'),
 Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='Skip to main contentIntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1💬SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a simple LLM application with chat models and prompt t

In [31]:
print("The End")

The End
