### Hybrid Retriever - Combining Dense and Sparse Retrievers

In [1]:
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
from langchain_core.documents import Document

In [2]:
# Step 1: Sample documents
docs = [
    Document(page_content="LangChain helps build LLM applications."),
    Document(page_content="Pinecone is a vector database for semantic search."),
    Document(page_content="The Eiffel Tower is located in Paris."),
    Document(page_content="Langchain can be used to develop agentic ai application."),
    Document(page_content="Langchain has many types of retrievers.")
]

# Step 2: Dense Retriever (FAISS + HuggingFace)
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
dense_vectorstore = FAISS.from_documents(docs, embedding_model)
dense_retriever = dense_vectorstore.as_retriever()


In [4]:
# Step 3: Sparse Retriever (BM25)
sparse_retriever = BM25Retriever.from_documents(docs)
sparse_retriever.k = 3  # number of top-k documents to retrieve

# Step 4: Combine both retrievers with an EnsembleRetriever
hybrid_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, sparse_retriever],
    weights=[0.7, 0.3]  # dense results count for 70%, sparse for 30%
)

In [5]:
hybrid_retriever

EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000197800FD940>, search_kwargs={}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x00000197800FF620>, k=3)], weights=[0.7, 0.3])
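
The `BM25Retriever` shown in the repr above ranks documents by term overlap rather than by embedding similarity. The sketch below is a minimal, standard-library illustration of BM25 scoring on the same five sample sentences. It is not the code `rank_bm25` runs: the lowercasing regex tokenizer, the `log(1 + ...)` form of the IDF, and the helper names `tokenize` and `bm25_scores` are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase and keep alphanumeric runs only.
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Return one BM25 score per document in `corpus` for `query`."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

corpus = [
    "LangChain helps build LLM applications.",
    "Pinecone is a vector database for semantic search.",
    "The Eiffel Tower is located in Paris.",
    "Langchain can be used to develop agentic ai application.",
    "Langchain has many types of retrievers.",
]
scores = bm25_scores("How can I build an application using LLM?", corpus)
ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
for i in ranked:
    print(f"{scores[i]:.3f}  {corpus[i]}")
```

Only documents that share a literal token with the query receive a non-zero score, which is why a sparse retriever complements a dense one: it rewards exact keyword matches but cannot see paraphrases.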

In [None]:
# Query the hybrid retriever
query = "How can I build an application using LLM?"
results = hybrid_retriever.invoke(query)

# Print the results
for i, doc in enumerate(results):
    print(f"\n Document {i+1}:\n {doc.page_content}")


 Document 1:
 LangChain helps build LLM applications.

 Document 2:
 Langchain can be used to develop agentic ai application.

 Document 3:
 Langchain has many types of retrievers.

 Document 4:
 Pinecone is a vector database for semantic search.
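
The merged list above comes from fusing the two retrievers' rankings. `EnsembleRetriever` does this with weighted Reciprocal Rank Fusion, and the sketch below reproduces the idea in plain Python. The function name `weighted_rrf`, the document IDs, and the two input rankings are hypothetical; the real per-retriever rankings are not shown in this notebook. The constant `c=60` is assumed to match the library default.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document earns weight / (rank + c) from every list it appears in,
    with ranks starting at 1. Documents are returned by descending total score.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings, for illustration only.
dense_ranking = ["d1", "d4", "d5", "d2"]   # e.g. four hits from the dense retriever
sparse_ranking = ["d1", "d4", "d5"]        # e.g. k=3 hits from BM25
fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.7, 0.3])
print(fused)
```

Because a document that appears in both lists accumulates score from each, the fused result is the de-duplicated union of the inputs. That is consistent with the hybrid retriever returning four documents here even though the sparse retriever is capped at `k = 3`.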


### RAG pipeline with Hybrid Retriever

In [8]:
from langchain.chat_models import init_chat_model
from langchain_core.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

In [12]:
# Step 5: Prompt template
prompt = PromptTemplate.from_template("""
Answer the question based on the context below.
                                      
Context:
{context}

Question:{input}                                                                         
""")

# Step 6: LLM
llm = init_chat_model("openai:gpt-4o-mini")
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x00000197837B6490>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x00000197837B6850>, root_client=<openai.OpenAI object at 0x00000197837B6210>, root_async_client=<openai.AsyncOpenAI object at 0x00000197837B65D0>, model_name='gpt-4o-mini', model_kwargs={}, openai_api_key=SecretStr('**********'))

In [15]:
# Step 7: Create the stuff-documents chain (inserts retrieved docs into {context})
document_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)

# Step 8: Create the full RAG chain (retrieve, then answer)
rag_chain = create_retrieval_chain(retriever=hybrid_retriever, combine_docs_chain=document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000197800FD940>, search_kwargs={}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x00000197800FF620>, k=3)], weights=[0.7, 0.3]), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\nAnswer the question based on the context below.\n\nContext:\n{context}\n\nQuestion:{input}                                      

In [16]:
# Step 9: Ask a question
query = {"input": "How can I build an app using LLM?"}
response = rag_chain.invoke(query)

# Step 10: Output
print("Answer:\n", response["answer"])
print("\n Source Documents:")

for i, doc in enumerate(response["context"]):
    print(f"\n Doc {i+1}: {doc.page_content}")

Answer:
 To build an app using a Large Language Model (LLM) with LangChain, you can follow these general steps:

1. **Define Your Application**: Determine the main purpose of your application and how you want to leverage the LLM. Think about the interactions and features you want to include.

2. **Set Up LangChain**: Integrate LangChain into your environment. Install the necessary packages and ensure you have access to an appropriate LLM.

3. **Choose a Retriever**: Depending on your application's requirements, select one of the many types of retrievers offered by LangChain. Consider using a vector database like Pinecone for efficient semantic search if your application involves retrieving relevant information.

4. **Design the Workflow**: Develop the workflow of your application, focusing on how the LLM will interact with user inputs and how the retriever will fetch relevant data.

5. **Implement Agentic AI Features**: If desired, use LangChain to develop agentic AI capabilities that 