##### Retriever and chain with LangChain

In [1]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("../attention.pdf")
docs = loader.load()
docs[0]

Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On E

In [2]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(docs)
documents[:5]

[Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On 
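
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter in plain Python. It is only an illustration: `RecursiveCharacterTextSplitter` splits on separators such as paragraph and line breaks and then merges the pieces, so its real chunks will not line up with these windows. The helper name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # each chunk repeats the last 2 characters of the previous one
```

With `chunk_size=1000` and `chunk_overlap=20`, as in the cell above, neighbouring chunks share at most about 20 characters, which gives the retriever a little context across chunk boundaries.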

In [4]:
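# OpenAIEmbeddings in langchain_community is deprecated; langchain_openai
# (already used for ChatOpenAI below) provides the maintained class.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Embed only the first 30 chunks to keep the number of embedding calls small.
db = FAISS.from_documents(
    documents[:30],
    OpenAIEmbeddings(base_url="https://openrouter.ai/api/v1"),
)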


In [5]:
db

<langchain_community.vectorstores.faiss.FAISS at 0x1aa6f24e270>

In [6]:
query = "What is attention mechanism?"
result = db.similarity_search(query)
result[0].page_content

'Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser ∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe superior in quality while being more parallelizable and requiring signiﬁcantl
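
`similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it (four by default, via `k=4`). The toy version below shows the idea with hand-made 2-D vectors in place of real embeddings. It ranks by Euclidean distance, which is assumed here to match the default of LangChain's FAISS wrapper; the `top_k` helper, the texts, and the vectors are all hypothetical.

```python
import math

def top_k(query_vec, store, k=4):
    """Return the texts of the k stored vectors closest to query_vec."""
    ranked = sorted(store, key=lambda item: math.dist(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

# (text, embedding) pairs; real embeddings have hundreds or thousands of dimensions.
store = [
    ("attention weights", (0.9, 0.1)),
    ("training schedule", (0.1, 0.9)),
    ("scaled dot-product attention", (0.8, 0.2)),
]

nearest = top_k((1.0, 0.0), store, k=2)
print(nearest)
```

Closeness in embedding space is not the same as relevance to the question: in the output above, the top hit is the title page and abstract rather than the section that defines attention.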

In [7]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="openai/gpt-oss-20b:free", base_url="https://openrouter.ai/api/v1")
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001AA6F2D27B0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001AA6F2D2BA0>, root_client=<openai.OpenAI object at 0x000001AA6F4847D0>, root_async_client=<openai.AsyncOpenAI object at 0x000001AA6F484E10>, model_name='openai/gpt-oss-20b:free', model_kwargs={}, openai_api_key=SecretStr('**********'), openai_api_base='https://openrouter.ai/api/v1')

In [35]:
# Define our own prompt template
from langchain_core.prompts import ChatPromptTemplate
prompts = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context.
Think step by step before providing a detailed answer.
I will tip you 10000$ if the user finds your answer helpful!
<context>
{context}
</context>
Question: {input}""")

In [33]:
# Introduce the document chain: it "stuffs" all documents into {context}
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompts)
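
Conceptually, a "stuff" chain concatenates the text of every document it receives into the `{context}` slot and sends one prompt to the model. The sketch below imitates that step with plain string formatting. The `stuff_prompt` helper and the shortened template are hypothetical, and the blank-line separator is an assumption about the library default.

```python
TEMPLATE = (
    "Answer the following question based only on the provided context.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)

def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Join all chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, input=question)

prompt_text = stuff_prompt(["chunk one", "chunk two"], "What is attention?")
print(prompt_text)
```

Because everything goes into a single prompt, this approach only works while the combined chunks fit in the model's context window.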

In [11]:
# Introduce the retriever: a query-in, documents-out wrapper around the vector store
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001AA6F24E270>, search_kwargs={})

In [39]:
# Combine the retriever and the document chain to create a RAG system
from langchain_classic.chains import create_retrieval_chain
retriever_chain = create_retrieval_chain(retriever, document_chain)
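
The retrieval chain wires two steps together: fetch documents for the `input` question, then hand them to the document chain as `context`. A minimal stand-in for that wiring, with a keyword-matching "retriever" and a fake "LLM" so it runs offline, could look like the following. Every name and string here is hypothetical, and the real chain passes LangChain `Document` objects rather than plain strings.

```python
def make_retrieval_chain(retrieve, combine):
    """Return an invoke() that retrieves context, then generates an answer."""
    def invoke(inputs: dict) -> dict:
        question = inputs["input"]
        context = retrieve(question)
        answer = combine({"input": question, "context": context})
        return {"input": question, "context": context, "answer": answer}
    return invoke

# Toy corpus standing in for the vector store.
CHUNKS = [
    "Attention maps a query and key-value pairs to an output.",
    "We trained on the WMT 2014 English-German dataset.",
]

def keyword_retrieve(question: str) -> list[str]:
    """Keep chunks that share at least one word with the question."""
    words = set(question.lower().split())
    return [c for c in CHUNKS if words & set(c.lower().split())]

def echo_combine(inputs: dict) -> str:
    """Stand-in for the document chain; reports what it was given."""
    return f"{len(inputs['context'])} chunk(s) used for: {inputs['input']}"

toy_chain = make_retrieval_chain(keyword_retrieve, echo_combine)
toy_response = toy_chain({"input": "what is attention"})
print(toy_response)
```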

In [40]:
response = retriever_chain.invoke({"input": "what is attention mechanism"})

In [41]:
response['answer']

'### Attention Mechanism – What It Is, in the Context of the Transformer Paper\n\n#### 1. **Short Overview**\n\nAn *attention mechanism* is a neural network component that lets a model **focus on specific parts of an input or output sequence when computing a representation for a position**. Instead of treating every element of the sequence equally, the mechanism assigns a “weight” to each element, indicating how much it should influence the current computation.\n\n---\n\n#### 2. Step‑by‑step Breakdown Using the Transformer Context\n\n| Step | What Happens | Why It Matters |\n|------|--------------|----------------|\n| **1. Input → Linear Transformations** | For each token in a sequence, the model learns three *vectors*: a **query** \\(q\\), a **key** \\(k\\), and a **value** \\(v\\) (via learned weight matrices). | These are the ingredients into which attention can be expressed mathematically. |\n| **2. Compute Similarity (Dot Product)** | The similarity between the query of a target p
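
Besides `answer`, the response dictionary also carries the original `input` and the retrieved documents under `context`, which makes it possible to check what the answer was grounded in. The helper below is a sketch for listing those sources. It is exercised on stand-in objects rather than a real response, and it assumes each document's metadata may contain a `page` key, as `PyPDFLoader` typically provides; `summarize_sources` is a hypothetical name.

```python
from types import SimpleNamespace

def summarize_sources(response: dict, preview_chars: int = 60) -> list[str]:
    """One line per retrieved document: page number plus a short text preview."""
    lines = []
    for doc in response.get("context", []):
        page = doc.metadata.get("page", "?")  # fall back if no page is recorded
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"page {page}: {preview}")
    return lines

# Stand-in with the same shape as the chain's output (not real model output).
fake_response = {
    "input": "what is attention mechanism",
    "context": [
        SimpleNamespace(page_content="Attention Is All You Need\nAbstract",
                        metadata={"page": 0}),
        SimpleNamespace(page_content="no page metadata here", metadata={}),
    ],
    "answer": "(model output)",
}

source_lines = summarize_sources(fake_response)
print("\n".join(source_lines))
```

Against the real chain, `summarize_sources(response)` should work the same way, since LangChain documents expose `page_content` and `metadata` attributes.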