In [None]:
## Data ingestion: loading the source data

from langchain_community.document_loaders import TextLoader  # langchain_community.document_loaders provides loaders for many kinds of data sources
loader = TextLoader("speech.txt")
text_documents = loader.load()
text_documents

[Document(metadata={'source': 'speech.txt'}, page_content='\nThe world must be made safe for democracy. Its peace must be planted upon the tested foundations of political liberty. We have no selfish ends to serve. We desire no conquest, no dominion. We seek no indemnities for ourselves, no material compensation for the sacrifices we shall freely make. We are but one of the champions of the rights of mankind. We shall be satisfied when those rights have been made as secure as the faith and the freedom of nations can make them.\n\nJust because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.\n\n…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness 

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()

os.environ["OLLAMA_MODEL"] = "gemma3:1b"  # Ollama models do not require an API key, so we can use them directly without defining one



In [6]:
# Web-based loader

from langchain_community.document_loaders import WebBaseLoader
import bs4

# Load the HTML page, keeping only the elements with the listed CSS classes

loader = WebBaseLoader(
    web_path=("https://lilianweng.github.io/posts/2025-05-01-thinking/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-title", "post-content", "post-header"))
    ),
)

text_documents = loader.load()

In [7]:
text_documents

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2025-05-01-thinking/'}, page_content='\n\n      Why We Think\n    \nDate: May 1, 2025  |  Estimated Reading Time: 40 min  |  Author: Lilian Weng\n\n\nSpecial thanks to John Schulman for a lot of super valuable feedback and direct edits on this post.\nTest time compute (Graves et al. 2016, Ling, et al. 2017, Cobbe et al. 2021) and Chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021), have led to significant improvements in model performance, while raising many research questions. This post aims to review recent developments in how to effectively use test-time compute (i.e. “thinking time”) and why it helps.\nMotivation#\nEnabling models to think for longer can be motivated in a few different ways.\nAnalogy to Psychology#\nThe core idea is deeply connected to how humans think. We humans cannot immediately provide the answer for "What\'s 12345 times 56789?". Rather, it is natural to spend time pondering and analyzing b

In [8]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("attention.pdf")
docs = loader.load()



In [9]:
docs

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-08-03T00:07:29+00:00', 'author': '', 'keywords': '', 'moddate': '2023-08-03T00:07:29+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'attention.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszk

This completes the data ingestion (load) step of the RAG pipeline. We now move to the transform step, which splits the large documents into smaller chunks.
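To make the idea of chunk size and chunk overlap concrete, here is a minimal sketch of a fixed-window chunker. It is an illustration only: the function name `chunk_text` and the sample string are hypothetical, and this is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraph and line breaks). It only shows how consecutive chunks can share a fixed number of characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Hypothetical sample input, small enough to inspect by eye
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

With a window of 10 characters and an overlap of 3, each chunk starts 7 characters after the previous one, so the last 3 characters of one chunk reappear at the start of the next. The overlap helps keep sentences that straddle a boundary retrievable from at least one chunk.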


In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split into chunks of at most 1000 characters, with 200 characters shared between neighbouring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
documents[:5]



[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-08-03T00:07:29+00:00', 'author': '', 'keywords': '', 'moddate': '2023-08-03T00:07:29+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'attention.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszk

In [12]:
documents

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-08-03T00:07:29+00:00', 'author': '', 'keywords': '', 'moddate': '2023-08-03T00:07:29+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'attention.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszk

This was the transform step, where we split the text into chunks. We now move on to the next step, embedding, which converts these chunks into vectors. For this we will use vector embedding techniques. Those vectors then have to be stored somewhere: the place where we store them is called the vector store.
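As a rough picture of what "embed, store, then search" means, the sketch below keeps a few hand-written vectors in a list and ranks them by cosine similarity to a query vector. Everything here is an assumption made for illustration: the three-dimensional vectors, the chunk labels, and the helper `toy_similarity_search` are hypothetical, and real vector stores such as Chroma or FAISS use embeddings from a model and their own index structures and distance metrics.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real embedding models produce far more dimensions
store = [
    ("chunk about attention", [0.9, 0.1, 0.0]),
    ("chunk about recurrence", [0.1, 0.9, 0.0]),
    ("chunk about training", [0.0, 0.2, 0.9]),
]


def toy_similarity_search(query_vector, k=2):
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


top = toy_similarity_search([0.8, 0.2, 0.1], k=2)
print(top)
```

Note that a top-k search like this always returns the k nearest stored chunks, even when none of them is a good match for the query, so it is worth reading the retrieved text rather than assuming it is relevant.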

In [None]:
# Embed the text chunks with an Ollama embedding model and store the vectors in Chroma

from langchain_ollama import OllamaEmbeddings  # Ollama embedding models
from langchain_community.vectorstores import Chroma  # Chroma is the vector store that holds the vectors
emb = OllamaEmbeddings(model="bge-m3")
db = Chroma.from_documents(documents[:20], embedding=emb)  # index only the first 20 chunks

In [18]:
# Query the vector store to retrieve the chunks most similar to the query
query = "What is Latent Variable Modelling"
result = db.similarity_search(query)
result[0].page_content

'1 Introduction\nRecurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks\nin particular, have been firmly established as state of the art approaches in sequence modeling and\ntransduction problems such as language modeling and machine translation [ 35, 2, 5]. Numerous\nefforts have since continued to push the boundaries of recurrent language models and encoder-decoder\narchitectures [38, 24, 15].\nRecurrent models typically factor computation along the symbol positions of the input and output\nsequences. Aligning the positions to steps in computation time, they generate a sequence of hidden\nstates ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently\nsequential nature precludes parallelization within training examples, which becomes critical at longer\nsequence lengths, as memory constraints limit batching across examples. Recent work has achieved'

In [21]:
# Now use FAISS as the vector store
from langchain_community.vectorstores import FAISS
db1 = FAISS.from_documents(documents, embedding=emb)

In [22]:
query = "What is Latent Variable Modelling"
result = db1.similarity_search(query)  # query the FAISS store (db1), not the Chroma store (db)
result[0].page_content

'1 Introduction\nRecurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks\nin particular, have been firmly established as state of the art approaches in sequence modeling and\ntransduction problems such as language modeling and machine translation [ 35, 2, 5]. Numerous\nefforts have since continued to push the boundaries of recurrent language models and encoder-decoder\narchitectures [38, 24, 15].\nRecurrent models typically factor computation along the symbol positions of the input and output\nsequences. Aligning the positions to steps in computation time, they generate a sequence of hidden\nstates ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently\nsequential nature precludes parallelization within training examples, which becomes critical at longer\nsequence lengths, as memory constraints limit batching across examples. Recent work has achieved'