### Building a Basic RAG Application


#### 1. Importing Libraries

- TextLoader loads your text data from files.
- CharacterTextSplitter breaks large text into smaller chunks.
- OllamaEmbeddings converts text chunks into numerical vectors (embeddings).
- FAISS is a similarity-search library that LangChain wraps as a vector store. It holds the embeddings and supports fast similarity search.

In [10]:
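# Note: newer LangChain releases move these classes to langchain_community
# (and CharacterTextSplitter to langchain_text_splitters); adjust if these imports fail.
from langchain.document_loaders import TextLoader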
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import FAISS


#### 2. Load and Split Data

<i>loader = TextLoader('data.txt')</i>
This creates a loader to read the contents of a file called data.txt. This file should contain your source text or documents.

<i>docs = loader.load()</i>
This reads the file and stores the result in docs. docs is a list of Document objects; for a single text file it holds one Document containing the full text.

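<i>text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20)</i>
Here, we create a splitter that divides the text into chunks of at most about 100 characters. CharacterTextSplitter splits on a separator (a blank line by default) and then merges the pieces up to chunk_size, so a paragraph longer than 100 characters with no blank line inside it stays as one oversized chunk. chunk_overlap=20 lets consecutive chunks share up to 20 characters so that context is not lost at the boundaries.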

<i>splits = text_splitter.split_documents(docs)</i>
This applies the splitter to the loaded documents, producing a list of smaller text chunks stored in splits. These smaller chunks are easier to embed and search.



In [11]:
loader = TextLoader('data.txt')
docs = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20)
splits = text_splitter.split_documents(docs)
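
To see what chunk_size and chunk_overlap mean in isolation, here is a minimal sketch of a fixed-width sliding window in plain Python. It is an illustration only: split_with_overlap is a hypothetical helper, and it does not reproduce CharacterTextSplitter, which splits on separators rather than at fixed offsets.

```python
def split_with_overlap(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters.

    Hypothetical helper for illustration; not part of LangChain.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 250-character input, used only to make the window sizes visible.
sample = "".join(str(i % 10) for i in range(250))
chunks = split_with_overlap(sample, chunk_size=100, chunk_overlap=20)

print([len(c) for c in chunks])
print(chunks[0][-20:] == chunks[1][:20])  # the shared 20 characters
```

With these settings each window starts 80 characters after the previous one, which is why the tail of one chunk reappears at the head of the next.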


#### 3. Creating Embeddings and Vector Store


embeddings = OllamaEmbeddings(model="hf.co/CompendiumLabs/bge-base-en-v1.5-gguf")
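This initializes an embedding model served by Ollama. The model converts text chunks into vectors (arrays of numbers) that capture the meaning of the text. The line above names a GGUF model hosted on Hugging Face, while the code cell below uses nomic-embed-text instead. Either should work as long as the model has been pulled into your local Ollama instance.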

vectorstore = FAISS.from_documents(splits, embeddings)
This line creates a FAISS vector store (database) by:

1. Taking each chunk in splits.
2. Converting it into an embedding using the embeddings model.
3. Storing all these embeddings in FAISS for fast similarity search later.

vectorstore.save_local("faiss_index")
This saves the built vector database locally on your machine (or Colab environment) in a folder named "faiss_index". This way, you can reuse the index without rebuilding it every time.

In [12]:
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = FAISS.from_documents(splits, embeddings)
vectorstore.save_local("faiss_index")
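
If the embed-then-search idea feels abstract, the following toy sketch may help. It swaps in a word-count vector for the real embedding model and a plain Python list for FAISS, and it ranks by cosine similarity for readability (the metric a real FAISS index uses depends on how it is configured). The names toy_embed, build_index, and search are hypothetical, and the sample chunks are placeholders.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Stand-in for an embedding model: count each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str], vocabulary: list[str]) -> list[tuple[list[float], str]]:
    """Stand-in for a vector store: keep each chunk next to its vector."""
    return [(toy_embed(chunk, vocabulary), chunk) for chunk in chunks]


def search(index, query: str, vocabulary: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = toy_embed(query, vocabulary)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


# Placeholder chunks, chosen only to make the ranking easy to follow.
chunks = [
    "cats sleep most of the day",
    "dogs bark at strangers",
    "cats have sharp claws",
]
vocabulary = sorted({word for chunk in chunks for word in chunk.split()})
index = build_index(chunks, vocabulary)

print(search(index, "do cats sleep", vocabulary, k=2))
```

A real embedding model captures meaning rather than exact word overlap, so it can match a query to a chunk even when they share no words. The toy version cannot do that, but the store-vectors-then-rank flow is the same.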


#### 4. Setting up the Query Pipeline

from langchain.chains import RetrievalQA
Imports a special chain (pipeline) that combines retrieval and question answering.

from langchain.llms import Ollama
Imports the Ollama wrapper to access a language model for generation.

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
Converts the FAISS vector store into a retriever object.

k=3 means when you ask a question, it will fetch the top 3 most relevant chunks from the vector store.

llm = Ollama(model="hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF")
Initializes the language model (LLM) that will generate answers. The line above names a 1-billion parameter Llama 3.2 model fine-tuned for instructions, hosted on Hugging Face. The code cell below uses llama2 instead; use whichever model you have pulled into Ollama.

qa_chain = RetrievalQA.from_chain_type(...)
Creates a RetrievalQA chain that:

- Uses the retriever to find relevant documents.
- Passes those documents along with the user query to the llm.

The chain_type="stuff" setting means it concatenates all retrieved documents into a single prompt for the LLM.

In [15]:
from langchain.chains import RetrievalQA
from langchain.llms import Ollama

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
llm = Ollama(model="llama2")

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type="stuff"
)
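
To make the "stuff" idea concrete, here is a rough sketch of concatenating retrieved chunks into one prompt. The function build_stuff_prompt and its template wording are assumptions made for illustration; the prompt template LangChain actually uses is different, and the chunk texts are placeholders.

```python
def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Join every retrieved chunk into one context block and append the question.

    Hypothetical helper; the wording of the template is an assumption.
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for whatever the retriever returns.
retrieved = ["placeholder chunk A", "placeholder chunk B", "placeholder chunk C"]
prompt = build_stuff_prompt("How many bones do cats have?", retrieved)
print(prompt)
```

Because every chunk lands in the same prompt, the total length grows with k and with chunk_size, so this approach suits a small number of short chunks.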


#### 5. Running a Query

query = "How many bones do cats have?"
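This is the question you want to ask your RAG system. The code cell below uses a different example query, "Tell me four facts about cats"; any question about your source text works the same way.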

result = qa_chain({"query": query})
This sends the query to the qa_chain. Internally, it:

1. Converts the query to an embedding.
2. Retrieves the top 3 relevant chunks from the vector database.
3. Passes the query + retrieved chunks to the LLM.
4. Gets the generated answer.

print(result['result'])
Prints the answer generated by the LLM based on the retrieved information.



In [None]:
query = "Tell me four facts about cats"
result = qa_chain({"query": query})
print(result['result'])
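
The flow of a single query can also be sketched without any model at all. In the snippet below, answer_query is a hypothetical function that takes the retriever and the LLM as plain callables, and both are replaced by stubs, so it only shows how the pieces connect and why the answer ends up under the 'result' key. It is not how RetrievalQA is implemented.

```python
from typing import Callable


def answer_query(
    query: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> dict:
    """Hypothetical orchestration: retrieve chunks, build a prompt, call the LLM."""
    chunks = retrieve(query)  # embed the query and fetch the closest chunks
    prompt = "\n\n".join(chunks) + f"\n\nQuestion: {query}\nAnswer:"
    return {"query": query, "result": generate(prompt)}


def fake_retrieve(query: str) -> list[str]:
    """Stub retriever: always returns three placeholder chunks (k = 3)."""
    return ["placeholder chunk 1", "placeholder chunk 2", "placeholder chunk 3"]


def fake_generate(prompt: str) -> str:
    """Stub LLM: reports how many chunks it was given instead of answering."""
    return f"(stub answer built from {prompt.count('placeholder chunk')} chunks)"


result = answer_query("Tell me four facts about cats", fake_retrieve, fake_generate)
print(result["result"])
```

Swapping the stubs for a real retriever and a real model changes the quality of the answer but not the shape of the pipeline.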
