In [21]:
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import TensorflowHubEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

### Loading and Extracting PDF Content

In [4]:
# Load every PDF in ../documents; each page becomes a separate Document
pdf_loader = PyPDFDirectoryLoader("../documents")
pages = pdf_loader.load()

### Creating Chunks from Page Content

In [7]:
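# Split pages into chunks of about 800 characters, with up to 50 characters of overlap between neighbours
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=50)
chunks = text_splitter.split_documents(pages)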
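A simplified fixed-window splitter can show what `chunk_size` and `chunk_overlap` do. This is not LangChain's algorithm. `RecursiveCharacterTextSplitter` first tries to break on separators such as paragraph breaks, line breaks and spaces, and only falls back to hard cuts when it has to. The hypothetical `split_with_overlap` below always cuts at fixed character offsets. Its only purpose is to show how consecutive chunks share `chunk_overlap` characters.

```python
def split_with_overlap(text: str, chunk_size: int = 800, chunk_overlap: int = 50) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous one,
    so neighbouring chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(text)
print(len(chunks))                        # 3
print([len(c) for c in chunks])           # [800, 800, 500]
print(chunks[0][-50:] == chunks[1][:50])  # True: the last 50 chars repeat at the start of the next chunk
```

Overlap keeps a sentence that straddles a boundary at least partly visible in both chunks. The cost is some duplicated text in the index.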

### Initializing Embedding Model

In [10]:
# Load the Universal Sentence Encoder (v4) from a local copy of the TF Hub model
embeddings = TensorflowHubEmbeddings(model_url="../models/universal-sentence-encoder_4")


### Storing Embeddings in a Vector DB (FAISS)

In [12]:
# Embed every chunk and index the vectors in an in-memory FAISS store
db = FAISS.from_documents(chunks, embeddings)
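Conceptually, the vector store answers a query in two steps. It embeds the question with the same model, then returns the stored chunks whose vectors lie closest to it. To my understanding, LangChain's `FAISS` wrapper defaults to an exact (flat) index with Euclidean distance, and `as_retriever()` returns 4 chunks unless `search_kwargs` says otherwise. Treat both points as assumptions and check them against your installed version.

The pure-Python sketch below performs the same exact nearest-neighbour search on made-up 3-dimensional vectors. Real Universal Sentence Encoder embeddings have 512 dimensions. `nearest_chunks` is a hypothetical helper, not part of FAISS.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query_vec, chunk_vecs, chunk_texts, k=4):
    """Exact k-nearest-neighbour search: score every stored vector, keep the k closest."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: l2_distance(query_vec, chunk_vecs[i]))
    return [(chunk_texts[i], l2_distance(query_vec, chunk_vecs[i])) for i in ranked[:k]]


# Toy "embeddings" standing in for the encoder output of each chunk (made-up values)
chunk_texts = ["crop yields", "rainfall data", "tax policy", "farm credit"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1], [0.0, 0.1, 0.9], [0.8, 0.0, 0.2]]
query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded question

# Prints "crop yields" first, then "farm credit", with their distances
for text, dist in nearest_chunks(query_vec, chunk_vecs, chunk_texts, k=2):
    print(f"{text}: {dist:.3f}")
```

A flat index compares the query against every stored vector. This is simple and exact, and it is fast enough for a handful of PDFs. Approximate FAISS indexes only become worthwhile for much larger collections.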

### Initializing LLM

In [20]:
# Load the 4-bit (Q4_0) quantized Llama 3.2 1B Instruct model and run it on the CPU
llm = GPT4All(model="../models/Llama-3.2-1B-Instruct-Q4_0.gguf", device="cpu")

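A rough size estimate helps explain why a 1B-parameter model is practical on a CPU. The estimate assumes GGML's Q4_0 layout, which stores weights in blocks of 32. Each block holds 32 four-bit values (16 bytes) plus one 16-bit scale (2 bytes). That works out to 18 bytes per 32 weights, or 4.5 bits per weight.

The hypothetical helper below applies this to a round parameter count. Real GGUF files will differ somewhat, because some tensors may be kept at higher precision and the file also carries metadata.

```python
Q4_0_BLOCK_WEIGHTS = 32    # weights per quantization block (assumed Q4_0 layout)
Q4_0_BLOCK_BYTES = 16 + 2  # 32 x 4-bit values + one fp16 scale


def q4_0_bytes(n_params: int) -> int:
    """Approximate storage for n_params weights quantized with Q4_0."""
    n_blocks = -(-n_params // Q4_0_BLOCK_WEIGHTS)  # ceiling division
    return n_blocks * Q4_0_BLOCK_BYTES


n = 1_000_000_000
print(q4_0_bytes(n) / 1e9)    # 0.5625 (GB)
print(q4_0_bytes(n) * 8 / n)  # 4.5 (bits per weight)
print(n * 4 / 1e9)            # 4.0 (GB at 32-bit float, for comparison)
```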


### Initializing QA Chain

In [25]:
# Default chain type "stuff": the retrieved chunks are placed into a single prompt for the LLM
qa_chain = RetrievalQA.from_chain_type(llm, retriever=db.as_retriever())

### Asking Questions About the Document

In [29]:
query = "How is the agriculture doing?"
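# Chain.run is deprecated in recent LangChain releases; RetrievalQA takes its input
# under "query" and returns the answer under "result"
answer = qa_chain.invoke({"query": query})["result"]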

In [30]:
print(answer)

 The question asks how India's agricultural sector is performing in terms of productivity, growth, or other relevant metrics.

The answer should be based on data from recent years (e.g., 2020-21) that show a positive trend. However, if no such information is available, the response could also focus on general trends and challenges facing Indian agriculture.

For example:

* The agricultural sector has shown steady growth in terms of production, with an increase of around `1 lakh crore between 2019-20 to 2020-21.
* India's crop yields have improved significantly over the years, indicating a positive trend towards increasing productivity.
* However, challenges such as climate change, soil degradation, and lack of access to credit remain significant obstacles for farmers.

The Indian government has set ambitious targets for agricultural growth in recent years. The National Agricultural Policy (NAP) aims to increase crop production by 2-3% annually over the next five years. Additionally, i