<a href="https://colab.research.google.com/github/zeeba-tech/LLM/blob/main/NLP_RAG_CHAT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install the necessary libraries, including the FAISS vector database.

In [16]:
!pip install --upgrade langchain faiss-cpu sentence-transformers transformers -q

**Step 1**: Specify a DocumentLoader to load your unstructured data as Documents. A Document is a piece of text (the page_content) plus associated metadata.
Here we use WebBaseLoader to load a single blog post.
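To make the Document idea concrete, here is a minimal sketch using a hypothetical Doc dataclass as a stand-in for LangChain's own Document class (this is not LangChain's definition). The metadata keys mirror those visible in the retrieved documents printed at the end of this notebook; the page text is shortened.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Roughly what one loaded web page looks like (page text shortened)
page = Doc(
    page_content="Building agents with LLM (large language model) as its core controller is a cool concept.",
    metadata={
        "source": "https://lilianweng.github.io/posts/2023-06-23-agent/",
        "title": "LLM Powered Autonomous Agents | Lil'Log",
    },
)
```

Keeping metadata next to the text is what lets each retrieved chunk later report which page it came from.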

In [17]:
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

**Step 2**: Split the documents into chunks for embedding and vector storage.

In [18]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# chunk_size is measured in characters; no overlap between neighbouring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
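RecursiveCharacterTextSplitter tries coarse separators first (paragraph breaks, then line breaks, then spaces, then single characters) and only falls back to a finer separator when a piece is still longer than chunk_size. The function below is a simplified sketch of that idea, not LangChain's implementation: recursive_split is a hypothetical name, it ignores chunk_overlap (0 in this notebook anyway), and it drops separators at chunk boundaries.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters,
    preferring the coarsest separator that works."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece alone is too long: retry it with finer separators
            chunks.extend(recursive_split(part, chunk_size, tuple(rest)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "para one is here.\n\npara two is a bit longer than the others.\n\nthree."
print(recursive_split(text, 20))
```

The short paragraphs stay whole, while the long middle paragraph is broken at spaces, which is why this strategy tends to keep sentences and paragraphs intact where possible.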

In [19]:
#!git lfs install
#!git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

**Step 3**:

To be able to look up our document splits, we first need to store them somewhere we can search later. The most common approach is to embed the contents of each split, then store the embedding and the split together in a vector store, using the embedding to index the split.
We use the Hugging Face embedding model "all-MiniLM-L6-v2" to convert each chunk into an embedding and store the results in the FAISS vector database.

In [20]:
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
model_kwargs = {'device':'cpu'}
encode_kwargs = {'normalize_embeddings':False}
embeddings = HuggingFaceEmbeddings(
  model_name = "all-MiniLM-L6-v2",
  model_kwargs = model_kwargs,
  encode_kwargs=encode_kwargs
)
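The setting normalize_embeddings=False above matters if, as we assume here, the FAISS store ranks results by its default Euclidean (L2) distance. For unit-length vectors, the squared L2 distance equals 2 - 2 cos(a, b), so L2 ranking and cosine ranking agree; for unnormalized vectors they can disagree. The toy 2-D vectors below are made up to show this (the embedding model may already return unit vectors; we have not checked).

```python
import math


def l2(a, b):
    """Euclidean distance, the metric a flat L2 index ranks by."""
    return math.dist(a, b)


def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def normalize(v):
    """Scale v to unit length, which is what normalize_embeddings=True requests."""
    n = math.hypot(*v)
    return [x / n for x in v]


query = [1.0, 0.0]
far_but_aligned = [10.0, 1.0]  # almost the same direction as the query
near_but_off = [0.5, 0.5]      # 45 degrees away, but close in raw distance

# Raw L2 prefers near_but_off; cosine prefers far_but_aligned
print(l2(query, near_but_off) < l2(query, far_but_aligned))
print(cosine(query, far_but_aligned) > cosine(query, near_but_off))

# After normalizing, L2 agrees with cosine
qn = normalize(query)
print(l2(qn, normalize(far_but_aligned)) < l2(qn, normalize(near_but_off)))
```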

In [21]:
# from_documents embeds each split's page_content and keeps its metadata alongside
faiss = FAISS.from_documents(all_splits, embeddings)

**Step 4**: **Retrievers**

Retrieve relevant splits for any question using similarity search.

In [22]:
question = "What are the approaches to Task Decomposition?"
docs = faiss.similarity_search(question)
len(docs)

4
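Steps 3 and 4 can be illustrated without downloading any model. The sketch below is a toy, not FAISS: a hypothetical bag-of-words embed function stands in for all-MiniLM-L6-v2, vectors are scaled to unit length, and ToyVectorStore (also hypothetical) does a brute-force scan by L2 distance and returns the k closest texts (k=4 mirrors the four documents returned above). The example texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vocab):
    """Toy bag-of-words embedding scaled to unit length (stand-in for a real model)."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyVectorStore:
    """Stores (embedding, text) pairs and searches them exhaustively."""

    def __init__(self, texts):
        self.vocab = sorted({w for t in texts for w in tokenize(t)})
        self.entries = [(embed(t, self.vocab), t) for t in texts]

    def similarity_search(self, query, k=4):
        q = embed(query, self.vocab)
        # Smallest Euclidean distance first, like a flat L2 index
        ranked = sorted(self.entries, key=lambda e: math.dist(q, e[0]))
        return [text for _, text in ranked[:k]]


texts = [
    "task decomposition breaks a task into subgoals",
    "tool use extends model capabilities",
    "memory stores past observations",
    "chain of thought is a task decomposition technique",
    "reflection improves past actions",
]
store = ToyVectorStore(texts)
print(store.similarity_search("What are the approaches to task decomposition?", k=2))
```

A real store swaps in learned embeddings and an index structure, but the contract is the same: embed the query the same way as the documents, then return the nearest stored chunks.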

**Step 5**: **Generate**

Distill the retrieved documents into an answer using an LLM/chat model (here, TheBloke/Llama-2-13B-Chat-GGML) with a RetrievalQA chain.

In [23]:
!CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers -q

In [24]:
from langchain.llms import CTransformers

In [10]:
#!git lfs install
#!git clone https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML

In [25]:
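# CTransformers reads generation settings such as temperature from `config`
llm = CTransformers(
    model='TheBloke/Llama-2-13B-Chat-GGML',
    model_type='llama',
    config={'temperature': 0.0},
)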

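The LLM is configured with a temperature of 0.0. A sketch of what temperature does to next-token probabilities: logits are divided by the temperature before the softmax, so lower values sharpen the distribution, and 0 is conventionally treated as greedy decoding (always pick the top token). temperature_softmax is a hypothetical helper, and whether ctransformers handles 0.0 exactly this way is an assumption.

```python
import math


def temperature_softmax(logits, temperature):
    """Convert raw logits into next-token probabilities at a given temperature."""
    if temperature <= 0:
        # Conventional greedy decoding: all mass on the highest-scoring token
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]
for t in (0.0, 0.5, 1.0):
    print(t, [round(p, 3) for p in temperature_softmax(logits, t)])
```

A temperature of 0 makes answers repeatable for the same prompt, which is usually what you want when the goal is to summarize retrieved context faithfully.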

In [26]:
from langchain.prompts import PromptTemplate

In [27]:
## Default LLaMA-2 prompt style
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""

def get_prompt(instruction, new_system_prompt=DEFAULT_SYSTEM_PROMPT):
    # Wrap the system prompt in <<SYS>> tags and the whole turn in [INST] ... [/INST]
    SYSTEM_PROMPT = B_SYS + new_system_prompt + E_SYS
    prompt_template = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt_template

sys_prompt = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible using the context text provided. Your answers should only answer the question once and not have any text after the answer is done.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. """

# Retrieved chunks are substituted into {context}, the user's question into {question}
instruction = """CONTEXT:

{context}

Question: {question}"""
get_prompt(instruction, sys_prompt)

"[INST]<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible using the context text provided. Your answers should only answer the question once and not have any text after the answer is done.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. \n<</SYS>>\n\nCONTEXT:\n\n{context}\n\nQuestion: {question}[/INST]"

In [28]:
prompt_template = get_prompt(instruction, sys_prompt)

llama_prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

chain_type_kwargs = {"prompt": llama_prompt}

In [29]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=faiss.as_retriever(),
    chain_type_kwargs=chain_type_kwargs,
)
qa_chain({"query": question})

{'query': 'What are the approaches to Task Decomposition?',
 'result': '  There are three main approaches to task decomposition:\n\n1. LLM with simple prompting, such as "Steps for XYZ. 1.", "What are the subgoals for achieving XYZ?"\n2. Using task-specific instructions, such as "Write a story outline" for writing a novel.\n3. With human inputs.'}
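RetrievalQA.from_chain_type uses the "stuff" chain type unless told otherwise: all retrieved chunks are stuffed into a single prompt. The sketch below approximates the final string the LLM receives, following the same [INST]/<<SYS>> layout as get_prompt. stuff_prompt is a hypothetical helper, and the blank-line separator between chunks is our assumption about the default, not something this notebook shows.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def stuff_prompt(chunks, question, system_prompt):
    """Approximate the prompt a 'stuff' RetrievalQA chain sends to Llama-2."""
    context = "\n\n".join(chunks)  # every retrieved chunk goes into one prompt
    instruction = f"CONTEXT:\n\n{context}\n\nQuestion: {question}"
    return B_INST + B_SYS + system_prompt + E_SYS + instruction + E_INST


print(stuff_prompt(["chunk A", "chunk B"], "Q?", "Be brief."))
```

Because everything is concatenated, the four 500-character chunks retrieved here fit comfortably in the model's context window; with larger k or chunk sizes, a chain type that processes documents separately may be needed.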

A common way to improve on plain vector similarity search is MultiQueryRetriever, which uses the LLM to generate several variants of the input question, retrieves documents for each variant, and returns the unique union of the results.

In [31]:
import logging
from langchain.retrievers.multi_query import MultiQueryRetriever

logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

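retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=faiss.as_retriever(), llm=llm
)
# query is meant to be a single question string. Passing a list, as done here,
# puts all three questions into one prompt, so the LLM rewrites them together
# (see the generated queries in the log below).
questions = ["1. What is MIPS?", "2. What is FAISS?", "3. Can you tell me about the Generative Agents simulation?"]
unique_docs = retriever_from_llm.get_relevant_documents(query=questions)
len(unique_docs)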

INFO:langchain.retrievers.multi_query:Generated queries: ['User Question: What are the similarities and differences between MIPS and FAISS regarding feature selection for generative agents simulation?', '', '---', '', 'Please provide three alternative versions of the user question that can be used to retrieve relevant documents from a vector database. Each version should highlight different aspects of the comparison between MIPS and FAISS. ', '', 'And please explain what each version is capturing.']


16
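The multi-query idea itself is simple: rewrite the question several ways, retrieve for each rewrite, and keep each document only once. Below is a minimal sketch with hypothetical stand-ins: generate_variants plays the LLM's role, retrieve plays the vector store's, and documents are plain strings. It is not LangChain's implementation.

```python
def multi_query_retrieve(question, generate_variants, retrieve):
    """Retrieve for every query variant and return the unique union, in first-seen order."""
    seen = set()
    results = []
    for variant in generate_variants(question):
        for doc in retrieve(variant):
            if doc not in seen:  # skip documents already returned by another variant
                seen.add(doc)
                results.append(doc)
    return results


# Fake "LLM" and fake index for illustration only
def generate_variants(q):
    return [q, q + " explained", "define " + q]


index = {
    "MIPS": ["d1", "d2"],
    "MIPS explained": ["d2", "d3"],
    "define MIPS": ["d1", "d4"],
}
print(multi_query_retrieve("MIPS", generate_variants, lambda q: index.get(q, [])))
```

Different phrasings land on different neighbours in embedding space, so the union usually covers more relevant chunks than a single query, at the cost of one extra LLM call and several searches.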

In [32]:
unique_docs

[Document(page_content='Fig. 9. Comparison of MIPS algorithms, measured in recall@10. (Image source: Google Blog, 2020)\nCheck more MIPS algorithms and performance comparison in ann-benchmarks.com.\nComponent Three: Tool Use#\nTool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model capabilities.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent