# Unveiling the Power of Retrieval Augmented Generation: a comparison between RAGs and Fine-Tuned LLMs
Here we explore how to get a Llama model inside a RAG system to produce output in a specific structure, with quality comparable to the output of a fine-tuned Llama.

### Import Libraries

In [1]:
from langchain.docstore.document import Document
from langchain.document_loaders import HuggingFaceDatasetLoader

from encoder.encoder import Encoder
from generator.generator import Generator
from retriever.vector_db import VectorDatabase

### Define Global Variables

In [2]:
TEMPLATE = """
Use the following pieces of context to answer the question at the end. 
{context}
Question: {question}
Answer:
"""

QUERY = "Given a puzzle-like code question, provide a well-reasoned, step-by-step Python solution. Write a function step-by-step that reverses a linked list."
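To make the role of the two placeholders concrete, the sketch below fills the template with plain `str.format`. How the `Generator` class builds its prompt internally is not shown in this notebook, so treat this as an assumption; `example_context` and `example_question` are made-up values used only for illustration.

```python
TEMPLATE = """
Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Answer:
"""

# Hypothetical values, only for illustration
example_context = "A linked list node stores a value and a reference to the next node."
example_question = "Write a function that reverses a linked list."

# Substitute both placeholders to obtain the final prompt text
prompt = TEMPLATE.format(context=example_context, question=example_question)
print(prompt)
```

The prompt ends with `Answer:`, which invites the model to continue with the answer rather than restate the question.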

### Load Dataset and preprocess it

In [None]:
# load dataset
loader = HuggingFaceDatasetLoader("luisroque/instruct-python-llama2-20k", "text")
docs = loader.load()

# preprocess dataset: train keeps only the answers (text after [/INST]),
# test keeps only the questions (text before [/INST]) of the last 1000 records
train = [
    Document(page_content=x.page_content.split("[/INST]")[1]) for x in docs[:-1000]
]
test = [Document(page_content=x.page_content.split("[/INST]")[0]) for x in docs[-1000:]]
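The split relies on the Llama 2 instruction format, where the question sits before `[/INST]` and the answer follows it. The sketch below uses a made-up record (not taken from the dataset, and its exact opening tags are an assumption) to show what each side of the split contains.

```python
# Made-up record in the Llama 2 instruction format (assumed layout)
record = "<s>[INST] How do I reverse a list in Python? [/INST] Use list.reverse() or reversed(). </s>"

# Index 0 is the question side, index 1 is the answer side
question_part, answer_part = record.split("[/INST]")

print(repr(question_part))
print(repr(answer_part))
```

Note that the split does not strip the surrounding tags: the answer side keeps its trailing `</s>`, which is consistent with the `</s>` visible at the end of the document retrieved later in this notebook.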

### Initiate our RAG modules

In [4]:
# initiate our classes for the Encoder, Retriever and Generator
encoder = Encoder()
faiss_db = VectorDatabase()
generator = Generator(TEMPLATE)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /Users/rafael/Documents/large-language-models/rag/model/nous-hermes-llama-2-7b.Q4_0.gguf (version GGUF V2)

### Create and store passages in a vector database

In [5]:
# Create passages and store them in a vector DB
passages = faiss_db.create_passages_from_documents(train)
faiss_db.store_passages_db(passages, encoder.encoder)

Created a chunk of size 1462, which is longer than the specified 1000
Created a chunk of size 2219, which is longer than the specified 1000
Created a chunk of size 1004, which is longer than the specified 1000
Created a chunk of size 1093, which is longer than the specified 1000
Created a chunk of size 1135, which is longer than the specified 1000
Created a chunk of size 1290, which is longer than the specified 1000
Created a chunk of size 1111, which is longer than the specified 1000
Created a chunk of size 1253, which is longer than the specified 1000
Created a chunk of size 1091, which is longer than the specified 1000
Created a chunk of size 1210, which is longer than the specified 1000
Created a chunk of size 1177, which is longer than the specified 1000
Created a chunk of size 2244, which is longer than the specified 1000
Created a chunk of size 1451, which is longer than the specified 1000
Created a chunk of size 1475, which is longer than the specified 1000
Created a chunk of s
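The message format above matches the warning that LangChain's separator-based text splitters emit when a single piece of text between two separators is already longer than the chunk size. Assuming `create_passages_from_documents` uses such a splitter with a chunk size of 1000 (its implementation is not shown here), the simplified pure-Python sketch below illustrates why oversized chunks appear. `split_on_separator` is a hypothetical helper, not the library's code, and it ignores chunk overlap.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedy separator-based chunking (simplified, no overlap)."""
    chunks, current = [], ""
    for piece in text.split(separator):
        # Try to append the next piece to the chunk being built
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A piece is never cut in the middle, so it may exceed chunk_size
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "short paragraph\n\n" + "x" * 30 + "\n\nanother short one"
chunks = split_on_separator(text, chunk_size=20)
oversized = [len(c) for c in chunks if len(c) > 20]
print([len(c) for c in chunks], oversized)
```

Because text is only cut at separators, a long paragraph with no separator inside it ends up as one chunk that exceeds the limit. Such chunks are still stored; they are just longer than requested.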

### Retrieve most similar document
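The most similar document turns out not to be useful for our use case (it covers Python's built-in `list.reverse()` rather than linked lists), so we write our own context instead.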

In [None]:
# retrieve the document most similar to our query
context = faiss_db.retrieve_most_similar_document(QUERY)
print(context)

'list.reverse() modifies the list in-place, returns None. But if you want to protect old list, you can use reversed() function for that, it returns an iterator.\nIn [1]: a=[1,2,3,4]\n\nIn [2]: print(a.reverse())\nNone\n\nIn [3]: a\nOut[3]: [4, 3, 2, 1]\n\nIn [4]: a=[1,2,3,4]\n\nIn [5]: print(reversed(a))\n<listreverseiterator object at 0x24e7e50>\n\nIn [6]: list(reversed(a))\nOut[6]: [4, 3, 2, 1]\n\nIn [7]: a\nOut[7]: [1, 2, 3, 4]\n\n </s>'
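How `retrieve_most_similar_document` works internally is not shown in this notebook; vector stores such as FAISS typically return the passage whose embedding lies closest to the query embedding. The toy sketch below swaps the learned encoder for a bag-of-words vector with cosine similarity, and uses made-up passages, to illustrate why a query about reversing a linked list can land on a passage about reversing Python lists: the two share surface vocabulary.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words "embedding"; a real encoder produces dense vectors
    return Counter(text.lower().split())


def cosine(a, b):
    # Counter returns 0 for missing tokens, so only shared tokens contribute
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Made-up passages, only for illustration
passages = [
    "reverse a list in place with list reverse",
    "open a file and read it line by line",
    "sort a dictionary by value",
]
query = "write a function that reverses a linked list"

scores = [cosine(embed(query), embed(p)) for p in passages]
best = passages[scores.index(max(scores))]
print(best)
```

The top-scoring passage is the closest one in the store, which is not necessarily a relevant one. If the corpus contains nothing about linked lists, the nearest neighbor is simply the least dissimilar text.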

In [None]:
context = """
Write Python Function:
def dummy(arg):
    arg += 1
    return arg

Write Explanation:
1. We define the dummy function that receives an argument arg
2. We add 1 to arg
3. And we return arg plus 1
"""
print(generator.get_answer(context, QUERY))

Llama.generate: prefix-match hit


The function should be able to take a ListNode as input and return a reversed version of the linked list. Here is one possible implementation in Python using a recursive algorithm:
```python
class ListNode: 
    def __init__(self, data): 
        self.data = data 
        self.next = None 
    
def reverseList(head): 
    if head == None or head.next == None : 
        return head 
    
    temp = reverseList(head.next) 
    head.next.next = head 
    head.next = None 
    
    return temp
```
This function takes a `ListNode` as input and returns the reversed version of the linked list. The implementation uses recursion to traverse the linked list in reverse order, swapping the next pointers of consecutive nodes until the entire list has been traversed in reverse order. This implementation is not optimized for large data sets but should work well for small lists.



llama_print_timings:        load time =    1466.63 ms
llama_print_timings:      sample time =      65.47 ms /   213 runs   (    0.31 ms per token,  3253.20 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   47740.32 ms /   213 runs   (  224.13 ms per token,     4.46 tokens per second)
llama_print_timings:       total time =   48507.01 ms
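The generated answer notes that the recursive version is not optimized for large lists. One concrete reason is that CPython limits recursion depth (1000 frames by default), so a sufficiently long list raises `RecursionError`. As a point of comparison, here is a minimal iterative sketch; `reverse_list_iterative`, `build_list`, and `to_values` are hypothetical names that are not part of the model output.

```python
class ListNode:
    def __init__(self, data):
        self.data = data
        self.next = None


def reverse_list_iterative(head):
    """Reverse a singly linked list in O(n) time and O(1) extra space."""
    prev = None
    while head is not None:
        nxt = head.next   # remember the rest of the list
        head.next = prev  # point the current node backwards
        prev = head       # advance prev to the current node
        head = nxt        # move on to the next node
    return prev


def build_list(values):
    # Helper: build a linked list from a Python iterable
    head = tail = None
    for v in values:
        node = ListNode(v)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def to_values(head):
    # Helper: collect node values into a Python list
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


# 5000 nodes exceeds the default recursion limit but is fine iteratively
result = to_values(reverse_list_iterative(build_list(range(5000))))
print(result[:3], result[-3:])
```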
