<a href="https://colab.research.google.com/github/xuan1905/misc/blob/main/memorag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## This notebook demonstrates the usage of [MemoRAG](https://github.com/qhjqhj00/MemoRAG/tree/main), showcasing its capabilities for memory-augmented retrieval and generation.

### Please install dependencies first.

In [None]:
!pip install memorag==0.1.3
!pip install faiss-gpu  # pip is used here as an example; install faiss with conda to get the latest version
!pip install flash_attn
!pip install -U bitsandbytes

## Downloading model files from Hugging Face may take a few minutes, so please be patient.

In [None]:
from memorag import MemoRAG

pipe = MemoRAG(
    mem_model_name_or_path="TommyChien/memorag-qwen2-7b-inst",
    ret_model_name_or_path="BAAI/bge-m3",
    beacon_ratio=16,
    load_in_4bit=True,
    enable_flash_attn=False # T4 GPU does not support flash attention
)

[2024-09-09 06:29:42,463] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)


The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


## Load example text or use your own data.
### For this demonstration, we are using half of the book’s content to accommodate limited GPU memory. Feel free to experiment with your own data as well.



In [None]:
import requests
import tiktoken
encoding = tiktoken.get_encoding("cl100k_base")

url = 'https://raw.githubusercontent.com/qhjqhj00/MemoRAG/main/examples/harry_potter.txt'
response = requests.get(url)
content = response.text

print(f"The raw database has {len(encoding.encode(content))} tokens...")

small_part = " ".join(content.split()[:50000])
print(f"Using part of the database: with {len(encoding.encode(small_part))} tokens...")

The raw database has 122591 tokens...
Using part of the database: with 67574 tokens...
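
The cell above truncates by words rather than by tokens, so the token count of the kept part is only known after encoding it. The sketch below uses a made-up sample string and a hypothetical helper name (`first_n_words`) to show what the `" ".join(content.split()[:n])` idiom does, and then works out the kept share from the two token counts printed above. Note that the idiom also collapses newlines and repeated spaces into single spaces.

```python
def first_n_words(text: str, n: int) -> str:
    """Keep the first n whitespace-separated words, joined by single spaces."""
    return " ".join(text.split()[:n])


# Made-up sample with mixed whitespace (double space, newline, tab).
sample = "Mr. and Mrs.  Dursley,\nof number four,\tPrivet Drive"
truncated = first_n_words(sample, 5)
print(truncated)

# Token counts copied from the output above.
raw_tokens, kept_tokens = 122591, 67574
kept_share = kept_tokens / raw_tokens
print(f"Kept {kept_share:.1%} of the tokens")
```

With 50000 words kept, the truncated text holds a little over half of the book's tokens.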


### Forming memory for a long context can be slow (a few minutes) when using the free T4 GPU. **You can skip this step** and use the next code block to download pre-cached memory instead.

In [None]:
# pipe.memorize(small_part, save_dir="content/harry_potter/", print_stats=True)

## The following code downloads the pre-cached memory.

In [None]:
import requests
import tarfile
import os

url = 'https://huggingface.co/datasets/TommyChien/MemoRAG-data/resolve/main/hp_qwen2.tar.bz2'

download_path = '/content/hp_qwen2.tar.bz2'
extract_path = '/content/'

response = requests.get(url, stream=True)
if response.status_code == 200:
    with open(download_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
    print(f"File downloaded successfully: {download_path}")
else:
    print(f"Failed to download file: {response.status_code}")

if os.path.exists(download_path):
    with tarfile.open(download_path, 'r:bz2') as tar:
        tar.extractall(path=extract_path)
    print(f"File extracted successfully to: {extract_path}")
else:
    print("Downloaded file not found.")

File downloaded successfully: /content/hp_qwen2.tar.bz2
File extracted successfully to: /content/
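
The cell above calls `tar.extractall` directly, which is fine for an archive you trust. When the archive comes from an unknown source, it can help to check each member path first. The following is a minimal sketch using only the standard library. The helper name `safe_extract` and the demo file names are hypothetical and not part of MemoRAG; the demo builds a tiny local archive instead of downloading anything.

```python
import os
import tarfile
import tempfile


def safe_extract(archive_path: str, dest_dir: str) -> list:
    """Extract an archive only if every member stays inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # Reject links and any path that resolves outside dest_root.
            if member.issym() or member.islnk():
                raise ValueError(f"Link not allowed in archive: {member.name}")
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(path=dest_root, members=members)
    return [m.name for m in members]


# Demo on a small archive created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "memory.txt")
    with open(src, "w") as f:
        f.write("demo")
    archive = os.path.join(tmp, "demo.tar.bz2")
    with tarfile.open(archive, "w:bz2") as tar:
        tar.add(src, arcname="demo_cache/memory.txt")
    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    extracted = safe_extract(archive, out_dir)
    extracted_ok = os.path.exists(os.path.join(out_dir, "demo_cache", "memory.txt"))

print(extracted, extracted_ok)
```

In the notebook, the same helper could be called with `download_path` and `extract_path` in place of the demo paths.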


### The following code loads the downloaded pre-cached memory.

In [None]:
import time
start = time.time()
pipe.load("/content/harry_potter_qwen2_ratio16", print_stats=True)
print(f"Loading from cache takes {round(time.time()-start,2)} for the full book.")



Memory file size: 0.24 GB
Number of chunks in retrieval corpus: 136
Loading from cache takes 2.54 for the full book.
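
The timing above is measured with `time.time()` and printed without a unit; the value is in seconds. For measuring short intervals, `time.perf_counter()` is usually the better clock. Below is a small sketch of a reusable timer; the `timed` helper and the `time.sleep` stand-in are hypothetical and only illustrate the pattern, they do not call MemoRAG.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Store the elapsed wall-clock seconds of the with-block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("load", timings):
    time.sleep(0.05)  # stand-in for a call such as pipe.load(...)

print(f"Loading takes {timings['load']:.2f} s")
```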


### Next, we perform the QA task and the retrieval task.

In [None]:
# perform QA task

query = "What's the theme of the book?"

res = pipe(context=small_part, query=query, task_type="qa", max_new_tokens=256)
print(f"Using memory to produce the answer: \n{res} \n\n")
res = pipe(context=small_part, query=query, task_type="memorag", max_new_tokens=256)
print(f"Using MemoRAG to produce the answer: \n{res[0]}")



Using memory to produce the answer: 
The theme of the book is Harry Potter's adventures at Hogwarts School of Witchcraft and Wizardry, his friendship with Ron Weasley and Hermione Granger, and his battle against Lord Voldemort. 


Using MemoRAG to produce the answer: 
The theme of the book is the struggle between good and evil, represented by Harry Potter and Voldemort respectively, and the importance of friendship and loyalty in Hogwarts School of Witchcraft and Wizardry.
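
The retrieval step in this notebook splits the memory model's rewritten query into lines and keeps only the lines with more than three words as clues. Here is a minimal sketch of that filtering on made-up text. The helper name `filter_clues` and the sample string are hypothetical, and stripping each line is an addition that the notebook's own list comprehension does not do.

```python
def filter_clues(rewrite_output: str, min_words: int = 4) -> list:
    """Split on newlines and keep lines with at least min_words words."""
    lines = (line.strip() for line in rewrite_output.split("\n"))
    return [line for line in lines if len(line.split()) >= min_words]


# Made-up stand-in for the text returned by the memory model.
raw = (
    "What role does the Chamber of Secrets play in the story?\n"
    "Hogwarts\n"
    "\n"
    "1. Harry Potter\n"
    "How does Harry discover Parseltongue?"
)

clues_demo = filter_clues(raw)
for clue in clues_demo:
    print(clue)
```

Single words, empty lines, and short fragments are dropped, so only full questions are passed on to the retriever.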


In [None]:
# perform retrieval task

clues = pipe.mem_model.rewrite(query).split("\n")
clues = [q for q in clues if len(q.split()) > 3]  # Filter out short or irrelevant clues
print("Clues generated from memory:\n", clues)

# Retrieve relevant passages based on the recalled clues
retrieved_passages = pipe._retrieve(clues)
print("Retrieved passages:")
print("\n======\n".join(retrieved_passages[:3]))



Clues generated from memory:
 ['What magical events occur at Hogwarts School during the school year described in the book?', 'What challenges do Harry Potter and his friends face at Hogwarts School?', 'What significant magical artifacts are mentioned in the book?', 'What role does the Chamber of Secrets play in the story?', 'How does Harry Potter discover his ability to communicate with snakes?', "What is the significance of Salazar Slytherin's legacy at Hogwarts?", 'What is the relationship between Harry Potter and the Chamber of Secrets?', 'What is the impact of the Chamber of Secrets on the students at Hogwarts?', "How does Harry Potter's past experiences influence his actions during the events described in the book?"]
Retrieved passages:
For the first couple of weeks back, Harry had enjoyed muttering nonsense words under his breath and watching Dudley tearing out of the room as fast as his fat legs would carry him. But the long silence from Ron and Hermione had made Harry feel so c