# CAGRA Demo with NVIDIA cuVS

Learn more about CAGRA in the [CAGRA paper](https://arxiv.org/pdf/2308.15136).

In [1]:
!nvidia-smi

Thu Aug  8 21:15:04 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

Check which CUDA version you are using (it appears in the `nvidia-smi` header above). If it is 11.x, install `cuvs-cu11` instead of `cuvs-cu12`; that build depends on `pylibraft-cu11`.
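The version check can also be scripted. The sketch below maps a CUDA version string, as printed in the `nvidia-smi` header, to a wheel name. `cuvs_package_for` is a hypothetical helper, not part of cuVS, and it assumes the wheels follow the `cuvs-cu11` / `cuvs-cu12` naming pattern.

```python
import re


def cuvs_package_for(cuda_version: str) -> str:
    """Map a CUDA version string such as '12.2' to a cuVS wheel name.

    Hypothetical helper: assumes wheels are named cuvs-cu<major>.
    """
    match = re.match(r"\s*(\d+)\.", cuda_version)
    if match is None:
        raise ValueError(f"unrecognized CUDA version: {cuda_version!r}")
    major = int(match.group(1))
    if major not in (11, 12):
        raise ValueError(f"no known cuVS wheel for CUDA {major}.x")
    return f"cuvs-cu{major}"


print(cuvs_package_for("12.2"))
```

For the Tesla T4 session shown above (CUDA 12.2), this selects the same `cuvs-cu12` package that the next cell installs.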

In [2]:
!pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com
!pip install cupy==13.2.0
!pip install sentence-transformers==3.0.1

Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting cuvs-cu12
  Downloading https://pypi.nvidia.com/cuvs-cu12/cuvs_cu12-24.8.0-cp310-cp310-manylinux_2_28_x86_64.whl (1127.3 MB)
Collecting pylibraft-cu12==24.8.* (from cuvs-cu12)
  Using cached https://pypi.nvidia.com/pylibraft-cu12/pylibraft_cu12-24.8.1-cp310-cp310-manylinux_2_28_x86_64.whl (783.8 MB)
Collecting rmm-cu12==24.8.* (from pylibraft-cu12==24.8.*->cuvs-cu12)
  Downloading https://pypi.nvidia.com/rmm-cu12/rmm_cu12-24.8.2-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (1.7 MB)
Installing collected packages: rmm-cu12, pylibraft-cu12, cuvs-cu12
  Attempting uninstall: rmm-cu12
    Found existing installation: rmm-cu12 24.4.0
    Uninstalling rmm-cu12-24.4.0

# Dataset

In [3]:
!wget "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"

--2024-08-08 21:19:52--  https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip
Resolving public.ukp.informatik.tu-darmstadt.de (public.ukp.informatik.tu-darmstadt.de)... 130.83.167.186
Connecting to public.ukp.informatik.tu-darmstadt.de (public.ukp.informatik.tu-darmstadt.de)|130.83.167.186|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2448432 (2.3M) [application/zip]
Saving to: ‘nfcorpus.zip’


2024-08-08 21:19:54 (2.27 MB/s) - ‘nfcorpus.zip’ saved [2448432/2448432]



In [4]:
!unzip nfcorpus.zip

Archive:  nfcorpus.zip
   creating: nfcorpus/
   creating: nfcorpus/qrels/
  inflating: nfcorpus/qrels/train.tsv  
  inflating: nfcorpus/qrels/test.tsv  
  inflating: nfcorpus/qrels/dev.tsv  
  inflating: nfcorpus/corpus.jsonl   
  inflating: nfcorpus/queries.jsonl  


In [5]:
import json

dataset = "nfcorpus"

with open(f"./{dataset}/corpus.jsonl", "r") as json_file:
  json_list = list(json_file)

corpus = []
# Maps each original BEIR document ID to its sequential index,
# so relevance labels for query-doc pairs can be matched later.
org_docID_to_seq_docID = {}

for idx, json_str in enumerate(json_list):
  result = json.loads(json_str)
  new_doc_obj = {}
  new_doc_obj["document"] = result["text"]
  # Caution: some BEIR datasets use string IDs and others use integer IDs.
  org_docID_to_seq_docID[result["_id"]] = idx
  new_doc_obj["DocID"] = idx

  corpus.append(new_doc_obj)

print(len(corpus))

3633
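The ID mapping built in this cell exists so that relevance labels can be matched to the sequential document indices. A minimal sketch of that matching step follows. It assumes the qrels files use the BEIR TSV layout with a `query-id`, `corpus-id`, `score` header, and that IDs are compared as strings. `load_qrels` and the sample IDs are hypothetical and only illustrate the idea.

```python
import csv
import io


def load_qrels(tsv_file, doc_id_map):
    """Return {query_id: {sequential_doc_id: relevance}} from a BEIR-style qrels TSV.

    Hypothetical helper: IDs are normalized to strings before lookup.
    """
    qrels = {}
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        seq_id = doc_id_map.get(str(row["corpus-id"]))
        if seq_id is None:
            continue  # the label refers to a document that is not in the corpus
        qrels.setdefault(str(row["query-id"]), {})[seq_id] = int(row["score"])
    return qrels


# Tiny in-memory stand-in for a qrels file (made-up IDs, not real nfcorpus data)
sample = io.StringIO(
    "query-id\tcorpus-id\tscore\n"
    "q1\tdocA\t2\n"
    "q1\tdocB\t1\n"
    "q2\tdocZ\t1\n"
)
print(load_qrels(sample, {"docA": 0, "docB": 1}))
```

With the real data, the file handle would come from something like `open(f"./{dataset}/qrels/test.tsv")` and the map would be `org_docID_to_seq_docID`, provided its keys are strings.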


In [6]:
import json

with open(f"./{dataset}/queries.jsonl", "r") as json_file:
  json_list = list(json_file)

queries = []

for json_str in json_list:
  result = json.loads(json_str)
  new_query_obj = {}
  new_query_obj["queryID"] = result["_id"] # NOTE some are string keys others int
  new_query_obj["query"] = result["text"]

  queries.append(new_query_obj)

print(len(queries))

3237


In [7]:
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer

model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")
model.save_pretrained("all-MiniLM-L12-v2")
tokenizer.save_pretrained("all-MiniLM-L12-v2")

model = SentenceTransformer("all-MiniLM-L12-v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.





In [8]:
import cupy as cp

corpus_embeddings = []
for item in corpus:
  doc_embedding = model.encode(item["document"])
  corpus_embeddings.append(doc_embedding)

cp_corpus_embeddings = cp.asarray(corpus_embeddings)

In [9]:
from cuvs.neighbors import cagra

In [13]:
%%time
params = cagra.IndexParams(intermediate_graph_degree=128, graph_degree=64)
cagra_index = cagra.build(params, cp_corpus_embeddings)
search_params = cagra.SearchParams()

CPU times: user 926 ms, sys: 330 ms, total: 1.26 s
Wall time: 1.73 s
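CAGRA is an approximate nearest-neighbor index, so it can be useful to compare its results against exact search. The sketch below is a brute-force reference in NumPy plus a recall measure. It assumes squared Euclidean distance, which is believed to be the default metric for `cagra.IndexParams`; treat that as an assumption. `exact_knn` and `recall_at_k` are hypothetical helper names, not cuVS functions.

```python
import numpy as np


def exact_knn(dataset, queries, k):
    """Brute-force k nearest neighbors under squared Euclidean distance."""
    dataset = np.asarray(dataset, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2, computed for all pairs at once
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ dataset.T
        + (dataset ** 2).sum(axis=1)[None, :]
    )
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx


def recall_at_k(approx_idx, exact_idx):
    """Fraction of exact neighbors that also appear in the approximate result."""
    approx_idx = np.asarray(approx_idx)
    exact_idx = np.asarray(exact_idx)
    hits = sum(
        len(set(a.tolist()) & set(e.tolist()))
        for a, e in zip(approx_idx, exact_idx)
    )
    return hits / exact_idx.size


# Toy data: three 2-D points and one query at the origin
distances, neighbors = exact_knn([[0, 0], [1, 0], [0, 3]], [[0, 0]], k=2)
print(distances, neighbors, recall_at_k(neighbors, neighbors))
```

In this notebook the inputs would be host copies of the embeddings, for example `cp.asnumpy(cp_corpus_embeddings)`, and the approximate indices would be the second array returned by `cagra.search` after copying it to the host.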


In [18]:
import time
import torch

def search_cuvs_cagra(query, top_k = 5):
    # Encode the query using the bi-encoder and find potentially relevant passages
    question_embedding = model.encode(query, convert_to_tensor=True)

    # cagra.search takes a 2D batch of queries, so add a leading batch dimension
    start_time = time.time()
    hits = cagra.search(search_params, cagra_index, question_embedding[None], top_k)
    end_time = time.time()

    # Output of top-k hits
    print("Results (after {:.3f} seconds):".format(end_time - start_time))
    print("Input question:", query)

    # hits is a (distances, neighbor indices) pair; copy both to the CPU for printing.
    # The values are distances, so smaller numbers mean closer matches.
    score_tensor = torch.as_tensor(hits[0], device='cpu')
    index_tensor = torch.as_tensor(hits[1], device='cpu')

    print("CAGRA Search Results: \n")
    for k in range(top_k):
        print("\t{:.3f}\t{}".format(score_tensor[0, k], corpus[index_tensor[0, k]]["document"]))

In [19]:
%%time
search_cuvs_cagra(query="What does taking supplemental B12 vitamins help with?")

Results (after 0.001 seconds):
Input question: What does taking supplemental B12 vitamins help with?
CAGRA Search Results: 

	20.404	Vitamin B12 deficiency anemia may have psychiatric manifestations preceding the hematological symptoms. Although a variety of symptoms are described, there are only sparse data on the role of vitamin B12 in depression. We report a case of vitamin B12 deficiency presenting with recurrent episodes of depression.
	20.468	Elevated total plasma homocysteine has been linked to the development of cognitive impairment and dementia in later life and this can be reliably lowered by the daily supplementation of vitamin B6, B12, and folic acid. We performed a systematic review and meta-analysis of 19 English language randomized, placebo-controlled trials of homocysteine lowering B-vitamin supplementation of individuals with and without cognitive impairment at the time of study entry. We standardized scores to facilitate comparison between studies and to enable us to 