# Getting Started with RAG using Fireworks Fast Inference LLMs

<a href="https://colab.research.google.com/github/fw-ai/cookbook/blob/main/recipes/rag/rag-paper-titles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

While large language models (LLMs) are capable enough to support advanced use cases, they suffer from issues such as factual inconsistency and hallucination. Retrieval-augmented generation (RAG) is an approach that extends LLM capabilities and improves their reliability. RAG combines an LLM with external knowledge by enriching the prompt context with relevant information that helps accomplish a task.
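As a minimal sketch of this idea, the snippet below retrieves the most relevant document for a question and places it in the prompt. The word-overlap retriever, the prompt template, and the function names are illustrative assumptions, not the method used later in this notebook, which relies on embeddings.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of words, dropping punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, k=1):
    """Toy retriever (assumption): rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    # Highest overlap first; the sort is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context_docs):
    """Enrich the prompt with the retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "A bartender sprays seltzer water at a saloon",
    "The moon smiles over a park at night",
    "An assassin kneels before Lady Justice",
]
question = "Who sprays seltzer water in the saloon?"
print(build_prompt(question, retrieve(question, docs)))
```

The LLM call itself is left out here; the point is only that the model receives the question together with retrieved text rather than the question alone.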


Before getting started, let's first install the libraries we will use:

In [3]:
%%capture
!pip install chromadb tqdm fireworks-ai python-dotenv pandas
!pip install sentence-transformers

Next, let's install the `datasets` library, which we will use to download the dataset:

In [4]:
!pip install -U sentence-transformers datasets


In [5]:
import json
from sentence_transformers import SentenceTransformer, CrossEncoder, util
import time
import gzip
import os
import torch
from datasets import load_dataset
import itertools
import fireworks.client
import dotenv
import chromadb
from tqdm.auto import tqdm
import pandas as pd
import random
from google.colab import userdata

if not torch.cuda.is_available():
  print("Warning: No GPU found. Please add GPU to your notebook")



# Bi-encoder that maps each passage to a dense embedding vector
model_name = 'nq-distilbert-base-v1'
bi_encoder = SentenceTransformer(model_name)
top_k = 5  # Number of passages we want to retrieve

passages = []

# Load the first 1,000 movies; change the split to use more or fewer
ds = load_dataset("Coder-Dragon/wikipedia-movies", split='train[:1000]')

# Each passage is a [title, plot] pair
for movie in ds:
  passages.append([movie['Title'],movie['Plot']])
corpus_embeddings = bi_encoder.encode(passages, convert_to_tensor=True, show_progress_bar=True)

print("Passages:", len(passages))




Passages: 1000
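The cell above computes `corpus_embeddings` and sets `top_k`, although the later cells query through Chroma instead. As a rough sketch of what a bi-encoder search does with such embeddings, the snippet below ranks vectors by cosine similarity in plain Python. The toy two-dimensional vectors and the function names are assumptions for illustration; in practice the `util.semantic_search` helper from sentence-transformers does this on tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_passages(query_embedding, corpus_embeddings, k=5):
    """Return (passage_index, score) pairs for the k most similar embeddings."""
    scores = [
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(corpus_embeddings)
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [(idx, score) for score, idx in scores[:k]]


# Toy vectors standing in for real sentence embeddings
toy_corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query = [1.0, 0.1]
print(top_k_passages(toy_query, toy_corpus, k=2))
```

The returned indices can be used to look up the matching entries in `passages`.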


In [6]:
# Print the title and plot of the first five entries
for movie in itertools.islice(ds,5):
  title,plot = movie['Title'],movie['Plot']
  print(title,plot)

Kansas Saloon Smashers A bartender is working at a saloon, serving drinks to customers. After he fills a stereotypically Irish man's bucket with beer, Carrie Nation and her followers burst inside. They assault the Irish man, pulling his hat over his eyes and then dumping the beer over his head. The group then begin wrecking the bar, smashing the fixtures, mirrors, and breaking the cash register. The bartender then sprays seltzer water in Nation's face before a group of policemen appear and order everybody to leave.[1]
Love by the Light of the Moon The moon, painted with a smiling face hangs over a park at night. A young couple walking past a fence learn on a railing and look up. The moon smiles. They embrace, and the moon's smile gets bigger. They then sit down on a bench by a tree. The moon's view is blocked, causing him to frown. In the last scene, the man fans the woman with his hat because the moon has left the sky and is perched over her shoulder to see everything better.
The Mart

Before continuing, you need to obtain a Fireworks API Key to use the mixtral-8x7b-instruct model.

Check out this quick guide to obtain your Fireworks API Key: https://readme.fireworks.ai/docs

In [7]:
# Read the API key from Colab secrets (stored here under the name 'fireworkapi')
fireworks.client.api_key = userdata.get('fireworkapi')

## Getting Started

## RAG Use Case: Efficient Exploration of Historical Movie Database

For the RAG use case, we will be using [a dataset](https://www.kaggle.com/datasets/jrobischon/wikipedia-movie-plots?resource=download) that contains descriptions of 34,886 movies from around the world.

The user provides a query. The system then uses that query to retrieve relevant entries from the dataset and generate a response with context surrounding the query.



### Step 1: Turn the dataset into a dictionary list


In [8]:
new_list =  [{"Title": item[0], "Plot": item[1]} for item in passages]


In [9]:
new_list[2]

{'Title': 'The Martyred Presidents',
 'Plot': 'The film, just over a minute long, is composed of two shots. In the first, a girl sits at the base of an altar or tomb, her face hidden from the camera. At the center of the altar, a viewing portal displays the portraits of three U.S. Presidents—Abraham Lincoln, James A. Garfield, and William McKinley—each victims of assassination.\r\nIn the second shot, which runs just over eight seconds long, an assassin kneels feet of Lady Justice.'}
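The record above still contains raw `\r\n` sequences, and some titles may be empty. A small, optional cleaning step could look like the sketch below. The helper name `clean_record` and the `max_plot_chars` cutoff are hypothetical; the "No Title" fallback mirrors the convention used in the indexing loop later in this notebook.

```python
import re


def clean_record(record, max_plot_chars=500):
    """Normalize one {"Title", "Plot"} dictionary (illustrative helper, not part of the notebook)."""
    # Fall back to "No Title" when the title is missing or blank
    title = str(record.get("Title") or "").strip() or "No Title"
    # Collapse line breaks and repeated spaces into single spaces
    plot = re.sub(r"\s+", " ", str(record.get("Plot") or "")).strip()
    return {"Title": title, "Plot": plot[:max_plot_chars]}


sample = {"Title": "", "Plot": "First shot.\r\nSecond   shot."}
print(clean_record(sample))
```

Applied to the whole list, this would be `[clean_record(item) for item in new_list]`.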

We will use SentenceTransformer to generate embeddings and store them in a Chroma collection.

In [12]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        batch_embeddings = embedding_model.encode(input)
        return batch_embeddings.tolist()

embed_fn = MyEmbeddingFunction()

# Initialize the chromadb directory, and client.
client = chromadb.PersistentClient(path="./chromadb")

# Create the collection, or open it if it already exists
collection = client.get_or_create_collection(
    name="ml-papers-nov-2023"
)


We will now generate embeddings for batches:

In [13]:
# Generate embeddings and index titles in batches
batch_size = 50

# Loop through the batches, generating and storing embeddings
for i in tqdm(range(0, len(new_list), batch_size)):

    i_end = min(i + batch_size, len(new_list))
    batch = new_list[i:i_end]

    # Replace title with "No Title" if empty string
    batch_titles = [str(movie["Title"]) if str(movie["Title"]) != "" else "No Title" for movie in batch]
    # Use each movie's position in the list as its ID so IDs are unique and stable across runs
    batch_ids = [str(idx) for idx in range(i, i_end)]

    # Generate embeddings for the titles
    batch_embeddings = embedding_model.encode(batch_titles)

    # Upsert to chromadb
    collection.upsert(
        ids=batch_ids,
        documents=batch_titles,
        embeddings=batch_embeddings.tolist(),
    )

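Because `upsert` overwrites records that share an ID, IDs that stay the same across runs keep re-indexing from adding duplicate entries. The sketch below shows one possible way to build such IDs and to slice a list into batches. The helper names and the index-plus-hash scheme are assumptions, not something Chroma requires.

```python
import hashlib


def stable_id(index, title):
    """Build an ID from the record's position plus a short hash of its title."""
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]
    return f"{index}-{digest}"


def iter_batches(items, batch_size):
    """Yield (start_index, batch) pairs covering the whole list."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


# Two movies share a title here, yet every ID is still distinct
movies = [{"Title": "A"}, {"Title": "B"}, {"Title": "A"}]
for start, batch in iter_batches(movies, batch_size=2):
    ids = [stable_id(start + offset, m["Title"]) for offset, m in enumerate(batch)]
    print(ids)
```

Including the position in the ID keeps entries with identical titles apart, while the hash makes it easier to spot when the record at a given position has changed.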

Next, we define a helper for calling the Fireworks completion API, and then test the retriever:

In [14]:
def get_completion(prompt, model=None, max_tokens=50):

    fw_model_dir = "accounts/fireworks/models/"

    if model is None:
        model = fw_model_dir + "mixtral-8x7b-instruct"
    else:
        model = fw_model_dir + model

    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )

    return [choice.text for choice in completion.choices[0:5]]
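`get_completion` sends the prompt string to the model exactly as given, so any instruction formatting has to be part of the prompt. Mistral-family instruct models such as Mixtral are documented as expecting the user message wrapped in `[INST] ... [/INST]` tags. The helper below is a small sketch of that single-turn wrapping; the function name is hypothetical, and whether the wrapping improves results for this use case is untested here.

```python
def format_instruct_prompt(user_message):
    """Wrap a single user message in Mistral-style instruction tags."""
    return f"[INST] {user_message.strip()} [/INST]"


print(format_instruct_prompt("Western romance"))
```

The result can be passed straight to the helper, for example `get_completion(format_instruct_prompt("Western romance"))`.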

In [15]:
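# Re-open the collection, this time attaching the embedding function
# so that text queries are embedded with the same model as the documents
collection = client.get_or_create_collection(
    name="ml-papers-nov-2023",
    embedding_function=embed_fn
)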

retriever_results = collection.query(
    query_texts=["Documentaries showcasing indigenous peoples survival and daily life in Arctic regions"],
    n_results=2,
)

print(retriever_results["documents"])

[['The Frozen North', 'From Leadville to Aspen: A Hold-Up in the Rockies']]
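Chroma returns query results as parallel lists with one inner list per query text, which is why the output above is a list inside a list. The sketch below flattens the results for a single query into one dictionary per hit. The helper name is hypothetical, and the IDs and distances in the sample dictionary are made-up placeholder values; only the two titles come from the output above.

```python
def unpack_query_results(results, query_index=0):
    """Turn Chroma-style parallel lists into one dictionary per retrieved document."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results.get("distances")
    # Distances may be absent depending on what the query was asked to include
    dist_row = distances[query_index] if distances else [None] * len(ids)
    return [
        {"id": i, "document": doc, "distance": dist}
        for i, doc, dist in zip(ids, documents, dist_row)
    ]


# Hand-written stand-in with the same shape as `retriever_results`
sample_results = {
    "ids": [["id-a", "id-b"]],          # placeholder IDs
    "documents": [["The Frozen North", "From Leadville to Aspen: A Hold-Up in the Rockies"]],
    "distances": [[0.5, 0.75]],         # placeholder distances
}
for hit in unpack_query_results(sample_results):
    print(hit["document"], hit["distance"])
```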


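Now let's send some queries to the LLM. Note that these prompts pass the query text directly to the model and do not yet include the retrieved titles: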

In [16]:
get_completion('[/INST]Documentaries showcasing indigenous peoples survival and daily life in Arctic regions')

[', such as Inuit, Yupik, or Sami communities, can be both educational and captivating. These films often highlight the unique customs, challenges, and resilience of these communities in the face of environmental extremes and global']

In [17]:
get_completion('[/INST]Western romance')

[' is a broad genre that encompasses a wide range of themes and styles, but at its core, it typically revolves around the emotional and romantic relationships between the main characters. These stories are often set in the Western world, particularly in the United']

In [18]:
get_completion('[/INST]Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo.')

['\n\nThe film is a melodrama about a Parisian star, Lea de Cast, who leaves her husband, a writer, for a baron. She moves to Egypt, where she finds herself in a love']

In [19]:
get_completion("[/INST]Comedy film, office disguises, boss's daughter, elopement.")

["\n\nThe film is about a young man named Jack who works as a clerk in a large office. He is in love with his boss's daughter, Emily, but he is too shy to tell her."]

In [22]:
get_completion('[/INST]Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.')

['\n\nThe answer to the riddle is _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _']

In [21]:
get_completion('[/INST]Denis Gage Deane-Tanner.')

[' Born 1893. Died 1918. Aged twenty-five years.\n"He gave his all for his country and his King.\nErected by his sorrowing father and mother."\n\n']
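The calls above send each query to the model on its own. To complete the RAG loop, the retrieved titles can be placed in the prompt before it reaches the LLM. The sketch below shows one possible way to wire this up. The prompt wording, the function names, and the two stub functions are assumptions for illustration; in the notebook, the stubs would be replaced by `collection.query` and `get_completion`.

```python
def build_rag_prompt(query, retrieved_titles):
    """Combine the user's request with retrieved movie titles in one instruction prompt."""
    context = "\n".join(f"- {title}" for title in retrieved_titles)
    return (
        "[INST] Using only the movie titles listed below, "
        "suggest which ones best match the request.\n\n"
        f"Titles:\n{context}\n\nRequest: {query} [/INST]"
    )


def answer_with_context(query, retrieve, complete, n_results=2):
    """Retrieve titles for the query, build the enriched prompt, and call the model."""
    titles = retrieve(query, n_results)
    return complete(build_rag_prompt(query, titles))


# Stand-ins so the sketch runs without Chroma or an API key
def stub_retrieve(query, n_results):
    return ["The Frozen North", "From Leadville to Aspen: A Hold-Up in the Rockies"][:n_results]


def stub_complete(prompt):
    return prompt  # echo the prompt instead of calling an LLM


print(answer_with_context("Documentaries about daily life in Arctic regions", stub_retrieve, stub_complete))
```

In the notebook, `retrieve` could be `lambda q, n: collection.query(query_texts=[q], n_results=n)["documents"][0]` and `complete` could be `lambda p: get_completion(p)[0]`.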

As you can see, the responses generated by the LLM are of mixed quality: some are plausible, while others drift away from the query. This use case still needs a lot more work and could potentially benefit from fine-tuning as well. For the purpose of this tutorial, we have provided a simple application of RAG using open-source models served through Fireworks' fast inference.

Try out other open-source models here: https://app.fireworks.ai/models

Read more about the Fireworks APIs here: https://readme.fireworks.ai/reference/createchatcompletion
