# Using a Large Language Model (LLM) with your data: Retrieval Augmented Generation (RAG) in 5 minutes

I'm William Horton, Staff Machine Learning Engineer at Included Health.

@hortonhearsafoo on Twitter

## OpenAI Python SDK

In [3]:
from openai import OpenAI

client = OpenAI()

I'll be using the Chat Completions API.

Main parameters are:
- model
- messages

In [4]:
model = "gpt-4-turbo"
messages = [{"role": "user", "content": "Hello!"}]

In [5]:
response = client.chat.completions.create(model=model, messages=messages)
print(response.choices[0].message.content)

Hello! How can I help you today?


The `messages` list holds the conversation history, so I can append to it and call the API again. The API itself is stateless: the model only sees what is in `messages` on each call.

In [6]:
# recording the model's response
messages.append({"role": "assistant", "content": response.choices[0].message.content})

# adding my second message
messages.append({"role": "user", "content": "Tell me a joke"})

In [7]:
response = client.chat.completions.create(model=model, messages=messages)
print(response.choices[0].message.content)

Why don't skeletons fight each other? They don't have the guts.
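The append-and-call-again bookkeeping above can be wrapped in a small helper. This is a minimal sketch, not part of the OpenAI SDK: the `Conversation` class is a hypothetical name, and `respond` is a stand-in for the API call so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the running message history for a chat-style API."""

    def __init__(self, respond: Callable[[list[Message]], str]):
        # `respond` is a stand-in for the real API call: it receives the
        # full history and returns the assistant's reply as a string.
        self.respond = respond
        self.messages: list[Message] = []

    def say(self, content: str) -> str:
        # Record the user's turn, get a reply, then record the reply too,
        # so the next call sees the whole conversation.
        self.messages.append({"role": "user", "content": content})
        reply = self.respond(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Offline stand-in: reports how many messages it was shown.
def fake_respond(messages: list[Message]) -> str:
    return f"I have seen {len(messages)} message(s)."


chat = Conversation(fake_respond)
first = chat.say("Hello!")
second = chat.say("Tell me a joke")
print(first)
print(second)
```

With the real client, `respond` could be a function that calls `client.chat.completions.create(model=model, messages=messages)` and returns `response.choices[0].message.content`, as in the cells above.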


In [8]:
def call_llm(message: str) -> str:
    # Send a single user message (no history), print the reply, and return it
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "user", "content": message}
        ]
    )
    out = response.choices[0].message.content
    print(out)
    return out

## LLMs know a lot!

Geography

In [9]:
call_llm("What is the capital of Italy?");

The capital of Italy is Rome.


Math

In [10]:
call_llm("Is 2 a prime number?");

Yes, 2 is a prime number. A prime number is defined as a natural number greater than 1 that has no positive divisors other than 1 and itself. Since the only divisors of 2 are 1 and 2, it meets the criteria of being a prime number. Additionally, it is the smallest and the only even prime number.


Language

In [11]:
call_llm("What does the word 'blithesome' mean?");

The word 'blithesome' means showing a casual and cheerful indifference considered to be callous or improper; happy or joyous. It's often used to describe someone's carefree, light-hearted, and merry demeanor.


History

In [12]:
call_llm("Tell me a 50-word biography of Alexander the Great");

Alexander the Great, born in 356 BC in Pella, Macedonia, was a revolutionary military leader who became king at 20. By 30, he had created one of the largest empires in history, spanning from Greece to northwestern India. Known for his undefeated battles, he spread Greek culture, fostering a new Hellenistic civilization before dying in 323 BC.


## But they don't always know what you want them to know

In [13]:
call_llm("What day is it today?");

I'm sorry, but I can't provide real-time information like today's date. However, you can easily check the current date on your computer or smartphone.


In [14]:
call_llm("What is my job title?");

I don't have access to personal data about individuals unless it has been shared with me in the course of our conversation. I can only provide information and answer questions based on general or publicly available knowledge. If you tell me what your job involves, I can try to help you determine an appropriate title.


In [15]:
call_llm("Who is speaking at PyTexas 2024?");

As of my last update in January 2023, I don't have real-time information on events that occur after this date, including PyTexas 2024. For the most accurate and up-to-date details about the speakers at PyTexas 2024, I recommend checking the official PyTexas conference website or their social media channels. These sources typically provide information about speakers, schedules, and other event details closer to the conference date.


## Good news! You can add to their knowledge by just...telling them new things

People often ask me how they can train or fine-tune models like GPT-4 on their data.

In today's paradigm of massive foundation models, most people aren't actually retraining the model. Instead, they are putting the information in the prompt.

**Just give the model the text you want to work with**!

For this, you can use the *system prompt*.

Let's revisit the questions above, but now giving the model the necessary *context*.

In [16]:
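from datetime import datetime, timezone

messages = [
    # datetime.utcnow() is deprecated since Python 3.12, so use a timezone-aware UTC time
    {"role": "system", "content": f"The current time is {datetime.now(timezone.utc)}"},
    {"role": "user", "content": "What day is it today?"},
]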

In [17]:
response = client.chat.completions.create(model=model, messages=messages)
print(response.choices[0].message.content)

Today is Friday, April 20, 2024.


In [18]:
from typing import Optional

def call_llm_2(message: str, system_prompt: Optional[str] = None) -> str:
    messages = []
    
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
        
    messages.append({"role": "user", "content": message})
    
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages
    )
    
    out = response.choices[0].message.content
    print(out)
    return out

In [19]:
system_prompt = "My name is William Horton and I'm a Staff Machine Learning Engineer at Included Health"

call_llm_2("What is my job title?", system_prompt=system_prompt);

Your job title is Staff Machine Learning Engineer at Included Health.


In [20]:
with open("/Users/william/Downloads/Talk Schedule - PyTexas Conference.html") as f:
    site = f.read()

In [21]:
len(site)

35712

In [22]:
print(site)


<!doctype html>
<html lang="en" class="no-js">
  <head>
    
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width,initial-scale=1">
      
        <meta name="description" content="A full schedule grid for the event">
      
      
        <meta name="author" content="PyTexas Foundation">
      
      
        <link rel="canonical" href="https://pytexas.org/2024/schedule/full_schedule/">
      
      
        <link rel="prev" href="../../sponsors/sponsor-us/">
      
      
        <link rel="next" href="../keynotes/">
      
      
      <link rel="icon" href="../../assets/images/favicon.png">
      <meta name="generator" content="mkdocs-1.5.3, mkdocs-material-9.5.16">
    
    
      
        <title>Talk Schedule - PyTexas Conference</title>
      
    
    
      <link rel="stylesheet" href="../../assets/stylesheets/main.bcfcd587.min.css">
      
        
        <link rel="stylesheet" href="../../assets/stylesheets/palette.06af60db.min.css">
      


In [23]:
call_llm_2("Who is speaking at PyTexas 2024?", system_prompt=site);

At PyTexas 2024, the speakers and their respective topics include:

- **Lynn Root** - (Keynote Speaker)
- **Carol Willing** - (Keynote Speaker)
- **Heather Crawford** - "Python Code vs Pythonic Code: What Experienced Developers Find Challenging About Learning Python"
- **Al Sweigart** - "20 GOTO 10: How to Make Scrolling ASCII Art"
- **Sameer Shukla** - "ContainerCraft: Mastering Efficient Integration Testing with Testcontainers"
- **Loren Sands-Ramshaw** - "System Design on Easy Mode"
- **Jason Koo** - "Anarchy to Order - Organizing Assorted Data with Python and LLMs"
- **Avik Basu** - "Building Efficient Containers for Python Applications"
- **Bernat Gabor** - "Lessons Learned Maintaining Open-Source Python Projects"
- **Andy Fundinger** - "Always Use Sets"
- **Peter Sobot** - "Working with Audio in Python"
- **Josh Cannon** - "Oh the (Methods) You Can (Make): By Dunder Seuss"
- **Soundharya Khanapur and Shivani Shetty** - "PySecOps"
- **Oliver Rew** - "Designing a Human-Friendly CLI
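The `site` string passed above is raw HTML, so a good share of its 35,712 characters is markup rather than schedule content. If prompt size matters, one option is to strip the tags first. Here is a minimal sketch using only the standard library. The names `TextExtractor` and `html_to_text` are hypothetical, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = """<!doctype html>
<html><head><title>Talk Schedule</title>
<style>body { color: red; }</style></head>
<body><h1>Schedule</h1><p>Keynote: <b>Lynn Root</b></p></body></html>"""

text = html_to_text(sample)
print(text)
```

The result could then be passed as the system prompt in place of `site`. Whether this changes the answer quality is something to check on your own pages.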

## Retrieval Augmented Generation (RAG)

What if we have many documents?

In [26]:
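import glob
from pathlib import Path

# read_text() opens and closes each file, so no file handles are left open
docs = [Path(fname).read_text() for fname in glob.glob("pytexas_docs/*.txt")]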

In [27]:
docs

['{"talk_title": "Always Use Sets!", "talk_description": "In this talk, we\'ll show how to use sets to improve the performance and clarity of our code. We\'ll even show non-set-like cases that gain enough performance by leveraging sets to make this approach worth some amount of extra complexity.", "speaker": "Andy Fundinger", "speaker_bio": "Andy Fundinger is a senior engineer at Bloomberg, where he develops Python applications in the Data Gateway Platform team and supports Python developers throughout the firm through the company\'s Python Guild. Andy has spoken several times at PyGotham, as well as other conferences such as QCon, PyCaribbean, and EuroPython. In the past, Andy has worked on private equity and credit risk applications, web services, and virtual worlds. Andy holds a master\'s degree in engineering from Stevens Institute of Technology."}',
 '{"talk_title": "Anarchy to Order - Organizing assorted data with Python and LLMs", "talk_description": "Bringing structure to chaot

In [28]:
# Embed all documents in one API call, giving one embedding vector per document
embeddings = [d.embedding for d in client.embeddings.create(model="text-embedding-3-large", input=docs).data]

In [29]:
embeddings

[[0.010094599798321724,
  0.024602724239230156,
  -0.02868027053773403,
  0.005646714475005865,
  0.010926907882094383,
  0.010758919641375542,
  0.01685996726155281,
  0.020906969904899597,
  -0.0004357209545560181,
  -0.008681966923177242,
  0.0048640393652021885,
  0.00855979323387146,
  -0.003875196212902665,
  -0.023716963827610016,
  0.008941586129367352,
  0.011110168881714344,
  0.005585627630352974,
  -0.006337759550660849,
  -0.016768336296081543,
  -0.04825860261917114,
  0.03524710610508919,
  -0.042669154703617096,
  -0.06291944533586502,
  -0.01846349611878395,
  -0.008407075889408588,
  0.0002682094054762274,
  -0.008926314301788807,
  0.00651338417083025,
  -0.017990073189139366,
  -0.012133372947573662,
  0.02130403369665146,
  -0.0028080856427550316,
  0.028664998710155487,
  0.017883172258734703,
  -0.045356977730989456,
  0.006574471015483141,
  0.05119077116250992,
  -0.011881389655172825,
  -0.045876216143369675,
  0.02588554657995701,
  -0.016753064468503,
  -0.0

In [30]:
query = "Where does Andy Fundinger work?"

First, to confirm that the model can't answer this on its own:

In [31]:
call_llm_2(query);

As of the last update, I don't have current, specific information on where Andy Fundinger works. Andy Fundinger could be a private individual or professional whose employment details aren’t widely recognized or readily available in public domain sources. If you are looking for information on a specific person, it might be best to consult more direct sources or platforms that could offer updated professional details like LinkedIn.


In [32]:
query_embedding = client.embeddings.create(model="text-embedding-3-large", input=query).data[0].embedding

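We use the dot product between the query embedding and each document embedding as the similarity score. OpenAI embeddings are normalized to length 1, so this is equivalent to cosine similarity.

`np.argsort` returns the indexes that would sort the scores in ascending order, so we reverse the result with `[::-1]` to put the best match first.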

In [33]:
import numpy as np

idxs = np.argsort(np.dot(embeddings, query_embedding))[::-1]
idxs

array([ 0,  5, 14,  1,  3, 16,  8, 12,  7,  6, 17,  2, 13,  9, 10, 15, 11,
        4])
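To see the ranking step in isolation, here is a toy sketch with made-up 3-dimensional vectors in place of real embeddings. The numbers are invented for illustration. Only the `np.argsort(...)[::-1]` pattern comes from the cell above.

```python
import numpy as np

# Made-up, unit-length "embeddings" for three documents (one per row).
doc_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])
query_vector = np.array([0.8, 0.6, 0.0])

# One similarity score per document.
scores = doc_vectors @ query_vector

# argsort is ascending, so reverse it to get the best match first.
ranked = np.argsort(scores)[::-1]

# Because every vector has length 1, the dot product equals cosine similarity.
cosine = scores / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector))

print(scores)
print(ranked)
```

Document 1 points in nearly the same direction as the query, so it ranks first. Document 2 is orthogonal to the query and ranks last.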

How many documents do we want to retrieve?

(Current top models do pretty well at ignoring completely irrelevant information.)

In [34]:
top_k = 3

In [35]:
context = "\n".join(np.array(docs)[idxs[:top_k]])

In [36]:
print(context)

{"talk_title": "Always Use Sets!", "talk_description": "In this talk, we'll show how to use sets to improve the performance and clarity of our code. We'll even show non-set-like cases that gain enough performance by leveraging sets to make this approach worth some amount of extra complexity.", "speaker": "Andy Fundinger", "speaker_bio": "Andy Fundinger is a senior engineer at Bloomberg, where he develops Python applications in the Data Gateway Platform team and supports Python developers throughout the firm through the company's Python Guild. Andy has spoken several times at PyGotham, as well as other conferences such as QCon, PyCaribbean, and EuroPython. In the past, Andy has worked on private equity and credit risk applications, web services, and virtual worlds. Andy holds a master's degree in engineering from Stevens Institute of Technology."}
{"talk_title": "Designing a Human-Friendly CLI for API-Driven Infrastructure", "talk_description": "As Bloomberg's infrastructure grows and e

In [37]:
call_llm_2(query, system_prompt=context);

Andy Fundinger works at Bloomberg.


But it doesn't work without the right document(s). Here we pass the three *least* similar documents instead:

In [38]:
context = "\n".join(np.array(docs)[idxs[-top_k:]])

In [39]:
print(context)

{"talk_title": "Voice Computing with Python in Jupyter Notebooks", "talk_description": "Jupyter Notebook is a popular platform for writing literate programming documents that contain computer code and its output interleaved with prose that describes the code and the output. Recently, it has become possible to use one's voice to interact with Jupyter notebooks. This capability opens access to those with impaired use of their hands. Voice computing also increases the productivity of workers who are tired of typing and increases the productivity of those workers who speak faster than they can type. I split voice computing into three activities: speech-to-text, speech-to-command, and speech-to-code. Several automated speech recognition software packages operate in Jupyter notebooks and support the three activities to a certain degree. I will provide examples of all three activities as they pertain to applications of Python to our research on the molecular structures of proteins and nucleic

In [40]:
call_llm_2(query, system_prompt=context);

The document does not provide information about where Andy Fundinger works. Please provide further details or check another source for information on Andy Fundinger.
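One way to notice this situation before calling the model is to look at the similarity scores themselves and drop documents below a cutoff. This is a sketch under assumptions: `select_relevant` is a hypothetical helper, the scores are made up, and the `0.3` threshold is an arbitrary placeholder that would need tuning for a given embedding model and dataset.

```python
import numpy as np


def select_relevant(scores, top_k: int = 3, min_score: float = 0.3) -> list[int]:
    """Return indexes of up to top_k documents whose score is at least min_score."""
    scores = np.asarray(scores)
    # Best match first, limited to top_k candidates.
    ranked = np.argsort(scores)[::-1][:top_k]
    # min_score is an assumed cutoff, not a recommended value.
    return [int(i) for i in ranked if scores[i] >= min_score]


# Made-up similarity scores for five documents.
scores = [0.12, 0.55, 0.08, 0.31, 0.05]

picked = select_relevant(scores, top_k=3, min_score=0.3)
print(picked)

# When nothing clears the cutoff, the result is empty.
nothing = select_relevant([0.12, 0.08, 0.05], top_k=3, min_score=0.3)
print(nothing)
```

An empty result is a signal that retrieval found nothing useful, so the application can say so instead of sending unrelated context to the model.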


RAG in 17 lines of Python

In [41]:
from typing import Callable

def create_rag_fn(documents: list[str], top_k: int = 3) -> Callable[[str], str]:
    # Embed all documents once, when the RAG function is created
    embeddings = [d.embedding for d in client.embeddings.create(model="text-embedding-3-large", input=documents).data]

    def _retrieve(query: str) -> str:
        query_embedding = client.embeddings.create(model="text-embedding-3-large", input=query).data[0].embedding
        # Rank documents by dot-product similarity, best match first, and keep the top_k
        idxs = np.argsort(np.dot(embeddings, query_embedding))[::-1][:top_k]
        context = "\n".join(np.array(documents)[idxs])
        return context

    def _generate(query: str, context: str) -> str:
        return call_llm_2(query, system_prompt=context)

    def rag(query: str) -> str:
        context = _retrieve(query)
        return _generate(query, context)
        
    return rag

In [42]:
rag_fn = create_rag_fn(docs)

In [43]:
rag_fn(query);

Andy Fundinger works at Bloomberg.


In [44]:
rag_fn("Are there any audio-related talks at the conference?");

Yes, there are audio-related talks at the conference. One such talk includes "Working with Audio in Python (feat. Pedalboard)" by Peter Sobot. The talk will cover how digital audio works, how Python can be utilized to manipulate audio data, and the usage of Pedalboard, a new library designed for performing common audio tasks in Python. This includes applying effects, using VSTs and audio plugins, and handling various audio formats.
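The retrieval half of `create_rag_fn` can be tried without an API key by swapping in a stand-in embedding function. In this sketch, `toy_embed`, `dot`, and `create_retriever` are hypothetical names, and the sample documents are shortened, made-up strings. The bag-of-words "embedding" only matches exact words, so it is no substitute for a real embedding model. It is here only to exercise the embed, score, sort, and slice steps.

```python
import math
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: unit-length bag-of-words counts (NOT a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def dot(a: Vector, b: Vector) -> float:
    # Sparse dot product over the words the two vectors share.
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def create_retriever(
    documents: list[str],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 3,
) -> Callable[[str], list[str]]:
    # Embed the documents once, up front, as create_rag_fn does.
    doc_vectors = [embed(doc) for doc in documents]

    def retrieve(query: str) -> list[str]:
        query_vector = embed(query)
        scores = [dot(vec, query_vector) for vec in doc_vectors]
        # Sort document indexes by score, best first, and keep top_k.
        ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
        return [documents[i] for i in ranked[:top_k]]

    return retrieve


sample_docs = [
    "sets improve performance and clarity of python code",
    "working with audio in python using pedalboard",
    "designing a human friendly cli for infrastructure",
]
retrieve = create_retriever(sample_docs, top_k=1)
result = retrieve("audio in python")
print(result)
```

Passing the embedding function in as a parameter also makes it easy to test the retrieval logic separately from the generation step.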


## Frameworks

You don't have to write this code yourself!

In [45]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("pytexas_docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query(query)
print(response)

Andy Fundinger works at Bloomberg.


Find me @hortonhearsafoo on Twitter, wdhorton on GitHub