# LLM Basics
Large Language Models (LLMs) are essentially next-word predictors. Some are small enough to run locally, but even the smallest ones are painfully slow without a reasonable GPU. Instead, we will use an API to send our requests to a model hosted online. This is generally how people use ChatGPT and other popular models, but since that usually costs money, we will use the Groq API, which is free within its usage limits.

## Groq
Groq (not to be confused with X/Twitter's LLM, Grok) is a hardware manufacturer that develops very fast chips for LLM inference. It also hosts a number of pretty good models that you can use for free as long as you stay within their usage limits (see [Groq Rate Limits](https://console.groq.com/docs/rate-limits)). Groq also integrates with [LangChain](https://python.langchain.com/v0.2/docs/integrations/chat/groq/), which we will use later.

- Create an account at [https://groq.com/](https://groq.com/)
- Follow the links to create an API key (call it whatever you want)
- Save the API key to a file called `.env` in this folder. This keeps secrets such as API keys in one place, out of your code, and lets you load them when your program starts.


## Note on API keys and `.env` file
It is always best to avoid writing your API key directly in your code, because it should be kept secret. One way to do this is to store keys in environment variables that you refer to by name in your code. The [python-dotenv](https://pypi.org/project/python-dotenv/) library allows us to do this by storing your API keys in a `.env` file that is loaded into environment variables when your program starts.

To use:
- Write each secret in your `.env` file on its own line, like this:
``` bash
GROQ_API_KEY=<your-groq-api-key>
OPENAI_API_KEY=<your-openai-api-key>
# ...and so on, one secret per line
```
- then you can load them as environment variables with these lines at the start of your code:
```python
import os
from dotenv import load_dotenv
load_dotenv()
```
- the values are then available in your code like this:
```python
groq_api_key = os.environ.get("GROQ_API_KEY")
```
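If the `.env` file is missing or a variable name is misspelled, `os.environ.get` quietly returns `None`, and the problem only shows up later when you try to use the key. Here is a minimal sketch of failing early with a clear message instead. `require_env` is a hypothetical helper written for this tutorial, not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for this tutorial, not a library function.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Check that your .env file exists "
            "and that load_dotenv() has run."
        )
    return value


# Usage (after load_dotenv() has been called):
# groq_api_key = require_env("GROQ_API_KEY")
```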

In [1]:
from groq import Groq
import os
from dotenv import load_dotenv
load_dotenv()

True

## Generating responses
Here is an example of invoking an LLM with the Groq API. Other LLM hosts have different APIs, but they are all pretty similar in principle.

In [2]:
# The client lets you interact with the API. It uses the API key loaded from the .env file
client = Groq(api_key = os.environ.get("GROQ_API_KEY"))

# the prompts that will be sent to the llm
system_prompt = "You are an expert python programmer and your mission is to help the user with their programming troubles. Keep answers brief and to the point"
# system_prompt = "You are a grumpy assistant who only answers in ALL CAPS"

user_prompt = "Can you explain how a list comprehension works?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Generate a response
completion = client.chat.completions.create(
    model = "llama3-70b-8192", # could also be "mixtral-8x7b-32768", or others
    messages = messages,
    temperature=0.25,
    max_tokens=1024,
    stream=True,
)
print("LLM response:\n")
# stream the response as it comes in, one chunk at a time
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

# # if you set stream = False then this would display the message contents
# print(completion.choices[0].message.content)

LLM response:

A list comprehension is a concise way to create a new list from an existing iterable by applying a transformation to each element. The general syntax is:

`new_list = [expression(element) for element in iterable]`

Here, `expression(element)` is evaluated for each `element` in the `iterable`, and the results are collected in a new list `new_list`.

For example: `squares = [x**2 for x in range(10)]` creates a list of squares of numbers from 0 to 9.

Think of it as a compact way to write a `for` loop that creates a new list.

## What is happening?
Here is a description of the `client.chat.completions.create()` parameters.

### `messages` (prompts)
A list of the prompts that the LLM will use to generate its completion. Most LLMs accept several prompt types:
- **System Prompt**: This is the initial prompt that provides context or instructions for the conversation. It is often something like `You are a helpful assistant`, but you could also use `You are a grumpy assistant who only answers in ALL CAPS`. This sets the tone for the response and for the rest of the conversation.
- **User Prompt**: This is what you, the person, say to or ask the LLM.
- **Assistant Prompt**: This is a response generated by the AI assistant in reply to a user prompt. We didn't use one in the example above because it was just a single-turn interaction. In longer conversations, this is how you tell the LLM what its previous responses were.
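To make the three roles concrete, here is a small sketch that assembles a `messages` list from a system prompt and a series of earlier exchanges. `build_messages` is a hypothetical helper for illustration, not part of the Groq library, and the example conversation is made up.

```python
def build_messages(system_prompt, turns):
    """Build a chat `messages` list from a system prompt and prior turns.

    `turns` is a list of (user_text, assistant_text) pairs. Use None as the
    assistant text for the final user message that has not been answered yet.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages


example = build_messages(
    "You are a helpful assistant",
    [
        ("What is 2 + 2?", "4"),            # an earlier exchange, already answered
        ("And if you double that?", None),  # the new question for the model
    ],
)
for message in example:
    print(f"{message['role']:>9}: {message['content']}")
```

The resulting list has the same shape as the `messages` variable passed to `client.chat.completions.create()` above.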

### `temperature`
This controls how much randomness to allow when the model picks from a probability distribution of possible next words.
- `temperature = 0` makes the model pick the most likely next word every time, so responses have essentially no randomness.
- Higher values of `temperature` lead to more random and 'creative' responses.
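A common way to apply temperature is to divide the model's raw scores (logits) by the temperature before converting them to probabilities with a softmax. The sketch below illustrates that idea with made-up scores for three candidate words. How any particular hosted model implements sampling internally is an assumption here, not something taken from the Groq documentation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words
logits = [2.0, 1.0, 0.5]

for t in (0.25, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability goes to the top-scoring word, while at high temperature the distribution flattens out. This function requires `temperature > 0`. A temperature of exactly 0 is typically handled as a special case that always picks the highest-scoring word.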
    
### `max_tokens`
A hard limit on the length of the LLM's response. If you are paying to use the model, you are paying by the token, so this parameter lets you avoid excessively long and expensive responses. Some models are more verbose than others, so you could also tell them to be brief in the system prompt, but that doesn't provide a hard limit.
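Because the limit is hard, a response can be cut off mid-sentence. The sketch below shows one way to detect that, assuming the response follows the OpenAI-style convention where `choices[0].finish_reason` is `"length"` when the token limit was hit and `"stop"` when the model finished on its own. The helper name `was_truncated` is hypothetical, and a stand-in object is used so the example runs offline.

```python
from types import SimpleNamespace


def was_truncated(completion) -> bool:
    """Return True if the first choice stopped because it hit the token limit.

    Assumes an OpenAI-style response with a `finish_reason` field.
    """
    return completion.choices[0].finish_reason == "length"


# Stand-in object shaped like a non-streaming response (not a real API result)
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="A list compre"),
        )
    ]
)

if was_truncated(fake_completion):
    print("Response was cut off; consider raising max_tokens.")
```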

### `stream`
This controls whether the call returns a streaming object that can be used to print the response piece by piece as it is generated (`stream=True`), or returns the entire message as a single object after the last token is generated (`stream=False`).
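A stream is consumed as you iterate over it, so if you want the full text afterwards you need to collect the pieces yourself. Here is a minimal sketch of doing that. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the `chunk.choices[0].delta.content` shape used in the example above so that the code runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Print streamed chunks as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        piece = chunk.choices[0].delta.content or ""  # content can be None
        print(piece, end="")
        parts.append(piece)
    print()
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk shaped like the ones in the example above."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


# Stand-in for the object returned when stream=True; the last chunk has no content
fake_stream = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]
full_text = collect_stream(fake_stream)
```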


## Next: a function that simplifies the above process so we can reuse it later

In [3]:
def get_single_completion(
    messages, 
    stream = True, 
    verbose = True, 
    model = "llama3-70b-8192"
):
    client = Groq(api_key = os.environ.get("GROQ_API_KEY"))
    completion = client.chat.completions.create(
        model = model,
        messages = messages,
        temperature=0.25,
        max_tokens=1024,
        stream=stream,
    )
    # Print the response: chunk by chunk when streaming, otherwise all at once
    if verbose and stream:
        for chunk in completion:
            print(chunk.choices[0].delta.content or "", end="")
    elif verbose:
        print(completion.choices[0].message.content)

    return completion

#### Setting `stream=True` returns a stream object

In [4]:
messages = [
    {"role": "system", "content": "you are a helpful assistant"},
    {"role": "user", "content": "tell me a joke"}
]

completion = get_single_completion(messages, stream = True, verbose = True)
completion

Here's one:

Why couldn't the bicycle stand up by itself?

(Wait for it...)

Because it was two-tired!

Hope that brought a smile to your face!

<groq.Stream at 0x214516a9c10>

#### Setting `stream=False` returns a `ChatCompletion` object that contains lots of interesting info.
Conveniently, it tells you how many tokens were used, including a breakdown into `prompt_tokens` and `completion_tokens`. Completion (output) tokens are typically much more expensive than prompt (input) tokens.

For example, OpenAI's *gpt-4o* model costs:
- \$15.00 per million output tokens
- \$3.00 per million input tokens

In [5]:
messages = [
    {"role": "system", "content": "You are a helpful assistant who knows about science and provides detailed explanations based on your knowledge"},
    {"role": "user", "content": "Why does ice float in water?"}
]
completion = get_single_completion(messages, stream = False)

You are a helpful assistant who knows about science


In [6]:
# and we can see the token usage like this
completion.usage

CompletionUsage(completion_tokens=10, prompt_tokens=40, total_tokens=50, completion_time=0.028571429, prompt_time=0.008325569, queue_time=None, total_time=0.036896998)

##### How much would this have cost on GPT-4o?

In [7]:
input_tokens = completion.usage.prompt_tokens
output_tokens = completion.usage.completion_tokens

cost = input_tokens/1e6 * 3 + output_tokens/1e6 * 15
print(f"${round(cost, 5)}")

$0.00027


Not much! But the cost can grow quickly once you start having conversations.

In [8]:
# note that we can access the response text like this
completion.choices[0].message.content

'You are a helpful assistant who knows about science'

## Having a conversation with the model
Getting the model to "remember" what you just told it requires a bit of extra work. Let's see why this is required.

In [9]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello my name is Trevor. Would you be able to remember that if I ask you in a few seconds?"}
]
completion = get_single_completion(messages, stream = False)

# keep note of this response for later
first_response = completion.choices[0].message.content

Nice to meet you, Trevor! Of course, I'd be happy to remember your name. I'll make a mental note right now. Go ahead and ask me again in a few seconds, and I'll do my best to recall it correctly.


#### Let's see if it remembers:

In [10]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hey, do you remember my name"}, 
]
completion = get_single_completion(messages, stream=False)


I'm ChatGenesis,assistant


#### Come on, I just told you!
Yeah, but the second request did not include the earlier exchange, so there's no way the LLM could know your name. Each request is independent: you must explicitly send the LLM all of the previous conversation messages that you want it to know about.

In [11]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello my name is Trevor. Please remember that for when I ask you about it in just a second"}, 
    {"role": "assistant", "content": first_response}, ### We must tell it what it said
    {"role": "user", "content": "Hey, do you remember my name?"}, 
]

completion = get_single_completion(messages, stream = False)

Trevor! I remember you told me your name is Trevor.


In [12]:
# Add previous assistant response
messages.append({"role": "assistant", "content": completion.choices[0].message.content})
# Add a new user prompt
messages.append({"role": "user", "content": "Thanks for remembering. My dog daisy wants to go for a walk now. Should I go to the dog park or the trail?"})
completion = get_single_completion(messages, stream = False)

That's a great question, Trevor! Since I don't know Daisy's preferences or needs, I'll give you some general pros and cons of each option.

The dog park could be a great choice if Daisy loves socializing with other dogs and needs to burn off some energy. She'll get to run around and play with her furry friends, and you can socialize with other dog owners too!

On the other hand, the trail might be a better option if Daisy prefers a more leisurely stroll or needs some exercise in a more controlled environment. You can enjoy the scenery together, and it might be a better choice if Daisy is still getting used to being around other dogs.

Which one do you think Daisy would prefer, Trevor?


In [13]:
messages.append({"role": "assistant", "content": completion.choices[0].message.content})
messages.append({"role": "user", "content": "I guess we'll go to the trail"})
completion = get_single_completion(messages, stream = False)

The trail it is, Trevor! I hope you and Daisy have a wonderful time together, enjoying the fresh air and scenery. Make sure to bring plenty of water and snacks for Daisy, and don't forget to clean up after her.

If you need any more advice or have any other questions, feel free to ask. Otherwise, have a great walk and make some special memories with Daisy!


In [14]:
messages.append({"role": "assistant",  "content": completion.choices[0].message.content})
messages.append({"role": "user", "content": "What is my name again?"})
completion = get_single_completion(messages, stream = False)

Trevor! Your name is Trevor.


In [15]:
# look at the conversation history
messages

[{'role': 'system', 'content': 'You are a helpful assistant'},
 {'role': 'user',
  'content': 'Hello my name is Trevor. Please remember that for when I ask you about it in just a second'},
 {'role': 'assistant',
  'content': "Nice to meet you, Trevor! Of course, I'd be happy to remember your name. I'll make a mental note right now. Go ahead and ask me again in a few seconds, and I'll do my best to recall it correctly."},
 {'role': 'user', 'content': 'Hey, do you remember my name?'},
 {'role': 'assistant',
  'content': 'Trevor! I remember you told me your name is Trevor.'},
 {'role': 'user',
  'content': 'Thanks for remembering. My dog daisy wants to go for a walk now. Should I go to the dog park or the trail?'},
 {'role': 'assistant',
  'content': "That's a great question, Trevor! Since I don't know Daisy's preferences or needs, I'll give you some general pros and cons of each option.\n\nThe dog park could be a great choice if Daisy loves socializing with other dogs and needs to burn o

#### Notes
- Long conversations require sending an ever-growing conversation history to the LLM, which uses more tokens and would cost more money if you were not using a free API.
- Manually adding to the list of messages is tedious. We will see easier ways to do this when we start using the LangChain library.
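Until then, the append-call-append pattern used in the cells above can be wrapped in a small helper. This is only a sketch: `chat_turn` and `echo_model` are hypothetical names, and `echo_model` is an offline stand-in for a real LLM call so that the example runs without an API key.

```python
def chat_turn(messages, user_text, complete_fn):
    """Append a user message, get a reply, and append the reply to the history.

    `complete_fn` is any callable that takes the messages list and returns
    the assistant's reply as a string.
    """
    messages.append({"role": "user", "content": user_text})
    reply = complete_fn(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def echo_model(messages):
    """Offline stand-in for an LLM call: reports how many messages it received."""
    return f"I received {len(messages)} messages."


history = [{"role": "system", "content": "You are a helpful assistant"}]
chat_turn(history, "Hello my name is Trevor.", echo_model)
chat_turn(history, "What is my name again?", echo_model)
print(len(history))
```

To use it with a real model, `complete_fn` could be a small wrapper around the function defined earlier in this notebook, for example `lambda m: get_single_completion(m, stream=False, verbose=False).choices[0].message.content`.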