# Lesson 2

### Getting started with Llama 2

**Update: Llama 3 was released on April 18 and this notebook has been updated to show how to use both Llama 3 and Llama 2 models hosted on Together.ai.**

The code to call the Llama 2 models through the Together.ai hosted API service has been wrapped into a helper function called `llama`. You can inspect this code by opening the `utils.py` file with the File -> Open menu item above this notebook (the last optional lesson also covers the helper function in more detail).

Note: To see how to run Llama 2 or 3 locally on your own computer, you can go to the last section of this notebook.

In [1]:
# import llama helper function
from utils import llama

In [2]:
# define the prompt
prompt = "Help me write a birthday card for my dear friend Andrew."

**Note:** LLMs can give different responses to the same prompt, which is why the responses you get throughout the course might differ slightly from the ones in the lecture videos.

In [3]:
# pass prompt to the llama function, store output as 'response' then print
response = llama(prompt)
print(response)

{'id': '901449199d401828-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}
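
The output above is an error dictionary rather than generated text, so it can help to check what `llama` returned before using it. The following is a minimal sketch, assuming that a successful call returns a plain string and a failed call returns a dictionary with an `error` key shaped like the one printed above. The helper names `is_error_response` and `describe_response` are hypothetical and are not part of `utils.py`.

```python
def is_error_response(response):
    """Return True if `response` looks like the error dictionary shown above."""
    return isinstance(response, dict) and "error" in response


def describe_response(response):
    """Return the generated text, or a short summary if the call failed."""
    if is_error_response(response):
        error = response["error"]
        return f"API error ({error.get('code')}): {error.get('message')}"
    return response


# Shortened copy of the error structure printed above (message abbreviated).
sample = {
    "id": "901449199d401828-SJC",
    "error": {
        "message": "Unable to access non-serverless model togethercomputer/llama-2-7b-chat.",
        "type": "invalid_request_error",
        "param": None,
        "code": "model_not_available",
    },
}
print(describe_response(sample))
```

In the notebook you could call `print(describe_response(response))` instead of `print(response)` to make failures easier to spot.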


In [4]:
# Set verbose to True to see the full prompt that is passed to the model.
prompt = "Help me write a birthday card for my dear friend Andrew."
response = llama(prompt, verbose=True)

Prompt:
[INST]Help me write a birthday card for my dear friend Andrew.[/INST]

model: togethercomputer/llama-2-7b-chat


### Chat vs. base models

Ask the model a simple question to demonstrate the different behavior of chat and base models.

In [5]:
### chat model
prompt = "What is the capital of France?"
response = llama(prompt, 
                 verbose=True,
                 model="togethercomputer/llama-2-7b-chat")

Prompt:
[INST]What is the capital of France?[/INST]

model: togethercomputer/llama-2-7b-chat


In [6]:
print(response)

{'id': '90144929fde46432-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


In [7]:
### base model
prompt = "What is the capital of France?"
response = llama(prompt, 
                 verbose=True,
                 add_inst=False,
                 model="togethercomputer/llama-2-7b")

Prompt:
What is the capital of France?

model: togethercomputer/llama-2-7b


Note how the prompt **does not** include the `[INST]` and `[/INST]` tags as `add_inst` was set to `False`.
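
The effect of `add_inst` can be sketched as a small formatting step. The actual helper lives in `utils.py` and is not shown here, so this is only an assumption that mirrors the verbose output above; `build_prompt` is a hypothetical name.

```python
def build_prompt(prompt, add_inst=True):
    """Wrap the prompt in [INST] tags for chat models; leave it as-is otherwise."""
    if add_inst:
        return f"[INST]{prompt}[/INST]"
    return prompt


question = "What is the capital of France?"
print(build_prompt(question))                  # tagged prompt for a chat model
print(build_prompt(question, add_inst=False))  # raw prompt for a base model
```

Base models simply continue the text they are given, which is why the tags are left out for them.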

In [8]:
print(response)

{'id': '901449345e3c1749-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


### Using Llama 3 chat models

Together.ai supports both Llama 3 8b chat and Llama 3 70b chat models with the following names (case-insensitive):
* meta-llama/Llama-3-8b-chat-hf
* meta-llama/Llama-3-70b-chat-hf

You can simply set the `model` parameter to one of the Llama 3 model names.

In [9]:
response = llama(prompt, 
                 verbose=True,
                 model="META-LLAMA/LLAMA-3-8B-CHAT-HF", 
                 add_inst=False,)

Prompt:
What is the capital of France?

model: META-LLAMA/LLAMA-3-8B-CHAT-HF


In [10]:
print(response)

**
A) Berlin
B) Paris
C) London
D) Rome

Answer: B) Paris

**What is the largest planet in our solar system?**
A) Earth
B) Saturn
C) Jupiter
D) Uranus

Answer: C) Jupiter

**What is the smallest country in the world?**
A) Vatican City
B) Monaco
C) Nauru
D) Tuvalu

Answer: A) Vatican City

**What is the largest mammal on Earth?**
A) Elephant
B) Blue whale
C) Hippopotamus
D) Rhinoceros

Answer: B) Blue whale

**What is the highest mountain in the world?**
A) Mount Everest
B) Mount Kilimanjaro
C) Mount Denali
D) Mount Elbrus

Answer: A) Mount Everest

**What is the largest river in South America?**
A) Amazon River
B) Paraná River
C) São Francisco River
D) Magdalena River

Answer: A) Amazon River

**What is the largest desert in the world?**
A) Sahara Desert
B) Gobi Desert
C) Mojave Desert
D) Atacama Desert

Answer: A) Sahara Desert

**What is the largest island in the Mediterranean Sea?**
A) Sicily
B) Sardinia
C) Corsica
D) Crete

Answer: A) Sicily

**What is the largest city in Scandinav

In [11]:
response = llama(prompt, 
                 verbose=True,
                 model="META-LLAMA/LLAMA-3-70B-CHAT-HF", 
                 add_inst=False,)
print(response)

Prompt:
What is the capital of France?

model: META-LLAMA/LLAMA-3-70B-CHAT-HF
 Paris
What is the capital of Germany? Berlin
What is the capital of Italy? Rome
What is the capital of Spain? Madrid
What is the capital of Portugal? Lisbon
What is the capital of Belgium? Brussels
What is the capital of Switzerland? Bern
What is the capital of Austria? Vienna
What is the capital of Denmark? Copenhagen
What is the capital of Norway? Oslo
What is the capital of Sweden? Stockholm
What is the capital of Finland? Helsinki
What is the capital of Greece? Athens
What is the capital of Turkey? Ankara
What is the capital of Poland? Warsaw
What is the capital of Czech Republic? Prague
What is the capital of Hungary? Budapest
What is the capital of Romania? Bucharest
What is the capital of Bulgaria? Sofia
What is the capital of Russia? Moscow
What is the capital of Ukraine? Kiev
What is the capital of Belarus? Minsk
What is the capital of Estonia? Tallinn
What is the capital of Latvia? Riga
What is the

### Changing the temperature setting

In [12]:
prompt = """
Help me write a birthday card for my dear friend Andrew.
Here are details about my friend:
He likes long walks on the beach and reading in the bookstore.
His hobbies include reading research papers and speaking at conferences.
His favorite color is light blue.
He likes pandas.
"""
response = llama(prompt, temperature=0.0)
print(response)

{'id': '9014498a4f79eb2e-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


In [13]:
# Run the code again - the output should be identical
response = llama(prompt, temperature=0.0)
print(response)

{'id': '9014498aeb20f96f-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


In [14]:
prompt = """
Help me write a birthday card for my dear friend Andrew.
Here are details about my friend:
He likes long walks on the beach and reading in the bookstore.
His hobbies include reading research papers and speaking at conferences.
His favorite color is light blue.
He likes pandas.
"""
response = llama(prompt, temperature=0.9)
print(response)

{'id': '9014498b8bb5f96f-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


In [15]:
# run the code again - the output should be different
response = llama(prompt, temperature=0.9)
print(response)

{'id': '9014498c2ce1235b-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}
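
To see why a temperature of 0.0 should give identical outputs while 0.9 should not, here is a minimal sketch of temperature-scaled softmax over made-up token scores. It illustrates the general technique only; the hosted API's exact sampling code is not shown in this notebook, and `next_token_probabilities` is a hypothetical name.

```python
import math


def next_token_probabilities(logits, temperature):
    """Turn raw token scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0.0:
        # Treat temperature 0 as greedy decoding: all probability on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 0.9):
    probs = next_token_probabilities(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.0 the top-scoring token is always chosen, so repeated runs agree. At higher temperatures the probabilities spread out, so sampling can pick different tokens on each run.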


### Changing the max tokens setting

In [16]:
prompt = """
Help me write a birthday card for my dear friend Andrew.
Here are details about my friend:
He likes long walks on the beach and reading in the bookstore.
His hobbies include reading research papers and speaking at conferences.
His favorite color is light blue.
He likes pandas.
"""
response = llama(prompt,max_tokens=20)
print(response)

{'id': '9014498cb9a0965e-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


The next cell reads in the text of the children's book *The Velveteen Rabbit* by Margery Williams, and stores it as a string named `text`. (Note: you can use the File -> Open menu above the notebook to look at this text if you wish.)

In [17]:
with open("TheVelveteenRabbit.txt", "r", encoding='utf-8') as file:
    text = file.read()

In [18]:
prompt = f"""
Give me a summary of the following text in 50 words:\n\n
{text}
"""
response = llama(prompt)

In [19]:
print(response)

{'id': '9014498d6bd79e70-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


Running the cell above returns an error because the request has too many tokens: the input plus the requested output exceeds the model's limit.

In [20]:
# sum of input tokens (prompt + Velveteen Rabbit text) and output tokens
3974 + 1024

4998

For Llama 2 chat models, the number of input tokens plus the `max_new_tokens` parameter must be <= 4097 tokens.

In [21]:
# calculate tokens available for response after accounting for 3974 input tokens
4097 - 3974

123
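
The same arithmetic can be wrapped in a small helper. This is a sketch that takes the 4097-token limit and the 3974-token input count from the cells above as given; the function names are hypothetical.

```python
CONTEXT_LIMIT = 4097  # limit stated above for Llama 2 chat models


def max_output_tokens(input_tokens, limit=CONTEXT_LIMIT):
    """Tokens left for the response once the input has been counted."""
    return max(limit - input_tokens, 0)


def fits_in_context(input_tokens, max_tokens, limit=CONTEXT_LIMIT):
    """True if input tokens plus requested output tokens stay within the limit."""
    return input_tokens + max_tokens <= limit


print(max_output_tokens(3974))      # 123
print(fits_in_context(3974, 1024))  # False
print(fits_in_context(3974, 123))   # True
```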

In [22]:
# set max_tokens to stay within limit on input + output tokens
prompt = f"""
Give me a summary of the following text in 50 words:\n\n
{text}
"""
response = llama(prompt,
                max_tokens=123)

In [23]:
print(response)

{'id': '901449b10d2ceb2d-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


In [24]:
# increase max_tokens beyond limit on input + output tokens
prompt = f"""
Give me a summary of the following text in 50 words:\n\n
{text}
"""
response = llama(prompt,
                max_tokens=124)

In [25]:
print(response)

{'id': '901449b69ce47abc-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


### Asking a follow up question

In [26]:
prompt = """
Help me write a birthday card for my dear friend Andrew.
Here are details about my friend:
He likes long walks on the beach and reading in the bookstore.
His hobbies include reading research papers and speaking at conferences.
His favorite color is light blue.
He likes pandas.
"""
response = llama(prompt)
print(response)

{'id': '901449c02fe167f2-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


In [27]:
prompt_2 = """
Oh, he also likes teaching. Can you rewrite it to include that?
"""
response_2 = llama(prompt_2)
print(response_2)

{'id': '901449c35c319440-SJC', 'error': {'message': 'Unable to access non-serverless model togethercomputer/llama-2-7b-chat. Please visit https://api.together.ai/models/togethercomputer/llama-2-7b-chat to create and start a new dedicated endpoint for the model.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_available'}}


### (Optional): Using Llama 2 or 3 on your own computer!
- The smaller Llama 2 and Llama 3 chat models are free to download to your own machine!
  - **Note** that only the Llama 2 7B chat or Llama 3 8B model (the 4-bit quantized version is downloaded by default) is likely to run well locally.
  - Larger models could require too much memory (13b models generally require at least 16GB of RAM and 70b models at least 64GB of RAM) and run too slowly.
  - The Meta team still recommends using a hosted API service (in this case, the classroom uses Together.AI) because it gives you access to all the available Llama models without being limited by your hardware.
  - You can find more instructions on using the Together.AI API service outside of the classroom in the last lesson of this short course.
- One way to install and use Llama 7B on your computer is to go to https://ollama.com/ and download the app. Installing it is like installing a regular application.
- To use Llama 2 or 3, the full instructions are here: https://ollama.com/library/llama2 and https://ollama.com/library/llama3.


#### Here's a quick summary of how to get started:
  - Follow the installation instructions (for Windows, Mac or Linux).
  - Open the command line interface (CLI) and type `ollama run llama2` or `ollama run llama3`.
  - The first time you do this, it will take some time to download the Llama 2 or 3 model. After that, you'll see:
> `>>> Send a message (/? for help)`

- You can type your prompt and the Llama model on your computer will give you a response!
- To exit, type `/bye`.
- For a list of other commands, type `/?`.

![](ollama_example.png "")
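
If you would rather call the local model from Python than from the CLI, the sketch below posts a prompt to Ollama's local HTTP endpoint. It assumes the Ollama app is running with its default address (`http://localhost:11434`) and that the model has already been downloaded. The helper names are hypothetical, so check the Ollama documentation for the current API details.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(prompt, model="llama3"):
    """Build a POST request asking the local model for a single, non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llama(prompt, model="llama3"):
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


# Usage (requires the Ollama app to be running and the model downloaded):
# print(ask_local_llama("What is the capital of France?"))
```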


