## Introduction
In this Colab Notebook, we are going to explore Llama-3.2-1B, a model fine-tuned for generating text & chatting.

By the end of this tutorial, you'll be able to interact with this model and use it to generate conversational responses.

Whether you're curious about chatbot technology or simply want to see a machine-generated response to a particular question, this notebook will serve as a comprehensive guide.

## Workflow
1. **Installations**: We'll begin by setting up our environment with the required libraries.
2. **Prerequisites**: Ensure we have access to the Llama-3.2-1B model on Hugging Face.
3. **Loading the Model & Tokenizer**: Retrieve the model and tokenizer for our session.
4. **Creating the Llama Pipeline**: Prepare our model for generating responses.
5. **Interacting with Llama**: Prompt the model for answers and explore its capabilities.

Let's dive in!

**First, change runtime to GPU.**


You can play with Llama-3.2-1B Chat here: https://huggingface.co/spaces/huggingface-projects/meta-llama/Llama-3.2-1B

## Installations

Before we proceed, we need to ensure that the essential libraries are installed:
- `Hugging Face Transformers`: Provides us with a straightforward way to use pre-trained models.
- `PyTorch`: Serves as the backbone for deep learning operations.
- `Accelerate`: Optimizes PyTorch operations, especially on GPU.

In [1]:
!pip install transformers torch accelerate

Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch)
  Downloading nvidia_curand_cu12-10.3.5

### Prerequisites

To load our desired model, `meta-llama/Llama-3.2-1B`, we first need to authenticate ourselves on Hugging Face. This ensures we have the correct permissions to fetch the model.

1. Gain access to the model on Hugging Face: [Link](https://huggingface.co/meta-llama/Llama-3.2-1B).
2. Use the Hugging Face CLI to login and verify your authentication status.



In [2]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
The token `LLema_` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `LLema_`


In [3]:
!huggingface-cli whoami

[1muser: [0m yokeshdxb


### Loading Model & Tokenizer

Here, we are preparing our session by loading both the Llama model and its associated tokenizer.

The tokenizer will help in converting our text prompts into a format that the model can understand and process.

In [4]:
from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-3.2-1B" # meta-llama/Llama-2-13b-chat-hf

tokenizer = AutoTokenizer.from_pretrained(model, use_auth_token=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/301 [00:00<?, ?B/s]

### Creating the Llama Pipeline

We'll set up a pipeline for text generation.

This pipeline simplifies the process of feeding prompts to our model and receiving generated text as output.

*Note*: This cell takes 2-3 minutes to run

In [5]:
from transformers import pipeline

llama_pipeline = pipeline(
    "text-generation",  # LLM task
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

config.json:   0%|          | 0.00/843 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]

Device set to use cuda:0


### Getting Responses

With everything set up, let's see how Llama responds to some sample queries.

In [7]:
def get_llama_response(prompt: str) -> None:
    """
    Generate a response from the Llama model.

    Parameters:
        prompt (str): The user's input/question for the model.

    Returns:
        None: Prints the model's response.
    """
    sequences = llama_pipeline(
        prompt,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=256,
    )
    print("Chatbot:", sequences[0]['generated_text'])



prompt = 'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n'
get_llama_response(prompt)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Chatbot: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?
I have watched Breaking Bad, Band of Brothers and The Sopranos. I think you will like The Wire and The Sopranos.
I have seen Breaking Bad and The Wire. I liked the first season of Breaking Bad, but I found the last season to be a bit weak. I think I would like to see the Wire more. I think it is very good. The Sopranos is a great show, but I have not watched it in a while.
I've seen Breaking Bad, The Wire, and Band of Brothers. I've only seen a couple of episodes of The Sopranos. I liked The Wire and the Sopranos, but I don't like Breaking Bad as much. The Wire is the best of the three.


### More Queries

In [8]:
prompt = """I'm a programmer and Python is my favorite language because of it's simple syntax and variety of applications I can build with it.\
Based on that, what language should I learn next?\
Give me 5 recommendations"""
get_llama_response(prompt)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Chatbot: I'm a programmer and Python is my favorite language because of it's simple syntax and variety of applications I can build with it.Based on that, what language should I learn next?Give me 5 recommendations of languages you can recommend.
I'm a programmer and Python is my favorite language because of it's simple syntax and variety of applications I can build with it.Based on that, what language should I learn next?Give me 5 recommendations of languages you can recommend.
If you want to learn a high-level language that is widely used in industry, go with Python. It's easy to learn, has a large community, and is very useful for web development. If you want to learn a low-level language that can be used in industry, go with C++. It's very useful for systems programming, and can be used in industry. If you want to learn a high-level language that is useful for industry, go with C#. It's very useful for industry, and can be used in industry. If you want to learn a high-level language

In [9]:
prompt = 'How to learn fast?\n'
get_llama_response(prompt)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Chatbot: How to learn fast?
If you are wondering how to learn fast, there are a few things you can do to help you get the most out of your studies. Here are a few tips:
1. Set specific goals for yourself. This will help you stay motivated and focused on what you need to do.
2. Take advantage of the resources available to you. There are many different ways to learn, and it’s important to find what works best for you. If you’re struggling with a subject, don’t be afraid to ask for help.
3. Be patient. Learning takes time, so don’t get discouraged if you don’t see results right away. Keep at it, and eventually you’ll start to see the results you want.
4. Make a plan. If you have a clear idea of what you want to learn, you’ll be able to focus your time and energy on the most important things.
5. Don’t be afraid to ask questions. If you’re not sure about something, don’t be afraid to ask for help.
6. Take breaks. Learning is a process, and it’s important to take breaks to recharge your batt

In [10]:
prompt = 'I love basketball. Do you have any recommendations of team sports I might like?\n'
get_llama_response(prompt)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Chatbot: I love basketball. Do you have any recommendations of team sports I might like?
I think the most fun team sports are basketball and soccer. Basketball is a lot of fun and soccer is a lot of fun, too. I think it depends on who your friends are. If you have some friends that are really good at basketball or soccer, it would be fun. But if you have friends that are just okay at it, it would be boring. If you have friends that are really good at it, it would be fun. But if you have friends that are just okay at it, it would be boring.
I love basketball and soccer. Basketball is a lot of fun and soccer is a lot of fun, too. I think it depends on who your friends are. If you have some friends that are really good at basketball or soccer, it would be fun. But if you have friends that are just okay at it, it would be boring. If you have friends that are really good at it, it would be fun. But if you have friends that are just okay at it, it would be boring.
I love basketball and socce

In [11]:
prompt = 'How to get rich?\n'
get_llama_response(prompt)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Chatbot: How to get rich?
The question “How to get rich” has been asked by many people. It is a question that has been asked by many people. It is a question that has been asked by many people. It is a question that has been asked by many people.
How to get rich? The answer is not easy. There are many different ways to get rich. Some people say that it is easy to get rich. Some people say that it is not easy to get rich. It depends on the person. It depends on the person.
How to get rich? It depends on the person.
The answer to this question depends on the person. It depends on the person. It depends on the person.
How to get rich? It depends on the person. It depends on the person. It depends on the person.
How to get rich? It depends on the person. It depends on the person. It depends on the person.
How to get rich? It depends on the person. It depends on the person. It depends on the person.
How to get rich? It depends on the person. It depends on the person. It depends on the perso

### Make it conversational
Let's create an interactive chat loop, where you can converse with the Llama model.

Type your questions or comments, and see how the model responds!

In [12]:
while True:
    user_input = input("You: ")
    if user_input.lower() in ["bye", "quit", "exit"]:
        print("Chatbot: Goodbye!")
        break
    get_llama_response(user_input)

You: what is future of AI?


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Chatbot: what is future of AI? | AI Future
The future of artificial intelligence (AI) is an exciting and rapidly evolving field that has the potential to transform our world in many ways. AI is a subset of computer science that focuses on developing intelligent machines that can perform tasks that typically require human intelligence, such as problem-solving, decision-making, and natural language processing. AI has the potential to revolutionize many aspects of our lives, from the way we interact with technology to the way we make decisions.
One of the key aspects of AI is its ability to learn and improve over time. As we continue to use AI in different ways, it will become increasingly capable of learning from its experiences and adapting to new situations. This ability to learn and adapt is what sets AI apart from traditional computers, which are limited in their ability to change or improve over time.
Another key aspect of AI is its ability to process and analyze large amounts of da

### Conclusion

Thanks to the Hugging Face Library, creating a pipeline to chat with llama 3.2 (or any other open-source LLM) is quite easy.

But if you worked a lot with much larger models such as GPT-4, you need to adjust your expectations.