<a href="https://colab.research.google.com/github/vishnusureshperumbavoor/chat_with_llms/blob/main/chat_with_llama3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from IPython.display import Markdown, display
display(Markdown("# Chat with LLaMA2 by VSP"))

# Chat with LLaMA2 by VSP

# Installations

Before we proceed, we need to ensure that the essential libraries are installed:
- `Hugging Face Transformers`: Provides us with a straightforward way to use pre-trained models.
- `PyTorch`: Serves as the backbone for deep learning operations.
- `Accelerate`: Handles device placement for PyTorch models and is what lets Transformers load a model with `device_map="auto"`.

In [None]:
!pip install transformers torch accelerate

# Prerequisites

To load a gated model such as `meta-llama/Llama-2-7b-chat-hf` or `meta-llama/Meta-Llama-3-8B` (the one set in the code below), we first need to authenticate ourselves on Hugging Face. This ensures we have the correct permissions to fetch the model.

1. Gain access to the model on Hugging Face: [Link](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
2. Add your Hugging Face API token in the Secrets tab in the left sidebar and allow the notebook to access it.
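
Before loading a gated model, it can help to check that a token is actually visible to the notebook. The sketch below is one possible way to do that. The helper `resolve_hf_token` and the secret name `HF_TOKEN` are assumptions made for illustration, so adjust the name to match whatever you entered in the Secrets tab.

```python
import os
from typing import Optional


def resolve_hf_token(secret_name: str = "HF_TOKEN") -> Optional[str]:
    """Return a Hugging Face token from Colab secrets or the environment, if any."""
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata

        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        pass
    # Fall back to an environment variable of the same (assumed) name.
    return os.environ.get(secret_name)


token = resolve_hf_token()
print("Token found" if token else "No token found; add one before loading a gated model")
```

If no token is found, the model download later in the notebook is likely to fail with an authorization error.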



# Loading Model & Tokenizer

Here, we prepare our session by choosing the Llama model ID and loading its associated tokenizer. The model weights themselves are loaded when we create the pipeline in the next section.

The tokenizer will help in converting our text prompts into a format that the model can understand and process.
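
To make the idea concrete, here is a toy sketch of what "converting text into a format the model can understand" means: text goes in, a list of integer IDs comes out, and the IDs can be mapped back to text. `ToyTokenizer` is a hypothetical whitespace tokenizer written for this illustration. The real Llama tokenizer uses a learned subword vocabulary and behaves differently.

```python
class ToyTokenizer:
    """Whitespace tokenizer with a tiny fixed vocabulary (illustration only)."""

    def __init__(self, vocab):
        # ID 0 is reserved for words that are not in the vocabulary.
        self.token_to_id = {"<unk>": 0}
        for word in vocab:
            self.token_to_id.setdefault(word, len(self.token_to_id))
        self.id_to_token = {i: t for t, i in self.token_to_id.items()}

    def encode(self, text):
        # Lowercase, split on whitespace, and look up each word's ID.
        return [self.token_to_id.get(w, 0) for w in text.lower().split()]

    def decode(self, ids):
        # Map IDs back to words and join them with spaces.
        return " ".join(self.id_to_token[i] for i in ids)


toy = ToyTokenizer(["how", "to", "learn", "fast?"])
ids = toy.encode("How to learn fast?")
print(ids)              # [1, 2, 3, 4]
print(toy.decode(ids))  # how to learn fast?
```

The model only ever sees the integer IDs. The tokenizer is responsible for both directions of the conversion.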

In [None]:
from transformers import AutoTokenizer
import transformers
import torch

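# Hugging Face model ID (alternative: meta-llama/Llama-2-7b-hf)
model = "meta-llama/Meta-Llama-3-8B"

# token=True reuses the stored Hugging Face token (replaces the deprecated use_auth_token)
tokenizer = AutoTokenizer.from_pretrained(model, token=True)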



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


# Creating the Llama Pipeline

We'll set up a pipeline for text generation.

This pipeline simplifies the process of feeding prompts to our model and receiving generated text as output.

*Note*: This cell takes 2-3 minutes to run
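
The pipeline call in this section requests `torch.float16`, which stores each weight in 2 bytes instead of the 4 bytes used by float32. As a rough sketch of why that matters, the helper below estimates the memory needed for the weights alone. The function name is hypothetical, and the 8 billion parameter count is a nominal figure taken from the "8B" in the model name, not an exact count. Activations and the generation cache need additional memory on top of this.

```python
def estimate_weight_memory_gib(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_parameters * bytes_per_parameter / 1024**3


nominal_parameters = 8_000_000_000  # assumption: "8B" read as 8 billion

print(f"float16: {estimate_weight_memory_gib(nominal_parameters):.1f} GiB")     # float16: 14.9 GiB
print(f"float32: {estimate_weight_memory_gib(nominal_parameters, 4):.1f} GiB")  # float32: 29.8 GiB
```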

In [None]:
from transformers import pipeline

llama_pipeline = pipeline(
    "text-generation",  # LLM task
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)


# Getting Responses

With everything set up, let's see how Llama responds to some sample queries.

In [None]:
def get_llama_response(prompt: str) -> None:
    """
    Generate a response from the Llama model.

    Parameters:
        prompt (str): The user's input/question for the model.

    Returns:
        None: Prints the model's response.
    """
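    sequences = llama_pipeline(
        prompt,
        do_sample=True,  # sample from the distribution instead of greedy decoding
        top_k=10,  # sample only from the 10 most likely next tokens
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,  # stop once the end-of-sequence token is generated
        max_length=256,  # cap on prompt tokens plus generated tokens combined
    )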
    print("Chatbot:", sequences[0]['generated_text'])



prompt = 'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n'
get_llama_response(prompt)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Chatbot: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?

I'm open to a wide range of genres, from drama to comedy to sci-fi/fantasy. Let me know if you have any suggestions!
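
Notice that the output above starts by repeating the prompt, because `generated_text` contains the prompt followed by the continuation. One way to show only the new text is to strip the prompt afterwards. The helper `strip_prompt` below is a hypothetical sketch of that. The text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without post-processing.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # The prompt was not echoed verbatim, so only trim surrounding whitespace.
    return generated_text.strip()


example_prompt = "Do you have any recommendations?\n"
example_output = example_prompt + "\nYou might enjoy a historical drama."
print(strip_prompt(example_output, example_prompt))  # You might enjoy a historical drama.
```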


# More Queries

In [None]:
prompt = """I'm a programmer and Python is my favorite language because of it's simple syntax and variety of applications I can build with it.\
Based on that, what language should I learn next?\
Give me 5 recommendations"""
get_llama_response(prompt)

Chatbot: I'm a programmer and Python is my favorite language because of it's simple syntax and variety of applications I can build with it.Based on that, what language should I learn next?Give me 5 recommendations for languages that are similar to Python and have a wide range of applications.

Answer: As a programmer who enjoys Python, you may want to consider learning other languages that share similar characteristics and have a wide range of applications. Here are five recommendations for languages that are similar to Python and have a diverse range of uses:

1. JavaScript: JavaScript is a popular language for web development, and is used by over 90% of websites for client-side scripting. It's also used in mobile app development, game development, and server-side programming. JavaScript has a syntax similar to Python, and its versatility makes it a great language to learn for anyone who enjoys Python.
2. Ruby: Ruby is a dynamic language that's known for its simplicity and readability
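
The echoed prompt above runs sentences together ("it.Based", "next?Give"). This happens because a backslash at the end of a line inside a triple-quoted string removes the newline without adding a space. The short sketch below reproduces the effect with placeholder text and shows one possible alternative that keeps the spaces explicit. The strings are illustrative, not the notebook's actual prompt.

```python
# A trailing backslash joins the lines with nothing in between.
joined = """First sentence.\
Second sentence."""
print(joined)  # First sentence.Second sentence.

# Adjacent string literals inside parentheses are concatenated,
# so the separating spaces can be written out explicitly.
spaced = (
    "First sentence. "
    "Second sentence."
)
print(spaced)  # First sentence. Second sentence.
```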

In [None]:
prompt = 'How to learn fast?\n'
get_llama_response(prompt)

Chatbot: How to learn fast?

Here are some tips on how to learn fast:

1. Set clear goals: Setting clear goals helps you focus your efforts and stay motivated. Write down what you want to achieve and track your progress.
2. Break it down: Break down complex topics into smaller, manageable chunks. This helps you understand the material better and retain it longer.
3. Practice consistently: Consistency is key to learning fast. Set aside a specific time each day or week to practice and review the material.
4. Use active learning techniques: Active learning techniques involve engaging with the material rather than just passively reading or listening. Try summarizing the material in your own words, creating flashcards, or taking practice quizzes.
5. Get enough sleep: Sleep plays an important role in memory consolidation and learning. Aim for 7-9 hours of sleep each night to help your brain process and retain the information you're learning.
6. Teach someone else: Teaching someone else what 

In [None]:
prompt = 'I love basketball. Do you have any recommendations of team sports I might like?\n'
get_llama_response(prompt)

Chatbot: I love basketball. Do you have any recommendations of team sports I might like?
I'm glad you're interested in trying out new sports! While basketball is a great sport, there are plenty of other team sports that you might enjoy. Here are some recommendations based on your interests:

1. Volleyball: Like basketball, volleyball is a fast-paced, high-energy sport that requires good hand-eye coordination and teamwork. It's also a great workout, as it involves a lot of running and jumping.
2. Soccer: If you enjoy running and physical contact, soccer might be the sport for you. It's a great way to stay active and work on your cardiovascular fitness, and it's also a lot of fun to play with a team.
3. Lacrosse: Lacrosse is a sport that combines elements of basketball, soccer, and hockey. It's a great workout that requires good hand-eye coordination and agility, and it's also a lot of fun to play with a team.
4. Field Hockey: Field hockey is a sport that's similar to lac


In [None]:
prompt = 'How to get rich?\n'
get_llama_response(prompt)

Chatbot: How to get rich?

There is no one-size-fits-all formula for becoming rich, as wealth accumulation often involves a combination of hard work, smart financial decisions, and a bit of luck. However, here are some general tips that may help you on your journey to wealth:

1. Start by setting clear financial goals: What do you want to achieve? When do you want to achieve it? How much money do you need to make it happen? Write down your goals and make them specific, measurable, achievable, relevant, and time-bound (SMART).
2. Live below your means: Spend less than you earn. Create a budget that accounts for all your expenses, and make sure you're not overspending. Cut back on unnecessary expenses like dining out or subscription services you don't use.
3. Invest wisely: Invest your money in assets that have a high potential for growth, such as stocks, real estate, or a small business. Do your research, diversify your portfolio, and avoid get-rich-quick schemes.
4. Build multiple inco

# Problems

After 3-4 prompts, the model stops generating new text and only echoes the user prompt.

To keep talking to the model, restart the notebook (`Runtime -> Restart Runtime`) and run it again.

# Make it conversational
Let's create an interactive chat loop, where you can converse with the Llama model.

Type your questions or comments, and see how the model responds!

In [None]:
while True:
    user_input = input("You: ")
    if user_input.lower() in ["bye", "quit", "exit"]:
        print("Chatbot: Goodbye!")
        break
    get_llama_response(user_input)

KeyboardInterrupt: Interrupted by user
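
The loop above sends each message to the model on its own, so the model has no memory of earlier turns. A minimal sketch of one way to carry context forward is to keep a history list and rebuild the prompt on every turn. Everything here is an assumption for illustration: the `User:`/`Assistant:` layout is a plain-text convention and not Llama's official chat format, and `echo_model` is a stand-in for the real pipeline call so the example runs without a GPU.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (user message, model reply) pairs


def build_prompt(history: History, user_input: str) -> str:
    """Flatten earlier turns plus the new message into a single prompt string."""
    lines = []
    for user_turn, bot_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Assistant: {bot_turn}")
    lines.append(f"User: {user_input}")
    lines.append("Assistant:")
    return "\n".join(lines)


def chat_turn(history: History, user_input: str, generate: Callable[[str], str]) -> str:
    """Run one turn: build the prompt, generate a reply, and record the exchange."""
    reply = generate(build_prompt(history, user_input)).strip()
    history.append((user_input, reply))
    return reply


def echo_model(prompt: str) -> str:
    # Stand-in for the model: reports how many user turns it received.
    return f"(saw {prompt.count('User:')} user turn(s))"


history: History = []
print(chat_turn(history, "Hi", echo_model))                # (saw 1 user turn(s))
print(chat_turn(history, "Recommend a show", echo_model))  # (saw 2 user turn(s))
```

With the real model, the growing history counts against the `max_length=256` limit, so older turns would need to be dropped or the limit raised.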