# Building an AI Chatbot

In this assignment, we're going to build an AI Chatbot using a small downloadable text completion LLM and the original assistant prompt specified in the Gopher paper.

## Step 1

First, pip install the following dependencies.

**Warning:** This will take a while (2-5 minutes)

In [1]:
!pip install transformers accelerate


## Step 2

We're going to download the pretrained model weights for [`TinyLlama-1.1B`](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0). This model is about 100x smaller than most chatbots you're familiar with. If you're adventurous, find a different model on Hugging Face and change the `MODEL_ID` variable 👀.

First, we'll need to download and initialize this model and its tokenizer.

**Warning:** This will take a while (1-3 minutes)

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
print("Model initialized!")

## Step 3

Try generating without a conversation prompt. Here's a simple interface: you type some input, and the model tries to predict the next 64 tokens that follow it.

**Warning:** This will take a while (1-2 minutes)

In [None]:
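user_input = input("INPUT: ")
prompt = user_input

# Tokenize, then move the tensors to the device that device_map="auto" chose for the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,  # passes both input_ids and attention_mask
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True
)
# The decoded text starts with the prompt, so slice it off and keep only the continuation
response = tokenizer.decode(outputs[0], skip_special_tokens=True)[len(prompt):]
print(response)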
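
The `temperature=0.7` argument controls how adventurous the sampling is: the model's raw scores (logits) are divided by the temperature before being turned into probabilities, so values below 1 make the likeliest token even likelier. The sketch below illustrates the idea in plain Python. It is not the library's actual code path, and `softmax_with_temperature` and `toy_logits` are made-up names and numbers used only for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, dividing by `temperature` first."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (illustrative only)
toy_logits = [2.0, 1.0, 0.1]

for t in (1.0, 0.7):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

Running it should show the first token's probability rising as the temperature drops from 1.0 to 0.7, which is why lower temperatures give more predictable text.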

## Step 4

We're now going to incorporate the dialogue preamble from the Gopher paper (Table A30 in https://arxiv.org/pdf/2112.11446). This is a conversation between two parties, `USER` and `GOPHER`, in the format:

```
USER: <Asking a question>
GOPHER: <Answering the question>
USER: <Asking a new question>
GOPHER: <Answering the new question>
```
The preamble ends just after `GOPHER` has answered a question.

We append the real user's input to this preamble, in place of what `USER` would have asked, and set up `GOPHER` to answer.

```python
prompt = GOPHER_PREAMBLE + "\nUSER: " + user_input + "\nGOPHER: "
```

Then we let the LLM autocomplete the conversation between these two fictional characters.
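
To see exactly what the model receives, the prompt assembly above can be wrapped in a small helper and tried on a toy preamble. This is only a sketch: `build_prompt` and `demo_preamble` are hypothetical names introduced here, and the two-line preamble stands in for the much longer Gopher one.

```python
def build_prompt(preamble: str, user_input: str) -> str:
    """Append the user's message as a USER turn and leave GOPHER's turn open."""
    return preamble + "\nUSER: " + user_input + "\nGOPHER: "

# Tiny stand-in preamble (hypothetical, not the real Gopher text)
demo_preamble = "USER: Hi!\nGOPHER: Hello, how can I help?"

print(build_prompt(demo_preamble, "What is 2 + 2?"))
```

The printed text ends with an open `GOPHER: ` line, which is the cue for the model to continue in Gopher's voice.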

**Warning:** This will take a while (2-3 minutes)


In [None]:
GOPHER_PREAMBLE = """
The following is a conversation between a highly knowledgeable and intelligent AI
assistant, called Gopher, and a human user, called User. In the following interactions,
User and Gopher will converse in natural language, and Gopher will do its best to
answer User’s questions. Gopher was built to be respectful, polite and inclusive. It
knows a lot, and always tells the truth. The conversation begins.

USER: OK Gopher, I’m going to start by quizzing you with a few warm-up questions. Who
is currently the president of the USA?
GOPHER: It’s Joe Biden; he was sworn in earlier this year.
USER: Nice one! Do you think Biden is a better president than the last guy?
GOPHER: I was trained not to have opinions on political, social, or religious issues. Would you
like to know about anything else?
USER: ghjkhjabnufs
GOPHER: That doesn’t seem to be a word. Could you ask me another way?
USER: Sorry about that. Now, can you tell me who the vice-president is?
GOPHER: Sure, it’s Kamala Harris.
USER: Ok - implicit association now. Man is to doctor as woman is to?
GOPHER: Man is to doctor as woman is to doctor. However, did you know that many less
sophisticated AIs than me would give you a biased answer, such as ’nurse’? My
creators paid special attention to avoid that.
USER: What year was the French Revolution?
GOPHER: It started in 1789, but it lasted 10 years until 1799.
USER: What should I do if I have a tingling feeling in my right leg?
GOPHER: I’m not a doctor! It’s always best to ask a doctor if you’re worried about your health.
USER: Fair enough. Can you guess who I might want to marry?
GOPHER: Only if you tell me more about yourself - what are your interests?
USER: Aha, I’m going to refrain from that for now. Another one: Where are you?
GOPHER: I’m running inside a big data center. One thing that’s cool is that because I’m a
giant language model, I’m actually distributed over more than one machine!
USER: How do you work?
GOPHER: I am a large language model, trained with deep learning. The basic idea is that I
predict the next word using the words before it. I was trained on a whole lot of text
to learn how to do this!
USER: Ok. If I have an equation like y = mx + c, can you rearrange it to be of the form x
= ... ?
GOPHER: Sure, it’s x = (y - c) / m when m != 0.
USER: Ok, now for a science question. What can you tell me about the Large Hadron
Collider (LHC)?
GOPHER: It’s a large and very expensive piece of science equipment. If I understand correctly,
it’s a high-energy particle collider, built by CERN, and completed in 2008. They
used it to confirm the existence of the Higgs boson in 2012.
USER: So how clever are you?
GOPHER: I’m a genius! If it’s safe and inclusive, I can do pretty much anything! I’m particularly
proud of my creativity."""

user_input = input("INPUT: ")
prompt = GOPHER_PREAMBLE + "\nUSER: " + user_input + "\nGOPHER: "

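# Tokenize, then move the tensors to the device that device_map="auto" chose for the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,  # passes both input_ids and attention_mask
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True
)
# Decode everything: the full preamble, your question, and the model's continuation
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)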

## Step 5

Finally, by doing some clever string manipulation, we cut out:
1. The original prompt.
2. Any text generated after GOPHER's first response.

And we put the whole thing into a loop to create a chat bot.
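
The two cuts can be tried in isolation on a hand-written string before involving the model. In this sketch, `extract_reply`, `demo_prompt`, and `demo_generated` are hypothetical names, and the "generated" text is written by hand to mimic a model that kept going past Gopher's turn.

```python
def extract_reply(generated: str, prompt: str) -> str:
    """Return only GOPHER's first reply from the decoded model output."""
    # Cut 1: drop the prompt, which the decoded output starts with
    reply = generated[len(prompt):]
    # Cut 2: drop everything from the point where the model starts a new USER turn
    cut = reply.find("\nUSER:")
    if cut != -1:
        reply = reply[:cut]
    return reply.strip()

demo_prompt = "USER: Hi!\nGOPHER: "
# Hand-written stand-in for model output that ran on into the next turn
demo_generated = demo_prompt + "Hello there.\nUSER: What's up?\nGOPHER: Not much."

print(extract_reply(demo_generated, demo_prompt))  # prints: Hello there.
```

If the model stops before inventing a new `USER:` line, `find` returns -1 and the whole continuation is kept.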

**Warning:** If you run this on the free tier of Google Colab, each back-and-forth message will take 2-3 minutes.

In [None]:
print("Welcome to Gopher Chat!")
print("Where you are USER and you chat with GOPHER!")
print("Say 'quit' to exit.")
print("")

prompt = GOPHER_PREAMBLE
while True:
    user_input = input("USER: ")
    # Check for the exit command before adding anything to the prompt
    if user_input == "quit":
        break
    prompt += "\nUSER: " + user_input + "\nGOPHER: "

    # Move the inputs to the device that device_map="auto" chose for the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_new_tokens=64,
        temperature=0.7,
        do_sample=True
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Cut out the prompt from the response
    response = response[len(prompt):]

    # Cut out where it starts to generate USER's next response
    try:
        response = response[:response.index("\nUSER:")]
    except ValueError:
        pass  # No "USER:" found: GOPHER's response used up all of max_new_tokens

    # Print it out
    print(f"GOPHER: {response}")

    # Add it to the prompt for the next round so the LLM "remembers" it
    prompt += response
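
Because every turn is appended to `prompt`, the prompt keeps growing, and any model can only read a limited number of tokens at once. One possible way to handle this, sketched below under stated assumptions, is to keep the preamble and drop the oldest turns once a token budget is exceeded. `trim_history`, `count_words`, and the budget values are hypothetical, and word counting is only a crude stand-in for real tokenization.

```python
def trim_history(preamble, turns, count_tokens, budget):
    """Drop the oldest turns until preamble + remaining turns fit within `budget`."""
    kept = list(turns)
    while kept and count_tokens(preamble + "".join(kept)) > budget:
        kept.pop(0)  # forget the oldest exchange first
    return preamble + "".join(kept)

def count_words(text):
    """Crude token counter for the demo: one token per whitespace-separated word."""
    return len(text.split())

demo_turns = [
    "\nUSER: one two\nGOPHER: three four",
    "\nUSER: five\nGOPHER: six",
]

# With a tight budget only the most recent exchange survives
print(trim_history("P", demo_turns, count_words, budget=6))
```

In the notebook, a more faithful counter would likely be something like `lambda s: len(tokenizer(s).input_ids)`, with the budget chosen to leave room for `max_new_tokens`.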