![TUM.ai logo](https://github.com/tum-ai/ai-school-2022/blob/main/images/tum-ai-logo.png?raw=true)

# Goal
Our goal in this notebook is to create a simple chatbot.

We will do this by using a pretrained model from Hugging Face and building a small interface for chatting with it. Once this is done, you can run your own small "Turing test" and see how well the model communicates like a human.

# Step 0: Preparing Hugging Face 🤗

In [None]:
# Installing Hugging Face
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.21.3-py3-none-any.whl (4.7 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.21.3


# Step 1: Selecting the model
To create a chatbot, we need a model that takes our previous messages as input and uses them to generate a new text response. Therefore, we use a text-generation model.

Here we use the pretrained model `gpt2` (GPT-2), where GPT stands for Generative Pre-trained Transformer.
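A text-generation model like GPT-2 produces its output one token at a time: it looks at the text so far, picks a next token, appends it, and repeats until it emits an end-of-sequence token or hits a length limit. The toy sketch below illustrates that loop, mirroring the `max_new_tokens` and `eos_token_id` arguments used in the next cells. The lookup table `NEXT_TOKEN` and the `generate` function are hypothetical stand-ins for the neural network; unlike GPT-2, the toy only looks at the previous token and always picks the same continuation.

```python
# HYPOTHETICAL toy "model": maps the previous token to the next one.
# GPT-2 instead scores every vocabulary token given the whole preceding text.
NEXT_TOKEN = {
    "Once": "upon",
    "upon": "a",
    "a": "time",
    "time": ",",
    ",": "<eos>",
}
EOS = "<eos>"


def generate(prompt_tokens, max_new_tokens):
    """Greedy autoregressive generation: append one token at a time."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The next token depends only on what has been produced so far.
        next_token = NEXT_TOKEN.get(tokens[-1], EOS)
        if next_token == EOS:
            break  # stop early, like passing eos_token_id to the pipeline
        tokens.append(next_token)
    return tokens


print(generate(["Once"], max_new_tokens=20))  # ['Once', 'upon', 'a', 'time', ',']
print(generate(["Once"], max_new_tokens=2))   # ['Once', 'upon', 'a']
```

The length limit caps how many tokens are added, which is why the completions in the next cell stop mid-sentence.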

In [None]:
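# Import the pipeline helper and the GPT-2 tokenizer from Hugging Face transformers
from transformers import pipeline, GPT2Tokenizer

# Load the pretrained GPT-2 model as a text-generation pipeline
generator = pipeline('text-generation', model='gpt2')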

# Load tokenizer for GPT-2 model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

In [None]:
# Generating some examples
# Feel free to add any prompt here and test the capabilities of the model
prompt = "Once upon a time,"

# generating sequences and printing them
sequences = generator(prompt, max_new_tokens=20, num_return_sequences=3, eos_token_id=tokenizer.eos_token_id)
for i, sequence in enumerate(sequences):
  print(f'Completion {i+1}:')
  print(sequence['generated_text'], '\n')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Completion 1:
Once upon a time, the entire cosmos was made up of millions of tiny galaxies, the vast majority of which are not yet 

Completion 2:
Once upon a time, you see her staring at you in disbelief. How did you feel about this? If you said it 

Completion 3:
Once upon a time, I was living in the shadow of my old self before I became completely immersed in the reality the world 
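The three completions differ even though the prompt is identical, because generation is sampled: the model draws its next token at random from a probability distribution instead of always taking the most likely one. The chat loop in Step 3 controls this sampling explicitly with `temperature`, `top_k` and `top_p`. The sketch below is a simplified, self-contained illustration of what those three filters do on a made-up distribution; the token scores and the helper names `sampling_candidates` and `sample_next_token` are hypothetical, and this is not the `transformers` implementation.

```python
import math
import random


def sampling_candidates(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the (token, probability) pairs left after filtering.

    1. divide the scores by the temperature (< 1 sharpens the distribution),
    2. keep only the top_k highest-scoring tokens (0 keeps all of them),
    3. keep the smallest prefix whose cumulative probability reaches top_p.
    """
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)
    ranked = [(tok, score / temperature) for tok, score in ranked]
    if top_k > 0:
        ranked = ranked[:top_k]

    # Softmax over the remaining tokens (subtract the max for stability).
    max_score = ranked[0][1]
    exps = [(tok, math.exp(score - max_score)) for tok, score in ranked]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]

    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, rng, **filter_args):
    """Draw one token at random from the filtered candidates."""
    kept = sampling_candidates(logits, **filter_args)
    tokens = [tok for tok, _ in kept]
    weights = [p for _, p in kept]
    # random.choices accepts weights that do not sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores after the prompt "Once upon a".
toy_logits = {"time": 5.0, "day": 3.0, "hill": 2.0, "banana": 0.0}

print(sampling_candidates(toy_logits, top_k=2))
print([tok for tok, _ in sampling_candidates(toy_logits, top_p=0.7)])  # ['time']
rng = random.Random(0)
print([sample_next_token(toy_logits, rng, temperature=0.8, top_k=3) for _ in range(5)])
```

Lower temperatures and smaller `top_k`/`top_p` values make the output more predictable; higher values make it more varied but also more likely to drift off topic.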



# Step 2: Designing the prompt
So far, we can use the model to continue a given prompt. For our chatbot, however, we do not have one long text but multiple messages. That's where prompt engineering comes in.

Our first task is to represent multiple messages in a single text prompt. Here we choose to restrict each message to a single line and to start each line with the message's author. Ending the prompt with `AI:` invites the model to write the AI's next message.
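This format can be expressed as a small helper that turns a list of (author, message) pairs into a single prompt string. The sketch below uses a hypothetical `build_prompt` function that is not part of the notebook; it also collapses any line breaks inside a message, since each message must fit on one line.

```python
def build_prompt(messages, next_speaker="AI"):
    """Turn (author, text) pairs into a one-message-per-line prompt.

    The prompt ends with "<next_speaker>:" so that the natural continuation
    is that speaker's next message.
    """
    lines = []
    for author, text in messages:
        # Messages must fit on one line, so collapse all whitespace runs.
        one_line = " ".join(text.split())
        lines.append(f"{author}: {one_line}")
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


history = [
    ("Human", "Hello, how are you?"),
    ("AI", "I am fine. Thanks."),
    ("Human", "Good, what did you do yesterday?"),
]
print(build_prompt(history))
```

The printed prompt has the same shape as the one in the next cell: one `Author: message` line per message, followed by an open `AI:` line.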

In [None]:
# Feel free to edit the prompt and replace the messages or add new ones
prompt = """Human: Hello, how are you?
AI: I am fine. Thanks.
Human: Do you like pineapple on pizza?
AI: No.
Human: Good, what did you do yesterday?
AI:"""

# generating sequences and printing them
sequences = generator(prompt, max_new_tokens=20, num_return_sequences=3, eos_token_id=tokenizer.eos_token_id)
for i, sequence in enumerate(sequences):
  print(f'Completion {i+1}:')
  print(sequence['generated_text'], '\n')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Completion 1:
Human: Hello, how are you?
AI: I am fine. Thanks.
Human: Do you like pineapple on pizza?
AI: No.
Human: Good, what did you do yesterday?
AI: Nothing. I just got into a ruckus with my roommate.
Crazy Planet: He 

Completion 2:
Human: Hello, how are you?
AI: I am fine. Thanks.
Human: Do you like pineapple on pizza?
AI: No.
Human: Good, what did you do yesterday?
AI: I cooked at 1:05.
Human: Oh, sorry. What are you doing?
 

Completion 3:
Human: Hello, how are you?
AI: I am fine. Thanks.
Human: Do you like pineapple on pizza?
AI: No.
Human: Good, what did you do yesterday?
AI: Well… I got my hair done after getting an appointment at the pharmacy.
Humans: How 



Now that the model continues the conversation, we can do some more prompt engineering. For example, we can give the bot more character or personality by adjusting the example answers in the prompt.

In [None]:
# Adding emojis or changing the answers can have quite an impact on
# the response's "personality"
prompt = """Human: Hello, how are you?
AI: Heeey, I am fine. Thanks :)
Human: Do you like pineapple on pizza?
AI: Haha, no, you better keep that far away from me! :P
Human: Good, what did you do yesterday?
AI:"""

# generating sequences and printing them
sequences = generator(prompt, max_new_tokens=20, num_return_sequences=3, eos_token_id=tokenizer.eos_token_id)
for i, sequence in enumerate(sequences):
  print(f'Completion {i+1}:')
  print(sequence['generated_text'], '\n')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Completion 1:
Human: Hello, how are you?
AI: Heeey, I am fine. Thanks :)
Human: Do you like pineapple on pizza?
AI: Haha, no, you better keep that far away from me! :P
Human: Good, what did you do yesterday?
AI: My wife fell asleep.
E-H-H-H-H
E-H- 

Completion 2:
Human: Hello, how are you?
AI: Heeey, I am fine. Thanks :)
Human: Do you like pineapple on pizza?
AI: Haha, no, you better keep that far away from me! :P
Human: Good, what did you do yesterday?
AI: Well, I'm just trying to stay awake, so I don't need your sleep all day! 

Completion 3:
Human: Hello, how are you?
AI: Heeey, I am fine. Thanks :)
Human: Do you like pineapple on pizza?
AI: Haha, no, you better keep that far away from me! :P
Human: Good, what did you do yesterday?
AI: Hmm, a quick stroll around in the village.
Human: Where did your parents come from? 



# Step 3: Creating an interactive chatbot
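Now we want to use the output to build an actual chatbot. First, we reduce the model's output to just the bot's next message: we cut off the prompt and keep only the first line of the generated text, since the model often goes on to write further turns on its own (as in the completions above). We also switch to two named speakers, Tom (you) and Julian (the model), and set the sampling parameters explicitly.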
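The chat loop below reduces the model's output to the bot's next message by slicing off the prompt and keeping only the first line. A minimal sketch of that step as a stand-alone helper, `extract_reply` (a hypothetical name), assuming as the loop does that the pipeline's `generated_text` starts with the exact prompt. The example continuation is taken from Completion 2 in Step 2.

```python
def extract_reply(generated_text, prompt):
    """Keep only the bot's next message from a text-generation output.

    Drops the prompt, then cuts at the first line break, because GPT-2 often
    keeps writing further turns (e.g. "Human: ...") after its answer.
    """
    continuation = generated_text[len(prompt):]
    return continuation.split("\n")[0].strip()


example_prompt = "Human: Good, what did you do yesterday?\nAI:"
output = example_prompt + " I cooked at 1:05.\nHuman: Oh, sorry. What are you doing?"
print(extract_reply(output, example_prompt))  # I cooked at 1:05.
```

If the model starts its continuation with a line break, the helper returns an empty string; a chatbot could detect that and generate again.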

In [None]:
prompt = """Tom: Hi Julian, how are you?
Julian: Hi Tom, I am fine. Thanks.
Tom: I just wanted to ask you a few things. Do you have some time?
Julian: Yea sure. Just ask :D"""

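print('Type "exit" to exit the chat\n')
for i in range(100):  # allow at most 100 turns
  inpt = input('You: ')
  if inpt.strip().lower() == 'exit':
    break
  # You speak as Tom; the prompt ends with "Julian:" so the model answers as Julian
  prompt += '\nTom: ' + inpt + '\nJulian:'
  sequence = generator(
      prompt,
      max_new_tokens=50,
      num_return_sequences=1,
      pad_token_id=tokenizer.eos_token_id,
      no_repeat_ngram_size=3,  # no 3-token sequence may occur twice
      do_sample=True,          # sample instead of always taking the most likely token
      top_k=50,
      top_p=0.7,
      temperature=0.8,
      )[0]
  # The output contains the prompt followed by the continuation:
  # drop the prompt and keep only the first generated line
  answer = sequence['generated_text'][len(prompt):]
  answer = answer.split('\n')[0]
  print(f'AI: {answer}')
  # Append the answer so the model sees the whole conversation next turn
  prompt += answer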


Type "exit" to exit the chat

You: hi
AI:  You have some work to do and I am just wondering if you could help me with that.
You: exit
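The loop appends every exchange to `prompt`, so a long enough chat can eventually exceed GPT-2's context window of 1024 tokens (prompt plus the 50 new tokens). A minimal sketch of one way to handle this, assuming we always keep the scripted opening lines that introduce Tom and Julian and drop the oldest exchanges after them. `trim_history`, `keep_first` and the word-based `count_words` are hypothetical; counting words is only a crude approximation, since GPT-2's tokenizer typically produces more tokens than words.

```python
def count_words(text):
    """Crude stand-in for a token counter (assumption: ~1 token per word)."""
    return len(text.split())


def trim_history(prompt, max_tokens, keep_first=4, count_tokens=count_words):
    """Drop the oldest chat lines until the prompt fits into max_tokens.

    The first keep_first lines (the scripted opening) are always kept, and so
    is the last line (the open "Julian:" turn the model should complete). If
    even that is too long, the shortest such prompt is returned.
    """
    lines = prompt.split("\n")
    head, body = lines[:keep_first], lines[keep_first:]
    # body[:-1] is non-empty while there is still an old line to drop.
    while body[:-1] and count_tokens("\n".join(head + body)) > max_tokens:
        body.pop(0)  # remove the oldest line after the opening
    return "\n".join(head + body)


# Hypothetical chat history in the notebook's Tom/Julian format.
chat = "\n".join([
    "Tom: Hi Julian, how are you?",
    "Julian: Hi Tom, I am fine. Thanks.",
    "Tom: I just wanted to ask you a few things. Do you have some time?",
    "Julian: Yea sure. Just ask :D",
    "Tom: hi",
    "Julian: You have some work to do.",
    "Tom: What work?",
    "Julian:",
])
print(trim_history(chat, max_tokens=40))
```

Inside the chat loop, the trimming would run right before calling `generator`, ideally with the real tokenizer, e.g. `prompt = trim_history(prompt, max_tokens=1024 - 50, count_tokens=lambda s: len(tokenizer.encode(s)))`.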
