# **Deep Natural Language Processing @ PoliTO**



# Simple chatbot architecture using RASA

A chatbot is a computer program that simulates a conversation with a human user. Chatbots are used in a wide range of applications, from customer service to e-commerce. In this practice, we will explore the RASA framework and build a simple chatbot that can answer questions in specific domains.

The goal of the practice is to explore the usage of intents, stories and domain definitions to add specific properties to our chatbot.

The following cells install the RASA framework and set up the environment to start working on the practice. Please run them before starting the practice and restart the runtime when asked (there is a comment in the cell indicating when to restart).

In [None]:
!pip install ipython
!pip install nest_asyncio
!pip install -U rasa

Collecting jedi>=0.16 (from ipython)
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Downloading jedi-0.19.2-py2.py3-none-any.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 15.0 MB/s eta 0:00:00
Installing collected packages: jedi
Successfully installed jedi-0.19.2
Collecting rasa
  Downloading rasa-3.6.20-py3-none-any.whl.metadata (28 kB)
Collecting CacheControl<0.13.0,>=0.12.9 (from rasa)
  Downloading CacheControl-0.12.14-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting SQLAlchemy<1.5.0,>=1.4.0 (from rasa)
  Downloading SQLAlchemy-1.4.54-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting aio-pika<8.2.4,>=6.7.1 (from rasa)
  Downloading aio_pika-8.2.3-py3-none-any.whl.metadata (9.5 kB)
Collecting aiogram<2.26 (from rasa)
  Downloading aiogram-2.25.2-py3-none-any.whl.metadata (3.9 kB)
Collecting aiohttp<3.10,>=3.9.0 (from rasa)


In [None]:
!pip install -U ipython
# restart runtime: Runtime -> Restart runtime

In [None]:
# legacy setup, left for consistency
# !pip install --upgrade pip==20.2
# !pip install ipython
# !pip install nest_asyncio
# !pip install "tensorflow<=2.4"
# !pip install -U rasa
# !pip install awscli --ignore-installed six

# !pip install -U ipython
# restart runtime: Runtime -> Restart runtime

## **First steps with RASA chatbots**

Before diving into the practice, let's see how to use the RASA framework to build a simple chatbot. Use the simple chatbot example provided in the RASA documentation to build the simplest chatbot possible. The following cells will guide you through the process.

In [None]:
%load_ext autoreload
%autoreload 2

import os
import rasa
import nest_asyncio
from rasa.cli.scaffold import create_initial_project

os.chdir("/content/")

nest_asyncio.apply()
print("Event loop ready.")

project = "my-chatbot"
create_initial_project(project)
os.chdir(project)

config = "config.yml"
training_files = "data/"
domain = "domain.yml"
output = "models/"
print(config, training_files, domain, output)

model_path = rasa.train(domain, config, [training_files], output)
model_path = model_path.model

Event loop ready.
config.yml data/ domain.yml models/


  Base: DeclarativeMeta = declarative_base()


Rasa Open Source reports anonymous usage telemetry to help improve the product
for all its users.

If you'd like to opt-out, you can use `rasa telemetry disable`.
To learn more, check out https://rasa.com/docs/rasa/telemetry/telemetry.

The configuration for pipeline and policies was chosen automatically. It was written into the config file at 'config.yml'.


  return self.randrange(a, b+1)
Epochs: 100%|██████████| 100/100 [00:33<00:00,  3.01it/s, t_loss=1.1, i_acc=1]
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 380.55it/s, # trackers=1]
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 274.32it/s, # trackers=3]
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 187.46it/s, # trackers=12]
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 63.12it/s, # trackers=39]
Processed rules: 100%|██████████| 2/2 [00:00<00:00, 475.71it/s, # trackers=1]
Processed trackers: 100%|██████████| 3/3 [00:00<00:00, 165.41it/s, # action=12]
Processed actions: 12it [00:00, 598.27it/s, # examples=12]
Processed trackers: 100%|██████████| 2/2 [00:00<00:00, 168.22it/s, # action=5]
Processed actions: 5it [00:00, 916.19it/s, # examples=4]
Processed trackers: 100%|██████████| 3/3 [00:00<00:00, 196.23it/s, # action=12]
Processed trackers:   0%|          | 0/2 [00:00<?, ?it/s]

2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] previous action name: action_listen
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] user intent: goodbye | previous action name: action_listen
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] user intent: goodbye | previous action name: action_listen
[state 3] user intent: goodbye | previous action name: utter_goodbye
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] previous action name: action_listen
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] user intent: bot_challenge | previous action name: action_listen
2024-11-26 17:43:35 [debug    ] rule_policy.actions

Processed trackers: 100%|██████████| 2/2 [00:00<00:00, 207.58it/s]
Processed trackers:   0%|          | 0/5 [00:00<?, ?it/s]

2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: action_listen
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: greet | previous action name: action_listen
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: greet | previous action name: action_listen
[state 2] user intent: greet | previous action name: utter_greet
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: greet | previous action name: action_listen
[state 2] user intent: greet | previous action name: utter_greet
[state 3] previous action name: action_listen
2024-11-26 17:43:35 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: greet | previous action name: action_listen
[state 2] user intent: greet | previous action name: utter_greet
[state 3] user intent: mood_great | previous actio

Processed trackers: 100%|██████████| 5/5 [00:00<00:00, 97.65it/s]
Processed trackers: 100%|██████████| 120/120 [00:00<00:00, 1104.26it/s, # action=30]
  return np.array(ragged_array)
Epochs: 100%|██████████| 100/100 [00:17<00:00,  5.78it/s, t_loss=2.77, loss=2.61, acc=0.967]
Processed trackers: 100%|██████████| 120/120 [00:00<00:00, 1853.67it/s, # intent=12]
Epochs: 100%|██████████| 100/100 [00:15<00:00,  6.51it/s, t_loss=0.137, loss=0.0222, acc=1]
Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they used. (Deprecated NumPy 1.22)
  quantile_values = np.quantile(  # type: ignore[call-overload]


Your Rasa model is trained and saved at 'models/20241126-174259-humane-sink.tar.gz'.




In [None]:
from rasa.jupyter import chat

endpoints = None
chat(model_path, endpoints)

  return self.randrange(a, b+1)


Your bot is ready to talk! Type your messages here or send '/stop'.
Hello
2024-11-26 17:45:16 [debug    ] processor.actions.policy_prediction prediction_events=[]
2024-11-26 17:45:16 [debug    ] processor.actions.log          action_name=action_session_start rasa_events=[<rasa.shared.core.events.SessionStarted object at 0x7f6441ad4490>, ActionExecuted(action: action_listen, policy: None, confidence: None)]
2024-11-26 17:45:16 [debug    ] processor.slots.log            slot_values=	session_started_metadata: None
2024-11-26 17:45:17 [debug    ] processor.message.parse        parse_data_entities=[] parse_data_intent={'name': 'greet', 'confidence': 0.9999998807907104} parse_data_text=Hello
2024-11-26 17:45:17 [debug    ] processor.extract.slots        action_extract_slot=action_extract_slots len_extraction_events=0 rasa_events=[]
2024-11-26 17:45:17 [debug    ] memoization.predict.actions    tracker_states=[{}, {'user': {'intent': 'greet'}, 'prev_action': {'action_name': 'action_listen'}}]

## **Your own chatbot**

RASA chatbots exploit the following files to recognize intents and take the corresponding actions:

- `data/nlu.yml`: contains the intents the chatbot uses to recognize user requests. Each intent comes with example utterances that are used to train the chatbot. See more: NLU in [RASA doc](https://rasa.com/docs/rasa/training-data-format/#nlu-training-data).

- `data/stories.yml`: contains example interactions between the chatbot and the user. Stories define possible conversation paths, with the chatbot action and/or response that follows each user input. See more: Stories in [RASA doc](https://rasa.com/docs/rasa/stories).

- `domain.yml`: according to the official documentation, "The domain defines the universe in which your assistant operates. It specifies the intents, entities, slots, responses, forms, and actions your bot should know about." In other words, this file lists everything your chatbot needs to know to operate. See more: Domain in [RASA doc](https://rasa.com/docs/rasa/domain/).
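Because the three files reference each other by name, a typo in one of them is a common source of training errors. The sketch below illustrates the idea of a cross-check on plain Python data: it reports intents and actions that appear in stories but are not declared in the domain. The function `find_undeclared` and the hand-written dictionaries are hypothetical stand-ins for the parsed YAML files, not part of the RASA API, and the check ignores custom and built-in actions.

```python
def find_undeclared(domain_intents, domain_responses, stories):
    """Return intents and actions used in stories but missing from the domain."""
    missing_intents, missing_actions = set(), set()
    for story in stories:
        for step in story["steps"]:
            if "intent" in step and step["intent"] not in domain_intents:
                missing_intents.add(step["intent"])
            if "action" in step and step["action"] not in domain_responses:
                missing_actions.add(step["action"])
    return sorted(missing_intents), sorted(missing_actions)


# Hypothetical, hand-written stand-ins for the parsed YAML files.
domain_intents = {"greet", "goodbye"}
domain_responses = {"utter_greet", "utter_goodbye"}
stories = [
    {"story": "greet user",
     "steps": [{"intent": "greet"}, {"action": "utter_greet"}]},
    {"story": "get weather",
     "steps": [{"intent": "get_weather"}, {"action": "utter_weather"}]},
]

missing_intents, missing_actions = find_undeclared(
    domain_intents, domain_responses, stories
)
print(missing_intents)   # ['get_weather']
print(missing_actions)   # ['utter_weather']
```

Here the `get weather` story uses an intent and a response that the toy domain does not declare, which is exactly what has to be added to `domain.yml` when a new intent is introduced.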


Modify the base chatbot to recognize one or multiple new intents (e.g., user looking for the weather).

In [None]:

import os
import rasa
import nest_asyncio
from rasa.cli.scaffold import create_initial_project


project = "my-chatbot"
if not os.path.exists(project):
    create_initial_project(project)
os.chdir(project)


nest_asyncio.apply()


config = "config.yml"
training_files = "data/"
domain = "domain.yml"
output = "models/"


nlu_data = """
version: "3.1"

nlu:
- intent: greet
  examples: |
    - hey
    - hello
    - hi
    - good morning
    - good evening
    - hey there

- intent: goodbye
  examples: |
    - bye
    - goodbye
    - see you around
    - see you later

- intent: affirm
  examples: |
    - yes
    - indeed
    - of course
    - that sounds good
    - correct

- intent: deny
  examples: |
    - no
    - never
    - I don't think so
    - don't like that
    - no way
    - not really

- intent: mood_great
  examples: |
    - perfect
    - great
    - amazing
    - feeling like a king
    - wonderful
    - I am feeling very good
    - I am great
    - I'm good

- intent: mood_unhappy
  examples: |
    - sad
    - very sad
    - unhappy
    - bad
    - I don't feel very well
    - I am sad
    - I am depressed
    - I'm not feeling very well

- intent: bot_challenge
  examples: |
    - are you a bot?
    - are you a human?
    - am I talking to a bot?
    - am I talking to a human?

- intent: get_weather
  examples: |
    - What's the weather like today?
    - Tell me the weather forecast.
    - How's the weather?
    - Is it going to rain today?
    - Do I need an umbrella today?
    - What's the temperature outside?
"""

with open('data/nlu.yml', 'w') as f:
    f.write(nlu_data)


domain_data = """
version: "3.1"

intents:
  - greet
  - goodbye
  - affirm
  - deny
  - mood_great
  - mood_unhappy
  - bot_challenge
  - get_weather

responses:
  utter_greet:
    - text: "Hey! How are you?"

  utter_cheer_up:
    - text: "Here is something to cheer you up!"
      image: "https://i.imgur.com/nGF1K8f.jpg"

  utter_did_that_help:
    - text: "Did that help you?"

  utter_goodbye:
    - text: "Bye! Take care."

  utter_happy:
    - text: "Great, carry on!"

  utter_bot_challenge:
    - text: "I am a bot, powered by Rasa."

  utter_weather:
    - text: "The weather today is sunny with a high of 25 degrees Celsius."

  utter_iamabot:
    - text: "I am a bot, powered by Rasa."

session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true
"""

with open('domain.yml', 'w') as f:
    f.write(domain_data)


stories_data = """
version: "3.1"

stories:
- story: greet user
  steps:
  - intent: greet
  - action: utter_greet

- story: say goodbye
  steps:
  - intent: goodbye
  - action: utter_goodbye

- story: get weather
  steps:
  - intent: get_weather
  - action: utter_weather

- story: bot challenge
  steps:
  - intent: bot_challenge
  - action: utter_iamabot
"""

with open('data/stories.yml', 'w') as f:
    f.write(stories_data)


rules_data = """
version: "3.1"

rules:
- rule: Fallback rule
  steps:
  - intent: nlu_fallback
  - action: utter_iamabot
"""

with open('data/rules.yml', 'w') as f:
    f.write(rules_data)


model_path = rasa.train(domain, config, [training_files], output)
model_path = model_path.model


from rasa.jupyter import chat

endpoints = None
chat(model_path, endpoints)


The configuration for pipeline and policies was chosen automatically. It was written into the config file at 'config.yml'.


Epochs: 100%|██████████| 100/100 [00:25<00:00,  3.98it/s, t_loss=1.12, i_acc=1]
Processed story blocks: 100%|██████████| 4/4 [00:00<00:00, 425.05it/s, # trackers=1]
Processed story blocks: 100%|██████████| 4/4 [00:00<00:00, 430.48it/s, # trackers=4]
Processed story blocks: 100%|██████████| 4/4 [00:00<00:00, 95.13it/s, # trackers=20]
Processed story blocks: 100%|██████████| 4/4 [00:00<00:00, 67.59it/s, # trackers=50]
Processed rules: 100%|██████████| 1/1 [00:00<00:00, 523.96it/s, # trackers=1]
Processed trackers: 100%|██████████| 4/4 [00:00<00:00, 215.72it/s, # action=9]
Processed actions: 9it [00:00, 939.02it/s, # examples=9]
Processed trackers: 100%|██████████| 1/1 [00:00<00:00, 150.59it/s, # action=3]
Processed actions: 3it [00:00, 1044.23it/s, # examples=2]
Processed trackers: 100%|██████████| 4/4 [00:00<00:00, 422.32it/s, # action=9]
Processed trackers:   0%|          | 0/1 [00:00<?, ?it/s]

2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] previous action name: action_listen
2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] user intent: nlu_fallback | previous action name: action_listen
2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: ...
[state 2] user intent: nlu_fallback | previous action name: action_listen
[state 3] user intent: nlu_fallback | previous action name: utter_iamabot


Processed trackers: 100%|██████████| 1/1 [00:00<00:00, 254.11it/s]
Processed trackers:   0%|          | 0/5 [00:00<?, ?it/s]

2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: action_listen
2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: goodbye | previous action name: action_listen
2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: goodbye | previous action name: action_listen
[state 2] user intent: goodbye | previous action name: utter_goodbye
2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] previous action name: action_listen
2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: bot_challenge | previous action name: action_listen
2024-11-26 17:49:42 [debug    ] rule_policy.actions.find       current_states=
[state 1] user intent: bot_challenge | previous action name: action_listen
[state 2] user intent: bot_challenge | previous action name: utter_iamabot
20

Processed trackers: 100%|██████████| 5/5 [00:00<00:00, 288.26it/s]
Processed trackers: 100%|██████████| 244/244 [00:00<00:00, 539.77it/s, # action=169]
  return np.array(ragged_array)
Epochs: 100%|██████████| 100/100 [00:24<00:00,  4.09it/s, t_loss=0.237, loss=0.073, acc=1]
Processed trackers: 100%|██████████| 244/244 [00:00<00:00, 648.02it/s, # intent=61]
Epochs: 100%|██████████| 100/100 [00:17<00:00,  5.64it/s, t_loss=0.111, loss=0.00578, acc=1]


Your Rasa model is trained and saved at 'models/20241126-174910-partial-luggage.tar.gz'.


  return self.randrange(a, b+1)


Your bot is ready to talk! Type your messages here or send '/stop'.
how is weather?
2024-11-26 17:51:18 [debug    ] processor.actions.policy_prediction prediction_events=[]
2024-11-26 17:51:18 [debug    ] processor.actions.log          action_name=action_session_start rasa_events=[<rasa.shared.core.events.SessionStarted object at 0x7f6443537640>, ActionExecuted(action: action_listen, policy: None, confidence: None)]
2024-11-26 17:51:18 [debug    ] processor.slots.log            slot_values=	session_started_metadata: None
2024-11-26 17:51:18 [debug    ] processor.message.parse        parse_data_entities=[] parse_data_intent={'name': 'get_weather', 'confidence': 0.9999924898147583} parse_data_text=how is weather?
2024-11-26 17:51:18 [debug    ] processor.extract.slots        action_extract_slot=action_extract_slots len_extraction_events=0 rasa_events=[]
2024-11-26 17:51:18 [debug    ] memoization.predict.actions    tracker_states=[{}, {'user': {'intent': 'get_weather'}, 'prev_action': {'

# Create a chatbot using Transformers

RASA is a powerful framework for building chatbots with a wide range of functionalities. It can rely on external models for the generation of new responses. In this section, we will use the Transformers library to create a chatbot that can generate responses to user requests.

![](https://huggingface.co/front/thumbnails/dialogpt.png)

The [HuggingFace pipeline module](https://huggingface.co/docs/transformers/master/en/main_classes/pipelines) offers an easy interface to pre-trained models. In particular, the `pipeline` function can be used to create a chatbot that generates responses to user requests (e.g., with DialoGPT). The conversational pipeline allows the implementation of a simple chatbot that carries the conversation across turns. It exploits the DialoGPT models available on the model hub.

## **DialoGPT single answer**

The code below builds a simple conversational loop: it takes a user request and returns the response generated by DialoGPT.
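Before looking at the model code, it may help to see how a multi-turn dialogue is presented to a causal language model: all turns are concatenated into one sequence, each closed by the end-of-text marker, and the reply is whatever the model appends after that prompt. The sketch below shows this on plain strings and made-up token ids, without loading any model. The helper names `flatten_dialogue` and `extract_reply` are hypothetical; in the notebook the marker comes from `tokenizer.eos_token`.

```python
EOS = "<|endoftext|>"  # GPT-2 style end-of-text marker (tokenizer.eos_token in the notebook)


def flatten_dialogue(turns, eos=EOS):
    """Join alternating user/bot turns into one string, closing each turn with eos."""
    return "".join(turn + eos for turn in turns)


def extract_reply(full_ids, prompt_len):
    """Keep only the ids generated after the prompt."""
    return full_ids[prompt_len:]


history = ["how is weather?", "Not too hot and not too cold"]
prompt = flatten_dialogue(history + ["do you have any feeling?"])
print(prompt)

# Toy token ids (made up): the first four stand for the prompt, the rest for the reply.
full_ids = [11, 12, 13, 14, 21, 22, 23]
print(extract_reply(full_ids, prompt_len=4))  # [21, 22, 23]
```

The slicing in `extract_reply` mirrors the `chat_history_ids[:, bot_input_ids.shape[-1]:]` expression used when decoding the response.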

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")


def generate_response(user_input, chat_history_ids=None):
    """
    Generates a response to the user input using DialoGPT.

    Args:
        user_input (str): The input text from the user.
        chat_history_ids (torch.Tensor, optional): The history of the conversation.

    Returns:
        response (str): The generated response from the model.
        chat_history_ids (torch.Tensor): Updated chat history including the latest user input and model response.
    """

    # Encode the user input and append the end-of-sequence token.
    new_user_input_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')

    # Append the new user input tokens to the chat history, if there is one.
    if chat_history_ids is not None:
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids

    # Generate a reply; max_length bounds the whole sequence, history included.
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.75
    )

    # Decode only the newly generated tokens, skipping the prompt.
    response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

    return response, chat_history_ids

def chat():
    print("DialoGPT Chatbot\nType 'exit' to end the conversation.\n")
    chat_history_ids = None

    while True:

        user_input = input("You: ")
        if user_input.lower() == 'exit':
            print("Chatbot: Goodbye!")
            break


        response, chat_history_ids = generate_response(user_input, chat_history_ids)


        print(f"Chatbot: {response}\n")

if __name__ == "__main__":
    chat()


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/642 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.75G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.75G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

DialoGPT Chatbot
Type 'exit' to end the conversation.

You: how is weather?


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Chatbot: Not too hot and not too cold

You: do you have any feeling?
Chatbot: I feel like a very warm blanket.

You: exit
Chatbot: Goodbye!


## **Question 4: Conversations**

Extend the previous function to generate a conversation with DialoGPT.
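One practical point when carrying a conversation over many turns: `max_length` in `generate` counts the prompt plus the newly generated tokens, so a growing history leaves less and less room for the reply. A possible remedy, sketched below on plain lists of token ids, is to drop the oldest whole turns once a budget is exceeded. The function `truncate_history` and the toy ids are assumptions made for illustration; they are not part of Transformers.

```python
def truncate_history(token_ids, eos_id, max_tokens):
    """Drop the oldest whole turns until at most max_tokens ids remain.

    A turn is a run of ids ending with eos_id. If the most recent turn
    alone exceeds the budget, it is kept anyway.
    """
    # Split the flat id list into turns, each ending with eos_id.
    turns, current = [], []
    for tok in token_ids:
        current.append(tok)
        if tok == eos_id:
            turns.append(current)
            current = []
    if current:  # trailing ids without a closing eos
        turns.append(current)

    # Remove turns from the front while over budget, keeping at least one.
    total = sum(len(turn) for turn in turns)
    while len(turns) > 1 and total > max_tokens:
        total -= len(turns.pop(0))
    return [tok for turn in turns for tok in turn]


EOS_ID = 0  # made-up id for this toy example
history = [5, 6, 0, 7, 8, 9, 0, 3, 4, 0]
print(truncate_history(history, EOS_ID, max_tokens=7))  # [7, 8, 9, 0, 3, 4, 0]
```

With real tensors, one option would be to convert with `chat_history_ids[0].tolist()`, truncate, and rebuild the tensor with `torch.tensor([ids])` before the next call.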

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList
import torch

class StopSequenceCriteria(StoppingCriteria):
    def __init__(self, stop_sequence_ids):
        self.stop_sequence_ids = stop_sequence_ids

    def __call__(self, input_ids, scores, **kwargs):

        if input_ids.shape[-1] < len(self.stop_sequence_ids):
            return False


        if torch.all(
            input_ids[0, -len(self.stop_sequence_ids):] == torch.tensor(self.stop_sequence_ids)
        ):
            return True
        return False


tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

# Define the stop sequence
STOP_SEQUENCE = "Goodbye!"
stop_sequence_ids = tokenizer.encode(STOP_SEQUENCE, add_special_tokens=False)

def generate_response(user_input, chat_history_ids=None):
    """
    Generates a response to the user input using DialoGPT.

    Args:
        user_input (str): The input text from the user.
        chat_history_ids (torch.Tensor, optional): The history of the conversation.

    Returns:
        response (str): The generated response from the model.
        chat_history_ids (torch.Tensor): Updated chat history including the latest user input and model response.
    """
    # encode the user input and add end-of-string token
    new_user_input_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history (if exists)
    if chat_history_ids is not None:
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids


    stopping_criteria = StoppingCriteriaList([StopSequenceCriteria(stop_sequence_ids)])


    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.75,
        stopping_criteria=stopping_criteria
    )


    response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

    return response, chat_history_ids

def chat():
    print("DialoGPT Chatbot\nType 'exit', 'bye', or 'goodbye' to end the conversation.\n")
    chat_history_ids = None


    exit_commands = ['exit', 'bye', 'goodbye']

    while True:

        user_input = input("You: ").strip()


        if user_input.lower() in exit_commands:
            print("Chatbot: Goodbye!")
            break


        response, chat_history_ids = generate_response(user_input, chat_history_ids)


        print(f"Chatbot: {response}\n")


        if STOP_SEQUENCE.lower() in response.lower():
            print("Chatbot: Conversation has ended as per the stop sequence.")
            break

if __name__ == "__main__":
    chat()


DialoGPT Chatbot
Type 'exit', 'bye', or 'goodbye' to end the conversation.

You: hello. how are you?
Chatbot: I'm good

You: bye
Chatbot: Goodbye!


## **Improving interaction**

The conversations in the previous questions show very limited variability. Create a new function to converse manually with the DialoGPT model while setting different generation parameters (e.g., [beam search](https://en.wikipedia.org/wiki/Beam_search) is disabled by default). To do so, instantiate a new [DialoGPT model](https://huggingface.co/microsoft/DialoGPT-medium), which can be loaded through `AutoModelForCausalLM`.

Here is a blog post that shows how to use different decoding strategies for text generation: [https://huggingface.co/blog/how-to-generate](https://huggingface.co/blog/how-to-generate).

**Note 1**: Basic examples on how to use the model are provided [here](https://huggingface.co/microsoft/DialoGPT-medium#how-to-use).

**Note 2**: Take some time to explore the inputs of the [generate](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate) function. Some relevant parameters are listed below.
```
num_beams (int, optional, defaults to 1): Number of beams for beam search. 1 means no beam search.
temperature (float, optional, defaults to 1.0): The value used to modulate the next token probabilities.
```
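To build some intuition for these parameters, the sketch below applies temperature scaling and a simplified top-k / top-p filter to a made-up four-token distribution, using only the standard library. It is an illustration of the idea, not the library's implementation: the helper names `softmax` and `top_k_top_p` are hypothetical, and the exact order of operations and renormalization inside `generate` may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    kept, mass = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        mass += p
        if mass >= top_p:
            break
    norm = sum(p for _, p in kept)  # renormalize the surviving tokens
    return {index: p / norm for index, p in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a four-token vocabulary
sharp = softmax(logits, temperature=0.7)   # lower temperature: more peaked
flat = softmax(logits, temperature=1.5)    # higher temperature: more uniform
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])

filtered = top_k_top_p(softmax(logits), top_k=3, top_p=0.8)
print(filtered)
```

A lower temperature concentrates probability on the most likely token, which tends to make replies more repetitive, while a higher temperature and looser top-k / top-p settings leave more candidates to sample from.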

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList
import torch

class StopSequenceCriteria(StoppingCriteria):
    def __init__(self, stop_sequence_ids):
        """
        Initializes the stopping criteria with the token IDs of the stop sequence.

        Args:
            stop_sequence_ids (list): List of token IDs representing the stop sequence.
        """
        self.stop_sequence_ids = stop_sequence_ids

    def __call__(self, input_ids, scores, **kwargs):
        """
        Determines whether to stop generation based on the stop sequence.

        Args:
            input_ids (torch.Tensor): The sequence of generated token IDs.
            scores (torch.Tensor): The scores for the next token predictions.

        Returns:
            bool: True if the stop sequence is detected; False otherwise.
        """

        if input_ids.shape[-1] < len(self.stop_sequence_ids):
            return False


        last_tokens = input_ids[0, -len(self.stop_sequence_ids):].tolist()

        return last_tokens == self.stop_sequence_ids


tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

def generate_response(
    user_input,
    chat_history_ids=None,
    do_sample=True,
    num_beams=1,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    no_repeat_ngram_size=3,
    max_length=1000,
    stop_sequence=None
):
    """
    Generates a response to the user input using DialoGPT with customizable parameters.

    Args:
        user_input (str): The input text from the user.
        chat_history_ids (torch.Tensor, optional): The history of the conversation.
        do_sample (bool): Whether to use sampling; otherwise, greedy decoding.
        num_beams (int): Number of beams for beam search.
        top_k (int): The number of highest probability vocabulary tokens to keep for top-k-filtering.
        top_p (float): The cumulative probability for nucleus sampling.
        temperature (float): Temperature value for scaling logits.
        no_repeat_ngram_size (int): Prevent repeating n-grams of this size.
        max_length (int): The maximum length of the generated response.
        stop_sequence (str, optional): A phrase that, if generated, will stop further generation.

    Returns:
        response (str): The generated response from the model.
        chat_history_ids (torch.Tensor): Updated chat history including the latest user input and model response.
    """

    new_user_input_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')


    if chat_history_ids is not None:
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids


    stopping_criteria = None
    if stop_sequence:
        stop_sequence_ids = tokenizer.encode(stop_sequence, add_special_tokens=False)
        stopping_criteria = StoppingCriteriaList([StopSequenceCriteria(stop_sequence_ids)])


    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=max_length,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=no_repeat_ngram_size,
        do_sample=do_sample,
        top_k=top_k,
        top_p=top_p,
        temperature=temperature,
        num_beams=num_beams,
        stopping_criteria=stopping_criteria
    )


    response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

    return response, chat_history_ids

def interactive_chatbot():
    """
    Runs an interactive chat session with the DialoGPT chatbot, allowing parameter customization.
    """
    print("DialoGPT Enhanced Chatbot\nType 'exit', 'bye', or 'goodbye' to end the conversation.\n")
    chat_history_ids = None


    exit_commands = ['exit', 'bye', 'goodbye']

    while True:

        user_input = input("You: ").strip()


        if user_input.lower() in exit_commands:
            print("Chatbot: Goodbye!")
            break


        print("\n[Optional] Set generation parameters or press Enter to use defaults.")
        try:
            do_sample_input = input("Enable sampling? (True/False) [Default: True]: ").strip()
            if do_sample_input.lower() == 'true':
                do_sample = True
            elif do_sample_input.lower() == 'false':
                do_sample = False
            else:
                do_sample = True  # Default

            num_beams_input = input("Number of beams for beam search [Default: 1]: ").strip()
            num_beams = int(num_beams_input) if num_beams_input.isdigit() else 1

            top_k_input = input("Top-K tokens [Default: 50]: ").strip()
            top_k = int(top_k_input) if top_k_input.isdigit() else 50

            top_p_input = input("Top-P (nucleus sampling) [Default: 0.95]: ").strip()
            top_p = float(top_p_input) if top_p_input.replace('.', '', 1).isdigit() else 0.95

            temperature_input = input("Temperature [Default: 0.7]: ").strip()
            temperature = float(temperature_input) if temperature_input.replace('.', '', 1).isdigit() else 0.7

            no_repeat_ngram_size_input = input("No repeat n-gram size [Default: 3]: ").strip()
            no_repeat_ngram_size = int(no_repeat_ngram_size_input) if no_repeat_ngram_size_input.isdigit() else 3

            stop_sequence = input("Enter a stop sequence (or press Enter to skip): ").strip()
            stop_sequence = stop_sequence if stop_sequence else None

        except ValueError:
            print("Invalid input detected. Using default parameters.\n")
            do_sample = True
            num_beams = 1
            top_k = 50
            top_p = 0.95
            temperature = 0.7
            no_repeat_ngram_size = 3
            stop_sequence = None


        response, chat_history_ids = generate_response(
            user_input,
            chat_history_ids=chat_history_ids,
            do_sample=do_sample,
            num_beams=num_beams,
            top_k=top_k,
            top_p=top_p,
            temperature=temperature,
            no_repeat_ngram_size=no_repeat_ngram_size,
            stop_sequence=stop_sequence
        )


        print(f"Chatbot: {response}\n")


        if stop_sequence and stop_sequence.lower() in response.lower():
            print("Chatbot: Conversation has ended as per the stop sequence.")
            break

if __name__ == "__main__":
    interactive_chatbot()


DialoGPT Enhanced Chatbot
Type 'exit', 'bye', or 'goodbye' to end the conversation.

You: hello

[Optional] Set generation parameters or press Enter to use defaults.
Enable sampling? (True/False) [Default: True]: True
Number of beams for beam search [Default: 1]: 2
Top-K tokens [Default: 50]: 40
Top-P (nucleus sampling) [Default: 0.95]: 
Temperature [Default: 0.7]: 
No repeat n-gram size [Default: 3]: 
Enter a stop sequence (or press Enter to skip): 
Chatbot: Hiya! How are you?

You: good

[Optional] Set generation parameters or press Enter to use defaults.
Enable sampling? (True/False) [Default: True]: 
Number of beams for beam search [Default: 1]: 
Top-K tokens [Default: 50]: 
Top-P (nucleus sampling) [Default: 0.95]: 
Temperature [Default: 0.7]: 
No repeat n-gram size [Default: 3]: 
Enter a stop sequence (or press Enter to skip): 
Chatbot: Good to hear, what is your favorite color?

You: bye
Chatbot: Goodbye!
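
The prompts above accept free text for every generation parameter, so it can help to validate ranges as well as types. The following is a minimal sketch of such helpers; the names `parse_int`, `parse_float` and `parse_bool` are hypothetical and are not part of the notebook's code.

```python
def parse_int(raw, default, minimum=1):
    """Return int(raw) if it is a valid integer >= minimum, otherwise default."""
    try:
        value = int(raw.strip())
    except ValueError:
        return default
    return value if value >= minimum else default


def parse_float(raw, default, low=0.0, high=1.0):
    """Return float(raw) if low < value <= high, otherwise default."""
    try:
        value = float(raw.strip())
    except ValueError:
        return default
    # NaN fails both comparisons, so it also falls back to the default.
    return value if low < value <= high else default


def parse_bool(raw, default=True):
    """Map 'true'/'false' (case-insensitive) to a bool; anything else gives default."""
    text = raw.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return default


# Example: pressing Enter keeps the defaults, out-of-range values are rejected.
print(parse_int("", 1), parse_float("1.5", 0.95), parse_bool("False"))
```

For an unbounded parameter such as the temperature, `high=float("inf")` could be passed so that any positive value is accepted.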


## **Artificial conversations**

Let the two previous chatbots interact with each other using preferred parameter configuration(s).

In [None]:
import os
import shutil


if os.path.exists("rasa_project"):
    shutil.rmtree("rasa_project")

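# Initialize RASA project. Rasa 3.x rejects `--project` (see the usage error in the
# recorded output); the target directory is passed with `--init-dir` instead.
# The directory is created first so that --no-prompt mode finds the init path.
os.makedirs("rasa_project", exist_ok=True)
!rasa init --no-prompt --init-dir rasa_project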

  Base: DeclarativeMeta = declarative_base()
usage: rasa [-h] [--version]
            {init,run,shell,train,interactive,telemetry,test,visualize,data,export,x,evaluate} ...
rasa: error: unrecognized arguments: --project rasa_project


In [None]:
import os


print("Current Directory Contents:")
print(os.listdir('.'))


if 'rasa_project' in os.listdir('.'):
    print("\n'rasa_project' directory exists.")

    print("Contents of 'rasa_project':")
    print(os.listdir('rasa_project'))


    if 'data' in os.listdir('rasa_project'):
        print("\n'data' directory exists within 'rasa_project'.")
    else:
        print("\n'data' directory is missing within 'rasa_project'. Creating it now...")
        os.makedirs('rasa_project/data', exist_ok=True)
        print("'rasa_project/data' directory created.")
else:
    print("\n'rasa_project' directory is missing. Initializing RASA project now...")

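    # `rasa init` takes the target directory via --init-dir; create it beforehand
    os.makedirs('rasa_project', exist_ok=True)
    !rasa init --no-prompt --init-dir rasa_project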


    if 'rasa_project' in os.listdir('.'):
        print("'rasa_project' directory successfully created.")
        if 'data' in os.listdir('rasa_project'):
            print("'data' directory exists within 'rasa_project'.")
        else:
            print("'data' directory is missing within 'rasa_project'. Creating it now...")
            os.makedirs('rasa_project/data', exist_ok=True)
            print("'rasa_project/data' directory created.")
    else:
        print("Failed to create 'rasa_project' directory. Please check for errors in the RASA initialization step.")


Current Directory Contents:
['config.yml', 'domain.yml', 'data']

'rasa_project' directory is missing. Initializing RASA project now...
  Base: DeclarativeMeta = declarative_base()
usage: rasa [-h] [--version]
            {init,run,shell,train,interactive,telemetry,test,visualize,data,export,x,evaluate} ...
rasa: error: unrecognized arguments: --project rasa_project
Failed to create 'rasa_project' directory. Please check for errors in the RASA initialization step.


In [None]:

if os.path.exists("rasa_project"):
    shutil.rmtree("rasa_project")
    print("'rasa_project' directory removed.")


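# `rasa init` takes the target directory via --init-dir; create it beforehand
os.makedirs("rasa_project", exist_ok=True)
!rasa init --no-prompt --init-dir rasa_project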

if 'rasa_project' in os.listdir('.'):
    print("'rasa_project' directory successfully created.")
    if 'data' in os.listdir('rasa_project'):
        print("'data' directory exists within 'rasa_project'.")
    else:
        print("'data' directory is missing within 'rasa_project'. Creating it now...")
        os.makedirs('rasa_project/data', exist_ok=True)
        print("'rasa_project/data' directory created.")
else:
    print("Failed to create 'rasa_project' directory. Please check for errors in the RASA initialization step.")


  Base: DeclarativeMeta = declarative_base()
usage: rasa [-h] [--version]
            {init,run,shell,train,interactive,telemetry,test,visualize,data,export,x,evaluate} ...
rasa: error: unrecognized arguments: --project rasa_project
Failed to create 'rasa_project' directory. Please check for errors in the RASA initialization step.


In [None]:
nlu_content = """
version: "3.1"

nlu:
- intent: greet
  examples: |
    - hey
    - hello
    - hi
    - good morning
    - good evening
    - hey there

- intent: goodbye
  examples: |
    - bye
    - goodbye
    - see you around
    - see you later

- intent: affirm
  examples: |
    - yes
    - indeed
    - of course
    - that sounds good
    - correct

- intent: deny
  examples: |
    - no
    - never
    - I don't think so
    - don't like that
    - no way
    - not really

- intent: mood_great
  examples: |
    - perfect
    - great
    - amazing
    - feeling like a king
    - wonderful
    - I am feeling very good
    - I am great
    - I'm good

- intent: mood_unhappy
  examples: |
    - sad
    - very sad
    - unhappy
    - bad
    - I don't feel very well
    - I am sad
    - I am depressed
    - I'm not feeling very well

- intent: bot_challenge
  examples: |
    - are you a bot?
    - are you a human?
    - am I talking to a bot?
    - am I talking to a human?

- intent: get_weather
  examples: |
    - What's the weather like today?
    - Tell me the weather forecast.
    - How's the weather?
    - Is it going to rain today?
    - Do I need an umbrella today?
    - What's the temperature outside?
"""


os.makedirs("rasa_project/data", exist_ok=True)


with open("rasa_project/data/nlu.yml", "w") as file:
    file.write(nlu_content)
    print("'nlu.yml' file created successfully.")


'nlu.yml' file created successfully.


In [None]:
domain_content = """
version: "3.1"

intents:
  - greet
  - goodbye
  - affirm
  - deny
  - mood_great
  - mood_unhappy
  - bot_challenge
  - get_weather

responses:
  utter_greet:
    - text: "Hey! How are you?"

  utter_cheer_up:
    - text: "Here is something to cheer you up!"
      image: "https://i.imgur.com/nGF1K8f.jpg"

  utter_did_that_help:
    - text: "Did that help you?"

  utter_goodbye:
    - text: "Bye! Take care."

  utter_happy:
    - text: "Great, carry on!"

  utter_bot_challenge:
    - text: "I am a bot, powered by Rasa."

  utter_weather:
    - text: "The weather today is sunny with a high of 25 degrees Celsius."

  utter_iamabot:
    - text: "I am a bot, powered by Rasa."

session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true
"""

# Write to domain.yml
with open("rasa_project/domain.yml", "w") as file:
    file.write(domain_content)
    print("'domain.yml' file created successfully.")

'domain.yml' file created successfully.


In [None]:
stories_content = """
version: "3.1"

stories:
- story: greet user
  steps:
  - intent: greet
  - action: utter_greet

- story: say goodbye
  steps:
  - intent: goodbye
  - action: utter_goodbye

- story: get weather
  steps:
  - intent: get_weather
  - action: utter_weather

- story: bot challenge
  steps:
  - intent: bot_challenge
  - action: utter_iamabot
"""

# Write to stories.yml
with open("rasa_project/data/stories.yml", "w") as file:
    file.write(stories_content)
    print("'stories.yml' file created successfully.")

'stories.yml' file created successfully.
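
Since the domain and the stories are written as separate strings, a story can easily reference an intent or response that the domain never declares. The sketch below shows a rough consistency check using regular expressions on two miniature, hypothetical YAML snippets (not the full files above). It is only an illustration; Rasa's own `rasa data validate` command is the more thorough option.

```python
import re

# Hypothetical miniature files, for illustration only.
domain_text = """
intents:
  - greet
  - get_weather

responses:
  utter_greet:
    - text: "Hey! How are you?"
  utter_weather:
    - text: "Sunny."
"""

stories_text = """
stories:
- story: greet user
  steps:
  - intent: greet
  - action: utter_greet
- story: bot challenge
  steps:
  - intent: bot_challenge
  - action: utter_iamabot
"""


def find_undeclared(domain, stories):
    """Return (intents, actions) that stories use but the domain does not declare."""
    # Intents: the list items directly under the top-level `intents:` key.
    block = re.search(r"^intents:\n((?:[ \t]+- .+\n?)+)", domain, re.M)
    declared_intents = set(re.findall(r"- (\w+)", block.group(1))) if block else set()
    # Responses: indented keys of the form `utter_<name>:`.
    declared_actions = set(re.findall(r"^\s+(utter_\w+):", domain, re.M))
    used_intents = set(re.findall(r"^\s*- intent: (\w+)", stories, re.M))
    used_actions = set(re.findall(r"^\s*- action: (\w+)", stories, re.M))
    return sorted(used_intents - declared_intents), sorted(used_actions - declared_actions)


missing_intents, missing_actions = find_undeclared(domain_text, stories_text)
print("Undeclared intents:", missing_intents)
print("Undeclared actions:", missing_actions)
```

In this toy example the second story refers to an intent and a response that the miniature domain omits, so both are reported.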


In [None]:
rules_content = """
version: "3.1"

rules:
- rule: Fallback rule
  steps:
  - intent: nlu_fallback
  - action: utter_iamabot
"""

# Write to rules.yml
with open("rasa_project/data/rules.yml", "w") as file:
    file.write(rules_content)
    print("'rules.yml' file created successfully.")

'rules.yml' file created successfully.


In [None]:
config_content = """
language: en
version: "3.1"

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy
"""

# Write to config.yml
with open("rasa_project/config.yml", "w") as file:
    file.write(config_content)
    print("'config.yml' file created successfully.")

'config.yml' file created successfully.


In [None]:
# navigate to the RASA project directory
os.chdir("rasa_project")

# train the RASA model
!rasa train --quiet

# verify that the model is trained and saved
if os.path.exists("models"):
    print("RASA model trained and saved in the 'models' directory.")
else:
    print("RASA model training failed. Please check the previous steps for errors.")


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pre-trained DialoGPT-large model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

In [None]:
import asyncio
from rasa.core.agent import Agent
from rasa.shared.core.trackers import DialogueStateTracker

# load the trained RASA model
agent = Agent.load("models")

async def get_rasa_response(user_input):
    """
    Sends a user message to the RASA Agent and returns the response.

    Args:
        user_input (str): The input text from the user.

    Returns:
        response (str): The response generated by RASA.
    """
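    responses = await agent.handle_text(user_input, sender_id="bot_interaction")

    # A turn may yield several messages, and some may carry no text (e.g. image-only),
    # so collect every available text instead of indexing responses[0]['text'].
    texts = [r["text"] for r in responses if r.get("text")]
    if texts:
        return " ".join(texts)
    return "I'm not sure how to respond to that."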



In [None]:
from transformers import StoppingCriteria, StoppingCriteriaList
import torch

class StopSequenceCriteria(StoppingCriteria):
    def __init__(self, stop_sequence_ids):
        """
        Initializes the stopping criteria with the token IDs of the stop sequence.

        Args:
            stop_sequence_ids (list): List of token IDs representing the stop sequence.
        """
        self.stop_sequence_ids = stop_sequence_ids

    def __call__(self, input_ids, scores, **kwargs):
        """
        Determines whether to stop generation based on the stop sequence.

        Args:
            input_ids (torch.Tensor): The sequence of generated token IDs.
            scores (torch.Tensor): The scores for the next token predictions.

        Returns:
            bool: True if the stop sequence is detected; False otherwise.
        """
        # check if the last tokens match the stop sequence
        if input_ids.shape[-1] < len(self.stop_sequence_ids):
            return False

        # extract the last tokens equal to the length of the stop sequence
        last_tokens = input_ids[0, -len(self.stop_sequence_ids):].tolist()

        return last_tokens == self.stop_sequence_ids

def generate_dialo_response(
    user_input,
    chat_history_ids=None,
    do_sample=True,
    num_beams=1,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    no_repeat_ngram_size=3,
    max_length=1000,
    stop_sequence=None
):
    """
    Generates a response to the user input using DialoGPT with customizable parameters.

    Args:
        user_input (str): The input text from the user.
        chat_history_ids (torch.Tensor, optional): The history of the conversation.
        do_sample (bool): Whether to use sampling; otherwise, greedy decoding.
        num_beams (int): Number of beams for beam search.
        top_k (int): The number of highest probability vocabulary tokens to keep for top-k-filtering.
        top_p (float): The cumulative probability for nucleus sampling.
        temperature (float): Temperature value for scaling logits.
        no_repeat_ngram_size (int): Prevent repeating n-grams of this size.
        max_length (int): The maximum length of the generated response.
        stop_sequence (str, optional): A phrase that, if generated, will stop further generation.

    Returns:
        response (str): The generated response from the model.
        chat_history_ids (torch.Tensor): Updated chat history including the latest user input and model response.
    """

    new_user_input_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')


    if chat_history_ids is not None:
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids


    stopping_criteria = None
    if stop_sequence:
        stop_sequence_ids = tokenizer.encode(stop_sequence, add_special_tokens=False)
        stopping_criteria = StoppingCriteriaList([StopSequenceCriteria(stop_sequence_ids)])


    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=max_length,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=no_repeat_ngram_size,
        do_sample=do_sample,
        top_k=top_k,
        top_p=top_p,
        temperature=temperature,
        num_beams=num_beams,
        stopping_criteria=stopping_criteria
    )


    response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

    return response, chat_history_ids
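
The stopping criterion above compares the tail of the generated token IDs with the token IDs of the stop sequence. The same logic can be sketched on plain Python lists, without a model. The token IDs below are made up for illustration, and `generate_until_stop` is a hypothetical stand-in for the generation loop.

```python
def ends_with_stop(token_ids, stop_ids):
    """True if the last len(stop_ids) tokens are exactly stop_ids."""
    if not stop_ids or len(token_ids) < len(stop_ids):
        return False
    return token_ids[-len(stop_ids):] == stop_ids


def generate_until_stop(token_stream, stop_ids):
    """Consume tokens one at a time and stop as soon as the stop sequence appears."""
    generated = []
    for token in token_stream:
        generated.append(token)
        if ends_with_stop(generated, stop_ids):
            break
    return generated


# Made-up token IDs: the stop sequence [9, 11] ends generation early.
print(generate_until_stop([5, 7, 9, 11, 13], stop_ids=[9, 11]))
```

Because the check is a suffix match on token IDs, it only fires when the model emits exactly the tokenization of the stop phrase. A different tokenization of the same text (for example with a leading space) would not match, which is one reason the notebook also checks the decoded text.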

In [None]:
async def chat_between_rasa_dialo():
    """
    Facilitates a conversation between RASA and DialoGPT chatbots.
    """

    user_message = "Hello!"


    continue_chat = True


    max_turns = 10
    current_turn = 0


    STOP_SEQUENCE = "Goodbye!"


    dialo_chat_history = None

    while continue_chat and current_turn < max_turns:
        print(f"\n--- Turn {current_turn + 1} ---")


        dialo_response, dialo_chat_history = generate_dialo_response(
            user_input=user_message,
            chat_history_ids=dialo_chat_history,
            do_sample=True,           # Sampling enabled for variability
            num_beams=1,              # Beam search disabled
            top_k=50,
            top_p=0.95,
            temperature=0.7,
            no_repeat_ngram_size=3,
            max_length=1000,
            stop_sequence=STOP_SEQUENCE
        )

        print(f"DialoGPT: {dialo_response}")


        if STOP_SEQUENCE.lower() in dialo_response.lower():
            print("DialoGPT has ended the conversation.")
            break


        rasa_response = await get_rasa_response(dialo_response)
        print(f"RASA: {rasa_response}")


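        # RASA's farewell is the domain's utter_goodbye text ("Bye! Take care."),
        # so match on the leading word rather than on the whole message.
        if rasa_response.strip().lower().startswith(('goodbye', 'bye')):
            print("RASA has ended the conversation.")
            break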


        user_message = rasa_response
        current_turn += 1

    print("\nConversation ended.")
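
The turn-taking logic does not depend on RASA or DialoGPT. As a minimal sketch, the loop can be written against any two callables that map a message to a reply. `run_dialogue`, `echo_bot` and `farewell_bot` are hypothetical names used only for this illustration.

```python
def run_dialogue(bot_a, bot_b, opening, max_turns=10, stop_words=("bye",)):
    """Alternate between two bots and return the transcript as (speaker, text) pairs."""
    transcript = []
    message = opening
    for _ in range(max_turns):
        for name, bot in (("A", bot_a), ("B", bot_b)):
            message = bot(message)
            transcript.append((name, message))
            # Simple substring check; it also catches "goodbye".
            if any(word in message.lower() for word in stop_words):
                return transcript
    return transcript


# Hypothetical stand-ins for the two chatbots.
def echo_bot(text):
    return f"You said: {text}"


def farewell_bot(text):
    return "Bye! Take care."


for speaker, text in run_dialogue(echo_bot, farewell_bot, "Hello!"):
    print(f"{speaker}: {text}")
```

Keeping the loop separate from the models makes it possible to test the stopping conditions (turn limit and farewell detection) before loading any weights.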

In [None]:
# Run the interaction loop
await chat_between_rasa_dialo()




--- Turn 1 ---
DialoGPT: Hi!
2024-11-26 18:13:57 [debug    ] processor.actions.policy_prediction prediction_events=[]
2024-11-26 18:13:57 [debug    ] processor.actions.log          action_name=action_session_start rasa_events=[<rasa.shared.core.events.SessionStarted object at 0x7f643ac6a530>, ActionExecuted(action: action_listen, policy: None, confidence: None)]
2024-11-26 18:13:57 [debug    ] processor.slots.log            slot_values=	session_started_metadata: None
2024-11-26 18:13:57 [debug    ] processor.message.parse        parse_data_entities=[] parse_data_intent={'name': 'greet', 'confidence': 0.9964724779129028} parse_data_text=Hi!
2024-11-26 18:13:57 [debug    ] processor.extract.slots        action_extract_slot=action_extract_slots len_extraction_events=0 rasa_events=[]
2024-11-26 18:13:57 [debug    ] memoization.predict.actions    tracker_states=[{}, {'user': {'intent': 'greet'}, 'prev_action': {'action_name': 'action_listen'}}]
2024-11-26 18:13:57 [debug    ] rule_policy.a

## **LLM conversational skills**

This section builds a chatbot on top of an LLM (Phi-3-mini-4k-instruct). The conversation is tracked as a list of messages, and each message is assigned a role with one of the keywords "system", "user" and "assistant". More examples can be found [here](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

In [None]:

conversation = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def generate_response(conversation, model, tokenizer, max_length=2048, temperature=0.7, top_p=0.9, stop_sequences=["\nuser:", "\nassistant:"]):
    """
    Generates a response from the assistant based on the conversation history.

    Args:
        conversation (list): List of messages with roles.
        model: The pre-trained language model.
        tokenizer: Corresponding tokenizer.
        max_length (int): Maximum length of the generated response.
        temperature (float): Sampling temperature.
        top_p (float): Nucleus sampling threshold.
        stop_sequences (list): Sequences where generation should stop.

    Returns:
        str: Generated response.
    """

    prompt = ""
    for message in conversation:
        if message["role"] == "system":
            prompt += f"{message['content']}\n"
        elif message["role"] == "user":
            prompt += f"user: {message['content']}\n"
        elif message["role"] == "assistant":
            prompt += f"assistant: {message['content']}\n"

    prompt += "assistant: "  # Indicate that the assistant should respond next


    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)


    output_ids = model.generate(
        input_ids,
        max_length=input_ids.shape[1] + 200,  # Allow up to 200 tokens for the response
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )


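    # Decode only the newly generated tokens. Slicing the decoded text with
    # len(prompt) is fragile: decoding does not always reproduce the prompt verbatim.
    generated_ids = output_ids[0, input_ids.shape[1]:]
    assistant_response = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()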


    for stop in stop_sequences:
        if stop in assistant_response:
            assistant_response = assistant_response.split(stop)[0].strip()

    return assistant_response

def chat():
    print("Chatbot initialized. Type 'exit', 'quit', or 'bye' to end the conversation.\n")
    while True:
        user_input = input("You: ").strip()


        if user_input.lower() in ['exit', 'quit', 'bye']:
            print("Assistant: Goodbye! Have a great day!")
            break


        conversation.append({"role": "user", "content": user_input})


        assistant_response = generate_response(conversation, model, tokenizer)


        conversation.append({"role": "assistant", "content": assistant_response})

        print(f"Assistant: {assistant_response}\n")

# Start the chat
chat()

Chatbot initialized. Type 'exit', 'quit', or 'bye' to end the conversation.

You: hello
Assistant: urn

You: how is it going?
Assistant: urnategothedog.comuser : how are you doing?

You: exit
Assistant: Goodbye! Have a great day!
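
The replies recorded above ("urn", "urnategothedog.com...") suggest that the plain `user:` / `assistant:` prompt is not a layout the instruct model was tuned on. The sketch below renders the message list in the `<|role|> ... <|end|>` layout described on the Phi-3 model card. The exact layout is an assumption taken from that card, and `format_phi3_prompt` is a hypothetical helper. In practice, `tokenizer.apply_chat_template(conversation, add_generation_prompt=True)` from Transformers is likely the safer route, because it reads the template shipped with the tokenizer.

```python
def format_phi3_prompt(messages):
    """Render role-tagged messages in the assumed Phi-3 chat layout."""
    parts = []
    for message in messages:
        # Each turn: role tag, newline, content, end-of-turn marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # An open assistant tag asks the model to produce the next reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]
print(format_phi3_prompt(conversation))
```

With this layout the model can end its turn with its own end-of-turn marker, so the string-based `stop_sequences` post-processing becomes a fallback rather than the main way of cutting the reply.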


In [None]:
def generate_response(conversation, model, tokenizer, max_length=2048, temperature=0.7, top_p=0.9, stop_sequences=["\nuser:", "\nassistant:"], max_history=10):
    """
    Generates a response from the assistant based on the conversation history.

    Args:
        conversation (list): List of messages with roles.
        model: The pre-trained language model.
        tokenizer: Corresponding tokenizer.
        max_length (int): Maximum length of the generated response.
        temperature (float): Sampling temperature.
        top_p (float): Nucleus sampling threshold.
        stop_sequences (list): Sequences where generation should stop.
        max_history (int): Maximum number of past messages to retain.

    Returns:
        str: Generated response.
    """
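    # retain the system prompt plus only the last 'max_history' non-system messages;
    # slicing the full conversation would duplicate the system prompt in short chats
    system_message = [msg for msg in conversation if msg["role"] == "system"]
    user_and_assistant = [msg for msg in conversation if msg["role"] != "system"][-max_history:]
    conversation_trimmed = system_message + user_and_assistant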

    # format the conversation into a single prompt string
    prompt = ""
    for message in conversation_trimmed:
        if message["role"] == "system":
            prompt += f"{message['content']}\n"
        elif message["role"] == "user":
            prompt += f"user: {message['content']}\n"
        elif message["role"] == "assistant":
            prompt += f"assistant: {message['content']}\n"

    prompt += "assistant: "


    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)

    output_ids = model.generate(
        input_ids,
        max_length=input_ids.shape[1] + 200,  # Allow up to 200 tokens for the response
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )


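    # Decode only the newly generated tokens. Slicing the decoded text with
    # len(prompt) is fragile: decoding does not always reproduce the prompt verbatim.
    generated_ids = output_ids[0, input_ids.shape[1]:]
    assistant_response = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()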


    for stop in stop_sequences:
        if stop in assistant_response:
            assistant_response = assistant_response.split(stop)[0].strip()

    return assistant_response
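
The history trimming can be examined in isolation. The following is a minimal sketch, assuming the conversation is a list of role/content dictionaries as above; `trim_history` is a hypothetical helper name. The system prompt is kept once, and only the most recent non-system messages are retained.

```python
def trim_history(conversation, max_history=10):
    """Keep all system messages plus the last max_history user/assistant messages."""
    system = [m for m in conversation if m["role"] == "system"]
    turns = [m for m in conversation if m["role"] != "system"]
    # A slice of [-0:] would return everything, so handle max_history <= 0 explicitly.
    recent = turns[-max_history:] if max_history > 0 else []
    return system + recent


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Tell me a joke."},
]

for message in trim_history(history, max_history=2):
    print(message["role"], "->", message["content"])
```

Counting messages is only a rough proxy for the context budget. A stricter variant could count tokens instead, at the cost of calling the tokenizer on every turn.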

In [None]:
prompt = "assistant: Hello! How can I assist you today?\nuser: What's the weather like today?\nassistant:"
input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)


output_ids = model.generate(
    input_ids,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)


In [None]:

prompt = "Hello! How are you today?"


input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)


temperature = 0.7
top_p = 0.9


output_ids = model.generate(
    input_ids,
    max_new_tokens=200,
    temperature=temperature,
    top_p=top_p,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)


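# Decode only the tokens generated after the prompt; decoding output_ids[0] in full
# echoes the prompt back as part of the reply.
assistant_response = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True).strip()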

print(f"Assistant: {assistant_response}")

Assistant: Hello! How are you today?


In [None]:

conversation = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def generate_response(conversation, model, tokenizer, max_length=2048, temperature=0.7, top_p=0.9, stop_sequences=["\nuser:", "\nassistant:"]):
    """
    Generates a response from the assistant based on the conversation history.

    Args:
        conversation (list): List of messages with roles.
        model: The pre-trained language model.
        tokenizer: Corresponding tokenizer.
        max_length (int): Maximum length of the generated response.
        temperature (float): Sampling temperature.
        top_p (float): Nucleus sampling threshold.
        stop_sequences (list): Sequences where generation should stop.

    Returns:
        str: Generated response.
    """

    prompt = ""
    for message in conversation:
        if message["role"] == "system":
            prompt += f"{message['content']}\n"
        elif message["role"] == "user":
            prompt += f"user: {message['content']}\n"
        elif message["role"] == "assistant":
            prompt += f"assistant: {message['content']}\n"

    prompt += "assistant: "  # Indicate that the assistant should respond next


    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)


    output_ids = model.generate(
        input_ids,
        max_length=max_length,
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

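    # Decode only the newly generated tokens. Slicing the decoded text with
    # len(prompt) is fragile: decoding does not always reproduce the prompt verbatim.
    generated_ids = output_ids[0, input_ids.shape[1]:]
    assistant_response = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()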
    for stop in stop_sequences:
        if stop in assistant_response:
            assistant_response = assistant_response.split(stop)[0].strip()

    return assistant_response

user_message = "Can you tell me a joke?"

conversation.append({"role": "user", "content": user_message})

assistant_response = generate_response(conversation, model, tokenizer)

conversation.append({"role": "assistant", "content": assistant_response})

print(f"Assistant: {assistant_response}")

Assistant: urns
