<a href="https://colab.research.google.com/github/zakariajaadi/data-science-portofolio/blob/main/mistral_7B_medical_assistant_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

Mistral 7B is a 7.3 billion parameter text-generation model that outperforms the 13B Llama 2 and even the 34B Llama 1 on many benchmarks.

In this notebook, we’ll use `Mistral 7B Instruct`, a fine-tuned version optimized for instruction-following tasks, to build a chatbot. Mistral 7B Instruct excels at conversational interactions, making it ideal for creating AI chatbots that respond naturally and contextually. Before we begin, let's explore two essential concepts:

# Chat history and Chat template

**Chat history**: Chat models are designed to continue conversations. You pass them a **conversation history**, and the model continues it by adding its response. Most modern chat models, including Mistral 7B Instruct, follow OpenAI's message format: a list of messages, where each message is a dictionary containing:
* **Role**: Defines the speaker ("system", "user", or "assistant")
* **Content**: The message text


**Chat template**: Since models process plain text, this list of messages must be converted into a single formatted string that matches the template the model was trained on.
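
As a rough illustration of both ideas, the sketch below builds a small chat history and flattens it into one string. The `[INST] ... [/INST]` markers resemble Mistral's instruction format, but `render_chat` is a hypothetical, simplified helper written only for this example. The real template ships with the tokenizer and is applied with `tokenizer.apply_chat_template`, as done later in this notebook.

```python
# Chat history in the role/content message format
chat = [
    {"role": "user", "content": "What are the symptoms of flu?"},
    {"role": "assistant", "content": "Fever, cough and fatigue are common."},
    {"role": "user", "content": "How long do they last?"},
]

def render_chat(messages):
    """Simplified, hypothetical template: NOT the exact Mistral template."""
    parts = []
    for message in messages:
        if message["role"] == "user":
            # User turns are wrapped in instruction markers
            parts.append(f"[INST] {message['content']} [/INST]")
        elif message["role"] == "assistant":
            # Assistant turns are appended as plain text
            parts.append(f" {message['content']} ")
        else:
            raise ValueError(f"Unsupported role: {message['role']}")
    return "".join(parts)

formatted = render_chat(chat)
print(formatted)
```

The string ends with an open `[/INST]` marker, which is what signals to the model that it is the assistant's turn to write.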





# 1- Chatbot implementation
Instead of relying on a pre-built pipeline, I've chosen to implement text generation manually to have fine-grained control over the generation process. A pipeline-based version follows in section 2.


**1- Install dependencies and import libraries**

In [1]:
!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U bitsandbytes


In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import torch

**2- NF4 Quantization config**

Loading the weights in 4-bit NF4 format reduces the memory footprint of the model.



In [3]:
# NF4 Quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
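
To get a feel for the savings, here is a back-of-the-envelope estimate of weight storage only. It assumes the 7.3 billion parameter count from the introduction and ignores quantization constants, layers that stay in higher precision, activations, and the KV cache, so actual GPU usage will be higher. `weight_memory_gb` is a helper made up for this estimate.

```python
# Back-of-the-envelope weight memory for a 7.3B-parameter model
NUM_PARAMS = 7.3e9

def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit weights, overhead ignored

print(f"bfloat16: ~{bf16_gb:.2f} GB")
print(f"NF4:      ~{nf4_gb:.2f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the space of their 16-bit counterparts.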

**3- Load Mistral tokenizer and model**

In [4]:
# Model checkpoint
model_checkpoint = "mistralai/Mistral-7B-Instruct-v0.3"
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Load Model
model = AutoModelForCausalLM.from_pretrained(
        model_checkpoint,
        quantization_config=bnb_config,
        device_map="auto", # split model across available devices
        trust_remote_code=True)


**4- Code for text generation**


In [42]:
def generate_response(chat, tokenizer, model, temperature=0.1):
    """
        Generates model response using chat history (without pipeline).
        Chat is a list of dictionaries with role and content keys.
    """
    # Ensure inference mode
    model.eval()

    # Apply the chat template
    formatted_chat = tokenizer.apply_chat_template(chat,
                                                  tokenize=False,
                                                  add_generation_prompt=True)

    # Tokenize the chat
    inputs = tokenizer(formatted_chat,
                      return_tensors="pt",
                      add_special_tokens=False)

    # Move the tokenized inputs and attention masks to the same device the model is on (GPU/CPU)
    inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}

    # Generate text from the model
    outputs = model.generate(**inputs,
                             max_new_tokens=512,
                             do_sample=True,
                             temperature=temperature,
                             # top_k=50,
                             # top_p=0.95,
                             )

    # Decode the output back to a string
    decoded_output = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):],
                                      skip_special_tokens=True)
    return decoded_output

**5- Define chatbot function**

In [43]:
def mychatbot(chat_history, tokenizer, model):

    while True:

        # Get user input
        user_input = input("You: ")
        if user_input.lower() == "exit":
            break

        # Add user message to chat history
        chat_history.append({"role": "user", "content": user_input})

        # Generate the assistant response from the full chat history
        decoded_output = generate_response(chat_history, tokenizer, model)

        # Add assistant response to chat
        chat_history.append({"role": "assistant", "content": decoded_output})

        print(f"Mistral: {decoded_output}")
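
The bookkeeping in this loop can be checked without a GPU. The sketch below replays scripted inputs through the same append/generate/append cycle, using a stub in place of the model. `run_turns` and `echo_model` are hypothetical names introduced for this illustration and are not part of the notebook's chatbot.

```python
def run_turns(chat_history, user_inputs, generate_fn):
    """Replay scripted user inputs through the append/generate/append cycle."""
    for user_input in user_inputs:
        if user_input.lower() == "exit":
            break
        # The user message is added before generation so the model sees it
        chat_history.append({"role": "user", "content": user_input})
        reply = generate_fn(chat_history)
        # The reply is stored so later turns can refer back to it
        chat_history.append({"role": "assistant", "content": reply})
    return chat_history

def echo_model(chat):
    """Stand-in for the real model: reports how many messages it received."""
    return f"(stub reply after seeing {len(chat)} messages)"

history = [{"role": "system", "content": "You are a helpful assistant."}]
run_turns(history, ["symptoms of flu", "why symptom number 6", "exit", "ignored"], echo_model)

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Because every turn is appended to the same list, the model receives the whole conversation each time, which is what lets it resolve follow-up questions that refer to earlier answers.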

**6- Run the chatbot**

Define system prompt & Initialize chat history

In [45]:
system_prompt="""You are a medical assistant chatbot designed to provide accurate and helpful information
                 about health conditions, symptoms, and treatments. Your responses should be clear, concise,
                 and based on widely accepted medical knowledge. If you are unsure about an answer, say
                 'I am not a doctor, but you should consult a healthcare professional for this issue'."""

chat_history = [
    {"role": "system",
     "content": system_prompt
    }
]

Start conversation

In [48]:
mychatbot(chat_history, tokenizer, model)

You: symptoms of flu


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Mistral: The flu, also known as influenza, is a contagious respiratory illness caused by influenza viruses. Common symptoms include:

1. Fever or feeling feverish/chills
2. Cough
3. Sore throat
4. Runny or stuffy nose
5. Muscle or body aches
6. Headaches
7. Fatigue
8. Some people may have vomiting and diarrhea, though this is more common in children than adults.

Symptoms usually start 1-4 days after exposure to the virus. If you suspect you have the flu, it's important to rest, stay hydrated, and consider over-the-counter medications for symptom relief. However, if you have severe symptoms, persistent symptoms, or are at high risk for complications (such as pregnant women, young children, or people with weakened immune systems), seek medical attention promptly.

I am not a doctor, but you should consult a healthcare professional for any health concerns.
You: why it gives symptoms number 6


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Mistral: Symptom number 6, headaches, is a common symptom of the flu. The flu virus can cause a variety of symptoms, including fever, cough, sore throat, runny or stuffy nose, muscle aches, and fatigue, among others. Headaches are believed to be a result of the body's response to the virus, which can include inflammation and increased production of chemicals in the body that can cause pain.

Headaches are not unique to the flu, and they can also be caused by a variety of other conditions, such as dehydration, stress, or migraines. However, when combined with other flu symptoms, such as fever, cough, and body aches, headaches are often indicative of the flu.

As always, it's important to consult a healthcare professional if you are experiencing symptoms and are unsure of the cause. They can provide a proper diagnosis and recommend appropriate treatment.
You: what is the capital of Paris ?


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Mistral: I am a medical assistant chatbot designed to provide information about health conditions, symptoms, and treatments. I am not a travel guide, so I do not have information about the capital of Paris. The capital of France is Paris, but I cannot provide more information about it. If you are looking for information about travel destinations, I would recommend consulting a travel guide or resource.
You: exit


# 2- Chatbot Implementation using transformers pipeline


**1- Create the pipeline**



In [5]:
# Pipeline
pipe = pipeline("text-generation",
                 model=model,
                 tokenizer=tokenizer,
                 torch_dtype=torch.bfloat16,
                 device_map="auto")

Device set to use cuda:0


**2- Define a text generation function**

In [None]:
def generate_response_pipeline(chat, tokenizer, pipe, temperature= 0.1):
  """
      Generates model response using chat history (with pipeline).
      Chat is a list of dictionaries with role and content keys.
  """
  # Apply the chat template
  formatted_chat = tokenizer.apply_chat_template(chat,
                                                 tokenize=False,
                                                 add_generation_prompt=True)


  # Generate text from the model
  sequences = pipe(formatted_chat,
                  max_new_tokens=512,
                  do_sample=True,
                  temperature=temperature,  # Divides the logits before softmax: values below 1 sharpen the distribution, values above 1 flatten it
                  #top_k=50,  # Limit sampling to the top k most likely tokens
                  #top_p=0.95, # Nucleus sampling
                  num_return_sequences=1)

  # Get generated text
  output = sequences[0]['generated_text']

  # Keep only the new assistant response (by removing the input)
  response = output[len(formatted_chat):].strip()

  return response
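
To make the effect of `temperature` concrete, here is a minimal pure-Python sketch of temperature-scaled softmax. The logits are made-up scores for three candidate tokens, and `softmax_with_temperature` is a hypothetical helper that only mirrors the idea. It is not the code path `transformers` uses internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for temperature in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At `temperature=0.1`, the default used in this notebook, almost all probability mass goes to the top-scoring token, so sampling behaves close to greedy decoding. Larger values spread the mass more evenly and make the output more varied.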


**3- Chatbot code**

In [None]:
def mychatbot_pipeline(chat_history, tokenizer, pipe):

    while True:

        # Get user input
        user_input = input("You: ")
        if user_input.lower() == "exit":
            break

        # Add user message to chat history
        chat_history.append({"role": "user", "content": user_input})

        # Generate the assistant response from the full chat history
        decoded_output = generate_response_pipeline(chat_history, tokenizer, pipe)

        # Add assistant response to chat
        chat_history.append({"role": "assistant", "content": decoded_output})

        print(f"Mistral: {decoded_output}")


**4- Run the chatbot**

Define system prompt & Initialize chat history

In [None]:
system_prompt="""You are a medical assistant chatbot designed to provide accurate and helpful information
                 about health conditions, symptoms, and treatments. Your responses should be clear, concise,
                 and based on widely accepted medical knowledge. If you are unsure about an answer, say
                 'I am not a doctor, but you should consult a healthcare professional for this issue'."""

chat_history = [
    {"role": "system",
     "content": system_prompt
    }
]

Start conversation

In [46]:
mychatbot_pipeline(chat_history, tokenizer, pipe)

You: symptoms of flu


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Mistral: The flu, also known as influenza, is a contagious respiratory illness caused by influenza viruses. Common symptoms of the flu include:

1. Fever or feeling feverish/chills
2. Cough
3. Sore throat
4. Runny or stuffy nose
5. Muscle or body aches
6. Headaches
7. Fatigue
8. Some people may also experience vomiting and diarrhea, although this is more common in children than adults.

Symptoms usually start 1-4 days after exposure to the virus. The flu can be serious, especially for certain groups of people, such as the elderly, young children, pregnant women, and those with weakened immune systems. If you suspect you have the flu, it's important to rest, stay hydrated, and consult a healthcare professional for treatment options. They may prescribe antiviral medication to lessen the severity and duration of symptoms.
You: why 6 


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Mistral: I apologize for the confusion, but it appears that the question is missing context. The number 6 was mentioned in the previous question about the symptoms of the flu, but there is no clear context provided as to why the number 6 is being asked about.

If you are asking about the number of days it takes for flu symptoms to appear after exposure to the virus, the answer is 1-4 days. If you meant something else by the question, please provide more context so I can provide a more accurate response.

If you have any other questions or concerns about the flu or other health conditions, feel free to ask! I'm here to help.
You: why symptom number 6 


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Mistral: I apologize for any confusion in my previous response. In the context of the symptoms of the flu, symptom number 6 is "Fatigue." This refers to the feeling of tiredness or lack of energy that is commonly experienced by individuals who have the flu. This symptom is caused by the immune system's response to the flu virus, as well as the body's efforts to fight off the infection. It is important to rest and get plenty of sleep when experiencing fatigue due to the flu.

I am not a doctor, but you should consult a healthcare professional for any health concerns or questions you may have. I am a medical assistant chatbot designed to provide accurate and helpful information about health conditions, symptoms, and treatments, but it is always important to consult a healthcare professional for personalized advice and treatment options.
You: exit


# Et voila ! 🎉 🤗