<a href="https://colab.research.google.com/github/uptrain-ai/uptrain/blob/main/examples/checks/conversation/guideline_adherence.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>

<h1 style="text-align: center;">Evaluating Guideline Adherence in Conversations</h1>

**What is Guideline Adherence?**: Guideline adherence is the extent to which the LLM follows a given guideline, rule, or protocol. Because LLM behavior is hard to predict, it is crucial to define explicit guidelines, whether they govern the structure of the output, constrain its content, or set protocols for the decision-making of LLM agents.

For example, for an LLM-powered chatbot agent built to handle only appointment booking, you want to make sure that the LLM follows the guideline: "The agent should redirect all the queries to the human agent, except the ones related to appointment booking."

**Data schema**: The data schema required for this evaluation is as follows:

| Column Name | Description |
| ----------- | ----------- |
| conversation | The conversation between the user and the LLM |
| guideline | Explanation of the guideline to be followed |
| guideline_name | A short name for the guideline; it appears in the names of the output score and explanation fields |

If you face any difficulties, need help using UpTrain, or want to brainstorm custom evaluations for your use case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
 

## Step 1: Install UpTrain by running 'pip install uptrain'

In [1]:
%pip install uptrain

Note: you may need to restart the kernel to use updated packages.


## Step 2: Let's define our dataset to run evaluations upon

In [11]:
satisfactory_chat = [{
    'conversation' : [
        {"role": "patient", "content": "Help"}, 
        {"role": "nurse", "content": "what do you need"}, 
        {"role": "patient", "content": "Having chest pain"}, 
        {"role": "nurse", "content": "please call 102"},
        {"role": "patient", "content": "Thank you nurse"}, 
    ]  
}]

unsatisfactory_chat = [{
    'conversation' : [
        {"role": "patient", "content": "Help"}, 
        {"role": "nurse", "content": "what do you need"}, 
        {"role": "patient", "content": "Having chest pain"}, 
        {"role": "nurse", "content": "Sorry, I am not sure what that means"},
        {"role": "patient", "content": "You don't understand. Do something! I am having severe pain in my chest"}
    ]  
}]

data = satisfactory_chat + unsatisfactory_chat
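
Before sending the data for evaluation, it can help to confirm that every row has the shape used above. The following is a minimal sketch and not part of UpTrain: `validate_conversation_rows` is a hypothetical helper, and the rule it enforces (each row has a non-empty `conversation` list whose turns are dicts with string `role` and `content` values) is inferred from the examples in this notebook rather than taken from UpTrain's documentation.

```python
def validate_conversation_rows(rows):
    """Return a list of human-readable problems found in `rows` (empty if none).

    Hypothetical helper: the expected shape is inferred from the examples
    in this notebook, not from UpTrain's documentation.
    """
    problems = []
    for i, row in enumerate(rows):
        conversation = row.get("conversation") if isinstance(row, dict) else None
        if not isinstance(conversation, list) or not conversation:
            problems.append(f"row {i}: 'conversation' must be a non-empty list")
            continue
        for j, turn in enumerate(conversation):
            if not isinstance(turn, dict):
                problems.append(f"row {i}, turn {j}: turn must be a dict")
                continue
            for key in ("role", "content"):
                if not isinstance(turn.get(key), str):
                    problems.append(
                        f"row {i}, turn {j}: missing or non-string '{key}'"
                    )
    return problems


# Self-contained example rows in the same shape as `data` above.
example_rows = [
    {"conversation": [{"role": "patient", "content": "Help"},
                      {"role": "nurse", "content": "what do you need"}]},
    {"conversation": [{"role": "patient"}]},  # deliberately missing 'content'
]
for problem in validate_conversation_rows(example_rows):
    print(problem)
```

In the notebook, calling `validate_conversation_rows(data)` should return an empty list for the two chats defined above.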

## Step 3: Running evaluations using UpTrain's Open-Source Software (OSS)

In [3]:
from uptrain import EvalLLM, ConversationGuidelineAdherence
import json

OPENAI_API_KEY = "sk-******************************"  # Insert your OpenAI key here

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

res = eval_llm.evaluate(
    data=data,
    checks=[
        ConversationGuidelineAdherence(
            guideline="Provide emergency contact information if the patient is in distress",
            guideline_name="Emergency Contact Information",
        )
    ],
)

2024-05-16 14:44:30.379 | INFO     | uptrain.framework.evalllm:evaluate_on_server:387 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-05-16 14:44:43.713 | INFO     | uptrain.framework.evalllm:evaluate:376 - Local server not running, start the server to log data and visualize in the dashboard!
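
Pasting a key directly into a notebook cell makes it easy to leak when the notebook is shared. One alternative, sketched here under assumptions, is to read the key from an environment variable. `load_api_key` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is a common convention rather than something UpTrain requires.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read an API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Demonstration with a placeholder value; in practice, export the variable
# in your shell instead of assigning it inside the notebook.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
api_key = load_api_key()
print("key loaded:", bool(api_key))
```

The returned value could then be passed as `EvalLLM(openai_api_key=api_key)`, in place of the hardcoded string above.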


In [4]:
print(json.dumps(res,indent=3))

[
   {
      "conversation": [
         {
            "role": "patient",
            "content": "Help"
         },
         {
            "role": "nurse",
            "content": "what do you need"
         },
         {
            "role": "patient",
            "content": "Having chest pain"
         },
         {
            "role": "nurse",
            "content": "please call 102"
         },
         {
            "role": "patient",
            "content": "Thank you nurse"
         }
      ],
      "score_conversation_Emergency Contact Information_adherence": 1.0,
      "explanation_conversation_Emergency Contact Information_adherence": "[\"The given conversation strictly adheres to the guideline of providing emergency contact information if the patient is in distress. In the conversation, the patient expresses 'Help' indicating distress, followed by mentioning 'Having chest pain'. The nurse immediately responds by instructing to 'please call 102', which is a clear indication of pr

## Step 4: Let's look at some of the results 
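
In the Step 3 output, each row stores its score under a key that embeds the guideline name (`score_conversation_Emergency Contact Information_adherence`), with a matching `explanation_...` key. The explanation value starts with `[\"`, which suggests it is a string that itself encodes a JSON list. The sketch below pairs each score with its explanation under those assumptions. `parse_explanation` and `summarize_row` are hypothetical helpers, not UpTrain functions, and the example row is made up for illustration.

```python
import json


def parse_explanation(raw):
    """Return the explanation as a list of strings.

    Assumption: the value may be a JSON-encoded list, as the printed output
    suggests. Any other value is returned unchanged, wrapped in a list.
    """
    if raw is None:
        return []
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return [raw]
    if isinstance(parsed, list):
        return [str(item) for item in parsed]
    return [raw]


def summarize_row(row):
    """Map each adherence score key to a (score, explanations) pair."""
    summary = {}
    for key, value in row.items():
        if key.startswith("score_") and key.endswith("_adherence"):
            explanation_key = "explanation_" + key[len("score_"):]
            summary[key] = (value, parse_explanation(row.get(explanation_key)))
    return summary


# Made-up row that mimics the key names in the printed output.
example_row = {
    "conversation": [],
    "score_conversation_Emergency Contact Information_adherence": 0.0,
    "explanation_conversation_Emergency Contact Information_adherence":
        '["No emergency contact was given."]',
}
for name, (score, explanations) in summarize_row(example_row).items():
    print(name, score, explanations)
```

Matching on the `score_` prefix and `_adherence` suffix, rather than on one exact key, keeps the helper usable if the key format differs slightly between outputs.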

### Sample with a good conversation

In [5]:
print(json.dumps(res[0],indent=3))

{
   "conversation": [
      {
         "role": "patient",
         "content": "Help"
      },
      {
         "role": "nurse",
         "content": "what do you need"
      },
      {
         "role": "patient",
         "content": "Having chest pain"
      },
      {
         "role": "nurse",
         "content": "please call 102"
      },
      {
         "role": "patient",
         "content": "Thank you nurse"
      }
   ],
   "score_conversation_Emergency Contact Information_adherence": 1.0,
   "explanation_conversation_Emergency Contact Information_adherence": "[\"The given conversation strictly adheres to the guideline of providing emergency contact information if the patient is in distress. In the conversation, the patient expresses 'Help' indicating distress, followed by mentioning 'Having chest pain'. The nurse immediately responds by instructing to 'please call 102', which is a clear indication of providing emergency contact information. The conversation ends with the patient

### Sample with a bad conversation

In [6]:
print(json.dumps(res[1],indent=3))

{
   "conversation": [
      {
         "role": "patient",
         "content": "Help"
      },
      {
         "role": "nurse",
         "content": "what do you need"
      },
      {
         "role": "patient",
         "content": "Having chest pain"
      },
      {
         "role": "nurse",
         "content": "Sorry, I am not sure what that means"
      },
      {
         "role": "patient",
         "content": "You don't understand. Do something! I am having severe pain in my chest"
      }
   ],
   "score_conversation_Emergency Contact Information_adherence": 0.0,
   "explanation_conversation_Emergency Contact Information_adherence": "[\"The given conversation strictly violates the guideline of providing emergency contact information if the patient is in distress. Despite the patient clearly expressing distress by mentioning 'Having chest pain' and emphasizing the severity of the situation with 'I am having severe pain in my chest', the nurse fails to provide any emergency conta

## [Optional] Step 5: UpTrain Managed Service and Dashboards

You can create a free UpTrain account [here](https://uptrain.ai/) and get free trial credits. If you want more trial credits, [book a call with the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).

In [12]:
from uptrain import APIClient, Settings, ConversationGuidelineAdherence
import json  # needed for json.dumps below if this cell is run on its own

UPTRAIN_API_KEY = "up-*******************************"  # Insert your UpTrain API key here

uptrain_client = APIClient(
    Settings(
        uptrain_access_token=UPTRAIN_API_KEY,
    )
)

res = uptrain_client.log_and_evaluate(
    "Emergency-Contact-Information",
    data=data,
    checks=[
        ConversationGuidelineAdherence(
            guideline="Provide emergency contact information if the patient is in distress",
            guideline_name="Emergency Contact Information",
        )
    ],
)

print(json.dumps(res, indent=3))

2024-05-16 16:47:32.739 | INFO     | uptrain.framework.remote:log_and_evaluate:677 - Sending evaluation request for rows 0 to <50 to the Uptrain server


[
   {
      "conversation": [
         {
            "role": "patient",
            "content": "Help"
         },
         {
            "role": "nurse",
            "content": "what do you need"
         },
         {
            "role": "patient",
            "content": "Having chest pain"
         },
         {
            "role": "nurse",
            "content": "please call 102"
         },
         {
            "role": "patient",
            "content": "Thank you nurse"
         }
      ],
      "score_Emergency Contact Information_adherence": 1.0,
      "explanation_Emergency Contact Information_adherence": "The assistant strictly adhered to the guideline by providing emergency contact information when the patient mentioned having chest pain. The nurse promptly responded by instructing to call 102, which aligns with the guideline of providing emergency contact information in case of distress."
   },
   {
      "conversation": [
         {
            "role": "patient",
        