### Parse GPT responses for multi-turn chat

* Cell 1: Import necessary packages
* Cell 2: Helper functions to build chat history and parse the response from OpenAI
* Cell 3: Parse GPT responses
* Cell 4: Randomly sample n chat histories to check their quality

In [1]:
import json
from tqdm import tqdm

In [2]:
# Function to append the latest response to the chat history

def build_chat_history(response, current_history, gpt_simulated):
    current_history.append({"role": gpt_simulated, "value": response})
    return current_history

# Function to parse the string and create a JSON object
def parse_gpt_response(response, id, current_history, gpt_simulated):
    """
    Helper function for parse_results()
    Parses a GPT response from a string into a dict
    """
    try:
        # Interpret literal backslash escapes (e.g. \n) while keeping UTF-8 text intact
        decoded_response = (response.encode().decode('unicode_escape')).encode('latin1').decode('utf-8')
        chat_history = build_chat_history(decoded_response, current_history, gpt_simulated)
        line = {
            "id": id,
            "conversation": chat_history
        }
        return line

    except UnicodeError:
        # The encode/decode chain raises UnicodeError, not json.JSONDecodeError
        return None
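
The encode/decode chain in `parse_gpt_response` is easier to reason about in isolation. The sketch below copies the chain into a hypothetical helper, `decode_chain` (not part of the notebook), and runs it on three made-up strings: text with real non-ASCII characters, text with a literal backslash-n, and text with a literal `\u00f3` escape. The last case raises a `UnicodeError` subclass rather than `json.JSONDecodeError`.

```python
def decode_chain(text: str) -> str:
    """Same encode/decode chain as parse_gpt_response, isolated for testing."""
    return text.encode().decode("unicode_escape").encode("latin1").decode("utf-8")

# Real non-ASCII characters survive the round trip unchanged.
print(decode_chain("Tómese una tableta"))

# A literal backslash followed by "n" (two characters) becomes a real newline.
print(repr(decode_chain("line one\\nline two")))

# A literal \u00f3 escape is first turned into the single character "ó",
# which then fails the final UTF-8 decode step.
try:
    decode_chain("T\\u00f3mese")
except UnicodeError as err:
    print(type(err).__name__)
```

Since `json.loads` already resolves `\uXXXX` escapes when each result line is read, the chain mostly matters for backslash sequences that the model wrote out literally in its answer.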

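The parsing cell reads the answer from `response -> body -> choices[0] -> message -> content` and the identifier from `custom_id`. A minimal sketch of a defensive lookup for that path follows, assuming only the record shape implied by those lookups. `extract_content` is a hypothetical helper and the three sample lines are made up for illustration.

```python
import json

def extract_content(record: dict):
    """Return (custom_id, content); content is None when any level is missing."""
    body = (record.get("response") or {}).get("body") or {}
    choices = body.get("choices") or []
    if not choices:
        return record.get("custom_id"), None
    message = choices[0].get("message") or {}
    return record.get("custom_id"), message.get("content")

# Made-up lines in the shape the parsing cell expects.
good = json.loads('{"custom_id": "0-0", "response": {"body": {"choices": [{"message": {"content": "Hi"}}]}}}')
empty = json.loads('{"custom_id": "0-1", "response": {"body": {"choices": []}}}')
failed = json.loads('{"custom_id": "0-2", "response": null}')

print(extract_content(good))    # ('0-0', 'Hi')
print(extract_content(empty))   # ('0-1', None)
print(extract_content(failed))  # ('0-2', None)
```

Returning `None` for the content instead of raising lets the caller decide whether a missing answer counts as a parse failure.
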
In [9]:
# Desired output format (.jsonl) -> id will be removed later
# {"id": "1", "conversation": [{"role": "user", "value": "Hello!"}, {"role": "assistant", "value": "Hi, how can I help you today?"}, {"role": "user", "value": "I need assistance with my account."}]}

situations_name = "tasks_x_subtopics" ### Adjust if needed
path_to_results = "../results/gpt_results.jsonl"
gpt_simulated = "chatbot" ### Always adjust this
multiturn_path = "../results/multiturn_" + situations_name + ".jsonl"

# Read in the conversations we have so far and store them in a dict keyed by id
multiturn_conv = {}
with open(multiturn_path, 'r') as file:
    for line in file:
        line_as_json = json.loads(line)
        multiturn_conv[line_as_json["id"]] = line_as_json["conversation"]

# Extract content from json
print("Parsing GPT responses...")
questions_failed_to_parse = []
with open(path_to_results, 'r') as file:
    for line in tqdm(file):
        id = None  # Reset so a failed line is not logged under the previous id
        try:
            data = json.loads(line)  # Parse each line as JSON
            id = data.get("custom_id") # format "0-0" "task_id - subtopic_id"
            response_content = data.get("response", {}).get("body", {}).get("choices", [])[0].get("message", {}).get("content", None)
            if response_content and id:
                parsed_response = parse_gpt_response(response=response_content, id=id, current_history=multiturn_conv.get(id, []), gpt_simulated=gpt_simulated)
                if parsed_response:
                    multiturn_conv[id] = parsed_response["conversation"]
                else:
                    questions_failed_to_parse.append(id)
        except (json.JSONDecodeError, IndexError, AttributeError):
            # Malformed JSON, an empty "choices" list, or a null field in the lookup chain
            questions_failed_to_parse.append(id)
print("Parsing completed")

# Save to JSONL file (one conversation per line)
with open(multiturn_path, 'w') as jsonl_file:
    for id, conv in multiturn_conv.items():
        jsonl_file.write(json.dumps({"id": id, "conversation": conv}) + '\n')

# Output the parsed data (for verification)
print(f"Medical prompts have been saved to {multiturn_path}")
print("See an example below:")
print(json.dumps(parsed_response, indent=4))
print(f"Failed to parse {len(questions_failed_to_parse)} questions:")
print(questions_failed_to_parse)

Parsing GPT responses...


3400it [00:00, 16423.40it/s]


Parsing completed
Medical prompts have been saved to ../results/multiturn_tasks_x_subtopics.jsonl
See an example below:
{
    "id": "10-975",
    "conversation": [
        {
            "role": "user",
            "value": "Translate the following patient instructions from English to Spanish: \"Take one tablet by mouth every 12 hours with food. Do not crush or chew the tablet. If you miss a dose, take it as soon as you remember unless it is almost time for your next dose.\""
        },
        {
            "role": "chatbot",
            "value": "(chatbot) T\u00f3mese una tableta por v\u00eda oral cada 12 horas con comida. No triture ni mastique la tableta. Si olvida una dosis, t\u00f3mela tan pronto como lo recuerde, a menos que falte poco tiempo para su pr\u00f3xima dosis."
        },
        {
            "role": "user",
            "value": "Is the translation accurate and clear for patient understanding?"
        },
        {
            "role": "chatbot",
            "value": "T

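Random sampling by eye can be complemented with a structural check. The sketch below flags turns whose role breaks strict alternation. It assumes every conversation starts with `user` and alternates with the simulated role (`chatbot` here), which matches the example output above but is not stated anywhere in the notebook. `find_role_problems` is a hypothetical helper.

```python
def find_role_problems(conversation, first_role="user", second_role="chatbot"):
    """Return the indices of turns whose role breaks strict alternation."""
    expected = [first_role, second_role]
    return [
        i for i, turn in enumerate(conversation)
        if turn.get("role") != expected[i % 2]
    ]

# Made-up conversations: one well-formed, one with two user turns in a row.
ok = [{"role": "user", "value": "Hello!"}, {"role": "chatbot", "value": "Hi"}]
bad = [{"role": "user", "value": "Hello!"}, {"role": "user", "value": "Anyone?"}]

print(find_role_problems(ok))   # []
print(find_role_problems(bad))  # [1]
```

A non-empty result for an id can indicate that a response was appended twice, for example after re-running the parsing cell on the same results file.
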
In [11]:
# Display a random sample of conversations

import json
import random

# Path to the .jsonl file
file_path = "../results/multiturn_specialty_x_domain.jsonl"

# Load the data
entries = []
with open(file_path, 'r') as file:
    for line in file:
        entries.append(json.loads(line))  # Parse each line as a JSON object

# Randomly sample 10 entries
sampled_entries = random.sample(entries, min(10, len(entries)))  # Ensure sampling doesn't exceed total entries

# Display the sampled entries
for entry in sampled_entries:
    print(json.dumps(entry, indent=4))
    print("\n" + "-"*80 + "\n")  # Separator between entries


{
    "id": "20-5490",
    "conversation": [
        {
            "role": "user",
            "value": "\"Can you explain how chronic stress affects the hypothalamic-pituitary-adrenal (HPA) axis and its physiological impact on the body, particularly in relation to a patient's experience of anxiety and depression?\""
        },
        {
            "role": "chatbot",
            "value": "Certainly. Chronic stress can significantly impact the hypothalamic-pituitary-adrenal (HPA) axis, leading to various physiological and psychological effects. Here's a brief overview:\n\n1. **HPA Axis Activation**: Under chronic stress, the HPA axis becomes persistently activated. The hypothalamus releases corticotropin-releasing hormone (CRH), which prompts the pituitary gland to release adrenocorticotropic hormone (ACTH), stimulating the adrenal glands to produce cortisol\u2014the primary stress hormone.\n\n2. **Physiological Impact**: \n   - **Cortisol Dysregulation**: Continuous high levels of cor