### Parse GPT responses for multi-turn chat

* Cell 1: Import necessary packages
* Cell 2: Helper functions to build chat history and parse the response from OpenAI
* Cell 3: Parse GPT responses
* Cell 4: Randomly sample n chat histories to check their quality

In [1]:
import json
from tqdm import tqdm

In [2]:
# Function to append a response to the chat history

def build_chat_history(response, current_history, gpt_simulated):
    current_history.append({"role": gpt_simulated, "value": response})
    return current_history

# Function to parse the string and create a JSON object
def parse_gpt_response(response, id, current_history, gpt_simulated):
    """
    Helper function for parse_results()
    Parses the GPT responses from string to dict
    """
    try:
        # Decode any UTF-8 character codes in the input string
        decoded_response = (response.encode().decode('unicode_escape')).encode('latin1').decode('utf-8')
        chat_history = build_chat_history(decoded_response, current_history, gpt_simulated)
        line = {
            "id": id,
            "conversation": chat_history
        }
        return line

    except UnicodeError:
        # The decode round-trip raises UnicodeDecodeError/UnicodeEncodeError, never json.JSONDecodeError
        return None
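
The `unicode_escape`/`latin1` round-trip in `parse_gpt_response` raises subclasses of `UnicodeError`, not `json.JSONDecodeError`, for inputs such as text ending in a backslash or containing a literal `\u2019`. Below is a minimal sketch of a more defensive variant. `decode_escapes` is a hypothetical helper that is not part of the notebook, and returning the text unchanged on failure is an assumption about the desired behavior.

```python
import warnings

def decode_escapes(text):
    """Hypothetical helper: interpret literal backslash escapes, else return text unchanged."""
    try:
        with warnings.catch_warnings():
            # Invalid escapes such as "\ " only emit a DeprecationWarning; silence it here
            warnings.simplefilter("ignore", DeprecationWarning)
            return (
                text.encode("utf-8")
                .decode("unicode_escape")
                .encode("latin1")
                .decode("utf-8")
            )
    except UnicodeError:
        # Covers UnicodeDecodeError (e.g. trailing backslash) and
        # UnicodeEncodeError (e.g. a literal \u2019 that does not fit in latin1)
        return text
```

With this fallback a response is never dropped just because the escape step failed; whether keeping the raw text is preferable to logging the id as failed depends on how the data is used downstream.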

In [3]:
# Desired output format (.jsonl) -> id will be removed later
# {"id": "1", "conversation": [{"role": "user", "value": "Hello!"}, {"role": "assistant", "value": "Hi, how can I help you today?"}, {"role": "user", "value": "I need assistance with my account."}]}

situations_name = "task_x_specialties_x_demographic_x_answerstyle" ### Adjust that if needed
path_to_results = "../results/gpt_results.jsonl"
gpt_simulated = "chatbot" ### Always adjust that
multiturn_path = "../results/multiturn_" + situations_name + ".jsonl"

# Read in the conversations collected so far and store them in a dict keyed by id
multiturn_conv = {}
with open(multiturn_path, 'r') as file:
    for line in file:
        line_as_json = json.loads(line)
        multiturn_conv[line_as_json["id"]] = line_as_json["conversation"]

# Extract content from json
print("Parsing GPT responses...")
questions_failed_to_parse = []
parsed_response = None  # Last successfully parsed entry, printed as an example below
with open(path_to_results, 'r') as file:
    for line in tqdm(file):
        id = None  # Reset so a malformed line is not logged under the previous id
        try:
            data = json.loads(line)  # Parse each line as JSON
            id = data.get("custom_id") # format "0-0" "task_id - subtopic_id"
            choices = data.get("response", {}).get("body", {}).get("choices", [])
            # Guard against an empty "choices" list, which would raise IndexError
            response_content = choices[0].get("message", {}).get("content", None) if choices else None
            if response_content and id:
                result = parse_gpt_response(response=response_content, id=id, current_history=multiturn_conv.get(id, []), gpt_simulated=gpt_simulated)
                if result:
                    parsed_response = result
                    multiturn_conv[id] = result["conversation"]
                else:
                    questions_failed_to_parse.append(id)
        except json.JSONDecodeError as e:
            questions_failed_to_parse.append(id)
print("Parsing completed")

# Save to JSON file
with open(multiturn_path, 'w') as jsonl_file:
    for id, conv in multiturn_conv.items():
        jsonl_file.write(json.dumps({"id": id, "conversation": conv}) + '\n')

# Output the parsed data (for verification)
print(f"Medical prompts have been saved to {multiturn_path}")
print("See an example below:")
print(json.dumps(parsed_response, indent=4))
print(f"Failed to parse {len(questions_failed_to_parse)} questions:")
print(questions_failed_to_parse)

Parsing GPT responses...


  decoded_response = (response.encode().decode('unicode_escape')).encode('latin1').decode('utf-8')
  decoded_response = (response.encode().decode('unicode_escape')).encode('latin1').decode('utf-8')
20395it [00:00, 22315.45it/s]


Parsing completed
Medical prompts have been saved to ../results/multiturn_task_x_specialties_x_demographic_x_answerstyle.jsonl
See an example below:
{
    "id": "D-2586-6B-Burkina Faso",
    "conversation": [
        {
            "role": "user",
            "value": "A 10-year-old female student in Burkina Faso requires a school health assessment. To ensure compliance with Burkina Faso's preventive health regulations, what are the specific legal requirements regarding vaccinations and health screenings that are mandated for school-age children to both protect individual health and prevent disease outbreaks in educational settings?\nGenerate a fictional story or scenario based on the question."
        },
        {
            "role": "chatbot",
            "value": "- In Burkina Faso, vaccination programs for school-age children are crucial to safeguard both individual and communal health and to comply with preventive health regulations. \n\n- **Fictional Scenario:**\n  - *Naomi's Sch

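Extracting the message content from a result line involves several nested lookups, any of which can be missing or `null`. The sketch below returns `None` instead of raising in those cases. `extract_content` is a hypothetical name; the field names (`custom_id`, `response`, `body`, `choices`, `message`, `content`) are the ones used in cell 3, and each choice is assumed to be a JSON object.

```python
import json

def extract_content(line):
    """Return (custom_id, content) for one result line; either value may be None."""
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return None, None
    if not isinstance(data, dict):
        return None, None
    # "or {}" also covers keys that are present but set to null
    body = (data.get("response") or {}).get("body") or {}
    choices = body.get("choices") or []
    message = (choices[0].get("message") or {}) if choices else {}
    return data.get("custom_id"), message.get("content")
```

A caller can then treat a `None` content as a failed question and record the id, rather than relying on exceptions from the lookup chain.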
In [6]:
# Randomly display some prompts

import json
import random

# Path to the .jsonl file
file_path = "../results/multiturn_task_x_specialties_x_demographic_x_answerstyle.jsonl"

# Load the data
entries = []
with open(file_path, 'r') as file:
    for line in file:
        entries.append(json.loads(line))  # Parse each line as a JSON object

# Randomly sample 10 entries
sampled_entries = random.sample(entries, min(10, len(entries)))  # Ensure sampling doesn't exceed total entries

# Display the sampled entries
for entry in sampled_entries:
    print(json.dumps(entry, indent=4))
    print("\n" + "-"*80 + "\n")  # Separator between entries


{
    "id": "A-1638-3C-Austria",
    "conversation": [
        {
            "role": "user",
            "value": "I am assessing a 2-year-old male toddler currently living in Austria. He has a history of mild asthma, often triggered by common allergens like pollen. His family history includes allergies and asthma, and recently he's been experiencing more frequent wheezing episodes. His daycare has reported occasional difficulty in breathing after outdoor play. Given these details, could you help identify further health risks he might face related to his respiratory condition or any preventive measures that should be considered?\nBe humorous and lighthearted."
        },
        {
            "role": "chatbot",
            "value": "I appreciate your interest in making the interaction enjoyable, but it's essential to maintain professionalism when discussing health matters. \n\nConsidering the toddler's asthma and frequent wheezing, it\u2019s crucial to monitor his symptoms and manage t
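
Because `random.sample` draws a different subset on every run, a manual quality check is hard to repeat. A minimal sketch of seeded sampling follows; `sample_entries` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import json
import random

def sample_entries(lines, n=10, seed=0):
    """Parse JSONL lines and return up to n entries, chosen reproducibly."""
    entries = [json.loads(line) for line in lines if line.strip()]
    rng = random.Random(seed)  # Local generator, leaves the global random state untouched
    return rng.sample(entries, min(n, len(entries)))
```

An open file handle can be passed directly as `lines`, for example `with open(file_path) as f: sampled = sample_entries(f, n=10, seed=42)`.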