# Generating Counsellor-Patient Chat Conversations

DataDesigner: a general-purpose library for generating high-quality synthetic data, either from scratch or from your own seed data.<br>
Reference: https://github.com/NVIDIA-NeMo/DataDesigner

In [None]:
! pip install data-designer

Collecting data-designer
  Downloading data_designer-0.1.3-py3-none-any.whl.metadata (6.6 kB)
Collecting anyascii<1.0,>=0.3.3 (from data-designer)
  Downloading anyascii-0.3.3-py3-none-any.whl.metadata (1.6 kB)
Collecting duckdb==1.1.3 (from data-designer)
  Downloading duckdb-1.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (762 bytes)
Collecting faker==20.1.0 (from data-designer)
  Downloading Faker-20.1.0-py3-none-any.whl.metadata (15 kB)
Collecting httpx-retries>=0.4.2 (from data-designer)
  Downloading httpx_retries-0.4.5-py3-none-any.whl.metadata (4.1 kB)
Collecting json-repair==0.48.0 (from data-designer)
  Downloading json_repair-0.48.0-py3-none-any.whl.metadata (12 kB)
Collecting jsonpath-rust-bindings>=1.0 (from data-designer)
  Downloading jsonpath_rust_bindings-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (880 bytes)
Collecting litellm==1.73.6 (from data-designer)
  Downloading litellm-1.73.6-py3-none-any.whl.metadata (3

In [2]:
import os
from google.colab import userdata
import pandas as pd

from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    SamplerColumnConfig,
    SamplerType,
)

# Read the NVIDIA API key from the Colab secret named "NVIDIA" (create it in the Colab Secrets panel first)
os.environ["NVIDIA_API_KEY"] = userdata.get('NVIDIA')

# Initialize DataDesigner with its default model configuration (no explicit model_configs needed when NVIDIA_API_KEY is set)
data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# Sample one psychological illness type per record from a fixed list of categories
config_builder.add_column(
    SamplerColumnConfig(
        name="illness_type",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["anxiety", "OCD", "depression", "bipolar disorder"],
        ),
    )
)

# Generate the counsellor's initial message
config_builder.add_column(
    LLMTextColumnConfig(
        name="counsellor_message_1",
        model_alias="nvidia-text", # Using NVIDIA model
        prompt="""As a psychological counsellor, start a conversation with a patient who is dealing with {{ illness_type }}. Keep your tone empathetic and professional. Introduce yourself and ask about their current feelings.""",
    )
)

# Generate the patient's response
config_builder.add_column(
    LLMTextColumnConfig(
        name="patient_message_1",
        model_alias="nvidia-text", # Using NVIDIA model
        prompt="""I am a patient talking to a psychological counsellor. Respond to the counsellor's message: '{{ counsellor_message_1 }}'. Describe how your {{ illness_type }} is affecting you recently.""",
    )
)

# Generate the counsellor's second message, responding to the patient
config_builder.add_column(
    LLMTextColumnConfig(
        name="counsellor_message_2",
        model_alias="nvidia-text", # Using NVIDIA model
        prompt="""As a psychological counsellor, respond empathetically to the patient's message: '{{ patient_message_1 }}'. Acknowledge their feelings and ask a follow-up question related to their {{ illness_type }} or current situation.""",
    )
)

# Generate the patient's second response
config_builder.add_column(
    LLMTextColumnConfig(
        name="patient_message_2",
        model_alias="nvidia-text", # Using NVIDIA model
        prompt="""I am a patient talking to a psychological counsellor. Respond to the counsellor's message: '{{ counsellor_message_2 }}'. Continue to describe your experience with {{ illness_type }} and your feelings, or answer their question.""",
    )
)
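
Each prompt above refers to earlier columns through `{{ column_name }}` placeholders, so every turn is generated from the turn before it. The sketch below illustrates that chaining with plain string substitution. `render_prompt` is a hypothetical helper written for this illustration, not DataDesigner's own template engine, and the counsellor message is a made-up stand-in for LLM output.

```python
import re

# Hypothetical stand-in for template rendering; NOT DataDesigner's actual engine.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each {{ name }} with row[name]; raises KeyError if a column is missing."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


# One record: the sampled category plus a pretend first counsellor message.
row = {"illness_type": "anxiety"}
row["counsellor_message_1"] = "Hello, how are you feeling today?"

patient_prompt = (
    "Respond to the counsellor's message: '{{ counsellor_message_1 }}'. "
    "Describe how your {{ illness_type }} is affecting you recently."
)
rendered = render_prompt(patient_prompt, row)
print(rendered)
```

Because a placeholder can only be filled once its column exists, a missing column surfaces as an error rather than as an empty prompt.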




In [3]:
# Generate 50 records (chats) using the preview method with num_records
preview_results = data_designer.preview(config_builder=config_builder, num_records=50)



[13:32:45] [INFO] 👀 Preview generation in progress
[13:32:45] [INFO] ✅ Validation passed
[13:32:45] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[13:32:45] [INFO] 🩺 Running health checks for models...
[13:32:45] [INFO]   |-- 👀 Checking 'nvidia/nvidia-nemotron-nano-9b-v2' in provider named 'nvidia' for model alias 'nvidia-text'...
[13:32:47] [INFO]   |-- ✅ Passed!
[13:32:47] [INFO] 🎲 Preparing samplers to generate 50 records across 1 columns
[13:32:47] [INFO] 📝 Preparing llm-text column generation
[13:32:47] [INFO]   |-- column name: 'counsellor_message_1'
[13:32:47] [INFO]   |-- model config:
{
    "alias": "nvidia-text",
    "model": "nvidia/nvidia-nemotron-nano-9b-v2",
    "inference_parameters": {
        "temperature": 0.85,
        "top_p": 0.95,
        "max_tokens": null,
        "max_parallel_requests": 4,
        "timeout": null,
        "extra_body": null
    },
    "provider": "nvidia"
}
[13:32:47] [INFO] 🐙 Processing llm-text colu
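
The log reports that the column configs are sorted into a directed acyclic graph before generation. A minimal sketch of that idea, assuming dependencies are inferred from the `{{ ... }}` placeholders in each prompt: the prompts are abbreviated, `generation_order` is a hypothetical helper, and the library's real implementation may differ.

```python
import re
from graphlib import TopologicalSorter

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Column name -> abbreviated prompt, deliberately listed out of order.
# The sampler column has no prompt and therefore no dependencies.
prompts = {
    "patient_message_2": "Respond to '{{ counsellor_message_2 }}' about {{ illness_type }}.",
    "counsellor_message_2": "Respond to '{{ patient_message_1 }}' about {{ illness_type }}.",
    "illness_type": "",
    "patient_message_1": "Respond to '{{ counsellor_message_1 }}' about {{ illness_type }}.",
    "counsellor_message_1": "Start a conversation about {{ illness_type }}.",
}


def generation_order(prompts: dict) -> list:
    """Order columns so that every column comes after the columns it references."""
    deps = {name: set(PLACEHOLDER.findall(text)) for name, text in prompts.items()}
    return list(TopologicalSorter(deps).static_order())


order = generation_order(prompts)
print(order)
# ['illness_type', 'counsellor_message_1', 'patient_message_1',
#  'counsellor_message_2', 'patient_message_2']
```

The four message columns form a single chain, so there is only one valid order regardless of how the columns were added.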

In [5]:
df = preview_results.dataset

In [8]:
df = df[["counsellor_message_1","patient_message_1","counsellor_message_2","patient_message_2"]]

In [9]:
df

Unnamed: 0,counsellor_message_1,patient_message_1,counsellor_message_2,patient_message_2
0,"**Counselor:** *Hello, I’m [Your Name], and I’...","**Patient:** *Honestly, it’s been pretty overw...","**Counselor's Response:** \n""It sounds like y...",**Patient's Response:** \nThank you for askin...
1,"**Introduction and Opening:** \n""Hello, I’m [...","**Patient's Response:** \n""Thank you for crea...",**Response:** \nThank you for sharing this wi...,**Response:** \nThank you for asking about tr...
2,"**Hello, I’m Dr. [Your Name], a psychological ...","**Response:** \n""Thank you for your kind word...","**Response:** \n""Thank you for sharing this w...","**Response:** \n""I’ve tried taking deep breat..."
3,"**Counselor:** *Hello, I’m [Your Name], and I’...","**Patient's Response:** \n\n""Thank you for yo...",**Counselor's Response:** \n\nThank you for s...,**Patient's Response:** \n\nThank you for ask...
4,**Counselor:** *Smiling warmly and maintaining...,"**Patient:** \n""Thank you for creating such a...","**Counselor's Response:** \n""Thank you for sh...","**Patient's Response:** \n""Thank you for your..."
5,"**Introduction:** \n""Hello, I’m [Your Name], ...","**Response:** \n""Thank you for asking. Honest...","**Response:** \n""Thank you for sharing this w...","**Patient's Response:** \n""Thank you for aski..."
6,"**Introduction:** \n""Hello, my name is [Your ...","**Response from the Patient:** \n""Thank you f...","**Response from the Counsellor:** \n""Thank yo...",**Response from the Patient:** \nThank you fo...
7,"**Introduction:** \n""Hello, I’m [Your Name], ...","**Patient's Response:** \n""Thank you for bein...","**Counselor's Response:** \n""Thank you for sh...","**Patient's Response:** \n""Thanks for asking—..."
8,"**Counsellor:** \nHi, I’m [Your Name], and I’...",**Patient:** \nThank you for your kind messag...,**Response:** \nThank you so much for sharing...,**Response:** \nThank you for asking about th...
9,"**Hello, I’m [Your Name], and I’m a psychologi...","**Response:** \n""Thank you for asking that—it...","**Response:** \n""Thank you for sharing this w...","**Response:** \n""Thank you for asking about t..."
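
Many of the generated messages above begin with a bold role label such as `**Patient's Response:**`, and several still contain the literal `[Your Name]` placeholder. The following is a rough post-processing sketch, not part of DataDesigner: `strip_role_label` and `has_name_placeholder` are hypothetical helpers, and the regular expression only assumes that a label is a short bold run ending in a colon at the very start of the message.

```python
import re

# A leading bold label: "**", up to 60 non-asterisk characters, then ":**".
LABEL = re.compile(r"^\s*\*\*[^*\n]{1,60}:\*\*\s*")
NAME_PLACEHOLDER = "[Your Name]"


def strip_role_label(message: str) -> str:
    """Remove one leading label such as '**Counselor:**' or '**Response:**'."""
    return LABEL.sub("", message, count=1)


def has_name_placeholder(message: str) -> bool:
    """Flag messages where the model left the name template unfilled."""
    return NAME_PLACEHOLDER in message


labelled = "**Patient's Response:** \n\"Thank you for asking.\""
bold_opening = "**Hello, I'm Dr. [Your Name]**, welcome."

print(repr(strip_role_label(labelled)))      # the label and the newline after it are removed
print(repr(strip_role_label(bold_opening)))  # unchanged: bold text without a trailing colon
print(has_name_placeholder(bold_opening))
```

On the DataFrame, a helper like this could be applied per column, for example `df["patient_message_1"].map(strip_role_label)`.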


In [10]:
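# index=False omits the row index, so re-reading the file does not produce an "Unnamed: 0" column
df.to_csv("counsellor_patient_chat.csv", index=False)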
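
If the conversations are meant for chat-style tooling, each row can also be reshaped into an ordered list of turns. A minimal sketch under stated assumptions: the `role`/`content` keys and the role names (taken from the column-name prefixes) are illustrative choices, and `rows` is a hand-written stand-in for what `df.to_dict("records")` would return.

```python
import json

# The four message columns, in conversation order.
TURNS = [
    "counsellor_message_1",
    "patient_message_1",
    "counsellor_message_2",
    "patient_message_2",
]


def row_to_chat(row: dict) -> list:
    """Turn one record into a list of {'role', 'content'} dicts in turn order."""
    return [
        {"role": col.split("_message_")[0], "content": row[col]}
        for col in TURNS
    ]


# Stand-in for df.to_dict("records"); the texts are invented for illustration.
rows = [
    {
        "counsellor_message_1": "Hello, how are you feeling today?",
        "patient_message_1": "It has been a difficult week.",
        "counsellor_message_2": "Thank you for sharing. What made it difficult?",
        "patient_message_2": "Mostly trouble sleeping.",
    }
]

# One JSON object per line (JSON Lines), keeping non-ASCII characters readable.
jsonl_lines = [
    json.dumps({"messages": row_to_chat(r)}, ensure_ascii=False) for r in rows
]
print(jsonl_lines[0])
```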