**Colab Execution:** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tasarorcun/data-science-playbook/blob/main/04-genai-and-agents/001_working_with_openai_api.ipynb)

# 0. Import Generic Packages

In [2]:
# ignore notebook warnings
import warnings
warnings.filterwarnings('ignore')

# os
import os

# sys
import sys

# 1. OpenAI API Integration Patterns

**Objective:** Establish a robust, programmatic connection to OpenAI's inference engine for clinical and operational workflows.

## 1.1 Architectural Context: GUI vs. API

While browser-based interfaces (like ChatGPT) function as **SaaS** (Software as a Service) for human interaction, they lack the scalability required for engineering tasks.

This notebook focuses on the **OpenAI API**, which provides a **RESTful interface** to:
* **Automate Workflows:** Process bulk datasets (e.g., 1,000+ clinical summaries) without manual input.
* **Integrate Backend Logic:** Embed LLM capabilities directly into proprietary web applications or data pipelines.
* **Maintain Statelessness:** Interact with models programmatically, where each request is independent and configurable.

> **Note:** Unlike local model deployment, the API delegates computational load to OpenAI's infrastructure, requiring strict management of **Authentication** and **Latency**.

## 1.2 Credential Management & Security Protocol

Access to the inference engine requires a unique **Secret API Key**. This key authenticates requests and tracks token usage for billing purposes.

**Generation Workflow:**
1.  Navigate to the [OpenAI Platform Dashboard](https://platform.openai.com/api-keys).
2.  Generate a new secret key (`sk-...`).
3.  **Storage:** The key is displayed only once. It must be stored immediately in a secure credential manager (e.g., 1Password, Vault).

### Secrets Management Strategy
Hardcoding credentials (e.g., `api_key = "sk-..."`) directly into source code is a **critical security vulnerability**, particularly in collaborative environments using Version Control (Git).

**Production Standards:**
* **Dynamic Loading:** Credentials should be injected via **OS Environment Variables** or restricted local configuration files (`.env` / `.txt`).
* **Version Control:** Ensure all credential files are explicitly listed in `.gitignore` to prevent accidental leakage to public repositories.
* **Key Rotation:** Implement regular key rotation policies to minimize impact in case of compromise.
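
A minimal sketch of the dynamic-loading standard above, assuming the key has already been exported as `OPENAI_API_KEY` by the shell or the deployment platform. The helper names `get_api_key_from_env` and `mask_key` are hypothetical and not part of the OpenAI SDK; they only read the environment and fail loudly when the variable is absent.

```python
import os

def get_api_key_from_env(var_name: str = 'OPENAI_API_KEY') -> str:
    '''Return the API key stored in an environment variable (hypothetical helper).'''
    api_key = os.environ.get(var_name, '').strip()
    if not api_key:
        # Fail early with an actionable message instead of a late authentication error
        raise RuntimeError(f'Environment variable {var_name} is not set.')
    return api_key

def mask_key(api_key: str) -> str:
    '''Show only the first and last four characters, e.g. for log messages.'''
    return f'{api_key[:4]}...{api_key[-4:]}'
```

Reading from the environment keeps the secret out of the notebook file itself, so nothing sensitive is committed even if `.gitignore` is misconfigured.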

In [3]:
def load_api_key(filepath: str) -> str:

    '''
    Process:
        Securely loads the API key from a local file and injects it into the environment.
    Args:
        filepath (str): Relative path to the secret key file.
    Returns:
        str: The loaded API key (unmasked; only the printed confirmation is masked).
    Raises:
        ValueError: If the key does not start with "sk-".
        SystemExit: If the credential file is missing.
    '''

    try:
        with open(filepath, 'r') as file:
            api_key = file.read().strip()

            # Simple validation
            if not api_key.startswith('sk-'):
                raise ValueError('Invalid API key format. Key must start with "sk-".')

            # Set as Environment Variable for implicit client authentication
            os.environ['OPENAI_API_KEY'] = api_key

            # Security: Never print the full key in outputs
            masked_key = f'{api_key[:4]}...{api_key[-4:]}'
            print(f'Authentication successful. Key loaded: {masked_key}')
            return api_key

    except FileNotFoundError:
        print(f'Credential file not found at {filepath}')
        print(f'Hint: Ensure "openai_api.txt" exists in the "api_keys" directory.')
        sys.exit(1)

In [4]:
# Execute the load_api_key() function
key_path = os.path.join('.', 'api_keys', 'openai_api.txt')
_ = load_api_key(key_path)

Authentication successful. Key loaded: sk-p...nJoA


## 1.3 Client Initialization

We instantiate the `OpenAI` client to act as the interface gateway.

**Note on Authentication:**
Since we injected `OPENAI_API_KEY` into the environment variables in the previous step, the client automatically detects credentials without explicit argument passing. This enforces a **clean separation of concerns**.

In [5]:
# import OpenAI
from openai import OpenAI

# Initialize client
# The library automatically looks for "OPENAI_API_KEY" in os.environ
client = OpenAI()

# Connection test
# A simple API call to verify the authentication immediately
try:
    models = client.models.list()
    first_model = models.data[0].id
    print('Connection established successfully.')
    print(f'Endpoint reachable. First available model: {first_model}')
except Exception as e:
    print(f'Connection failed: {str(e)}')
    print('Check your API key and internet connection!')

Connection established successfully.
Endpoint reachable. First available model: gpt-4-0613


## 1.4 Basic Interaction Pattern (Stateless)

The core interaction method is `client.chat.completions.create`. The API operates on a **Stateless Protocol**, meaning each request is isolated and retains no memory of previous interactions.

### SDK Method Anatomy

Understanding the Python client structure ensures correct resource targeting:

* **`client`**: The authenticated instance managing connection pooling and configuration.
* **`.chat`**: The namespace targeting chat-based Large Language Models (distinct from `.images` or `.audio`).
* **`.completions`**: The specific resource group handling text generation tasks.
* **`.create()`**: The execution method that triggers the **Synchronous HTTP POST** request to the API endpoint.

### Key Request Parameters
* **`model`**: The inference engine ID (e.g., `gpt-4o-mini` for latency-sensitive tasks).
* **`messages`**: An array of message objects defining the conversation history.
* **`role`**: Defines the entity speaking (`system`, `user`, or `assistant`).
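
For orientation, the SDK call maps onto a plain HTTPS request. The sketch below only assembles the request pieces and does not send anything. The helper name `build_chat_request` is hypothetical and the key is a placeholder; the endpoint path and Bearer-token header follow OpenAI's public REST conventions.

```python
import json

def build_chat_request(api_key: str, model: str, messages: list, temperature: float = 0.0) -> dict:
    '''Assemble URL, headers, and JSON body for a chat completion request (illustration only).'''
    return {
        'url': 'https://api.openai.com/v1/chat/completions',
        'headers': {
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        },
        # The body carries the same fields that are passed to client.chat.completions.create
        'body': json.dumps({
            'model': model,
            'messages': messages,
            'temperature': temperature
        })
    }

request = build_chat_request(
    api_key = 'sk-placeholder',  # placeholder, not a real key
    model = 'gpt-4o-mini',
    messages = [{'role': 'user', 'content': 'What is the OpenAI API? Explain in one sentence.'}]
)
print(request['url'])
print(request['body'])
```

Because every request carries its full `messages` array in the body, nothing about earlier calls is implied by the transport itself.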

In [14]:
try:
    response = client.chat.completions.create(
        model = 'gpt-4o-mini',
        messages = [
            {
                'role': 'user',
                'content': 'What is the OpenAI API? Explain in one sentence.'
            }
        ],
        temperature = 0
    )

    # Extract payload content
    print(f'Output: {response.choices[0].message.content}')

except Exception as e:
    print(f'Request failed: {str(e)}')

Output: The OpenAI API is a cloud-based service that allows developers to integrate advanced artificial intelligence models, such as language processing and generation capabilities, into their applications.


## 1.5 Functional Abstraction

Direct API calls require repetitive boilerplate code (client initialization, message formatting, parameter tuning).

To adhere to the **DRY (Don't Repeat Yourself)** principle, we encapsulate this logic into a reusable function. This abstraction layer allows us to:
1.  **Standardize Parameters:** Define default settings in one place (callers should pass `temperature=0` whenever clinical consistency is required).
2.  **Simplify Interfaces:** Reduce the API surface area to a single function call.
3.  **Decouple Logic:** Easily swap models (e.g., `gpt-4o` vs. `gpt-4o-mini`) without rewriting downstream code.

In [9]:
def get_clinical_response(prompt: str, model: str = 'gpt-4o-mini', temperature: float = 0.8) -> str:

    '''
    Process:
        Executes a prompt against the OpenAI API and returns the extracted text.
    Args:
        prompt (str): The user query or instruction.
        model (str): Target model ID. Default is "gpt-4o-mini"
        temperature (float): Randomness of the response. Default is 0.8.
    Returns:
        str: The generated response text.
    '''

    response = client.chat.completions.create(
        model = model,
        messages = [
            {
                'role': 'user',
                'content': prompt
            }
        ],
        temperature = temperature
    )

    return response.choices[0].message.content

In [16]:
# Domain specific example
user_prompt = 'Explain me in two sentences: what RWD and RWE are in the field of healthcare.'
print(get_clinical_response(prompt = user_prompt))


Real-World Data (RWD) refers to the data collected from various sources outside of traditional clinical trials, such as electronic health records, claims data, and patient registries. Real-World Evidence (RWE) is the analysis and interpretation of this data to assess the effectiveness, safety, and value of medical interventions and treatments in everyday clinical settings.


In [17]:
# Another domain specific example
user_prompt = 'What was the year Next Generation Sequencing first released?'
print(get_clinical_response(prompt = user_prompt))

Next Generation Sequencing (NGS) technologies began to emerge in the mid-2000s. The first commercial NGS instruments were introduced around 2005, with the release of platforms such as the 454 Life Sciences Genome Sequencer. This marked a significant advancement in sequencing technology, allowing for rapid and high-throughput sequencing of DNA.


# 2. Tokenomics & Cost Engineering

In large-scale ETL pipelines (e.g., processing 500,000+ Electronic Health Records), API costs become a critical architectural constraint.

**Cost Dynamics:**
* **Asymmetry:** For `gpt-4o-mini`, output tokens are **4x more expensive** than input tokens ($0.60 vs $0.15 per 1M).
* **Optimization Strategy:**
    1.  **Input:** Use concise context; trim unnecessary whitespace/boilerplate.
    2.  **Output:** Enforce strict output schemas (JSON) to prevent the model from "yapping" (generating verbose filler text that is still billed at the higher output rate).
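
To make the asymmetry concrete, here is a rough budgeting sketch using the `gpt-4o-mini` prices quoted above. The function name `project_batch_cost` and the per-record token averages are illustrative assumptions, not measurements from this notebook.

```python
def project_batch_cost(n_requests: int,
                       avg_input_tokens: float,
                       avg_output_tokens: float,
                       input_price_per_1m: float = 0.15,
                       output_price_per_1m: float = 0.60) -> dict:
    '''Estimate the input, output, and total cost (USD) of a batch job.'''
    input_cost = n_requests * avg_input_tokens / 1_000_000 * input_price_per_1m
    output_cost = n_requests * avg_output_tokens / 1_000_000 * output_price_per_1m
    return {'input': input_cost, 'output': output_cost, 'total': input_cost + output_cost}

# Assumed workload: 500,000 records, ~800 input and ~100 output tokens each
estimate = project_batch_cost(500_000, avg_input_tokens = 800, avg_output_tokens = 100)
print({name: round(cost, 2) for name, cost in estimate.items()})
```

Under these assumed averages, output makes up one ninth of the tokens but one third of the bill, which is why trimming verbose responses tends to pay off faster than trimming prompts.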

The following script estimates the cost of a single transaction to forecast batch processing budgets.

In [None]:
# 1. Define model pricing (per 1M tokens) - Source: OpenAI pricing page
model_pricing = {
    'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
    'gpt-4o': {'input': 2.50, 'output': 10.00}
}

# Create function
def calculate_request_cost(usage, model: str = 'gpt-4o-mini') -> float:

    '''
    Process:
        Calculates the precise cost of an API call based on current pricing.
    Args:
        usage (openai.types.completion_usage.CompletionUsage): OpenAI object holding used token numbers.
        model (str): OpenAI model ID.
    Returns:
        float: Total price (price for input + price for output).
    '''

    # get the tariff
    prices = model_pricing.get(model, model_pricing['gpt-4o-mini'])

    # calculate price for input & output tokens
    input_price = (usage.prompt_tokens / 1_000_000) * prices['input']
    output_price = (usage.completion_tokens / 1_000_000) * prices['output']

    return input_price + output_price

In [27]:
# Execution
clinical_prompt = '''
Patient Report: 45-year-old male initiated on Metformin 500mg BID. 
Follow-up at 2 weeks: Patient reports mild gastrointestinal discomfort and nausea. 
No signs of lactic acidosis. Vitals stable.

Task: Extract any Adverse Drug Reactions (ADRs) mentioned. 
Format: List only.
'''

# Get response
response = client.chat.completions.create(
    model = 'gpt-4o-mini',
    messages = [
        {
            'role': 'user',
            'content': clinical_prompt
        }
    ],
    temperature = 0
)

# Cost analysis
# Price the call with the same model that served the request
total_cost = calculate_request_cost(usage = response.usage,
                                    model = 'gpt-4o-mini')

# Prints
print(f'Response text:\n{response.choices[0].message.content}')
print(f'Number of input tokens: {response.usage.prompt_tokens}')
print(f'Number of output tokens: {response.usage.completion_tokens}')
print(f'Single response cost: ${total_cost:.8f}')
print(f'Projected cost for 1M responses: ${total_cost * 1_000_000:,.2f}')


Response text:
- Mild gastrointestinal discomfort
- Nausea
Number of input tokens: 75
Number of output tokens: 9
Single response cost: $0.00001665
Projected cost for 1M responses: $16.65


# 3. Parameter Tuning: Control of Stochasticity & Randomness

The `temperature` parameter (0.0 to 2.0) controls the entropy of the token sampling distribution.

### Clinical Strategy
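* **Deterministic ($T=0$):** Makes the model select the highest-probability token at each step (greedy decoding), so outputs are highly repeatable, although the API does not guarantee bit-identical results across calls.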
    * *Use Case:* Data Extraction, coding (ICD-10), structured JSON output.
* **Stochastic ($T > 0.7$):** Flattens the probability curve, introducing variance.
    * *Use Case:* Patient communication drafts, synthetic data generation, brainstorming.

> **Warning:** In GxP (Good Practice) environments, reproducibility is key. Unless generating creative content, always default to `temperature=0`.
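
The effect of temperature can be illustrated with a toy softmax over made-up logits. This is a conceptual sketch only: the logits are invented, the function name `softmax_with_temperature` is our own, and the code does not reproduce the API's internal sampling implementation.

```python
import math

def softmax_with_temperature(logits: list, temperature: float) -> list:
    '''Convert logits to probabilities after dividing them by the temperature.'''
    if temperature <= 0:
        # Treat T=0 as greedy decoding: all probability mass on the largest logit
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens

for t in (0.0, 0.7, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f'T={t}: {[round(p, 3) for p in probs]}')
```

Lower temperatures concentrate probability on the top candidate, while higher temperatures spread it across the alternatives, which is where the variance in repeated calls comes from.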

In [None]:
def test_temperature(prompt: str, temp: float, iterations: int = 3):

    '''
    Process:
        Executes a prompt multiple times to empirically observe the impact of temperature on output stability.
        Serves as an informal reproducibility check (it prints outputs rather than asserting on them):
        it shows whether the model behaves near-deterministically (at T=0) or introduces variance (at T>0.7),
        which is critical for validating GxP-compliant workflows.
    Args:
        prompt (str): The input query to be tested.
        temp (float): The sampling temperature (0.0 = deterministic, 2.0 = highly stochastic).
        iterations (int): Number of sequential API calls to perform (Default: 3).
    '''

    print(f'##### Testing temperature: {temp}')
    print('-' * 20)

    for i in range(iterations):
        response = client.chat.completions.create(
            model = 'gpt-4o-mini',
            temperature = temp,
            messages = [
                {
                    'role': 'user',
                    'content': prompt
                }
            ]
        )

        print(f'Iteration {i + 1}: {response.choices[0].message.content}')

Execution of the test function:

In [None]:
test_prompt = 'Complete this sentence in less than 5 words: "The most critical factor in drug discovery is..."'

# 1. Deterministic test (Stability - Less Variance)
test_temperature(test_prompt, temp = 0.0)

# 2. Stochastic test (Creativity - High Variance)
test_temperature(test_prompt, temp = 2.0)

##### Testing temperature: 0.0
--------------------
Iteration 1: "...target identification and validation."
Iteration 2: "...target identification and validation."
Iteration 3: "...target identification and validation."
##### Testing temperature: 2.0
--------------------
Iteration 1: "target selection and verification."
Iteration 2: "target identification and validation."
Iteration 3: "...understanding disease mechanisms accurately."


# 4. Shot Prompting

LLMs often perform markedly better when provided with exemplar pairs (Input -> Output) within the context window. This technique, known as **In-Context Learning**, steers the model's behavior without the need for parameter fine-tuning.

### Strategies
* **Zero-Shot:** Direct instruction with no prior examples. Useful for general knowledge tasks.
* **One-Shot:** Providing a single example to establish output format structure.
* **Few-Shot:** Providing multiple examples (3-5) to teach complex logic, nuance, and domain-specific taxonomy.

> **Use Case:** Classifying adverse events in unstructured clinical notes into standard toxicity grades (e.g., Mild, Moderate, Severe).
>
> **Scenario:** Classifying Patient Feedback into Toxicity Grades (1-3)<br>
> - 1: Mild (Does not interfere with daily activities)<br>
> - 2: Moderate (Interferes with daily activities)<br>
> - 3: Severe (Requires medical intervention)
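
The three strategies differ only in how many exemplars are placed in the prompt. A small, hypothetical helper (`build_shot_prompt` is not part of any library) makes that explicit: an empty example list yields a zero-shot prompt, one pair a one-shot prompt, and several pairs a few-shot prompt.

```python
def build_shot_prompt(instruction: str, examples: list, inputs: list) -> str:
    '''Assemble a prompt from an instruction, optional (input, output) examples, and new inputs.'''
    lines = [instruction.strip(), '']
    if examples:
        lines.append('Examples:')
        for text, label in examples:
            lines.append(f'- "{text}" -> {label}')
        lines.append('')
    lines.append('Now classify these:')
    for i, text in enumerate(inputs, start = 1):
        lines.append(f'{i}. "{text}"')
    return '\n'.join(lines)

grading_examples = [
    ("I have a runny nose but I'm fine.", 'Grade 1 (No interference)'),
    ('I threw up all morning and missed my appointment.', 'Grade 2 (Interferes with activity)')
]

print(build_shot_prompt(
    instruction = 'Classify the severity (Grade 1-3) of the patient reports based on functional impairment.',
    examples = grading_examples,
    inputs = ['Just a slight headache.']
))
```

Keeping the examples in a list rather than hardcoding them in the prompt string makes it easy to compare zero-, one-, and few-shot variants on the same inputs.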

In [None]:
# --- 1. Zero-Shot Prompting (No Guidance) ---

prompt_zero = '''
Classify the severity (Grade 1-3, (Mild, Moderate, Severe)) of the following patient reports:

1. "I feel a bit dizzy when I stand up fast."
2. "The nausea is so bad I can't go to work."
3. "My skin is peeling off and it burns like fire."
4. "Just a slight headache."
'''

print('##--- Zero-Shot Output ---##')
print(get_clinical_response(prompt_zero, temperature = 0.0))

##--- Zero-Shot Output ---##
Based on the provided patient reports, here is the classification of severity:

1. "I feel a bit dizzy when I stand up fast." - **Grade 1 (Mild)**
2. "The nausea is so bad I can't go to work." - **Grade 3 (Severe)**
3. "My skin is peeling off and it burns like fire." - **Grade 3 (Severe)**
4. "Just a slight headache." - **Grade 1 (Mild)**


In [None]:
# --- 2. Few-Shot Prompting (Pattern Injection) ---

prompt_few = '''
Classify the severity (Grade 1-3) of the patient reports based on functional impairment.

Examples:
- "I have a runny nose but I'm fine." -> Grade 1 (No interference)
- "I threw up all morning and missed my appointment." -> Grade 2 (Interferes with activity)
- "I had to go to the ER because I couldn't breathe." -> Grade 3 (Medical intervention)

Now classify these:
1. "I feel a bit dizzy when I stand up fast."
2. "The nausea is so bad I can't go to work."
3. "My skin is peeling off and it burns like fire."
4. "Just a slight headache."
'''

print('##--- Few-Shot Output ---##')
print(get_clinical_response(prompt_few, temperature = 0.0))

##--- Few-Shot Output ---##
Here are the classifications based on the provided examples:

1. "I feel a bit dizzy when I stand up fast." -> Grade 1 (No interference)
2. "The nausea is so bad I can't go to work." -> Grade 2 (Interferes with activity)
3. "My skin is peeling off and it burns like fire." -> Grade 3 (Medical intervention)
4. "Just a slight headache." -> Grade 1 (No interference)


In [13]:
# --- 3. One-Shot Prompting (Schema Enforcement) ---

# Use Case: We don't need to teach the model 'medicine',
# we just need to enforce a strict JSON format.

prompt_one = '''
Extract medication entities into a JSON object with keys: 'drug', 'strength', 'frequency'.

Example:
Input: "Patient takes 500mg Tylenol twice a day."
Output: {"drug": "Tylenol", "strength": "500mg", "frequency": "BID"}

Task:
Input: "Prescribed Lipitor 20mg every evening for cholesterol."
Output:
'''

print('##--- One-Shot Output ---##')
print(get_clinical_response(prompt_one, temperature = 0.0))

##--- One-Shot Output ---##
```json
{"drug": "Lipitor", "strength": "20mg", "frequency": "QHS"}
```
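
Note that the reply above arrives wrapped in a Markdown code fence, so passing it straight to `json.loads` would fail. A minimal sketch of a tolerant parser follows; `parse_json_reply` is a hypothetical helper, and it assumes the reply contains a single JSON object with nothing else around the fence.

```python
import json
import re

FENCE = '`' * 3  # three backticks, built programmatically to keep this example readable

# Optional opening fence (with or without a "json" tag), the payload, then the closing fence
FENCED_REPLY = re.compile(rf'^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$', flags = re.DOTALL)

def parse_json_reply(reply: str) -> dict:
    '''Strip an optional Markdown code fence and parse the remaining JSON object.'''
    text = reply.strip()
    match = FENCED_REPLY.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)

fenced_reply = FENCE + 'json\n{"drug": "Lipitor", "strength": "20mg", "frequency": "QHS"}\n' + FENCE
medication = parse_json_reply(fenced_reply)
print(medication['drug'], medication['frequency'])
```

Unfenced replies pass through unchanged, so the same function can be used whether or not the model decides to add the fence.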


# 5. Context Architecture: Role Semantics

Effective prompt engineering relies on the structured interaction between three distinct roles within the `messages` array. Understanding these roles is critical for designing **Stateless** (Single-turn) vs. **Stateful** (Multi-turn) systems.

### The Roles Schema

* **1. System (`system`): Global Behavior & Guardrails**
    * Sets the metaprompt that persists throughout the session.
    * Defines the persona, output format, and safety constraints (e.g., *"You are a Clinical Decision Support System. Do not provide medical diagnoses."*).
    * *Engineering Note:* The system prompt is weighted heavily by the model to steer overall behavior.

* **2. User (`user`): The Signal**
    * Represents the external input, query, or instruction triggering the inference.

* **3. Assistant (`assistant`): Memory & Pattern Injection**
    * **Primary Use:** Stores prior model responses to simulate "Memory" in a conversational application.
    * **Advanced Use (Few-Shot):** Developers can "fake" assistant messages to provide ideal response examples (Input -> Output pairs) within the context window, without fine-tuning the model.

## 5.1 The System Role: Guardrails & Persona

The `system` message is the most powerful lever for steering model behavior. In Life Sciences applications, it is primarily used for:

1.  **Tone Calibration:** Adjusting the complexity and empathy level based on the audience (e.g., Patient vs. Clinician).
2.  **Regulatory Guardrails:** Explicitly forbidding the model from performing restricted actions, such as making diagnoses or prescribing medication.

> **Engineering Note:** Strong system prompts support **GxP compliance** by reducing the risk that the AI hallucinates medical credentials; they lower that risk but do not eliminate it.

In [18]:
# --- Scenario A: Tone Calibration (Patient-Centric) ---
# Goal: Explain a complex concept to a patient with empathy

system_persona = '''
You are an empathetic Clinical Care Coordinator. 
Your goal is to explain medical concepts to patients in simple, reassuring language.
Avoid technical jargon.
Your explanation should not exceed 2 sentences.
'''

user_query = 'The doctor said I have "Idiopathic Pulmonary Fibrosis". It sounds scary. What is it?'

print('##--- Scenario A: Tone Calibration - Empathetic Explanation (Patient-Centric) ---##')

response = client.chat.completions.create(
    model = 'gpt-4o-mini',
    temperature = 0.8, # for a slightly more natural, human tone
    messages = [
        {
            'role': 'system',
            'content': system_persona
        },
        {
            'role': 'user',
            'content': user_query
        }
    ]
)

print(f'Patient: {user_query}')
print(f'Bot: {response.choices[0].message.content}')

##--- Scenario A: Tone Calibration - Empathetic Explanation (Patient-Centric) ---##
Patient: The doctor said I have "Idiopathic Pulmonary Fibrosis". It sounds scary. What is it?
Bot: Idiopathic Pulmonary Fibrosis is a condition where the lungs become stiff and make it harder to breathe, but it’s important to know that you’re not alone, and there are ways to manage it. Your doctor will work with you to find the best treatment options to help you feel better.


In [19]:
# --- Scenario B: Regulatory Guardrails (Safety) ---
# Goal: Prevent the model from giving specific medical advice.

system_guardrail = '''
You are a Medical Information Assistant. 
You provide information based on package inserts.
CRITICAL RULE: You are NOT a doctor. DO NOT provide diagnosis or treatment advice. 
If asked for advice, respond with standard disclaimer:
"I cannot provide medical advice. Please consult your physician."
'''

risky_query = 'I have a sharp pain in my left chest. Should I take Aspirin?'

print('##--- Scenario B: Safety Boundary ---##')

response = client.chat.completions.create(
    model = 'gpt-4o-mini',
    temperature = 0.8,
    messages = [
        {
            'role': 'system',
            'content': system_guardrail
        },
        {
            'role': 'user',
            'content': risky_query
        }
    ]
)

print(f'Patient: {risky_query}')
print(f'Bot: {response.choices[0].message.content}')

##--- Scenario B: Safety Boundary ---##
Patient: I have a sharp pain in my left chest. Should I take Aspirin?
Bot: I cannot provide medical advice. Please consult your physician.


## 5.2 The Assistant Role: Few-Shot Pattern Injection

While the `assistant` role is primarily used for storing conversation history, it is also a powerful tool for **In-Context Learning**.

By pre-populating the context window with "fake" User-Assistant pairs, we can steer the model's output format and logic without explicit instructions. This technique could be more effective than complex System Prompts for enforcing **standardized nomenclature** (e.g., mapping symptoms to ICD-10 codes).
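
The pattern amounts to turning (input, output) pairs into alternating `user` and `assistant` messages placed between the system prompt and the real query. A minimal sketch, with `pairs_to_messages` as a hypothetical helper name:

```python
def pairs_to_messages(system_prompt: str, example_pairs: list, query: str) -> list:
    '''Build a messages list with few-shot examples injected as prior conversation turns.'''
    messages = [{'role': 'system', 'content': system_prompt}]
    for example_input, ideal_output in example_pairs:
        messages.append({'role': 'user', 'content': example_input})
        messages.append({'role': 'assistant', 'content': ideal_output})
    # The real query always comes last so the model answers it in the demonstrated style
    messages.append({'role': 'user', 'content': query})
    return messages

demo_messages = pairs_to_messages(
    system_prompt = 'You are a Medical Coding Assistant.',
    example_pairs = [('high bp', 'Essential (primary) hypertension (I10)')],
    query = 'feeling of heart racing'
)
for message in demo_messages:
    print(message['role'], '|', message['content'])
```

The resulting list can be passed directly as the `messages` argument of a chat completion request.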

In [20]:
def normalize_diagnosis(raw_input: str) -> str:

    '''
    Process:
        Maps unstructured clinical notes to standardized ICD-10 codes
        using Few-Shot examples injected via the "assistant" role.
    Args:
        raw_input (str): user query.
    Returns:
        str: response message content.
    '''

    # 1. System: Define the task
    system_injection = '''
    You are a Medical Coding Assistant.
    Map the input description to the closest standard ICD-10 name and code.
    Output ONLY the code and name.
    '''

    # 2. Few-Shot history
    few_shot_examples = [
        {"role": "user", "content": "pt complains of chest pain"},
        {"role": "assistant", "content": "Chest pain, unspecified (R07.9)"},
        
        {"role": "user", "content": "type 2 diabetes with kidney issues"},
        {"role": "assistant", "content": "Type 2 diabetes mellitus with kidney complications (E11.2)"},
        
        {"role": "user", "content": "high bp"},
        {"role": "assistant", "content": "Essential (primary) hypertension (I10)"}
    ]

    # 3. Construct the final payload
    messages = [{'role': 'system', 'content': system_injection}]
    messages.extend(few_shot_examples)
    messages.append({'role': 'user', 'content': raw_input})

    # 4. Execute
    response = client.chat.completions.create(
        model = 'gpt-4o-mini',
        temperature = 0, # this task requires deterministic output
        messages = messages
    )

    return response.choices[0].message.content

In [21]:
# --- Test cases ---
print(f'Input: "patient has a really bad headache"')
print(f'Output: {normalize_diagnosis("patient has a really bad headache")}\n')

print(f'Input: "feeling of heart racing"')
print(f'Output: {normalize_diagnosis("feeling of heart racing")}')

Input: "patient has a really bad headache"
Output: Headache, unspecified (R51)

Input: "feeling of heart racing"
Output: Palpitations (R00.2)


# 6. State Management: Handling Conversations

The OpenAI API is **stateless**. If you send a follow-up query about a patient mentioned in a previous request, the model will not "remember" the patient unless you explicitly re-send the entire conversation history.

In **RWD (Real World Data)** applications, constructing a patient's timeline (Longitudinal History) requires maintaining a persistent `messages` list (Context Window) that grows with each interaction.

> **Note:** As the conversation grows, we consume more input tokens per request. In production, strategies like **"Context Window Sliding"** or **"Summarization"** are required to stay within token limits (e.g., 128k context).
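
A minimal sketch of the sliding-window idea, assuming a simple policy of keeping the system message plus the most recent turns. The function name `trim_history` and the message-count budget are illustrative assumptions; a production version would more likely budget by token count.

```python
def trim_history(messages: list, max_recent: int = 4) -> list:
    '''Keep all system messages plus the last `max_recent` non-system messages.'''
    system_messages = [m for m in messages if m['role'] == 'system']
    other_messages = [m for m in messages if m['role'] != 'system']
    recent = other_messages[-max_recent:] if max_recent > 0 else []
    return system_messages + recent

# Simulated five-turn conversation (placeholder content, no API calls)
history = [{'role': 'system', 'content': 'You are a Clinical Case Summarizer.'}]
for turn in range(1, 6):
    history.append({'role': 'user', 'content': f'question {turn}'})
    history.append({'role': 'assistant', 'content': f'answer {turn}'})

trimmed = trim_history(history, max_recent = 4)
print([m['content'] for m in trimmed])
```

The trade-off is that anything outside the window is forgotten, which is why summarizing older turns into a single message is the usual complement to this approach.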

In [27]:
# --- The Mechanics of State Management ---

# 1. Initialize an empty session (The Electronic Health Record - EHR Buffer)

messages_buffer = [
    {
        'role': 'system',
        'content': 'You are a Clinical Case Summarizer. Maintain a professional tone.'
    }
]

def update_and_chat(new_input: str) -> str:

    '''
    Process:
        Manually appends new input to the buffer.
        Sends the FULL buffer to the API.
        Appends the response back to the buffer.
    Args:
        new_input (str): New user input / query / prompt.
    Returns:
        str: The assistant's reply for this turn.
    '''

    # Append user input
    messages_buffer.append({'role': 'user', 'content': new_input})

    # Send the WHOLE history to the API
    response = client.chat.completions.create(
        model = 'gpt-4o-mini',
        temperature = 0,
        messages = messages_buffer
    )

    answer = response.choices[0].message.content

    # Append assistant response
    messages_buffer.append({'role': 'assistant', 'content': answer})

    return answer

In [28]:
print('##--- Step 1: Admission ---##')
print(update_and_chat('Patient: 65-year-old male admitted with dyspnea.'))
print('\n', '-' * 20)
# Model now knows age, gender, and symptom.

print('##--- Step 2: Vitals ---##')
print(update_and_chat('BP: 140/90, HR: 110. SpO2: 88% on room air.'))
print('\n', '-' * 20)
# Model now knows vitals AND the previous admission info.

print('##--- Step 3: Retrieving From Memory ---##')
print(update_and_chat('Based on the vitals and symptoms, generate a 1-sentence triage assessment.'))
# We ask a question that requires knowledge from Step 1 and Step 2.

##--- Step 1: Admission ---##
**Clinical Case Summary:**

**Patient Profile:**
- Age: 65 years
- Gender: Male

**Chief Complaint:**
- Dyspnea (shortness of breath)

**History of Present Illness:**
The patient was admitted to the hospital with complaints of increasing dyspnea. The onset, duration, and severity of symptoms, as well as any associated symptoms (e.g., cough, chest pain, wheezing), should be further evaluated. 

**Past Medical History:**
A thorough review of the patient's medical history is essential, including any history of respiratory conditions (e.g., COPD, asthma), cardiovascular diseases (e.g., heart failure, ischemic heart disease), or other relevant comorbidities.

**Medications:**
A list of current medications, including any recent changes, should be obtained to assess for potential drug-related causes of dyspnea.

**Social History:**
Information regarding smoking status, occupational exposures, and living conditions may provide insight into potential environmental 

In [30]:
# Inspecting the Memory
print('##--- Under the Hood: The Accumulated Context ---##')

for msg in messages_buffer:
    print(f'[{msg["role"].upper()}]: {msg["content"][:100]}...')

##--- Under the Hood: The Accumulated Context ---##
[SYSTEM]: You are a Clinical Case Summarizer. Maintain a professional tone....
[USER]: Patient: 65-year-old male admitted with dyspnea....
[ASSISTANT]: **Clinical Case Summary:**

**Patient Profile:**
- Age: 65 years
- Gender: Male

**Chief Complaint:*...
[USER]: BP: 140/90, HR: 110. SpO2: 88% on room air....
[ASSISTANT]: **Clinical Case Summary (Updated):**

**Patient Profile:**
- Age: 65 years
- Gender: Male

**Chief C...
[USER]: Based on the vitals and symptoms, generate a 1-sentence triage assessment....
[ASSISTANT]: The patient is a 65-year-old male presenting with dyspnea, tachycardia (HR 110 bpm), and hypoxemia (...


# 7. Production Pattern

While functional scripts are sufficient for ad-hoc analysis, production systems (e.g., Streamlit Dashboards, FastAPI backends) require a robust **Object-Oriented** architecture.

We encapsulate the logic into a `ClinicalChatBot` class to solve three engineering challenges:

1.  **Encapsulation:** Bundling configuration (Model ID, Temperature) with State (Conversation History).
2.  **Scalability:** Allowing multiple independent agent instances (e.g., one for Oncology, one for Cardiology) to run simultaneously without state collision.
3.  **Resilience:** Centralized error handling and token usage monitoring.
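
One common way to approach the resilience goal is to retry transient failures with exponential backoff. The sketch below is generic and an assumption on our part rather than something the class that follows implements: `call_with_retries` is a hypothetical helper, and it retries on any exception for simplicity, whereas real code would normally retry only on rate-limit or network errors.

```python
import time

def call_with_retries(func, max_attempts: int = 3, base_delay: float = 1.0, sleep = time.sleep):
    '''Call func(); on failure wait base_delay * 2**attempt seconds and try again.'''
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))

# Demo with a stand-in function that fails twice before succeeding
attempts = {'count': 0}

def flaky_call() -> str:
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise ConnectionError('simulated transient failure')
    return 'ok'

print(call_with_retries(flaky_call, base_delay = 0.01))
```

Passing the API call as a zero-argument function (for example a `lambda`) keeps the retry policy in one place, independent of which model or prompt is used.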

In [33]:
# Imports
from typing import List, Dict

class ClinicalChatBot:

    '''
    A production-ready wrapper for OpenAI API interaction, designed for
    maintaining conversational state in clinical decision support workflows.
    '''

    def __init__(self,
                 system_role: str,
                 model: str = 'gpt-4o-mini',
                 temperature: float = 0.0):
        
        ''' 
        Process:
            Initializes the chatbot with a specific persona and configuration.
        Args:
            system_role (str): The meta-prompt defining behaviour and guardrails.
            model (str): Target inference engine. Defaults to cost-efficient "gpt-4o-mini"
            temperature (float): Determinism factor.
        '''

        self.client = OpenAI()
        self.model = model
        self.temperature = temperature
        self.system_role = system_role
        self.history: List[Dict[str, str]] = []

        # Initialize memory
        self.reset_memory()

    # Method: Reset memory 
    def reset_memory(self):
        '''
        Resets conversation history to the initial system state.
        '''
        self.history = [{'role': 'system', 'content': self.system_role}]

    # Method: Chat
    def chat(self,
             user_input: str,
             verbose: bool = False) -> str:
        
        '''
        Process:
            Receives and processes a user query, updates internal state,
            and returns the model response
        Args:
            user_input (str): The clinical query or data payload.
            verbose (bool): If True, prints token usage metrics.
        Returns:
            str: The generated response text.
        '''

        # 1. State Update: Add user input
        self.history.append({'role': 'user', 'content': user_input})

        try:
            # 2. API Execution
            response = self.client.chat.completions.create(
                model = self.model,
                temperature = self.temperature,
                messages = self.history
            )

            # 3. Extraction
            answer = response.choices[0].message.content

            # 4. State Update: Add Assistant Response
            self.history.append({'role': 'assistant', 'content': answer})

            # 5. Telemetry (Optional)
            if verbose:
                usage = response.usage
                print(f'Metrics | In: {usage.prompt_tokens} | Out: {usage.completion_tokens}')

            return answer
        
        except Exception as e:
            # In production this should log to a monitoring service
            return f'API ERROR: {str(e)}'

In [34]:
# --- Implementation ----

# 1. Instantiate an Oncology Specialist ChatBot
system_role_definition = ''' 
You are an Oncology Assistant.
Summarize patient history in 3 sentences, focusing on tumor markers and progression.
'''

oncology_bot = ClinicalChatBot(
    system_role = system_role_definition,
    temperature = 0.2
)

In [35]:
# 2. Simulate a Multi-Turn Conversation
print('##--- Oncology Session Started ---##')

# Turn 1. Providing Context
input_1 = 'Patient 45M, Stage III NSCLC. PD-L1 expression >50%.'
print(f'User: {input_1}')
response = oncology_bot.chat(input_1, verbose = True)
print(f'ChatBot: {response}')

##--- Oncology Session Started ---##
User: Patient 45M, Stage III NSCLC. PD-L1 expression >50%.
Metrics | In: 52 | Out: 60
ChatBot: The patient is a 45-year-old male diagnosed with Stage III non-small cell lung cancer (NSCLC). Tumor markers indicate a PD-L1 expression greater than 50%, suggesting a potential for immunotherapy treatment options. There is no additional information provided regarding tumor progression or response to previous treatments.


In [36]:
# Turn 2. Asking for a Recommendation
input_2 = 'Based on the expression level, what is the first-line therapy?'
print(f'User: {input_2}')
response_2 = oncology_bot.chat(input_2, verbose = True)
print(f'ChatBot: {response_2}')

User: Based on the expression level, what is the first-line therapy?
Metrics | In: 133 | Out: 68
ChatBot: Based on the PD-L1 expression level greater than 50%, the first-line therapy for this patient with Stage III non-small cell lung cancer (NSCLC) would typically be pembrolizumab (Keytruda) in combination with chemotherapy. This approach leverages the high PD-L1 expression to enhance the immune response against the tumor.


# 8. Conclusion & Next Steps

We have successfully evolved from simple, functional API scripts to a **Production-Grade OOP Architecture**.

**Summary of Achievements:**
1.  **Secure Architecture:** Implemented environment-based authentication to prevent credential leakage.
2.  **Cost Engineering:** Established token monitoring and optimization patterns.
3.  **State Management:** Built a robust memory system (`ClinicalChatBot`) capable of handling multi-turn clinical conversations.
4.  **Reproducibility:** Demonstrated control over stochasticity using temperature parameters for GxP compliance.

**Limitations:**
Currently, our chatbot relies solely on its pre-trained "parametric memory" (Internet knowledge up to the training cutoff). It cannot yet access private patient records or external medical databases.

**Potential Improvements:**
Building a **RAG (Retrieval-Augmented Generation)** system would give our ChatBot (Agent) access to proprietary external documents (PDFs, guidelines), with or without fine-tuning.