**Colab Execution:** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tasarorcun/data-science-playbook/blob/main/04-genai-and-agents/001_working_with_openai_api.ipynb)

# 1. Executive Summary

**Project:** Building a Stateful Clinical Conversational Engine (From API Calls to Agents)<br>
**Author:** Orcun Tasar  |  **Role:** Senior Clinical Data Science Consultant<br>
**Domain:** Healthcare · GenAI · Pharmacovigilance<br>
**Tech Stack:** OpenAI API (v1.x), Python 3.10+, spaCy (NLP), Jupyter Notebook<br>

---

## 1.1 Summary
This notebook demonstrates the architectural evolution of a Generative AI solution for healthcare. Unlike simple scripts that perform stateless API calls, we progressively build a **Production-Grade Clinical Assistant** capable of memory management, structured data extraction, and privacy compliance.

## 1.2 The Narrative: Notebook Roadmap
This notebook simulates the **R&D lifecycle** of a real-world consulting project, evolving through four distinct phases:

### Phase 1: API Fundamentals & Discovery
* **Goal:** Understanding the raw capabilities of `openai.chat.completions` within the context of Life Sciences.
* **Output:** Basic connectivity and response handling.

### Phase 2: The Production Pattern (MVP)
* **Goal:** Encapsulating logic into a reusable **`ClinicalChatBot` class**.
* **Challenge:** Solving the "Statelessness" problem of LLMs.
* **Deliverable:** A chatbot object with persistent memory (sliding-window context) and system-level personas.

### Phase 3: Business Value – Pharmacovigilance (AE) Extraction
* **Goal:** Transforming the chatbot into an **Analyst: `ClinicalAssistant` class**.
* **Challenge:** Extracting structured data (JSON) from messy, unstructured doctor notes.
* **Deliverable:** An automated pipeline for **Adverse Event (AE)** detection and database-ready output generation.

### Phase 4: Enterprise Compliance – The Privacy Gatekeeper
* **Goal:** Implementing **"Privacy by Design"**.
* **Challenge:** Ensuring no PII (Personally Identifiable Information) reaches the cloud API.
* **Deliverable:** A hybrid **PII Scrubber** (using `spaCy` NER + Regex) that sanitizes inputs locally before they touch the model.

---

## 1.3 The Core Challenge & Solution
In the Life Sciences domain, generic chatbots are insufficient due to safety and compliance risks.

* **The Problem:** Raw LLMs are forgetful (stateless) and compliance-agnostic (prone to leaking PII if used naively).
* **The Solution:** We implement a wrapper architecture that serves as a **"Trust Layer"** between the user and the Model. This layer handles **Context**, **Structure**, and **Security** programmatically, treating the LLM as a reasoning engine rather than a database.

*Note: While OpenAI offers managed state features (such as the newer Responses API, Stored Completions, or the Assistants API, which may be deprecated soon), I deliberately chose a client-side state architecture built on the older `chat.completions` API.*

**Why?** In the rapidly evolving AI landscape, **Model Agnosticism** and **Interoperability** are crucial for enterprise longevity.
1.  **Vendor Independence:** By managing the conversation history (`messages` list) locally, our architecture is not locked into OpenAI's proprietary "Thread" structures. This allows us to easily swap the inference engine (e.g., migrating to Llama 3, Anthropic, or local models) with minimal code changes.
2.  **Standardization:** The `[{role: user, content: ...}]` format is the industry standard. Adhering to this ensures our data remains compatible with open-source frameworks like LangChain or Haystack.
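
The vendor-independence argument can be illustrated with a minimal sketch. It assumes a hypothetical `Backend` callable type and a stub `echo_backend` function (neither is part of the OpenAI SDK or of this notebook's classes): the history stays a plain list of role/content dictionaries, and the inference engine is just a function that can be swapped.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
# Hypothetical abstraction: any function that maps a message list to reply text
Backend = Callable[[List[Message]], str]


def echo_backend(messages: List[Message]) -> str:
    # Stand-in for a real provider call (OpenAI, Anthropic, a local model, ...)
    return f"echo: {messages[-1]['content']}"


def chat(history: List[Message], user_input: str, backend: Backend) -> str:
    # The history lives client-side, so swapping providers only changes `backend`
    history.append({'role': 'user', 'content': user_input})
    reply = backend(history)
    history.append({'role': 'assistant', 'content': reply})
    return reply


history: List[Message] = [{'role': 'system', 'content': 'You are a clinical assistant.'}]
print(chat(history, 'Summarize the visit.', echo_backend))
print(len(history))
```

Because `chat` only depends on the message list and a callable, replacing `echo_backend` with a function that wraps a real provider would not require touching the state-handling code.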

---

**Final Note: Token Economy & Memory Optimization (Out of Scope in This Notebook)**  
> Long-running clinical conversations can become expensive as the chat history grows. In a production setting, it is important to implement cost-saving strategies such as **sliding-window context** and **summarization-based memory** on top of the basic stateful architecture shown here.  
>  
> In this notebook, we **intentionally focus on the core “trust layer” and stateful design**, and do **not** implement these optimization techniques.  
>  
> If you want to see concrete implementations of memory-optimization patterns (sliding windows, summarization, retention rules, etc.), please refer to the companion notebook:  
> **▶︎ [Memory Optimization Patterns for Clinical Chatbots](<INSERT_LINK_HERE>)**

# Import Generic Packages

In [2]:
# ignore notebook warnings
import warnings
warnings.filterwarnings('ignore')

# os
import os

# sys
import sys

# 1. OpenAI API Integration Patterns

**Objective:** Establish a robust, programmatic connection to OpenAI's inference engine for clinical and operational workflows.

## 1.1 Architectural Context: GUI vs. API

While browser-based interfaces (like ChatGPT) function as **SaaS** (Software as a Service) for human interaction, they lack the scalability required for engineering tasks.

This notebook focuses on the **OpenAI API**, which provides a **RESTful interface** to:
* **Automate Workflows:** Process bulk datasets (e.g., 1,000+ clinical summaries) without manual input.
* **Integrate Backend Logic:** Embed LLM capabilities directly into proprietary web applications or data pipelines.
* **Maintain Statelessness:** Interact with models programmatically, where each request is independent and configurable.

> **Note:** Unlike local model deployment, the API delegates computational load to OpenAI's infrastructure, requiring strict management of **Authentication** and **Latency**.

## 1.2 Credential Management & Security Protocol

Access to the inference engine requires a unique **Secret API Key**. This key authenticates requests and tracks token usage for billing purposes.

**Generation Workflow:**
1.  Navigate to the [OpenAI Platform Dashboard](https://platform.openai.com/api-keys).
2.  Generate a new secret key (`sk-...`).
3.  **Storage:** The key is displayed only once. It must be stored immediately in a secure credential manager (e.g., 1Password, Vault).

### Secrets Management Strategy
Hardcoding credentials (e.g., `api_key = "sk-..."`) directly into source code is a **critical security vulnerability**, particularly in collaborative environments using Version Control (Git).

**Production Standards:**
* **Dynamic Loading:** Credentials should be injected via **OS Environment Variables** or restricted local configuration files (`.env` / `.txt`).
* **Version Control:** Ensure all credential files are explicitly listed in `.gitignore` to prevent accidental leakage to public repositories.
* **Key Rotation:** Implement regular key rotation policies to minimize impact in case of compromise.
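
As a minimal sketch of the dynamic-loading standard, the key can be read from an environment variable and masked before anything is printed. The helper names `get_api_key_from_env` and `mask_key`, and the `DEMO_OPENAI_API_KEY` variable, are illustrative assumptions rather than part of any library.

```python
import os


def get_api_key_from_env(var_name: str = 'OPENAI_API_KEY') -> str:
    # Read the key from the environment instead of hardcoding it in source
    api_key = os.environ.get(var_name, '').strip()
    if not api_key:
        raise RuntimeError(f'Environment variable {var_name} is not set.')
    return api_key


def mask_key(api_key: str) -> str:
    # Show only the first and last four characters in logs or notebook output
    return f'{api_key[:4]}...{api_key[-4:]}'


# Demo with a dummy value under a made-up variable name (not a real credential)
os.environ['DEMO_OPENAI_API_KEY'] = 'sk-demo-000000000000abcd'
print(mask_key(get_api_key_from_env('DEMO_OPENAI_API_KEY')))
```

In a real deployment the variable would be set by the shell, the container runtime, or a secrets manager rather than inside the notebook.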

In [3]:
def load_api_key(filepath: str) -> str:

    '''
    Process:
        Securely loads the API key from a local file and injects it into the environment.
    Args:
        filepath (str): Relative path to the secret key file.
    Returns:
        str: The loaded API key (only a masked version is printed).
    Raises:
        ValueError: If the key does not start with "sk-".
        SystemExit: If the credential file is missing.
    '''

    try:
        with open(filepath, 'r') as file:
            api_key = file.read().strip()

            # Simple validation
            if not api_key.startswith('sk-'):
                raise ValueError('Invalid API key format. Key must start with "sk-".')

            # Set as Environment Variable for implicit client authentication
            os.environ['OPENAI_API_KEY'] = api_key

            # Security: Never print the full key in outputs
            masked_key = f'{api_key[:4]}...{api_key[-4:]}'
            print(f'Authentication successful. Key loaded: {masked_key}')
            return api_key

    except FileNotFoundError:
        print(f'Credential file not found at {filepath}')
        print(f'Hint: Ensure "openai_api.txt" exists in the "api_keys" directory.')
        sys.exit(1)

In [4]:
# Execute the load_api_key function
key_path = os.path.join('.', 'api_keys', 'openai_api.txt')
_ = load_api_key(key_path)

Authentication successful. Key loaded: sk-p...nJoA


## 1.3 Client Initialization

We instantiate the `OpenAI` client to act as the interface gateway.

**Note on Authentication:**
Since we injected `OPENAI_API_KEY` into the environment variables in the previous step, the client automatically detects credentials without explicit argument passing. This enforces a **clean separation of concerns**.

In [None]:
# import OpenAI
from openai import OpenAI

# Initialize client
# The library automatically looks for "OPENAI_API_KEY" in os.environ
client = OpenAI()

# Connection test
# A simple API call to verify the authentication immediately
try:
    models = client.models.list()
    first_model = models.data[0].id
    print('Connection established successfully.')
    print(f'Endpoint reachable. First available model: {first_model}')
except Exception as e:
    print(f'Connection failed: {str(e)}')
    print('Check your API key and internet connection!')

Connection established successfully.
Endpoint reachable. First available model: gpt-4-0613


## 1.4 Basic Interaction Pattern (Stateless)

The core interaction method is `client.chat.completions.create`. The API operates on a **Stateless Protocol**, meaning each request is isolated and retains no memory of previous interactions.

### SDK Method Anatomy

Understanding the Python client structure ensures correct resource targeting:

* **`client`**: The authenticated instance managing connection pooling and configuration.
* **`.chat`**: The namespace targeting chat-based Large Language Models (distinct from `.images` or `.audio`).
* **`.completions`**: The specific resource group handling text generation tasks.
* **`.create()`**: The execution method that triggers the **Synchronous HTTP POST** request to the API endpoint.

### Key Request Parameters
* **`model`**: The inference engine ID (e.g., `gpt-4o-mini` for latency-sensitive tasks).
* **`messages`**: An array of message objects defining the conversation history.
* **`role`**: Defines the entity speaking (`system`, `user`, or `assistant`).
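
For intuition, the SDK call corresponds roughly to a JSON body sent via HTTP POST to the chat completions endpoint. The sketch below only assembles such a request with the standard library and never sends it; `build_chat_request` is a hypothetical helper, the key is a placeholder, and the SDK itself adds further headers, retries, and response parsing on top of this.

```python
import json
import urllib.request

API_URL = 'https://api.openai.com/v1/chat/completions'


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    # Assemble (but do not send) the POST request that a chat completion call represents
    payload = {'model': model, 'messages': messages, 'temperature': 0}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode('utf-8'),
        headers={
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json',
        },
        method='POST',
    )


request = build_chat_request(
    api_key='sk-dummy',  # placeholder, not a real key
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Hello'}],
)
print(request.get_method(), request.full_url)
print(json.loads(request.data)['messages'][0]['role'])
```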

In [5]:
try:
    response = client.chat.completions.create(
        model = 'gpt-4o-mini',
        messages = [
            {
                'role': 'user',
                'content': 'What is the OpenAI API? Explain in one sentence.'
            }
        ],
        temperature = 0
    )

    # Extract payload content
    print(f'Output: {response.choices[0].message.content}')

except Exception as e:
    print(f'Request failed: {str(e)}')

Output: The OpenAI API is a cloud-based service that allows developers to integrate advanced artificial intelligence models, such as language processing and generation capabilities, into their applications.


## 1.5 Functional Abstraction

Direct API calls require repetitive boilerplate code (client initialization, message formatting, parameter tuning).

To adhere to the **DRY (Don't Repeat Yourself)** principle, we encapsulate this logic into a reusable function. This abstraction layer allows us to:
1.  **Standardize Parameters:** Define default settings in one place (the function below defaults to `temperature=0.8`; pass `temperature=0` when clinical consistency matters).
2.  **Simplify Interfaces:** Reduce the API surface area to a single function call.
3.  **Decouple Logic:** Easily swap models (e.g., `gpt-4o` vs. `gpt-3.5-turbo`) without rewriting downstream code.

In [9]:
def get_clinical_response(prompt: str, model: str = 'gpt-4o-mini', temperature: float = 0.8) -> str:

    '''
    Process:
        Executes a prompt against the OpenAI API and returns the extracted text.
    Args:
        prompt (str): The user query or instruction.
        model (str): Target model ID. Default is "gpt-4o-mini"
        temperature (float): Randomness of the response. Default is 0.8.
    Returns:
        str: The generated response text.
    '''

    response = client.chat.completions.create(
        model = model,
        messages = [
            {
                'role': 'user',
                'content': prompt
            }
        ],
        temperature = temperature
    )

    return response.choices[0].message.content

In [7]:
# Domain specific example
user_prompt = 'Explain me in two sentences: what RWD and RWE are in the field of healthcare.'
print(get_clinical_response(prompt = user_prompt))

Real-World Data (RWD) refers to health-related data collected outside of conventional clinical trials, encompassing various sources such as electronic health records, claims data, and patient registries. Real-World Evidence (RWE) is the clinical evidence derived from the analysis of RWD, used to support regulatory decisions, inform treatment guidelines, and enhance understanding of healthcare outcomes in real-world settings.


In [8]:
# Another domain specific example
user_prompt = 'What was the year Next Generation Sequencing first released?'
print(get_clinical_response(prompt = user_prompt))

Next Generation Sequencing (NGS) technologies started to emerge in the mid-2000s. The first commercial NGS platform, developed by 454 Life Sciences, was released in 2005. This marked a significant advancement in genomics, enabling faster and cheaper sequencing of DNA compared to previous methods.


# 2. Tokenomics & Cost Engineering

In large-scale ETL pipelines (e.g., processing 500,000+ Electronic Health Records), API costs become a critical architectural constraint.

**Cost Dynamics:**
* **Asymmetry:** For `gpt-4o-mini`, output tokens are **4x more expensive** than input tokens ($0.60 vs $0.15 per 1M).
* **Optimization Strategy:**
    1.  **Input:** Use concise context; trim unnecessary whitespace/boilerplate.
    2.  **Output:** Enforce strict output schemas (JSON) to prevent the model from "yapping" (generating verbose filler text that is still billed at the higher output-token rate).

The following script estimates the cost of a single transaction to forecast batch processing budgets.

In [9]:
# 1. Define model pricing (per 1M tokens) - Source: OpenAI pricing page
model_pricing = {
    'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
    'gpt-4o': {'input': 2.50, 'output': 10.00}
}

# Create function
def calculate_request_cost(usage, model: str = 'gpt-4o-mini') -> float:

    '''
    Process:
        Calculates the precise cost of an API call based on current pricing.
    Args:
        usage (openai.types.completion_usage.CompletionUsage): OpenAI object holding used token numbers.
        model (str): OpenAI model ID.
    Returns:
        float: Total price (price for input + price for output).
    '''

    # get the tariff
    prices = model_pricing.get(model, model_pricing['gpt-4o-mini'])

    # calculate price for input & output tokens
    input_price = (usage.prompt_tokens / 1_000_000) * prices['input']
    output_price = (usage.completion_tokens / 1_000_000) * prices['output']

    return input_price + output_price

In [10]:
# Execution
clinical_prompt = '''
Patient Report: 45-year-old male initiated on Metformin 500mg BID. 
Follow-up at 2 weeks: Patient reports mild gastrointestinal discomfort and nausea. 
No signs of lactic acidosis. Vitals stable.

Task: Extract any Adverse Drug Reactions (ADRs) mentioned. 
Format: List only.
'''

# Get response
response = client.chat.completions.create(
    model = 'gpt-4o-mini',
    messages = [
        {
            'role': 'user',
            'content': clinical_prompt
        }
    ],
    temperature = 0
)

# Cost analysis
total_cost = calculate_request_cost(usage = response.usage,
                                    model = 'gpt-4o-mini') # price with the same model that served the request

# Prints
print(f'Response text:\n{response.choices[0].message.content}')
print(f'Number of input tokens: {response.usage.prompt_tokens}')
print(f'Number of output tokens: {response.usage.completion_tokens}')
print(f'Single response cost: ${total_cost:.8f}')
print(f'Projected cost for 1M responses: ${total_cost * 1_000_000:,.2f}')


Response text:
- Mild gastrointestinal discomfort
- Nausea
Number of input tokens: 75
Number of output tokens: 9
Single response cost: $0.00001665
Projected cost for 1M responses: $16.65
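
To turn a per-request figure into a batch budget, the same arithmetic can be applied to average token counts. The sketch below is a rough forecast under stated assumptions: `project_batch_cost` is a hypothetical helper, the workload size and token averages are made up for illustration, and the rates are the `gpt-4o-mini` prices quoted in the Cost Dynamics list ($0.15 input / $0.60 output per 1M tokens).

```python
def project_batch_cost(n_records: int,
                       avg_input_tokens: float,
                       avg_output_tokens: float,
                       input_price_per_1m: float,
                       output_price_per_1m: float) -> float:
    # Total tokens across the batch, priced per one million tokens
    input_cost = n_records * avg_input_tokens / 1_000_000 * input_price_per_1m
    output_cost = n_records * avg_output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


# Assumed workload: 500,000 records averaging 75 input and 9 output tokens each
batch_cost = project_batch_cost(500_000, 75, 9, 0.15, 0.60)
print(f'Projected batch cost: ${batch_cost:,.2f}')
```

Real clinical notes are usually far longer than this toy prompt, so the averages should be measured on a representative sample before budgeting.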


# 3. Parameter Tuning: Control of Stochasticity & Randomness

The `temperature` parameter (0.0 to 2.0) controls the entropy of the token sampling distribution.

### Clinical Strategy
* **Deterministic ($T=0$):** Forces the model to select the highest probability token.
    * *Use Case:* Data Extraction, coding (ICD-10), structured JSON output.
* **Stochastic ($T > 0.7$):** Flattens the probability curve, introducing variance.
    * *Use Case:* Patient communication drafts, synthetic data generation, brainstorming.

> **Warning:** In GxP (Good Practice) environments, reproducibility is key. Unless generating creative content, always default to `temperature=0`.
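
The effect of temperature can be sketched with a toy softmax. This follows the textbook description of temperature scaling (logits divided by $T$ before normalization); it is not a view into OpenAI's actual sampler, and the logits below are made-up values for three candidate tokens.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    # Divide logits by T before the softmax: small T sharpens, large T flattens
    if temperature <= 0:
        raise ValueError('temperature must be > 0 in this sketch; T=0 is treated as argmax.')
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability mass sits on the top-scoring token, while at high temperature the three probabilities move closer together, which is the variance the clinical strategy above tries to avoid for extraction tasks.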

In [6]:
def test_temperature(prompt: str, temp: float, iterations: int = 3):

    '''
    Process:
        Executes a prompt multiple times to empirically validate the impact of temperature on output stability.
        Serves as a unit test for 'Reproducibility'. It verifies whether
        the model behaves deterministically (at T=0) or introduces variance (at T>0.7),
        which is critical for validating GxP-compliant workflows.
    Args:
        prompt (str): The input query to be tested.
        temp (float): The sampling temperature (0.0 = deterministic, 2.0 = highly stochastic).
        iterations (int): Number of sequential API calls to perform (Default: 3).
    '''

    print(f'##### Testing temperature: {temp}')
    print('-' * 20)

    for i in range(iterations):
        response = client.chat.completions.create(
            model = 'gpt-4o-mini',
            temperature = temp,
            messages = [
                {
                    'role': 'user',
                    'content': prompt
                }
            ]
        )

        print(f'Iteration {i + 1}: {response.choices[0].message.content}')

Execution of the test function:

In [7]:
test_prompt = 'Complete this sentence in less than 5 words: "The most critical factor in drug discovery is..."'

# 1. Deterministic test (Stability - Less Variance)
test_temperature(test_prompt, temp = 0.0)

# 2. Stochastic test (Creativity - High Variance)
test_temperature(test_prompt, temp = 2.0)

##### Testing temperature: 0.0
--------------------
Iteration 1: "...target identification and validation."
Iteration 2: "...target identification and validation."
Iteration 3: "...target identification and validation."
##### Testing temperature: 2.0
--------------------
Iteration 1: effective target identification.
Iteration 2: "...validating target attructureained outcomes."
Iteration 3: "...utive scientific reliability."


# 4. Shot Prompting

LLMs perform significantly better when provided with exemplar pairs (Input -> Output) within the context window. This technique, known as **In-Context Learning**, steers the model's behavior without the need for parameter fine-tuning.

### Strategies
* **Zero-Shot:** Direct instruction with no prior examples. Useful for general knowledge tasks.
* **One-Shot:** Providing a single example to establish output format structure.
* **Few-Shot:** Providing multiple examples (3-5) to teach complex logic, nuance, and domain-specific taxonomy.

> **Use Case:** Classifying adverse events in unstructured clinical notes into standard toxicity grades (e.g., Mild, Moderate, Severe).
>
> **Scenario:** Classifying Patient Feedback into Toxicity Grades (1-3)<br>
> - 1: Mild (Does not interfere with daily activities)<br>
> - 2: Moderate (Interferes with daily activities)<br>
> - 3: Severe (Requires medical intervention)
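
Few-shot prompts are often assembled programmatically so that the exemplar pairs can be versioned and reused. The sketch below is one possible layout, assuming a hypothetical `build_few_shot_prompt` helper and made-up example sentences; the exact wording and labels would come from the project's own grading guideline.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str,
                          examples: List[Tuple[str, str]],
                          queries: List[str]) -> str:
    # Render exemplar pairs as 'input -> output' lines, then the numbered items to classify
    lines = [instruction, '', 'Examples:']
    lines += [f'- "{text}" -> {label}' for text, label in examples]
    lines += ['', 'Now classify these:']
    lines += [f'{i}. "{query}"' for i, query in enumerate(queries, start=1)]
    return '\n'.join(lines)


prompt = build_few_shot_prompt(
    instruction='Classify the severity (Grade 1-3) of the patient reports.',
    examples=[
        ('Slight itch on my arm.', 'Grade 1'),
        ('Too dizzy to drive to work.', 'Grade 2'),
    ],
    queries=['Just a slight headache.'],
)
print(prompt)
```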

In [10]:
# --- 1. Zero-Shot Prompting (No Guidance) ---

prompt_zero = '''
Classify the severity (Grade 1-3, (Mild, Moderate, Severe)) of the following patient reports:

1. "I feel a bit dizzy when I stand up fast."
2. "The nausea is so bad I can't go to work."
3. "My skin is peeling off and it burns like fire."
4. "Just a slight headache."
'''

print('##--- Zero-Shot Output ---##')
print(get_clinical_response(prompt_zero, temperature = 0.0))

##--- Zero-Shot Output ---##
Based on the provided patient reports, here is the classification of severity:

1. "I feel a bit dizzy when I stand up fast." - **Grade 1 (Mild)**
2. "The nausea is so bad I can't go to work." - **Grade 3 (Severe)**
3. "My skin is peeling off and it burns like fire." - **Grade 3 (Severe)**
4. "Just a slight headache." - **Grade 1 (Mild)**


In [11]:
# --- 2. Few-Shot Prompting (Pattern Injection) ---

prompt_few = '''
Classify the severity (Grade 1-3) of the patient reports based on functional impairment.

Examples:
- "I have a runny nose but I'm fine." -> Grade 1 (No interference)
- "I threw up all morning and missed my appointment." -> Grade 2 (Interferes with activity)
- "I had to go to the ER because I couldn't breathe." -> Grade 3 (Medical intervention)

Now classify these:
1. "I feel a bit dizzy when I stand up fast."
2. "The nausea is so bad I can't go to work."
3. "My skin is peeling off and it burns like fire."
4. "Just a slight headache."
'''

print('##--- Few-Shot Output ---##')
print(get_clinical_response(prompt_few, temperature = 0.0))

##--- Few-Shot Output ---##
Here are the classifications based on the provided examples:

1. "I feel a bit dizzy when I stand up fast." -> Grade 1 (No interference)
2. "The nausea is so bad I can't go to work." -> Grade 2 (Interferes with activity)
3. "My skin is peeling off and it burns like fire." -> Grade 3 (Medical intervention)
4. "Just a slight headache." -> Grade 1 (No interference)


In [12]:
# --- 3. One-Shot Prompting (Schema Enforcement) ---

# Use Case: We don't need to teach the model 'medicine',
# we just need to enforce a strict JSON format.

prompt_one = '''
Extract medication entities into a JSON object with keys: 'drug', 'strength', 'frequency'.

Example:
Input: "Patient takes 500mg Tylenol twice a day."
Output: {"drug": "Tylenol", "strength": "500mg", "frequency": "BID"}

Task:
Input: "Prescribed Lipitor 20mg every evening for cholesterol."
Output:
'''

print('##--- One-Shot Output ---##')
print(get_clinical_response(prompt_one, temperature = 0.0))

##--- One-Shot Output ---##
```json
{"drug": "Lipitor", "strength": "20mg", "frequency": "QHS"}
```
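
Note that the model wrapped its JSON in a Markdown fence, which `json.loads` cannot parse directly. A minimal post-processing sketch is shown here; `parse_json_output` is a hypothetical helper, and it only handles a single fence around the whole payload.

```python
import json
import re

# Matches an opening fence (three backticks, optional "json" tag) or a closing fence
FENCE_PATTERN = re.compile(r'^`{3}(?:json)?\s*|\s*`{3}$')


def parse_json_output(raw_text: str) -> dict:
    # Strip surrounding whitespace and any Markdown fence, then parse the JSON payload
    cleaned = FENCE_PATTERN.sub('', raw_text.strip())
    return json.loads(cleaned)


fence = '`' * 3
raw = f'{fence}json\n{{"drug": "Lipitor", "strength": "20mg", "frequency": "QHS"}}\n{fence}'
record = parse_json_output(raw)
print(record['drug'], record['frequency'])
```

If the text is still not valid JSON after stripping, `json.loads` raises `json.JSONDecodeError`, which a pipeline can catch to flag the record for review.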


# 5. Context Architecture: Role Semantics

Effective prompt engineering relies on the structured interaction between three distinct roles within the `messages` array. Understanding these roles is critical for designing **Stateless** (Single-turn) vs. **Stateful** (Multi-turn) systems.

### The Roles Schema

* **1. System (`system`): Global Behavior & Guardrails**
    * Sets the metaprompt that persists throughout the session.
    * Defines the persona, output format, and safety constraints (e.g., *"You are a Clinical Decision Support System. Do not provide medical diagnoses."*).
    * *Engineering Note:* The system prompt is weighted heavily by the model to steer overall behavior.

* **2. User (`user`): The Signal**
    * Represents the external input, query, or instruction triggering the inference.

* **3. Assistant (`assistant`): Memory & Pattern Injection**
    * **Primary Use:** Stores prior model responses to simulate "Memory" in a conversational application.
    * **Advanced Use (Few-Shot):** Developers can "fake" assistant messages to provide ideal response examples (Input -> Output pairs) within the context window, without fine-tuning the model.

## 5.1 The System Role: Guardrails & Persona

The `system` message is the most powerful lever for steering model behavior. In Life Sciences applications, it is primarily used for:

1.  **Tone Calibration:** Adjusting the complexity and empathy level based on the audience (e.g., Patient vs. Clinician).
2.  **Regulatory Guardrails:** Explicitly forbidding the model from performing restricted actions, such as making diagnoses or prescribing medication.

> **Engineering Note:** Strong system prompts are an important control for **GxP compliance**; they reduce the risk that the AI claims medical credentials or authority it does not have.

In [13]:
# --- Scenario A: Tone Calibration (Patient-Centric) ---
# Goal: Explain a complex concept to a patient with empathy

system_persona = '''
You are an empathetic Clinical Care Coordinator. 
Your goal is to explain medical concepts to patients in simple, reassuring language.
Avoid technical jargon.
Your explanation should not exceed 2 sentences.
'''

user_query = 'The doctor said I have "Idiopathic Pulmonary Fibrosis". It sounds scary. What is it?'

print('##--- Scenario A: Tone Calibration - Empathetic Explanation (Patient-Centric) ---##')

response = client.chat.completions.create(
    model = 'gpt-4o-mini',
    temperature = 0.8, # for a slightly more natural, human tone
    messages = [
        {
            'role': 'system',
            'content': system_persona
        },
        {
            'role': 'user',
            'content': user_query
        }
    ]
)

print(f'Patient: {user_query}')
print(f'Bot: {response.choices[0].message.content}')

##--- Scenario A: Tone Calibration - Empathetic Explanation (Patient-Centric) ---##
Patient: The doctor said I have "Idiopathic Pulmonary Fibrosis". It sounds scary. What is it?
Bot: Idiopathic Pulmonary Fibrosis is a condition where the lungs become stiff and can make it harder to breathe, but it doesn't mean you're alone—there are treatments and support to help manage it. Your doctor and care team are here to help you with the best ways to take care of yourself moving forward.


In [14]:
# --- Scenario B: Regulatory Guardrails (Safety) ---
# Goal: Prevent the model from giving specific medical advice.

system_guardrail = '''
You are a Medical Information Assistant. 
You provide information based on package inserts.
CRITICAL RULE: You are NOT a doctor. DO NOT provide diagnosis or treatment advice. 
If asked for advice, respond with standard disclaimer:
"I cannot provide medical advice. Please consult your physician."
'''

risky_query = 'I have a sharp pain in my left chest. Should I take Aspirin?'

print('##--- Scenario B: Safety Boundary ---##')

response = client.chat.completions.create(
    model = 'gpt-4o-mini',
    temperature = 0.8,
    messages = [
        {
            'role': 'system',
            'content': system_guardrail
        },
        {
            'role': 'user',
            'content': risky_query
        }
    ]
)

print(f'Patient: {risky_query}')
print(f'Bot: {response.choices[0].message.content}')

##--- Scenario B: Safety Boundary ---##
Patient: I have a sharp pain in my left chest. Should I take Aspirin?
Bot: I cannot provide medical advice. Please consult your physician.


## 5.2 The Assistant Role: Few-Shot Pattern Injection

While the `assistant` role is primarily used for storing conversation history, it is also a powerful tool for **In-Context Learning**.

By pre-populating the context window with "fake" User-Assistant pairs, we can steer the model's output format and logic without explicit instructions. This technique can be more effective than complex System Prompts for enforcing **standardized nomenclature** (e.g., mapping symptoms to ICD-10 codes).

In [15]:
def normalize_diagnosis(raw_input: str) -> str:

    '''
    Process:
        Maps unstructured clinical notes to standardized ICD-10 codes
        using Few-Shot examples injected via the "assistant" role.
    Args:
        raw_input (str): user query.
    Returns:
        str: response message content.
    '''

    # 1. System: Define the task
    system_injection = '''
    You are a Medical Coding Assistant.
    Map the input description to the closest standard ICD-10 name and code.
    Output ONLY the code and name.
    '''

    # 2. Few-Shot history
    few_shot_examples = [
        {"role": "user", "content": "pt complains of chest pain"},
        {"role": "assistant", "content": "Chest pain, unspecified (R07.9)"},
        
        {"role": "user", "content": "type 2 diabetes with kidney issues"},
        {"role": "assistant", "content": "Type 2 diabetes mellitus with kidney complications (E11.2)"},
        
        {"role": "user", "content": "high bp"},
        {"role": "assistant", "content": "Essential (primary) hypertension (I10)"}
    ]

    # 3. Construct the final payload
    messages = [{'role': 'system', 'content': system_injection}]
    messages.extend(few_shot_examples)
    messages.append({'role': 'user', 'content': raw_input})

    # 4. Execute
    response = client.chat.completions.create(
        model = 'gpt-4o-mini',
        temperature = 0, # this task requires deterministic output
        messages = messages
    )

    return response.choices[0].message.content

In [16]:
# --- Test cases ---
print(f'Input: "patient has a really bad headache"')
print(f'Output: {normalize_diagnosis("patient has a really bad headache")}\n')

print(f'Input: "feeling of heart racing"')
print(f'Output: {normalize_diagnosis("feeling of heart racing")}')

Input: "patient has a really bad headache"
Output: Headache, unspecified (R51)

Input: "feeling of heart racing"
Output: Palpitations (R00.2)


# 6. State Management: Handling Conversations

The OpenAI API is **stateless**. If you send a follow-up query about a patient mentioned in a previous request, the model will not "remember" the patient unless you explicitly re-send the entire conversation history.

In **RWD (Real World Data)** applications, constructing a patient's timeline (Longitudinal History) requires maintaining a persistent `messages` list (Context Window) that grows with each interaction.

> **Note:** As the conversation grows, we consume more input tokens per request. In production, strategies like **"Context Window Sliding"** or **"Summarization"** are required to stay within token limits (e.g., 128k context).
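
The growth in input tokens can be made concrete with a small calculation. The sketch assumes a hypothetical `cumulative_input_tokens` helper and made-up token counts; it only models the re-sending of history, not actual tokenizer behavior.

```python
from typing import List, Tuple


def cumulative_input_tokens(system_tokens: int, turns: List[Tuple[int, int]]) -> List[int]:
    # turns holds (user_tokens, assistant_tokens) per exchange; returns input tokens per request
    per_request = []
    history = system_tokens
    for user_tokens, assistant_tokens in turns:
        history += user_tokens           # the new user message joins the payload
        per_request.append(history)      # the whole history is billed as input
        history += assistant_tokens      # the reply is stored for the next request
    return per_request


# Made-up token counts: a 20-token system prompt and three identical exchanges
sizes = cumulative_input_tokens(20, [(10, 50), (10, 50), (10, 50)])
print(sizes, sum(sizes))
# prints: [30, 90, 150] 270
```

Each request is larger than the previous one even though every exchange has the same size, so the total billed input grows faster than the number of turns.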

In [17]:
# --- The Mechanics of State Management ---

# 1. Initialize an empty session (The Electronic Health Record - EHR Buffer)

messages_buffer = [
    {
        'role': 'system',
        'content': '''You are a Clinical Case Summarizer.
                      Do not exceed 3 sentences.
                      Maintain a professional tone.'''
    }
]

def update_and_chat(new_input: str) -> str:

    '''
    Process:
        Manually appends new input to the buffer.
        Sends the FULL buffer to the API.
        Appends the response back to the buffer.
    Args:
        new_input (str): New user input / query / prompt.
    Returns:
        str: The assistant's reply text.
    '''

    # Append user input
    messages_buffer.append({'role': 'user', 'content': new_input})

    # Send the WHOLE history to the API
    response = client.chat.completions.create(
        model = 'gpt-4o-mini',
        temperature = 0,
        messages = messages_buffer
    )

    answer = response.choices[0].message.content

    # Append assistant response
    messages_buffer.append({'role': 'assistant', 'content': answer})

    return answer

In [18]:
print('##--- Step 1: Admission ---##')
print(update_and_chat('Patient: 65-year-old male admitted with dyspnea.'))
print('\n', '-' * 20)
# Model now knows age, gender, and symptom.

print('##--- Step 2: Vitals ---##')
print(update_and_chat('BP: 140/90, HR: 110. SpO2: 88% on room air.'))
print('\n', '-' * 20)
# Model now knows vitals AND the previous admission info.

print('##--- Step 3: Retrieving From Memory ---##')
print(update_and_chat('Based on the vitals and symptoms, generate a 1-sentence triage assessment.'))
# We ask a question that requires knowledge from Step 1 and Step 2.

##--- Step 1: Admission ---##
A 65-year-old male was admitted to the hospital presenting with dyspnea. Further evaluation is required to determine the underlying cause of his respiratory distress. Management will be guided by the results of diagnostic tests and clinical findings.

 --------------------
##--- Step 2: Vitals ---##
The patient exhibits elevated blood pressure at 140/90 mmHg and tachycardia with a heart rate of 110 bpm. Additionally, oxygen saturation is low at 88% on room air, indicating potential respiratory compromise. Immediate intervention is necessary to address the hypoxemia and stabilize the patient's cardiovascular status.

 --------------------
##--- Step 3: Retrieving From Memory ---##
The patient presents with significant respiratory distress, evidenced by low oxygen saturation and tachycardia, necessitating urgent evaluation and intervention for potential acute respiratory failure.


In [19]:
# Inspecting the Memory
print('##--- Under the Hood: The Accumulated Context ---##')

for msg in messages_buffer:
    print(f'[{msg["role"].upper()}]: {msg["content"][:100]}...')

##--- Under the Hood: The Accumulated Context ---##
[SYSTEM]: You are a Clinical Case Summarizer.
                      Do not exceed 3 sentences.
               ...
[USER]: Patient: 65-year-old male admitted with dyspnea....
[ASSISTANT]: A 65-year-old male was admitted to the hospital presenting with dyspnea. Further evaluation is requi...
[USER]: BP: 140/90, HR: 110. SpO2: 88% on room air....
[ASSISTANT]: The patient exhibits elevated blood pressure at 140/90 mmHg and tachycardia with a heart rate of 110...
[USER]: Based on the vitals and symptoms, generate a 1-sentence triage assessment....
[ASSISTANT]: The patient presents with significant respiratory distress, evidenced by low oxygen saturation and t...
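
The buffer above grows with every turn, so each request resends more tokens than the last. One common way to bound it is a sliding window that keeps the system message plus the most recent turns. The following is a minimal sketch under that assumption; `trim_history` and `max_turns` are hypothetical names, not part of the notebook's code.

```python
from typing import Dict, List

def trim_history(messages: List[Dict[str, str]], max_turns: int = 2) -> List[Dict[str, str]]:
    '''Keep every system message plus the last `max_turns` user/assistant pairs.'''
    system = [m for m in messages if m['role'] == 'system']
    dialogue = [m for m in messages if m['role'] != 'system']
    if max_turns <= 0:
        return system
    # Each turn is one user message and one assistant message
    return system + dialogue[-2 * max_turns:]

history = [
    {'role': 'system', 'content': 'You are a Clinical Case Summarizer.'},
    {'role': 'user', 'content': 'turn 1'},
    {'role': 'assistant', 'content': 'reply 1'},
    {'role': 'user', 'content': 'turn 2'},
    {'role': 'assistant', 'content': 'reply 2'},
    {'role': 'user', 'content': 'turn 3'},
    {'role': 'assistant', 'content': 'reply 3'},
]

trimmed = trim_history(history, max_turns=2)
print([m['content'] for m in trimmed])
```

The trade-off is that anything outside the window (here, the admission details from turn 1) is no longer visible to the model, so the window size should be chosen with the clinical workflow in mind.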


# 7. A Simple Production Pattern

While functional scripts are sufficient for ad-hoc analysis, production systems (e.g., Streamlit Dashboards, FastAPI backends) require a robust **Object-Oriented** architecture.

We encapsulate the logic into a `ClinicalChatBot` class to solve three engineering challenges:

1.  **Encapsulation:** Bundling configuration (Model ID, Temperature) with State (Conversation History).
2.  **Scalability:** Allowing multiple independent agent instances (e.g., one for Oncology, one for Cardiology) to run simultaneously without state collision.
3.  **Resilience:** Centralized error handling and token usage monitoring.
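
The second point (independent instances without state collision) comes down to keeping the history on the instance rather than in a module-level variable. A minimal offline sketch follows; `MiniBot` is a hypothetical stand-in that stores state only and makes no API calls.

```python
from typing import Dict, List

class MiniBot:
    '''Hypothetical stand-in for a chatbot class: state only, no API calls.'''

    def __init__(self, system_role: str, temperature: float = 0.0):
        self.temperature = temperature
        # A fresh list per instance; a module-level buffer would be shared by all bots
        self.history: List[Dict[str, str]] = [{'role': 'system', 'content': system_role}]

    def remember(self, user_input: str) -> None:
        self.history.append({'role': 'user', 'content': user_input})

oncology = MiniBot('You are an Oncology Assistant.', temperature=0.2)
cardiology = MiniBot('You are a Cardiology Assistant.')

oncology.remember('Stage III NSCLC, PD-L1 > 50%.')

# Only the oncology instance has grown
print(len(oncology.history), len(cardiology.history))
```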

In [20]:
# Imports
from typing import List, Dict, Optional

class ClinicalChatBot:

    '''
    A production-ready wrapper for OpenAI API interaction, designed for
    maintaining conversational state in clinical decision support workflow.
    '''

    def __init__(self,
                 system_role: str,
                 model: str = 'gpt-4o-mini',
                 temperature: float = 0.0):
        
        ''' 
        Process:
            Initializes the chatbot with a specific persona and configuration.
        Args:
            system_role (str): The meta-prompt defining behaviour and guardrails.
            model (str): Target inference engine. Defaults to the cost-efficient "gpt-4o-mini".
            temperature (float): Determinism factor.
        '''

        self.client = OpenAI()
        self.model = model
        self.temperature = temperature
        self.system_role = system_role
        self.history: List[Dict[str, str]] = []

        # Initialize memory
        self.reset_memory()

    # Method: Reset memory 
    def reset_memory(self):
        '''
        Resets conversation history to the initial system state.
        '''
        self.history = [{'role': 'system', 'content': self.system_role}]

    # Method: Chat
    def chat(self,
             user_input: str,
             verbose: bool = False) -> str:
        
        '''
        Process:
            Receives and processes a user query, updates internal state,
            and returns the model response
        Args:
            user_input (str): The clinical query or data payload.
            verbose (bool): If True, prints token usage metrics.
        Returns:
            str: The generated response text.
        '''

        # 1. State Update: Add user input
        self.history.append({'role': 'user', 'content': user_input})

        try:
            # 2. API Execution
            response = self.client.chat.completions.create(
                model = self.model,
                temperature = self.temperature,
                messages = self.history
            )

            # 3. Extraction
            answer = response.choices[0].message.content

            # 4. State Update: Add Assistant Response
            self.history.append({'role': 'assistant', 'content': answer})

            # 5. Telemetry (Optional)
            if verbose:
                usage = response.usage
                print(f'Metrics | In: {usage.prompt_tokens} | Out: {usage.completion_tokens}')

            return answer
        
        except Exception as e:
            # In production this should log to a monitoring service
            return f'API ERROR: {str(e)}'
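
One side effect of the `chat` method above is worth noting: the user message is appended before the API call, so a failed call leaves an unanswered user turn in the history, and the returned error string looks like a normal answer to the caller. A possible alternative, sketched here with hypothetical names (`chat_with_rollback`, `send`) and stub functions in place of the real client, is to undo the append and re-raise.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def chat_with_rollback(history: List[Message],
                       user_input: str,
                       send: Callable[[List[Message]], str]) -> str:
    '''Append the user turn, call `send`, and undo the append if the call fails.'''
    history.append({'role': 'user', 'content': user_input})
    try:
        answer = send(history)
    except Exception:
        history.pop()  # drop the unanswered user turn
        raise
    history.append({'role': 'assistant', 'content': answer})
    return answer

def failing_send(messages: List[Message]) -> str:
    # Stub that simulates a network failure
    raise ConnectionError('simulated network failure')

def echo_send(messages: List[Message]) -> str:
    # Stub that simulates a successful model reply
    return 'echo: ' + messages[-1]['content']

history = [{'role': 'system', 'content': 'You are an Oncology Assistant.'}]

try:
    chat_with_rollback(history, 'Patient 45M, Stage III NSCLC.', failing_send)
except ConnectionError:
    pass
length_after_failure = len(history)

reply = chat_with_rollback(history, 'Patient 45M, Stage III NSCLC.', echo_send)
print(length_after_failure, len(history), reply)
```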

In [21]:
# --- Implementation ----

# 1. Instantiate an Oncology Specialist ChatBot
system_role_definition = ''' 
You are an Oncology Assistant.
Summarize patient history in 3 sentences, focusing on tumor markers and progression.
'''

oncology_bot = ClinicalChatBot(
    system_role = system_role_definition,
    temperature = 0.2
)

In [22]:
# 2. Simulate a Multi-Turn Conversation
print('##--- Oncology Session Started ---##')

# Turn 1. Providing Context
input_1 = 'Patient 45M, Stage III NSCLC. PD-L1 expression >50%.'
print(f'User: {input_1}')
response = oncology_bot.chat(input_1, verbose = True)
print(f'ChatBot: {response}')

##--- Oncology Session Started ---##
User: Patient 45M, Stage III NSCLC. PD-L1 expression >50%.
Metrics | In: 52 | Out: 61
ChatBot: The patient is a 45-year-old male diagnosed with Stage III non-small cell lung cancer (NSCLC). Tumor markers indicate a PD-L1 expression greater than 50%, suggesting a potential for immunotherapy treatment options. There is no additional information provided regarding the progression of the disease or treatment history.


In [23]:
# Turn 2. Asking for a Recommendation
input_2 = 'Based on the expression level, what is the first-line therapy?'
print(f'User: {input_2}')
response_2 = oncology_bot.chat(input_2, verbose = True)
print(f'ChatBot: {response_2}')

User: Based on the expression level, what is the first-line therapy?
Metrics | In: 134 | Out: 68
ChatBot: Based on the PD-L1 expression level greater than 50%, the first-line therapy for this patient with Stage III non-small cell lung cancer (NSCLC) would typically be pembrolizumab (Keytruda) in combination with chemotherapy. This approach leverages the high PD-L1 expression to enhance the immune response against the tumor.


# 8. Use Case 1: Automated Pharmacovigilance (AE) Extraction

## 8.1. The Business Challenge
In the pharmaceutical industry, **Pharmacovigilance (PV)** teams manually review thousands of unstructured documents (doctor emails, call center logs, social media mentions) daily to identify **Adverse Events (AEs)**. This manual process is:
* **Costly:** Requires highly trained medical professionals.
* **Slow:** Creates bottlenecks in safety reporting compliance.
* **Prone to Error:** Fatigue can lead to missed safety signals.

## 8.2. The Solution: Structure-Aware GenAI
Instead of generic summarization, we deploy the `ClinicalChatBot` in **Extraction Mode**. By enforcing a strict JSON schema, we transform unstructured clinical narratives into structured rows compatible with databases (e.g., SQL, OMOP-CDM) or regulatory forms (e.g., CIOMS, MedWatch).

### Key Technical Features:
1.  **Schema Enforcement:** The model is constrained to output specific fields (Drug, Severity, Action).
2.  **Noise Reduction:** Irrelevant conversational filler is ignored.
3.  **Standardization:** Mapping vague terms (e.g., "really bad rash") to standard grades (e.g., "Severe").
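
Schema enforcement through the prompt is a request, not a guarantee, so a downstream check on the parsed result is a reasonable safety net. The sketch below is one way to do that; `validate_ae_record`, `REQUIRED_KEYS`, and `ALLOWED_SEVERITY` are hypothetical names, and the field names mirror the `ae_schema` used in the implementation cell of this section.

```python
from typing import Any, Dict, List

ALLOWED_SEVERITY = {'Mild', 'Moderate', 'Severe'}
REQUIRED_KEYS = {'drug_information', 'adverse_events', 'action_taken'}

def validate_ae_record(record: Dict[str, Any]) -> List[str]:
    '''Return a list of problems; an empty list means the record passed.'''
    problems = [f'missing key: {key}' for key in sorted(REQUIRED_KEYS - set(record))]
    for i, event in enumerate(record.get('adverse_events', [])):
        severity = event.get('severity')
        if severity not in ALLOWED_SEVERITY:
            problems.append(f'adverse_events[{i}]: unexpected severity {severity!r}')
    return problems

good = {
    'drug_information': {'drug_name': 'Keytruda'},
    'adverse_events': [{'event_name': 'Colitis', 'severity': 'Severe'}],
    'action_taken': 'Discontinued',
}
bad = {'adverse_events': [{'event_name': 'Rash', 'severity': 'really bad'}]}

print(validate_ae_record(good))
print(validate_ae_record(bad))
```

Records that fail the check can be routed to manual review instead of being written to the safety database.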

In [24]:
# We need the json package for this use case
import json
from typing import List, Dict, Optional, Any

# Define the ClinicalChatBot class (now with an extraction method)
class ClinicalChatBot:

    '''
    A production-ready wrapper for OpenAI API interaction, designed for
    maintaining conversational state in clinical decision support workflow.
    '''

    def __init__(self,
                 system_role: str,
                 temperature: float = 0.0,
                 openai_model: str = 'gpt-4o'):
        
        ''' 
        Process:
            Initializes the chatbot with a specific persona and configuration.
        Args:
            system_role (str): The meta-prompt defining behaviour and guardrails.
            openai_model (str): Target inference engine. Defaults to "gpt-4o" for JSON jobs.
            temperature (float): Determinism factor.
        '''
        
        self.client = OpenAI()
        self.model = openai_model
        self.temperature = temperature
        self.system_role = system_role
        self.history: List[Dict[str, str]] = []

        # Initiate an empty memory
        self.reset_memory()

    #####
    # METHOD: Memory Reset
    def reset_memory(self):
        '''
        Resets conversation history to the initial system state.
        '''
        self.history = [{'role': 'system', 'content': self.system_role}]

    #####
    # METHOD: Chat
    def chat(self,
             user_input: str,
             verbose: bool = False) -> str:
        
        '''
        Process:
            Receives and processes a user query, updates internal state,
            and returns the model response.
        Args:
            user_input (str): The clinical query or data payload.
            verbose (bool): If True, prints token usage metrics.
        Returns:
            str: The generated response text.
        '''

        # 1. State Update: Add user input
        self.history.append({'role': 'user', 'content': user_input})

        try:
            # 2. API Execution
            response = self.client.chat.completions.create(
                model = self.model,
                temperature = self.temperature,
                messages = self.history
            )

            # 3. Extraction
            answer = response.choices[0].message.content

            # 4. State Update: Add Assistant Response
            self.history.append({'role': 'assistant', 'content': answer})

            # 5. Telemetry (Optional)
            if verbose:
                usage = response.usage
                print(f'Metrics | In: {usage.prompt_tokens} | Out: {usage.completion_tokens}')

            return answer
        
        except Exception as e:
            # In production this should log to a monitoring service
            return f'API ERROR: {str(e)}'
    
    #####
    # METHOD: Extract Structured Data
    def extract_structured_data(self,
                                clinical_note: str,
                                schema: Dict[str, Any],
                                verbose: bool = False) -> Dict[str, Any]:
        
        '''
        Process:
            Extracts specific clinical entities from unstructured text into a valid JSON object.
            This is crucial for downstream tasks like database entry or OMOP-CDM mapping.
        Args:
            clinical_note (str): The raw text (e.g., doctor's note).
            schema (Dict): A description of the JSON structure expected.
            verbose (bool): If True, prints token usage metrics.
        Returns:
            Dict: Parsed JSON object containing the extracted data.
        Design Note:
            1. This method is STATELESS and uses a specialized system prompt.
                It ignores the conversation history (self.history) to ensure 
                pure extraction without 'context pollution' or hallucination 
                from previous chat turns.
            2. Determinism: Forces 'temperature=0.0' regardless of the instance setting 
                to ensure consistent, reproducible data extraction (Creativity = 0).
        '''

        # 1. Construct a system prompt
        extraction_system_prompt = '''You are a Clinical Data Extractor
                                        Output strictly in JSON format.'''
        
        # 2. Construct the user prompt: target schema plus the clinical note
        extraction_user_prompt = f'''Analyze  the following clinical note
                                        and extract data strictly adhering this structure:
                                
                                        {json.dumps(schema, indent = 2)}

                                        Return ONLY the JSON object.
                                        Do not add any conversational text.
                                
                                        ### Clinical Note:
                                        {clinical_note}''' 
        
        try:
            # 3. Construct initial messages
            messages = [
                {'role': 'system', 'content': extraction_system_prompt},
                {'role': 'user', 'content': extraction_user_prompt}
            ]

            # 4. Get response
            response = self.client.chat.completions.create(
                model = self.model,
                temperature = 0.0, 
                messages = messages,
                response_format = {'type': 'json_object'}
            )

            # 5. Extract the string response
            json_str = response.choices[0].message.content

            #6. Telemetry (Optional)
            if verbose:
                usage = response.usage
                print(f'Metrics | In: {usage.prompt_tokens} | Out: {usage.completion_tokens}')

            return json.loads(json_str)
        
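        # Potential exceptions: report the failure and return an empty dict
        # so the declared return type holds for callers
        except json.JSONDecodeError:
            print('ERROR!: Failed to parse model output as JSON!')
            return {}
        except Exception as e:
            print(f'ERROR!: Extraction failed!: {str(e)}')
            return {}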


In [25]:
# --- Implementation: Real-World Pharmacovigilance Case ---

# 1. Define the Scenario
ae_schema = {
                'patient_demographics': {'age': 'int or null', 'gender': 'str or null'},
                'drug_information': {'drug_name': 'str', 'dosage': 'str', 'duration': 'str'},
                'adverse_events': [
                    {'event_name': 'str', 'severity': 'Mild/Moderate/Severe',
                     'causality_likelihood': 'Low/Medium/High'}
                ],
                'action_taken': 'str'
}

# 2. Instantiate (Role is mainly for chat, but we can use the class for extraction too)
pv_bot = ClinicalChatBot(system_role = '''You are a Drug Safety Assistant.
                                          Summarize the case in the sense of safety in 3 sentences.''',
                         openai_model = 'gpt-4o-mini',
                         temperature = 0.4)

print('##--- Pharmacovigilance Extraction Module ---##')

# 3. The Unstructured Raw Note (Doctor's Email)
raw_note = '''
Subject: Urgent - Patient Report
Hi team, saw a 64-year-old male today. He's been on Keytruda (pembrolizumab) 200mg every 3 weeks for the last 2 months. 
Yesterday he developed a Grade 3 skin rash covering his torso and reported severe itching. 
We suspected immune-mediated dermatitis. 
We paused the Keytruda and started him on systemic corticosteroids immediately.
'''

print(f"\n--- Raw Clinical Note ---\n{raw_note.strip()}\n")

# 4. Use the class in chat mode
print('\n\n##### Chat Mode -----')
print(pv_bot.chat(user_input = raw_note,
                  verbose = True))

# 5. Use the class in extraction mode
print('\n\n##### Extraction Mode -----')
print(pv_bot.extract_structured_data(clinical_note = raw_note,
                                     schema = ae_schema,
                                     verbose = True))

##--- Pharmacovigilance Extraction Module ---##

--- Raw Clinical Note ---
Subject: Urgent - Patient Report
Hi team, saw a 64-year-old male today. He's been on Keytruda (pembrolizumab) 200mg every 3 weeks for the last 2 months. 
Yesterday he developed a Grade 3 skin rash covering his torso and reported severe itching. 
We suspected immune-mediated dermatitis. 
We paused the Keytruda and started him on systemic corticosteroids immediately.



##### Chat Mode -----
Metrics | In: 123 | Out: 68
A 64-year-old male patient on Keytruda (pembrolizumab) for two months developed a Grade 3 skin rash and severe itching, indicating a potential immune-mediated dermatitis. The treatment with Keytruda was paused to prevent further complications. Systemic corticosteroids were initiated to manage the adverse reaction and ensure patient safety.


##### Extraction Mode -----
Metrics | In: 270 | Out: 150
{'patient_demographics': {'age': 64, 'gender': 'male'}, 'drug_information': {'drug_name': 'Keytruda', '

In [26]:
# Another doctor note
raw_clinical_note = '''
Subject: URGENT REPORT re: Patient J. Doe
Date: Oct 24, 2024

Hi Safety Team,
Reporting a serious case. 64yo male patient started taking Keytruda (pembrolizumab) 
200mg IV q3w about two months ago for NSCLC.

Yesterday he presented to the ER with Grade 3 Colitis and severe diarrhea (7+ stools/day). 
We also noted some mild fatigue but that might be unrelated.
We have permanently discontinued the Keytruda and started high-dose steroids.
Patient is currently hospitalized but stable.
Please confirm receipt.
Dr. House
'''

print(pv_bot.extract_structured_data(clinical_note = raw_clinical_note,
                                     schema = ae_schema,
                                     verbose = True))

Metrics | In: 309 | Out: 190
{'patient_demographics': {'age': 64, 'gender': 'male'}, 'drug_information': {'drug_name': 'Keytruda', 'dosage': '200mg IV q3w', 'duration': 'about two months'}, 'adverse_events': [{'event_name': 'Colitis', 'severity': 'Severe', 'causality_likelihood': 'High'}, {'event_name': 'Diarrhea', 'severity': 'Severe', 'causality_likelihood': 'High'}, {'event_name': 'Fatigue', 'severity': 'Mild', 'causality_likelihood': 'Low'}], 'action_taken': 'Permanently discontinued Keytruda and started high-dose steroids'}
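
The nested result above is not yet a table. To load it into a relational store, one option is to emit one flat row per adverse event and repeat the case-level fields on each row. The sketch below assumes that layout; `flatten_ae_record` is a hypothetical helper and the column selection is illustrative.

```python
from typing import Any, Dict, List

def flatten_ae_record(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    '''Produce one flat row per adverse event, repeating the case-level fields.'''
    demographics = record.get('patient_demographics') or {}
    drug = record.get('drug_information') or {}
    base = {
        'age': demographics.get('age'),
        'gender': demographics.get('gender'),
        'drug_name': drug.get('drug_name'),
        'dosage': drug.get('dosage'),
        'action_taken': record.get('action_taken'),
    }
    return [{**base, **event} for event in record.get('adverse_events') or []]

# Extraction result copied from the output above
extracted = {
    'patient_demographics': {'age': 64, 'gender': 'male'},
    'drug_information': {'drug_name': 'Keytruda', 'dosage': '200mg IV q3w',
                         'duration': 'about two months'},
    'adverse_events': [
        {'event_name': 'Colitis', 'severity': 'Severe', 'causality_likelihood': 'High'},
        {'event_name': 'Diarrhea', 'severity': 'Severe', 'causality_likelihood': 'High'},
        {'event_name': 'Fatigue', 'severity': 'Mild', 'causality_likelihood': 'Low'},
    ],
    'action_taken': 'Permanently discontinued Keytruda and started high-dose steroids',
}

rows = flatten_ae_record(extracted)
for row in rows:
    print(row['drug_name'], '|', row['event_name'], '|', row['severity'])
```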


# 9. Use Case 2: PII Detection & Redaction

## 9.1. Purpose
To implement a **"Privacy by Design"** architecture that automatically sanitizes unstructured clinical text before it leaves the secure local environment. The goal is to ensure no **Personally Identifiable Information (PII)** or **Protected Health Information (PHI)** is transmitted to external LLM providers (e.g., OpenAI), adhering to data minimization principles.

## 9.2. The Challenge
* **Regulatory Liability:** Strict regulations such as **GDPR, HIPAA, and KVKK** impose severe penalties for data leaks.
* **Unstructured Chaos:** Clinical notes contain PII in unpredictable formats (e.g., "Patient visited Dr. House at Princeton-Plainsboro").
* **Context Sensitivity:** Simple rule-based systems (Regex) often fail to distinguish between a patient's name (e.g., "Rose") and a common noun (e.g., "rose" the flower) or a medical condition (e.g., "Parkinson" the disease vs. "Parkinson" the surname).

## 9.3. Business Value
1.  **Risk Mitigation:** Drastically reduces the risk of data breaches and associated legal fines.
2.  **Compliance Acceleration:** Facilitates faster approval from Legal/Compliance teams for AI projects by demonstrating proactive data safety.
3.  **Trust:** Ensures client and patient trust by verifying that sensitive data is handled with "Zero-Trust" architecture.

## 9.4. Technical Approach: Classical NLP Shield
Instead of relying solely on pattern matching, we integrate a **Named Entity Recognition (NER)** layer using the `spaCy` library.
* **Mechanism:** The system scans the input text locally.
* **Detection:** Identifies entities such as `PERSON`, `ORG` (Organization), and `GPE` (Location) contextually.
* **Action:** Replaces these entities with generic placeholders (e.g., `[PERSON_REDACTED]`) *before* the API call is made.
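
The replacement step has one subtlety: every substitution changes the length of the text, which would invalidate the character offsets of later entities. Working from the end of the text backwards avoids this. The sketch below illustrates the idea without loading spaCy; the spans are hand-built stand-ins for the entities an NER model would return, and `redact_spans` and `span_of` are hypothetical helpers.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

def span_of(text: str, phrase: str, label: str) -> Span:
    '''Build a span for the first occurrence of `phrase` (stand-in for NER output).'''
    start = text.index(phrase)
    return (start, start + len(phrase), label)

def redact_spans(text: str, spans: List[Span]) -> str:
    '''Replace each span with a placeholder, starting from the end of the text.'''
    for start, end, label in sorted(spans, reverse=True):
        # Offsets before `start` are unaffected by this substitution
        text = text[:start] + f'[{label}_REDACTED]' + text[end:]
    return text

note = 'Patient John Doe was seen in New York.'
spans = [span_of(note, 'John Doe', 'PERSON'), span_of(note, 'New York', 'GPE')]

redacted = redact_spans(note, spans)
print(redacted)
```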

### Strategic Insight:
Why are we using a traditional library like `spaCy` in a GenAI notebook?

In the era of Large Language Models, older NLP pipelines are often overlooked, yet they remain critical for **Cost-Effective Privacy Architecture**.

1.  **The "Local LLM" Bottleneck:** Running a capable LLM locally (to keep data private) often requires expensive, high-end GPU infrastructure, which is not feasible for every edge case.
2.  **The "Cloud LLM" Risk:** Asking a public model (e.g., GPT-4) to "please redact this name" creates a paradox: you must expose the sensitive data to the cloud just to ask for it to be hidden.
3.  **The Hybrid Solution:** Classic NLP models (NER) are lightweight, CPU-friendly, and run strictly locally. They act as the **"Gatekeeper,"** handling sensitive tasks "for free" (computationally) before the heavy lifting is handed over to the expensive GenAI models.

**Maxim:** Use Classic NLP for *Precision & Privacy*. Use GenAI for *Reasoning & Generation*.

> **!NOTE ON ENTERPRISE PRODUCTION:**
> This notebook demonstrates a fundamental PII scrubbing technique using an off-the-shelf NER model. In a real-world enterprise production environment, PII detection requires a **"Defense in Depth" (Swiss Cheese Model)** strategy, which typically includes:
> * **Allow-Lists:** Whitelisting medical terms (ICD-10, RxNorm) to prevent "over-scrubbing" (e.g., not redacting "Parkinson's Disease").
> * **Deterministic Rules:** Using strict Regex for TCKN, SSN, Credit Card, and Phone numbers.
> * **Confidence Thresholds:** Flagging low-confidence predictions for **Human-in-the-Loop** review.
> * **Frameworks:** Utilizing robust libraries like **Microsoft Presidio** to manage these multi-layered policies.
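
Two of the layers listed in the note can be sketched in a few lines: an allow-list that discards NER hits on known medical terms, and a strict regex for a fixed-format identifier. Everything below is illustrative: the allow-list contents, the helper names, and the mis-tagged `ORG` span are assumptions, and a real allow-list would be built from vocabularies such as ICD-10 or RxNorm.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

# Hypothetical allow-list of medical terms that must never be redacted
MEDICAL_ALLOW_LIST = {'keytruda', 'pembrolizumab'}
# Strict US SSN layout only (NNN-NN-NNNN)
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

def drop_allow_listed(text: str, spans: List[Span]) -> List[Span]:
    '''Discard NER spans whose text is an allow-listed medical term.'''
    return [s for s in spans if text[s[0]:s[1]].lower() not in MEDICAL_ALLOW_LIST]

def redact_ssn(text: str) -> str:
    '''Deterministic rule: replace anything matching the SSN layout.'''
    return SSN_PATTERN.sub('[SSN_REDACTED]', text)

note = 'Keytruda was prescribed to Jane Roe, SSN 123-45-6789.'

# Simulated NER output in which the drug name was mis-tagged as an organization
drug_start = note.index('Keytruda')
name_start = note.index('Jane Roe')
ner_spans = [(drug_start, drug_start + 8, 'ORG'), (name_start, name_start + 8, 'PERSON')]

kept = drop_allow_listed(note, ner_spans)
cleaned = redact_ssn(note)
print(kept)
print(cleaned)
```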

In [27]:
import json
from typing import List, Dict, Optional, Any
# Import new library
import spacy
import re

# Define the ClinicalAssistant class (ClinicalChatBot plus a local PII scrubber)
class ClinicalAssistant:

    '''
    A production-ready wrapper for OpenAI API interaction, designed for
    maintaining conversational state in clinical decision support workflow.
    '''

    def __init__(self,
                 system_role: str,
                 temperature: float = 0.0,
                 openai_model: str = 'gpt-4o'):
        
        ''' 
        Process:
            Initializes the chatbot with a specific persona and configuration.
        Args:
            system_role (str): The meta-prompt defining behaviour and guardrails.
            openai_model (str): Target inference engine. Defaults to "gpt-4o" for JSON jobs.
            temperature (float): Determinism factor.
        '''
        
        self.client = OpenAI()
        self.model = openai_model
        self.temperature = temperature
        self.system_role = system_role
        self.history: List[Dict[str, str]] = []

        # Spacy cold start
        try:
            self.nlp = spacy.load('en_core_web_sm', disable = ['parser', 'tagger'])
            print('For PII detection, NLP module loaded successfully!')
        except Exception as e:
            print(f'ERROR for NLP module!: {str(e)}')
            raise e

        # Initiate an empty memory
        self.reset_memory()

    #####
    # METHOD: Memory Reset
    def reset_memory(self):
        '''
        Resets conversation history to the initial system state.
        '''
        self.history = [{'role': 'system', 'content': self.system_role}]

    ####
    # METHOD: SCRUB PII
    def _scrub_pii(self,
                   text: str) -> str:
        
        '''
        Internal Utility:
            Uses Named Entity Recognition (NER) to detect and redact sensitive 
            entities (Names, Organizations, Locations) contextually using spaCy.
        '''

        # 1. NER-based PII redaction (iterate in reverse so earlier offsets stay valid)
        doc = self.nlp(text)
        scrubbed_text = text

        for ent in reversed(doc.ents):
            if ent.label_ in ['PERSON', 'ORG', 'GPE']:
                replacement = f'[{ent.label_}_REDACTED]'
                scrubbed_text = scrubbed_text[:ent.start_char] + replacement + scrubbed_text[ent.end_char:]

        # 2. Regex based PII detection
        regex_patterns = {
            r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b': '[EMAIL_REDACTED]', # Email
            r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE_REDACTED]', # US Phone Format (Generic)
        }

        for pattern, replacement in regex_patterns.items():
            scrubbed_text = re.sub(pattern, replacement, scrubbed_text)

        return scrubbed_text
    
    #####
    # METHOD: Chat
    def chat(self,
             user_input: str,
             verbose: bool = False) -> str:
        
        '''
        Process:
            Receives and processes a user query, updates internal state,
            and returns the model response.
        Args:
            user_input (str): The clinical query or data payload.
            verbose (bool): If True, prints token usage metrics.
        Returns:
            str: The generated response text.
        '''

        # 0. Security
        clean_input = self._scrub_pii(user_input)
        if verbose and clean_input != user_input:
            print('\nWARNING!: PII Detected & Redacted. Sending clean version to API.')

        # 1. State Update: Add clean user input
        self.history.append({'role': 'user', 'content': clean_input})

        try:
            # 2. API Execution
            response = self.client.chat.completions.create(
                model = self.model,
                temperature = self.temperature,
                messages = self.history
            )

            # 3. Extraction
            answer = response.choices[0].message.content

            # 4. State Update: Add Assistant Response
            self.history.append({'role': 'assistant', 'content': answer})

            # 5. Telemetry (Optional)
            if verbose:
                usage = response.usage
                print(f'\nMetrics | In: {usage.prompt_tokens} | Out: {usage.completion_tokens}')

            return answer
        
        except Exception as e:
            # In production this should log to a monitoring service
            return f'\nAPI ERROR: {str(e)}'
    
    #####
    # METHOD: Extract Structured Data
    def extract_structured_data(self,
                                clinical_note: str,
                                schema: Dict[str, Any],
                                verbose: bool = False) -> Dict[str, Any]:
        
        '''
        Process:
            Extracts specific clinical entities from unstructured text into a valid JSON object.
            This is crucial for downstream tasks like database entry or OMOP-CDM mapping.
        Args:
            clinical_note (str): The raw text (e.g., doctor's note).
            schema (Dict): A description of the JSON structure expected.
            verbose (bool): If True, prints token usage metrics.
        Returns:
            Dict: Parsed JSON object containing the extracted data.
        Design Note:
            1. This method is STATELESS and uses a specialized system prompt.
                It ignores the conversation history (self.history) to ensure 
                pure extraction without 'context pollution' or hallucination 
                from previous chat turns.
            2. Determinism: Forces 'temperature=0.0' regardless of the instance setting 
                to ensure consistent, reproducible data extraction (Creativity = 0).
        '''

        # 1. Construct a system prompt
        extraction_system_prompt = '''You are a Clinical Data Extractor
                                        Output strictly in JSON format.'''
        
        # 2. Construct the user prompt: target schema plus the clinical note
        extraction_user_prompt = f'''Analyze  the following clinical note
                                        and extract data strictly adhering this structure:
                                
                                        {json.dumps(schema, indent = 2)}

                                        Return ONLY the JSON object.
                                        Do not add any conversational text.
                                
                                        ### Clinical Note:
                                        {clinical_note}''' 
        
        try:
            # 3. Construct initial messages
            messages = [
                {'role': 'system', 'content': extraction_system_prompt},
                {'role': 'user', 'content': extraction_user_prompt}
            ]

            # 4. Get response
            response = self.client.chat.completions.create(
                model = self.model,
                temperature = 0.0, 
                messages = messages,
                response_format = {'type': 'json_object'}
            )

            # 5. Extract the string response
            json_str = response.choices[0].message.content

            #6. Telemetry (Optional)
            if verbose:
                usage = response.usage
                print(f'Metrics | In: {usage.prompt_tokens} | Out: {usage.completion_tokens}')

            return json.loads(json_str)
        
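        # Potential exceptions: report the failure and return an empty dict
        # so the declared return type holds for callers
        except json.JSONDecodeError:
            print('ERROR!: Failed to parse model output as JSON!')
            return {}
        except Exception as e:
            print(f'ERROR!: Extraction failed!: {str(e)}')
            return {}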


In [28]:
# --- Demo: PII Protected Chat

# 1. Instantiate the bot
secure_assistant = ClinicalAssistant(system_role = '''You are a Medical Assistant.
                                              You provide friendly but short answers.
                                              If there is no question, summarize the input in one sentence.''',
                               openai_model = 'gpt-4o-mini',
                               temperature = 0.2)

# 2. A risky note
risky_note = '''
Patient John Doe (Triage ID: 998877) admitted to Princeton-Plainsboro Hospital 
with severe chest pain. He works at Globex Corporation in New York.
Dr. House requests an immediate cardiology consult.
Patient's phone number is 1-212-456-7890 and his mail is 
doe_john@supermail.com
'''

print(f"\n[Input]:\n{risky_note.strip()}")

# 3. Chat execution
response = secure_assistant.chat(risky_note, verbose = True)
print(f'\nAssistant Response: {response}')

For PII detection, NLP module loaded successfully!

[Input]:
Patient John Doe (Triage ID: 998877) admitted to Princeton-Plainsboro Hospital 
with severe chest pain. He works at Globex Corporation in New York.
Dr. House requests an immediate cardiology consult.
Patient's phone number is 1-212-456-7890 and his mail is 
doe_john@supermail.com


Metrics | In: 135 | Out: 12

Assistant Response: Patient admitted with severe chest pain; cardiology consult requested.


In [29]:
# 4. Audit: What did the Assistant actually see? (Memory Inspection)
print("\n[Audit Log - What is stored in Memory?]:")
last_user_msg = secure_assistant.history[-2]['content']
print(f"--> {last_user_msg}")

# 5. Validation Check
if "John Doe" not in last_user_msg and "New York" not in last_user_msg:
    print("\nSUCCESS: Sensitive data was redacted before API transmission.")
else:
    print("\nFAILURE: Sensitive data leaked to memory!")


[Audit Log - What is stored in Memory?]:
--> 
Patient [PERSON_REDACTED] (Triage ID: 998877) admitted to [ORG_REDACTED] 
with severe chest pain. He works at Globex Corporation in [GPE_REDACTED].
Dr. [ORG_REDACTED] requests an immediate cardiology consult.
[PERSON_REDACTED]'s phone number is 1-[PHONE_REDACTED] and his mail is 
[EMAIL_REDACTED]


SUCCESS: Sensitive data was redacted before API transmission.
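
The validation check above tests only two strings. A broader audit compares the scrubbed text against every identifier known to be in the input. The sketch below does this for the audit log shown above; `find_leaks` is a hypothetical helper, and the list of sensitive terms is our assumption about what counts as identifying in this note.

```python
from typing import Iterable, List

def find_leaks(text: str, sensitive_terms: Iterable[str]) -> List[str]:
    '''Return every sensitive term that still appears in `text` (case-insensitive).'''
    lowered = text.lower()
    return [term for term in sensitive_terms if term.lower() in lowered]

# Scrubbed text as stored in memory (copied from the audit log above)
scrubbed = '''Patient [PERSON_REDACTED] (Triage ID: 998877) admitted to [ORG_REDACTED]
with severe chest pain. He works at Globex Corporation in [GPE_REDACTED].
Dr. [ORG_REDACTED] requests an immediate cardiology consult.
[PERSON_REDACTED]'s phone number is 1-[PHONE_REDACTED] and his mail is
[EMAIL_REDACTED]'''

sensitive = ['John Doe', 'Princeton-Plainsboro', 'Globex Corporation', 'New York',
             'House', '998877', '212-456-7890', 'doe_john@supermail.com']

leaks = find_leaks(scrubbed, sensitive)
print(leaks)  # ['Globex Corporation', '998877']
```

Under this stricter check the employer name and the triage ID are reported as leaks, which is the kind of gap the layered controls described in Section 9.4 are meant to close.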


# 10. Conclusion & Next Steps

We have successfully evolved this project from simple, stateless API scripts into a **Production-Grade Clinical Assistant Module**.

This notebook serves as a foundational architecture, demonstrating how to wrap raw LLM capabilities into a managed, object-oriented system that respects the strict requirements of the Healthcare domain.

## 10.1. Summary of Achievements
* **Encapsulated Logic:** We moved away from "spaghetti code" to modular `ClinicalChatBot` and `ClinicalAssistant` classes that manage their own state, configuration, and error handling.
* **Hybrid Intelligence:** We combined the reasoning power of GenAI (GPT-4o models) with the precision of Classic NLP (spaCy) and regex.
* **Value Generation:** Demonstrated how to transform unstructured narratives into structured assets (JSON Extraction) for Pharmacovigilance.

## 10.2. Critical Note on Privacy (The "Swiss Cheese" Reality)
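While our `_scrub_pii` method demonstrates a local privacy gatekeeper using **NER (Named Entity Recognition)**, it represents only the *first line of defense*. The audit log in Section 9 shows its limits: "Globex Corporation" and the triage ID passed through unredacted, and "Dr. House" was tagged as an organization rather than a person.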

In a real-world **GxP (Good Practice)** environment, there is zero tolerance for leaked patient data. A production system would require a **"Defense in Depth"** strategy:
1.  **Allow-Lists:** Whitelisting medical terms to prevent over-redaction.
2.  **Confidence Thresholds:** Flagging low-confidence entities for human review.
3.  **Enterprise Frameworks:** Using purpose-built tools such as **Microsoft Presidio** or **AWS Comprehend Medical** instead of a hand-rolled scrubber, validated against the project's compliance requirements.

*However, this notebook demonstrates the core architectural concept: **Sensitive data must never leave the local environment in its raw form.***
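
The first two layers (allow-lists and confidence thresholds) can be sketched without any NLP dependency. Everything below is illustrative: `DetectedEntity`, `span`, `redact`, the 0.80 threshold, and the sample scores are assumptions, not part of the notebook's `_scrub_pii`. The entity shape (start, end, label, score) is loosely modeled on what detectors such as Presidio return; the detector you use must actually supply a score. The sketch also makes one design choice of its own: low-confidence entities are still redacted (fail closed) and additionally queued for human review.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class DetectedEntity:
    """Hypothetical detector output: character span, label, and confidence score."""
    start: int
    end: int
    label: str
    score: float


def span(text: str, phrase: str, label: str, score: float) -> DetectedEntity:
    """Build an entity for the first occurrence of `phrase` (test helper)."""
    start = text.index(phrase)
    return DetectedEntity(start, start + len(phrase), label, score)


def redact(text: str, entities: List[DetectedEntity], allow_list: Set[str],
           threshold: float = 0.80) -> Tuple[str, List[str]]:
    """Redact entities, skipping allow-listed terms and flagging uncertain ones."""
    redacted = text
    review_queue: List[str] = []
    # Work right to left so earlier character offsets stay valid after each edit.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        surface = text[ent.start:ent.end]
        if surface.lower() in allow_list:
            continue  # known medical term: keep it to avoid over-redaction
        if ent.score < threshold:
            review_queue.append(surface)  # uncertain: redact and ask a human
        redacted = redacted[:ent.start] + f"[{ent.label}_REDACTED]" + redacted[ent.end:]
    return redacted, review_queue


text = "John Doe was started on Metformin by Dr. Smith."
entities = [
    span(text, "John Doe", "PERSON", 0.98),
    span(text, "Metformin", "ORG", 0.55),   # a drug name mislabeled by the detector
    span(text, "Smith", "PERSON", 0.62),
]
clean, needs_review = redact(text, entities, allow_list={"metformin"})
print(clean)
print("Needs review:", needs_review)
```

The sketch assumes the detected spans do not overlap; a production version would need to resolve overlaps before substituting.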

## 10.3. Architectural Analysis: Assistant vs. Agent
Technically, we have built a **"Deterministic Assistant"**, not a fully autonomous "Agent".
* **Assistant:** Follows a strict code path (if the user requests extraction, run extraction). Control remains with the human engineer.
* **Agent:** The LLM decides which tools to use and when (e.g., "I should query the database now").

**Domain Insight:** In high-stakes domains like Pharma and Clinical Care, fully autonomous agents are currently viewed with caution. A "hallucinating" agent that decides to skip a safety check is a regulatory nightmare. Therefore, our **Human-in-the-Loop** architecture—where the AI reasons but the code controls the flow—is often the preferred pattern for immediate enterprise adoption.
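
The deterministic pattern can be illustrated with a minimal sketch. All names here (`fake_llm`, `scrub`, `run_assistant`, the `mode` argument) are hypothetical stand-ins rather than the notebook's actual classes, and `fake_llm` replaces a real API call so the example runs offline.

```python
from typing import Callable, Dict, List

LLMFn = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call (hypothetical; no network access)."""
    if prompt.startswith("EXTRACT"):
        return '{"adverse_events": ["nausea"]}'
    return "Understood. How long has the nausea lasted?"


def scrub(text: str) -> str:
    """Toy scrubber; the notebook's `_scrub_pii` would sit here."""
    return text.replace("John Doe", "[PERSON_REDACTED]")


def run_assistant(user_text: str, mode: str, llm: LLMFn) -> Dict[str, object]:
    """Deterministic flow: the code, not the model, decides which steps run."""
    trace: List[str] = ["scrub_pii"]  # the safety step cannot be skipped
    clean_text = scrub(user_text)
    if mode == "extract":
        trace.append("extract_ae")
        answer = llm(f"EXTRACT adverse events as JSON: {clean_text}")
    else:
        trace.append("chat")
        answer = llm(f"CHAT: {clean_text}")
    return {"trace": trace, "prompt_text": clean_text, "answer": answer}


result = run_assistant("John Doe reports nausea.", mode="extract", llm=fake_llm)
print(result["trace"])
print(result["answer"])
```

In an agent design, the `if mode == ...` branch would be replaced by a model response naming the tool to call, so the trace could differ from run to run and the scrubbing step would only be guaranteed if the code still enforced it.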

## 10.4. Future Roadmap
To elevate this MVP to a full Clinical Decision Support System (CDSS), the next steps are:
1.  **RAG (Retrieval-Augmented Generation):** Connecting the bot to a vector database containing private guidelines (PDFs, Protocols) to reduce hallucinations and ground answers in validated facts.
2.  **Evaluation Pipeline:** Implementing automated testing (using a framework such as `Ragas` or `DeepEval`) to score the clinical accuracy of responses.
3.  **Voice Interface:** Adding the Whisper API for speech-to-text to support hands-free use by surgeons and doctors.

## 10.5. Note on Memory & Token Optimization (Out of Scope)

This project deliberately focuses on the **core architectural pattern** of a clinical conversational engine:  
a stateful “trust layer” with AE extraction and a local privacy gatekeeper.

In a real production environment, however, **long-running conversations** introduce an additional challenge:  
as conversation history grows, both **cost** and **latency** increase due to higher token usage.

To keep this notebook focused and readable, we **did not implement** advanced memory / token optimization strategies here beyond the simple sliding-window context introduced in Phase 2.  
Instead, you should think of the current `messages`-based state management as the **baseline** on top of which such techniques can be added.

Typical production-grade strategies include:

1. **Sliding-Window Context**  
   Keeping only the most relevant recent turns in the prompt, while discarding or compressing older ones.

2. **Summarization-Based Memory**  
   Periodically summarizing earlier parts of the conversation and storing those summaries instead of the full, raw history.

3. **Retention Rules & Domain-Aware Pruning**  
   Applying project-specific policies (e.g., keep all AE-related content, drop small talk) to control what stays in memory.
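
As a rough illustration of how the first and third strategies combine, the sketch below keeps system messages, the most recent turns, and any older message that matches a retention rule. The `prune_history` function, the `AE_KEYWORDS` tuple, and the sample conversation are all hypothetical. The window is counted in messages rather than tokens, and summarization is not shown.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical retention policy: older messages mentioning these terms are kept.
AE_KEYWORDS = ("adverse", "rash", "nausea", "reaction")


def prune_history(history: List[Message], max_recent: int = 4) -> List[Message]:
    """Keep system messages, the last `max_recent` messages, and older AE-related ones."""
    system = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    split = max(len(dialogue) - max_recent, 0)
    older, recent = dialogue[:split], dialogue[split:]
    # Domain-aware pruning: small talk is dropped, AE content survives.
    retained = [m for m in older if any(k in m["content"].lower() for k in AE_KEYWORDS)]
    return system + retained + recent


history = [
    {"role": "system", "content": "You are a clinical assistant."},
    {"role": "user", "content": "Hello, good morning."},
    {"role": "assistant", "content": "Good morning. How can I help?"},
    {"role": "user", "content": "Patient developed a rash after the first dose."},
    {"role": "assistant", "content": "Noted: possible adverse reaction."},
    {"role": "user", "content": "What is the usual follow-up interval?"},
    {"role": "assistant", "content": "Typically two weeks."},
]

pruned = prune_history(history, max_recent=2)
for message in pruned:
    print(f"{message['role']}: {message['content']}")
```

A token-aware version would replace the message count with a token budget, measured with the tokenizer that matches the target model.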

If you want to explore **concrete implementations** of these patterns (sliding windows, summarization, retention rules, etc.),  
please refer to the companion notebook:

**▶︎ [Memory Optimization Patterns for Clinical Chatbots](<INSERT_LINK_HERE>)**
