
### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE improves on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. We then want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital
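
As a rough illustration of how rows with these columns might be read and validated, the sketch below parses a one-row sample with `csv.DictReader` and converts the numeric columns to `float`. The helper name `parse_patient_row`, the `NUMERIC_COLUMNS` tuple, and the sample row are made up for this example and are not part of the assignment data.

```python
import csv
import io

# Columns expected to hold numbers (taken from the column list above).
NUMERIC_COLUMNS = ("HEIGHT_M", "WEIGHT_KG", "fev_pct", "distance_in_meters")


def parse_patient_row(row):
    """Return a copy of a CSV row with the numeric columns converted to float.

    Raises ValueError if a numeric column is missing or not a number.
    """
    parsed = dict(row)
    for column in NUMERIC_COLUMNS:
        try:
            parsed[column] = float(row[column])
        except (KeyError, TypeError, ValueError):
            raise ValueError(f"Invalid or missing value for {column}") from None
    return parsed


# Made-up sample input; the real patient file is not reproduced here.
sample = io.StringIO(
    "NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,"
    "dyspnea_description,distance_in_meters,hospital\n"
    "Jane Doe,000-00-0000,English,Nurse,1.65,60,55,mild,300,Hospital A\n"
)
rows = [parse_patient_row(row) for row in csv.DictReader(sample)]
```

Converting and validating in one place keeps the scoring functions free of string handling.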

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

patient_output.csv will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

hospital_output.csv will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.
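
One possible way to exercise docstring test cases is sketched below using the standard `doctest` finder and runner. The function `halve` and the helper `run_examples` are hypothetical stand-ins written for this illustration; they are not part of the assignment.

```python
import doctest


def halve(value):
    """
    Return half of a non-negative number.

    >>> halve(10)
    5.0
    >>> halve(-1)
    Traceback (most recent call last):
        ...
    ValueError: value must be non-negative
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return value / 2


def run_examples(func):
    """Run the docstring examples of one function; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples access to the function under its own name.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = run_examples(halve)
```

In a notebook cell, `doctest.run_docstring_examples(halve, globals(), verbose=True)` prints each example as it runs, which can be easier to read while developing.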

In [37]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [38]:
def calculate_bmi(weight_kg, height_m):
    """
    Calculate Body Mass Index (BMI).

    Parameters:
    weight_kg (float): Weight in kilograms.
    height_m (float): Height in meters.

    Returns:
    float: Calculated BMI.

    Raises:
    ValueError: If weight or height is non-positive.

    Examples:
    >>> calculate_bmi(70, 1.75)
    22.857142857142858
    >>> calculate_bmi(0, 1.75)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive values.
    >>> calculate_bmi(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive values.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive values.")

    return weight_kg / (height_m ** 2)
bmi = calculate_bmi(70, 1.75)
print(f"BMI: {bmi:.2f}")

BMI: 22.86
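
In the BODE index, the body-mass component contributes one point when BMI is 21 or lower and zero points otherwise. A minimal sketch of that mapping follows; `bmi_points` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
def bmi_points(bmi):
    """Return the BMI component of the BODE index.

    1 point if BMI is 21 or lower, otherwise 0 points.
    Raises ValueError if BMI is not positive.
    """
    if bmi <= 0:
        raise ValueError("BMI must be a positive value.")
    return 1 if bmi <= 21 else 0


# BMI for 70 kg at 1.75 m, computed the same way as calculate_bmi above.
points = bmi_points(70 / (1.75 ** 2))
```

The returned points could be added to the FEV1, dyspnea, and walking-distance points to form the full score.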


### Step 2: Calculate BODE Score

In [39]:
def calculate_bode_score(fev_pct, dyspnea_description, distance_in_meters):
    """
    Calculate the BODE score and risk.

    Parameters:
    fev_pct (float): FEV1 % predicted.
    dyspnea_description (str): Dyspnea description ('none', 'mild', 'moderate', 'severe').
    distance_in_meters (float): Distance walked in 6 minutes.

    Returns:
    tuple: (BODE Score, BODE Risk)

    Raises:
    ValueError: If inputs are invalid.

    Examples:
    >>> calculate_bode_score(60, 'none', 200)
    (3, 0.25)
    >>> calculate_bode_score(30, 'severe', 100)
    (9, 0.1)
    >>> calculate_bode_score(40, 'unknown', 200)
    Traceback (most recent call last):
        ...
    ValueError: Invalid input values for FEV1, distance, or dyspnea description.
    """
    # Dyspnea scoring scale
    dyspnea_scale = {
        'none': 0,
        'mild': 1,
        'moderate': 2,
        'severe': 3
    }

    if fev_pct < 0 or distance_in_meters < 0 or dyspnea_description not in dyspnea_scale:
        raise ValueError("Invalid input values for FEV1, distance, or dyspnea description.")

    # FEV1 score
    if fev_pct >= 65:
        fev_score = 0
    elif 50 <= fev_pct < 65:
        fev_score = 1
    elif 36 <= fev_pct < 50:
        fev_score = 2
    else:
        fev_score = 3

    # Dyspnea score
    dyspnea_score = dyspnea_scale[dyspnea_description]

    # Distance score
    if distance_in_meters >= 350:
        distance_score = 0
    elif 250 <= distance_in_meters < 350:
        distance_score = 1
    elif 150 <= distance_in_meters < 250:
        distance_score = 2
    else:
        distance_score = 3

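    # Total BODE score. The published index also adds 1 point when BMI <= 21;
    # that component is not included here.
    bode_score = fev_score + dyspnea_score + distance_score

    # BODE risk: placeholder computed as 1 / (1 + score), not the 4-year
    # survival table from the BODE reference
    bode_risk = 1 / (1 + bode_score)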

    return bode_score, bode_risk
bode_score, bode_risk = calculate_bode_score(40, 'moderate', 200)
print(f"BODE Score: {bode_score}, BODE Risk: {bode_risk:.2f}")


BODE Score: 6, BODE Risk: 0.14
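
The risk above is a simplified formula. An alternative is to look up approximate 4-year survival by score range, sketched below. The function name `bode_survival` and the table constant are hypothetical, and the survival values are the figures commonly quoted for the BODE index; treat them as an assumption and check them against the reference linked at the top before relying on them.

```python
# (lowest score, highest score, approximate 4-year survival).
# ASSUMPTION: values as commonly quoted for the BODE index; verify against
# the reference page before use.
SURVIVAL_BY_SCORE_RANGE = [
    (0, 2, 0.80),
    (3, 4, 0.67),
    (5, 6, 0.57),
    (7, 10, 0.18),
]


def bode_survival(bode_score):
    """Return the approximate 4-year survival for an integer BODE score (0-10).

    Raises ValueError if the score is not an integer in that range.
    """
    if isinstance(bode_score, bool) or not isinstance(bode_score, int):
        raise ValueError("BODE score must be an integer.")
    for low, high, survival in SURVIVAL_BY_SCORE_RANGE:
        if low <= bode_score <= high:
            return survival
    raise ValueError("BODE score must be between 0 and 10.")


survival = bode_survival(6)
```

Keeping the ranges in a table makes it easy to correct a value without touching the lookup logic.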


### Step 3: Calculate BODE Risk

In [40]:
# Example Usage
fev_pct = 40  # FEV1 % predicted
dyspnea_description = 'moderate'  # Dyspnea level
distance_in_meters = 200  # Distance walked

bode_score, bode_risk = calculate_bode_score(fev_pct, dyspnea_description, distance_in_meters)
print(f"BODE Score: {bode_score}, BODE Risk: {bode_risk:.4f}")


BODE Score: 6, BODE Risk: 0.1429


### Step 4: Load Hospital Data

In [41]:
import csv

def load_patient_data(input_file):
    """
    Load patient data from a CSV file.

    Parameters:
    input_file (str): Path to the input CSV file.

    Returns:
    list: List of dictionaries containing patient data.
    """
    patient_data = []

    with open(input_file, mode='r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            patient_data.append(row)

    return patient_data
def process_patient_data(patient_data):
    """
    Process patient data to calculate BODE scores and risks.

    Parameters:
    patient_data (list): List of dictionaries containing patient data.

    Returns:
    list: Processed patient data with BODE scores and risks.
    """
    processed_data = []

    for row in patient_data:
        try:
            # Calculate BMI; this validates weight and height, but the value
            # is not used by calculate_bode_score below
            bmi = calculate_bmi(float(row['WEIGHT_KG']), float(row['HEIGHT_M']))
            # Calculate BODE Score and Risk
            bode_score, bode_risk = calculate_bode_score(
                float(row['fev_pct']),
                row['dyspnea_description'],
                float(row['distance_in_meters'])
            )
            processed_data.append({
                'NAME': row['NAME'],
                'BODE_SCORE': bode_score,
                'BODE_RISK': bode_risk,
                'HOSPITAL': row['hospital']
            })
        except ValueError as e:
            print(f"Error processing {row['NAME']}: {e}")

    return processed_data
def aggregate_hospital_data(patient_data):
    """
    Aggregate patient data by hospital.

    Parameters:
    patient_data (list): List of processed patient data.

    Returns:
    list: Aggregated hospital data with counts and averages.
    """
    hospital_summary = {}

    for patient in patient_data:
        hospital = patient['HOSPITAL']
        if hospital not in hospital_summary:
            hospital_summary[hospital] = {
                'COPD_COUNT': 0,
                'TOTAL_SCORE': 0,
                'TOTAL_RISK': 0
            }

        hospital_summary[hospital]['COPD_COUNT'] += 1
        hospital_summary[hospital]['TOTAL_SCORE'] += patient['BODE_SCORE']
        hospital_summary[hospital]['TOTAL_RISK'] += patient['BODE_RISK']

    hospital_output = []
    for hospital, data in hospital_summary.items():
        avg_score = data['TOTAL_SCORE'] / data['COPD_COUNT']
        avg_risk = data['TOTAL_RISK'] / data['COPD_COUNT']
        hospital_output.append({
            'HOSPITAL_NAME': hospital,
            'COPD_COUNT': data['COPD_COUNT'],
            'AVG_SCORE': avg_score,
            'AVG_RISK': avg_risk
        })

    return hospital_output
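def write_csv(rows, output_file):
    """Write a list of dictionaries to a CSV file with a header row."""
    if not rows:
        return
    with open(output_file, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
def main(input_file):
    """Run the full pipeline: load, score, aggregate, and write both CSV files."""
    # Load patient data from the CSV file
    patient_data = load_patient_data(input_file)
    # Process patient data to calculate BODE scores and risks
    processed_data = process_patient_data(patient_data)
    # Aggregate the data by hospital
    hospital_data = aggregate_hospital_data(processed_data)

    # Write the output to CSV files
    write_csv(processed_data, 'patient_output.csv')
    write_csv(hospital_data, 'hospital_output.csv')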




### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.
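
The hospital output also lists PCT_OF_COPD_CASES_OVER_BEDS, which needs a bed count per hospital. The layout of hospitals.json is not shown in this notebook, so the sketch below assumes a flat list of objects with `name` and `beds` keys; that layout, the sample text, and both helper names are hypothetical.

```python
import json

# Hypothetical stand-in for hospitals.json. The real file's layout is not
# shown here, so the "name" and "beds" keys are assumptions.
SAMPLE_HOSPITALS_JSON = (
    '[{"name": "Hospital A", "beds": 200}, {"name": "Hospital B", "beds": 50}]'
)


def beds_by_hospital(json_text):
    """Map each hospital name to its bed count.

    Raises ValueError if a bed count is not a positive integer.
    """
    beds = {}
    for entry in json.loads(json_text):
        count = int(entry["beds"])
        if count <= 0:
            raise ValueError(f"Bed count must be positive for {entry['name']}")
        beds[entry["name"]] = count
    return beds


def pct_cases_over_beds(copd_count, beds):
    """Return COPD cases as a percentage of beds."""
    if copd_count < 0 or beds <= 0:
        raise ValueError("Case count must be non-negative and beds positive.")
    return 100 * copd_count / beds


beds = beds_by_hospital(SAMPLE_HOSPITALS_JSON)
pct = pct_cases_over_beds(10, beds["Hospital A"])
```

With the real file, `json.load` on the opened file would replace `json.loads`, and the key lookups would need to follow the file's actual structure.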

In [42]:
patient_csv = "patient.csv"
hospital_json = "hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"


patient_data = load_patient_data(patient_csv)
# Process patient data to calculate BODE scores and risks
processed_data = process_patient_data(patient_data)
patient_results = [[
    patient['NAME'],
    patient['BODE_SCORE'],
    patient['BODE_RISK'],
    patient['HOSPITAL']
] for patient in processed_data]
hospital_data = aggregate_hospital_data(processed_data)
# Prepare hospital output for CSV
hospital_output_list = [[
    hospital['HOSPITAL_NAME'],
    hospital['COPD_COUNT'],
    hospital['AVG_SCORE'],
    hospital['AVG_RISK']
] for hospital in hospital_data]

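# Write patient_output.csv with a header row
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL'])
    writer.writerows(patient_results)
# Write hospital_output.csv with a header row
# (PCT_OF_COPD_CASES_OVER_BEDS is not computed above, so it is omitted here)
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['HOSPITAL_NAME', 'COPD_COUNT', 'AVG_SCORE', 'AVG_RISK'])
    writer.writerows(hospital_output_list)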