
### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to run an analysis using a newer metric, BODE, which improves on an earlier metric and promises insight into survival risk.

BODE is defined here. https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation, use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a `ValueError`.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [99]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [100]:
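def bmi(weight, height):
    """Return body mass index for a weight in kilograms and a height in meters.

    >>> round(bmi(70, 1.75), 2)
    22.86
    """
    if weight <= 0 or height <= 0:
        raise ValueError("Weight and height must be positive numbers.")

    return weight / (height ** 2)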

In [101]:
bmi(100,155)

0.004162330905306972

In [102]:
bmi(70, 1.75)

22.857142857142858

In [103]:
bmi(50, 1.6)

19.531249999999996
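The assignment asks for documentation and doctest cases on every function. A minimal sketch of what that could look like for the BMI step follows. The name `bmi_documented` is hypothetical, chosen so it does not overwrite `bmi` above, and the results are rounded in the examples so the doctests do not depend on floating-point digits.

```python
import doctest


def bmi_documented(weight_kg, height_m):
    """Return body mass index: weight in kilograms over height in meters squared.

    >>> round(bmi_documented(70, 1.75), 2)
    22.86
    >>> round(bmi_documented(50, 1.6), 2)
    19.53
    >>> bmi_documented(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive numbers.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive numbers.")
    return weight_kg / height_m ** 2


# Run only this function's docstring examples.
# Failures are printed; a clean run prints nothing.
doctest.run_docstring_examples(bmi_documented, globals(), name="bmi_documented")
```

Note that the formula expects height in meters. A call such as `bmi(100,155)` passes centimeters and returns a value far too small to be a real BMI.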

### Step 2: Calculate BODE Score

In [104]:
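def bode_score(bmi, fev_pct, dyspnea, distance):
    """Return the BODE index (0-10).

    bmi: body mass index; fev_pct: FEV1 as a percent of predicted;
    dyspnea: description text (a key of dyspnea_points, case-insensitive);
    distance: 6-minute walk distance in meters.

    >>> bode_score(21.5, 50, 'STOPS AFTER 100 YARDS', 400)
    3
    >>> bode_score(22, 40, 'SLOWER THAN PEERS', 329)
    4
    >>> bode_score(22, 40, 'UNKNOWN', 329)
    Traceback (most recent call last):
        ...
    ValueError: Unrecognized dyspnea description: 'UNKNOWN'
    """
    # BODE points per dyspnea description. mMRC grades 0 and 1 both score 0,
    # so this component ranges from 0 to 3 and the total from 0 to 10.
    dyspnea_points = {
        'ONLY STRENUOUS EXERCISE': 0,     # mMRC 0
        'WHEN HURRYING': 0,               # mMRC 1
        'WALKING UPHILL': 0,              # mMRC 1
        'SLOWER THAN PEERS': 1,           # mMRC 2
        'STOPS WHEN WALKING AT PACE': 1,  # mMRC 2
        'STOPS AFTER 100 YARDS': 2,       # mMRC 3
        'STOPS AFTER A FEW MINUTES': 2,   # mMRC 3
        'UNABLE TO LEAVE HOME': 3,        # mMRC 4
        'BREATHLESS WHEN DRESSING': 3,    # mMRC 4
    }

    if bmi <= 0 or fev_pct < 0 or distance < 0:
        raise ValueError("bmi must be positive; fev_pct and distance must not be negative.")
    key = str(dyspnea).strip().upper()
    if key not in dyspnea_points:
        raise ValueError(f"Unrecognized dyspnea description: {dyspnea!r}")

    score = dyspnea_points[key]

    # FEV1 percent of predicted: lower values add more points.
    if fev_pct >= 65: score += 0
    elif fev_pct >= 50: score += 1
    elif fev_pct >= 36: score += 2
    else: score += 3

    # 6-minute walk distance in meters: shorter distances add more points.
    if distance >= 350: score += 0
    elif distance >= 250: score += 1
    elif distance >= 150: score += 2
    else: score += 3

    # A BMI of 21 or below adds one point.
    if bmi <= 21: score += 1

    return score

In [105]:
bode_score(21.5, 50, 'STOPS AFTER 100 YARDS', 400)

3

In [106]:
bode_score(22, 40, 'SLOWER THAN PEERS', 329)

4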
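When checking a score by hand, it can help to see what each of the four components contributes. The sketch below is a hypothetical helper, `bode_components`, not part of the assignment. It assumes the dyspnea input is an integer mMRC grade from 0 to 4 and that grades 0 and 1 both contribute 0 points, which is how the BODE index is usually tabulated. Verify the thresholds against the MDCalc page linked above before relying on it.

```python
def bode_components(bmi_value, fev_pct, mmrc_grade, distance_m):
    """Return the points each BODE component contributes, keyed by component name.

    Assumption of this sketch: mmrc_grade is the integer mMRC dyspnea grade (0-4).
    """
    if mmrc_grade not in (0, 1, 2, 3, 4):
        raise ValueError("mMRC grade must be an integer from 0 to 4.")

    # FEV1 percent of predicted: >=65 -> 0, 50-64 -> 1, 36-49 -> 2, below 36 -> 3
    if fev_pct >= 65:
        fev_points = 0
    elif fev_pct >= 50:
        fev_points = 1
    elif fev_pct >= 36:
        fev_points = 2
    else:
        fev_points = 3

    # 6-minute walk in meters: >=350 -> 0, 250-349 -> 1, 150-249 -> 2, below 150 -> 3
    if distance_m >= 350:
        walk_points = 0
    elif distance_m >= 250:
        walk_points = 1
    elif distance_m >= 150:
        walk_points = 2
    else:
        walk_points = 3

    return {
        'bmi': 1 if bmi_value <= 21 else 0,
        'fev': fev_points,
        'dyspnea': max(mmrc_grade - 1, 0),  # mMRC 0-1 -> 0, 2 -> 1, 3 -> 2, 4 -> 3
        'distance': walk_points,
    }


parts = bode_components(22, 40, 2, 329)
print(parts, sum(parts.values()))
# {'bmi': 0, 'fev': 2, 'dyspnea': 1, 'distance': 1} 4
```

Returning the components separately makes it easy to spot which input drives a high total.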

### Step 3: Calculate BODE Risk

In [107]:
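def bode_risk(score):
    """Return the 4-year survival estimate for an integer BODE score from 0 to 10."""
    if score not in range(11):
        raise ValueError("BODE score must be an integer from 0 to 10.")
    if score <= 2: return '80% survival'
    elif score <= 4: return '67% survival'
    elif score <= 6: return '57% survival'
    else: return '18% survival'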

In [108]:
bode_risk(10)

'18% survival'

In [109]:
bode_risk(2)

'80% survival'

In [110]:
bode_risk(4)

'67% survival'
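Because `bode_risk` returns text such as `'80% survival'`, the per-hospital `AVG_RISK` needs a number extracted from it before it can be averaged. One possible approach is sketched below. `survival_pct` is a hypothetical helper name, and it assumes the text always starts with a whole-number percentage followed by `%`.

```python
def survival_pct(risk_text):
    """Return the leading whole-number percentage of text like '80% survival'."""
    number, sep, _ = risk_text.partition('%')
    if not sep or not number.strip().isdigit():
        raise ValueError(f"Unrecognized risk text: {risk_text!r}")
    return int(number)


risks = ['80% survival', '67% survival', '18% survival']
print(sum(survival_pct(r) for r in risks) / len(risks))  # 55.0
```

An alternative design would be to have the risk function return the number directly and format it as text only when writing the CSV.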

### Step 4: Load Patient and Hospital Data

In [111]:
def load_patient_data(file_path):
    """Return the rows of the patient CSV at file_path as a list of dicts (all values are strings)."""
    with open(file_path, mode='r', newline='') as file:
        return list(csv.DictReader(file))

def load_hospital_data(file_path):
    """Return the list stored under the 'hospitals' key of the JSON file at file_path."""
    with open(file_path) as f:
        return json.load(f)['hospitals']
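One detail worth keeping in mind when using these loaders: `csv.DictReader` returns every field as a string, so numeric columns need converting before they reach `bmi` or `bode_score`. The sketch below shows this with in-memory sample data. The patient row and hospital record are invented for illustration, and the `name` and `beds` keys are assumptions based on how Step 5 uses the hospital data.

```python
import csv
import io
import json

# Invented sample input using a subset of the assignment's column names.
sample_csv = io.StringIO(
    "NAME,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital\n"
    "Pat Example,1.75,70,40,SLOWER THAN PEERS,329,Example General\n"
)
rows = list(csv.DictReader(sample_csv))

# Every CSV value is a string, even when it looks numeric.
print(repr(rows[0]['WEIGHT_KG']))  # '70'

# Convert explicitly; float() raises ValueError on malformed input such as 'abc'.
weight = float(rows[0]['WEIGHT_KG'])
height = float(rows[0]['HEIGHT_M'])
print(weight, height)  # 70.0 1.75

# JSON keeps its types, so a bed count stored as a number stays a number.
sample_json = '{"hospitals": [{"name": "Example General", "beds": 100}]}'
hospitals = json.loads(sample_json)['hospitals']
print(hospitals[0]['beds'] + 1)  # 101
```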

### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [112]:
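patient_input_file = "patient.csv"      # assumed file name; the assignment text does not give one
hospital_input_file = "hospitals.json"  # assumed file name; the assignment text does not give one
patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

###

# Score every patient. CSV fields arrive as strings, so convert the numeric ones first.
patient_results = []
for row in load_patient_data(patient_input_file):
    patient_bmi = bmi(float(row['WEIGHT_KG']), float(row['HEIGHT_M']))
    score = bode_score(patient_bmi, float(row['fev_pct']),
                       row['dyspnea_description'], float(row['distance_in_meters']))
    patient_results.append({'NAME': row['NAME'], 'BODE_SCORE': score,
                            'BODE_RISK': bode_risk(score), 'HOSPITAL': row['hospital']})

# Group scores and survival percentages by hospital.
by_hospital = {}
for p in patient_results:
    stats = by_hospital.setdefault(p['HOSPITAL'], {'scores': [], 'risks': []})
    stats['scores'].append(p['BODE_SCORE'])
    # bode_risk returns text such as '80% survival'; keep the leading number for averaging.
    stats['risks'].append(int(p['BODE_RISK'].split('%')[0]))

# Assumes each hospital record has 'name' and 'beds' keys.
# AVG_RISK is the mean survival percentage of the hospital's patients.
hospital_output_list = []
for hospital in load_hospital_data(hospital_input_file):
    stats = by_hospital.get(hospital['name'])
    if stats is None:
        continue  # no COPD patients recorded for this hospital
    count = len(stats['scores'])
    hospital_output_list.append({
        'HOSPITAL_NAME': hospital['name'],
        'COPD_COUNT': count,
        'PCT_OF_COPD_CASES_OVER_BEDS': count / int(hospital['beds']) * 100,
        'AVG_SCORE': sum(stats['scores']) / count,
        'AVG_RISK': sum(stats['risks']) / count,
    })

###

def write_csv(file_path, fieldnames, rows):
    """Write a list of dicts to file_path as CSV with a header row."""
    with open(file_path, mode='w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

write_csv(patient_output_file, ['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL'], patient_results)
write_csv(hospital_output_file,
          ['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK'],
          hospital_output_list)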
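Since the assignment asks for tests on each function, the per-hospital aggregation is easier to verify when it lives in a function that takes plain data rather than reading files. The sketch below is one possible shape, not the required solution. `summarize_by_hospital`, the `SURVIVAL_PCT` key, and the `beds_by_hospital` mapping are all hypothetical names, and the sample records are invented.

```python
def summarize_by_hospital(patients, beds_by_hospital):
    """Return one summary dict per hospital, sorted by hospital name.

    patients: list of dicts with 'HOSPITAL', 'BODE_SCORE', and 'SURVIVAL_PCT' keys.
    beds_by_hospital: dict mapping hospital name to its bed count.
    """
    grouped = {}
    for p in patients:
        grouped.setdefault(p['HOSPITAL'], []).append(p)

    summary = []
    for name in sorted(grouped):
        if name not in beds_by_hospital:
            raise ValueError(f"No bed count for hospital {name!r}")
        group = grouped[name]
        count = len(group)
        summary.append({
            'HOSPITAL_NAME': name,
            'COPD_COUNT': count,
            'PCT_OF_COPD_CASES_OVER_BEDS': count / beds_by_hospital[name] * 100,
            'AVG_SCORE': sum(p['BODE_SCORE'] for p in group) / count,
            'AVG_RISK': sum(p['SURVIVAL_PCT'] for p in group) / count,
        })
    return summary


sample = [
    {'HOSPITAL': 'Example General', 'BODE_SCORE': 2, 'SURVIVAL_PCT': 80},
    {'HOSPITAL': 'Example General', 'BODE_SCORE': 4, 'SURVIVAL_PCT': 67},
]
result = summarize_by_hospital(sample, {'Example General': 100})
print(result[0]['COPD_COUNT'], result[0]['AVG_SCORE'], result[0]['AVG_RISK'])
# 2 3.0 73.5
```

Raising `ValueError` for a hospital with no bed count is a design choice made here to surface data mismatches early. Skipping such hospitals silently is the other obvious option.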