
### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here. https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [49]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [50]:
def calculate_bmi(weight_kg, height_m):
    """
    Calculate BMI from weight and height.

    >>> calculate_bmi(70, 1.75)
    22.86
    >>> calculate_bmi(90, 1.90)
    24.93
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive values.")

    return round(weight_kg / (height_m ** 2), 2)

if __name__ == "__main__":
    doctest.testmod()
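
The assignment asks for a ValueError on bad input, and that path can be covered by a doctest too. Below is a minimal sketch, assuming a hypothetical copy of the function named `bmi_checked` (it is not part of the original notebook). doctest compares the traceback header and the final exception line, and the `...` stands in for the stack frames.

```python
import doctest


def bmi_checked(weight_kg, height_m):
    """
    Hypothetical variant of calculate_bmi whose doctests also cover bad input.

    >>> bmi_checked(70, 1.75)
    22.86
    >>> bmi_checked(0, 1.75)
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be positive values.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive values.")
    return round(weight_kg / (height_m ** 2), 2)


# run_docstring_examples checks only this function's examples;
# it prints nothing when every example passes.
doctest.run_docstring_examples(bmi_checked, {"bmi_checked": bmi_checked})
```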

### Step 2: Calculate BODE Score

In [51]:
def cal_bode_score(bmi, fev_pct, distance, dyspnea_description):
    '''Calculate the BODE score based on BMI, FEV percentage, distance walked, and dyspnea description
    The BODE score is calculated as follows:
    - BMI > 21: 0, BMI <= 21: +1
    - FEV% >= 65: 0, FEV% 50-64: +1, FEV% 36-49: +2, FEV% <= 35: +3
    - Distance >= 350 m : 0, 250-349 m : +1, 150-249 m : +2, <=149 m : +3
    - Dyspnea based on descriptions:
        "Dyspnea only with strenuous exercise": 0,
        "Dyspnea when hurrying or walking up a slight hill": 0,
        "Walks slower than people of same age because of dyspnea or stops for breath when walking at own pace": +1,
        "Stops for breath after walking 100 yards (91 m) or after a few minutes": +2,
        "Too dyspneic to leave house or breathless when dressing": +3

    >>> cal_bode_score(30.52, 57.73, 367.9, "STOPS AFTER A FEW MINUTES")
    1
    >>> cal_bode_score(30.89, 61.6, 184.16, "WHEN HURRYING")
    3
    >>> cal_bode_score(36.62,83.11, 200.66, "BREATHLESS WHEN DRESSING")
    2
    '''
    score = 0

    # BMI contribution to the score
    if bmi > 21:
        score += 0
    else:
        score += 1

    # FEV contribution to the score
    if fev_pct >= 65:
        score += 0
    elif 50 <= fev_pct < 65:
        score += 1
    elif 36 <= fev_pct < 50:
        score += 2
    else:
        score += 3
    # Distance contribution to the score
    if distance >= 350:
        score += 0
    elif 250 <= distance < 350:
        score += 1
    elif 150 <= distance < 250:
        score += 2
    else:
        score += 3

    # Dyspnea contribution to the score
    # (a description that matches none of the strings below adds 0)
    if dyspnea_description == "Dyspnea only with strenuous exercise":
        score += 0
    elif dyspnea_description == "Dyspnea when hurrying or walking up a slight hill":
        score += 0
    elif dyspnea_description == "Walks slower than people of same age because of dyspnea or stops for breath when walking at own pace":
        score += 1
    elif dyspnea_description == "Stops for breath after walking 100 yards (91 m) or after a few minutes":
        score += 2
    elif dyspnea_description == "Too dyspneic to leave house or breathless when dressing":
        score += 3

    return score


if __name__ == "__main__":
    doctest.testmod()
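
As written, `cal_bode_score` adds 0 for any dyspnea description that matches none of the five strings exactly, which includes the uppercase short forms used in its doctests. A stricter lookup could look like the sketch below. The helper `dyspnea_points` and the table `DYSPNEA_POINTS` are hypothetical names, and the sketch assumes the input carries the full descriptions from the docstring. The actual wording used in the patient file is not shown here, so the table may need adjusting.

```python
# Points per dyspnea description, copied from the cal_bode_score docstring.
# Keys are lowercased so the lookup can ignore case.
DYSPNEA_POINTS = {
    "dyspnea only with strenuous exercise": 0,
    "dyspnea when hurrying or walking up a slight hill": 0,
    "walks slower than people of same age because of dyspnea "
    "or stops for breath when walking at own pace": 1,
    "stops for breath after walking 100 yards (91 m) or after a few minutes": 2,
    "too dyspneic to leave house or breathless when dressing": 3,
}


def dyspnea_points(description):
    """Hypothetical helper: look up points, ignoring case and outer whitespace."""
    key = description.strip().lower()
    if key not in DYSPNEA_POINTS:
        # Unknown text is treated as bad input instead of silently scoring 0.
        raise ValueError(f"Unknown dyspnea description: {description!r}")
    return DYSPNEA_POINTS[key]
```

With this helper, `dyspnea_points("Too dyspneic to leave house or breathless when dressing")` returns 3, while a string such as `"WHEN HURRYING"` raises a ValueError.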

### Step 3: Calculate BODE Risk

In [52]:
def calculate_bode_risk(bode_score):
    """
    Calculate the BODE risk based on the BODE score.

    bode_score: BODE score
    return: BODE risk in percent (100 minus the 4-year survival rate)

    >>> calculate_bode_risk(2)
    20
    >>> calculate_bode_risk(4)
    33
    """
    if bode_score <= 2:
        return 100 - 80
    elif 3 <= bode_score <= 4:
        return 100 - 67
    elif 5 <= bode_score <= 6:
        return 100 - 57
    else:
        return 100 - 18


In [53]:
if __name__ == "__main__":
    doctest.testmod()
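
The score bands in `calculate_bode_risk` can also be written as a lookup table, which makes it easier to reject impossible scores. This is a sketch only: `bode_risk_checked` and `SURVIVAL_BANDS` are hypothetical names, and the 0 to 10 range is derived from the scoring rules in Step 2 (at most 1 + 3 + 3 + 3 points).

```python
# (highest score in band, 4-year survival %) pairs, taken from calculate_bode_risk.
SURVIVAL_BANDS = [(2, 80), (4, 67), (6, 57), (10, 18)]


def bode_risk_checked(bode_score):
    """Hypothetical variant: return 100 minus survival %, rejecting scores outside 0-10."""
    if not 0 <= bode_score <= 10:
        raise ValueError("BODE score must be between 0 and 10.")
    # The first band whose upper bound covers the score decides the result.
    for upper, survival_pct in SURVIVAL_BANDS:
        if bode_score <= upper:
            return 100 - survival_pct
```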

### Step 4: Load Hospital Data

In [54]:
def load_hospital_data(json_file):
    """
    Load hospital data from a JSON file.
    """
    with open(json_file, 'r') as file:
        return json.load(file)



if __name__ == "__main__":
    doctest.testmod()
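
The layout of hospitals.json is not shown in this notebook. Judging from how Step 5 reads it, the file is a list of entries, each holding a `hospitals` list whose items have `name` and `beds` keys. The sketch below parses a made-up sample in that shape; the `region` key and all names and bed counts are hypothetical.

```python
import json

# Made-up sample in the shape the Step 5 loop expects.
# The "region" key and every value here are hypothetical.
sample_text = """
[
  {"region": "EXAMPLE REGION",
   "hospitals": [
     {"name": "EXAMPLE HOSPITAL A", "beds": 500},
     {"name": "EXAMPLE HOSPITAL B", "beds": 250}
   ]}
]
"""

# Flatten the nested structure into a name -> beds mapping.
beds_by_name = {
    hospital["name"]: hospital["beds"]
    for entry in json.loads(sample_text)
    for hospital in entry["hospitals"]
}
print(beds_by_name)  # {'EXAMPLE HOSPITAL A': 500, 'EXAMPLE HOSPITAL B': 250}
```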

### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [57]:
patient_csv = "patient.csv"
hospital_json = "hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

# Load hospital data
hospital_data = load_hospital_data(hospital_json)

# Initialize the hospital metrics dictionary using the hospital names from the JSON data
hospital_metrics = {}

for entry in hospital_data:
    # Iterate over the hospitals list within the entry
    for hospital in entry['hospitals']:
        hospital_metrics[hospital['name']] = {
            'total_bode_score': 0,
            'total_risk': 0,
            'copd_count': 0,
            'beds': hospital['beds']
        }

patient_results = []

# Read patient data from the CSV file
with open(patient_csv, 'r') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        name = row['NAME']
        ssn = row['SSN']
        language = row['LANGUAGE']
        job = row['JOB']
        height_m = float(row['HEIGHT_M'])
        weight_kg = float(row['WEIGHT_KG'])
        fev_pct = float(row['fev_pct'])
        dyspnea_description = row['dyspnea_description']
        distance_in_meters = float(row['distance_in_meters'])
        hospital_name = row['hospital']

        # Calculate BMI, BODE score, and BODE risk
        bmi = calculate_bmi(weight_kg, height_m)
        bode_score = cal_bode_score(bmi, fev_pct, distance_in_meters, dyspnea_description)
        bode_risk = calculate_bode_risk(bode_score)

        # Add patient results
        patient_results.append([name, bode_score, bode_risk, hospital_name])

        # Update hospital metrics
        if hospital_name in hospital_metrics:
            hospital_metrics[hospital_name]['total_bode_score'] += bode_score
            hospital_metrics[hospital_name]['total_risk'] += bode_risk
            hospital_metrics[hospital_name]['copd_count'] += 1

hospital_output_list = []

# Calculate hospital metrics
for hospital_name, metrics in hospital_metrics.items():
    copd_count = metrics['copd_count']
    if copd_count > 0:
        avg_bode_score = metrics['total_bode_score'] / copd_count
        avg_bode_risk = metrics['total_risk'] / copd_count
    else:
        avg_bode_score = 0
        avg_bode_risk = 0
    pct_of_copd_cases = (copd_count / metrics['beds']) * 100 if metrics['beds'] > 0 else 0
    hospital_output_list.append([hospital_name, copd_count, pct_of_copd_cases, avg_bode_score, avg_bode_risk])

# Write Patient_output.csv
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["NAME", "BODE_SCORE", "BODE_RISK", "HOSPITAL"])
    writer.writerows(patient_results)

# Write Hospital_output.csv
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS", "AVG_SCORE", "AVG_RISK"])
    writer.writerows(hospital_output_list)
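
The per-hospital arithmetic above can be pulled into a small function so that it can be tested on its own. In this sketch, `summarize_hospital` is a hypothetical helper, and the rounding to two decimals is an addition that the original cell does not perform; it would avoid values such as 16.400000000000002 in the output file.

```python
def summarize_hospital(total_score, total_risk, copd_count, beds):
    """Hypothetical helper mirroring the per-hospital arithmetic, with rounding added.

    Returns (pct_of_copd_cases_over_beds, avg_score, avg_risk).
    """
    if copd_count < 0 or beds < 0:
        raise ValueError("Counts must not be negative.")
    # Guard against division by zero the same way the cell above does.
    avg_score = total_score / copd_count if copd_count else 0
    avg_risk = total_risk / copd_count if copd_count else 0
    pct_over_beds = copd_count / beds * 100 if beds else 0
    return round(pct_over_beds, 2), round(avg_score, 2), round(avg_risk, 2)


# Made-up totals: 4 patients, scores summing to 18, risks summing to 166, 50 beds.
print(summarize_hospital(18, 166, 4, 50))  # (8.0, 4.5, 41.5)
```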


In [58]:
if __name__ == "__main__":
    doctest.testmod()

In [65]:
import csv
with open('patient_output.csv') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        print(row)

{'NAME': 'Vanessa Roberts', 'BODE_SCORE': '4', 'BODE_RISK': '33', 'HOSPITAL': "ST.LUKE'S"}
{'NAME': 'Christopher Fox', 'BODE_SCORE': '5', 'BODE_RISK': '43', 'HOSPITAL': 'SAINT LOUIS UNIVERSITY'}
{'NAME': 'Benjamin Johnston', 'BODE_SCORE': '1', 'BODE_RISK': '20', 'HOSPITAL': 'BJC'}
{'NAME': 'Christopher Hernandez', 'BODE_SCORE': '5', 'BODE_RISK': '43', 'HOSPITAL': 'MISSOURI BAPTIST'}
{'NAME': 'Valerie Burch', 'BODE_SCORE': '3', 'BODE_RISK': '33', 'HOSPITAL': 'BJC WEST COUNTY'}
{'NAME': 'Heather Hart', 'BODE_SCORE': '5', 'BODE_RISK': '43', 'HOSPITAL': 'SAINT LOUIS UNIVERSITY'}
{'NAME': 'Ronald Cobb', 'BODE_SCORE': '6', 'BODE_RISK': '43', 'HOSPITAL': "ST.MARY'S"}
{'NAME': 'Austin French', 'BODE_SCORE': '7', 'BODE_RISK': '82', 'HOSPITAL': 'SAINT LOUIS UNIVERSITY'}
{'NAME': 'Mary Leonard', 'BODE_SCORE': '5', 'BODE_RISK': '43', 'HOSPITAL': 'BJC'}
{'NAME': 'Mrs. Nicole Smith', 'BODE_SCORE': '3', 'BODE_RISK': '33', 'HOSPITAL': "ST.MARY'S"}
{'NAME': 'Ashley Warren', 'BODE_SCORE': '5', 'BODE_RIS

In [66]:
import csv
with open('hospital_output.csv') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        print(row)

{'HOSPITAL_NAME': 'BJC', 'COPD_COUNT': '184', 'PCT_OF_COPD_CASES_OVER_BEDS': '9.2', 'AVG_SCORE': '4.543478260869565', 'AVG_RISK': '41.41304347826087'}
{'HOSPITAL_NAME': 'BJC WEST COUNTY', 'COPD_COUNT': '171', 'PCT_OF_COPD_CASES_OVER_BEDS': '17.1', 'AVG_SCORE': '4.654970760233918', 'AVG_RISK': '42.67251461988304'}
{'HOSPITAL_NAME': 'MISSOURI BAPTIST', 'COPD_COUNT': '161', 'PCT_OF_COPD_CASES_OVER_BEDS': '20.125', 'AVG_SCORE': '4.341614906832298', 'AVG_RISK': '39.900621118012424'}
{'HOSPITAL_NAME': 'SAINT LOUIS UNIVERSITY', 'COPD_COUNT': '164', 'PCT_OF_COPD_CASES_OVER_BEDS': '16.400000000000002', 'AVG_SCORE': '4.4817073170731705', 'AVG_RISK': '40.792682926829265'}
{'HOSPITAL_NAME': "ST.MARY'S", 'COPD_COUNT': '156', 'PCT_OF_COPD_CASES_OVER_BEDS': '31.2', 'AVG_SCORE': '4.5', 'AVG_RISK': '41.32692307692308'}
{'HOSPITAL_NAME': "ST.LUKE'S", 'COPD_COUNT': '164', 'PCT_OF_COPD_CASES_OVER_BEDS': '20.5', 'AVG_SCORE': '4.426829268292683', 'AVG_RISK': '40.457317073170735'}
