### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital
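
Because `csv.DictReader` returns every field as a string, the numeric columns need an explicit conversion before they reach the BMI and BODE functions. A minimal sketch, assuming an in-memory file with one made-up patient row (the name and every value below are illustrative, not taken from the real patient file):

```python
import csv
import io

# Made-up sample that follows the column layout listed above.
sample = io.StringIO(
    "NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,"
    "dyspnea_description,distance_in_meters,hospital\n"
    "Jane Doe,000-00-0000,English,Teacher,1.75,70,70,"
    "SLOWER THAN PEERS,400,Example Hospital\n"
)

rows = []
for row in csv.DictReader(sample):
    # DictReader yields strings, so convert the numeric columns explicitly.
    rows.append({
        "name": row["NAME"],
        "height_m": float(row["HEIGHT_M"]),
        "weight_kg": float(row["WEIGHT_KG"]),
        "fev_pct": float(row["fev_pct"]),
        "dyspnea_description": row["dyspnea_description"],
        "distance_in_meters": float(row["distance_in_meters"]),
        "hospital": row["hospital"],
    })

print(rows[0]["name"], rows[0]["height_m"], rows[0]["weight_kg"])
```

A non-numeric value in one of these columns makes `float()` raise `ValueError`, which matches the error type the assignment asks for.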

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK
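
As a rough illustration of how one hospital row could be assembled and written with the `csv` module, the sketch below uses hypothetical totals for a single made-up hospital and writes to an in-memory buffer instead of a file. It assumes PCT_OF_COPD_CASES_OVER_BEDS means the COPD case count divided by the bed count, times 100, which is how Step 5 of this notebook computes it.

```python
import csv
import io

header = ["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS",
          "AVG_SCORE", "AVG_RISK"]

# Hypothetical totals for one hospital (illustrative values only).
copd_count = 3
beds = 150
total_score = 12           # sum of the three BODE scores
total_risk = 80 + 67 + 57  # sum of the three survival percentages

row = [
    "Example Hospital",
    copd_count,
    round(copd_count / beds * 100, 2),
    round(total_score / copd_count, 2),
    f"{round(total_risk / copd_count, 2)}%",
]

# Write to an in-memory buffer; a real run would open a file with newline=''.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(row)

lines = buffer.getvalue().splitlines()
print(lines)
```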

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.
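
A doctest can cover the error path as well as the normal path: the expected output is the traceback header, an ellipsis line, and the final exception line. The sketch below shows the pattern on a hypothetical helper, `percent_of`, which is not part of the assignment.

```python
import doctest


def percent_of(part, whole):
    """
    Return part as a percentage of whole, rounded to 2 decimal places.

    >>> percent_of(3, 150)
    2.0

    >>> percent_of(1, 0)
    Traceback (most recent call last):
        ...
    ValueError: whole must be greater than 0.
    """
    if whole <= 0:
        raise ValueError("whole must be greater than 0.")
    return round(part / whole * 100, 2)


# Prints nothing when every example in the docstring passes.
doctest.run_docstring_examples(percent_of, globals())
```

The same three-line traceback pattern could be added to the docstrings of the BMI and BODE functions to document their ValueError cases.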

In [29]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [30]:
def bmi_kg_per_m2(weight_kg, height_m):
    """
    Calculate BMI using weight in kilograms and height in meters.

    Formula (SI units):
    BMI = weight (kg) / (height (m))^2

    Example:
    >>> bmi_kg_per_m2(70, 1.75)
    22.86

    >>> bmi_kg_per_m2(80, 1.8)
    24.69

    Raises:
        ValueError: if height or weight is less than or equal to zero.
    """
    if height_m <= 0:
        raise ValueError("Height must be greater than 0.")
    if weight_kg <= 0:
        raise ValueError("Weight must be greater than 0.")

    height_sq = height_m * height_m  # height squared

    # Rounded to 2 decimal places because the weight in the CSV file is given to 2 decimal places.
    bmi = round(weight_kg / height_sq, 2)

    return bmi

In [31]:
import doctest
doctest.run_docstring_examples(bmi_kg_per_m2, globals(), verbose=True)

Finding tests in NoName
Trying:
    bmi_kg_per_m2(70, 1.75)
Expecting:
    22.86
ok
Trying:
    bmi_kg_per_m2(80, 1.8)
Expecting:
    24.69
ok


In [32]:
bmi_kg_per_m2(67, 1.82)

20.23

In [33]:
bmi_kg_per_m2(74,1.82)

22.34

### Step 2: Calculate BODE Score

In [34]:
def bode_score(bmi_kg_per_m2, fev_pct, dyspnea_description, distance_in_meters):
    """
    (float, float, str, int) -> int

    Compute the BODE score for a COPD patient from BMI, FEV1 percentage,
    dyspnea level, and 6-minute walk distance.

    The scoring logic follows the criteria outlined in the BODE Index calculation at
    https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

    Points per component:
        - FEV1 % (0-3 points)
        - BMI (0-1 point)
        - Dyspnea (0-3 points)
        - 6 minute walk distance in meters (0-3 points)

    >>> bode_score(25, 70, 'SLOWER THAN PEERS', 400)
    1

    >>> bode_score(18, 40, 'STOPS AFTER 100 YARDS', 300)
    6

    >>> bode_score(22, 30, 'BREATHLESS WHEN DRESSING', 100)
    9

    >>> bode_score(20, 35, 'STOPS AFTER A FEW MINUTES', 250)
    7

    Raises:
        ValueError: if BMI is not positive, fev_pct is outside 0-100,
            or the distance is negative.
    """
    # Some variable names and the dyspnea_description strings are taken from the uploaded patient.csv file.

    # Points for BMI
    if bmi_kg_per_m2 <= 0:
        raise ValueError("BMI must be greater than 0.")

    if bmi_kg_per_m2 > 21:
        bmi_score = 0
    else:
        bmi_score = 1  # BMI <= 21

    # Points for FEV1 %
    if fev_pct > 100 or fev_pct < 0:
        raise ValueError("fev_pct is percentage and cannot be a value over 100 or less than 0.")

    if fev_pct >= 65:
        fev_score = 0
    elif fev_pct >= 50:
        fev_score = 1
    elif fev_pct >= 36:
        fev_score = 2
    else:
        fev_score = 3

    # Points for dyspnea based on description
    if dyspnea_description in ["STOPS WHEN WALKING AT PACE", "SLOWER THAN PEERS"]:
        dyspnea_score = 1
    elif dyspnea_description in ["STOPS AFTER 100 YARDS", "STOPS AFTER A FEW MINUTES"]:
        dyspnea_score = 2
    elif dyspnea_description in ["BREATHLESS WHEN DRESSING", "UNABLE TO LEAVE HOME"]:
        dyspnea_score = 3
    else:
        # Any other description (for example 'ONLY STRENUOUS EXERCISE') scores 0.
        # Note: unrecognized descriptions also end up here instead of raising.
        dyspnea_score = 0

    # Points for 6-minute walk distance
    if distance_in_meters < 0:
        raise ValueError("Distance cannot be a negative value.")

    if distance_in_meters >= 350:
        distance_score = 0
    elif distance_in_meters >= 250:
        distance_score = 1
    elif distance_in_meters >= 150:
        distance_score = 2
    else:
        distance_score = 3

    # Total BODE score for the patient
    total_score = bmi_score + fev_score + dyspnea_score + distance_score

    return total_score

In [35]:
import doctest
doctest.run_docstring_examples(bode_score, globals(), verbose=True)

Finding tests in NoName
Trying:
    bode_score(25, 70, 'SLOWER THAN PEERS', 400)
Expecting:
    1
ok
Trying:
    bode_score(18, 40, 'STOPS AFTER 100 YARDS', 300)
Expecting:
    6
ok
Trying:
    bode_score(22, 30, 'BREATHLESS WHEN DRESSING', 100)
Expecting:
    9
ok
Trying:
    bode_score(20, 35, 'STOPS AFTER A FEW MINUTES', 250)
Expecting:
    7
ok


In [36]:
bode_score(20.23, 70, 'ONLY STRENUOUS EXERCISE', 600)

1

In [38]:
bode_score(22.3, 200, 'BREATHLESS WHEN DRESSING', 200)

ValueError: fev_pct is percentage and cannot be a value over 100 or less than 0.

In [39]:
bode_score(21, 60, 'BREATHLESS WHEN DRESSING', -20)

ValueError: Distance cannot be a negative value.
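
The FEV1 and walk-distance bands in `bode_score` both amount to counting how many lower bounds a value reaches, so they can also be written as a table lookup. A hedged sketch using the standard-library `bisect` module follows; `band_points` and the two threshold lists are hypothetical names that are not part of the notebook, and input validation is left out for brevity.

```python
from bisect import bisect_right

# Lower bounds of the 2-, 1-, and 0-point bands, in ascending order
# (same cut-offs as the if/elif chains in bode_score).
FEV_THRESHOLDS = [36, 50, 65]
DISTANCE_THRESHOLDS = [150, 250, 350]


def band_points(value, thresholds):
    """Return 3 minus the number of thresholds that value meets or exceeds."""
    return len(thresholds) - bisect_right(thresholds, value)


# Spot checks against the bands used in bode_score.
print(band_points(70, FEV_THRESHOLDS))        # FEV1 >= 65
print(band_points(40, FEV_THRESHOLDS))        # FEV1 36-49
print(band_points(300, DISTANCE_THRESHOLDS))  # distance 250-349
print(band_points(100, DISTANCE_THRESHOLDS))  # distance below 150
```

Keeping the cut-offs in one list per component makes them easier to review against the published table, at the cost of being less explicit than the if/elif form.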

### Step 3: Calculate BODE Risk

In [40]:
def bode_risk(bode_score):
    """
    Determine the BODE 4-year survival rate (as a percentage) from the BODE score.

    Survival rate by BODE score:
    - 0-2 points: 80%
    - 3-4 points: 67%
    - 5-6 points: 57%
    - 7-10 points: 18%
    The result is returned as an int.

    Example:
    >>> bode_risk(0)
    80

    >>> bode_risk(4)
    67

    >>> bode_risk(7)
    18

    Raises:
        ValueError: if the score is less than 0 or greater than 10.
    """
    # Rule out values that bode_score cannot produce.
    if bode_score < 0 or bode_score > 10:
        raise ValueError("The entered BODE score is ineligible.")

    # Determine the BODE risk.
    if bode_score <= 2:
        return 80
    elif bode_score <= 4:
        return 67
    elif bode_score <= 6:
        return 57
    else:  # 7-10 points
        return 18

In [41]:
import doctest
doctest.run_docstring_examples(bode_risk, globals(), verbose=True)

Finding tests in NoName
Trying:
    bode_risk(0)
Expecting:
    80
ok
Trying:
    bode_risk(4)
Expecting:
    67
ok
Trying:
    bode_risk(7)
Expecting:
    18
ok


In [42]:
bode_risk(0)

80

In [43]:
bode_risk(100)

ValueError: The entered BODE score is ineligible.

In [44]:
bode_risk(-2)

ValueError: The entered BODE score is ineligible.

### Step 4: Load Hospital Data

In [48]:
def hospital_data(file_path):
    """
    Load the hospital information stored in hospitals.json.

    Returns the parsed JSON content. Step 5 iterates over it as a list of
    hospital systems, each holding a 'hospitals' list.
    """
    with open(file_path, 'r') as file:
        return json.load(file)
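
The exact contents of hospitals.json are not shown in this notebook. The sketch below assumes the shape that the Step 5 code relies on: a list of systems, each with a 'hospitals' list whose entries carry 'name' and 'beds'. The system and hospital names, the bed counts, and the 'system' key are made up for illustration.

```python
import json

# Hypothetical stand-in for hospitals.json; only the 'hospitals', 'name',
# and 'beds' keys are actually used by the notebook's Step 5 code.
sample_json = """
[
  {"system": "Example Health", "hospitals": [
      {"name": "Example General", "beds": 200},
      {"name": "Example North", "beds": 80}
  ]},
  {"system": "Sample Care", "hospitals": [
      {"name": "Sample Central", "beds": 120}
  ]}
]
"""

systems = json.loads(sample_json)

# Flatten the nested structure into a name -> bed count lookup.
hospital_beds = {
    hospital["name"]: hospital["beds"]
    for system in systems
    for hospital in system["hospitals"]
}

print(hospital_beds)
```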

### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.
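
The per-hospital averages come down to keeping a running count and running totals per hospital, then dividing at the end. A minimal sketch of that pattern on hypothetical, already-scored patients (the hospital names and values are illustrative only):

```python
# Hypothetical (hospital, BODE score, BODE risk) tuples for scored patients.
scored = [
    ("Example General", 1, 80),
    ("Example General", 6, 57),
    ("Example North", 9, 18),
]

totals = {}
for hospital, score, risk in scored:
    # Create the accumulator the first time a hospital is seen.
    entry = totals.setdefault(hospital, {"count": 0, "total_score": 0, "total_risk": 0})
    entry["count"] += 1
    entry["total_score"] += score
    entry["total_risk"] += risk

# Divide the totals by the count to get the averages for each hospital.
averages = {
    hospital: (
        entry["count"],
        round(entry["total_score"] / entry["count"], 2),
        round(entry["total_risk"] / entry["count"], 2),
    )
    for hospital, entry in totals.items()
}

print(averages)
```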

In [49]:
patient_csv = "/content/patient.csv"
hospital_json = "/content/hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

def midterm_assignment(patient_csv, hospital_json, patient_output_csv, hospital_output_csv):
    """
    Calculate the BODE score and BODE risk for every patient in patient_csv,
    then write the per-patient results and the per-hospital averages to CSV.

    Raises:
        ValueError: if a patient's hospital is not listed in hospital_json.
    """
    # Map each hospital name to its bed count.
    hospitals = hospital_data(hospital_json)
    hospital_beds = {hospital['name']: hospital['beds'] for system in hospitals for hospital in system['hospitals']}

    hospitals_data = {}  # per-hospital patient count and running totals

    patient_results = [['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL']]
    hospital_output_list = [['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK']]

    with open(patient_csv, 'r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            name = row['NAME']
            hospital = row['hospital']
            if hospital not in hospital_beds:
                raise ValueError(f"Unknown hospital: {hospital}")

            # csv.DictReader returns strings, so convert the numeric columns.
            weight_kg = float(row['WEIGHT_KG'])
            height_m = float(row['HEIGHT_M'])
            fev_pct = float(row['fev_pct'])
            dyspnea_description = row['dyspnea_description']
            distance_in_meters = float(row['distance_in_meters'])

            bmi = bmi_kg_per_m2(weight_kg, height_m)
            score = bode_score(bmi, fev_pct, dyspnea_description, distance_in_meters)
            risk = bode_risk(score)

            patient_results.append([name, score, f"{risk}%", hospital])

            if hospital not in hospitals_data:
                hospitals_data[hospital] = {'count': 0, 'total_score': 0, 'total_risk': 0}

            hospitals_data[hospital]['count'] += 1
            hospitals_data[hospital]['total_score'] += score
            hospitals_data[hospital]['total_risk'] += risk

    # Turn the running totals into one summary row per hospital.
    for hospital, data in hospitals_data.items():
        beds = hospital_beds[hospital]
        copd_count = data['count']
        avg_score = round(data['total_score'] / copd_count, 2)
        avg_risk = round(data['total_risk'] / copd_count, 2)
        pct_of_copd_cases_over_beds = round((copd_count / beds) * 100, 2)
        hospital_output_list.append([hospital, copd_count, pct_of_copd_cases_over_beds, avg_score, f"{avg_risk}%"])

    # Write patient_output.csv
    with open(patient_output_csv, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(patient_results)
        print(f"Patient output CSV created: {patient_output_csv}")

    # Write hospital_output.csv
    with open(hospital_output_csv, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(hospital_output_list)
        print(f"Hospital output CSV created: {hospital_output_csv}")


In [50]:
midterm_assignment("/content/patient.csv", "/content/hospitals.json", "patient_output.csv", "hospital_output.csv")

Patient output CSV created: patient_output.csv
Hospital output CSV created: hospital_output.csv
