# Healthcare Lab (Functions)

**Learning Objectives:**
  * Practice defining and applying functions in data science contexts
  * Gain exposure to healthcare-related datasets

## Context of the dataset

1. The dataset consists of records corresponding to medical events.
2. Each medical event is uniquely identified by `MedicalClaim`.
3. A given medical event might involve several medical procedures.
4. Each medical procedure is uniquely identified by `ClaimItem`.
5. A given medical procedure is characterized by `PrincipalDiagnosisDesc`, `PrincipalDiagnosis`, `RevenueCodeDesc`, `RevenueCode`, `TypeFlag`, and `TotalExpenses`.
6. Each medical procedure involves `MemberName`, `MemberID`, `County`, `HospitalName`, `HospitalType`, `StartDate`, and `EndDate`.
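
The relationship between medical events and procedures described above can be sketched on a small, made-up table. The values below are hypothetical and only mimic the dataset's column names (`toy_claims`, `per_claim`, `n_procedures` and `total_expenses` are illustrative names). Grouping by `MedicalClaim` yields one row per medical event, with the number of procedures and their combined `TotalExpenses`.

```python
import pandas as pd

# Hypothetical toy records that mimic the column layout described above;
# the values are made up for illustration only.
toy_claims = pd.DataFrame({
    "MedicalClaim": ["claim_a", "claim_a", "claim_a", "claim_b"],
    "ClaimItem": [10, 18, 21, 1],
    "TotalExpenses": [123.9, 15.148, 3.073, 50.0],
})

# One row per medical event: how many procedures it involved and what they cost in total.
per_claim = toy_claims.groupby("MedicalClaim").agg(
    n_procedures=("ClaimItem", "count"),
    total_expenses=("TotalExpenses", "sum"),
)
print(per_claim)
```

The same grouping should work on `HealthCareDataSet` once it is loaded, assuming the columns are named as listed above.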


## 1. Library Import

In [1]:
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [2]:
warnings.simplefilter('ignore')

## 2. Data loading and DataFrame creation

In [3]:
HealthCareDataSet=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/HealthcareDataset_PublicRelease.csv",sep=',',parse_dates=['StartDate','EndDate','BirthDate'])

In [4]:
HealthCareDataSet.head(3)

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
0,634363,e659f3f4,6a380a28,6f943458,c1e3436737c77899,18,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,15.148
1,634364,e659f3f4,6a380a28,6f943458,c1e3436737c77899,21,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,3.073
2,634387,e659f3f4,6a380a28,6f943458,c1e3436737c77899,10,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1967-05-13,123.9


In [5]:
HealthCareDataSet.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52563 entries, 0 to 52562
Data columns (total 17 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Id                      52563 non-null  int64         
 1   MemberName              52563 non-null  object        
 2   MemberID                52563 non-null  object        
 3   County                  52563 non-null  object        
 4   MedicalClaim            52563 non-null  object        
 5   ClaimItem               52563 non-null  int64         
 6   HospitalName            52563 non-null  object        
 7   HospitalType            52563 non-null  object        
 8   StartDate               52563 non-null  datetime64[ns]
 9   EndDate                 52563 non-null  datetime64[ns]
 10  PrincipalDiagnosisDesc  52563 non-null  object        
 11  PrincipalDiagnosis      52563 non-null  object        
 12  RevenueCodeDesc         52561 non-null  object
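
The `info()` summary reports 52561 non-null values for `RevenueCodeDesc` against 52563 entries, so two rows lack a revenue code description. One possible way to locate such gaps is `isna()`. The sketch below uses a hypothetical three-row table (`toy_revenue`) instead of the real dataset.

```python
import pandas as pd

# Hypothetical toy table with one missing description; values are made up.
toy_revenue = pd.DataFrame({
    "ClaimItem": [10, 18, 21],
    "RevenueCodeDesc": ["LABORATORY", None, "DRUGS"],
})

# Number of missing values in each column.
missing_per_column = toy_revenue.isna().sum()

# Rows that contain at least one missing value.
rows_with_missing = toy_revenue[toy_revenue.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```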

## 3. Regular functions
### Let's define and apply some functions to transform our healthcare dataset

### 3.1. Function to compute the number of hospitalization days

In [6]:
def hospitalization_days(row):
    # Number of whole days between StartDate and EndDate for one record
    start=row['StartDate']
    end=row['EndDate']
    return (end-start).days



In [7]:
hospitalization_days(HealthCareDataSet[['StartDate','EndDate']].loc[300])

3

In [8]:
hospitalization_days(HealthCareDataSet[['StartDate','EndDate']].loc[0])

0

In [9]:
HealthCareDataSet[['StartDate','EndDate']].apply(hospitalization_days,axis=1)

Unnamed: 0,0
0,0
1,0
2,0
3,0
4,0
...,...
52558,7
52559,4
52560,4
52561,4
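
Row-wise `apply` calls the Python function once per record, which can become slow on tens of thousands of rows. As a possible alternative, subtracting the two datetime columns directly and reading `.dt.days` should give the same integers. The sketch below compares both approaches on a small hypothetical DataFrame (`toy_stays`) rather than the real dataset.

```python
import pandas as pd

# Hypothetical toy stays; the dates are made up for illustration.
toy_stays = pd.DataFrame({
    "StartDate": pd.to_datetime(["2020-01-08", "2020-03-01", "2020-05-10"]),
    "EndDate": pd.to_datetime(["2020-01-08", "2020-03-04", "2020-05-17"]),
})

def hospitalization_days(row):
    # Same logic as the function defined in this section.
    return (row["EndDate"] - row["StartDate"]).days

# Row-wise: the function is called once for every row.
row_wise = toy_stays.apply(hospitalization_days, axis=1)

# Column-wise: one subtraction over whole columns, then extract the day component.
vectorized = (toy_stays["EndDate"] - toy_stays["StartDate"]).dt.days

print(row_wise.tolist())
print(vectorized.tolist())
```

Both variants are useful: the row-wise form practices writing and applying functions, while the column-wise form is usually the more idiomatic pandas choice for simple arithmetic.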


### 3.2. Function to print patient's medical data

In [10]:
def dataprinter(row):
    # Extract the fields needed for the summary from a single record
    internalName=row['MemberName']
    internalId=row['MemberID']
    internalDiagnosisDesc=row['PrincipalDiagnosisDesc']
    internalDiagnosis=row['PrincipalDiagnosis']
    # Build and return a one-line description (the function returns the string; it does not print it)
    outputString=f" Patient {internalName} with ID {internalId} was diagnosed with {internalDiagnosisDesc} ICD-10 code {internalDiagnosis}"
    return outputString

In [11]:
pd.options.display.max_colwidth = 100
HealthCareDataSet.apply(dataprinter,axis=1)

Unnamed: 0,0
0,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
1,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
2,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
3,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
4,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
...,...
52558,Patient ff90a52f with ID 4ed7db9f was diagnosed with Traumatic subarachnoid he ICD-10 code S06....
52559,Patient f90fcde2 with ID c88e4212 was diagnosed with Iron deficiency anemia se ICD-10 code D50.0
52560,Patient f90fcde2 with ID c88e4212 was diagnosed with Iron deficiency anemia se ICD-10 code D50.0
52561,Patient f90fcde2 with ID c88e4212 was diagnosed with Iron deficiency anemia se ICD-10 code D50.0
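
Because the dataset holds one row per claim item, the output above repeats the same sentence for every item of a claim. One way to get a single sentence per distinct patient and diagnosis is to drop duplicate rows before applying the function. This is a minimal sketch on made-up data: `toy_patients`, the member names and IDs, and the second diagnosis are placeholders, and the formatting function is re-declared (here without the leading space) so that the block runs on its own.

```python
import pandas as pd

# Hypothetical toy records; identifiers and the second diagnosis are placeholders.
toy_patients = pd.DataFrame({
    "MemberName": ["member_a", "member_a", "member_b"],
    "MemberID": ["id_a", "id_a", "id_b"],
    "PrincipalDiagnosisDesc": ["Epigastric pain", "Epigastric pain", "Example diagnosis"],
    "PrincipalDiagnosis": ["R10.13", "R10.13", "X00.0"],
})

def dataprinter(row):
    # Build a one-line description from a single record.
    return (
        f"Patient {row['MemberName']} with ID {row['MemberID']} was diagnosed with "
        f"{row['PrincipalDiagnosisDesc']} ICD-10 code {row['PrincipalDiagnosis']}"
    )

# Keep one row per distinct combination of the four columns, then format each row.
unique_rows = toy_patients.drop_duplicates()
summaries = unique_rows.apply(dataprinter, axis=1)

for line in summaries:
    print(line)
```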


## 4. Lambda functions
### Let's define and apply some lambda functions to transform our healthcare dataset

### 4.1. Lambda function to compute the number of hospitalization days

In [12]:
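# Compute the hospitalization duration; the result is a Timedelta (e.g. "7 days"),
# unlike hospitalization_days in 3.1, which returns an integer via .days
HealthCareDataSet.apply(lambda row: row['EndDate'] - row['StartDate'], axis=1)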

Unnamed: 0,0
0,0 days
1,0 days
2,0 days
3,0 days
4,0 days
...,...
52558,7 days
52559,4 days
52560,4 days
52561,4 days
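
The output here shows durations such as `7 days` rather than plain integers. If integer day counts are preferred, `.days` can be read inside the lambda itself. The following sketch compares the two variants on a hypothetical toy DataFrame (`toy_stays`); it is an illustration, not a change to the notebook's cell.

```python
import pandas as pd

# Hypothetical toy stays; the dates are made up for illustration.
toy_stays = pd.DataFrame({
    "StartDate": pd.to_datetime(["2020-01-08", "2020-03-01", "2020-05-10"]),
    "EndDate": pd.to_datetime(["2020-01-08", "2020-03-04", "2020-05-17"]),
})

# Variant 1: the lambda returns a Timedelta for each row.
as_timedelta = toy_stays.apply(lambda row: row["EndDate"] - row["StartDate"], axis=1)

# Variant 2: the lambda returns the integer day component of that Timedelta.
as_days = toy_stays.apply(lambda row: (row["EndDate"] - row["StartDate"]).days, axis=1)

print(as_timedelta)
print(as_days)
```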


### 4.2. Lambda function to print patient's medical data

In [13]:
HealthCareDataSet.apply(lambda row: f" Patient {row['MemberName']} with ID {row['MemberID']} was diagnosed with {row['PrincipalDiagnosisDesc']} ICD-10 code {row['PrincipalDiagnosis']}",axis=1)

Unnamed: 0,0
0,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
1,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
2,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
3,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
4,Patient e659f3f4 with ID 6a380a28 was diagnosed with Epigastric pain ICD-10 code R10.13
...,...
52558,Patient ff90a52f with ID 4ed7db9f was diagnosed with Traumatic subarachnoid he ICD-10 code S06....
52559,Patient f90fcde2 with ID c88e4212 was diagnosed with Iron deficiency anemia se ICD-10 code D50.0
52560,Patient f90fcde2 with ID c88e4212 was diagnosed with Iron deficiency anemia se ICD-10 code D50.0
52561,Patient f90fcde2 with ID c88e4212 was diagnosed with Iron deficiency anemia se ICD-10 code D50.0
