# Healthcare Lab (Functions)

**Learning Objectives:**
  * Practice the application of functions to Pandas DataFrames
  * Gain exposure to healthcare-related datasets

## Context of the dataset

### 1. The dataset consists of records corresponding to medical events.
### 2. Each medical event is uniquely identified by `MedicalClaim`.
### 3. A given medical event might involve several medical procedures.
### 4. Each medical procedure is uniquely identified by `ClaimItem`.
### 5. A given medical procedure is characterized by `PrincipalDiagnosisDesc`, `PrincipalDiagnosis`, `RevenueCodeDesc`, `RevenueCode`, `TypeFlag` and `TotalExpenses`.

### 6. Each medical procedure involves: `MemberName`, `MemberID`, `County`, `HospitalName`, `HospitalType`, `StartDate`, `EndDate`.
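
The event/procedure hierarchy described above can be explored with a `groupby`. The following is a minimal sketch on a hypothetical three-row toy frame (the claim identifiers and amounts are made up for illustration and are not taken from the dataset), assuming `ClaimItem` numbers the procedures within a `MedicalClaim`:

```python
import pandas as pd

# Hypothetical toy data: two medical events, the first with two procedures.
toy = pd.DataFrame({
    "MedicalClaim": ["claim_a", "claim_a", "claim_b"],
    "ClaimItem": [1, 2, 1],
    "TotalExpenses": [10.0, 5.5, 20.0],
})

# Number of distinct procedures (claim items) recorded for each medical event.
items_per_claim = toy.groupby("MedicalClaim")["ClaimItem"].nunique()

# Total expenses per medical event, summed over its procedures.
expenses_per_claim = toy.groupby("MedicalClaim")["TotalExpenses"].sum()

print(items_per_claim)
print(expenses_per_claim)
```

The same two `groupby` calls should work unchanged on `HealthCareDataSet` once it has been loaded in section 2.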


## 1. Library Import

In [7]:
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [8]:
warnings.simplefilter('ignore')

## 2. Data loading and DataFrame creation

In [9]:
HealthCareDataSet=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/HealthcareDataset_PublicRelease.csv",sep=',',parse_dates=['StartDate','EndDate','BirthDate'])

In [10]:
HealthCareDataSet.head(3)

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
0,634363,e659f3f4,6a380a28,6f943458,c1e3436737c77899,18,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,15.148
1,634364,e659f3f4,6a380a28,6f943458,c1e3436737c77899,21,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,3.073
2,634387,e659f3f4,6a380a28,6f943458,c1e3436737c77899,10,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1967-05-13,123.9


In [11]:
HealthCareDataSet.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52563 entries, 0 to 52562
Data columns (total 17 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Id                      52563 non-null  int64         
 1   MemberName              52563 non-null  object        
 2   MemberID                52563 non-null  object        
 3   County                  52563 non-null  object        
 4   MedicalClaim            52563 non-null  object        
 5   ClaimItem               52563 non-null  int64         
 6   HospitalName            52563 non-null  object        
 7   HospitalType            52563 non-null  object        
 8   StartDate               52563 non-null  datetime64[ns]
 9   EndDate                 52563 non-null  datetime64[ns]
 10  PrincipalDiagnosisDesc  52563 non-null  object        
 11  PrincipalDiagnosis      52563 non-null  object        
 12  RevenueCodeDesc         52561 non-null  object

## 3. Regular functions
### Let's practice the definition and application of regular functions to a Pandas DataFrame

### 3.1. Total days in hospital
#### This time we define and apply a regular function to compute the number of days in hospital. This is for instructional purposes only; the preferred approach uses native pandas functionality.

In [20]:
def total_days_function(row):
  # row is one DataFrame row (a Series); subtracting two timestamps gives a Timedelta
  days=row['EndDate']-row['StartDate']
  return days

In [25]:
HealthCareDataSet.apply(total_days_function, axis=1)

Unnamed: 0,0
0,0 days
1,0 days
2,0 days
3,0 days
4,0 days
...,...
52558,7 days
52559,4 days
52560,4 days
52561,4 days


In [24]:
## Whenever possible it is advisable to rely on Pandas native functionality.

HealthCareDataSet['DaysInHospital']=HealthCareDataSet['EndDate']-HealthCareDataSet['StartDate']
HealthCareDataSet['DaysInHospital']

Unnamed: 0,DaysInHospital
0,0 days
1,0 days
2,0 days
3,0 days
4,0 days
...,...
52558,7 days
52559,4 days
52560,4 days
52561,4 days
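
The subtraction above produces a timedelta column (displayed as `0 days`, `4 days`, ...). If plain integers are more convenient for later arithmetic, the `.dt.days` accessor can extract them. A small sketch on hypothetical toy dates (the column name `DaysInHospitalInt` is an illustrative choice, not part of the lab):

```python
import pandas as pd

# Hypothetical toy data standing in for the StartDate/EndDate columns.
toy = pd.DataFrame({
    "StartDate": pd.to_datetime(["2020-01-08", "2020-03-01"]),
    "EndDate": pd.to_datetime(["2020-01-08", "2020-03-05"]),
})

# Subtracting two datetime columns yields a timedelta64 column ...
toy["DaysInHospital"] = toy["EndDate"] - toy["StartDate"]

# ... and the .dt.days accessor extracts the whole-day count as integers.
toy["DaysInHospitalInt"] = toy["DaysInHospital"].dt.days

print(toy.dtypes)
print(toy["DaysInHospitalInt"].tolist())  # [0, 4]
```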


### 3.2. Currency Converter
#### This time we define a regular function to convert dollars to yuan




In [27]:
def currency_converter_function(row):
  # Exchange rate (yuan per dollar), hard-coded for this lab
  yuan_dollar_rate=7.24
  yuans=row['TotalExpenses']*yuan_dollar_rate
  return yuans


In [28]:
HealthCareDataSet.apply(currency_converter_function, axis=1)

Unnamed: 0,0
0,109.67152
1,22.24852
2,897.03600
3,54.37964
4,62.48844
...,...
52558,17636.64000
52559,15026.62000
52560,6269.11600
52561,4814.60000
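
The function above hard-codes the exchange rate. One possible variation is to pass the rate as an argument, which `DataFrame.apply` forwards to the function through its `args` parameter. This is only a sketch: `convert_currency` and the toy amounts are hypothetical names and values introduced for illustration.

```python
import pandas as pd

def convert_currency(row, rate):
    """Multiply a row's TotalExpenses by an exchange rate supplied by the caller."""
    return row["TotalExpenses"] * rate

# Hypothetical toy data standing in for the TotalExpenses column.
toy = pd.DataFrame({"TotalExpenses": [10.0, 2.5]})

# Positional arguments listed in args are passed to the function after the row.
yuan = toy.apply(convert_currency, axis=1, args=(7.24,))

print(yuan)
```

Keeping the rate out of the function body means the same function can be reused for other currencies or updated rates without editing it.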


In [29]:
### Again it is advisable to use Pandas native functionality whenever possible

HealthCareDataSet['TotalExpensesYuan']=HealthCareDataSet['TotalExpenses']*7.24
HealthCareDataSet['TotalExpensesYuan']




Unnamed: 0,TotalExpensesYuan
0,109.67152
1,22.24852
2,897.03600
3,54.37964
4,62.48844
...,...
52558,17636.64000
52559,15026.62000
52560,6269.11600
52561,4814.60000


## 4. Lambda functions
### Let's practice the definition and application of lambda functions to a Pandas DataFrame

### 4.1. Total days in hospital
#### This time we define and apply a lambda function to compute the number of days in hospital. This is for instructional purposes only; the preferred approach uses native pandas functionality.

In [33]:
HealthCareDataSet.apply(lambda row: row['EndDate']-row['StartDate'], axis=1)

Unnamed: 0,0
0,0 days
1,0 days
2,0 days
3,0 days
4,0 days
...,...
52558,7 days
52559,4 days
52560,4 days
52561,4 days


In [None]:
## Whenever possible it is advisable to rely on Pandas native functionality.

HealthCareDataSet['DaysInHospital']=HealthCareDataSet['EndDate']-HealthCareDataSet['StartDate']
HealthCareDataSet['DaysInHospital']

Unnamed: 0,DaysInHospital
0,0 days
1,0 days
2,0 days
3,0 days
4,0 days
...,...
52558,7 days
52559,4 days
52560,4 days
52561,4 days


### 4.2. Currency Converter
#### This time we apply a lambda function to convert dollars to yuan




In [34]:
HealthCareDataSet.apply(lambda row: row['TotalExpenses']*7.24, axis=1)

Unnamed: 0,0
0,109.67152
1,22.24852
2,897.03600
3,54.37964
4,62.48844
...,...
52558,17636.64000
52559,15026.62000
52560,6269.11600
52561,4814.60000
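
When the computation only needs a single column, the lambda can also be applied to that column (a Series) instead of to whole rows. A minimal sketch on hypothetical toy amounts:

```python
import pandas as pd

# Hypothetical toy data standing in for the TotalExpenses column.
toy = pd.DataFrame({"TotalExpenses": [10.0, 2.5]})

# Series.apply passes each element (not a row) to the lambda,
# so no axis argument or column lookup is needed.
yuan = toy["TotalExpenses"].apply(lambda dollars: dollars * 7.24)

print(yuan)
```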


In [35]:
### Again it is advisable to use Pandas native functionality whenever possible

HealthCareDataSet['TotalExpensesYuan']=HealthCareDataSet['TotalExpenses']*7.24
HealthCareDataSet['TotalExpensesYuan']




Unnamed: 0,TotalExpensesYuan
0,109.67152
1,22.24852
2,897.03600
3,54.37964
4,62.48844
...,...
52558,17636.64000
52559,15026.62000
52560,6269.11600
52561,4814.60000
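
The regular function, the lambda function and the native expression are expected to give the same numbers. The sketch below checks this on the three `TotalExpenses` values shown in the `head(3)` output of section 2; the names `currency_converter`, `RATE` and the `via_*` variables are illustrative and not part of the lab.

```python
import pandas as pd

# The three TotalExpenses values shown in the head(3) output above.
toy = pd.DataFrame({"TotalExpenses": [15.148, 3.073, 123.9]})
RATE = 7.24  # exchange rate used throughout this lab

def currency_converter(row):
    return row["TotalExpenses"] * RATE

via_function = toy.apply(currency_converter, axis=1)                     # regular function
via_lambda = toy.apply(lambda row: row["TotalExpenses"] * RATE, axis=1)  # lambda function
via_native = toy["TotalExpenses"] * RATE                                 # native pandas

# Raises an AssertionError if the results differ.
pd.testing.assert_series_equal(via_function, via_lambda)
# The native result keeps the column name, so names are not compared.
pd.testing.assert_series_equal(via_function, via_native, check_names=False)

print(via_native)
```

Row-wise `apply` calls the Python function once per row, whereas the native expression operates on the whole column at once, which is why the native form is usually preferred on larger frames.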
