# Healthcare Lab (Conditionals and Iterations-Simplified)

**Learning Objectives:**
  * Practice the application of conditionals and iterations in data science contexts
  * Gain exposure to healthcare-related datasets

## Context of the dataset

### 1. The dataset consists of records corresponding to medical events.
### 2. Each medical event is uniquely identified by `MedicalClaim`.
### 3. A given medical event might involve several medical procedures.
### 4. Each medical procedure is uniquely identified by `ClaimItem`.
### 5. A given medical procedure is characterized by `PrincipalDiagnosisDesc`, `PrincipalDiagnosis`, `RevenueCodeDesc`, `RevenueCode`, `TypeFlag` and `TotalExpenses`.
### 6. Each medical procedure involves: `MemberName`, `MemberID`, `County`, `HospitalName`, `HospitalType`, `StartDate`, `EndDate`.
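
To make the relationship between medical events and procedures concrete, here is a minimal sketch on a tiny made-up DataFrame. The values (`claim_a`, `claim_b`, the expense amounts) and the names `toy_claims` and `claim_summary` are invented for illustration and are not taken from the dataset. It counts the procedures and sums the expenses for each medical event.

```python
import pandas as pd

# Hypothetical toy data: two medical events, three procedures in total
toy_claims = pd.DataFrame({
    "MedicalClaim": ["claim_a", "claim_a", "claim_b"],
    "ClaimItem": [1, 2, 1],
    "TotalExpenses": [10.0, 5.5, 100.0],
})

# One row per medical event: number of procedures and total expenses
claim_summary = toy_claims.groupby("MedicalClaim").agg(
    n_items=("ClaimItem", "count"),
    total_expenses=("TotalExpenses", "sum"),
)
print(claim_summary)
```

The same `groupby` call should work on the full DataFrame once it is loaded, assuming the column names listed above.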


## 1. Library Import

In [1]:
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [2]:
warnings.simplefilter('ignore')

## 2. Data loading and DataFrame creation

In [3]:
HealthCareDataSet = pd.read_csv(
    "https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/HealthcareDataset_PublicRelease.csv",
    sep=',',
    parse_dates=['StartDate', 'EndDate', 'BirthDate'],
)

In [4]:
HealthCareDataSet.head(3)

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
0,634363,e659f3f4,6a380a28,6f943458,c1e3436737c77899,18,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,15.148
1,634364,e659f3f4,6a380a28,6f943458,c1e3436737c77899,21,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,3.073
2,634387,e659f3f4,6a380a28,6f943458,c1e3436737c77899,10,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1967-05-13,123.9


In [5]:
HealthCareDataSet.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52563 entries, 0 to 52562
Data columns (total 17 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Id                      52563 non-null  int64         
 1   MemberName              52563 non-null  object        
 2   MemberID                52563 non-null  object        
 3   County                  52563 non-null  object        
 4   MedicalClaim            52563 non-null  object        
 5   ClaimItem               52563 non-null  int64         
 6   HospitalName            52563 non-null  object        
 7   HospitalType            52563 non-null  object        
 8   StartDate               52563 non-null  datetime64[ns]
 9   EndDate                 52563 non-null  datetime64[ns]
 10  PrincipalDiagnosisDesc  52563 non-null  object        
 11  PrincipalDiagnosis      52563 non-null  object        
 12  RevenueCodeDesc         52561 non-null  object

## 3. Applying conditionals to a Pandas DataFrame
#### Use conditional (`if`-based) logic on a DataFrame only when strictly required; prefer pandas native operations whenever possible.
#### `if` expressions are useful in combination with lambda functions, as they enable fine-grained, element-wise processing of DataFrame columns.

In [24]:
HealthCareDataSet['TypeFlag'].apply(lambda x:'Found ER' if x=='ER' else "Did not find ER")

Unnamed: 0,TypeFlag
0,Found ER
1,Found ER
2,Found ER
3,Found ER
4,Found ER
...,...
52558,Did not find ER
52559,Did not find ER
52560,Did not find ER
52561,Did not find ER


In [7]:
HealthCareDataSet['TotalExpenses'].apply(lambda x:'Expense larger than 100' if x>100 else "Expense lower than 100")

Unnamed: 0,TotalExpenses
0,Expense lower than 100
1,Expense lower than 100
2,Expense larger than 100
3,Expense lower than 100
4,Expense lower than 100
...,...
52558,Expense larger than 100
52559,Expense larger than 100
52560,Expense larger than 100
52561,Expense larger than 100
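
As a point of comparison with the two `apply` calls in this section, the same labels can be produced without a lambda by using `numpy.where`, which evaluates the condition on the whole column at once. The sketch below runs on a small made-up DataFrame (`toy`, with invented values) rather than on `HealthCareDataSet`, so it can be executed on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for HealthCareDataSet
toy = pd.DataFrame({
    "TypeFlag": ["ER", "INP", "ER"],
    "TotalExpenses": [15.148, 123.9, 100.0],
})

# Vectorized equivalents of the two lambda-based apply calls
er_label = np.where(toy["TypeFlag"] == "ER", "Found ER", "Did not find ER")
expense_label = np.where(
    toy["TotalExpenses"] > 100,
    "Expense larger than 100",
    "Expense lower than 100",
)

print(er_label)
print(expense_label)
```

Note that, as in the lambda version, an expense of exactly 100 falls into the "Expense lower than 100" branch because the test is `> 100`.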


## 4. Applying For Loops to process a DataFrame Column

In [25]:
# Use a descriptive name; calling the variable `list` would shadow the built-in type
fruits=['apple','orange','kiwi','peach','mandarin']

In [26]:
for fruit in fruits:
  print("I got:"+fruit)

I got:apple
I got:orange
I got:kiwi
I got:peach
I got:mandarin


In [30]:
HealthCareDataSetReduced=HealthCareDataSet.sample(10)
HealthCareDataSetReduced

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
45247,724899,f10b3890,c3071639,fd218584,e228190c5d8fdfb9,4,446442f4,HOSPITAL,2020-11-23,2020-11-27,Sepsis unspecified organi,A41.9,MEDICAL/SURGICAL SUPPLIES,270.0,INP,1952-12-06,4626.937
49665,734960,2a30b16f,c98ed403,425a37b2,dbb0e293fc1daddc,14,2148dc02,HOSPITAL,2020-12-16,2020-12-16,Unspecified injury of hea,S09.90XA,EMERGENCY ROOM,450.0,ER,1977-08-09,1906.8
26025,683959,e77eb541,efd6e7cb,425a37b2,05a008d4d2789daf,7,295f5b41,HOSPITAL,2020-06-18,2020-06-24,Cerebral infarction due t,I63.40,RADIOLOGY - DIAGNOSTIC: CHEST X-RAY,324.0,INP,1955-03-01,635.6
25965,683888,aac0b8eb,77d097f5,425a37b2,7adf6c33522e9f91,29,761ae146,HOSPITAL,2020-06-17,2020-06-18,Calculus of gallbladder w,K80.01,RECOVERY ROOM,710.0,ER,1966-09-09,736.47
15589,663297,fe961d9e,18edb592,02af982d,0a3806627e76fc1c,4,4d103af0,HOSPITAL,2020-03-16,2020-03-16,Fracture of nasal bones i,S02.2XXA,EMERGENCY ROOM,450.0,ER,1955-04-13,2157.792
4599,642470,3736258e,5d672040,ea48569b,1e550bfd13270563,1,901bbdcc,HOSPITAL,2020-01-15,2020-01-17,Sepsis unspecified organi,A41.9,ROOM & BOARD (PRIVATE),110.0,INP,1934-06-19,1477.672
28166,688385,8894032a,0b8a2899,fd218584,5b8bb4ed5c862550,17,b592f5ae,HOSPITAL,2020-07-14,2020-07-16,Altered mental status uns,R41.82,RADIOLOGY - DIAGNOSTIC: CHEST X-RAY,324.0,ER,1937-09-21,477.456
20077,671913,a4007eba,16a20225,89e38653,1b97ec534eecce63,17,30807d03,HOSPITAL,2020-05-06,2020-05-13,Hypertensive heart and ch,I13.0,EMERGENCY ROOM,450.0,INP,1938-05-11,2794.134
40297,714069,95a0f30c,90982883,02af982d,ba4ec525690933fc,1,446442f4,HOSPITAL,2020-10-18,2020-10-20,Dysarthria and anarthria,R47.1,PHARMACY,250.0,ER,1931-06-30,0.042
23874,679346,2a0f737e,5e3e223f,fd218584,47e044e897c402ef,20,446442f4,HOSPITAL,2020-06-16,2020-06-17,Unilateral primary osteoa,M17.11,RECOVERY ROOM,710.0,INP,1936-10-10,4416.475


In [31]:
## we process each element of the column `PrincipalDiagnosisDesc`

for item in HealthCareDataSetReduced['PrincipalDiagnosisDesc']:
  print(item.upper())

SEPSIS UNSPECIFIED ORGANI
UNSPECIFIED INJURY OF HEA
CEREBRAL INFARCTION DUE T
CALCULUS OF GALLBLADDER W
FRACTURE OF NASAL BONES I
SEPSIS UNSPECIFIED ORGANI
ALTERED MENTAL STATUS UNS
HYPERTENSIVE HEART AND CH
DYSARTHRIA AND ANARTHRIA
UNILATERAL PRIMARY OSTEOA
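
The loop above is useful for learning iteration, but for a simple string transformation pandas offers the `.str` accessor, which applies the operation to every element of a column. Here is a minimal sketch, assuming a small hand-written Series (`diagnoses`) in place of the sampled DataFrame column:

```python
import pandas as pd

# Hypothetical sample values, in the style of PrincipalDiagnosisDesc
diagnoses = pd.Series(
    ["Epigastric pain", "Dysarthria and anarthria"],
    name="PrincipalDiagnosisDesc",
)

# Vectorized upper-casing, no explicit loop required
upper_diagnoses = diagnoses.str.upper()
print(upper_diagnoses.tolist())
```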


## 5. Applying For Loops to process a Pandas DataFrame Row

In [39]:
HealthCareDataSetReduced.iloc[0]

Unnamed: 0,45247
Id,724899
MemberName,f10b3890
MemberID,c3071639
County,fd218584
MedicalClaim,e228190c5d8fdfb9
ClaimItem,4
HospitalName,446442f4
HospitalType,HOSPITAL
StartDate,2020-11-23 00:00:00
EndDate,2020-11-27 00:00:00


In [41]:
for item in HealthCareDataSetReduced.iloc[0]:
  print("I got the following item:",item)

I got the following item: 724899
I got the following item: f10b3890
I got the following item: c3071639
I got the following item: fd218584
I got the following item: e228190c5d8fdfb9
I got the following item: 4
I got the following item: 446442f4
I got the following item: HOSPITAL
I got the following item: 2020-11-23 00:00:00
I got the following item: 2020-11-27 00:00:00
I got the following item: Sepsis unspecified organi
I got the following item: A41.9
I got the following item: MEDICAL/SURGICAL SUPPLIES
I got the following item: 270.0
I got the following item: INP
I got the following item: 1952-12-06 00:00:00
I got the following item: 4626.937
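
Iterating directly over a row yields only the values, so the column each value belongs to is lost. A possible refinement, sketched below on a made-up DataFrame (`toy`, with invented rows), is to use `Series.items()` to get label/value pairs for one row, and `DataFrame.itertuples()` to loop over all rows with attribute access to the columns.

```python
import pandas as pd

# Hypothetical toy data with a few of the dataset's column names
toy = pd.DataFrame({
    "ClaimItem": [4, 14],
    "TypeFlag": ["INP", "ER"],
    "TotalExpenses": [4626.937, 1906.8],
})

# Iterate over one row, keeping the column label next to each value
first_row = toy.iloc[0]
for column, value in first_row.items():
    print(column, "->", value)

# Iterate over all rows; each row is a namedtuple with attribute access
flags = []
for row in toy.itertuples(index=False):
    flags.append(f"{row.ClaimItem}:{row.TypeFlag}")
print(flags)
```

Row-wise loops like these are fine for small samples such as `HealthCareDataSetReduced`; on the full dataset, column-wise pandas operations are usually the better choice.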
