# Healthcare Lab (Stage 1)

**Learning Objectives:**
  * Practice basic data manipulation with Pandas
  * Gain exposure to healthcare-related datasets

## Context of the dataset

### 1. The dataset consists of records corresponding to medical events.
### 2. Each medical event is uniquely identified by `MedicalClaim`.
### 3. A given medical event might involve several medical procedures.
### 4. Within a medical event, each medical procedure is identified by `ClaimItem`.
### 5. A given medical procedure is characterized by `PrincipalDiagnosisDesc`, `PrincipalDiagnosis`, `RevenueCodeDesc`, `RevenueCode`, `TypeFlag` and `TotalExpenses`.

### 6. Each medical procedure involves: `MemberName`, `MemberID`, `County`, `HospitalName`, `HospitalType`, `StartDate`, `EndDate`.
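
The identifier structure described above can be checked with a few lines of pandas. The sketch below uses a small made-up DataFrame (its values are hypothetical, not taken from the lab's dataset) and assumes that `ClaimItem` numbers the procedures within a claim, so that a row is identified by the pair (`MedicalClaim`, `ClaimItem`).

```python
import pandas as pd

# Hypothetical toy records: two claims, with three and two items respectively.
toy = pd.DataFrame({
    "MedicalClaim": ["c1", "c1", "c1", "c2", "c2"],
    "ClaimItem": [1, 2, 3, 1, 2],
    "TotalExpenses": [15.0, 3.0, 124.0, 40.0, 60.0],
})

# Number of distinct medical events (claims).
n_claims = toy["MedicalClaim"].nunique()

# Number of procedures (items) recorded under each claim.
items_per_claim = toy.groupby("MedicalClaim")["ClaimItem"].count()

# True if no (MedicalClaim, ClaimItem) pair appears more than once.
pairs_unique = not toy.duplicated(subset=["MedicalClaim", "ClaimItem"]).any()

print(n_claims)
print(items_per_claim)
print(pairs_unique)
```

The same three checks could be run on `HealthCareDataSet` once it has been loaded.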


## 1. Library Import

In [1]:
import pandas as pd
import warnings
import numpy as np

In [2]:
warnings.simplefilter('ignore')

## 2. Data loading and DataFrame creation

In [3]:
# Load the CSV from GitHub and parse the three date columns as datetimes.
HealthCareDataSet = pd.read_csv(
    "https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/HealthcareDataset_PublicRelease.csv",
    sep=',',
    parse_dates=['StartDate', 'EndDate', 'BirthDate'],
)

In [4]:
HealthCareDataSet.head(3)

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
0,634363,e659f3f4,6a380a28,6f943458,c1e3436737c77899,18,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,15.148
1,634364,e659f3f4,6a380a28,6f943458,c1e3436737c77899,21,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,3.073
2,634387,e659f3f4,6a380a28,6f943458,c1e3436737c77899,10,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1967-05-13,123.9


In [5]:
HealthCareDataSet.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52563 entries, 0 to 52562
Data columns (total 17 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Id                      52563 non-null  int64         
 1   MemberName              52563 non-null  object        
 2   MemberID                52563 non-null  object        
 3   County                  52563 non-null  object        
 4   MedicalClaim            52563 non-null  object        
 5   ClaimItem               52563 non-null  int64         
 6   HospitalName            52563 non-null  object        
 7   HospitalType            52563 non-null  object        
 8   StartDate               52563 non-null  datetime64[ns]
 9   EndDate                 52563 non-null  datetime64[ns]
 10  PrincipalDiagnosisDesc  52563 non-null  object        
 11  PrincipalDiagnosis      52563 non-null  object        
 12  RevenueCodeDesc         52561 non-null  object

## 3. Column Selection

In [6]:
HealthCareDataSet[['MemberName','MedicalClaim','ClaimItem','TotalExpenses']]

Unnamed: 0,MemberName,MedicalClaim,ClaimItem,TotalExpenses
0,e659f3f4,c1e3436737c77899,18,15.148
1,e659f3f4,c1e3436737c77899,21,3.073
2,e659f3f4,c1e3436737c77899,10,123.900
3,e659f3f4,c1e3436737c77899,20,7.511
4,e659f3f4,c1e3436737c77899,19,8.631
...,...,...,...,...
52558,ff90a52f,90e8ae169cbba3bd,1,2436.000
52559,f90fcde2,8b6a8d2720d16e97,7,2075.500
52560,f90fcde2,8b6a8d2720d16e97,8,865.900
52561,f90fcde2,8b6a8d2720d16e97,12,665.000


In [7]:
ReducedDataSet=HealthCareDataSet[['MemberName','MedicalClaim','ClaimItem','BirthDate','TotalExpenses']]

In [8]:
ReducedDataSet.head()

Unnamed: 0,MemberName,MedicalClaim,ClaimItem,BirthDate,TotalExpenses
0,e659f3f4,c1e3436737c77899,18,1967-05-13,15.148
1,e659f3f4,c1e3436737c77899,21,1967-05-13,3.073
2,e659f3f4,c1e3436737c77899,10,1967-05-13,123.9
3,e659f3f4,c1e3436737c77899,20,1967-05-13,7.511
4,e659f3f4,c1e3436737c77899,19,1967-05-13,8.631


## 4. Column Elimination

In [9]:
## We set axis=1 to tell pandas to drop a column rather than a row.
ReducedDataSetNoBirthDate=ReducedDataSet.drop('BirthDate',axis=1)

In [10]:
ReducedDataSetNoBirthDate

Unnamed: 0,MemberName,MedicalClaim,ClaimItem,TotalExpenses
0,e659f3f4,c1e3436737c77899,18,15.148
1,e659f3f4,c1e3436737c77899,21,3.073
2,e659f3f4,c1e3436737c77899,10,123.900
3,e659f3f4,c1e3436737c77899,20,7.511
4,e659f3f4,c1e3436737c77899,19,8.631
...,...,...,...,...
52558,ff90a52f,90e8ae169cbba3bd,1,2436.000
52559,f90fcde2,8b6a8d2720d16e97,7,2075.500
52560,f90fcde2,8b6a8d2720d16e97,8,865.900
52561,f90fcde2,8b6a8d2720d16e97,12,665.000
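
As a minimal sketch of what `drop` does, the example below uses a hypothetical two-row DataFrame rather than the lab's data. It shows that `drop('BirthDate', axis=1)` and the keyword form `drop(columns='BirthDate')` give the same result, and that the frame being dropped from is left unchanged.

```python
import pandas as pd

# Hypothetical toy data standing in for ReducedDataSet.
toy = pd.DataFrame({
    "MemberName": ["a1", "b2"],
    "BirthDate": pd.to_datetime(["1967-05-13", "1986-02-24"]),
    "TotalExpenses": [15.148, 1498.7],
})

# Two spellings of the same column drop.
by_axis = toy.drop("BirthDate", axis=1)
by_keyword = toy.drop(columns="BirthDate")

print(list(by_axis.columns))
print(by_axis.equals(by_keyword))

# drop returned new DataFrames, so the original still has the column.
print("BirthDate" in toy.columns)
```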


In [11]:
pd.set_option('mode.chained_assignment', None)

In [12]:
# By default, pandas operations such as drop return a new DataFrame and leave the original unchanged; set inplace=True to modify the original instead.
ReducedDataSet.drop('BirthDate',axis=1,inplace=True)

In [13]:
ReducedDataSet

Unnamed: 0,MemberName,MedicalClaim,ClaimItem,TotalExpenses
0,e659f3f4,c1e3436737c77899,18,15.148
1,e659f3f4,c1e3436737c77899,21,3.073
2,e659f3f4,c1e3436737c77899,10,123.900
3,e659f3f4,c1e3436737c77899,20,7.511
4,e659f3f4,c1e3436737c77899,19,8.631
...,...,...,...,...
52558,ff90a52f,90e8ae169cbba3bd,1,2436.000
52559,f90fcde2,8b6a8d2720d16e97,7,2075.500
52560,f90fcde2,8b6a8d2720d16e97,8,865.900
52561,f90fcde2,8b6a8d2720d16e97,12,665.000
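
Cell `In [11]` silences the chained-assignment warning before modifying a column subset in place. One common alternative, sketched here on hypothetical toy data, is to take an explicit `.copy()` of the subset and reassign the result of `drop` instead of using `inplace=True`. This is an illustration of a different technique, not the approach the lab itself uses.

```python
import pandas as pd

# Hypothetical toy data standing in for HealthCareDataSet.
toy = pd.DataFrame({
    "MemberName": ["a1", "b2"],
    "BirthDate": pd.to_datetime(["1967-05-13", "1986-02-24"]),
    "TotalExpenses": [15.148, 1498.7],
})

# .copy() makes the subset independent of the frame it was selected from.
subset = toy[["MemberName", "BirthDate", "TotalExpenses"]].copy()

# Reassigning the returned frame is an alternative to inplace=True.
subset = subset.drop(columns="BirthDate")

print(list(subset.columns))
print(list(toy.columns))
```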


## 5. Column Renaming

In [14]:
ReducedDataSet.rename(columns={"MemberName":"NameID","TotalExpenses":"OverallExpenses"},inplace=True)

In [15]:
ReducedDataSet

Unnamed: 0,NameID,MedicalClaim,ClaimItem,OverallExpenses
0,e659f3f4,c1e3436737c77899,18,15.148
1,e659f3f4,c1e3436737c77899,21,3.073
2,e659f3f4,c1e3436737c77899,10,123.900
3,e659f3f4,c1e3436737c77899,20,7.511
4,e659f3f4,c1e3436737c77899,19,8.631
...,...,...,...,...
52558,ff90a52f,90e8ae169cbba3bd,1,2436.000
52559,f90fcde2,8b6a8d2720d16e97,7,2075.500
52560,f90fcde2,8b6a8d2720d16e97,8,865.900
52561,f90fcde2,8b6a8d2720d16e97,12,665.000


## 6. Operations on Columns

In [16]:
ReducedDataSet['OverallExpensesPowered']=ReducedDataSet['OverallExpenses']**2

In [17]:
ReducedDataSet

Unnamed: 0,NameID,MedicalClaim,ClaimItem,OverallExpenses,OverallExpensesPowered
0,e659f3f4,c1e3436737c77899,18,15.148,2.294619e+02
1,e659f3f4,c1e3436737c77899,21,3.073,9.443329e+00
2,e659f3f4,c1e3436737c77899,10,123.900,1.535121e+04
3,e659f3f4,c1e3436737c77899,20,7.511,5.641512e+01
4,e659f3f4,c1e3436737c77899,19,8.631,7.449416e+01
...,...,...,...,...,...
52558,ff90a52f,90e8ae169cbba3bd,1,2436.000,5.934096e+06
52559,f90fcde2,8b6a8d2720d16e97,7,2075.500,4.307700e+06
52560,f90fcde2,8b6a8d2720d16e97,8,865.900,7.497828e+05
52561,f90fcde2,8b6a8d2720d16e97,12,665.000,4.422250e+05


In [18]:

ReducedDataSet['OverallExpensesLog']=np.log(ReducedDataSet['OverallExpenses'])

In [19]:
ReducedDataSet

Unnamed: 0,NameID,MedicalClaim,ClaimItem,OverallExpenses,OverallExpensesPowered,OverallExpensesLog
0,e659f3f4,c1e3436737c77899,18,15.148,2.294619e+02,2.717869
1,e659f3f4,c1e3436737c77899,21,3.073,9.443329e+00,1.122654
2,e659f3f4,c1e3436737c77899,10,123.900,1.535121e+04,4.819475
3,e659f3f4,c1e3436737c77899,20,7.511,5.641512e+01,2.016369
4,e659f3f4,c1e3436737c77899,19,8.631,7.449416e+01,2.155360
...,...,...,...,...,...,...
52558,ff90a52f,90e8ae169cbba3bd,1,2436.000,5.934096e+06,7.798113
52559,f90fcde2,8b6a8d2720d16e97,7,2075.500,4.307700e+06,7.637957
52560,f90fcde2,8b6a8d2720d16e97,8,865.900,7.497828e+05,6.763769
52561,f90fcde2,8b6a8d2720d16e97,12,665.000,4.422250e+05,6.499787
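
Column operations such as `**` and `np.log` are applied element by element. The natural logarithm is only finite for positive inputs, so the sketch below shows `np.log1p`, which computes log(1 + x), as one possible alternative when a column may contain zeros. Whether the lab's dataset contains zero expenses is not stated; the toy values and the column name `OverallExpensesLog1p` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column that includes a zero.
toy = pd.DataFrame({"OverallExpenses": [0.0, 15.148, 123.9]})

# Element-wise square of the whole column.
toy["OverallExpensesPowered"] = toy["OverallExpenses"] ** 2

# log1p(x) = log(1 + x) stays finite at x = 0, where plain log gives -inf.
toy["OverallExpensesLog1p"] = np.log1p(toy["OverallExpenses"])

print(toy)
```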


## 7. Groupby Operations

### Let's compute the average expenses across counties
#### For this we need to perform a groupby operation. We group by `County` and compute the mean of `TotalExpenses`.


In [20]:
HealthCareDataSet.head(3)

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
0,634363,e659f3f4,6a380a28,6f943458,c1e3436737c77899,18,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,15.148
1,634364,e659f3f4,6a380a28,6f943458,c1e3436737c77899,21,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,3.073
2,634387,e659f3f4,6a380a28,6f943458,c1e3436737c77899,10,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1967-05-13,123.9


In [21]:
HealthCareDataSet.groupby(['County'])['TotalExpenses'].mean()

County
02af982d    2961.022264
217dc01f     531.107559
33b7d74d    1733.242000
39825de7    1863.681612
425a37b2    2523.756193
5597ffc0    1552.266828
6f0b5b6c    2507.675586
6f943458    3209.269318
7a56b047     488.600000
7d9b432e    2756.528070
89e38653    2349.649500
adb3fb00    2823.448197
b021dd12    2249.052194
b60a20c4    1666.055741
bd284e56     964.693333
e6708950    2626.775110
ea48569b    1183.120767
fc471384    2790.097970
fd218584    3068.508123
Name: TotalExpenses, dtype: float64
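
A group mean is easier to interpret next to the number of rows behind it. As a sketch on hypothetical toy data, `agg` can compute several statistics per group in one pass, and `sort_values` ranks the groups:

```python
import pandas as pd

# Hypothetical toy data: two counties with two and three records.
toy = pd.DataFrame({
    "County": ["A", "A", "B", "B", "B"],
    "TotalExpenses": [100.0, 300.0, 10.0, 20.0, 30.0],
})

# Mean and row count per county, ordered from highest to lowest mean.
summary = (
    toy.groupby("County")["TotalExpenses"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

print(summary)
```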

### Let's compute the average expenses across types of medical events
#### For this we need to perform a groupby operation. We group by `TypeFlag` and compute the mean of `TotalExpenses`.

In [22]:
HealthCareDataSet.groupby(['TypeFlag'])['TotalExpenses'].mean()

TypeFlag
ER      923.916490
INP    4254.617614
Name: TotalExpenses, dtype: float64

### Let's compute the average expenses across counties AND types of medical events
#### For this we need to perform a groupby operation. We group by `County` AND `TypeFlag` and compute the mean of `TotalExpenses`.

In [23]:
HealthCareDataSet.groupby(['County','TypeFlag'])['TotalExpenses'].mean()

County    TypeFlag
02af982d  ER          1095.624231
          INP         4182.315427
217dc01f  ER           269.906778
          INP         1370.681500
33b7d74d  ER          1733.242000
39825de7  ER           964.264615
          INP         2969.721432
425a37b2  ER           734.935974
          INP         4716.259394
5597ffc0  ER           396.251544
          INP         5569.128020
6f0b5b6c  ER           787.870467
          INP         3554.513484
6f943458  ER           754.605412
          INP         4832.479233
7a56b047  ER           488.600000
7d9b432e  ER          1024.500420
          INP         3998.761016
89e38653  ER           950.906526
          INP         4032.230959
adb3fb00  ER          1088.552277
          INP         3847.806539
b021dd12  ER           754.865066
          INP         3589.799426
b60a20c4  ER           390.176182
          INP         2543.222938
bd284e56  ER           964.693333
e6708950  ER           714.899357
          INP         4223.26
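
Grouping by two columns returns a Series with a two-level index, as in the output above. A possible way to compare the event types side by side, sketched here on hypothetical toy data, is to `unstack` the `TypeFlag` level into columns. Counties with no rows for a given type get `NaN` in that column.

```python
import pandas as pd

# Hypothetical toy data: county B has no INP records.
toy = pd.DataFrame({
    "County": ["A", "A", "A", "B"],
    "TypeFlag": ["ER", "ER", "INP", "ER"],
    "TotalExpenses": [100.0, 200.0, 4000.0, 50.0],
})

# Mean expenses per (County, TypeFlag) pair.
means = toy.groupby(["County", "TypeFlag"])["TotalExpenses"].mean()

# Move the TypeFlag index level into columns: one row per county.
wide = means.unstack("TypeFlag")

print(wide)
```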

### Let's compute the average expenses across types of hospitals
#### For this we need to perform a groupby operation. We group by `HospitalType` and compute the mean of `TotalExpenses`.

In [24]:
HealthCareDataSet.groupby(['HospitalType'])['TotalExpenses'].mean()

HospitalType
HOSPITAL                    2732.788708
HOSPITAL LONG TERM ACUTE    5411.281424
REHABILITATION CENTER       1248.298100
Name: TotalExpenses, dtype: float64

## 8. Filtering Operations

### Let's select events related to emergency room services
#### For this we need to perform a filtering operation. We filter on the column `TypeFlag`.

In [25]:
HealthCareDataSet.head(3)

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
0,634363,e659f3f4,6a380a28,6f943458,c1e3436737c77899,18,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,15.148
1,634364,e659f3f4,6a380a28,6f943458,c1e3436737c77899,21,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,3.073
2,634387,e659f3f4,6a380a28,6f943458,c1e3436737c77899,10,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1967-05-13,123.9


In [26]:
## We define the filter

HealthCareDataSet['TypeFlag']=='ER'

0         True
1         True
2         True
3         True
4         True
         ...  
52558    False
52559    False
52560    False
52561    False
52562    False
Name: TypeFlag, Length: 52563, dtype: bool

In [27]:
## We save the filter

ERFilter=HealthCareDataSet['TypeFlag']=='ER'

In [28]:
## We apply the filter

HealthCareDataSet[ERFilter]

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
0,634363,e659f3f4,6a380a28,6f943458,c1e3436737c77899,18,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,15.148
1,634364,e659f3f4,6a380a28,6f943458,c1e3436737c77899,21,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,3.073
2,634387,e659f3f4,6a380a28,6f943458,c1e3436737c77899,10,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1967-05-13,123.900
3,634388,e659f3f4,6a380a28,6f943458,c1e3436737c77899,20,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,7.511
4,634389,e659f3f4,6a380a28,6f943458,c1e3436737c77899,19,04b77561,HOSPITAL,2020-01-08,2020-01-08,Epigastric pain,R10.13,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1967-05-13,8.631
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52486,741526,d04bce31,f743344d,02af982d,f9f6df65b12ceb9b,30,446442f4,HOSPITAL,2020-12-16,2020-12-17,Gastritis unspecified wit,K29.70,ANESTHESIA,370.0,ER,1944-11-29,4411.925
52487,741527,d04bce31,f743344d,02af982d,f9f6df65b12ceb9b,11,446442f4,HOSPITAL,2020-12-16,2020-12-17,Gastritis unspecified wit,K29.70,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,ER,1944-11-29,747.117
52547,741683,c8cec3a7,2abd284d,ea48569b,639b490459f88576,1,761ae146,HOSPITAL,2020-12-28,2020-12-28,Constipation unspecified,K59.00,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,ER,1941-07-07,64.295
52548,741684,c8cec3a7,2abd284d,ea48569b,639b490459f88576,3,761ae146,HOSPITAL,2020-12-28,2020-12-28,Constipation unspecified,K59.00,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,ER,1941-07-07,29.225
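
The filter is a boolean Series with one `True`/`False` value per row. A minimal sketch on hypothetical toy data shows two things that follow from this: summing the mask counts the matching rows, and the filtered rows keep their original index labels.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "TypeFlag": ["ER", "INP", "ER", "INP", "ER"],
    "TotalExpenses": [15.0, 4000.0, 3.0, 2500.0, 124.0],
})

# Boolean mask: True where the row is an ER record.
er_mask = toy["TypeFlag"] == "ER"

# True counts as 1, so the sum is the number of matching rows.
n_er = int(er_mask.sum())

# .loc with a mask selects the same rows as toy[er_mask].
er_rows = toy.loc[er_mask]
er_mean = er_rows["TotalExpenses"].mean()

print(n_er)
print(list(er_rows.index))
print(er_mean)
```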


### Let's select medical events in which the patient was born after January 1, 1975.

#### For this we need to perform a filtering operation. We filter on the column `BirthDate`.

In [29]:
## We define the filter
HealthCareDataSet['BirthDate']>'1975-01-01'

0        False
1        False
2        False
3        False
4        False
         ...  
52558    False
52559    False
52560    False
52561    False
52562    False
Name: BirthDate, Length: 52563, dtype: bool

In [30]:
## We save the filter
BirthDateFilter=HealthCareDataSet['BirthDate']>'1975-01-01'

In [31]:
## We apply the filter
HealthCareDataSet[BirthDateFilter]

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
69,634487,89b6abdc,222a11a1,b021dd12,e3678501da0b8833,1,46658edc,HOSPITAL,2020-01-03,2020-01-03,Other specified disorders,K08.89,EMERGENCY ROOM,450.0,ER,1986-02-24,1498.700
74,634522,89b6abdc,222a11a1,b021dd12,e3678501da0b8833,2,46658edc,HOSPITAL,2020-01-03,2020-01-03,Other specified disorders,K08.89,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1986-02-24,35.700
1455,636578,85e2dc72,c7be7c19,b021dd12,3e347cb83a1d0024,17,35da0521,HOSPITAL,2020-01-05,2020-01-05,Other muscle spasm,M62.838,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1990-08-22,104.713
1456,636579,85e2dc72,c7be7c19,b021dd12,3e347cb83a1d0024,7,35da0521,HOSPITAL,2020-01-05,2020-01-05,Other muscle spasm,M62.838,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,ER,1990-08-22,557.746
1457,636580,85e2dc72,c7be7c19,b021dd12,b1f4489a9b0f7345,5,4a2b1885,HOSPITAL,2020-01-11,2020-01-12,Major depressive disorder,F33.3,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,ER,1990-08-22,14.700
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51694,739467,91787cad,58e5143c,89e38653,c9718013c720d0c6,9,4b40cef6,HOSPITAL,2020-12-18,2020-12-18,Pelvic and perineal pain,R10.2,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,ER,1986-05-27,108.080
51695,739468,91787cad,58e5143c,89e38653,c9718013c720d0c6,10,4b40cef6,HOSPITAL,2020-12-18,2020-12-18,Pelvic and perineal pain,R10.2,LABORATORY - CLINICAL DIAGNOSTIC: UROLOGY,307.0,ER,1986-05-27,122.129
51696,739469,91787cad,58e5143c,89e38653,c9718013c720d0c6,4,4b40cef6,HOSPITAL,2020-12-18,2020-12-18,Pelvic and perineal pain,R10.2,IV THERAPY,260.0,ER,1986-05-27,386.218
51697,739470,91787cad,58e5143c,89e38653,c9718013c720d0c6,1,4b40cef6,HOSPITAL,2020-12-18,2020-12-18,Pelvic and perineal pain,R10.2,PHARMACY,250.0,ER,1986-05-27,31.297
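
Because `BirthDate` was parsed as a datetime column, it can be compared against a date directly. The sketch below, on hypothetical toy data, contrasts the strict comparison used above with a filter on the birth year via the `.dt.year` accessor. The two differ for someone born exactly on 1975-01-01.

```python
import pandas as pd

# Hypothetical toy data; the second member is born exactly on the cutoff date.
toy = pd.DataFrame({
    "MemberID": ["m1", "m2", "m3"],
    "BirthDate": pd.to_datetime(["1967-05-13", "1975-01-01", "1986-02-24"]),
})

# Strictly after the cutoff date: excludes 1975-01-01 itself.
after_cutoff = toy["BirthDate"] > pd.Timestamp("1975-01-01")

# Born in 1975 or any later year: includes 1975-01-01.
born_1975_or_later = toy["BirthDate"].dt.year >= 1975

print(after_cutoff.tolist())
print(born_1975_or_later.tolist())
```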


### Let's select medical events in which the principal diagnosis is A09

#### For this we need to perform a filtering operation. We filter on the column `PrincipalDiagnosis`, selecting values equal to A09.

In [32]:
## we define the filter
HealthCareDataSet['PrincipalDiagnosis']=='A09'

0        False
1        False
2        False
3        False
4        False
         ...  
52558    False
52559    False
52560    False
52561    False
52562    False
Name: PrincipalDiagnosis, Length: 52563, dtype: bool

In [33]:
## we save the filter
PrincipalDiagnosisFilter=HealthCareDataSet['PrincipalDiagnosis']=='A09'

In [34]:
## We apply the filter
HealthCareDataSet[PrincipalDiagnosisFilter]


Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
8703,650503,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,17,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1947-03-14,311.185
8704,650504,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,4,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,ER,1947-03-14,1840.678
8705,650507,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,10,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,EMERGENCY ROOM,450.0,ER,1947-03-14,1734.432
8706,650513,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,15,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1947-03-14,5.502
8711,650518,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,1,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,IV THERAPY,260.0,ER,1947-03-14,1210.363
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49735,735079,b47817ca,11741bbb,7d9b432e,d0cfd1a6b49897be,4,6fca9c55,HOSPITAL,2020-12-08,2020-12-11,Infectious gastroenteriti,A09,IV THERAPY,260.0,INP,1951-11-18,765.800
49736,735080,b47817ca,11741bbb,7d9b432e,d0cfd1a6b49897be,6,6fca9c55,HOSPITAL,2020-12-08,2020-12-11,Infectious gastroenteriti,A09,LABORATORY - CLINICAL DIAGNOSTIC,300.0,INP,1951-11-18,253.225
49737,735081,b47817ca,11741bbb,7d9b432e,d0cfd1a6b49897be,7,6fca9c55,HOSPITAL,2020-12-08,2020-12-11,Infectious gastroenteriti,A09,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,INP,1951-11-18,7495.950
49738,735082,b47817ca,11741bbb,7d9b432e,d0cfd1a6b49897be,8,6fca9c55,HOSPITAL,2020-12-08,2020-12-11,Infectious gastroenteriti,A09,LABORATORY - CLINICAL DIAGNOSTIC: HEMATOLOGY,305.0,INP,1951-11-18,443.100


### Let's select medical events in which the principal diagnosis is A09 and which took place in emergency room services

#### For this we need to perform a filtering operation. We: (1) filter on the column `PrincipalDiagnosis`, selecting values equal to A09, and (2) filter on the column `TypeFlag`, selecting values equal to ER.

In [35]:
## we define the filter
(HealthCareDataSet['PrincipalDiagnosis']=='A09') & (HealthCareDataSet['TypeFlag']=='ER')

0        False
1        False
2        False
3        False
4        False
         ...  
52558    False
52559    False
52560    False
52561    False
52562    False
Length: 52563, dtype: bool

In [36]:
## we save the filter
ComplexFilter=(HealthCareDataSet['PrincipalDiagnosis']=='A09') & (HealthCareDataSet['TypeFlag']=='ER')

In [37]:
## we apply the filter
HealthCareDataSet[ComplexFilter]

Unnamed: 0,Id,MemberName,MemberID,County,MedicalClaim,ClaimItem,HospitalName,HospitalType,StartDate,EndDate,PrincipalDiagnosisDesc,PrincipalDiagnosis,RevenueCodeDesc,RevenueCode,TypeFlag,BirthDate,TotalExpenses
8703,650503,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,17,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1947-03-14,311.185
8704,650504,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,4,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,LABORATORY - CLINICAL DIAGNOSTIC: CHEMISTRY,301.0,ER,1947-03-14,1840.678
8705,650507,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,10,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,EMERGENCY ROOM,450.0,ER,1947-03-14,1734.432
8706,650513,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,15,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1947-03-14,5.502
8711,650518,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,1,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,IV THERAPY,260.0,ER,1947-03-14,1210.363
8712,650519,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,14,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1947-03-14,31.178
8716,650539,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,9,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,CT SCAN,350.0,ER,1947-03-14,13120.856
8724,650596,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,8,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,LABORATORY - CLINICAL DIAGNOSTIC: UROLOGY,307.0,ER,1947-03-14,880.334
8725,650597,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,13,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1947-03-14,15.589
8729,650605,1cdff80e,8ad0f672,02af982d,79c73cbc429daf6a,16,ae46acbf,HOSPITAL,2020-02-07,2020-02-07,Infectious gastroenteriti,A09,DRUGS REQUIRE SPECIFIC ID: DRUGS REQUIRING DET...,636.0,ER,1947-03-14,67.144
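
Filters combine with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `==`. The sketch below uses hypothetical toy data and also shows `isin`, which matches any value from a list.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "PrincipalDiagnosis": ["A09", "A09", "R10.13", "K59.00"],
    "TypeFlag": ["ER", "INP", "ER", "ER"],
})

is_a09 = toy["PrincipalDiagnosis"] == "A09"
is_er = toy["TypeFlag"] == "ER"

both = toy[is_a09 & is_er]          # A09 and ER
either = toy[is_a09 | is_er]        # A09 or ER (or both)
er_not_a09 = toy[is_er & ~is_a09]   # ER but not A09

# isin matches any of several diagnosis codes.
several = toy[toy["PrincipalDiagnosis"].isin(["A09", "K59.00"])]

print(list(both.index))
print(list(either.index))
print(list(er_not_a09.index))
print(list(several.index))
```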


## 9. Challenge Yourself!

## Which Revenue Codes have the largest Average Total Expenses?

## Which Revenue Codes have the largest Maximum Total Expenses?

## Which Medical events lasted longer than 1 week?
#### Tip: create a new column containing each medical event's duration, then apply a filter on that column.
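
As a starting point for the tip, the sketch below shows the mechanics on hypothetical toy data: subtracting two datetime columns gives a timedelta column, which can then be compared against `pd.Timedelta`. The column name `Duration` is an assumption, and applying the idea to `HealthCareDataSet` is left as the exercise.

```python
import pandas as pd

# Hypothetical toy data with parsed start and end dates.
toy = pd.DataFrame({
    "MedicalClaim": ["c1", "c2", "c3"],
    "StartDate": pd.to_datetime(["2020-01-08", "2020-12-08", "2020-03-01"]),
    "EndDate": pd.to_datetime(["2020-01-08", "2020-12-11", "2020-03-10"]),
})

# Subtracting datetime columns yields a timedelta column.
toy["Duration"] = toy["EndDate"] - toy["StartDate"]

# Keep only the events that lasted longer than seven days.
long_events = toy[toy["Duration"] > pd.Timedelta(days=7)]

print(toy["Duration"].dt.days.tolist())
print(long_events["MedicalClaim"].tolist())
```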