# What's a custom pivot?

In this example, I know which columns I want in the pivot, but I don't know whether they all exist in the data, and there may be extra columns that I simply want to ignore.

Back to our complications data...

Suppose that we want a column for the following measure IDs:
* PSI_90_SAFETY
* PSI_13_POST_SEPSIS
* PSI_08_HIP_FRAC
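As a point of comparison, here is a minimal sketch of one way to get "only these columns, whether or not they exist" in a single step: an ordinary `pivot` followed by `reindex`. This is a different technique from the subset-and-merge strategy used in the rest of this notebook. The `toy` frame and its values are made up for illustration, and the names `wanted`, `wide`, and `custom` are assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy long-format data; the values are invented for illustration only.
toy = pd.DataFrame({
    'provider_id': [10001, 10001, 10005, 10005],
    'measure_id': ['PSI_90_SAFETY', 'MORT_30_AMI',
                   'PSI_90_SAFETY', 'PSI_13_POST_SEPSIS'],
    'score': [0.9, 13.0, 1.1, 4.2],
})

# The columns we want, whether or not they appear in the data.
wanted = ['PSI_90_SAFETY', 'PSI_13_POST_SEPSIS', 'PSI_08_HIP_FRAC']

# A regular pivot produces one column per measure_id that is present.
wide = toy.pivot(index='provider_id', columns='measure_id', values='score')

# reindex keeps only the wanted columns, in the wanted order.
# Wanted columns that are missing from the data are created and filled with NaN;
# extra columns (here MORT_30_AMI) are dropped.
custom = wide.reindex(columns=wanted)
```

In this sketch `PSI_08_HIP_FRAC` ends up as an all-NaN column because no row in `toy` carries that measure ID.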


In [None]:
import pandas as pd

In [None]:
data = pd.read_csv('https://hds5210-data.s3.amazonaws.com/complications.csv')

In [None]:
data.columns

Index(['provider_id', 'hospital_name', 'address', 'city', 'state', 'zip_code',
       'county_name', 'phone_number', 'measure_id', 'measure_name',
       'compared_to_national', 'denominator', 'score', 'lower_estimate',
       'higher_estimate', 'footnote', 'measure_start_date',
       'measure_end_date'],
      dtype='object')

In [None]:
data.head()

Unnamed: 0,provider_id,hospital_name,address,city,state,zip_code,county_name,phone_number,measure_id,measure_name,compared_to_national,denominator,score,lower_estimate,higher_estimate,footnote,measure_start_date,measure_end_date
0,10001,SOUTHEAST ALABAMA MEDICAL CENTER,1108 ROSS CLARK CIRCLE,DOTHAN,AL,36301,HOUSTON,(334) 793-8701,COMP_HIP_KNEE,Rate of complications for hip/knee replacement...,No Different Than the National Rate,292,3.2,2.1,4.8,,2015-04-01T00:00:00.000,2018-03-31T00:00:00.000
1,10001,SOUTHEAST ALABAMA MEDICAL CENTER,1108 ROSS CLARK CIRCLE,DOTHAN,AL,36301,HOUSTON,(334) 793-8701,MORT_30_AMI,Death rate for heart attack patients,No Different Than the National Rate,688,13.0,11.0,15.5,,2015-07-01T00:00:00.000,2018-06-30T00:00:00.000
2,10001,SOUTHEAST ALABAMA MEDICAL CENTER,1108 ROSS CLARK CIRCLE,DOTHAN,AL,36301,HOUSTON,(334) 793-8701,MORT_30_CABG,Death rate for CABG surgery patients,No Different Than the National Rate,291,4.3,2.6,6.8,,2015-07-01T00:00:00.000,2018-06-30T00:00:00.000
3,10001,SOUTHEAST ALABAMA MEDICAL CENTER,1108 ROSS CLARK CIRCLE,DOTHAN,AL,36301,HOUSTON,(334) 793-8701,MORT_30_COPD,Death rate for COPD patients,No Different Than the National Rate,411,8.8,6.7,11.4,,2015-07-01T00:00:00.000,2018-06-30T00:00:00.000
4,10001,SOUTHEAST ALABAMA MEDICAL CENTER,1108 ROSS CLARK CIRCLE,DOTHAN,AL,36301,HOUSTON,(334) 793-8701,MORT_30_HF,Death rate for heart failure patients,No Different Than the National Rate,869,12.7,10.7,15.0,,2015-07-01T00:00:00.000,2018-06-30T00:00:00.000


In [None]:
data['measure_id'].value_counts()

COMP_HIP_KNEE                53
PSI_10_POST_KIDNEY           53
PSI_14_POSTOP_DEHIS          53
PSI_13_POST_SEPSIS           53
MORT_30_AMI                  53
PSI_11_POST_RESP             53
PSI_12_POSTOP_PULMEMB_DVT    53
MORT_30_STK                  53
MORT_30_PN                   53
MORT_30_HF                   53
MORT_30_COPD                 53
MORT_30_CABG                 53
PSI_15_ACC_LAC               52
PSI_3_ULCER                  52
PSI_4_SURG_COMP              52
PSI_6_IAT_PTX                52
PSI_8_POST_HIP               52
PSI_90_SAFETY                52
PSI_9_POST_HEM               52
Name: measure_id, dtype: int64
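Before building the pivot, it can help to check which of the wanted measure IDs actually occur. In the counts above, for instance, there is a `PSI_8_POST_HIP` but no `PSI_08_HIP_FRAC`. The sketch below uses a small stand-in series instead of the real `data['measure_id']` so that it runs on its own; `measure_ids`, `wanted`, `present`, and `missing` are illustrative names.

```python
import pandas as pd

# Stand-in for data['measure_id']; only a few IDs are listed here.
measure_ids = pd.Series(['PSI_90_SAFETY', 'PSI_13_POST_SEPSIS',
                         'PSI_8_POST_HIP', 'MORT_30_AMI'])

wanted = ['PSI_90_SAFETY', 'PSI_13_POST_SEPSIS', 'PSI_08_HIP_FRAC']

# Compare the wanted IDs against the distinct IDs found in the data.
available = set(measure_ids.unique())
present = [m for m in wanted if m in available]
missing = [m for m in wanted if m not in available]
```

With the real data, replacing `measure_ids` with `data['measure_id']` should give the same kind of split.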

**Strategy**

Retrieve each subset of rows that we want and put them into separate data frames.

Then merge those data frames together on the key that should identify each row of the result (here, `provider_id`).
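The two steps can also be written as a loop over a list of wanted measure IDs, which is a rough sketch of how the strategy generalizes. The helper `subset_for`, the `toy` frame, and its rows are assumptions made for illustration; each subset's `denominator` column is renamed to the measure ID so the merged columns do not collide.

```python
from functools import reduce

import pandas as pd

# Toy long-format data; a small, illustrative stand-in for the real file.
toy = pd.DataFrame({
    'provider_id': [10001, 10001, 10005, 10005, 10006],
    'measure_id': ['PSI_3_ULCER', 'PSI_13_POST_SEPSIS', 'PSI_3_ULCER',
                   'MORT_30_AMI', 'PSI_13_POST_SEPSIS'],
    'denominator': [11162, 1607, 4942, 688, 551],
})

wanted = ['PSI_3_ULCER', 'PSI_13_POST_SEPSIS', 'PSI_8_POST_HIP']


def subset_for(df, measure_id):
    """Hypothetical helper: rows for one measure, keyed by provider_id."""
    rows = df[df['measure_id'] == measure_id]
    # Name the value column after the measure so merged columns stay distinct.
    return rows[['provider_id', 'denominator']].rename(
        columns={'denominator': measure_id})


# Step 1: one data frame per wanted measure (possibly empty).
frames = [subset_for(toy, m) for m in wanted]

# Step 2: outer-merge them all on the row key.
merged = reduce(
    lambda left, right: left.merge(right, how='outer', on='provider_id'),
    frames,
)
```

A measure that never occurs (here `PSI_8_POST_HIP`) still gets a column, filled with NaN, and measures outside `wanted` (here `MORT_30_AMI`) are ignored.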


In [None]:
# One data frame per measure: filter on measure_id, then keep the key plus the values we want
psi_03 = data[data['measure_id'] == 'PSI_3_ULCER'][['provider_id','denominator','score']]
psi_13 = data[data['measure_id'] == 'PSI_13_POST_SEPSIS'][['provider_id','denominator']]
psi_08 = data[data['measure_id'] == 'PSI_8_POST_HIP'][['provider_id','denominator']]

In [None]:
psi_03['denominator'].value_counts()

11162    1
4942     1
6967     1
9216     1
187      1
435      1
1593     1
855      1
1578     1
690      1
856      1
7228     1
2245     1
3190     1
6070     1
787      1
1865     1
2650     1
6701     1
240      1
14487    1
261      1
465      1
810      1
4777     1
303      1
90       1
1278     1
6078     1
8055     1
1036     1
298      1
5747     1
1542     1
3942     1
27       1
4634     1
777      1
249      1
6243     1
3890     1
8078     1
19343    1
650      1
4343     1
2178     1
1622     1
26657    1
7109     1
496      1
474      1
901      1
Name: denominator, dtype: int64

In [None]:
psi_03.head()

Unnamed: 0,provider_id,denominator,score
13,10001,11162,0.32
32,10005,4942,0.73
51,10006,8055,0.4
70,10007,1036,0.35
89,10008,298,0.47


In [None]:
psi_13.head()

Unnamed: 0,provider_id,denominator
10,10001,1607
29,10005,641
48,10006,551
67,10007,41
86,10008,Not Available


In [None]:
psi_08.head()

Unnamed: 0,provider_id,denominator
16,10001,11971
35,10005,5742
54,10006,8846
73,10007,1176
92,10008,334


In [None]:
psis = (
    psi_03
    .merge(psi_13, how='outer', on=['provider_id'])
    .merge(psi_08, how='outer', on=['provider_id'])
)


In [None]:
psis.head()

Unnamed: 0,provider_id,denominator_x,score,denominator_y,denominator
0,10001,11162,0.32,1607,11971
1,10005,4942,0.73,641,5742
2,10006,8055,0.4,551,8846
3,10007,1036,0.35,41,1176
4,10008,298,0.47,Not Available,334


In [None]:
psis.columns = ['provider_id','PSI_03 Den','PSI_03 Score','PSI_13','PSI_08']
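Assigning `psis.columns` by position depends on the column order that the merges happen to produce (`denominator_x`, `score`, `denominator_y`, `denominator`). A possible alternative, sketched below, is to rename each subset by name before merging so that no suffixes appear at all. The three small frames reuse the first two rows shown in the outputs above so the block runs on its own.

```python
import pandas as pd

# Small stand-ins for the three subsets, using rows from the head() outputs.
psi_03 = pd.DataFrame({'provider_id': [10001, 10005],
                       'denominator': [11162, 4942],
                       'score': [0.32, 0.73]})
psi_13 = pd.DataFrame({'provider_id': [10001, 10005],
                       'denominator': [1607, 641]})
psi_08 = pd.DataFrame({'provider_id': [10001, 10005],
                       'denominator': [11971, 5742]})

# Rename by column name first, so the merge never has to invent _x/_y suffixes.
psis = (
    psi_03.rename(columns={'denominator': 'PSI_03 Den', 'score': 'PSI_03 Score'})
    .merge(psi_13.rename(columns={'denominator': 'PSI_13'}),
           how='outer', on='provider_id')
    .merge(psi_08.rename(columns={'denominator': 'PSI_08'}),
           how='outer', on='provider_id')
)
```

The resulting column names match the ones assigned above, but they no longer depend on merge order.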

In [None]:
psis

Unnamed: 0,provider_id,PSI_03 Den,PSI_03 Score,PSI_13,PSI_08
0,10001,11162.0,0.32,1607,11971.0
1,10005,4942.0,0.73,641,5742.0
2,10006,8055.0,0.4,551,8846.0
3,10007,1036.0,0.35,41,1176.0
4,10008,298.0,0.47,Not Available,334.0
5,10011,5747.0,0.22,793,5416.0
6,10012,1542.0,0.28,87,1874.0
7,10016,3942.0,0.44,761,4690.0
8,10018,27.0,0.51,34,58.0
9,10019,4634.0,0.12,Not Available,4831.0
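Note that the `PSI_13` column still contains the text `Not Available`, so its values are strings rather than numbers. If numeric values are needed, one option is `pd.to_numeric` with `errors='coerce'`, which turns anything unparseable into NaN. The sketch below uses a two-row stand-in for `psis`, assuming the denominators were read in as strings.

```python
import pandas as pd

# Two-row stand-in for the merged frame; values assumed to be strings.
psis = pd.DataFrame({'provider_id': [10007, 10008],
                     'PSI_13': ['41', 'Not Available']})

# Convert to numbers; entries that cannot be parsed become NaN.
psis['PSI_13'] = pd.to_numeric(psis['PSI_13'], errors='coerce')
```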
