# Part 5: Exploratory Analysis


| Code Type | Code Value | Description | Associated CMS Metric | Metric Details |
| --- | --- | --- | --- | --- |
| CPT | 36556 | Under Insertion of Central Venous Access Device | HAI_1 | Central Line Associated Bloodstream Infection |
| CPT | 51701 | Under Introduction Procedures on the Bladder | HAI_2 | Catheter Associated Urinary Tract Infections |
| CPT | 51702 | Under Introduction Procedures on the Bladder | HAI_2 | Catheter Associated Urinary Tract Infections |
| HCPCS | A4314 | Insertion tray with drainage bag with indwelling catheter, Foley type, 2-way latex with coating (Teflon, silicone, silicone elastomer or hydrophilic, etc.) | HAI_2 | Catheter Associated Urinary Tract Infections |
| HCPCS | A4315 | Insertion tray with drainage bag with indwelling catheter, Foley type, 2-way, all silicone | HAI_2 | Catheter Associated Urinary Tract Infections |
| HCPCS | G9312 | Surgical site infection | HAI_3 | Surgical Site Infection - Colon Surgery |
| CPT | 58150 | Under Hysterectomy Procedures | HAI_4 | Surgical Site Infection - Abdominal Hysterectomy |
| CPT | 15920 | Under Pressure Ulcers (Decubitus Ulcers) Procedures | PSI-3 | Pressure Ulcer Rate |
| CPT | 35800 | Under Repair, Excision, Exploration, Revision Procedures on Arteries and Veins | PSI-9 | Postoperative Hemorrhage or Hematoma Rate |
| HCPCS | J1650 | Injection, enoxaparin sodium, 10 mg | PSI-12 | Perioperative Pulmonary Embolism or Deep Vein Thrombosis Rate |
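
The mapping in the table above could be held as a small lookup frame so that billing codes can later be tied to their CMS metric. This is a minimal sketch, not part of the original notebook: the names `CODE_TO_METRIC`, `code_map` and `metric_for` are hypothetical, and only the code type, code value and metric columns from the table are used.

```python
import pandas as pd

# Rows copied from the table above: (code type, code value, associated CMS metric)
CODE_TO_METRIC = [
    ("CPT", "36556", "HAI_1"),
    ("CPT", "51701", "HAI_2"),
    ("CPT", "51702", "HAI_2"),
    ("HCPCS", "A4314", "HAI_2"),
    ("HCPCS", "A4315", "HAI_2"),
    ("HCPCS", "G9312", "HAI_3"),
    ("CPT", "58150", "HAI_4"),
    ("CPT", "15920", "PSI-3"),
    ("CPT", "35800", "PSI-9"),
    ("HCPCS", "J1650", "PSI-12"),
]

# Hypothetical lookup frame; codes are kept as strings because HCPCS codes are alphanumeric
code_map = pd.DataFrame(CODE_TO_METRIC, columns=["code_type", "code_value", "cms_metric"])


def metric_for(code_type, code_value):
    """Return the CMS metric for a code, or None if the code is not in the table."""
    match = code_map[
        (code_map["code_type"] == code_type)
        & (code_map["code_value"] == str(code_value))
    ]
    return match["cms_metric"].iloc[0] if not match.empty else None
```

Keeping the code values as strings avoids losing the letter prefix of HCPCS codes and lets CPT codes be passed either as integers or as strings.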

In [1]:
import pandas as pd
import numpy as np
import os
from dotenv import load_dotenv
from datetime import date, datetime, timedelta
import glob
from ast import literal_eval
from collections import Counter
from tqdm.auto import tqdm
import pyarrow as pa

In [19]:
drg = pd.read_parquet("D://Vignesh/Capstone/combined/drg_parquet/drg.parquet", engine='pyarrow')
cms = pd.read_csv('Hospital_Metrics/Medicare_Inpatient_Hospital_by_Provider_and_Service_2020.csv',encoding='windows-1252')

In [20]:
drg.head()

Unnamed: 0,billing_type,billing_code,negotiated_rates,ccn
0,MS-DRG,1,135000.0,50327
1,MS-DRG,2,135000.0,50327
2,MS-DRG,3,9000.0,50327
3,MS-DRG,4,9000.0,50327
4,MS-DRG,5,135095.0,50327


In [21]:
drg.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9777 entries, 0 to 9776
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   billing_type      9777 non-null   object 
 1   billing_code      9777 non-null   object 
 2   negotiated_rates  9777 non-null   float64
 3   ccn               9777 non-null   object 
dtypes: float64(1), object(3)
memory usage: 305.7+ KB


In [22]:
cms.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165281 entries, 0 to 165280
Data columns (total 15 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   Rndrng_Prvdr_CCN           165281 non-null  int64  
 1   Rndrng_Prvdr_Org_Name      165281 non-null  object 
 2   Rndrng_Prvdr_St            165281 non-null  object 
 3   Rndrng_Prvdr_City          165281 non-null  object 
 4   Rndrng_Prvdr_State_Abrvtn  165281 non-null  object 
 5   Rndrng_Prvdr_State_FIPS    165281 non-null  int64  
 6   Rndrng_Prvdr_Zip5          165281 non-null  int64  
 7   Rndrng_Prvdr_RUCA          165281 non-null  float64
 8   Rndrng_Prvdr_RUCA_Desc     165281 non-null  object 
 9   DRG_Cd                     165281 non-null  int64  
 10  DRG_Desc                   165281 non-null  object 
 11  Tot_Dschrgs                165281 non-null  int64  
 12  Avg_Submtd_Cvrd_Chrg       165281 non-null  float64
 13  Avg_Tot_Pymt_Amt           16

In [23]:
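# Cast codes to strings and CCNs to integers so they can be matched against the CMS columns
drg = drg.astype({'billing_code': 'str', 'ccn': 'int64'})
# Keep purely numeric billing codes; anything else becomes missing and is dropped in a later cell
drg['billing_code'] = drg['billing_code'].where(drg['billing_code'].str.isdigit())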

drg.head()

Unnamed: 0,billing_type,billing_code,negotiated_rates,ccn
0,MS-DRG,1,135000.0,50327
1,MS-DRG,2,135000.0,50327
2,MS-DRG,3,9000.0,50327
3,MS-DRG,4,9000.0,50327
4,MS-DRG,5,135095.0,50327


In [25]:
# Drop rows whose billing code was not numeric, then cast the remaining codes to integers
drg = drg.dropna(subset=['billing_code'])
drg = drg.astype({'billing_code': 'int64'})

In [26]:
# Inner join on (CCN, DRG code): keeps only hospital/DRG pairs present in both tables
merged = drg.merge(cms, how='inner', left_on=['ccn', 'billing_code'], right_on=['Rndrng_Prvdr_CCN', 'DRG_Cd'])
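
Because an inner join silently discards rows without a partner, it can help to count how many negotiated-rate rows actually find a CMS match (`drg.info()` reported 9777 rows before cleaning, while `merged.info()` reports 799). The sketch below shows one way to do that with a left join and pandas' `indicator=True` option. It runs on toy frames: `toy_drg`, `toy_cms` and the discharge counts are illustrative assumptions, not values from the real files.

```python
import pandas as pd

# Toy stand-ins for drg and cms; the values are illustrative only
toy_drg = pd.DataFrame({
    "ccn": [50327, 50327, 50327],
    "billing_code": [1, 2, 3],
    "negotiated_rates": [135000.0, 135000.0, 9000.0],
})
toy_cms = pd.DataFrame({
    "Rndrng_Prvdr_CCN": [50327, 50327],
    "DRG_Cd": [1, 999],
    "Tot_Dschrgs": [12, 15],
})

# A left join keeps every negotiated-rate row; the _merge column records
# whether a matching CMS row was found ("both") or not ("left_only")
check = toy_drg.merge(
    toy_cms,
    how="left",
    left_on=["ccn", "billing_code"],
    right_on=["Rndrng_Prvdr_CCN", "DRG_Cd"],
    indicator=True,
)

match_counts = check["_merge"].value_counts()
matched = int(match_counts["both"])
unmatched = int(match_counts["left_only"])
print(f"matched: {matched}, unmatched: {unmatched}")
```

Running the same check on the real `drg` and `cms` frames would show whether the unmatched rows come from hospitals missing in the CMS file or from DRG codes missing for a hospital that is present.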

In [27]:
merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 799 entries, 0 to 798
Data columns (total 19 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   billing_type               799 non-null    object 
 1   billing_code               799 non-null    int64  
 2   negotiated_rates           799 non-null    float64
 3   ccn                        799 non-null    int64  
 4   Rndrng_Prvdr_CCN           799 non-null    int64  
 5   Rndrng_Prvdr_Org_Name      799 non-null    object 
 6   Rndrng_Prvdr_St            799 non-null    object 
 7   Rndrng_Prvdr_City          799 non-null    object 
 8   Rndrng_Prvdr_State_Abrvtn  799 non-null    object 
 9   Rndrng_Prvdr_State_FIPS    799 non-null    int64  
 10  Rndrng_Prvdr_Zip5          799 non-null    int64  
 11  Rndrng_Prvdr_RUCA          799 non-null    float64
 12  Rndrng_Prvdr_RUCA_Desc     799 non-null    object 
 13  DRG_Cd                     799 non-null    int64  