# <p style="color:red;">  Opioids and Drug-Related Deaths  
    
## DSC 540 | Advanced Machine Learning 
## Exploratory Data Analysis

### <p style="color:gray;"> Libraries

In [17]:
!pip install folium
!pip install geopandas
!pip install plotly



In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import json
import geopandas as gpd
import plotly.express as px

In [19]:
import os
os.getcwd()

'/home/jordan/rot/ent/gitOp'

## <p style="color:purple;"> Connecticut Medicare Part_D Opioid Prescriber Summary File 2014 Dataset

In [20]:
df = pd.read_csv('../Data/Connecticut_Medicare_Part_D_Opioid_Prescriber_Summary_File_2014.csv')
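Whether a path such as `../Data/...` resolves depends on the notebook's working directory (printed by `os.getcwd()` above). Below is a minimal sketch that tries several candidate locations in order and reads the first CSV that exists. The helper `read_first_existing` and the candidate paths are hypothetical and are not part of the original analysis.

```python
from pathlib import Path

import pandas as pd


def read_first_existing(candidates):
    """Read the first CSV in `candidates` that exists (hypothetical helper)."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return pd.read_csv(path)
    # Report every location that was tried, not just the last one
    raise FileNotFoundError(f"None of these files exist: {list(candidates)}")


# Hypothetical candidate locations, relative to the notebook's working directory
CANDIDATES = [
    '../Data/Connecticut_Medicare_Part_D_Opioid_Prescriber_Summary_File_2014.csv',
    'Data/Connecticut_Medicare_Part_D_Opioid_Prescriber_Summary_File_2014.csv',
]
# df = read_first_existing(CANDIDATES)
```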

In [None]:
df.head()

In [None]:
# Restore leading zeros dropped when ZIP codes were parsed as numbers (e.g. 6010 -> '06010')
df['NPPES Provider Zip Code'] = df['NPPES Provider Zip Code'].astype(str).str.zfill(5)
df['NPPES Provider Zip Code'].head()
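Padding after the fact works only while the ZIP column is parsed as integers. If the column contains any missing values, pandas stores it as float, so `astype(str)` yields strings like `'6010.0'` that `zfill(5)` cannot repair. The sketch below runs on an invented CSV (not the real file) and contrasts that case with passing `dtype=str` for the ZIP column to `read_csv`.

```python
import io

import pandas as pd

# Invented CSV: one ZIP with a leading zero, one missing, one already stripped
csv_text = "NPI,NPPES Provider Zip Code\n1,06010\n2,\n3,6510\n"

# Default parsing: the missing value makes the column float64
naive = pd.read_csv(io.StringIO(csv_text))['NPPES Provider Zip Code']
# '6010.0' and '6510.0' are already longer than 5 characters, so zfill(5) leaves them as-is
naive_zips = naive.astype(str).str.zfill(5)

# Reading the column as text keeps '06010' intact and leaves the gap as NaN
safe = pd.read_csv(io.StringIO(csv_text), dtype={'NPPES Provider Zip Code': str})
safe_zips = safe['NPPES Provider Zip Code'].str.zfill(5)
print(naive_zips.tolist(), safe_zips.tolist())
```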

- NPPES (National Plan and Provider Enumeration System)  
  
    - The Administrative Simplification provisions of the Health Insurance Portability and Accountability Act of 1996 (HIPAA) mandated the adoption of standard unique identifiers for health care providers and health plans.  
      
    - The purpose of these provisions is to improve the efficiency and effectiveness of the electronic transmission of health information. The Centers for Medicare & Medicaid Services (CMS) has developed the National Plan and Provider Enumeration System (NPPES) to assign these unique identifiers.

In [None]:
df.shape

In [None]:
df.columns

In [None]:
# Rename 'NPPES Provider Zip Code' to 'ZIPCode' so it matches the ZIP column of the
# geometry file; note that this reassigns every column name by position
df.columns = ['NPI', 'NPPES Provider Last/Org Name', 'NPPES Provider First Name',
       'ZIPCode', 'NPPES Provider State',
       'Specialty Description', 'Total Claim Count', 'Opioid Claim Count',
       'Opioid Prescribing Rate']
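If only the ZIP column needs a new name, `DataFrame.rename` with a mapping leaves every other column untouched and does not depend on column order. The sketch below uses a toy frame with a few of the dataset's column names and made-up values.

```python
import pandas as pd

# Toy frame with a subset of the dataset's column names (values are made up)
toy = pd.DataFrame({
    'NPI': [1234567890],
    'NPPES Provider Zip Code': ['06010'],
    'Total Claim Count': [100],
})

# Rename only the ZIP column; every other column keeps its name and position
renamed = toy.rename(columns={'NPPES Provider Zip Code': 'ZIPCode'})
print(list(renamed.columns))
```

`rename` returns a new frame by default, so the original `toy` still has its old column names.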

In [None]:
# Convert percentage strings such as '12.5%' to fractions (0.125);
# values that are already numeric (including NaN) are left unchanged
def p2f(x):
    return float(x.strip('%')) / 100

df['Opioid Prescribing Rate'] = df['Opioid Prescribing Rate'].apply(
    lambda n: p2f(n) if isinstance(n, str) else n)
df.head()
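The same cleaning can be done with pandas string methods instead of calling a Python function per row. The sketch below is an alternative, not the original code, and `percent_to_fraction` is a hypothetical name. Unlike `p2f`, it turns unparseable strings into NaN (`errors='coerce'`) instead of raising.

```python
import numpy as np
import pandas as pd


def percent_to_fraction(s: pd.Series) -> pd.Series:
    """Turn '12.5%'-style strings into fractions; keep numeric values and NaN as-is."""
    is_str = s.apply(lambda v: isinstance(v, str)).astype(bool)
    # Strip the trailing '%' and rescale only the string entries;
    # unparseable strings become NaN instead of raising
    converted = pd.to_numeric(s[is_str].str.rstrip('%'), errors='coerce') / 100
    # Start from the non-string entries (strings masked out to NaN)
    result = pd.to_numeric(s.where(~is_str), errors='coerce')
    result[is_str] = converted
    return result


rates = pd.Series(['12.5%', '0%', np.nan, 0.3])
print(percent_to_fraction(rates).tolist())
```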

### Column Descriptions (Self-Made)

- **NPI**: National Provider Identifier, a unique 10-digit number that identifies an individual provider or a health care organization. Providers share their NPI with other providers, employers, health plans, and payers. (Unique identifier)
- **NPPES Provider Last/Org Name**: Last name of the NPPES provider, or the organization name
- **NPPES Provider First Name**: First name of the NPPES provider
- **NPPES Provider Zip Code**: ZIP code of the NPPES provider (renamed `ZIPCode` above)
- **NPPES Provider State**: State of the NPPES provider
- **Specialty Description**: Brief description of the provider's specialty
- **Total Claim Count**: Total number of Medicare Part D claims (all drugs) associated with the provider
- **Opioid Claim Count**: Subset of *Total Claim Count*: the number of claims for opioid drugs
- **Opioid Prescribing Rate**: *Opioid Claim Count* / *Total Claim Count*
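Because the rate is defined as a ratio of two columns that are in the data, it can be checked directly. The sketch below flags rows where the stored rate disagrees with the recomputed ratio. The tolerance `atol=1e-3` is an assumption meant to absorb rounding in the published percentages, and the sample rows are made up.

```python
import numpy as np
import pandas as pd


def rate_mismatches(df, atol=1e-3):
    """Return rows whose stored rate differs from Opioid Claim Count / Total Claim Count."""
    expected = df['Opioid Claim Count'] / df['Total Claim Count']
    ok = np.isclose(df['Opioid Prescribing Rate'].to_numpy(dtype=float),
                    expected.to_numpy(dtype=float), atol=atol, equal_nan=True)
    return df[~ok]


# Made-up rows; the last one is deliberately inconsistent (10 / 40 = 0.25)
sample = pd.DataFrame({
    'Total Claim Count': [100, 50, 40],
    'Opioid Claim Count': [12, 5, 10],
    'Opioid Prescribing Rate': [0.12, 0.10, 0.30],
})
print(rate_mismatches(sample))
```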

### <span style="color:green"> **Where in Connecticut is our data coming from?** </span>  

#### <span style="color:green"> *NPPES Provider State + ZIP Code Analysis* </span>  


In [None]:
df.info()

In [None]:
df.isna().sum()

In [None]:
df['Opioid Prescribing Rate'].value_counts()

In [None]:
df['ZIPCode'].value_counts()

In [None]:
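# Load the Connecticut ZIP code boundaries and rename the second column
# (the ZIP code field) to 'ZIPCode' so it matches the column in df
ct = gpd.read_file('../Data/ct_connecticut_zip_codes_geo.min.json')
ct.columns = ['STATEFP10', 'ZIPCode', 'GEOID10', 'CLASSFP10', 'MTFCC10',
       'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10',
       'PARTFLG10', 'geometry']
ct.head()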

In [None]:
map_df = ct.merge(df, on='ZIPCode')
map_df.shape
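`merge` defaults to an inner join, so providers whose ZIP code has no polygon in the boundary file, and polygons with no providers, silently drop out of `map_df`. One way to see what was dropped is an outer merge with `indicator=True`. The sketch below uses plain pandas frames as stand-ins for `ct` and `df`, with made-up values; a GeoDataFrame inherits `merge` from DataFrame.

```python
import pandas as pd

# Toy stand-ins: `zips` plays the role of `ct` (one row per ZIP polygon),
# `providers` the role of `df` (one row per prescriber); values are made up
zips = pd.DataFrame({'ZIPCode': ['06010', '06101', '06510']})
providers = pd.DataFrame({
    'NPI': [1, 2, 3, 4],
    'ZIPCode': ['06010', '06010', '06510', '10001'],
})

# An outer merge keeps every row and labels its origin in the '_merge' column
check = zips.merge(providers, on='ZIPCode', how='outer', indicator=True)
providers_without_polygon = check.loc[check['_merge'] == 'right_only', 'ZIPCode'].tolist()
polygons_without_providers = check.loc[check['_merge'] == 'left_only', 'ZIPCode'].tolist()
print(providers_without_polygon, polygons_without_providers)
```

In the notebook, the equivalent call would be `ct.merge(df, on='ZIPCode', how='outer', indicator=True)`.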

In [None]:
map_df.columns

In [None]:
ct_opioid_rate = map_df.plot(column='Opioid Prescribing Rate', cmap='Reds', edgecolor='black',
                             figsize=(20, 12), legend=True, vmin=0, vmax=1)

ct_opioid_rate.axis('off')
ct_opioid_rate.set_title("Opioid Prescribing Rate in Connecticut", fontsize=20)
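`map_df` has one row per provider, so a ZIP code with several providers has its polygon drawn several times. The color that shows is most likely just the rate of whichever provider row was drawn last. The sketch below first aggregates to one row per ZIP by pooling claims, so the ZIP-level rate is total opioid claims divided by total claims. This aggregation is a choice made for illustration, not something the original analysis specifies, and the sample data are made up.

```python
import pandas as pd


def rate_by_zip(df: pd.DataFrame) -> pd.DataFrame:
    """One row per ZIP: summed claim counts and the pooled opioid prescribing rate."""
    agg = df.groupby('ZIPCode', as_index=False)[['Total Claim Count', 'Opioid Claim Count']].sum()
    agg['Opioid Prescribing Rate'] = agg['Opioid Claim Count'] / agg['Total Claim Count']
    return agg


# Made-up providers: two share ZIP 06010
sample = pd.DataFrame({
    'ZIPCode': ['06010', '06010', '06510'],
    'Total Claim Count': [100, 100, 50],
    'Opioid Claim Count': [10, 30, 5],
})
print(rate_by_zip(sample))
```

In the notebook, `ct.merge(rate_by_zip(df), on='ZIPCode')` would give one polygon per ZIP, which can be plotted with the same `plot` call.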

In [None]:
# Unweighted average opioid prescribing rate across the provider rows in map_df
map_df['Opioid Prescribing Rate'].mean()
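This mean weights every provider equally, so a provider with 10 claims counts as much as one with 10,000. The pooled rate, total opioid claims divided by total claims, weights every claim equally instead. The sketch below compares the two on made-up numbers. Note that `mean()` skips rows with a missing rate, while the sums include any row whose counts are present.

```python
import pandas as pd


def average_rates(df: pd.DataFrame) -> tuple[float, float]:
    """Return (unweighted mean of provider rates, pooled claim-level rate)."""
    # Every provider counts equally, regardless of claim volume
    unweighted = df['Opioid Prescribing Rate'].mean()
    # Every claim counts equally: total opioid claims / total claims
    pooled = df['Opioid Claim Count'].sum() / df['Total Claim Count'].sum()
    return float(unweighted), float(pooled)


# Made-up providers: a low-volume prescriber with a high rate, a high-volume one with a low rate
sample = pd.DataFrame({
    'Total Claim Count': [10, 100],
    'Opioid Claim Count': [5, 10],
    'Opioid Prescribing Rate': [0.5, 0.1],
})
print(average_rates(sample))
```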

In [None]:
map_df.describe()[['Total Claim Count','Opioid Claim Count','Opioid Prescribing Rate']]