# DIABETES
This analysis uses data from US hospital encounters related to diabetes, recorded between 1999 and 2008.


The aim of this work is to analyse how many times each patient is entered into the system, that is, how many encounters are recorded per patient. The analysis is done with Python and its libraries.

In [1]:
import pandas as pd
import numpy as np

In [2]:
#import the dataset and the mapping file
#(raw strings keep the backslashes in the Windows-style paths literal)
data = pd.read_csv(r'path_to_diabetes_dataset\diabetic_data.csv', index_col=0)
mapping = pd.read_csv(r'path_to_mapping_dataset\IDs_mapping.csv')

In [3]:
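#slice the mapping file into its three tables (rows 8 and 40 are skipped)
#.copy() makes each table independent of `mapping`, so the in-place edits below do not warn
admission_type = mapping.iloc[0:8].copy()
discharge_disposition = mapping.iloc[9:40].copy()
admission_source = mapping.iloc[41:].copy()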

In [4]:
#reset the indexes of the sliced tables so that each starts at 0
discharge_disposition.reset_index(drop=True, inplace=True)
admission_source.reset_index(drop=True, inplace=True)

#use the first row of each table as its column names, then drop that row
discharge_disposition = discharge_disposition.rename(columns=discharge_disposition.iloc[0]).loc[1:]
admission_source = admission_source.rename(columns=admission_source.iloc[0]).loc[1:]

#reset the indexes again after dropping the header rows
discharge_disposition.reset_index(drop=True, inplace=True)
admission_source.reset_index(drop=True, inplace=True)
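
The slicing above relies on hard-coded row positions. As a hedged alternative, the sketch below splits a stacked mapping table wherever a fully empty row occurs and promotes the first row of each block to column names. The `SAMPLE` text and the helper `split_stacked_tables` are hypothetical, made up for illustration; the real `IDs_mapping.csv` is only assumed to follow the same layout, which is what the skipped rows 8 and 40 suggest.

```python
import io
import pandas as pd

# Hypothetical miniature of a stacked mapping file: several small tables in one
# CSV, separated by an empty "," row, each block starting with its own header row.
SAMPLE = """admission_type_id,description
1,Emergency
2,Urgent
,
discharge_disposition_id,description
1,Discharged to home
2,Transferred
,
admission_source_id,description
1,Physician Referral
7,Emergency Room
"""

def split_stacked_tables(raw):
    """Split a frame on all-NaN rows; the first row of each block becomes its header."""
    is_separator = raw.isna().all(axis=1)
    block_id = is_separator.cumsum()          # increases by one after every separator row
    content = raw[~is_separator]
    tables = {}
    for _, block in content.groupby(block_id[~is_separator]):
        header = block.iloc[0].tolist()
        body = block.iloc[1:].reset_index(drop=True)
        body.columns = header
        tables[header[0]] = body              # key each table by its ID column name
    return tables

# header=None keeps every header row as data so that the splitter can find it
raw = pd.read_csv(io.StringIO(SAMPLE), header=None, dtype=str)
tables = split_stacked_tables(raw)
print(tables['admission_source_id'])
```

With this approach the tables are found by their separator rows rather than by position, so the code would not need changing if a mapping table gained or lost a row.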

In [5]:
print(data.columns)
print(data.shape)

Index(['patient_nbr', 'race', 'gender', 'age', 'weight', 'admission_type_id',
       'discharge_disposition_id', 'admission_source_id', 'time_in_hospital',
       'payer_code', 'medical_specialty', 'num_lab_procedures',
       'num_procedures', 'num_medications', 'number_outpatient',
       'number_emergency', 'number_inpatient', 'diag_1', 'diag_2', 'diag_3',
       'number_diagnoses', 'max_glu_serum', 'A1Cresult', 'metformin',
       'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
       'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide',
       'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone',
       'tolazamide', 'examide', 'citoglipton', 'insulin',
       'glyburide-metformin', 'glipizide-metformin',
       'glimepiride-pioglitazone', 'metformin-rosiglitazone',
       'metformin-pioglitazone', 'change', 'diabetesMed', 'readmitted'],
      dtype='object')
(101766, 49)


In [6]:
#make a subset by deleting the columns that are not needed for this analysis
columns=['race', 'gender', 'age', 'weight', 'payer_code', 'medical_specialty',
       'num_lab_procedures', 'num_procedures', 'num_medications',
       'number_outpatient', 'number_emergency', 'number_inpatient', 'diag_1',
       'diag_2', 'diag_3', 'number_diagnoses', 'max_glu_serum', 'A1Cresult', 'metformin',
       'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
       'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide',
       'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone',
       'tolazamide', 'examide', 'citoglipton', 'insulin',
       'glyburide-metformin', 'glipizide-metformin',
       'glimepiride-pioglitazone', 'metformin-rosiglitazone',
       'metformin-pioglitazone', 'change', 'diabetesMed']
subset = data.drop(columns, axis=1)
subset.head(5)

| encounter_id | patient_nbr | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | readmitted |
|---|---|---|---|---|---|---|
| 2278392 | 8222157 | 6 | 25 | 1 | 1 | NO |
| 149190 | 55629189 | 1 | 1 | 7 | 3 | >30 |
| 64410 | 86047875 | 1 | 1 | 7 | 2 | NO |
| 500364 | 82442376 | 1 | 1 | 7 | 2 | NO |
| 16680 | 42519267 | 1 | 1 | 7 | 1 | NO |


How many unique patients are in our data, and for how many of them have we recorded multiple encounters?

In [9]:
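#make a data frame of unique patient numbers and the number of their encounters
#naming the counts directly works across pandas versions; in pandas 2.x
#value_counts() names its result 'count', so renaming a 'patient_nbr' column would do nothing
patients = subset['patient_nbr'].value_counts().rename('visit_nbr').to_frame()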
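
To make the counting step concrete, here is a minimal sketch on a toy series. The patient numbers are invented for illustration and do not come from the dataset; the only assumption is that each row of the real data is one encounter, so counting how often a patient number occurs gives that patient's number of encounters.

```python
import pandas as pd

# Hypothetical patient numbers, one entry per encounter (not taken from the dataset)
toy = pd.Series([101, 102, 101, 103, 101, 102, 104], name='patient_nbr')

# One row per unique patient, holding that patient's number of encounters
visits = toy.value_counts().rename('visit_nbr').to_frame()

# How many patients have exactly 1, 2, 3, ... encounters
distribution = visits['visit_nbr'].value_counts().sort_index()

print(visits)
print(distribution)
```

Applying `value_counts()` a second time, to the counts themselves, gives the full distribution of encounters per patient, of which the thresholds printed in the next cell are a summary.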

In [10]:
print('Number of patients:', len(patients),
      '\nMax encounters:', max(patients['visit_nbr']), 
      '\nMin encounters:', min(patients['visit_nbr']),
      '\nNumber of patients with single encounter:', len(patients[patients['visit_nbr']==1]),
      '\nNumber of patients with multiple encounters:', len(patients[patients['visit_nbr']>1]),
      '\nNumber of patients with more than 10 encounters:', len(patients[patients['visit_nbr']>10]),
      '\nNumber of patients with more than 20 encounters:', len(patients[patients['visit_nbr']>20]),
      '\nNumber of patients with more than 30 encounters:', len(patients[patients['visit_nbr']>30])
      )

Number of patients: 71518 
Max encounters: 40 
Min encounters: 1 
Number of patients with single encounter: 54745 
Number of patients with multiple encounters: 16773 
Number of patients with more than 10 encounters: 97 
Number of patients with more than 20 encounters: 8 
Number of patients with more than 30 encounters: 1
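
The counts above can also be read as proportions. The short sketch below only reuses figures already printed in this notebook (the 101766 rows from `data.shape` and the patient counts above); the variable names are illustrative.

```python
# Figures copied from the outputs above
n_encounters = 101766   # rows in the dataset, one per encounter
n_patients = 71518      # unique patient numbers
n_single = 54745        # patients with exactly one encounter
n_multiple = 16773      # patients with more than one encounter

# Sanity check: every patient falls into exactly one of the two groups
assert n_single + n_multiple == n_patients

share_multiple = n_multiple / n_patients
mean_encounters = n_encounters / n_patients

print(f"Share of patients with multiple encounters: {share_multiple:.1%}")
print(f"Mean encounters per patient: {mean_encounters:.2f}")
```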


Data source: Clore, J., Cios, K., DeShazo, J., & Strack, B. (2014). Diabetes 130-US hospitals for years 1999-2008. UCI Machine Learning Repository.
Data available at:
https://archive-beta.ics.uci.edu/dataset/296/diabetes+130+us+hospitals+for+years+1999+2008