# Brazil Medical: No Show Appointments Analysis

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

### Dataset Description 
This dataset collects information from 110,527 medical appointments in Brazil and focuses on whether patients show up for their appointment. Each row also includes a number of characteristics of the patient.

Original dataset provided by Joni Hoppen and Aquarela Analytics via [Kaggle](https://www.kaggle.com/datasets/joniarroba/noshowappointments).
- ‘ScheduledDay’ tells us on what day the patient set up their appointment.
- ‘Neighbourhood’ indicates the location of the hospital.
- ‘Scholarship’ indicates whether or not the patient is enrolled in the Brazilian welfare program Bolsa Família.
- ‘No-show’ is ‘No’ if the patient showed up to their appointment and ‘Yes’ if they did not.

### Question(s) for Analysis

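The primary question for analysis is "What groups of individuals are least likely to show up for their appointments?"
- Which factors affect that likelihood?
    - Demographics (e.g., age, gender)?
    - Medical conditions (hypertension, diabetes, alcoholism, handicap)?
    - Facility process (e.g., SMS reminders, time between scheduling and the appointment)?
- Are no-shows becoming more frequent?
- Which areas have the most no-shows?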
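The last question can be approached by ranking neighbourhoods by their no-show rate. Neighbourhoods with only a handful of appointments give noisy rates, so it helps to require a minimum count. The following is a minimal sketch that makes some assumptions. It uses hypothetical toy rows. It assumes the cleaned column names `neighbourhood` and `no_show`, with `no_show` a boolean where True means the patient missed the visit. The helper name `no_show_rate_by_area` and the `min_appointments` threshold are illustrative, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy rows standing in for the cleaned appointments table.
# Assumption: no_show is boolean, True meaning the patient missed the visit.
toy = pd.DataFrame({
    'neighbourhood': ['A', 'A', 'A', 'B', 'B', 'C'],
    'no_show':       [True, False, False, True, True, False],
})

def no_show_rate_by_area(df, min_appointments=2):
    """Rank neighbourhoods by no-show rate, dropping those with few appointments."""
    # observed=True skips unused categories once neighbourhood is a category dtype.
    stats = (df.groupby('neighbourhood', observed=True)['no_show']
               .agg(rate='mean', appointments='count'))
    stats = stats[stats['appointments'] >= min_appointments]
    return stats.sort_values('rate', ascending=False)

ranking = no_show_rate_by_area(toy)
print(ranking)
```

Here neighbourhood C is dropped because it has a single appointment. On the real data, the threshold is a judgment call worth reporting next to the ranking.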


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

<a id='wrangling'></a>
## Data Wrangling

### General Properties


```python
# Load the raw data, parsing both date columns as datetimes.
df = pd.read_csv('data/kaggleV2-may-2016.csv',
                 parse_dates=['ScheduledDay', 'AppointmentDay'])

df.info()  # info() prints its summary itself and returns None
print('-----\n')
print(df.head())
print('-----\n')
print(df['Handcap'].value_counts())
print("Duplicate Values: {}".format(df.duplicated().sum()))
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110527 entries, 0 to 110526
Data columns (total 14 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   PatientId       110527 non-null  float64
 1   AppointmentID   110527 non-null  int64
 2   Gender          110527 non-null  object
 3   ScheduledDay    110527 non-null  datetime64[ns, UTC]
 4   AppointmentDay  110527 non-null  datetime64[ns, UTC]
 5   Age             110527 non-null  int64
 6   Neighbourhood   110527 non-null  object
 7   Scholarship     110527 non-null  int64
 8   Hipertension    110527 non-null  int64
 9   Diabetes        110527 non-null  int64
 10  Alcoholism      110527 non-null  int64
 11  Handcap         110527 non-null  int64
 12  SMS_received    110527 non-null  int64
 13  No-show
```
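`info()` and the duplicate-row count do not catch implausible values. A few extra checks can flag rows to decide on during cleaning. This is a minimal sketch that makes several assumptions:

- It uses the raw CSV column names and runs on a hypothetical toy frame.
- `sanity_report` is an illustrative helper, not part of the original notebook.
- It assumes `AppointmentDay` carries no meaningful time of day. For that reason, both dates are compared as calendar days.

```python
import pandas as pd

def sanity_report(df):
    """Count suspicious rows, using the raw CSV column names."""
    return {
        'negative_age': int((df['Age'] < 0).sum()),
        # Compare calendar days only, so a same-day booking made at 18:38
        # is not flagged as scheduled after a midnight appointment timestamp.
        'appointment_before_scheduling': int(
            (df['AppointmentDay'].dt.normalize()
             < df['ScheduledDay'].dt.normalize()).sum()),
        'duplicate_appointment_ids': int(df['AppointmentID'].duplicated().sum()),
    }

# Hypothetical toy rows; the real data would be the df loaded above.
toy = pd.DataFrame({
    'AppointmentID': [1, 2, 2],
    'Age': [30, -1, 45],
    'ScheduledDay': pd.to_datetime(
        ['2016-04-29 18:38:08', '2016-05-02 09:00:00', '2016-05-03 10:00:00'], utc=True),
    'AppointmentDay': pd.to_datetime(
        ['2016-04-29', '2016-05-01', '2016-05-10'], utc=True),
})

report = sanity_report(toy)
print(report)
```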

### Data Cleaning

```python
# Rename columns to snake_case, e.g. 'ScheduledDay' -> 'scheduled_day',
# 'No-show' -> 'no_show', and fix the 'Handcap' misspelling.
df.rename(columns=lambda s: (s.lower()
                               .replace('day', '_day')
                               .replace('id', '_id')
                               .replace('-', '_')
                               .replace('handcap', 'handicap')),
          inplace=True)

# IDs are labels, not quantities, so store them as strings.
df['patient_id'] = df['patient_id'].astype('string')
df['appointment_id'] = df['appointment_id'].astype('string')

# Compact dtypes for the remaining columns; 0/1 flags become booleans.
df = df.astype({
    'gender': 'category',
    'age': 'int16',
    'neighbourhood': 'category',  # British spelling, as in the source CSV
    'scholarship': 'bool',
    'hipertension': 'bool',       # misspelling kept from the source CSV
    'diabetes': 'bool',
    'alcoholism': 'bool',
    'sms_received': 'bool',
})

# Treat any non-zero handicap value as "has a handicap".
df['handicap'] = df['handicap'] > 0

# 'Yes' means the patient missed the appointment, so True now marks a no-show.
df['no_show'] = df['no_show'] == 'Yes'

df.info()
```
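Two derived columns could support the "facility process" and demographic questions:

- the number of days between booking and the appointment;
- a coarse age band.

This is a minimal sketch, run on hypothetical toy rows. It assumes the cleaned names `scheduled_day`, `appointment_day` (both UTC-aware) and `age`. The column names `lead_days` and `age_group`, the helper `add_features`, and the bin edges are all illustrative choices, not from the original analysis.

```python
import pandas as pd

def add_features(df):
    """Return a copy with hypothetical helper columns for the EDA."""
    out = df.copy()
    # Whole days between booking and the visit; normalize() drops the booking time.
    out['lead_days'] = (out['appointment_day'].dt.normalize()
                        - out['scheduled_day'].dt.normalize()).dt.days
    # Illustrative age bands; right=False makes each bin include its left edge.
    # Ages outside [0, 120), e.g. negative ones, become NaN.
    out['age_group'] = pd.cut(out['age'], bins=[0, 18, 40, 65, 120], right=False,
                              labels=['0-17', '18-39', '40-64', '65+'])
    return out

# Hypothetical toy rows shaped like the cleaned table.
toy = pd.DataFrame({
    'scheduled_day': pd.to_datetime(['2016-04-29 18:38:08', '2016-04-20 08:00:00'], utc=True),
    'appointment_day': pd.to_datetime(['2016-04-29', '2016-04-29'], utc=True),
    'age': [62, 8],
})

featured = add_features(toy)
print(featured[['lead_days', 'age_group']])
```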

<a id='eda'></a>
## Exploratory Data Analysis

### Research Question 1: What groups of individuals are least likely to show up for their appointments?
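If `no_show` is boolean (True meaning a missed appointment), its mean within a group is that group's no-show rate. This turns the question into a `groupby` per characteristic. The following is a minimal sketch on hypothetical toy rows. `rate_by` is an illustrative helper name, and the same call would apply to other columns such as `age_group`, `hipertension`, or `scholarship`. Raw rate differences say nothing about significance or causation on their own.

```python
import pandas as pd

def rate_by(df, column):
    """Share of missed appointments (no_show == True) for each value of `column`."""
    # observed=True skips unused categories for category-typed columns.
    return (df.groupby(column, observed=True)['no_show']
              .mean()
              .sort_values(ascending=False))

# Hypothetical toy rows shaped like the cleaned table.
toy = pd.DataFrame({
    'gender':       ['F', 'F', 'M', 'M', 'M'],
    'sms_received': [True, False, True, False, False],
    'no_show':      [False, True, False, True, False],
})

by_gender = rate_by(toy, 'gender')
by_sms = rate_by(toy, 'sms_received')
print(by_gender, by_sms, sep='\n\n')
```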

### Research Question 2: Are no-shows becoming more frequent?
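One way to look for a trend is to compute the no-show rate per calendar month of the appointment date. This is a minimal sketch that makes a few assumptions:

- `appointment_day` is UTC-aware and `no_show` is boolean.
- The helper name and the toy rows are hypothetical.
- The month boundaries are based on UTC dates.

Months with few appointments give noisy rates, so reporting the count per month alongside the rate is advisable.

```python
import pandas as pd

def monthly_no_show_rate(df):
    """No-show rate per calendar month of the appointment date."""
    # to_period() would drop the timezone with a warning, so remove it explicitly first.
    months = df['appointment_day'].dt.tz_localize(None).dt.to_period('M')
    return df.groupby(months)['no_show'].mean()

# Hypothetical toy rows shaped like the cleaned table.
toy = pd.DataFrame({
    'appointment_day': pd.to_datetime(
        ['2016-04-29', '2016-05-02', '2016-05-03', '2016-06-01'], utc=True),
    'no_show': [True, False, True, False],
})

rates = monthly_no_show_rate(toy)
print(rates)
```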

<a id='conclusions'></a>
## Conclusions