# Exploratory Data Analysis
---

In [17]:
import pandas as pd
import seaborn as sns

## Importing all .csv files as dataframes

In [16]:
acp = pd.read_csv('raw_csv/all_contracted_providers.csv')
blb = pd.read_csv('raw_csv/bottom_line_budget.csv')
bs = pd.read_csv('raw_csv/budgeted_services.csv')
cpe = pd.read_csv('raw_csv/contracted_providers_expanded.csv')
re = pd.read_csv('raw_csv/reported_expenditures.csv')
rsu = pd.read_csv('raw_csv/reported_service_units.csv')
sccd = pd.read_csv('raw_csv/senior_center_client_data_fy2020.csv')
scpd = pd.read_csv('raw_csv/senior_center_provider_data_fy2020.csv')
sadcs = pd.read_csv('raw_csv/social_adult_day_care_services.csv')
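
As an alternative to one variable per file, the same folder could be loaded into a dictionary keyed by filename. This is only a sketch: `load_csv_folder` is a hypothetical helper, not part of the notebook, and it assumes every `.csv` in the folder can be read with default `pd.read_csv` settings. One side benefit is that no dataframe ends up named `re`, which would otherwise shadow Python's standard-library `re` module if that module were imported later.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every .csv file in `folder` into a dict keyed by file stem.

    Hypothetical helper for illustration; assumes default read_csv options work.
    """
    return {
        path.stem: pd.read_csv(path)
        for path in sorted(Path(folder).glob('*.csv'))
    }


# Only runs when the raw_csv folder is present next to the notebook
if Path('raw_csv').is_dir():
    frames = load_csv_folder('raw_csv')
    print(list(frames))  # names of the loaded files, without the .csv suffix
```

With this layout, the reported expenditures table would be accessed as `frames['reported_expenditures']` instead of `re`.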

---

## Exploring Reported Expenditures
`raw_csv/reported_expenditures.csv`

In [18]:
re.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72720 entries, 0 to 72719
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ProviderType    72720 non-null  object 
 1   DFTA ID         72720 non-null  object 
 2   ContractYear    72720 non-null  int64  
 3   SponsorName     72720 non-null  object 
 4   ProgramName     72720 non-null  object 
 5   LineItem        72720 non-null  int64  
 6   LineItemName    72720 non-null  object 
 7   ReportedMonth   72720 non-null  object 
 8   ReportedAmount  72720 non-null  float64
dtypes: float64(1), int64(2), object(6)
memory usage: 5.0+ MB


In [25]:
# Get the unique values in each column, and count the number of rows for each unique value
for column in re.columns:
    print(f'{re.value_counts(column)}\n')

ProviderType
SENIOR CENTER CONTRACTS                              51120
NATURALLY OCCURING RETIREMENT COMMUNITY CONTRACTS     6435
CASE MANAGEMENT SERVICES CONTRACTS                    3975
HOME DELIVERED MEAL SERVICE CONTRACTS                 3960
CAREGIVER SERVICES CONTRACTS                          1815
TRANSPORTATION SERVICES CONTRACTS                     1470
HOMECARE SERVICES CONTRACTS                            900
LEGAL SERVICES CONTRACTS                               900
GERIATRIC MENTAL HEALTH SERVICES CONTRACTS             735
NEW YORK CONNECTS CONTRACTS                            690
CITY MEALS ADMINISTRATIVE SERVICES CONTRACTS           360
ELDER ABUSE SERVICES CONTRACTS                         360
dtype: int64

DFTA ID
45A01    270
33V01    210
31701    210
34Y01    210
54501    210
        ... 
15G01     90
64601     90
N4402     45
N3E02     15
N3E03     15
Length: 403, dtype: int64

ContractYear
2019    72720
dtype: int64

SponsorName
JEWISH ASSOCIATION FOR SERVICES FO

#### What's in this dataset?
* Data is from one contract year, fiscal year 2019 (July 2018 - June 2019)
* 72,720 rows, 9 columns
* Each row is an expenditure (i.e. a reported line item) for a specific NYC Aging program in a given month
* There are 403 different programs (`DFTA ID`), sponsored by 128 partner organizations
    * These programs fall into 12 different contract types (i.e. types of services)
        * Senior Centers, Naturally-Occurring Retirement Communities, Case Management Agencies, Home-Delivered Meal Programs, Caregiver Services, Transportation Services, Home-Care Services, Legal Services, Mental Health Services, New York Connects Services, City Meals Administrative Services, Elder Abuse Programs
* Expenditures fall into 15 categories
    * Catered Food/Disposables, Communications, Consultants, Equipment Rental, Other Expenses, Other Occupancy, Personnel, Printing/Supplies, Program Insurance, Raw Food/Disposables, Rent, Rent Usage Charges, Travel, Utilities, Vehicles  
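
The counts in this summary (programs, sponsors, contract types, expenditure categories) can be checked with `nunique()`. A minimal sketch, using a tiny hand-made stand-in for `re`: the column names come from the dataset, but every value below is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the `re` dataframe; all values are made up
demo = pd.DataFrame({
    'ProviderType': ['SENIOR CENTER CONTRACTS',
                     'SENIOR CENTER CONTRACTS',
                     'LEGAL SERVICES CONTRACTS'],
    'DFTA ID': ['A0001', 'A0001', 'B0002'],
    'SponsorName': ['SPONSOR ONE', 'SPONSOR ONE', 'SPONSOR TWO'],
    'LineItemName': ['Rent', 'Utilities', 'Rent'],
})

# Number of distinct values in each column of interest
columns_of_interest = ['DFTA ID', 'SponsorName', 'ProviderType', 'LineItemName']
summary_counts = demo[columns_of_interest].nunique()
print(summary_counts)
```

Running the same two lines against the real `re` dataframe should give the figures quoted in the summary.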

#### What are my grouping/categorical columns?
* `ProviderType`, `LineItemName`, `ReportedMonth`

#### What are my numerical columns?
* `ReportedAmount`
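
With one numerical column and a few grouping columns, a natural next step is to total `ReportedAmount` per group. The sketch below assumes that summing is the aggregation of interest and uses invented rows in place of the real data; the same pattern would apply to `ReportedMonth`.

```python
import pandas as pd

# Hypothetical stand-in for the `re` dataframe; amounts are made up
demo = pd.DataFrame({
    'ProviderType': ['SENIOR CENTER CONTRACTS',
                     'SENIOR CENTER CONTRACTS',
                     'SENIOR CENTER CONTRACTS',
                     'LEGAL SERVICES CONTRACTS'],
    'LineItemName': ['Rent', 'Rent', 'Utilities', 'Rent'],
    'ReportedAmount': [100.0, 50.0, 25.0, 75.0],
})

# Total reported amount per contract type and expenditure category,
# largest totals first
totals = (
    demo.groupby(['ProviderType', 'LineItemName'])['ReportedAmount']
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```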

## Exploring Senior Center Client Data
`raw_csv/senior_center_client_data_fy2020.csv`

In [26]:
sccd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69156 entries, 0 to 69155
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   dftaid           69156 non-null  object
 1   provider_name    69156 non-null  object
 2   service_date     69156 non-null  object
 3   total_daily      69156 non-null  int64 
 4   breakfast_units  69156 non-null  int64 
 5   lunch_units      69156 non-null  int64 
 6   dinner_units     69156 non-null  int64 
 7   tot_meals        69156 non-null  int64 
 8   aib_tot          69156 non-null  int64 
 9   sce_tot          69156 non-null  int64 
 10  hpp_tot          69156 non-null  int64 
 11  tot_serv_pp      69156 non-null  int64 
dtypes: int64(9), object(3)
memory usage: 6.3+ MB


In [27]:
# Get the unique values in each column, and count the number of rows for each unique value
for column in sccd.columns:
    print(f'{sccd.value_counts(column)}\n')

dftaid
35D01    342
35K01    338
64901    323
350      315
45T01    307
        ... 
35B02    179
3AP04    160
23F02    118
12H03    104
41G01     10
Length: 287, dtype: int64

provider_name
Senior Center - Carter Burden Covello ISC    342
Senior Center - Lenox Hill Innovative        338
Senior Center - SAGE Innovative              323
Senior Center - Sirovich ISC                 315
Senior Center - Selfhelp Innovative          307
                                            ... 
Senior Center - A. Philip Randolph           179
Senior Center - SAGE Staten Island           160
Senior Center - Armstrong Senior Program     118
Senior Center - RAIN Bailey                  104
Senior Center - CCNS Hillcrest                10
Length: 287, dtype: int64

service_date
01/15/2020    286
07/31/2019    286
07/10/2019    286
01/29/2020    286
10/30/2019    285
             ... 
08/11/2019     11
10/20/2019     10
09/29/2019      9
12/01/2019      8
09/01/2019      7
Length: 366, dtype: int64

total

#### What's in this dataset?
* Data is from one contract year, fiscal year 2020 (July 2019 - June 2020)
* 69,156 rows, 12 columns
* Each row records one day of participation at one senior center
* There are 287 programs (`dftaid`), all of the Senior Centers provider/contract/service type

#### What are my grouping/categorical columns?
* None are ready to use; these need to be derived: Borough, ReportedMonth, SponsorName
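
Of the three derived columns, the month is the most direct: `service_date` is stored as text in `MM/DD/YYYY` form, so it can be parsed and reduced to a monthly period. The sponsor name would presumably come from a join against another table keyed on the program ID; the `sponsors` lookup below is hypothetical, as are all the values. Borough is left out because nothing here states how it should be derived.

```python
import pandas as pd

# Hypothetical stand-in for the `sccd` dataframe; values are made up
demo = pd.DataFrame({
    'dftaid': ['X0001', 'X0001', 'Y0002'],
    'service_date': ['07/31/2019', '01/15/2020', '01/29/2020'],
    'total_daily': [10, 20, 30],
})

# Parse the text dates explicitly as month/day/year
demo['service_date'] = pd.to_datetime(demo['service_date'], format='%m/%d/%Y')

# Monthly period to group on
demo['ReportedMonth'] = demo['service_date'].dt.to_period('M')

# Hypothetical lookup table mapping program ID to sponsor name
sponsors = pd.DataFrame({
    'dftaid': ['X0001', 'Y0002'],
    'SponsorName': ['SPONSOR ONE', 'SPONSOR TWO'],
})

# Left join keeps every row of the client data, even without a sponsor match
demo = demo.merge(sponsors, on='dftaid', how='left')
print(demo)
```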

#### What are my numerical columns?
* `total_daily` (total client services, i.e. instances of service, on the service date)
* `tot_serv_pp` (total clients, i.e. number of people, who received services on that date)
* `breakfast_units` (total breakfasts served on that date)
* `lunch_units` (total lunches served on that date)
* `dinner_units` (total dinners served on that date)
* `tot_meals` (total meals served on that date)
* `aib_tot` (total clients served for AIB, i.e. Assistance, Information and Benefits, on that date)
* `sce_tot` (total clients served for SCE, i.e. Education, Recreation, Technology, on that date)
* `hpp_tot` (total clients served for HPP, i.e. Health Promotion (Evidence-Based and Non-Evidence-Based), on that date)
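
Given these definitions, one quick sanity check is whether `tot_meals` equals the sum of the three meal columns. That relationship is an assumption based on the column descriptions and has not been verified against the data. The rows below are invented, with the last one deliberately inconsistent.

```python
import pandas as pd

# Hypothetical stand-in for the `sccd` dataframe; values are made up
demo = pd.DataFrame({
    'breakfast_units': [5, 0, 2],
    'lunch_units': [40, 35, 50],
    'dinner_units': [0, 0, 3],
    'tot_meals': [45, 35, 50],  # last row deliberately does not add up
})

# Row-wise sum of the individual meal columns
meal_sum = demo[['breakfast_units', 'lunch_units', 'dinner_units']].sum(axis=1)

# Rows where the stored total disagrees with the computed sum
mismatches = demo[meal_sum != demo['tot_meals']]
print(f'{len(mismatches)} of {len(demo)} rows where tot_meals differs from the meal columns')
```

If the real data produces mismatches, that would suggest `tot_meals` counts something beyond the three meal columns, which is worth knowing before aggregating.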