# Exploratory Data Analysis
---

In [17]:
import pandas as pd
import seaborn as sns

## Importing all .csv files as dataframes

In [16]:
acp = pd.read_csv('raw_csv/all_contracted_providers.csv')
blb = pd.read_csv('raw_csv/bottom_line_budget.csv')
bs = pd.read_csv('raw_csv/budgeted_services.csv')
cpe = pd.read_csv('raw_csv/contracted_providers_expanded.csv')
re = pd.read_csv('raw_csv/reported_expenditures.csv')
rsu = pd.read_csv('raw_csv/reported_service_units.csv')
sccd = pd.read_csv('raw_csv/senior_center_client_data_fy2020.csv')
scpd = pd.read_csv('raw_csv/senior_center_provider_data_fy2020.csv')
sadcs = pd.read_csv('raw_csv/social_adult_day_care_services.csv')
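
One thing to watch in the cell above: the name `re` shadows Python's standard-library `re` (regular expression) module for the rest of the session. A minimal sketch of an alternative is to load every file into a dictionary keyed by file name. The helper `load_csv_folder` and the tiny stand-in files are hypothetical and exist only so the example runs on its own. In the notebook the folder would be `raw_csv`.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_csv_folder(folder):
    """Read every .csv file in `folder` into a dict keyed by file stem."""
    return {path.stem: pd.read_csv(path) for path in sorted(Path(folder).glob("*.csv"))}


# Demo on a throwaway folder holding two tiny stand-in files (made-up values).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "reported_expenditures.csv").write_text(
        "LineItem,ReportedAmount\n1,10.5\n2,20.0\n"
    )
    Path(tmp, "budgeted_services.csv").write_text("Service,Units\nMeals,3\n")
    frames = load_csv_folder(tmp)

print(sorted(frames))                          # names of the loaded tables
print(frames["reported_expenditures"].shape)   # (rows, columns) of one table
```

With this layout, the expenditures table would be reached as `frames["reported_expenditures"]`, so no module name is overwritten.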

---

## Exploring Reported Expenditures
`raw_csv/reported_expenditures.csv`

In [18]:
re.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72720 entries, 0 to 72719
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ProviderType    72720 non-null  object 
 1   DFTA ID         72720 non-null  object 
 2   ContractYear    72720 non-null  int64  
 3   SponsorName     72720 non-null  object 
 4   ProgramName     72720 non-null  object 
 5   LineItem        72720 non-null  int64  
 6   LineItemName    72720 non-null  object 
 7   ReportedMonth   72720 non-null  object 
 8   ReportedAmount  72720 non-null  float64
dtypes: float64(1), int64(2), object(6)
memory usage: 5.0+ MB


In [25]:
# Get the unique values in each column, and count the number of rows for each unique value
for column in re.columns:
    print(f'{re.value_counts(column)}\n')

ProviderType
SENIOR CENTER CONTRACTS                              51120
NATURALLY OCCURING RETIREMENT COMMUNITY CONTRACTS     6435
CASE MANAGEMENT SERVICES CONTRACTS                    3975
HOME DELIVERED MEAL SERVICE CONTRACTS                 3960
CAREGIVER SERVICES CONTRACTS                          1815
TRANSPORTATION SERVICES CONTRACTS                     1470
HOMECARE SERVICES CONTRACTS                            900
LEGAL SERVICES CONTRACTS                               900
GERIATRIC MENTAL HEALTH SERVICES CONTRACTS             735
NEW YORK CONNECTS CONTRACTS                            690
CITY MEALS ADMINISTRATIVE SERVICES CONTRACTS           360
ELDER ABUSE SERVICES CONTRACTS                         360
dtype: int64

DFTA ID
45A01    270
33V01    210
31701    210
34Y01    210
54501    210
        ... 
15G01     90
64601     90
N4402     45
N3E02     15
N3E03     15
Length: 403, dtype: int64

ContractYear
2019    72720
dtype: int64

SponsorName
JEWISH ASSOCIATION FOR SERVICES FO

#### What's in this dataset?
* Data is from one contract year, fiscal year 2019 (July 2018 - June 2019)
* 72,720 rows, 9 columns
* Each row is an expenditure (i.e., a reported line item) for a specific NYC Aging program in a given month
* There are 403 different programs (DFTA ID), sponsored by 128 partner organizations
    * These programs fall into 12 different contract types (i.e. types of services)
        * Senior Centers, Naturally-Occurring Retirement Communities, Case Management Agencies, Home-Delivered Meal Programs, Caregiver Services, Transportation Services, Home-Care Services, Legal Services, Mental Health Services, New York Connects Services, City Meals Administrative Services, Elder Abuse Programs
* Expenditures fall into 15 categories
    * Catered Food/Disposables, Communications, Consultants, Equipment Rental, Other Expenses, Other Occupancy, Personnel, Printing/Supplies, Program Insurance, Raw Food/Disposables, Rent, Rent Usage Charges, Travel, Utilities, Vehicles  
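
The counts in the summary above can be read from the value-count output, but they can also be computed directly with `nunique()`. The sketch below runs on a small made-up frame that reuses the dataset's column names. The values are illustrative only and are not taken from `reported_expenditures.csv`.

```python
import pandas as pd

# Toy stand-in for reported_expenditures.csv; every value here is made up.
toy = pd.DataFrame({
    "DFTA ID": ["A01", "A01", "B02", "C03"],
    "SponsorName": ["SPONSOR X", "SPONSOR X", "SPONSOR X", "SPONSOR Y"],
    "ProviderType": ["SENIOR CENTER CONTRACTS"] * 3 + ["LEGAL SERVICES CONTRACTS"],
    "LineItemName": ["Rent", "Travel", "Rent", "Personnel"],
})

# Count rows and the number of distinct values in each identifying column.
summary = {
    "rows": len(toy),
    "programs": toy["DFTA ID"].nunique(),
    "sponsors": toy["SponsorName"].nunique(),
    "contract_types": toy["ProviderType"].nunique(),
    "expenditure_categories": toy["LineItemName"].nunique(),
}
print(summary)
```

Running the same dictionary against the real dataframe would be one way to confirm the 403 programs, 128 sponsors, 12 contract types, and 15 expenditure categories listed above.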

#### What are my grouping/categorical columns?
* `ProviderType`, `LineItemName`, `ReportedMonth`
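
As a rough illustration of how these grouping columns might be used, the sketch below totals `ReportedAmount` by `ProviderType` and then cross-tabulates it against `LineItemName`. The frame is a made-up stand-in with the dataset's column names, not real figures.

```python
import pandas as pd

# Toy stand-in for the expenditures table; amounts are invented for illustration.
toy = pd.DataFrame({
    "ProviderType": [
        "SENIOR CENTER CONTRACTS",
        "SENIOR CENTER CONTRACTS",
        "LEGAL SERVICES CONTRACTS",
        "LEGAL SERVICES CONTRACTS",
    ],
    "LineItemName": ["Rent", "Rent", "Travel", "Personnel"],
    "ReportedAmount": [100.0, 50.0, 20.0, 30.0],
})

# Total reported amount per contract type, largest first.
totals = (
    toy.groupby("ProviderType")["ReportedAmount"]
    .sum()
    .sort_values(ascending=False)
)

# Contract type by expenditure category, with 0 where a combination is absent.
by_type_and_item = toy.pivot_table(
    index="ProviderType",
    columns="LineItemName",
    values="ReportedAmount",
    aggfunc="sum",
    fill_value=0,
)

print(totals)
print(by_type_and_item)
```

The same pattern should extend to `ReportedMonth` for month-by-month totals, assuming the month values group cleanly as stored.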

#### What are my numerical columns?
* `ReportedAmount`
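
A first look at a numerical column usually starts with summary statistics plus a check for zero or negative entries. The sketch below uses a small made-up series named `ReportedAmount`. Whether the real column contains zeros or negative adjustments is not shown in the output above, so that part is an assumption to verify.

```python
import pandas as pd

# Made-up amounts standing in for the ReportedAmount column.
amounts = pd.Series([0.0, 125.5, 980.0, -40.0, 0.0], name="ReportedAmount")

# Count, mean, standard deviation, min, quartiles, and max in one call.
stats = amounts.describe()

# Zero and negative entries are worth counting before aggregating.
n_zero = int((amounts == 0).sum())
n_negative = int((amounts < 0).sum())

print(stats)
print(f"zero amounts: {n_zero}, negative amounts: {n_negative}")
```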