# Understanding the Raw Data
---

In [11]:
import pandas as pd

## Importing all .csv files as dataframes

In [2]:
acp = pd.read_csv('raw_csv/all_contracted_providers.csv')
blb = pd.read_csv('raw_csv/bottom_line_budget.csv')
bs = pd.read_csv('raw_csv/budgeted_services.csv')
cpe = pd.read_csv('raw_csv/contracted_providers_expanded.csv')
re = pd.read_csv('raw_csv/reported_expenditures.csv')
rsu = pd.read_csv('raw_csv/reported_service_units.csv')
sccd = pd.read_csv('raw_csv/senior_center_client_data_fy2020.csv')
scpd = pd.read_csv('raw_csv/senior_center_provider_data_fy2020.csv')
sadcs = pd.read_csv('raw_csv/social_adult_day_care_services.csv')

---

## Exploring Reported Expenditures
`reported_expenditures.csv`

In [3]:
# re.info()

In [4]:
# # Get the unique values in each column, and count the number of rows for each unique value
# for column in re.columns:
#     print(f'{re.value_counts(column)}\n')

#### What's in this dataset?
* Data is from one contract year, fiscal year 2019 (July 2018 - June 2019)
* 72,720 rows, 9 columns
* Each row is an expenditure (i.e. reported line item) associated with a specific NYC Aging program for a given month
* There are 403 different programs (`DFTA ID`), sponsored by 128 partner organizations
    * These programs fall into 12 different contract types (i.e. types of services)
        * Senior Centers, Naturally-Occurring Retirement Communities, Case Management Agencies, Home-Delivered Meal Programs, Caregiver Services, Transportation Services, Home-Care Services, Legal Services, Mental Health Services, New York Connects Services, City Meals Administrative Services, Elder Abuse Programs
* Expenditures fall into 15 categories
    * Catered Food/Disposables, Communications, Consultants, Equipment Rental, Other Expenses, Other Occupancy, Personnel, Printing/Supplies, Program Insurance, Raw Food/Disposables, Rent, Rent Usage Charges, Travel, Utilities, Vehicles  
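The summary figures above (row count, number of programs, contract types, and expenditure categories) can be re-derived with `shape` and `nunique()`. The sketch below is illustrative only: it runs on a tiny made-up frame named `toy_re` (a hypothetical stand-in, not the real file), and it assumes the columns `DFTA ID`, `ProviderType`, and `LineItemName` appear under exactly those names.

```python
import pandas as pd

# Hypothetical stand-in for reported_expenditures.csv; all values are made up.
toy_re = pd.DataFrame({
    'DFTA ID': ['A01', 'A01', 'B02', 'C03'],
    'ProviderType': ['Senior Centers', 'Senior Centers', 'Legal Services', 'Legal Services'],
    'LineItemName': ['Rent', 'Personnel', 'Personnel', 'Travel'],
})

# Number of rows and columns
n_rows, n_cols = toy_re.shape

# Number of distinct values in each column of interest
unique_counts = toy_re[['DFTA ID', 'ProviderType', 'LineItemName']].nunique()

print(n_rows, n_cols)
print(unique_counts)
```

Running the same two calls on the real `re` dataframe would be one way to confirm the counts listed above.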

#### What are my grouping/categorical columns:
* `ProviderType`, `LineItemName`, `ReportedMonth`

#### What are my numerical columns:
* `ReportedAmount`
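With these grouping columns and a single numerical column, a natural first aggregation is the total `ReportedAmount` per group. This is a minimal sketch on made-up data: `toy_re` is a hypothetical stand-in, and the `ReportedMonth` values shown are an assumed format, not necessarily how the real file encodes months.

```python
import pandas as pd

# Hypothetical stand-in for reported_expenditures.csv; all values are made up.
toy_re = pd.DataFrame({
    'ProviderType': ['Senior Centers', 'Senior Centers', 'Legal Services', 'Legal Services'],
    'LineItemName': ['Rent', 'Personnel', 'Personnel', 'Personnel'],
    'ReportedMonth': ['2018-07', '2018-07', '2018-07', '2018-08'],  # assumed format
    'ReportedAmount': [1000.0, 2500.0, 400.0, 600.0],
})

# Total reported amount per contract type and expenditure category
totals = (
    toy_re
    .groupby(['ProviderType', 'LineItemName'], as_index=False)['ReportedAmount']
    .sum()
)
print(totals)
```

Adding `ReportedMonth` to the `groupby` list would give the same totals broken out by month.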

## Exploring Senior Center Client Data
`senior_center_client_data_fy2020.csv`

In [5]:
# sccd.info()

In [6]:
# sccd.head()

In [7]:
# # Get the unique values in each column, and count the number of rows for each unique value
# for column in sccd.columns:
#     print(f'{sccd.value_counts(column)}\n')

#### What's in this dataset?
* Data is from one contract year, fiscal year 2020 (July 2019 - June 2020)
* 69,156 rows, 12 columns
* Each row is a specific date of participation at a specific senior center
* There are 287 programs/senior centers (`dftaid`)

#### What are my grouping/categorical columns:
* To derive: Service Month
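One way the Service Month column could be derived is by parsing the participation date and truncating it to a monthly period. The sketch below assumes the date lives in a column called `service_date`; that name is hypothetical, since the actual date column is not listed in these notes, and `toy_sccd` is a made-up stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for senior_center_client_data_fy2020.csv; values are made up.
toy_sccd = pd.DataFrame({
    'dftaid': ['A01', 'A01', 'B02'],
    'service_date': ['2019-07-01', '2019-07-02', '2019-08-15'],  # hypothetical column name
    'tot_meals': [40, 35, 50],
})

# Parse the date strings, then truncate each date to its calendar month
toy_sccd['service_date'] = pd.to_datetime(toy_sccd['service_date'])
toy_sccd['service_month'] = toy_sccd['service_date'].dt.to_period('M')

# Example use of the derived column: meals served per center per month
monthly_meals = (
    toy_sccd
    .groupby(['dftaid', 'service_month'], as_index=False)['tot_meals']
    .sum()
)
print(monthly_meals)
```

A monthly period keeps the year attached to the month, which matters here because the fiscal year spans two calendar years.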

#### What are my numerical columns:
* `total_daily` (total client services, i.e. instances of service, for the service date)
* `tot_serv_pp` (total clients, i.e. number of people, who received services on the date)
* `breakfasts_units` (total breakfasts served on the date)
* `lunch_units` (total lunches served on the date)
* `dinner_units` (total dinners served on the date)
* `tot_meals` (total meals served on the date)
* `aib_tot` (total clients served for AIB service, i.e. Assistance, Information and Benefits, on the date)
* `sce_tot` (total clients served for SCE service, i.e. Education, Recreation, Technology, on the date)
* `hpp_tot` (total clients served for HPP service, i.e. Health Promotion (Evidence-Based and Non-Evidence-Based), on the date)
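Since `tot_meals` is described as the total meals served, a quick sanity check is to compare it with the sum of the three meal columns. Whether the real data actually satisfies `tot_meals = breakfasts_units + lunch_units + dinner_units` is an assumption here, not something these notes confirm; the sketch uses a made-up frame (`toy_meals`, hypothetical) with one deliberately inconsistent row.

```python
import pandas as pd

# Hypothetical rows; the second row is deliberately inconsistent.
toy_meals = pd.DataFrame({
    'breakfasts_units': [10, 0],
    'lunch_units': [30, 25],
    'dinner_units': [0, 5],
    'tot_meals': [40, 31],
})

meal_cols = ['breakfasts_units', 'lunch_units', 'dinner_units']

# Row-wise sum of the individual meal counts
toy_meals['meal_sum'] = toy_meals[meal_cols].sum(axis=1)

# Rows where the reported total disagrees with the component sum
mismatches = toy_meals[toy_meals['meal_sum'] != toy_meals['tot_meals']]
print(mismatches)
```

Any rows surfaced this way would be worth inspecting before aggregating meal counts.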

## Exploring Senior Center Provider Data
`senior_center_provider_data_fy2020.csv`

In [8]:
# scpd.shape

In [9]:
# scpd.columns

In [10]:
# # Get the unique values in each column, and count the number of rows for each unique value
# for column in scpd.columns:
#     print(f'{scpd.value_counts(column)}\n')

#### What's in this dataset?
* Data is from one contract year, fiscal year 2020 (July 2019 - June 2020)
* 294 rows, 49 columns
* Each row is a specific program/senior center (`DFTA ID`)

#### What are my grouping/categorical columns:
* To derive: Operational Hours (Weekdays vs. Full-Week), Part-Time Employee Status (Has PTE, Has No PTE), HHS Affiliation (HHS-Affiliated, No Reported Affiliation)
* `Site Type` (program site type: Neighborhood Senior Centers, Innovative Senior Centers, Senior Center)
* `Borough/City`
* `Meal Prep1` (how the program's meals are prepared: Meal Prep on-site, Catered, Both, Not Applicable)
* `Prep for Others1` (contract prepares meals for other sites: No, Yes)

#### What are my numerical columns:
* To derive: % Budget for Personnel, % Budget for Meals, % Budget for AIB, SCE, & HPP Services
* `FY 20 Budget` (contract annual budget amount)
* `FY 20 Reimbursement` (total reported expenses for year)
* `# of Full-time Staff`
* `# of Part-time Staff`
* `Months in HHS` (number of months provider gives services under HHS umbrella)
* `Total FY20 Budget` (contract annual budget amount)
* `Total FY20 Personnel Budget` (total personnel budget)
* `Total FY20 Reimbursement` (total reported expenses for year)
* `Average Daily Participants` (average number of daily client participants)
* `Kosher Raw Food/ Disposable'` (kosher meal budget)
* `Non-Kosher Raw Food/ Disposable` (non-kosher meal budget)
* `Total Raw Food/ Disposable` (total meal budget)
* `Expenditures per Meal for Food and Disposable` (average cost per meal per person)
* `Annual Expenditures for Information and Assistance, Education and Recreation, Health Promotion` (total expenditures for aib, sce, & hpp services)
* `Annual Expenditures Per Client for Information and Assistance, Education and Recreation, Health Promotion` (average cost per participant/client for services)
* `Ultilization` (contractor utilization for the year, i.e. how effectively resources and services have been used to meet the requirements of the contract with the NYC Department for the Aging)
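The percentage columns to derive can be sketched as simple ratios against the total budget. The formulas below are assumptions: % Budget for Personnel is taken as `Total FY20 Personnel Budget` divided by `Total FY20 Budget`, and % Budget for Meals as `Total Raw Food/ Disposable` divided by `Total FY20 Budget`. The frame `toy_budget` is a made-up stand-in, and a zero budget is mapped to `NaN` so the division cannot produce infinities.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the provider data; values are made up.
toy_budget = pd.DataFrame({
    'Total FY20 Budget': [200000.0, 0.0],
    'Total FY20 Personnel Budget': [120000.0, 0.0],
    'Total Raw Food/ Disposable': [30000.0, 0.0],
})

# Treat a zero budget as missing so the ratios come out as NaN instead of inf
budget = toy_budget['Total FY20 Budget'].replace(0, np.nan)

# Assumed formulas: share of the total budget, expressed as a percentage
toy_budget['% Budget for Personnel'] = (
    toy_budget['Total FY20 Personnel Budget'] / budget * 100
)
toy_budget['% Budget for Meals'] = (
    toy_budget['Total Raw Food/ Disposable'] / budget * 100
)
print(toy_budget)
```

The same pattern would extend to % Budget for AIB, SCE, & HPP Services, using the annual expenditures column for those services as the numerator.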