## Analysis for 4C for Children

This analysis was requested by **4C for Children**, a nonprofit organization.  

4C for Children is a **Child Care Resource & Referral Agency** that supports everyone who cares for young children—ranging from parents to child care providers. The organization's mission is to ensure **high-quality early education and care** for all children.

A questionnaire was presented to childcare centers and in-home childcare centers (Family Childcare Centers, or FCCs) to assess their ability to fill classes. A secondary purpose is to identify those who might need assistance in understanding or using the "Step Up to Quality" rating system. The questionnaire results were collected in an Excel spreadsheet. This analysis focuses on Centers and attempts to identify the staffing gap for each class type: Infant, Toddler, Preschool, and School Age.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, HTML

# Load two files: the survey data and a mapping used to shorten column names where possible
data_file_path = "State of Childcare Survey.xlsx"
mapping_file_path = "Column Mappings.xlsx"
df = pd.read_excel(data_file_path)
mapping_df = pd.read_excel(mapping_file_path)
#print(df.head())
#print(mapping_df.head())

column_mapping = dict(zip(mapping_df['Old Name'], mapping_df['New Name']))
df = df.rename(columns=column_mapping)
print("\nUpdated Column Names:")
print(df.columns)


Updated Column Names:
Index(['Respondent ID', 'Collector ID', 'Start Date', 'End Date', 'IP Address',
       'Email Address', 'First Name', 'Last Name', 'Custom Data 1',
       'I am a...', 'Any classrooms not open due to staff shortage?',
       'classrooms not open-Infant', 'classrooms not open-Toddler',
       'classrooms not open-Preschool', 'classrooms not open-School Age',
       'staff need to open classrooms-Infant',
       'staff need to open classrooms-Toddler',
       'staff need to open classrooms-Preschool',
       'staff need to open classrooms-School Age',
       'additional children in unopened classrooms-Infant',
       'additional children in unopened classrooms-Toddler',
       'additional children in unopened classrooms-Preschool',
       'additional children in unopened classrooms-School Age',
       'classrooms not at full capacity due to staff shortage?',
       'classrooms short staffed-Infant', 'classrooms short staffed-Toddler',
       'classrooms short staff



### Initial Check:
- Dataset is very wide with approximately 100 columns.
- Dataset is split into two main groups of questions:
  - Columns A - BM are questions for Centers
  - Columns BN - CQ are questions for Family Childcare
  - Columns CR - CV are questions for both 
  - There are multiple sets of questions that are addressed to both types
- In this analysis, since we are focusing on centers, we can remove columns that are:
  - empty: columns F-I
  - personal identifying information: CR, CS (we will leave city, state and Zip for potential analysis)
  - unnecessary information such as IP address, Collector ID, Quality Ratings, etc.: B, E, BA-BM
  - questions addressed to Family Childcare Centers: BN-CQ
  - Open-Ended Responses that can't be quantified: AU, AV, BB
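
The column letters above follow Excel's lettering, while pandas selects columns by zero-based position. Here is a minimal sketch of the conversion, assuming spreadsheet column A lands at position 0 in the DataFrame. The helper names `excel_col_to_index` and `excel_range` are hypothetical and are not part of pandas. The drop list used in this notebook covers positions 46 through 96, which corresponds to AU through CS.

```python
def excel_col_to_index(letters):
    """Convert an Excel column label such as 'AU' to a zero-based position."""
    index = 0
    for ch in letters.upper():
        # Treat the label as a base-26 number whose digits run from A=1 to Z=26
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def excel_range(first, last):
    """Zero-based positions for an inclusive Excel column range such as 'F'..'I'."""
    return list(range(excel_col_to_index(first), excel_col_to_index(last) + 1))


# Positions for B, E, F-I, and the contiguous block AU-CS
positions = (
    [excel_col_to_index("B"), excel_col_to_index("E")]
    + excel_range("F", "I")
    + excel_range("AU", "CS")
)
print(positions[:8])
```

Deriving the positions from letters keeps the drop list traceable to the spreadsheet layout described in the bullets.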

In [2]:
# Drop unnecessary columns by zero-based position (A=0, so B=1, E=4, F-I=5-8, AU=46, CS=96)
positions_to_drop = [1,4,5,6,7,8] + list(range(46,97))
columns_to_drop = df.columns[positions_to_drop]
df = df.drop(columns=columns_to_drop)

In [3]:
# Convert datetimes to plain dates. errors='coerce' turns anything that won't convert into NaT (missing)
df['Start Date'] = pd.to_datetime(df['Start Date'], errors='coerce').dt.date
df['End Date'] = pd.to_datetime(df['End Date'], errors='coerce').dt.date

print(df['Start Date'])
print(df['End Date'])



0      2024-12-13
1      2024-12-13
2      2024-12-13
3      2024-12-13
4      2024-12-13
          ...    
309    2024-12-03
310    2024-12-03
311    2024-12-03
312    2024-12-02
313    2024-12-02
Name: Start Date, Length: 314, dtype: object
0      2024-12-13
1      2024-12-13
2      2024-12-13
3      2024-12-13
4      2024-12-13
          ...    
309    2024-12-03
310    2024-12-03
311    2024-12-03
312    2024-12-02
313    2024-12-02
Name: End Date, Length: 314, dtype: object


In [7]:
# Keep only Center respondents; .copy() avoids pandas' SettingWithCopyWarning when columns are modified later
center_df = df[df['I am a...'] == 'Center'].copy()

#print(center_df.head())


In [8]:
# Function to handle numeric data wrangling:
# - Empty (NaN) values stay empty.
# - Text answers are assumed to represent at least 1.
# - Numbers remain unchanged.

def number_conversion(value):
    if isinstance(value, str):
        return 1
    # NaN and numeric values pass through unchanged
    return value
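
To make the conversion rules concrete, here is a small sketch that applies the same logic to a handful of sample answers. The values in `raw_answers` are invented for illustration and do not come from the survey.

```python
def number_conversion(value):
    # Text answers count as 1; numbers and missing values pass through unchanged
    if isinstance(value, str):
        return 1
    return value


# Hypothetical raw answers: a number, free text, a missing value, and a zero
raw_answers = [2, "one or two", float("nan"), 0]
converted = [number_conversion(v) for v in raw_answers]
print(converted)
```

Note that under this rule a numeric string such as `"3"` also becomes 1, so text-typed numbers would be undercounted if any exist in the data.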

In [9]:
# Identify the relevant columns for calculations/conversion
columns_to_convert = ['classrooms not open-Infant', 
                      'classrooms not open-Toddler', 
                      'classrooms not open-Preschool', 
                      'classrooms not open-School Age',
                      'staff need to open classrooms-Infant',
                      'staff need to open classrooms-Toddler',
                      'staff need to open classrooms-Preschool',
                      'staff need to open classrooms-School Age',
                      'additional children in unopened classrooms-Infant',
                      'additional children in unopened classrooms-Toddler',
                      'additional children in unopened classrooms-Preschool',
                      'additional children in unopened classrooms-School Age',
                      'classrooms short staffed-Infant',
                      'classrooms short staffed-Toddler',
                      'classrooms short staffed-Preschool',
                      'classrooms short staffed-School Age',
                      'staff need to fill classrooms-Infant',
                      'staff need to fill classrooms-Toddler',
                      'staff need to fill classrooms-Preschool',
                      'staff need to fill classrooms-School Age',
                      'current enrollment-Infant',
                      'current enrollment-Toddler',
                      'current enrollment-Preschool',
                      'current enrollment-School Age',
                      'ideal enrollment-Infant',
                      'ideal enrollment-Toddler',
                      'ideal enrollment-Preschool',
                      'ideal enrollment-School Age',
                     ]

# Apply the conversion to each column
for col in columns_to_convert:
    center_df.loc[:, col] = center_df[col].apply(number_conversion)


#sum the columns
column_sums = center_df[columns_to_convert].sum()

# Count the unique values in the 'Respondent ID' column
unique_respondents = center_df['Respondent ID'].nunique()

#print(f"For Centers:")
print(f"Total number of unique respondents: {unique_respondents}")

# Print the results for each column
for col, sum_value in column_sums.items():
    print(f"Total {col}: {int(sum_value)}")


Total number of unique respondents: 134
Total classrooms not open-Infant: 17
Total classrooms not open-Toddler: 29
Total classrooms not open-Preschool: 34
Total classrooms not open-School Age: 16
Total staff need to open classrooms-Infant: 24
Total staff need to open classrooms-Toddler: 31
Total staff need to open classrooms-Preschool: 32
Total staff need to open classrooms-School Age: 16
Total additional children in unopened classrooms-Infant: 144
Total additional children in unopened classrooms-Toddler: 200
Total additional children in unopened classrooms-Preschool: 368
Total additional children in unopened classrooms-School Age: 187
Total classrooms short staffed-Infant: 29
Total classrooms short staffed-Toddler: 27
Total classrooms short staffed-Preschool: 35
Total classrooms short staffed-School Age: 7
Total staff need to fill classrooms-Infant: 34
Total staff need to fill classrooms-Toddler: 24
Total staff need to fill classrooms-Preschool: 27
Total staff need to fill classrooms-

In [12]:
# Widen the Jupyter cell so all metrics can be displayed, then show summary statistics for the selected field
display(HTML("<style>.container { width: 100% !important; }</style>"))
center_df[['classrooms not open-Infant']].describe()

| | classrooms not open-Infant |
|---|---|
| count | 24.0 |
| mean | 0.708333 |
| std | 0.624094 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 1.0 |
| 75% | 1.0 |
| max | 2.0 |
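
As a quick consistency check, `describe()` counts only non-missing values, so count times mean should reproduce the column total printed earlier (17 Infant classrooms not open). A short sketch using figures copied from the outputs above:

```python
# Figures copied from the notebook outputs
count = 24.0          # non-missing answers for 'classrooms not open-Infant'
mean = 0.708333       # mean reported by describe()
reported_total = 17   # "Total classrooms not open-Infant" from the summed columns

implied_total = count * mean
print(f"count x mean = {implied_total:.2f}")
```

Only 24 of the 134 Center respondents have a value in this column, so the mean describes those 24 answers rather than all Centers.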
