## Analysis for 4C for Children

This analysis was requested by **4C for Children**, a nonprofit organization.  

4C for Children is a **Child Care Resource & Referral Agency** that supports everyone who cares for young children, from parents to child care providers. The organization's mission is to ensure **high-quality early education and care** for all children.

A questionnaire was presented to childcare centers and in-home childcare centers (Family Childcare Centers, or FCCs) to assess whether staffing concerns limit their ability to fill classes. A secondary purpose is to identify respondents who might need assistance in understanding or using the "Step Up to Quality" rating system.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

file_path = "State of Childcare Survey.xlsx"
df = pd.read_excel(file_path)
print(df.head())

### Initial Check:
- The dataset is very wide, with approximately 100 columns.
- The dataset is split into two main groups of questions:
  - Columns A - BM are questions for Centers
  - Columns BN - CQ are questions for Family Childcare
  - Columns CR - CV are questions for both
  - Several sets of questions are addressed to both types
- This analysis may not use every column, and some can be removed as unnecessary:
  - empty columns
  - personally identifying information
  - unnecessary information such as IP address
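
The cell below drops columns by position. As a cross-check, empty and flagged columns could also be found by name. The following is only a sketch on a toy frame: the column names `IP Address` and `Email Address` are assumptions, not taken from the survey file, and on the real data the second header row would need to be set aside first, since it keeps otherwise empty columns from looking empty.

```python
import pandas as pd

# Toy stand-in for the survey export; column names are hypothetical.
toy = pd.DataFrame({
    "Respondent ID": [101, 102, 103],
    "IP Address": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "Email Address": [None, None, None],
    "I am a...": ["Center", "Family Child Care", "Center"],
})

# Columns where every value is missing carry no information.
empty_cols = [col for col in toy.columns if toy[col].isna().all()]

# Columns flagged by hand as identifying or irrelevant (assumed names).
flagged_cols = ["IP Address"]

to_drop = sorted(set(empty_cols) | set(flagged_cols))
cleaned = toy.drop(columns=to_drop)

print(to_drop)
print(list(cleaned.columns))
```

Dropping by name is more verbose than a list of positions, but it does not silently shift if the export gains or loses a column.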

In [2]:
# Drop open-ended responses and columns that are unnecessary for either type of center.
# Earlier selection, kept for reference:
#positions_to_drop = [1,4,5,6,7,8,62,78,79,80,81,82,83,84,85,86,93,94,95,96,97,98,99]
positions_to_drop = [1,4,5,6,7,8,46,47,53,55,58,64,75,76,77,82,84,87,93,94,95,96,97,98,99]
columns_to_drop = df.columns[positions_to_drop]
df = df.drop(columns=columns_to_drop)

In [None]:
# There are currently 2 header rows. pandas treats the second header row as data, so the code below
# skips that row (index 0) before converting the dates. errors='coerce' turns anything that won't convert into NaT.
df.loc[1:, 'Start Date'] = pd.to_datetime(df.loc[1:, 'Start Date'], errors='coerce').dt.date
df.loc[1:, 'End Date'] = pd.to_datetime(df.loc[1:, 'End Date'], errors='coerce').dt.date
print(df['Start Date'].iloc[1:])
print(df['End Date'].iloc[1:])
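
An alternative way to handle the second header row is to set it aside once and convert the dates on the remaining rows. This is a sketch on a toy frame, not the survey file: the sub-header value `Response` and the sample dates are assumptions made for illustration.

```python
import pandas as pd

# Toy frame: row 0 mimics a leftover second header row (hypothetical values).
toy = pd.DataFrame({
    "Start Date": [None, "2024-12-13 09:15:00", "not a date"],
    "I am a...": ["Response", "Center", "Family Child Care"],
})

# Keep the sub-header row aside, then work only on the real responses.
sub_header = toy.iloc[0]
responses = toy.iloc[1:].copy()

# errors="coerce" turns unparseable entries into NaT instead of raising.
responses["Start Date"] = pd.to_datetime(responses["Start Date"], errors="coerce")

print(responses["Start Date"].dt.date.tolist())
print(int(responses["Start Date"].isna().sum()))
```

Leaving the column as a datetime type, and calling `.dt.date` only for display, keeps the `.dt` accessors available for later filtering or grouping.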



In [11]:
# Create Center-oriented data frame
center_columns_to_drop = df.columns[53:75]
center_df = df.drop(columns=center_columns_to_drop).copy()
center_df = center_df[center_df['I am a...'] == 'Center']

# Create FCC-oriented data frame.
# Filter on the provider type first: 'I am a...' is among the columns dropped below,
# and filtering center_df here would always return an empty frame.
fcc_columns_to_drop = df.columns[0:53]
fcc_df = df[df['I am a...'] == 'Family Child Care']
fcc_df = fcc_df.drop(columns=fcc_columns_to_drop).copy()

#print(center_df.head())
#print(fcc_df.head())

In [6]:
# Function to handle numeric data wrangling.
# If the value is empty, I'd like to keep it empty.
# If there is text, I'll assume a value of at least 1.
# If there is a number, it will remain a number.

def number_conversion(value):
    if isinstance(value, str):
        return 1
    # Missing values and numbers pass through unchanged
    return value
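
To make the rule above concrete, here is a small sketch on made-up values. One thing it shows is that numeric text such as `"3"` also becomes 1. If that is not intended, a variant could try to parse the text first. `number_conversion_parsing` is a hypothetical helper for illustration and is not used elsewhere in this notebook.

```python
def number_conversion(value):
    """Rule from the cell above: any text counts as 1, everything else passes through."""
    if isinstance(value, str):
        return 1
    return value


def number_conversion_parsing(value):
    """Hypothetical variant: parse numeric text, fall back to 1 for other text."""
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return 1
    return value


# Made-up sample: a number, an empty answer, free text, and numeric text.
sample = [2, None, "one room", "3"]

as_written = [number_conversion(v) for v in sample]
with_parsing = [number_conversion_parsing(v) for v in sample]

print(as_written)    # [2, None, 1, 1]
print(with_parsing)  # [2, None, 1, 3.0]
```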

In [12]:
#Apply conversion function to columns I wish to perform math on.
#center_df.loc['classrooms not open-Infant'] = center_df['classrooms not open-Infant'].apply(number_conversion)
#center_df.loc['classrooms not open-Toddler'] = center_df['classrooms not open-Toddler'].apply(number_conversion)
#center_df.loc['classrooms not open-Preschool'] = center_df['classrooms not open-Preschool'].apply(number_conversion)
#center_df.loc['classrooms not open-School Age'] = center_df['classrooms not open-School Age'].apply(number_conversion)
print(center_df.head())
#print(fcc_df.head())

  Respondent ID  Start Date    End Date I am a...  \
1  118763424407  2024-12-13  2024-12-13    Center   
2  118763412562  2024-12-13  2024-12-13    Center   
3  118763398077  2024-12-13  2024-12-13    Center   
5  118763387777  2024-12-13  2024-12-13    Center   
7  118763379562  2024-12-13  2024-12-13    Center   

  Do you have any classrooms that are not open to children because of a shortage of staff?  \
1                                                Yes                                         
2                                                NaN                                         
3                                                 No                                         
5                                                Yes                                         
7                                                 No                                         

  classrooms not open-Infant classrooms not open-Toddler  \
1                        NaN                           1

In [13]:
# Identify the relevant columns
columns_to_convert = ['classrooms not open-Infant', 
                      'classrooms not open-Toddler', 
                      'classrooms not open-Preschool', 
                      'classrooms not open-School Age']

# Apply the conversion to each column
for col in columns_to_convert:
    center_df[col] = center_df[col].apply(number_conversion)

#sum the columns
column_sums = center_df[columns_to_convert].sum()

# Count the unique values in the 'Respondent ID' column
unique_respondents = center_df['Respondent ID'].nunique()

print(f"Total number of unique respondents: {unique_respondents}")

# Print the results for each column
for col, sum_value in column_sums.items():
    print(f"Total {col}: {sum_value}")


Total number of unique respondents: 134
Total classrooms not open-Infant: 17.0
Total classrooms not open-Toddler: 29.0
Total classrooms not open-Preschool: 34.0
Total classrooms not open-School Age: 16.0
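
The totals above count classrooms. A possible follow-up is to count how many centers report at least one closed classroom. The sketch below runs on a toy frame with invented numbers, so its figures say nothing about the survey results. It reuses two of the column names from the cell above.

```python
import pandas as pd

# Toy data with invented values; NaN means the respondent left the field empty.
toy = pd.DataFrame({
    "Respondent ID": [1, 2, 3, 4],
    "classrooms not open-Infant": [1, None, 2, None],
    "classrooms not open-Toddler": [None, None, 1, 3],
})
room_cols = ["classrooms not open-Infant", "classrooms not open-Toddler"]

# Total closed classrooms per age group (NaN is skipped by sum()).
closed_total = toy[room_cols].sum()

# Number of centers reporting at least one closed classroom, per age group.
centers_affected = (toy[room_cols] > 0).sum()

# Centers with a closed classroom in any age group, as a share of respondents.
any_closed = (toy[room_cols] > 0).any(axis=1).sum()
share = any_closed / toy["Respondent ID"].nunique()

print(closed_total)
print(centers_affected)
print(f"{any_closed} of {toy['Respondent ID'].nunique()} centers ({share:.0%})")
```

Comparisons against NaN evaluate to False, so respondents who left a field empty are counted as not reporting a closed classroom for that age group.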
