## Question 3: Do the number of Cultural Arts Organizations course types and instructional hours per year correlate with ELA and math performance?

Author: Thomas

Updates:
* Last updated on March 25, 2020
* Updated on March 24, 2020
* Updated on March 23, 2020
* Updated on March 21, 2020

**Hypothesis**: If art programs provided by Cultural Arts Organizations affect student performance, then schools with stronger indicators of instructional hours will have significantly different math and ELA state test scores than schools with weaker indicators of instructional hours.

**Null hypothesis**: If art programs provided by Cultural Arts Organizations do not affect student performance, then schools with stronger indicators of instructional hours will not have significantly different math and ELA state test scores than schools with weaker indicators of instructional hours.

**Indicators of instructional hours** will be: (a) number of art course types and (b) self-reported instructional hours.

**Math/ELA performance** will be measured by pass rate (proportion of students earning a score of 3 or 4).
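
The pass rate and the group comparison described above can be sketched on toy data. This is only an illustration: the numbers are invented, the `course_types` column and the cutoff of 3 are hypothetical stand-ins for indicator (a), and Welch's t-test is one assumed choice of significance test, not necessarily the one used in this analysis.

```python
import pandas as pd
from scipy import stats

# Toy data, invented for illustration only (not taken from the real files)
toy = pd.DataFrame({
    'DBN': ['01M001', '01M002', '01M003', '01M004', '01M005', '01M006'],
    'Number Tested': [50, 40, 60, 80, 30, 100],
    '# Level 3+4': [30, 10, 45, 20, 24, 35],
    'course_types': [4, 1, 3, 0, 4, 1],  # hypothetical indicator (a)
})

# Pass rate: proportion of tested students who scored Level 3 or 4
toy['pass_rate'] = toy['# Level 3+4'] / toy['Number Tested']

# Assumed split: "stronger" means at least 3 course types (arbitrary cutoff)
stronger = toy.loc[toy['course_types'] >= 3, 'pass_rate']
weaker = toy.loc[toy['course_types'] < 3, 'pass_rate']

# Welch's t-test, which does not assume equal variances in the two groups
t_stat, p_value = stats.ttest_ind(stronger, weaker, equal_var=False)
print(f't = {t_stat:.3f}, p = {p_value:.3f}')
```

A small p-value would be evidence against the null hypothesis; with real data the cutoff and the test should be chosen to fit the distribution of the indicators.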

## Data

### School Test Results Data
2013-2019 School Test Results Page
https://infohub.nyced.org/reports/academics/test-results

2013-2019 ELA Test Results: 
https://infohub.nyced.org/docs/default-source/default-document-library/school-ela-results-2013-2019-(public).xlsx
2013-2019 Math Test Results: 
https://infohub.nyced.org/docs/default-source/default-document-library/school-math-results-2013-2019-(public).xlsx

### Arts Survey Data
2016-2017 Arts Survey Data
https://data.cityofnewyork.us/Education/2016-2017-Arts-Survey-data/f33j-ecpr

2017-2018 Arts Survey Data
https://data.cityofnewyork.us/Education/2017-2018-Arts-Data-Report/d9fr-a56v

2018-2019 Arts Data Survey
https://data.cityofnewyork.us/Education/2018-2019-Arts-Data-Survey/5cxm-c27f

In [None]:
# import dependencies

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import json


## Dataset 1: Math and ELA Tests Data

In [None]:
# CSV Files
math_results = 'data/2013-2019_school_math_results.csv'
ela_results = 'data/2013-2019_school_ela_results.csv'

In [None]:
# read csv file into dataframe
math_df = pd.read_csv(math_results, encoding='utf-8', low_memory=False)

# clean up math dataframe

keep_cols = ['DBN', 'School Name', 'Grade', 'Year', 'Number Tested', '# Level 3+4', '% Level 3+4']
keep_grades = ['3', '4', '5', '6']

# keep 2017 and later, and only the needed columns (this also drops 'Unnamed: 0')
math_df = math_df.loc[math_df['Year'] >= 2017, keep_cols].reset_index(drop=True)
# keep only grades 3 to 6 (this also removes the `All Grades` rows)
math_df = math_df[math_df['Grade'].astype(str).isin(keep_grades)].copy()
# add test column
math_df['Test'] = 'Math'

print(math_df.shape)
math_df.head(1)

In [None]:
# read csv file into dataframe
ela_df = pd.read_csv(ela_results, encoding='utf-8', low_memory=False)

# clean up ela dataframe

keep_cols = ['DBN', 'School Name', 'Grade', 'Year', 'Number Tested', '# Level 3+4', '% Level 3+4']
keep_grades = ['3', '4', '5', '6']

# keep 2017 and later, and only the needed columns (this also drops 'Unnamed: 0')
ela_df = ela_df.loc[ela_df['Year'] >= 2017, keep_cols].reset_index(drop=True)
# keep only grades 3 to 6 (this also removes the `All Grades` rows)
ela_df = ela_df[ela_df['Grade'].astype(str).isin(keep_grades)].copy()
# add test column
ela_df['Test'] = 'ELA'

print(ela_df.shape)
ela_df.head(1)
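
The math and ELA cells above apply the same cleaning steps, so they could share one function. The sketch below is a hypothetical helper (`clean_test_results` is not part of the original notebook) exercised on invented toy rows. Unlike the cells above, it also stores `Grade` as an integer, which is an assumption made here to simplify later joins.

```python
import pandas as pd

KEEP_COLS = ['DBN', 'School Name', 'Grade', 'Year',
             'Number Tested', '# Level 3+4', '% Level 3+4']

def clean_test_results(df, test_name, first_year=2017, grades=(3, 4, 5, 6)):
    """Hypothetical helper: filter one test-results dataframe by year and grade."""
    # keep rows from first_year onward and only the needed columns
    out = df.loc[df['Year'] >= first_year, KEEP_COLS]
    # drop the aggregate rows, then store the grade as an integer
    out = out[out['Grade'] != 'All Grades'].copy()
    out['Grade'] = out['Grade'].astype(int)
    # keep only the requested grades and label the test
    out = out[out['Grade'].isin(grades)].reset_index(drop=True)
    out['Test'] = test_name
    return out

# Toy rows, invented for illustration only
toy = pd.DataFrame({
    'DBN': ['01M015'] * 5,
    'School Name': ['TOY SCHOOL'] * 5,
    'Grade': ['3', '4', '7', 'All Grades', '5'],
    'Year': [2017, 2018, 2018, 2018, 2016],
    'Number Tested': [20, 25, 30, 75, 22],
    '# Level 3+4': [10, 15, 12, 37, 11],
    '% Level 3+4': [50.0, 60.0, 40.0, 49.3, 50.0],
})

cleaned = clean_test_results(toy, 'Math')
print(cleaned)
```

Only the grade 3 and grade 4 rows survive here: the grade 7 row, the `All Grades` row, and the 2016 row are filtered out.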

## Dataset 2: Art Surveys Data

In [None]:
# CSV files
arts_2017 = 'data/2016-2017_Arts_Survey_Data.csv'
arts_2018 = 'data/2017-2018_Arts_Survey_Data.csv'
arts_2019 = 'data/2018-2019_Arts_Survey_Data.csv'

### Load and clean data for 2017 Arts Survey

In [None]:
arts_2017_df = pd.read_csv(arts_2017, encoding='utf-8', low_memory=False)
arts_2017_df = arts_2017_df.rename(columns={ 'Q0_DBN': 'DBN' })
arts_2017_df['Year'] = 2017

arts_2017_grade_3_dance_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q15', 'R1' ]) ]
arts_2017_grade_3_music_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q15', 'R2' ]) ]
arts_2017_grade_3_theater_cols        = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q15', 'R3' ]) ]
arts_2017_grade_3_visual_cols         = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q15', 'R4' ]) ]

arts_2017_grade_3_df = pd.DataFrame()
arts_2017_grade_3_df['Dance']         = arts_2017_df[arts_2017_grade_3_dance_cols].fillna(0).sum(axis=1)
arts_2017_grade_3_df['Music']         = arts_2017_df[arts_2017_grade_3_music_cols].fillna(0).sum(axis=1)
arts_2017_grade_3_df['Theater']       = arts_2017_df[arts_2017_grade_3_theater_cols].fillna(0).sum(axis=1)
arts_2017_grade_3_df['Visual Arts']   = arts_2017_df[arts_2017_grade_3_visual_cols].fillna(0).sum(axis=1)
arts_2017_grade_3_df[['DBN','Year']]  = arts_2017_df[['DBN','Year']]
arts_2017_grade_3_df['Grade']         = 3

arts_2017_grade_4_dance_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q16', 'R1' ]) ]
arts_2017_grade_4_music_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q16', 'R2' ]) ]
arts_2017_grade_4_theater_cols        = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q16', 'R3' ]) ]
arts_2017_grade_4_visual_cols         = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q16', 'R4' ]) ]

arts_2017_grade_4_df = pd.DataFrame()
arts_2017_grade_4_df['Dance']         = arts_2017_df[arts_2017_grade_4_dance_cols].fillna(0).sum(axis=1)
arts_2017_grade_4_df['Music']         = arts_2017_df[arts_2017_grade_4_music_cols].fillna(0).sum(axis=1)
arts_2017_grade_4_df['Theater']       = arts_2017_df[arts_2017_grade_4_theater_cols].fillna(0).sum(axis=1)
arts_2017_grade_4_df['Visual Arts']   = arts_2017_df[arts_2017_grade_4_visual_cols].fillna(0).sum(axis=1)
arts_2017_grade_4_df[['DBN','Year']]  = arts_2017_df[['DBN','Year']]
arts_2017_grade_4_df['Grade']         = 4

arts_2017_grade_5_dance_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q17', 'R1' ]) ]
arts_2017_grade_5_music_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q17', 'R2' ]) ]
arts_2017_grade_5_theater_cols        = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q17', 'R3' ]) ]
arts_2017_grade_5_visual_cols         = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q17', 'R4' ]) ]

arts_2017_grade_5_df = pd.DataFrame()
arts_2017_grade_5_df['Dance']         = arts_2017_df[arts_2017_grade_5_dance_cols].fillna(0).sum(axis=1)
arts_2017_grade_5_df['Music']         = arts_2017_df[arts_2017_grade_5_music_cols].fillna(0).sum(axis=1)
arts_2017_grade_5_df['Theater']       = arts_2017_df[arts_2017_grade_5_theater_cols].fillna(0).sum(axis=1)
arts_2017_grade_5_df['Visual Arts']   = arts_2017_df[arts_2017_grade_5_visual_cols].fillna(0).sum(axis=1)
arts_2017_grade_5_df[['DBN','Year']]  = arts_2017_df[['DBN','Year']]
arts_2017_grade_5_df['Grade']         = 5

arts_2017_grade_6_dance_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q19', 'R1' ]) ]
arts_2017_grade_6_music_cols          = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q19', 'R2' ]) ]
arts_2017_grade_6_theater_cols        = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q19', 'R3' ]) ]
arts_2017_grade_6_visual_cols         = [  col for col in arts_2017_df.columns if all(ele in col for ele in [ 'Q19', 'R4' ]) ]

arts_2017_grade_6_df = pd.DataFrame()
arts_2017_grade_6_df['Dance']         = arts_2017_df[arts_2017_grade_6_dance_cols].fillna(0).sum(axis=1)
arts_2017_grade_6_df['Music']         = arts_2017_df[arts_2017_grade_6_music_cols].fillna(0).sum(axis=1)
arts_2017_grade_6_df['Theater']       = arts_2017_df[arts_2017_grade_6_theater_cols].fillna(0).sum(axis=1)
arts_2017_grade_6_df['Visual Arts']   = arts_2017_df[arts_2017_grade_6_visual_cols].fillna(0).sum(axis=1)
arts_2017_grade_6_df[['DBN','Year']]  = arts_2017_df[['DBN','Year']]
arts_2017_grade_6_df['Grade']         = 6

arts_2017_grades = pd.concat([arts_2017_grade_3_df, arts_2017_grade_4_df, arts_2017_grade_5_df, arts_2017_grade_6_df])
arts_2017_grades = arts_2017_grades[arts_2017_grades.columns.tolist()[-3:] + arts_2017_grades.columns.tolist()[:-3]]
print(arts_2017_grades.shape)
arts_2017_grades.head(1)


### Load and clean data for 2018 Arts Survey

In [None]:
arts_2018_df = pd.read_csv(arts_2018, encoding='utf-8', low_memory=False)
arts_2018_df = arts_2018_df.rename(columns={ 'Q0_DBN': 'DBN' })
arts_2018_df['Year'] = 2018

arts_2018_grade_3_dance_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q15', 'R1' ]) ]
arts_2018_grade_3_music_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q15', 'R2' ]) ]
arts_2018_grade_3_theater_cols        = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q15', 'R3' ]) ]
arts_2018_grade_3_visual_cols         = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q15', 'R4' ]) ]

arts_2018_grade_3_df = pd.DataFrame()
arts_2018_grade_3_df['Dance']         = arts_2018_df[arts_2018_grade_3_dance_cols].fillna(0).sum(axis=1)
arts_2018_grade_3_df['Music']         = arts_2018_df[arts_2018_grade_3_music_cols].fillna(0).sum(axis=1)
arts_2018_grade_3_df['Theater']       = arts_2018_df[arts_2018_grade_3_theater_cols].fillna(0).sum(axis=1)
arts_2018_grade_3_df['Visual Arts']   = arts_2018_df[arts_2018_grade_3_visual_cols].fillna(0).sum(axis=1)
arts_2018_grade_3_df[['DBN','Year']]  = arts_2018_df[['DBN','Year']]
arts_2018_grade_3_df['Grade']         = 3

arts_2018_grade_4_dance_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q16', 'R1' ]) ]
arts_2018_grade_4_music_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q16', 'R2' ]) ]
arts_2018_grade_4_theater_cols        = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q16', 'R3' ]) ]
arts_2018_grade_4_visual_cols         = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q16', 'R4' ]) ]

arts_2018_grade_4_df = pd.DataFrame()
arts_2018_grade_4_df['Dance']         = arts_2018_df[arts_2018_grade_4_dance_cols].fillna(0).sum(axis=1)
arts_2018_grade_4_df['Music']         = arts_2018_df[arts_2018_grade_4_music_cols].fillna(0).sum(axis=1)
arts_2018_grade_4_df['Theater']       = arts_2018_df[arts_2018_grade_4_theater_cols].fillna(0).sum(axis=1)
arts_2018_grade_4_df['Visual Arts']   = arts_2018_df[arts_2018_grade_4_visual_cols].fillna(0).sum(axis=1)
arts_2018_grade_4_df[['DBN','Year']]  = arts_2018_df[['DBN','Year']]
arts_2018_grade_4_df['Grade']         = 4

arts_2018_grade_5_dance_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q17', 'R1' ]) ]
arts_2018_grade_5_music_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q17', 'R2' ]) ]
arts_2018_grade_5_theater_cols        = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q17', 'R3' ]) ]
arts_2018_grade_5_visual_cols         = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q17', 'R4' ]) ]

arts_2018_grade_5_df = pd.DataFrame()
arts_2018_grade_5_df['Dance']         = arts_2018_df[arts_2018_grade_5_dance_cols].fillna(0).sum(axis=1)
arts_2018_grade_5_df['Music']         = arts_2018_df[arts_2018_grade_5_music_cols].fillna(0).sum(axis=1)
arts_2018_grade_5_df['Theater']       = arts_2018_df[arts_2018_grade_5_theater_cols].fillna(0).sum(axis=1)
arts_2018_grade_5_df['Visual Arts']   = arts_2018_df[arts_2018_grade_5_visual_cols].fillna(0).sum(axis=1)
arts_2018_grade_5_df[['DBN','Year']]  = arts_2018_df[['DBN','Year']]
arts_2018_grade_5_df['Grade']         = 5

arts_2018_grade_6_dance_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q19', 'R1' ]) ]
arts_2018_grade_6_music_cols          = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q19', 'R2' ]) ]
arts_2018_grade_6_theater_cols        = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q19', 'R3' ]) ]
arts_2018_grade_6_visual_cols         = [  col for col in arts_2018_df.columns if all(ele in col for ele in [ 'Q19', 'R4' ]) ]

arts_2018_grade_6_df = pd.DataFrame()
arts_2018_grade_6_df['Dance']         = arts_2018_df[arts_2018_grade_6_dance_cols].fillna(0).sum(axis=1)
arts_2018_grade_6_df['Music']         = arts_2018_df[arts_2018_grade_6_music_cols].fillna(0).sum(axis=1)
arts_2018_grade_6_df['Theater']       = arts_2018_df[arts_2018_grade_6_theater_cols].fillna(0).sum(axis=1)
arts_2018_grade_6_df['Visual Arts']   = arts_2018_df[arts_2018_grade_6_visual_cols].fillna(0).sum(axis=1)
arts_2018_grade_6_df[['DBN','Year']]  = arts_2018_df[['DBN','Year']]
arts_2018_grade_6_df['Grade']         = 6

arts_2018_grades = pd.concat([arts_2018_grade_3_df, arts_2018_grade_4_df, arts_2018_grade_5_df, arts_2018_grade_6_df])
arts_2018_grades = arts_2018_grades[arts_2018_grades.columns.tolist()[-3:] + arts_2018_grades.columns.tolist()[:-3]]
print(arts_2018_grades.shape)
arts_2018_grades.head(1)


### Load and clean data for 2019 Arts Survey

In [None]:
arts_2019_df = pd.read_csv(arts_2019, encoding='utf-8', low_memory=False)
arts_2019_df = arts_2019_df.rename(columns={ 'Q0_DBN': 'DBN' })
arts_2019_df['Year'] = 2019

arts_2019_grade_3_dance_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q15', 'R1' ]) ]
arts_2019_grade_3_music_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q15', 'R2' ]) ]
arts_2019_grade_3_theater_cols        = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q15', 'R3' ]) ]
arts_2019_grade_3_visual_cols         = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q15', 'R4' ]) ]

arts_2019_grade_3_df = pd.DataFrame()
arts_2019_grade_3_df['Dance']         = arts_2019_df[arts_2019_grade_3_dance_cols].fillna(0).sum(axis=1)
arts_2019_grade_3_df['Music']         = arts_2019_df[arts_2019_grade_3_music_cols].fillna(0).sum(axis=1)
arts_2019_grade_3_df['Theater']       = arts_2019_df[arts_2019_grade_3_theater_cols].fillna(0).sum(axis=1)
arts_2019_grade_3_df['Visual Arts']   = arts_2019_df[arts_2019_grade_3_visual_cols].fillna(0).sum(axis=1)
arts_2019_grade_3_df[['DBN','Year']]  = arts_2019_df[['DBN','Year']]
arts_2019_grade_3_df['Grade']         = 3

arts_2019_grade_4_dance_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q16', 'R1' ]) ]
arts_2019_grade_4_music_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q16', 'R2' ]) ]
arts_2019_grade_4_theater_cols        = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q16', 'R3' ]) ]
arts_2019_grade_4_visual_cols         = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q16', 'R4' ]) ]

arts_2019_grade_4_df = pd.DataFrame()
arts_2019_grade_4_df['Dance']         = arts_2019_df[arts_2019_grade_4_dance_cols].fillna(0).sum(axis=1)
arts_2019_grade_4_df['Music']         = arts_2019_df[arts_2019_grade_4_music_cols].fillna(0).sum(axis=1)
arts_2019_grade_4_df['Theater']       = arts_2019_df[arts_2019_grade_4_theater_cols].fillna(0).sum(axis=1)
arts_2019_grade_4_df['Visual Arts']   = arts_2019_df[arts_2019_grade_4_visual_cols].fillna(0).sum(axis=1)
arts_2019_grade_4_df[['DBN','Year']]  = arts_2019_df[['DBN','Year']]
arts_2019_grade_4_df['Grade']         = 4

arts_2019_grade_5_dance_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q17', 'R1' ]) ]
arts_2019_grade_5_music_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q17', 'R2' ]) ]
arts_2019_grade_5_theater_cols        = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q17', 'R3' ]) ]
arts_2019_grade_5_visual_cols         = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q17', 'R4' ]) ]

arts_2019_grade_5_df = pd.DataFrame()
arts_2019_grade_5_df['Dance']         = arts_2019_df[arts_2019_grade_5_dance_cols].fillna(0).sum(axis=1)
arts_2019_grade_5_df['Music']         = arts_2019_df[arts_2019_grade_5_music_cols].fillna(0).sum(axis=1)
arts_2019_grade_5_df['Theater']       = arts_2019_df[arts_2019_grade_5_theater_cols].fillna(0).sum(axis=1)
arts_2019_grade_5_df['Visual Arts']   = arts_2019_df[arts_2019_grade_5_visual_cols].fillna(0).sum(axis=1)
arts_2019_grade_5_df[['DBN','Year']]  = arts_2019_df[['DBN','Year']]
arts_2019_grade_5_df['Grade']         = 5

arts_2019_grade_6_dance_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q19', 'R1' ]) ]
arts_2019_grade_6_music_cols          = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q19', 'R2' ]) ]
arts_2019_grade_6_theater_cols        = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q19', 'R3' ]) ]
arts_2019_grade_6_visual_cols         = [  col for col in arts_2019_df.columns if all(ele in col for ele in [ 'Q19', 'R4' ]) ]

arts_2019_grade_6_df = pd.DataFrame()
arts_2019_grade_6_df['Dance']         = arts_2019_df[arts_2019_grade_6_dance_cols].fillna(0).sum(axis=1)
arts_2019_grade_6_df['Music']         = arts_2019_df[arts_2019_grade_6_music_cols].fillna(0).sum(axis=1)
arts_2019_grade_6_df['Theater']       = arts_2019_df[arts_2019_grade_6_theater_cols].fillna(0).sum(axis=1)
arts_2019_grade_6_df['Visual Arts']   = arts_2019_df[arts_2019_grade_6_visual_cols].fillna(0).sum(axis=1)
arts_2019_grade_6_df[['DBN','Year']]  = arts_2019_df[['DBN','Year']]
arts_2019_grade_6_df['Grade']         = 6

arts_2019_grades = pd.concat([arts_2019_grade_3_df, arts_2019_grade_4_df, arts_2019_grade_5_df, arts_2019_grade_6_df])
arts_2019_grades = arts_2019_grades[arts_2019_grades.columns.tolist()[-3:] + arts_2019_grades.columns.tolist()[:-3]]
print(arts_2019_grades.shape)
arts_2019_grades.head(1)
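
The three per-year cells above repeat the same steps for each grade and discipline. A loop-based version might look like the sketch below. `summarize_arts_survey` is a hypothetical helper, and the toy column names (`Q15_R1_C1` and so on) are invented, since the real survey column names are not shown here. The question and response codes are the ones used in the cells above. Like the cells above, it matches by substring, so a code such as `R1` would also match a column containing `R10` if the survey has one.

```python
import pandas as pd

# Question code per grade and response code per discipline, as used above
GRADE_QUESTIONS = {3: 'Q15', 4: 'Q16', 5: 'Q17', 6: 'Q19'}
DISCIPLINES = {'Dance': 'R1', 'Music': 'R2', 'Theater': 'R3', 'Visual Arts': 'R4'}

def summarize_arts_survey(survey_df, year):
    """Hypothetical helper: one row per school and grade, summed per discipline."""
    frames = []
    for grade, question in GRADE_QUESTIONS.items():
        grade_df = pd.DataFrame({'DBN': survey_df['DBN'], 'Year': year, 'Grade': grade})
        for discipline, code in DISCIPLINES.items():
            # columns whose name contains both the question and the response code
            cols = [col for col in survey_df.columns if question in col and code in col]
            grade_df[discipline] = survey_df[cols].fillna(0).sum(axis=1)
        frames.append(grade_df)
    return pd.concat(frames)

# Toy survey with invented column names and values
toy = pd.DataFrame({
    'DBN': ['01M015', '01M019'],
    'Q15_R1_C1': [1, None],
    'Q15_R1_C2': [2, 3],
    'Q15_R2_C1': [0, 1],
    'Q16_R4_C1': [5, None],
    'Q19_R3_C1': [None, 4],
})

summary = summarize_arts_survey(toy, 2017)
print(summary)
```

With such a helper, each year's cell would reduce to reading the CSV, renaming `Q0_DBN` to `DBN`, and calling the function with that year.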


### Combine Arts Survey data for all years

In [None]:
arts_df = pd.concat([arts_2017_grades, arts_2018_grades, arts_2019_grades])
print(arts_df.shape)
arts_df.head()
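
To relate the arts data to test results, the two datasets would need to be joined on `DBN`, `Year`, and `Grade`. The sketch below shows one way to do that on invented toy frames. It assumes `Grade` is still a string in the test-results frames (the cleaning cells compare it as an integer but do not store the converted value) while the arts frames assign it as an integer, so the key is converted before merging. The `course_types` column is a hypothetical reading of indicator (a): a discipline counts when its summed value is nonzero.

```python
import pandas as pd

# Toy frames, invented for illustration; the real ones come from the cells above
tests_toy = pd.DataFrame({
    'DBN': ['01M015', '01M015', '01M019'],
    'Year': [2017, 2017, 2018],
    'Grade': ['3', '4', '3'],   # strings, as assumed for the test-results data
    'Number Tested': [20, 25, 30],
    '# Level 3+4': [10, 15, 12],
})
arts_toy = pd.DataFrame({
    'DBN': ['01M015', '01M015', '01M019'],
    'Year': [2017, 2017, 2018],
    'Grade': [3, 4, 3],         # integers, as assigned in the arts cells
    'Dance': [3.0, 0.0, 1.0],
    'Music': [0.0, 2.0, 0.0],
    'Theater': [0.0, 0.0, 0.0],
    'Visual Arts': [5.0, 1.0, 0.0],
})

# Align the key dtype first; string and integer keys do not match each other
tests_toy['Grade'] = tests_toy['Grade'].astype(int)

merged = tests_toy.merge(arts_toy, on=['DBN', 'Year', 'Grade'], how='inner')

# Assumed indicator (a): number of disciplines with a nonzero summed value
disciplines = ['Dance', 'Music', 'Theater', 'Visual Arts']
merged['course_types'] = (merged[disciplines] > 0).sum(axis=1)
print(merged[['DBN', 'Year', 'Grade', 'course_types']])
```

An inner join keeps only school, year, and grade combinations present in both datasets, so comparing row counts before and after the merge is a quick check for unmatched schools.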