## Question 1: Do yearly measures of arts funding quality correlate with ELA and math performance?

Last updated on 3/24/20


**Hypothesis**: If art programs affect student performance, then schools with stronger indicators of funding will have significantly different math and ELA state test scores than schools with weaker indicators of funding.

**Null hypothesis**: If art programs do not affect student performance, then schools with stronger indicators of funding will not have significantly different math and ELA state test scores than schools with weaker indicators of funding.

**Indicators of funding** will be: (a) number of funding sources, (b) self-report of increase or decrease in funding, and (c) self-report of adequate or inadequate funding.

**Math/ELA performance** will be measured by pass rate (proportion of students earning a score of 3 or 4).
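
As a worked example of this definition, the pass rate for one school/grade/year row is the number of students at Level 3 or Level 4 divided by the number tested. The helper name `pass_rate` and the toy counts below are illustrative and not taken from the dataset; the test-results files also appear to carry this value precomputed as a percentage column (`% Level 3+4`).

```python
def pass_rate(level_3: int, level_4: int, number_tested: int) -> float:
    """Proportion of tested students who scored at Level 3 or Level 4."""
    if number_tested <= 0:
        raise ValueError("number_tested must be positive")
    return (level_3 + level_4) / number_tested


# Toy counts (hypothetical): 30 students at Level 3, 20 at Level 4, 100 tested
print(pass_rate(30, 20, 100))  # 0.5
```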

### Data

Annual Arts Education Survey: https://data.cityofnewyork.us/browse/select_dataset?Dataset-Information_Agency=Department+of+Education+%28DOE%29&nofederate=true&suppressed_facets[]=domain&utf8=✓&sortBy=relevance&q=Arts%20Data%20Survey 

New York State Test Results: https://infohub.nyced.org/reports/academics/test-results 

In [27]:
# import dependencies

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress
import requests
import json

## Dataset 1: Arts Data

In [4]:
# arts_2014_2015_file = "2014-2015_Arts_Survey_Data.csv"
# arts_2015_2016_file = "2015-2016_Arts_Survey_Data.csv"
arts_2016_2017_file = "2016-2017_Arts_Survey_Data.csv"
arts_2017_2018_file = "2017-2018_Arts_Survey_Data.csv"
arts_2018_2019_file = "2018-2019_Arts_Survey_Data.csv"

# arts_2014_2015 = pd.read_csv(arts_2014_2015_file, low_memory=False)
# arts_2015_2016 = pd.read_csv(arts_2015_2016_file, low_memory=False)
arts_2016_2017 = pd.read_csv(arts_2016_2017_file, low_memory=False)
arts_2017_2018 = pd.read_csv(arts_2017_2018_file, low_memory=False)
arts_2018_2019 = pd.read_csv(arts_2018_2019_file, low_memory=False)


arts_2016_2017['Year'] = 2017
arts_2017_2018['Year'] = 2018
arts_2018_2019['Year'] = 2019

## Data Dictionaries

### 2016-2017

**Q34:** Did your school receive the following funding sources (non-DOE) to support arts education in this school year (*check all that apply*)?

(Yes/No radio buttons)

**Funding Sources**
* Cultural organizations
* Education association
* Federal, state, or city grants
* Local business or corporation
* Private foundation
* PTA/PA
* State, county, local arts councils

---

**Q35:** Funding for the arts is generally:

* Abundant
* Sufficient
* Insufficient
* N/A

---

**Q36:** Funding over the past three years has:

* Increased
* Decreased
* Remained the same

### 2017-2018

**Q36:** Did your school receive the following funding sources (non-DOE) to support arts education in this school year (*check all that apply*)?

(Yes/No radio buttons)

**Funding Sources**
* Cultural organizations
* Education association
* Federal, state, or city grants
* Local business or corporation
* Private foundation
* PTA/PA
* State, county, local arts councils

---

**Q37:** Funding for the arts is generally:

* Abundant
* Sufficient
* Insufficient
* N/A

---

**Q38:** Funding over the past three years has:

* Increased
* Decreased
* Remained the same

### 2018-2019

**Q32:** Did your school receive the following funding sources (non-DOE) to support arts education in this school year (*check all that apply*)?

(Yes/No radio buttons)

**Funding Sources**
* Cultural organizations
* Education association
* Federal, state, or city grants
* Local business or corporation
* Private foundation
* PTA/PA
* State, county, local arts councils

---

**Q33:** Funding for the arts is generally:

* Abundant
* Sufficient
* Insufficient
* N/A

---

**Q34:** Funding over the past three years has:

* Increased
* Decreased
* Remained the same

## Clean/Explore Dataset 1

In [5]:
#print(str(arts_2016_2017.columns))

# Q34, Q35, Q36
# # Note: for Q34, questions with the 'C1' and 'C2' suffixes are redundant because they are inverses of each other
# funding_2016_2017 = arts_2016_2017[['Q0_DBN', 
#                                     'Q34_R1_C1', 'Q34_R1_C2', 'Q34_R2_C1', 'Q34_R2_C2',
#                                     'Q34_R3_C1', 'Q34_R3_C2', 'Q34_R4_C1', 'Q34_R4_C2', 
#                                     'Q34_R5_C1','Q34_R5_C2', 'Q34_R6_C1', 'Q34_R6_C2',
#                                     'Q34_R7_C1', 'Q34_R7_C2',
#                                     'Q35_1', 'Q35_2', 'Q35_3', 'Q35_4', 
#                                     'Q36_1', 'Q36_2', 'Q36_3']]


# Q34, Q35, Q36
funding_2016_2017 = arts_2016_2017[['Q0_DBN',
                                    'Year',
                                    'Q34_R1_C1', 
                                    'Q34_R2_C1',
                                    'Q34_R3_C1',
                                    'Q34_R4_C1',
                                    'Q34_R5_C1',
                                    'Q34_R6_C1',
                                    'Q34_R7_C1', 
                                    'Q35_1', 'Q35_2', 'Q35_3', 'Q35_4', 
                                    'Q36_1', 'Q36_2', 'Q36_3']]

funding_2016_2017.head()

Unnamed: 0,Q0_DBN,Year,Q34_R1_C1,Q34_R2_C1,Q34_R3_C1,Q34_R4_C1,Q34_R5_C1,Q34_R6_C1,Q34_R7_C1,Q35_1,Q35_2,Q35_3,Q35_4,Q36_1,Q36_2,Q36_3
0,01M015,2017,1,0,1,1,1,0,1,0,0,1,0,0,0,1
1,01M019,2017,0,0,0,0,0,0,0,0,0,1,0,0,0,1
2,01M020,2017,0,0,1,0,0,1,0,0,0,1,0,0,1,0
3,01M034,2017,0,0,0,0,0,0,0,0,0,1,0,0,1,0
4,01M515,2017,1,0,1,0,0,0,0,0,0,1,0,0,1,0


In [6]:
# print(str(arts_2017_2018.columns))

# # Q36, Q37, Q38
# # Note: for Q36, questions with the 'C1' and 'C2' suffixes are redundant because they are inverses of each other
# funding_2017_2018 = arts_2017_2018[['Q0_DBN', 
#                                    'Q36_R1_C1','Q36_R1_C2',
#                                     'Q36_R2_C1', 'Q36_R2_C2', 
#                                     'Q36_R3_C1', 'Q36_R3_C2', 
#                                     'Q36_R4_C1', 'Q36_R4_C2',
#                                     'Q36_R5_C1', 'Q36_R5_C2', 
#                                     'Q36_R6_C1', 'Q36_R6_C2', 
#                                     'Q36_R7_C1','Q36_R7_C2', 
#                                     'Q37_1', 'Q37_2', 
#                                     'Q37_3', 'Q37_4', 
#                                     'Q38_1', 'Q38_2','Q38_3']]


# # Q36, Q37, Q38
funding_2017_2018 = arts_2017_2018[['Q0_DBN',
                                    'Year',
                                   'Q36_R1_C1',
                                    'Q36_R2_C1', 
                                    'Q36_R3_C1',
                                    'Q36_R4_C1',
                                    'Q36_R5_C1',
                                    'Q36_R6_C1',
                                    'Q36_R7_C1',
                                    'Q37_1', 'Q37_2', 'Q37_3', 'Q37_4', 
                                    'Q38_1', 'Q38_2','Q38_3']]

funding_2017_2018.head()

Unnamed: 0,Q0_DBN,Year,Q36_R1_C1,Q36_R2_C1,Q36_R3_C1,Q36_R4_C1,Q36_R5_C1,Q36_R6_C1,Q36_R7_C1,Q37_1,Q37_2,Q37_3,Q37_4,Q38_1,Q38_2,Q38_3
0,01M015,2018,1,1,1,1,1,0,1,0,0,1,0,0,0,1
1,01M019,2018,0,0,0,0,0,0,0,0,0,1,0,0,0,1
2,01M020,2018,0,0,0,0,0,0,0,0,0,1,0,0,1,0
3,01M034,2018,1,0,0,0,0,0,0,0,0,1,0,0,1,0
4,01M063,2018,0,0,1,0,0,1,1,0,0,1,0,0,1,0


In [7]:
#print(str(arts_2018_2019.columns))

# # Q32, Q33, Q34
# # Note: for Q32, questions with the 'C1' and 'C2' suffixes are redundant because they are inverses of each other
# funding_2018_2019 = arts_2018_2019[['Q0_DBN', 
#                                     ' Q32_R1_C1', ' Q32_R1_C2', ' Q32_R2_C1',
#                                     ' Q32_R2_C2', ' Q32_R3_C1', ' Q32_R3_C2', 
#                                     ' Q32_R4_C1', ' Q32_R4_C2',' Q32_R5_C1', 
#                                     ' Q32_R5_C2', ' Q32_R6_C1', ' Q32_R6_C2', 
#                                     ' Q32_R7_C1',' Q32_R7_C2', 
#                                     'Q33_1', 'Q33_2', 'Q33_3', 'Q33_4', 
#                                     'Q34_1', 'Q34_2','Q34_3']]


# Q32, Q33, Q34
funding_2018_2019 = arts_2018_2019[['Q0_DBN',
                                    'Year',
                                    ' Q32_R1_C1',
                                    ' Q32_R2_C1',
                                    ' Q32_R3_C1', 
                                    ' Q32_R4_C1',
                                    ' Q32_R5_C1', 
                                    ' Q32_R6_C1',
                                    ' Q32_R7_C1', 
                                    'Q33_1', 'Q33_2', 'Q33_3', 'Q33_4', 
                                    'Q34_1', 'Q34_2','Q34_3']]



funding_2018_2019.head()

Unnamed: 0,Q0_DBN,Year,Q32_R1_C1,Q32_R2_C1,Q32_R3_C1,Q32_R4_C1,Q32_R5_C1,Q32_R6_C1,Q32_R7_C1,Q33_1,Q33_2,Q33_3,Q33_4,Q34_1,Q34_2,Q34_3
0,01M015,2019,1,0,0,1,1,0,0,0,0,1,0,0,1,0
1,01M019,2019,1,0,0,0,0,0,0,0,0,1,0,0,0,1
2,01M020,2019,0,0,0,0,1,1,0,0,0,1,0,0,1,0
3,01M034,2019,1,0,0,0,0,0,0,0,0,1,0,0,1,0
4,01M063,2019,0,0,1,0,0,1,0,0,0,1,0,0,1,0
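
The 2018-2019 file's Q32 column names carry a leading space (`' Q32_R1_C1'`), which is why the selection above has to include it. One option, sketched here on a toy frame, is to normalize header whitespace right after loading. The helper name `strip_column_whitespace` is hypothetical, and this is only one way to handle it.

```python
import pandas as pd


def strip_column_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose string column names have no surrounding spaces."""
    return df.rename(columns=lambda name: name.strip() if isinstance(name, str) else name)


# Toy frame mimicking the 2018-2019 header quirk (values are made up)
toy = pd.DataFrame({"Q0_DBN": ["01M015"], " Q32_R1_C1": [1], "Q33_1": [0]})
cleaned = strip_column_whitespace(toy)
print(list(cleaned.columns))  # ['Q0_DBN', 'Q32_R1_C1', 'Q33_1']
```

If this were applied to `arts_2018_2019`, the column names used in the selection and rename cells would need the leading space removed as well.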


In [8]:
# Rename columns

funding_2016_2017 = funding_2016_2017.rename(columns={'Q0_DBN': 'DBN', 
                                    'Q34_R1_C1': 'Q1_cultural_org',
                                    'Q34_R2_C1': 'Q1_education_assoc',
                                    'Q34_R3_C1': 'Q1_grants', 
                                    'Q34_R4_C1': 'Q1_local_business',
                                    'Q34_R5_C1': 'Q1_private_foundation', 
                                    'Q34_R6_C1': 'Q1_PTA_PA',
                                    'Q34_R7_C1': 'Q1_arts_council', 
                                    'Q35_1': 'Q2_abundant', 'Q35_2': 'Q2_sufficient', 
                                    'Q35_3': 'Q2_insufficient', 'Q35_4': 'Q2_na', 
                                    'Q36_1': 'Q3_increased', 'Q36_2': 'Q3_decreased','Q36_3':'Q3_same'})
funding_2016_2017.head()

Unnamed: 0,DBN,Year,Q1_cultural_org,Q1_education_assoc,Q1_grants,Q1_local_business,Q1_private_foundation,Q1_PTA_PA,Q1_arts_council,Q2_abundant,Q2_sufficient,Q2_insufficient,Q2_na,Q3_increased,Q3_decreased,Q3_same
0,01M015,2017,1,0,1,1,1,0,1,0,0,1,0,0,0,1
1,01M019,2017,0,0,0,0,0,0,0,0,0,1,0,0,0,1
2,01M020,2017,0,0,1,0,0,1,0,0,0,1,0,0,1,0
3,01M034,2017,0,0,0,0,0,0,0,0,0,1,0,0,1,0
4,01M515,2017,1,0,1,0,0,0,0,0,0,1,0,0,1,0


In [9]:
# Rename columns

funding_2017_2018 = funding_2017_2018.rename(columns={'Q0_DBN': 'DBN', 
                                    'Q36_R1_C1': 'Q1_cultural_org',
                                    'Q36_R2_C1': 'Q1_education_assoc',
                                    'Q36_R3_C1': 'Q1_grants', 
                                    'Q36_R4_C1': 'Q1_local_business',
                                    'Q36_R5_C1': 'Q1_private_foundation', 
                                    'Q36_R6_C1': 'Q1_PTA_PA',
                                    'Q36_R7_C1': 'Q1_arts_council', 
                                    'Q37_1': 'Q2_abundant', 'Q37_2': 'Q2_sufficient', 
                                    'Q37_3': 'Q2_insufficient', 'Q37_4': 'Q2_na', 
                                    'Q38_1': 'Q3_increased', 'Q38_2': 'Q3_decreased','Q38_3':'Q3_same'})
funding_2017_2018.head()

Unnamed: 0,DBN,Year,Q1_cultural_org,Q1_education_assoc,Q1_grants,Q1_local_business,Q1_private_foundation,Q1_PTA_PA,Q1_arts_council,Q2_abundant,Q2_sufficient,Q2_insufficient,Q2_na,Q3_increased,Q3_decreased,Q3_same
0,01M015,2018,1,1,1,1,1,0,1,0,0,1,0,0,0,1
1,01M019,2018,0,0,0,0,0,0,0,0,0,1,0,0,0,1
2,01M020,2018,0,0,0,0,0,0,0,0,0,1,0,0,1,0
3,01M034,2018,1,0,0,0,0,0,0,0,0,1,0,0,1,0
4,01M063,2018,0,0,1,0,0,1,1,0,0,1,0,0,1,0


In [10]:
# Rename columns

funding_2018_2019 = funding_2018_2019.rename(columns={'Q0_DBN': 'DBN', 
                                    ' Q32_R1_C1': 'Q1_cultural_org',
                                    ' Q32_R2_C1': 'Q1_education_assoc',
                                    ' Q32_R3_C1': 'Q1_grants', 
                                    ' Q32_R4_C1': 'Q1_local_business',
                                    ' Q32_R5_C1': 'Q1_private_foundation', 
                                    ' Q32_R6_C1': 'Q1_PTA_PA',
                                    ' Q32_R7_C1': 'Q1_arts_council', 
                                    'Q33_1': 'Q2_abundant', 'Q33_2': 'Q2_sufficient', 
                                    'Q33_3': 'Q2_insufficient', 'Q33_4': 'Q2_na', 
                                    'Q34_1': 'Q3_increased', 'Q34_2': 'Q3_decreased','Q34_3':'Q3_same'})
funding_2018_2019.head()

Unnamed: 0,DBN,Year,Q1_cultural_org,Q1_education_assoc,Q1_grants,Q1_local_business,Q1_private_foundation,Q1_PTA_PA,Q1_arts_council,Q2_abundant,Q2_sufficient,Q2_insufficient,Q2_na,Q3_increased,Q3_decreased,Q3_same
0,01M015,2019,1,0,0,1,1,0,0,0,0,1,0,0,1,0
1,01M019,2019,1,0,0,0,0,0,0,0,0,1,0,0,0,1
2,01M020,2019,0,0,0,0,1,1,0,0,0,1,0,0,1,0
3,01M034,2019,1,0,0,0,0,0,0,0,0,1,0,0,1,0
4,01M063,2019,0,0,1,0,0,1,0,0,0,1,0,0,1,0
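
The three rename cells differ only in the survey question numbers, so the mapping could be generated rather than typed out. The sketch below is one possible helper; the name `build_rename_map` and the name lists are introduced here for illustration, and it assumes column names without leading whitespace (unlike the raw 2018-2019 headers).

```python
SOURCE_NAMES = ["cultural_org", "education_assoc", "grants", "local_business",
                "private_foundation", "PTA_PA", "arts_council"]
ADEQUACY_NAMES = ["abundant", "sufficient", "insufficient", "na"]
TREND_NAMES = ["increased", "decreased", "same"]


def build_rename_map(q_sources: int, q_adequacy: int, q_trend: int) -> dict:
    """Map one survey year's raw column names to the shared Q1/Q2/Q3 names."""
    mapping = {"Q0_DBN": "DBN"}
    for i, name in enumerate(SOURCE_NAMES, start=1):
        mapping[f"Q{q_sources}_R{i}_C1"] = f"Q1_{name}"
    for i, name in enumerate(ADEQUACY_NAMES, start=1):
        mapping[f"Q{q_adequacy}_{i}"] = f"Q2_{name}"
    for i, name in enumerate(TREND_NAMES, start=1):
        mapping[f"Q{q_trend}_{i}"] = f"Q3_{name}"
    return mapping


# 2016-2017 used Q34 (sources), Q35 (adequacy), Q36 (trend)
rename_2016_2017 = build_rename_map(34, 35, 36)
print(rename_2016_2017["Q34_R3_C1"], rename_2016_2017["Q36_3"])  # Q1_grants Q3_same
```

The resulting dictionary can be passed to `DataFrame.rename(columns=...)` in place of the hand-written mapping.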


In [11]:
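# Concatenate the three yearly dataframes to continue the cleaning process
# (DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent)

funding_df = pd.concat([funding_2016_2017, funding_2017_2018, funding_2018_2019])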

In [12]:
# count number of funding sources for Q1

funding_df['Q1_funding_sources'] = funding_df['Q1_cultural_org'] + funding_df['Q1_education_assoc'] + funding_df['Q1_grants'] + funding_df['Q1_local_business'] + funding_df['Q1_private_foundation'] + funding_df['Q1_PTA_PA'] + funding_df['Q1_arts_council']

# set labels for Q2
q2_cond1 = (funding_df['Q2_abundant'] == 1)
funding_df.loc[q2_cond1, 'Q2_label'] = 'abundant'

q2_cond2 = (funding_df['Q2_sufficient'] == 1)
funding_df.loc[q2_cond2, 'Q2_label'] = 'sufficient'

q2_cond3 = (funding_df['Q2_insufficient'] == 1)
funding_df.loc[q2_cond3, 'Q2_label'] = 'insufficient'

q2_cond4 = (funding_df['Q2_na'] == 1)
funding_df.loc[q2_cond4, 'Q2_label'] = 'na'


# set labels for Q3
q3_cond1 = (funding_df['Q3_increased'] == 1)
funding_df.loc[q3_cond1, 'Q3_label'] = 'increased'

q3_cond2 = (funding_df['Q3_decreased'] == 1)
funding_df.loc[q3_cond2, 'Q3_label'] = 'decreased'

q3_cond3 = (funding_df['Q3_same'] == 1)
funding_df.loc[q3_cond3, 'Q3_label'] = 'same'

funding_df = funding_df[['DBN', 'Year', 'Q1_funding_sources', 'Q2_label', 'Q3_label',
        'Q1_cultural_org', 'Q1_education_assoc', 'Q1_grants',
       'Q1_local_business', 'Q1_private_foundation', 'Q1_PTA_PA',
       'Q1_arts_council', 'Q2_abundant', 'Q2_sufficient', 'Q2_insufficient',
       'Q2_na', 'Q3_increased', 'Q3_decreased', 'Q3_same']]

# create smaller df with columns of interest

funding_analysis_df = funding_df[['DBN', 'Year', 'Q1_funding_sources', 'Q2_label', 'Q3_label']]

funding_analysis_df.head()

Unnamed: 0,DBN,Year,Q1_funding_sources,Q2_label,Q3_label
0,01M015,2017,5,insufficient,same
1,01M019,2017,0,insufficient,same
2,01M020,2017,2,insufficient,decreased
3,01M034,2017,0,insufficient,decreased
4,01M515,2017,2,insufficient,decreased
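
The label assignment above repeats one mask-and-assign pair per answer option. A more compact alternative, sketched here under the assumption that each row has at most one option flagged, collapses the one-hot columns with `idxmax`. The helper name `onehot_to_label` is hypothetical. Unlike the cell above, where a later option overwrites an earlier one, this sketch leaves rows with zero or multiple flags as missing.

```python
import pandas as pd


def onehot_to_label(df: pd.DataFrame, column_to_label: dict) -> pd.Series:
    """Collapse 0/1 indicator columns into a single label column."""
    cols = list(column_to_label)
    flags = df[cols].fillna(0).astype(int)
    # idxmax returns the first column holding the row maximum
    labels = flags.idxmax(axis=1).map(column_to_label)
    # keep a label only where exactly one option was flagged
    return labels.where(flags.sum(axis=1) == 1)


# Toy rows (made up): the last row has no option selected
toy = pd.DataFrame({
    "Q2_abundant":     [1, 0, 0, 0],
    "Q2_sufficient":   [0, 1, 0, 0],
    "Q2_insufficient": [0, 0, 1, 0],
    "Q2_na":           [0, 0, 0, 0],
})
q2_names = {"Q2_abundant": "abundant", "Q2_sufficient": "sufficient",
            "Q2_insufficient": "insufficient", "Q2_na": "na"}
q2_label = onehot_to_label(toy, q2_names)
print(q2_label.tolist())
```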


### Explore Data

In [13]:
# Note: we may want to use dummy coding when doing analyses like correlation and regression

# Q1: Type of funding sources by DBN

# Q1: Number of funding sources reported by DBN

funding_analysis_df.sort_values(by='Q1_funding_sources').head(20)

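# Q2: Funding rating (abundant, sufficient, insufficient, N/A)
# numeric_only=True keeps the mean from failing on the string columns in pandas 2.x

funding_analysis_df.groupby(['Q2_label', 'Year']).mean(numeric_only=True).reset_index()

# Q3: Increase/decrease/no change in funding

funding_analysis_df.groupby(['Q3_label', 'Year']).mean(numeric_only=True).reset_index()


# Group by the Q2 and Q3 labels (and year) to create "types"; the mean is the average number of Q1 funding sources
funding_analysis_df.groupby(['Q2_label', 'Q3_label', 'Year']).mean(numeric_only=True).reset_index()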

Unnamed: 0,Q2_label,Q3_label,Year,Q1_funding_sources
0,abundant,decreased,2017,2.0
1,abundant,increased,2017,2.266667
2,abundant,increased,2018,2.052632
3,abundant,increased,2019,2.285714
4,abundant,same,2017,1.818182
5,abundant,same,2018,1.444444
6,abundant,same,2019,1.235294
7,insufficient,decreased,2017,1.428994
8,insufficient,decreased,2018,1.481481
9,insufficient,decreased,2019,1.464819
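
The note about dummy coding at the top of this cell could be realized with `pd.get_dummies`. The sketch below uses a made-up three-row frame; dropping the first level of each label (`drop_first=True`) is one common choice for regression, not something the notebook prescribes.

```python
import pandas as pd

# Toy rows (hypothetical values) with the two categorical funding labels
toy = pd.DataFrame({
    "DBN": ["01M015", "01M019", "01M020"],
    "Q2_label": ["insufficient", "sufficient", "abundant"],
    "Q3_label": ["same", "decreased", "increased"],
})

# One 0/1 column per category, minus the alphabetically first level of each label
dummies = pd.get_dummies(toy[["Q2_label", "Q3_label"]],
                         prefix=["Q2", "Q3"], drop_first=True, dtype=int)
coded = pd.concat([toy[["DBN"]], dummies], axis=1)
print(coded.columns.tolist())
# ['DBN', 'Q2_insufficient', 'Q2_sufficient', 'Q3_increased', 'Q3_same']
```

Here `abundant` and `decreased` become the reference levels, so each coefficient in a later regression would be read relative to them.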


In [14]:
funding_analysis_df.groupby(['DBN', 'Year', 'Q2_label', 'Q3_label']).mean()

DBN,Year,Q2_label,Q3_label,Q1_funding_sources
01M015,2017,insufficient,same,5
01M015,2018,insufficient,same,6
01M015,2019,insufficient,decreased,3
01M019,2017,insufficient,same,0
01M019,2018,insufficient,same,0
...,...,...,...,...
75X754,2018,sufficient,same,0
75X754,2019,sufficient,same,0
75X811,2017,sufficient,decreased,0
75X811,2018,sufficient,increased,0


## Dataset 2: ELA and Math Data

### API

In [28]:
# define API URLs
math_url = 'https://data.cityofnewyork.us/resource/m27t-ht3h.json'
ela_url = 'https://data.cityofnewyork.us/resource/qkpp-pbi8.json'

In [30]:
# retrieve 2013-2018 School Math Results (a single request returns at most 1000 rows by default)
math_results = requests.get(math_url).json()

math_df = pd.DataFrame(math_results)
math_df.head()
print(f'Math Results Rows: {len(math_df)}')
print(list(math_df.columns))

Math Results Rows: 1000
['dbn', 'school_name', 'grade', 'year', 'category', 'number_tested', 'mean_scale_score', 'level_1', 'level_1_1', 'level_2', 'level_2_1', 'level_3', 'level_3_1', 'level_4', 'level_4_1', 'level_3_4', 'level_3_4_1']
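
The row count of exactly 1000 most likely reflects the Socrata API's default page size rather than the full dataset. A minimal sketch of paging with the `$limit` and `$offset` query parameters follows. The function name `fetch_all_rows` and the injectable `get_json` argument are introduced here for illustration, and ordering by the `:id` system field is included on the assumption that a stable order is wanted across pages.

```python
import requests


def fetch_all_rows(url, page_size=1000, get_json=None):
    """Request successive pages until a page comes back shorter than page_size."""
    if get_json is None:
        def get_json(u, params):
            response = requests.get(u, params=params, timeout=30)
            response.raise_for_status()
            return response.json()

    rows, offset = [], 0
    while True:
        params = {"$limit": page_size, "$offset": offset, "$order": ":id"}
        page = get_json(url, params)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


# Offline demonstration with an in-memory stand-in for the API (no network call)
fake_rows = [{"dbn": f"row{i}"} for i in range(2500)]
fake_get = lambda u, p: fake_rows[p["$offset"]: p["$offset"] + p["$limit"]]
print(len(fetch_all_rows("unused", page_size=1000, get_json=fake_get)))  # 2500
```

Against the real endpoint, `pd.DataFrame(fetch_all_rows(math_url))` would take the place of the single `requests.get` call.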


In [16]:
# # retrieve 2013-2018 School ELA Results
# ela_results = requests.get(ela_url).json()

# ela_df = pd.DataFrame(ela_results)
# ela_df.head()
# print(f'ELA Results Rows: {len(ela_df)}')
# print(list(ela_df.columns))
# print(ela_df['year'].unique())

ELA Results Rows: 1000
['dbn', 'school_name', 'grade', 'year', 'category', 'number_tested', 'mean_scale_score', 'level_1', 'level_1_1', 'level_2', 'level_2_1', 'level_3', 'level_3_1', 'level_4', 'level_4_1', 'level_3_4', 'level_3_4_1']
['2013' '2014' '2015' '2016' '2017' '2018']


In [54]:
# alternatively, load the full results from local .csv files
ela_df = pd.read_csv('2013-2019_school_ela_results.csv', low_memory=False)
math_df = pd.read_csv('2013-2019_school_math_results.csv', low_memory=False)


# rename the CSV columns to match the API field names
math_df = math_df.rename(columns = {
                'Unnamed: 0': 'Unnamed',
                'School Name':'school_name', 
                'Grade': 'grade', 
                'Year': 'year', 
                'Category': 'category',
                'Number Tested': 'number_tested', 
                'Mean Scale Score': 'mean_scale_score', 
                '# Level 1': 'level_1', 
                '% Level 1': 'level_1_1',
                '# Level 2': 'level_2',
                '% Level 2': 'level_2_1',
                '# Level 3': 'level_3',
                '% Level 3': 'level_3_1',
                '# Level 4': 'level_4',
                '% Level 4': 'level_4_1',
                '# Level 3+4':'level_3_4',
                '% Level 3+4':'level_3_4_1'})

ela_df = ela_df.rename(columns = {
                'Unnamed: 0': 'Unnamed',
                'School Name':'school_name', 
                'Grade': 'grade', 
                'Year': 'year', 
                'Category': 'category',
                'Number Tested': 'number_tested', 
                'Mean Scale Score': 'mean_scale_score', 
                '# Level 1': 'level_1', 
                '% Level 1': 'level_1_1',
                '# Level 2': 'level_2',
                '% Level 2': 'level_2_1',
                '# Level 3': 'level_3',
                '% Level 3': 'level_3_1',
                '# Level 4': 'level_4',
                '% Level 4': 'level_4_1',
                '# Level 3+4':'level_3_4',
                '% Level 3+4':'level_3_4_1'})

## Clean/Explore Dataset 2

In [55]:
# convert score columns to numeric in both dataframes (non-numeric entries become NaN)
cols = ['number_tested', 'mean_scale_score', 'level_1', 'level_1_1', 'level_2', 'level_2_1', 'level_3', 'level_3_1', 'level_4', 'level_4_1', 'level_3_4', 'level_3_4_1']
math_df[cols] = math_df[cols].apply(pd.to_numeric, errors='coerce')
ela_df[cols] = ela_df[cols].apply(pd.to_numeric, errors='coerce')

In [56]:
# look at math performance by year (numeric_only=True skips string columns such as school_name)
math_df.groupby(['year']).mean(numeric_only=True).sort_values(by='mean_scale_score', ascending = False).round()


year,number_tested,mean_scale_score,level_1,level_1_1,level_2,level_2_1,level_3,level_3_1,level_4,level_4_1,level_3_4,level_3_4_1
2019,154.0,599.0,44.0,31.0,38.0,25.0,34.0,22.0,38.0,22.0,72.0,44.0
2018,155.0,598.0,48.0,33.0,39.0,26.0,34.0,22.0,35.0,19.0,69.0,41.0
2017,158.0,301.0,50.0,34.0,47.0,30.0,33.0,20.0,29.0,16.0,62.0,36.0
2016,159.0,301.0,49.0,34.0,50.0,32.0,32.0,19.0,29.0,16.0,61.0,35.0
2014,166.0,301.0,52.0,34.0,54.0,33.0,35.0,20.0,25.0,13.0,60.0,34.0
2015,161.0,301.0,51.0,34.0,52.0,32.0,34.0,20.0,26.0,14.0,60.0,34.0
2013,174.0,298.0,59.0,37.0,60.0,34.0,34.0,18.0,21.0,11.0,55.0,29.0


In [63]:
# join the funding indicators to each subject's test results by school (DBN)
ela_funding = pd.merge(funding_analysis_df, ela_df, on = 'DBN')
math_funding = pd.merge(funding_analysis_df, math_df, on = 'DBN')

ela_funding.columns

Index(['DBN', 'Year', 'Q1_funding_sources', 'Q2_label', 'Q3_label', 'Unnamed',
       'school_name', 'grade', 'year', 'category', 'number_tested',
       'mean_scale_score', 'level_1', 'level_1_1', 'level_2', 'level_2_1',
       'level_3', 'level_3_1', 'level_4', 'level_4_1', 'level_3_4',
       'level_3_4_1'],
      dtype='object')
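
Because the merge above joins on `DBN` alone, every funding row for a school is paired with every test-result year for that school, which is why both `Year` and `year` appear in the result. If the intent is to compare funding and scores from the same year, one option is to join on both keys. The sketch below uses small made-up frames to show the difference in row counts.

```python
import pandas as pd

# Toy funding and score tables (values are hypothetical)
funding = pd.DataFrame({
    "DBN": ["01M015", "01M015", "01M019"],
    "Year": [2017, 2018, 2017],
    "Q1_funding_sources": [5, 6, 0],
})
scores = pd.DataFrame({
    "DBN": ["01M015", "01M015", "01M015", "01M019"],
    "year": [2016, 2017, 2018, 2017],
    "level_3_4_1": [40.0, 42.0, 45.0, 30.0],
})

# Joining on DBN only pairs each funding year with every score year
by_dbn = pd.merge(funding, scores, on="DBN")

# Joining on DBN and year keeps only same-year pairs
by_dbn_year = pd.merge(funding, scores,
                       left_on=["DBN", "Year"], right_on=["DBN", "year"])

print(len(by_dbn), len(by_dbn_year))  # 7 3
```

Since the test files also have `grade` and `category` columns, a school-year may still span several rows after this join; whether to filter or aggregate those is a separate decision.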

In [64]:
# grab pass rate data 
ela_pass_rate = ela_funding[['DBN', 'Year', 'Q1_funding_sources', 'Q2_label', 'Q3_label',
       'school_name', 'grade', 'year', 'category', 'number_tested','level_3_4',
       'level_3_4_1']]
math_pass_rate = math_funding[['DBN', 'Year', 'Q1_funding_sources', 'Q2_label', 'Q3_label',
       'school_name', 'grade', 'year', 'category', 'number_tested','level_3_4',
       'level_3_4_1']]

## Analysis

### Descriptives

In [67]:
# number of unique schools (DBNs) in the merged ELA data: 1116
sample_size = ela_pass_rate['DBN'].nunique()

In [None]:
# Summary table:
# (scalar values must be wrapped in a list, otherwise pandas raises a ValueError)

funding_summary = pd.DataFrame({'Sample size': [sample_size]})

### Statistical Tests

In [None]:
# T-Test: Is there a statistically significant difference in state test pass rates for schools that have self-reported good funding versus schools that have self-reported poor funding?

# ELA

# Measure 1: Sufficient/Insufficient
# ela_good_funding_measure1 =
# ela_poor_funding_measure1 =

# Measure 2: Increasing/decreasing
# ela_good_funding_measure2 =
# ela_poor_funding_measure2 =

# MATH

# Measure 1: Sufficient/Insufficient
# math_good_funding_measure1 =
# math_poor_funding_measure1 =

# Measure 2: Increasing/decreasing
# math_good_funding_measure2 =
# math_poor_funding_measure2 =
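
One way the placeholders above could be filled in is with an independent two-sample t-test on the pass-rate column, split by funding label. The sketch below runs on made-up values; the helper name `compare_pass_rates` is hypothetical, and using Welch's variant (`equal_var=False`) is an assumption rather than a choice stated in this notebook.

```python
import pandas as pd
from scipy.stats import ttest_ind


def compare_pass_rates(df, label_col, good, poor, rate_col="level_3_4_1"):
    """Welch's t-test on pass rates between two funding-label groups."""
    good_rates = df.loc[df[label_col] == good, rate_col].dropna()
    poor_rates = df.loc[df[label_col] == poor, rate_col].dropna()
    result = ttest_ind(good_rates, poor_rates, equal_var=False)
    return result.statistic, result.pvalue


# Toy data (hypothetical pass rates, in percent)
toy = pd.DataFrame({
    "Q2_label": ["sufficient"] * 4 + ["insufficient"] * 4,
    "level_3_4_1": [55.0, 60.0, 58.0, 62.0, 40.0, 45.0, 38.0, 47.0],
})
t_stat, p_value = compare_pass_rates(toy, "Q2_label", "sufficient", "insufficient")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The same call with `"Q3_label"`, `"increased"`, and `"decreased"` would cover Measure 2. The test treats rows as independent, so repeated rows per school (years, grades, categories) would need to be filtered or aggregated first.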

In [None]:
# Regression/Correlation: Is there a linear relationship between the number of funding sources and ELA/Math pass rate?
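
A minimal sketch of this step, using the `linregress` function already imported at the top of the notebook, is shown below on made-up data. The helper name `funding_vs_pass_rate` is hypothetical, and dropping rows with missing values before fitting is an assumption.

```python
import pandas as pd
from scipy.stats import linregress


def funding_vs_pass_rate(df, x_col="Q1_funding_sources", y_col="level_3_4_1"):
    """Simple linear regression of pass rate on number of funding sources."""
    clean = df[[x_col, y_col]].dropna()
    return linregress(clean[x_col], clean[y_col])


# Toy data (hypothetical); the last row is dropped because x is missing
toy = pd.DataFrame({
    "Q1_funding_sources": [0, 1, 2, 3, 4, None],
    "level_3_4_1": [30.0, 33.0, 33.0, 37.0, 38.0, 50.0],
})
fit = funding_vs_pass_rate(toy)
print(f"slope = {fit.slope:.2f}, r = {fit.rvalue:.2f}, p = {fit.pvalue:.4f}")
```

The returned object also exposes `intercept` and `stderr`, which would be needed to plot the fitted line or report uncertainty on the slope.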