# PyCity Schools Analysis

* Overall, schools with higher budgets did not yield better test results. In fact, schools spending more per student (\$645-675) underperformed schools spending less (<\$585 per student).

* Overall, small and medium-sized schools dramatically outperformed large schools on math passing rates (89-91% passing vs. 67%).

* Overall, charter schools outperformed public district schools across all metrics. However, further analysis is needed to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---
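The three findings above each come from grouping schools into bins and comparing passing rates. A minimal sketch of that binning with `pd.cut` follows. It assumes a per-school frame with hypothetical columns `per_student_budget`, `size`, `type`, and `pct_pass_math`; the bin edges, labels, and the four rows are illustrative assumptions, not the notebook's actual values.

```python
import pandas as pd

# Hypothetical per-school summary; all values are made up for illustration.
school_summary = pd.DataFrame({
    "school": ["A High", "B High", "C High", "D High"],
    "type": ["Charter", "Charter", "District", "District"],
    "size": [900, 1800, 3000, 4800],
    "per_student_budget": [580.0, 600.0, 640.0, 655.0],
    "pct_pass_math": [90.0, 89.0, 66.0, 64.0],
})

# Assumed bin edges; pd.cut uses right-closed intervals by default.
spending_bins = [0, 585, 615, 645, 675]
spending_labels = ["<$585", "$585-615", "$615-645", "$645-675"]
school_summary["spending_range"] = pd.cut(
    school_summary["per_student_budget"], bins=spending_bins, labels=spending_labels
)

size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
school_summary["size_range"] = pd.cut(
    school_summary["size"], bins=size_bins, labels=size_labels
)

# observed=True keeps only the bins that actually contain schools.
by_spending = school_summary.groupby("spending_range", observed=True)["pct_pass_math"].mean()
by_size = school_summary.groupby("size_range", observed=True)["pct_pass_math"].mean()
by_type = school_summary.groupby("type")["pct_pass_math"].mean()
```

These are unweighted means of school-level rates, so a 400-student school counts as much as a 5,000-student one. Weighting by `size` would be a reasonable variation if student-level rates per bin are wanted.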

In [1]:
# -*- coding: utf-8 -*-
"""
Created on Tue May 29 14:38:16 2018
@author: Shiva
"""
import os
#import csv
import pandas as pd
import numpy as np

#create path variable and assign relevant file names.
fp_students=os.path.join("raw_data","students_complete.csv")
fp_schools=os.path.join("raw_data","schools_complete.csv")

#read files into pandas data frames.
students_df = pd.read_csv(fp_students)
print(students_df.head(3))

schools_df  = pd.read_csv(fp_schools)
print(schools_df.head(3))



   Student ID             name gender grade             school  reading_score  \
0           0     Paul Bradley      M   9th  Huang High School             66   
1           1     Victor Smith      M  12th  Huang High School             94   
2           2  Kevin Rodriguez      M  12th  Huang High School             90   

   math_score  
0          79  
1          61  
2          60  
   School ID                  name      type  size   budget
0          0     Huang High School  District  2917  1910635
1          1  Figueroa High School  District  2949  1884411
2          2   Shelton High School   Charter  1761  1056600


In [2]:

student_count = students_df["Student ID"].count()
student_count


39170

In [3]:
school_count= schools_df["name"].count()
school_count

15

In [4]:
students_df.columns


Index(['Student ID', 'name', 'gender', 'grade', 'school', 'reading_score',
       'math_score'],
      dtype='object')

In [5]:
tot_budget = schools_df["budget"].sum()
tot_budget

24649428

In [6]:
mean_reading_score = students_df["reading_score"].mean()
mean_reading_score

81.87784018381414

In [7]:
mean_math_score = students_df["math_score"].mean()
mean_math_score

78.98537145774827

In [15]:
#passing is defined here as a score strictly above 70
pass_math_df_cnt = students_df.loc[students_df["math_score"] > 70, "Student ID"].count()
pass_math_df_cnt

28356

In [16]:
pass_math_percent = (pass_math_df_cnt / student_count)*100
pass_math_percent

72.39213683941792

In [17]:
#passing is defined here as a score strictly above 70
pass_reading_df_cnt = students_df.loc[students_df["reading_score"] > 70, "Student ID"].count()
pass_reading_df_cnt

32500

In [18]:
pass_reading_percent = (pass_reading_df_cnt / student_count)*100
pass_reading_percent


82.97166198621395

In [12]:
overall_pass_percent = (pass_math_percent + pass_reading_percent) / 2 
overall_pass_percent

77.68189941281594

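The pass-rate cells above count the passing rows and divide by the total. The same figures can be sketched more directly as the mean of a boolean mask. The helper name `pass_rate`, its `threshold` parameter, and the five-row frame are assumptions for illustration. The strict `>` comparison mirrors the notebook, so a score of exactly 70 does not count as passing.

```python
import pandas as pd

# Tiny stand-in for students_df; the scores are invented for illustration.
sample_df = pd.DataFrame({
    "math_score": [79, 61, 70, 88, 95],
    "reading_score": [66, 94, 90, 71, 75],
})

def pass_rate(scores: pd.Series, threshold: int = 70) -> float:
    """Percent of scores strictly above `threshold` (hypothetical helper)."""
    # The mean of a boolean Series is the fraction of True values.
    return float((scores > threshold).mean() * 100)

pass_math_percent = pass_rate(sample_df["math_score"])
pass_reading_percent = pass_rate(sample_df["reading_score"])
# Same definition as the notebook: the simple average of the two rates.
overall_pass_percent = (pass_math_percent + pass_reading_percent) / 2
```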
In [78]:
summary_dict={"Total Schools": [school_count], "Total Students": [student_count],"Total Budget":tot_budget,
              "Average Math Score":mean_math_score,"Average Reading Score":mean_reading_score,
              "% Passing Math":pass_math_percent,"% Passing Reading":pass_reading_percent,
              "Overall Passing Rate":overall_pass_percent
             }

summary_dist_df = pd.DataFrame(summary_dict,columns=["Total Schools","Total Students","Total Budget","Average Math Score",
                                                     "Average Reading Score","% Passing Math","% Passing Reading",
                                                     "Overall Passing Rate"
                                                    ])
summary_dist_df.head()

| | Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39170 | 24649428 | 78.985371 | 81.87784 | 72.392137 | 82.971662 | 77.681899 |


In [79]:
summary_dist_df["Total Students"] = summary_dist_df["Total Students"].map("{:,}".format)


## District Summary

In [80]:
summary_dist_df["Total Budget"]   = summary_dist_df["Total Budget"].map("${:,.2f}".format)
summary_dist_df.head()

| | Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39170 | \$24,649,428.00 | 78.985371 | 81.87784 | 72.392137 | 82.971662 | 77.681899 |

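One side effect of the `.map(...format)` calls above is that the formatted columns become strings, so later arithmetic on them no longer behaves numerically. A minimal sketch of one alternative is to format a display copy and leave the numeric frame intact. The names `numeric_df` and `display_df` are hypothetical; the values are the district totals shown above.

```python
import pandas as pd

numeric_df = pd.DataFrame({
    "Total Students": [39170],
    "Total Budget": [24649428],
})

# Format a copy so numeric_df stays usable for further calculations.
display_df = numeric_df.copy()
display_df["Total Students"] = display_df["Total Students"].map("{:,}".format)
display_df["Total Budget"] = display_df["Total Budget"].map("${:,.2f}".format)
```

With this split, `numeric_df["Total Budget"]` remains an integer column, while `display_df` holds only presentation strings.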

In [136]:
schools_ren_df = schools_df.rename(columns={"name": "school"})
#schools_ren_df = schools_ren_df.set_index("school")
del schools_ren_df["School ID"]
schools_ren_df.head(20)


| | school | type | size | budget |
|---|---|---|---|---|
| 0 | Huang High School | District | 2917 | 1910635 |
| 1 | Figueroa High School | District | 2949 | 1884411 |
| 2 | Shelton High School | Charter | 1761 | 1056600 |
| 3 | Hernandez High School | District | 4635 | 3022020 |
| 4 | Griffin High School | Charter | 1468 | 917500 |
| 5 | Wilson High School | Charter | 2283 | 1319574 |
| 6 | Cabrera High School | Charter | 1858 | 1081356 |
| 7 | Bailey High School | District | 4976 | 3124928 |
| 8 | Holden High School | Charter | 427 | 248087 |
| 9 | Pena High School | Charter | 962 | 585858 |


In [134]:
#mean of reading and math scores by school
#select the score columns explicitly: newer pandas versions raise on .mean() over text columns
students_df_groupsch_avg = students_df.groupby('school')[["reading_score", "math_score"]].mean()
students_df_groupsch_avg.reset_index(level=0, inplace=True)
students_df_groupsch_avg = students_df_groupsch_avg.rename(columns={"reading_score":"avg_reading_score",
                                                                    "math_score":"avg_math_score"})
students_df_groupsch_avg


| | school | avg_reading_score | avg_math_score |
|---|---|---|---|
| 0 | Bailey High School | 81.033963 | 77.048432 |
| 1 | Cabrera High School | 83.97578 | 83.061895 |
| 2 | Figueroa High School | 81.15802 | 76.711767 |
| 3 | Ford High School | 80.746258 | 77.102592 |
| 4 | Griffin High School | 83.816757 | 83.351499 |
| 5 | Hernandez High School | 80.934412 | 77.289752 |
| 6 | Holden High School | 83.814988 | 83.803279 |
| 7 | Huang High School | 81.182722 | 76.629414 |
| 8 | Johnson High School | 80.966394 | 77.072464 |
| 9 | Pena High School | 84.044699 | 83.839917 |


In [160]:
#total students by school.
students_df_groupsch_totstudents = students_df.groupby('school')["name"].count().reset_index()
students_df_groupsch_totstudents = students_df_groupsch_totstudents.rename(columns={"name":"Total_student_count"})
students_df_groupsch_totstudents


| | school | Total_student_count |
|---|---|---|
| 0 | Bailey High School | 4976 |
| 1 | Cabrera High School | 1858 |
| 2 | Figueroa High School | 2949 |
| 3 | Ford High School | 2739 |
| 4 | Griffin High School | 1468 |
| 5 | Hernandez High School | 4635 |
| 6 | Holden High School | 427 |
| 7 | Huang High School | 2917 |
| 8 | Johnson High School | 4761 |
| 9 | Pena High School | 962 |

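The per-school averages and student counts above are built in separate cells. As a sketch, both could come from a single `groupby` with named aggregation (available in pandas 0.25 and later). The two-school frame below is invented for illustration; the output column names follow the notebook's.

```python
import pandas as pd

# Invented student rows; only the column names match the notebook's data.
sample_df = pd.DataFrame({
    "name": ["s1", "s2", "s3", "s4", "s5"],
    "school": ["Huang High School", "Huang High School",
               "Pena High School", "Pena High School", "Pena High School"],
    "reading_score": [66, 94, 90, 80, 70],
    "math_score": [79, 61, 60, 90, 90],
})

# Each keyword is an output column: (source column, aggregation).
per_school = (
    sample_df.groupby("school")
    .agg(
        avg_reading_score=("reading_score", "mean"),
        avg_math_score=("math_score", "mean"),
        Total_student_count=("name", "count"),
    )
    .reset_index()
)
```

Because everything comes out of one grouped frame, the later merges on `school` for these columns would no longer be needed.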

In [173]:
#count of students passing math (score above 70) by school
school_stu_math_df = students_df.loc[students_df["math_score"] > 70,:]
school_stu_math_df_cnt = school_stu_math_df.groupby('school')["name"].count().reset_index()
school_stu_math_df_cnt = school_stu_math_df_cnt.rename(columns={"name":"pass_math_count"})
school_stu_math_df_cnt


| | school | pass_math_count |
|---|---|---|
| 0 | Bailey High School | 3216 |
| 1 | Cabrera High School | 1664 |
| 2 | Figueroa High School | 1880 |
| 3 | Ford High School | 1801 |
| 4 | Griffin High School | 1317 |
| 5 | Hernandez High School | 3001 |
| 6 | Holden High School | 387 |
| 7 | Huang High School | 1847 |
| 8 | Johnson High School | 3040 |
| 9 | Pena High School | 882 |


In [171]:
#count of students passing reading (score above 70) by school
school_stu_read_df = students_df.loc[students_df["reading_score"] > 70,:]
school_stu_read_df_cnt = school_stu_read_df.groupby('school')["name"].count().reset_index()
#rename column to pass reading count
school_stu_read_df_cnt = school_stu_read_df_cnt.rename(columns={"name":"pass_reading_cnt"})
school_stu_read_df_cnt

| | school | pass_reading_cnt |
|---|---|---|
| 0 | Bailey High School | 3946 |
| 1 | Cabrera High School | 1744 |
| 2 | Figueroa High School | 2313 |
| 3 | Ford High School | 2123 |
| 4 | Griffin High School | 1371 |
| 5 | Hernandez High School | 3624 |
| 6 | Holden High School | 396 |
| 7 | Huang High School | 2299 |
| 8 | Johnson High School | 3727 |
| 9 | Pena High School | 887 |

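The math and reading pass counts above each filter `students_df`, group, and rename. A sketch of computing both in one pass is to add boolean columns and sum them per school. The sample rows are invented; `pass_math_count` and `pass_reading_cnt` reuse the notebook's column names.

```python
import pandas as pd

# Invented student rows for two schools.
sample_df = pd.DataFrame({
    "school": ["Huang High School"] * 3 + ["Pena High School"] * 2,
    "math_score": [79, 61, 70, 88, 95],
    "reading_score": [66, 94, 90, 71, 70],
})

pass_counts = (
    sample_df.assign(
        pass_math_count=sample_df["math_score"] > 70,
        pass_reading_cnt=sample_df["reading_score"] > 70,
    )
    .groupby("school")[["pass_math_count", "pass_reading_cnt"]]
    .sum()  # True counts as 1, so the sum is the number of passing students
    .reset_index()
)
```

Unlike the filter-then-group approach, a school with no passing students still appears in this result, with a count of 0, instead of dropping out before the merge.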

In [137]:
#merge school info with average scores into the first summary dataframe

school_mrg1= pd.merge(schools_ren_df ,students_df_groupsch_avg, on="school")
school_mrg1

| | school | type | size | budget | avg_reading_score | avg_math_score |
|---|---|---|---|---|---|---|
| 0 | Huang High School | District | 2917 | 1910635 | 81.182722 | 76.629414 |
| 1 | Figueroa High School | District | 2949 | 1884411 | 81.15802 | 76.711767 |
| 2 | Shelton High School | Charter | 1761 | 1056600 | 83.725724 | 83.359455 |
| 3 | Hernandez High School | District | 4635 | 3022020 | 80.934412 | 77.289752 |
| 4 | Griffin High School | Charter | 1468 | 917500 | 83.816757 | 83.351499 |
| 5 | Wilson High School | Charter | 2283 | 1319574 | 83.989488 | 83.274201 |
| 6 | Cabrera High School | Charter | 1858 | 1081356 | 83.97578 | 83.061895 |
| 7 | Bailey High School | District | 4976 | 3124928 | 81.033963 | 77.048432 |
| 8 | Holden High School | Charter | 427 | 248087 | 83.814988 | 83.803279 |
| 9 | Pena High School | Charter | 962 | 585858 | 84.044699 | 83.839917 |


In [161]:
#merge the dataframe with total students into the above mrg1 dataframe
school_mrg2 = pd.merge(school_mrg1,students_df_groupsch_totstudents, on="school")
school_mrg2


| | school | type | size | budget | avg_reading_score | avg_math_score | Total_student_count |
|---|---|---|---|---|---|---|---|
| 0 | Huang High School | District | 2917 | 1910635 | 81.182722 | 76.629414 | 2917 |
| 1 | Figueroa High School | District | 2949 | 1884411 | 81.15802 | 76.711767 | 2949 |
| 2 | Shelton High School | Charter | 1761 | 1056600 | 83.725724 | 83.359455 | 1761 |
| 3 | Hernandez High School | District | 4635 | 3022020 | 80.934412 | 77.289752 | 4635 |
| 4 | Griffin High School | Charter | 1468 | 917500 | 83.816757 | 83.351499 | 1468 |
| 5 | Wilson High School | Charter | 2283 | 1319574 | 83.989488 | 83.274201 | 2283 |
| 6 | Cabrera High School | Charter | 1858 | 1081356 | 83.97578 | 83.061895 | 1858 |
| 7 | Bailey High School | District | 4976 | 3124928 | 81.033963 | 77.048432 | 4976 |
| 8 | Holden High School | Charter | 427 | 248087 | 83.814988 | 83.803279 | 427 |
| 9 | Pena High School | Charter | 962 | 585858 | 84.044699 | 83.839917 | 962 |

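Once the per-school frames are merged, the remaining summary columns are simple ratios. The sketch below works under stated assumptions: the rows are copied from the tables above, `validate="one_to_one"` is an added safeguard rather than something the notebook uses, and the derived column names `per_student_budget`, `pct_pass_math`, and `pct_pass_reading` are hypothetical.

```python
import pandas as pd

# Three schools copied from the school info and pass-count tables above.
schools = pd.DataFrame({
    "school": ["Huang High School", "Holden High School", "Pena High School"],
    "type": ["District", "Charter", "Charter"],
    "size": [2917, 427, 962],
    "budget": [1910635, 248087, 585858],
})
pass_counts = pd.DataFrame({
    "school": ["Holden High School", "Huang High School", "Pena High School"],
    "pass_math_count": [387, 1847, 882],
    "pass_reading_cnt": [396, 2299, 887],
})

# validate raises MergeError if a school name repeats on either side.
summary = pd.merge(schools, pass_counts, on="school", validate="one_to_one")

# Derived columns (names are assumptions, not taken from the notebook).
summary["per_student_budget"] = summary["budget"] / summary["size"]
summary["pct_pass_math"] = summary["pass_math_count"] / summary["size"] * 100
summary["pct_pass_reading"] = summary["pass_reading_cnt"] / summary["size"] * 100
```

For these three rows the per-student budgets work out to \$655, \$581, and \$609, which is the quantity the spending comparison in the summary is based on.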

Bailey High School       4976
Johnson High School      4761
Hernandez High School    4635
Rodriguez High School    3999
Figueroa High School     2949
Huang High School        2917
Ford High School         2739
Wilson High School       2283
Cabrera High School      1858
Wright High School       1800
Shelton High School      1761
Thomas High School       1635
Griffin High School      1468
Pena High School          962
Holden High School        427
Name: school, dtype: int64

school
Bailey High School       3946
Cabrera High School      1744
Figueroa High School     2313
Ford High School         2123
Griffin High School      1371
Hernandez High School    3624
Holden High School        396
Huang High School        2299
Johnson High School      3727
Pena High School          887
Rodriguez High School    3109
Shelton High School      1631
Thomas High School       1519
Wilson High School       2129
Wright High School       1682
Name: name, dtype: int64

## School Summary

## Top Performing Schools (By Passing Rate)

## Bottom Performing Schools (By Passing Rate)

## Math Scores by Grade

## Reading Score by Grade 

## Scores by School Spending

## Scores by School Size

## Scores by School Type