# PyCity Schools Analysis

* As a whole, schools with higher budgets did not yield better test results. In fact, schools with higher per-student spending (\$645-675) underperformed schools with lower per-student spending (<\$585).

* As a whole, small and medium-sized schools dramatically outperformed large schools on math passing rates (89-91% passing vs. 67%).

* As a whole, charter schools outperformed the public district schools across all metrics. However, more analysis is needed to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---

In [1]:
# -*- coding: utf-8 -*-
"""
Created on Tue May 29 14:38:16 2018
@author: Shiva
"""
import os
#import csv
import pandas as pd
import numpy as np

#create path variable and assign relevant file names.
fp_students=os.path.join("raw_data","students_complete.csv")
fp_schools=os.path.join("raw_data","schools_complete.csv")

#read files into pandas data frames.
students_df = pd.read_csv(fp_students)
print(students_df.head(3))

schools_df  = pd.read_csv(fp_schools)
print(schools_df.head(3))



   Student ID             name gender grade             school  reading_score  \
0           0     Paul Bradley      M   9th  Huang High School             66   
1           1     Victor Smith      M  12th  Huang High School             94   
2           2  Kevin Rodriguez      M  12th  Huang High School             90   

   math_score  
0          79  
1          61  
2          60  
   School ID                  name      type  size   budget
0          0     Huang High School  District  2917  1910635
1          1  Figueroa High School  District  2949  1884411
2          2   Shelton High School   Charter  1761  1056600


In [2]:

student_count = students_df["Student ID"].count()
student_count


39170

In [3]:
school_count= schools_df["name"].count()
school_count

15

In [4]:
students_df.columns


Index(['Student ID', 'name', 'gender', 'grade', 'school', 'reading_score',
       'math_score'],
      dtype='object')

In [5]:
tot_budget = schools_df["budget"].sum()
tot_budget

24649428

In [6]:
mean_reading_score = students_df["reading_score"].mean()
mean_reading_score

81.87784018381414

In [7]:
mean_math_score = students_df["math_score"].mean()
mean_math_score

78.98537145774827

In [15]:
# Passing is defined throughout this notebook as a score strictly above 70.
pass_math_df_cnt = students_df.loc[students_df["math_score"] > 70, "Student ID"].count()
pass_math_df_cnt

28356

In [16]:
pass_math_percent = (pass_math_df_cnt / student_count)*100
pass_math_percent

72.39213683941792

In [17]:
pass_reading_df_cnt = students_df.loc[students_df["reading_score"] > 70]["Student ID"].count()
pass_reading_df_cnt

32500

In [18]:
pass_reading_percent = (pass_reading_df_cnt / student_count)*100
pass_reading_percent


82.97166198621395

In [12]:
# Overall passing rate is taken as the simple average of the math and reading passing rates.
overall_pass_percent = (pass_math_percent + pass_reading_percent) / 2
overall_pass_percent

77.68189941281594
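
The passing percentages above can also be computed in one step by averaging a boolean mask, since `True` counts as 1 and `False` as 0. The following is a minimal sketch on a small made-up frame (`toy_df` and its scores are hypothetical, not the district data), keeping the notebook's strict `> 70` threshold:

```python
import pandas as pd

# Hypothetical toy data; not taken from students_complete.csv.
toy_df = pd.DataFrame({
    "math_score": [79, 61, 60, 84, 70],
    "reading_score": [66, 94, 90, 97, 71],
})

# The mean of a boolean Series is the fraction of True values.
toy_pass_math = (toy_df["math_score"] > 70).mean() * 100
toy_pass_reading = (toy_df["reading_score"] > 70).mean() * 100

# Same simple average the notebook uses for overall_pass_percent.
toy_overall = (toy_pass_math + toy_pass_reading) / 2
print(toy_pass_math, toy_pass_reading, toy_overall)
```

Note that a score of exactly 70 does not count as passing under this threshold.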

In [78]:
# Use the computed tot_budget rather than a hard-coded figure, and wrap every value in a
# one-element list so each column has the same length.
summary_dict = {"Total Schools": [school_count], "Total Students": [student_count], "Total Budget": [tot_budget],
                "Average Math Score": [mean_math_score], "Average Reading Score": [mean_reading_score],
                "% Passing Math": [pass_math_percent], "% Passing Reading": [pass_reading_percent],
                "Overall Passing Rate": [overall_pass_percent]
               }

# The columns list fixes the display order of the one-row summary.
summary_dist_df = pd.DataFrame(summary_dict, columns=["Total Schools", "Total Students", "Total Budget", "Average Math Score",
                                                      "Average Reading Score", "% Passing Math", "% Passing Reading",
                                                      "Overall Passing Rate"
                                                     ])
summary_dist_df.head()

| | Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39170 | 24649428 | 78.985371 | 81.87784 | 72.392137 | 82.971662 | 77.681899 |


In [79]:
summary_dist_df["Total Students"] = summary_dist_df["Total Students"].map("{:,}".format)


## District Summary

In [80]:
summary_dist_df["Total Budget"]   = summary_dist_df["Total Budget"].map("${:,.2f}".format)
summary_dist_df.head()

| | Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39170 | $24,649,428.00 | 78.985371 | 81.87784 | 72.392137 | 82.971662 | 77.681899 |


In [22]:
schools_ren_df = schools_df.rename(columns={"name": "school"})
schools_ren_df.head(10)


| | School ID | school | type | size | budget |
|---|---|---|---|---|---|
| 0 | 0 | Huang High School | District | 2917 | 1910635 |
| 1 | 1 | Figueroa High School | District | 2949 | 1884411 |
| 2 | 2 | Shelton High School | Charter | 1761 | 1056600 |
| 3 | 3 | Hernandez High School | District | 4635 | 3022020 |
| 4 | 4 | Griffin High School | Charter | 1468 | 917500 |
| 5 | 5 | Wilson High School | Charter | 2283 | 1319574 |
| 6 | 6 | Cabrera High School | Charter | 1858 | 1081356 |
| 7 | 7 | Bailey High School | District | 4976 | 3124928 |
| 8 | 8 | Holden High School | Charter | 427 | 248087 |
| 9 | 9 | Pena High School | Charter | 962 | 585858 |
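
Renaming `name` to `school` gives the schools frame a column with the same label as the one in the students frame. One plausible use, which the notebook itself does not show, is a merge on that shared key. The sketch below assumes that intent; it uses a few rows copied from the outputs above, and the names `toy_students`, `toy_schools`, and `toy_merged` are hypothetical:

```python
import pandas as pd

# A few student rows copied from the students output earlier in the notebook.
toy_students = pd.DataFrame({
    "Student ID": [0, 1, 2],
    "name": ["Paul Bradley", "Victor Smith", "Kevin Rodriguez"],
    "school": ["Huang High School"] * 3,
    "math_score": [79, 61, 60],
})

# Two school rows, already using the renamed "school" column.
toy_schools = pd.DataFrame({
    "school": ["Huang High School", "Figueroa High School"],
    "type": ["District", "District"],
    "size": [2917, 2949],
    "budget": [1910635, 1884411],
})

# A left merge keeps every student row and attaches the matching school attributes.
toy_merged = pd.merge(toy_students, toy_schools, how="left", on="school")
print(toy_merged)
```

A left merge is chosen here so that no student row is dropped if a school name fails to match.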


In [56]:
students_df_groupsch = students_df.groupby(['school'])
#students_df_groupsch.head(10)
#students_df_groupsch.reset_index(inplace=True)  # fails: a GroupBy object has no reset_index()
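
The commented-out `reset_index` call fails because `groupby` returns a `DataFrameGroupBy` object rather than a DataFrame. A minimal sketch, on hypothetical toy data, of aggregating first and then resetting the index so that `school` becomes a regular column again:

```python
import pandas as pd

# Hypothetical toy data; school names and scores are made up.
toy_df = pd.DataFrame({
    "school": ["A High School", "A High School", "B High School"],
    "reading_score": [80, 90, 70],
    "math_score": [60, 80, 75],
})

toy_grouped = toy_df.groupby("school")

# Aggregating produces a DataFrame indexed by school; reset_index() then
# moves "school" back into an ordinary column.
toy_means = toy_grouped[["reading_score", "math_score"]].mean().reset_index()
print(toy_means)
```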

In [94]:
students_df1 = students_df.set_index("school")
students_df1.head(25)
#students_df1.count()

| school | Student ID | name | gender | grade | reading_score | math_score |
|---|---|---|---|---|---|---|
| Huang High School | 0 | Paul Bradley | M | 9th | 66 | 79 |
| Huang High School | 1 | Victor Smith | M | 12th | 94 | 61 |
| Huang High School | 2 | Kevin Rodriguez | M | 12th | 90 | 60 |
| Huang High School | 3 | Dr. Richard Scott | M | 12th | 67 | 58 |
| Huang High School | 4 | Bonnie Ray | F | 9th | 97 | 84 |
| Huang High School | 5 | Bryan Miranda | M | 9th | 94 | 94 |
| Huang High School | 6 | Sheena Carter | F | 11th | 82 | 80 |
| Huang High School | 7 | Nicole Baker | F | 12th | 96 | 69 |
| Huang High School | 8 | Michael Roth | M | 10th | 95 | 87 |
| Huang High School | 9 | Matthew Greene | M | 10th | 96 | 84 |


In [97]:
students_df1.describe()
students_df1["grade"].value_counts() 

9th     11408
10th    10168
11th     9695
12th     7899
Name: grade, dtype: int64

In [52]:
#school_stu_avg_df = students_df_groupsch.mean()
#school_stu_avg_df
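# Select several columns from a GroupBy with a list (double brackets); the bare-tuple form is not supported in pandas 2.x.
school_stu_detail = students_df_groupsch[["reading_score", "math_score"]].mean()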
school_stu_detail

| school | reading_score | math_score |
|---|---|---|
| Bailey High School | 81.033963 | 77.048432 |
| Cabrera High School | 83.97578 | 83.061895 |
| Figueroa High School | 81.15802 | 76.711767 |
| Ford High School | 80.746258 | 77.102592 |
| Griffin High School | 83.816757 | 83.351499 |
| Hernandez High School | 80.934412 | 77.289752 |
| Holden High School | 83.814988 | 83.803279 |
| Huang High School | 81.182722 | 76.629414 |
| Johnson High School | 80.966394 | 77.072464 |
| Pena High School | 84.044699 | 83.839917 |


In [58]:
#determine count of students in each school.
school_stu_cnt_df = students_df["school"].value_counts()
#??school_stu_detail["Total Students"]= students_df_groupsch["school"].value_counts()
#school_stu_detail
school_stu_cnt_df

Bailey High School       4976
Johnson High School      4761
Hernandez High School    4635
Rodriguez High School    3999
Figueroa High School     2949
Huang High School        2917
Ford High School         2739
Wilson High School       2283
Cabrera High School      1858
Wright High School       1800
Shelton High School      1761
Thomas High School       1635
Griffin High School      1468
Pena High School          962
Holden High School        427
Name: school, dtype: int64

In [74]:
#students_df_groupsch.loc[students_df_groupsch["math_score" > 70]]
#??pass_math_df_cnt = students_df.loc[students_df["math_score"] > 70,:]["Student ID"].count()
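school_stu_math_df = students_df.loc[students_df["math_score"] > 70, :]

# reset_index() returns a new frame and keeps the old row labels in an "index" column;
# assigning the result avoids modifying a slice of students_df in place.
school_stu_math_df = school_stu_math_df.reset_index()
school_stu_math_df.head()
#school_stu_math_df.groupby("school")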

| | index | Student ID | name | gender | grade | school | reading_score | math_score |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | Paul Bradley | M | 9th | Huang High School | 66 | 79 |
| 1 | 4 | 4 | Bonnie Ray | F | 9th | Huang High School | 97 | 84 |
| 2 | 5 | 5 | Bryan Miranda | M | 9th | Huang High School | 94 | 94 |
| 3 | 6 | 6 | Sheena Carter | F | 11th | Huang High School | 82 | 80 |
| 4 | 8 | 8 | Michael Roth | M | 10th | Huang High School | 95 | 87 |


In [75]:
school_stu_read_df = students_df.loc[students_df["reading_score"] > 70,:]
school_stu_read_df.head()

| | Student ID | name | gender | grade | school | reading_score | math_score |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Victor Smith | M | 12th | Huang High School | 94 | 61 |
| 2 | 2 | Kevin Rodriguez | M | 12th | Huang High School | 90 | 60 |
| 4 | 4 | Bonnie Ray | F | 9th | Huang High School | 97 | 84 |
| 5 | 5 | Bryan Miranda | M | 9th | Huang High School | 94 | 94 |
| 6 | 6 | Sheena Carter | F | 11th | Huang High School | 82 | 80 |
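
The filtered frames above are a starting point for per-school passing rates. One possible route, sketched here on hypothetical toy data rather than the notebook's own frames, skips the filtered copies and averages pass/fail flags within each school (`toy_df`, `toy_flags`, and `toy_rates` are made-up names):

```python
import pandas as pd

# Hypothetical toy data; school names and scores are made up.
toy_df = pd.DataFrame({
    "school": ["A High School"] * 4 + ["B High School"] * 2,
    "math_score": [79, 61, 84, 70],
    "reading_score": [66, 94, 97, 71],
} if False else {
    "school": ["A High School"] * 4 + ["B High School"] * 2,
    "math_score": [79, 61, 84, 70, 90, 72],
    "reading_score": [66, 94, 97, 71, 65, 88],
})

# Flag each student as passing (1) or not (0), using the notebook's strict > 70 rule.
toy_flags = toy_df.assign(
    passed_math=(toy_df["math_score"] > 70).astype(int),
    passed_reading=(toy_df["reading_score"] > 70).astype(int),
)

# The mean of the 0/1 flags within each school is that school's passing fraction.
toy_rates = toy_flags.groupby("school")[["passed_math", "passed_reading"]].mean() * 100
print(toy_rates)
```

Because the flags are computed once on the full frame, the denominator for each school is its total student count, not the count of passing students.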


## School Summary

## Top Performing Schools (By Passing Rate)

## Bottom Performing Schools (By Passing Rate)

## Math Scores by Grade

## Reading Scores by Grade

## Scores by School Spending
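
One way this section could be filled in is with `pd.cut`, which assigns each school to a per-student spending range. The sketch below is an assumption about the approach, not the notebook's own code: the four rows are copied from the schools output earlier in the notebook, while the bin edges (including the intermediate 615 edge) and the names `toy_schools`, `spend_bins`, and `spend_labels` are hypothetical.

```python
import pandas as pd

# Rows copied from the schools table shown earlier in this notebook.
toy_schools = pd.DataFrame({
    "school": ["Huang High School", "Figueroa High School",
               "Holden High School", "Pena High School"],
    "size": [2917, 2949, 427, 962],
    "budget": [1910635, 1884411, 248087, 585858],
})
toy_schools["per_student_budget"] = toy_schools["budget"] / toy_schools["size"]

# Hypothetical bin edges and labels; pd.cut intervals are right-inclusive by default,
# so a value of exactly 585 would fall in the "<$585" bin.
spend_bins = [0, 585, 615, 645, 675]
spend_labels = ["<$585", "$585-615", "$615-645", "$645-675"]
toy_schools["spending_bin"] = pd.cut(
    toy_schools["per_student_budget"], bins=spend_bins, labels=spend_labels
)
print(toy_schools[["school", "per_student_budget", "spending_bin"]])
```

Grouping a per-school summary by `spending_bin` (for example with `groupby("spending_bin", observed=True)`) would then give average scores and passing rates per spending range.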

## Scores by School Size

## Scores by School Type