- The school size and spending per student seems correlate. The larger the school size the lower passing rates and mean scores.
- Charter school has better performing than the district school. The top 5 perfroming schools are charter schools, and the bottom 5 are the district schools.
- Accoring to the math scores by grade and the reading scores by grade, you can see their mean scores stay consistent across grades.
# import files, create dataframe
import pandas as pd
csv_read = pd.read_csv("raw_data/schools_complete.csv")
school_df=pd.DataFrame(csv_read)
csvRead = pd.read_csv("raw_data/students_complete.csv")
student_df=pd.DataFrame(csvRead)
#rename school_df name to school:
school_df = school_df.rename(columns={"name":"school"})
school_df.head
<bound method NDFrame.head of School ID school type size budget
0 0 Huang High School District 2917 1910635
1 1 Figueroa High School District 2949 1884411
2 2 Shelton High School Charter 1761 1056600
3 3 Hernandez High School District 4635 3022020
4 4 Griffin High School Charter 1468 917500
5 5 Wilson High School Charter 2283 1319574
6 6 Cabrera High School Charter 1858 1081356
7 7 Bailey High School District 4976 3124928
8 8 Holden High School Charter 427 248087
9 9 Pena High School Charter 962 585858
10 10 Wright High School Charter 1800 1049400
11 11 Rodriguez High School District 3999 2547363
12 12 Johnson High School District 4761 3094650
13 13 Ford High School District 2739 1763916
14 14 Thomas High School Charter 1635 1043130>
# District Summary
total_school = school_df["school"].count()
#total students: merge 2 dataframe:
merge_df = pd.merge(school_df,student_df,on = "school")
total_students = merge_df["name"].count()
#total budget:
total_budget = school_df["budget"].sum()
#average math score, average reading score:
avg_math=merge_df["math_score"].mean()
avg_reading = merge_df["reading_score"].mean()
#%pass math, %pass reading
student_math= merge_df.loc[merge_df["math_score"]>=70]
pass_math = len(student_math)/total_students *100
student_reading = merge_df.loc[merge_df["reading_score"]>=70]
pass_reading = len(student_reading)/total_students *100
#overall passing rate
student_overall = merge_df.loc[(merge_df["math_score"]>=70)&(merge_df["reading_score"]>=70)]
overall= len(student_overall)/total_students *100
district_df = pd.DataFrame({"Total Schools":[total_school],
"Total Students":[total_students],
"Total Budget":[total_budget],
"Average Math Score":[avg_math],
"Average Reading Score":[avg_reading],
"% Passing Math":[pass_math],
"% Passing Reading":[pass_reading],
"% Overall Passing Rate":[overall]})
district_df = district_df[["Total Schools",
"Total Students",
"Total Budget",
"Average Math Score",
"Average Reading Score",
"% Passing Math",
"% Passing Reading",
"% Overall Passing Rate"]]
#format
district_df["Total Students"]=district_df["Total Students"].map("{:,}".format)
district_df["Total Budget"]=district_df["Total Budget"].map("${:,}".format)
district_df["Average Math Score"]=district_df["Average Math Score"].map("{:.2f}".format)
district_df["Average Reading Score"]=district_df["Average Reading Score"].map("{:.2f}".format)
district_df["% Passing Math"]=district_df["% Passing Math"].map("{:.2f}%".format)
district_df["% Passing Reading"]=district_df["% Passing Reading"].map("{:.2f}%".format)
district_df["% Overall Passing Rate"]=district_df["% Overall Passing Rate"].map("{:.2f}%".format)
district_df
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|
0 | 15 | 39,170 | $24,649,428 | 78.99 | 81.88 | 74.98% | 85.81% | 65.17% |
# set school as index in school_df, get the total budget, and type of schools
school_index = school_df.set_index("school")
sch_budget = school_index["budget"]
sch_type = school_index["type"]
# budget per student
student_budget = sch_budget / school_index["size"]
# set school as index, and then group the school for merge_df
index = merge_df.set_index('school')
grouped = index.groupby(['school'])
# avg math score, avg reading score by school
mean_math = grouped["math_score"].mean()
mean_reading = grouped["reading_score"].mean()
total_stu = grouped["name"].count()
#pass math
grouped_math = student_math.groupby("school")
pass_math_percent = grouped_math["name"].count() / total_stu *100
#pass reading
grouped_reading = student_reading.groupby("school")
pass_reading_percent = grouped_reading["name"].count() / total_stu *100
#overall passing
grouped_overall = student_overall.groupby("school")
overall_passing = grouped_overall["name"].count() / total_stu *100
sch_summary=pd.DataFrame({"School Type":sch_type,
"Total Students":total_stu,
"Total School Budget":sch_budget,
"Per Student Budget":student_budget,
"Average Math Score":mean_math,
"Average Reading Score":mean_reading,
"% Passing Math":pass_math_percent,
"% Passing Reading":pass_reading_percent,
"% Overall Passing Rate":overall_passing})
sch_summary = sch_summary[["School Type",
"Total Students",
"Total School Budget",
"Per Student Budget",
"Average Math Score",
"Average Reading Score",
"% Passing Math",
"% Passing Reading",
"% Overall Passing Rate"]]
sch_summary.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|---|
Bailey High School | District | 4976 | 3124928 | 628.0 | 77.048432 | 81.033963 | 66.680064 | 81.933280 | 54.642283 |
Cabrera High School | Charter | 1858 | 1081356 | 582.0 | 83.061895 | 83.975780 | 94.133477 | 97.039828 | 91.334769 |
Figueroa High School | District | 2949 | 1884411 | 639.0 | 76.711767 | 81.158020 | 65.988471 | 80.739234 | 53.204476 |
Ford High School | District | 2739 | 1763916 | 644.0 | 77.102592 | 80.746258 | 68.309602 | 79.299014 | 54.289887 |
Griffin High School | Charter | 1468 | 917500 | 625.0 | 83.351499 | 83.816757 | 93.392371 | 97.138965 | 90.599455 |
#Top Performing Schools (By Passing Rate)
top_performing = sch_summary.sort_values(["% Overall Passing Rate"],ascending = False)
top_performing.head(5)
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|---|
Cabrera High School | Charter | 1858 | 1081356 | 582.0 | 83.061895 | 83.975780 | 94.133477 | 97.039828 | 91.334769 |
Thomas High School | Charter | 1635 | 1043130 | 638.0 | 83.418349 | 83.848930 | 93.272171 | 97.308869 | 90.948012 |
Griffin High School | Charter | 1468 | 917500 | 625.0 | 83.351499 | 83.816757 | 93.392371 | 97.138965 | 90.599455 |
Wilson High School | Charter | 2283 | 1319574 | 578.0 | 83.274201 | 83.989488 | 93.867718 | 96.539641 | 90.582567 |
Pena High School | Charter | 962 | 585858 | 609.0 | 83.839917 | 84.044699 | 94.594595 | 95.945946 | 90.540541 |
#Bottom Performing Schools (By passing Rate)
bottom_performing = sch_summary.sort_values(["% Overall Passing Rate"])
bottom_performing.head(5)
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|---|
Rodriguez High School | District | 3999 | 2547363 | 637.0 | 76.842711 | 80.744686 | 66.366592 | 80.220055 | 52.988247 |
Figueroa High School | District | 2949 | 1884411 | 639.0 | 76.711767 | 81.158020 | 65.988471 | 80.739234 | 53.204476 |
Huang High School | District | 2917 | 1910635 | 655.0 | 76.629414 | 81.182722 | 65.683922 | 81.316421 | 53.513884 |
Hernandez High School | District | 4635 | 3022020 | 652.0 | 77.289752 | 80.934412 | 66.752967 | 80.862999 | 53.527508 |
Johnson High School | District | 4761 | 3094650 | 650.0 | 77.072464 | 80.966394 | 66.057551 | 81.222432 | 53.539172 |
#Math Scores by Grade
ninth = student_df.loc[student_df["grade"]=='9th'].groupby("school")
ninth_math = ninth["math_score"].mean()
tenth = student_df.loc[student_df["grade"]=='10th'].groupby("school")
tenth_math = tenth["math_score"].mean()
eleventh=student_df.loc[student_df["grade"]=='11th'].groupby("school")
eleventh_math = eleventh["math_score"].mean()
twelfth = student_df.loc[student_df["grade"]=='12th'].groupby("school")
twelfth_math = twelfth["math_score"].mean()
math_grade = pd.DataFrame({"9th":ninth_math,
"10th":tenth_math,
"11th":eleventh_math,
"12th":twelfth_math})
math_grade=math_grade[["9th","10th","11th","12th"]]
math_grade.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
9th | 10th | 11th | 12th | |
---|---|---|---|---|
school | ||||
Bailey High School | 77.083676 | 76.996772 | 77.515588 | 76.492218 |
Cabrera High School | 83.094697 | 83.154506 | 82.765560 | 83.277487 |
Figueroa High School | 76.403037 | 76.539974 | 76.884344 | 77.151369 |
Ford High School | 77.361345 | 77.672316 | 76.918058 | 76.179963 |
Griffin High School | 82.044010 | 84.229064 | 83.842105 | 83.356164 |
#Reading Score by Grade
nine = student_df.loc[student_df["grade"]=='9th'].groupby("school")
nine_reading = nine["reading_score"].mean()
ten = student_df.loc[student_df["grade"]=='10th'].groupby("school")
ten_reading = ten["reading_score"].mean()
eleven=student_df.loc[student_df["grade"]=='11th'].groupby("school")
eleven_reading = eleven["reading_score"].mean()
twelve = student_df.loc[student_df["grade"]=='12th'].groupby("school")
twelve_reading = twelve["reading_score"].mean()
reading_grade = pd.DataFrame({"9th":nine_reading,
"10th":ten_reading,
"11th":eleven_reading,
"12th":twelve_reading})
reading_grade=reading_grade[["9th","10th","11th","12th"]]
reading_grade.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
9th | 10th | 11th | 12th | |
---|---|---|---|---|
school | ||||
Bailey High School | 81.303155 | 80.907183 | 80.945643 | 80.912451 |
Cabrera High School | 83.676136 | 84.253219 | 83.788382 | 84.287958 |
Figueroa High School | 81.198598 | 81.408912 | 80.640339 | 81.384863 |
Ford High School | 80.632653 | 81.262712 | 80.403642 | 80.662338 |
Griffin High School | 83.369193 | 83.706897 | 84.288089 | 84.013699 |
#Scores by School Spending
score_summary = pd.DataFrame(sch_summary)
bins = [0,585,615,645,675]
group_names = ["< $585","$585-615","$615-645","$645-675"]
score_summary["Per Student Budget"]=pd.cut(score_summary["Per Student Budget"],bins,labels = group_names)
#rename per student budget to "Spending Ranges (Per Student)"
score = score_summary.rename(columns={"Per Student Budget":"Spending Ranges (Per Student)"})
#group the "Spending Ranges (Per Student)"
score = score.groupby(["Spending Ranges (Per Student)"])
score_spending = pd.DataFrame(score["Average Math Score"].mean())
score_spending["Average Reading Score"]=score["Average Reading Score"].mean()
score_spending["% Passing Math"]=score["% Passing Math"].mean()
score_spending["% Passing Reading"]=score["% Passing Reading"].mean()
score_spending["% Overall Passing Rate"]=score["% Overall Passing Rate"].mean()
score_spending
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|
Spending Ranges (Per Student) | |||||
< $585 | 83.455399 | 83.933814 | 93.460096 | 96.610877 | 90.369459 |
$585-615 | 83.599686 | 83.885211 | 94.230858 | 95.900287 | 90.216324 |
$615-645 | 79.079225 | 81.891436 | 75.668212 | 86.106569 | 66.112060 |
$645-675 | 76.997210 | 81.027843 | 66.164813 | 81.133951 | 53.526855 |
sch_size=pd.DataFrame(sch_summary)
#rename total students to "School Size"
sch_size = sch_size.rename(columns={"Total Students":"School Size"})
#bin number to groups_size
bins_size = [0,1000,2000,5000]
groups_size = ["Small (<1000)","Medium (1000-2000)","Large (2000-5000)"]
sch_size["School Size"]=pd.cut(sch_size["School Size"],bins_size,labels = groups_size)
#groupby school size
size_group = sch_size.groupby(["School Size"])
size_score = pd.DataFrame(size_group["Average Math Score"].mean())
size_score["Average Reading Score"]=size_group["Average Reading Score"].mean()
size_score["% Passing Math"]=size_group["% Passing Math"].mean()
size_score["% Passing Reading"]=size_group["% Passing Reading"].mean()
size_score["% Overall Passing Rate"]=size_group["% Overall Passing Rate"].mean()
size_score
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|
School Size | |||||
Small (<1000) | 83.821598 | 83.929843 | 93.550225 | 96.099437 | 89.883853 |
Medium (1000-2000) | 83.374684 | 83.864438 | 93.599695 | 96.790680 | 90.621535 |
Large (2000-5000) | 77.746417 | 81.344493 | 69.963361 | 82.766634 | 58.286003 |
#groupby school type
group_type = sch_summary.groupby(["School Type"])
type_score = pd.DataFrame(group_type["Average Math Score"].mean())
type_score["Average Reading Score"]=group_type["Average Reading Score"].mean()
type_score["% Passing Math"]=group_type["% Passing Math"].mean()
type_score["% Passing Reading"]=group_type["% Passing Reading"].mean()
type_score["% Overall Passing Rate"]=group_type["% Overall Passing Rate"].mean()
type_score
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|
School Type | |||||
Charter | 83.473852 | 83.896421 | 93.620830 | 96.586489 | 90.432244 |
District | 76.956733 | 80.966636 | 66.548453 | 80.799062 | 53.672208 |