GitHub - weekysui/School-district-and-funding-report

PySchool Analysis:

The school size and spending per student seems correlate. The larger the school size the lower passing rates and mean scores.
Charter school has better performing than the district school. The top 5 perfroming schools are charter schools, and the bottom 5 are the district schools.
Accoring to the math scores by grade and the reading scores by grade, you can see their mean scores stay consistent across grades.

District summary

# import files, create dataframe
import pandas as pd
csv_read = pd.read_csv("raw_data/schools_complete.csv")
school_df=pd.DataFrame(csv_read)
csvRead = pd.read_csv("raw_data/students_complete.csv")
student_df=pd.DataFrame(csvRead)
#rename school_df name to school:
school_df = school_df.rename(columns={"name":"school"})
school_df.head

<bound method NDFrame.head of     School ID                 school      type  size   budget
0           0      Huang High School  District  2917  1910635
1           1   Figueroa High School  District  2949  1884411
2           2    Shelton High School   Charter  1761  1056600
3           3  Hernandez High School  District  4635  3022020
4           4    Griffin High School   Charter  1468   917500
5           5     Wilson High School   Charter  2283  1319574
6           6    Cabrera High School   Charter  1858  1081356
7           7     Bailey High School  District  4976  3124928
8           8     Holden High School   Charter   427   248087
9           9       Pena High School   Charter   962   585858
10         10     Wright High School   Charter  1800  1049400
11         11  Rodriguez High School  District  3999  2547363
12         12    Johnson High School  District  4761  3094650
13         13       Ford High School  District  2739  1763916
14         14     Thomas High School   Charter  1635  1043130>

# District Summary
total_school = school_df["school"].count()
#total students: merge 2 dataframe:
merge_df = pd.merge(school_df,student_df,on = "school")
total_students = merge_df["name"].count()
#total budget:
total_budget = school_df["budget"].sum()
#average math score, average reading score:
avg_math=merge_df["math_score"].mean()
avg_reading = merge_df["reading_score"].mean()
#%pass math, %pass reading
student_math= merge_df.loc[merge_df["math_score"]>=70]
pass_math = len(student_math)/total_students *100
student_reading = merge_df.loc[merge_df["reading_score"]>=70]
pass_reading = len(student_reading)/total_students *100
#overall passing rate
student_overall = merge_df.loc[(merge_df["math_score"]>=70)&(merge_df["reading_score"]>=70)]
overall= len(student_overall)/total_students *100

district_df = pd.DataFrame({"Total Schools":[total_school],
                           "Total Students":[total_students],
                           "Total Budget":[total_budget],
                           "Average Math Score":[avg_math],
                           "Average Reading Score":[avg_reading],
                           "% Passing Math":[pass_math],
                           "% Passing Reading":[pass_reading],
                           "% Overall Passing Rate":[overall]})
district_df = district_df[["Total Schools",
                           "Total Students",
                           "Total Budget",
                           "Average Math Score",
                           "Average Reading Score",
                           "% Passing Math",
                           "% Passing Reading",
                           "% Overall Passing Rate"]]
#format
district_df["Total Students"]=district_df["Total Students"].map("{:,}".format)
district_df["Total Budget"]=district_df["Total Budget"].map("${:,}".format)
district_df["Average Math Score"]=district_df["Average Math Score"].map("{:.2f}".format)
district_df["Average Reading Score"]=district_df["Average Reading Score"].map("{:.2f}".format)
district_df["% Passing Math"]=district_df["% Passing Math"].map("{:.2f}%".format)
district_df["% Passing Reading"]=district_df["% Passing Reading"].map("{:.2f}%".format)
district_df["% Overall Passing Rate"]=district_df["% Overall Passing Rate"].map("{:.2f}%".format)

district_df

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Total Schools	Total Students	Total Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
0	15	39,170	$24,649,428	78.99	81.88	74.98%	85.81%	65.17%

School Summary

# set school as index in school_df, get the total budget, and type of schools
school_index = school_df.set_index("school")
sch_budget = school_index["budget"]
sch_type = school_index["type"]
# budget per student
student_budget = sch_budget / school_index["size"]

# set school as index, and then group the school for merge_df
index = merge_df.set_index('school')
grouped = index.groupby(['school'])
# avg math score, avg reading score by school
mean_math = grouped["math_score"].mean()
mean_reading = grouped["reading_score"].mean()
total_stu = grouped["name"].count()
#pass math
grouped_math = student_math.groupby("school")
pass_math_percent = grouped_math["name"].count() / total_stu *100
#pass reading
grouped_reading = student_reading.groupby("school")
pass_reading_percent = grouped_reading["name"].count() / total_stu *100

#overall passing
grouped_overall = student_overall.groupby("school")
overall_passing = grouped_overall["name"].count() / total_stu *100

sch_summary=pd.DataFrame({"School Type":sch_type,
                         "Total Students":total_stu,
                         "Total School Budget":sch_budget,
                         "Per Student Budget":student_budget,
                         "Average Math Score":mean_math,
                         "Average Reading Score":mean_reading,
                         "% Passing Math":pass_math_percent,
                         "% Passing Reading":pass_reading_percent,
                         "% Overall Passing Rate":overall_passing})
sch_summary = sch_summary[["School Type",
                           "Total Students",
                           "Total School Budget",
                           "Per Student Budget",
                           "Average Math Score",
                           "Average Reading Score",
                           "% Passing Math",
                           "% Passing Reading",
                           "% Overall Passing Rate"]]

sch_summary.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
Bailey High School	District	4976	3124928	628.0	77.048432	81.033963	66.680064	81.933280	54.642283
Cabrera High School	Charter	1858	1081356	582.0	83.061895	83.975780	94.133477	97.039828	91.334769
Figueroa High School	District	2949	1884411	639.0	76.711767	81.158020	65.988471	80.739234	53.204476
Ford High School	District	2739	1763916	644.0	77.102592	80.746258	68.309602	79.299014	54.289887
Griffin High School	Charter	1468	917500	625.0	83.351499	83.816757	93.392371	97.138965	90.599455

Top Performing Schools (By Passing Rate)

#Top Performing Schools (By Passing Rate)
top_performing = sch_summary.sort_values(["% Overall Passing Rate"],ascending = False)
top_performing.head(5)

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
Cabrera High School	Charter	1858	1081356	582.0	83.061895	83.975780	94.133477	97.039828	91.334769
Thomas High School	Charter	1635	1043130	638.0	83.418349	83.848930	93.272171	97.308869	90.948012
Griffin High School	Charter	1468	917500	625.0	83.351499	83.816757	93.392371	97.138965	90.599455
Wilson High School	Charter	2283	1319574	578.0	83.274201	83.989488	93.867718	96.539641	90.582567
Pena High School	Charter	962	585858	609.0	83.839917	84.044699	94.594595	95.945946	90.540541

Bottom Performing Schools (By Passing Rate)

#Bottom Performing Schools (By passing Rate)
bottom_performing = sch_summary.sort_values(["% Overall Passing Rate"])
bottom_performing.head(5)

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
Rodriguez High School	District	3999	2547363	637.0	76.842711	80.744686	66.366592	80.220055	52.988247
Figueroa High School	District	2949	1884411	639.0	76.711767	81.158020	65.988471	80.739234	53.204476
Huang High School	District	2917	1910635	655.0	76.629414	81.182722	65.683922	81.316421	53.513884
Hernandez High School	District	4635	3022020	652.0	77.289752	80.934412	66.752967	80.862999	53.527508
Johnson High School	District	4761	3094650	650.0	77.072464	80.966394	66.057551	81.222432	53.539172

Math scores by grade

#Math Scores by Grade
ninth = student_df.loc[student_df["grade"]=='9th'].groupby("school")
ninth_math = ninth["math_score"].mean()
tenth = student_df.loc[student_df["grade"]=='10th'].groupby("school")
tenth_math = tenth["math_score"].mean()
eleventh=student_df.loc[student_df["grade"]=='11th'].groupby("school")
eleventh_math = eleventh["math_score"].mean()
twelfth = student_df.loc[student_df["grade"]=='12th'].groupby("school")
twelfth_math = twelfth["math_score"].mean()

math_grade = pd.DataFrame({"9th":ninth_math,
                          "10th":tenth_math,
                          "11th":eleventh_math,
                          "12th":twelfth_math})
math_grade=math_grade[["9th","10th","11th","12th"]]

math_grade.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	9th	10th	11th	12th
school
Bailey High School	77.083676	76.996772	77.515588	76.492218
Cabrera High School	83.094697	83.154506	82.765560	83.277487
Figueroa High School	76.403037	76.539974	76.884344	77.151369
Ford High School	77.361345	77.672316	76.918058	76.179963
Griffin High School	82.044010	84.229064	83.842105	83.356164

Reading Scores by Grade

#Reading Score by Grade
nine = student_df.loc[student_df["grade"]=='9th'].groupby("school")
nine_reading = nine["reading_score"].mean()
ten  = student_df.loc[student_df["grade"]=='10th'].groupby("school")
ten_reading = ten["reading_score"].mean()
eleven=student_df.loc[student_df["grade"]=='11th'].groupby("school")
eleven_reading = eleven["reading_score"].mean()
twelve = student_df.loc[student_df["grade"]=='12th'].groupby("school")
twelve_reading = twelve["reading_score"].mean()

reading_grade = pd.DataFrame({"9th":nine_reading,
                          "10th":ten_reading,
                          "11th":eleven_reading,
                          "12th":twelve_reading})
reading_grade=reading_grade[["9th","10th","11th","12th"]]

reading_grade.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	9th	10th	11th	12th
school
Bailey High School	81.303155	80.907183	80.945643	80.912451
Cabrera High School	83.676136	84.253219	83.788382	84.287958
Figueroa High School	81.198598	81.408912	80.640339	81.384863
Ford High School	80.632653	81.262712	80.403642	80.662338
Griffin High School	83.369193	83.706897	84.288089	84.013699

Scores by School Spending

#Scores by School Spending
score_summary = pd.DataFrame(sch_summary)
bins = [0,585,615,645,675]
group_names = ["< $585","$585-615","$615-645","$645-675"]
score_summary["Per Student Budget"]=pd.cut(score_summary["Per Student Budget"],bins,labels = group_names)

#rename per student budget to "Spending Ranges (Per Student)"
score = score_summary.rename(columns={"Per Student Budget":"Spending Ranges (Per Student)"})

#group the "Spending Ranges (Per Student)"
score = score.groupby(["Spending Ranges (Per Student)"])
score_spending = pd.DataFrame(score["Average Math Score"].mean())
score_spending["Average Reading Score"]=score["Average Reading Score"].mean()
score_spending["% Passing Math"]=score["% Passing Math"].mean()
score_spending["% Passing Reading"]=score["% Passing Reading"].mean()
score_spending["% Overall Passing Rate"]=score["% Overall Passing Rate"].mean()

score_spending

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
Spending Ranges (Per Student)
< $585	83.455399	83.933814	93.460096	96.610877	90.369459
$585-615	83.599686	83.885211	94.230858	95.900287	90.216324
$615-645	79.079225	81.891436	75.668212	86.106569	66.112060
$645-675	76.997210	81.027843	66.164813	81.133951	53.526855

Scores by School Size

sch_size=pd.DataFrame(sch_summary)
#rename total students to "School Size"
sch_size = sch_size.rename(columns={"Total Students":"School Size"})
#bin number to groups_size
bins_size = [0,1000,2000,5000]
groups_size = ["Small (<1000)","Medium (1000-2000)","Large (2000-5000)"]
sch_size["School Size"]=pd.cut(sch_size["School Size"],bins_size,labels = groups_size)
#groupby school size
size_group = sch_size.groupby(["School Size"])
size_score = pd.DataFrame(size_group["Average Math Score"].mean())
size_score["Average Reading Score"]=size_group["Average Reading Score"].mean()
size_score["% Passing Math"]=size_group["% Passing Math"].mean()
size_score["% Passing Reading"]=size_group["% Passing Reading"].mean()
size_score["% Overall Passing Rate"]=size_group["% Overall Passing Rate"].mean()

size_score

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
School Size
Small (<1000)	83.821598	83.929843	93.550225	96.099437	89.883853
Medium (1000-2000)	83.374684	83.864438	93.599695	96.790680	90.621535
Large (2000-5000)	77.746417	81.344493	69.963361	82.766634	58.286003

Scores by School Type

#groupby school type
group_type = sch_summary.groupby(["School Type"])
type_score = pd.DataFrame(group_type["Average Math Score"].mean())
type_score["Average Reading Score"]=group_type["Average Reading Score"].mean()
type_score["% Passing Math"]=group_type["% Passing Math"].mean()
type_score["% Passing Reading"]=group_type["% Passing Reading"].mean()
type_score["% Overall Passing Rate"]=group_type["% Overall Passing Rate"].mean()
type_score

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
School Type
Charter	83.473852	83.896421	93.620830	96.586489	90.432244
District	76.956733	80.966636	66.548453	80.799062	53.672208

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.gitattributes		.gitattributes
PyCity.ipynb		PyCity.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PySchool Analysis:

District summary

School Summary

Top Performing Schools (By Passing Rate)

Bottom Performing Schools (By Passing Rate)

Math scores by grade

Reading Scores by Grade

Scores by School Spending

Scores by School Size

Scores by School Type

About

Releases

Packages

Languages

weekysui/School-district-and-funding-report

Folders and files

Latest commit

History

Repository files navigation

PySchool Analysis:

District summary

School Summary

Top Performing Schools (By Passing Rate)

Bottom Performing Schools (By Passing Rate)

Math scores by grade

Reading Scores by Grade

Scores by School Spending

Scores by School Size

Scores by School Type

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages