## **1. Explore the data**

In [29]:
# Import packages and load data
import numpy as np
import pandas as pd
schools = pd.read_csv("schools.csv")

schools.head(10)

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7
5,Bard High School Early College,Manhattan,M097,634,641,639,70.8
6,Urban Assembly Academy of Government and Law,Manhattan,M445,389,395,381,80.8
7,Marta Valle High School,Manhattan,M025,438,413,394,35.6
8,University Neighborhood High School,Manhattan,M446,437,355,352,69.9
9,New Design High School,Manhattan,M445,381,396,372,73.7


## **2. Finding missing values**
It looks like the first school in our database had no data in the percent_tested column!

Let's identify how many schools have missing data for this column, indicating schools that did not report the percentage of students tested.

To understand whether this missing data problem is widespread in New York, we will also calculate the total number of schools in the database.

In [8]:
# Count the schools with a missing percent_tested value
schools_null_tested = schools['percent_tested'].isnull().sum()
schools_null_tested

20

In [9]:
# Count all rows (schools); len() also counts rows that have missing values
num_schools = len(schools)
num_schools

375

In [10]:
# Fraction (not percentage) of schools with a missing percent_tested value
percent_missing = schools_null_tested/num_schools
percent_missing

0.05333333333333334
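
As a cross-check, the same fraction can be computed in one step, because the mean of a boolean mask is the share of `True` values. This is a minimal sketch that assumes only the ten `percent_tested` values displayed in section 1, so its result differs from the full-dataset figure above; the name `sample` is illustrative.

```python
import numpy as np
import pandas as pd

# percent_tested values copied from the ten rows displayed in section 1
sample = pd.DataFrame({
    "percent_tested": [np.nan, 78.9, 65.1, 95.9, 59.7, 70.8, 80.8, 35.6, 69.9, 73.7],
})

# isna() yields booleans; their mean is the fraction of missing values
share_missing = sample["percent_tested"].isna().mean()

# len() counts every row, whether or not it has missing values
n_rows = len(sample)

print(f"{share_missing:.1%} of {n_rows} sample rows are missing percent_tested")
```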

## **3. Schools by building code**
There are 20 schools with missing data for percent_tested, which is only about 5% (20 of 375) of all rows in the database.

Now let's turn our attention to how many schools there are. When we displayed the first ten rows of the database, several had the same value in the building_code column, suggesting there are multiple schools based in the same location. Let's find out how many unique school locations exist in our database.

In [14]:
# count the number of unique building_code
schools['building_code'].nunique()

233
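
To see which buildings are shared, rather than only how many distinct codes exist, `value_counts()` can tally schools per building. The sketch below assumes only the ten `building_code` values displayed in section 1; the names `codes` and `shared` are illustrative.

```python
import pandas as pd

# building_code values copied from the ten rows displayed in section 1
codes = pd.Series(
    ["M022", "M445", "M445", "M445", "M056", "M097", "M445", "M025", "M446", "M445"],
    name="building_code",
)

# Number of schools recorded for each building
schools_per_building = codes.value_counts()

# Keep only the buildings that host more than one school
shared = schools_per_building[schools_per_building > 1]
print(shared)
```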

## **4. Best schools for math**
The 375 schools share only 233 distinct building codes (about 62% as many codes as schools), so many buildings host more than one school!

Now let's start our analysis of school performance. As each school reports individually, we will treat them this way rather than grouping them by building_code.

First, let's find all schools with an average math score of at least 80% of the maximum 800 points, that is, 640 or higher.

In [77]:
# Select schools whose average math score is at least 80% of the 800-point maximum
best_math_schools = schools[schools['average_math'] >= 800*0.8][['school_name', 'average_math']].sort_values('average_math', ascending = False)
best_math_schools

Unnamed: 0,school_name,average_math
88,Stuyvesant High School,754
170,Bronx High School of Science,714
93,Staten Island Technical High School,711
365,Queens High School for the Sciences at York Co...,701
68,"High School for Mathematics, Science, and Engi...",683
280,Brooklyn Technical High School,682
333,Townsend Harris High School,680
174,High School of American Studies at Lehman College,669
0,"New Explorations into Science, Technology and ...",657
45,Eleanor Roosevelt High School,641
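
"At least 640" is an inclusive bound, so the choice of comparison operator matters for any school scoring exactly 640. The sketch below uses three hypothetical schools (invented purely to exercise the boundary, not taken from the dataset) to show that `>=` and `>` differ only at that point.

```python
import pandas as pd

# Hypothetical rows, chosen only to illustrate the boundary case
demo = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "average_math": [641, 640, 639],
})

threshold = 800 * 0.8  # 80% of the 800-point maximum

at_least = demo[demo["average_math"] >= threshold]       # keeps the school at 640
strictly_above = demo[demo["average_math"] > threshold]  # drops the school at 640

print(len(at_least), len(strictly_above))
```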


## **5. Lowest reading score**
Wow, there are only ten public schools in New York City with an average math score of at least 640!

Now let's look at the other end of the spectrum and find the single lowest score for reading. We will only select the score, not the school, to avoid naming and shaming!

In [15]:
# find the lowest average reading
lowest_avg_reading = schools['average_reading'].min()
lowest_avg_reading

302

In [16]:
# Express the lowest average reading score as a fraction of the 800-point maximum
percent_score = lowest_avg_reading/800
percent_score

0.3775

## **6. Best writing school**
The lowest average score for reading across schools in New York City is less than 40% of the total available points!

Now let's find the school with the highest average writing score.

In [19]:
# Select the school(s) whose average writing score equals the maximum
highest_avg_writing = schools[schools['average_writing'] == schools['average_writing'].max()][['school_name', 'average_writing']]
highest_avg_writing

Unnamed: 0,school_name,average_writing
88,Stuyvesant High School,693


## **7. Top 10 schools**
An average writing score of 693 is pretty impressive!

This top writing score was at the same school that got the top math score, Stuyvesant High School. Stuyvesant is widely known as a perennial top school in New York.

What other schools are also excellent across the board? Let's look at scores across reading, writing, and math to find out.

In [78]:
# Select the top 10 schools by combined score across the 3 SAT sections
schools['total_SAT'] = schools['average_math'] + schools['average_reading'] + schools['average_writing']
top_10_schools = schools.groupby('school_name', as_index = False)['total_SAT'].mean().sort_values('total_SAT', ascending = False)

top_10_schools = top_10_schools.head(10)
top_10_schools

Unnamed: 0,school_name,total_SAT
325,Stuyvesant High School,2144.0
324,Staten Island Technical High School,2041.0
55,Bronx High School of Science,2041.0
188,High School of American Studies at Lehman College,2013.0
334,Townsend Harris High School,1981.0
293,Queens High School for the Sciences at York Co...,1947.0
30,Bard High School Early College,1914.0
83,Brooklyn Technical High School,1896.0
121,Eleanor Roosevelt High School,1889.0
180,"High School for Mathematics, Science, and Engi...",1889.0
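
If school names are unique, the same ranking can be sketched without `groupby`, by summing the three section columns and sorting. The example assumes five of the rows displayed in section 1 (one name is truncated in that display and is kept as shown); `sample`, `score_cols`, and `top3` are illustrative names.

```python
import pandas as pd

# Five rows copied from the table displayed in section 1
sample = pd.DataFrame({
    "school_name": [
        "New Explorations into Science, Technology and ...",  # truncated in the display
        "Essex Street Academy",
        "High School for Dual Language and Asian Studies",
        "Bard High School Early College",
        "Marta Valle High School",
    ],
    "average_math": [657, 395, 613, 634, 438],
    "average_reading": [601, 411, 453, 641, 413],
    "average_writing": [601, 387, 463, 639, 394],
})

# Sum the three section averages row by row
score_cols = ["average_math", "average_reading", "average_writing"]
sample["total_SAT"] = sample[score_cols].sum(axis=1)

# Sort from highest to lowest combined score and keep the first three rows
top3 = sample.sort_values("total_SAT", ascending=False).head(3)[["school_name", "total_SAT"]]
print(top3)
```

Note that `sum(axis=1)` skips missing values by default, whereas adding the columns with `+` propagates them, so the two approaches can differ for schools with an unreported section score.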


## **8. Ranking boroughs**
There are four schools with a combined average SAT score above 2000 (out of a possible 2400)! Now let's analyze performance by New York City borough.

We will build a query that calculates the number of schools and the average SAT score per borough!

In [34]:
# Sum the three section averages into one combined score per school (same values as total_SAT)
schools['avg_total_score'] = schools['average_math'] + schools['average_reading'] + schools['average_writing']
schools.head(10)

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested,avg_total_score
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,,1859
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9,1193
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1,1261
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9,1529
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7,1197
5,Bard High School Early College,Manhattan,M097,634,641,639,70.8,1914
6,Urban Assembly Academy of Government and Law,Manhattan,M445,389,395,381,80.8,1165
7,Marta Valle High School,Manhattan,M025,438,413,394,35.6,1245
8,University Neighborhood High School,Manhattan,M446,437,355,352,69.9,1144
9,New Design High School,Manhattan,M445,381,396,372,73.7,1149


In [35]:
# find out average total SAT scores by borough
score_by_borough = schools.groupby('borough')['avg_total_score'].agg(['count', 'mean']).sort_values(by='mean', ascending=False)
score_by_borough

borough,count,mean
Staten Island,10,1439.0
Queens,69,1345.478261
Manhattan,89,1340.134831
Brooklyn,109,1230.256881
Bronx,98,1202.72449


## **9. Brooklyn numbers**
It appears that schools in Staten Island have, on average, the highest combined score across the three sections. However, there are only 10 schools in Staten Island, compared to an average of 91 schools in each of the other four boroughs!
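
The "average of 91" figure can be checked from the borough table alone. The sketch below copies the counts and means from that output and also derives a citywide mean weighted by school count; `by_borough`, `avg_other_count`, and `citywide_mean` are illustrative names, not part of the original analysis.

```python
import pandas as pd

# Counts and means copied from the borough table in section 8
by_borough = pd.DataFrame(
    {
        "count": [10, 69, 89, 109, 98],
        "mean": [1439.0, 1345.478261, 1340.134831, 1230.256881, 1202.72449],
    },
    index=["Staten Island", "Queens", "Manhattan", "Brooklyn", "Bronx"],
)

# Average number of schools per borough, excluding Staten Island
avg_other_count = by_borough.drop("Staten Island")["count"].mean()

# Citywide mean: weight each borough's mean by its number of schools
citywide_mean = (by_borough["count"] * by_borough["mean"]).sum() / by_borough["count"].sum()

print(avg_other_count, round(citywide_mean, 1))
```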

For our final query of the database, let's focus on Brooklyn, which has 109 schools. We wish to find the top five schools for math performance.

In [37]:
# Keep Brooklyn schools only, sorted from highest to lowest average math score
brooklyn_schools = schools[schools['borough'] == 'Brooklyn'][['school_name','average_math']].sort_values(by = 'average_math', ascending = False)
brooklyn_schools.head()

Unnamed: 0,school_name,average_math
280,Brooklyn Technical High School,682
237,Brooklyn Latin School,625
308,Leon M. Goldstein High School for the Sciences,563
275,Millennium Brooklyn High School,553
254,Midwood High School,550
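
An alternative to sorting the whole subset is `nlargest`, which returns only the top rows. This is a sketch on a handful of rows copied from tables shown earlier in this notebook (not the full dataset), so it takes the top three rather than the top five; `sample` and `top3_brooklyn` are illustrative names.

```python
import pandas as pd

# Rows copied from tables shown in this notebook (not the full dataset)
sample = pd.DataFrame({
    "school_name": [
        "Essex Street Academy",
        "Bard High School Early College",
        "Midwood High School",
        "Brooklyn Latin School",
        "Millennium Brooklyn High School",
        "Brooklyn Technical High School",
        "Leon M. Goldstein High School for the Sciences",
    ],
    "borough": [
        "Manhattan", "Manhattan", "Brooklyn", "Brooklyn",
        "Brooklyn", "Brooklyn", "Brooklyn",
    ],
    "average_math": [395, 634, 550, 625, 553, 682, 563],
})

# Filter to Brooklyn, then take the three highest math averages
top3_brooklyn = (
    sample.loc[sample["borough"] == "Brooklyn", ["school_name", "average_math"]]
    .nlargest(3, "average_math")
)
print(top3_brooklyn)
```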
