# 🧠 Day 3 – SQL via Python: NYC School Data Exploration
In this notebook, you'll connect to a PostgreSQL database and execute SQL queries to explore NYC school data.

Database tables:

- high_school_directory – School names, locations, types, programs
- school_demographics – Enrollment data, ELL, FRPL, disabilities, etc.
- school_safety_report – Reported incidents by type and location

## 🔌 Step 1: Import Libraries

In [54]:
import pandas as pd
import psycopg2

## 🔐 Step 2: Connect to the Database

In [55]:
# DB connection setup using hardcoded credentials (for onboarding only)
conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password="npg_CeS9fJg2azZD",
    host="ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech",
    port="5432",
    sslmode="require"
)
cur = conn.cursor()
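
Hardcoded credentials are tolerable for a throwaway onboarding database, but for anything shared it is usually safer to keep them out of the notebook. The following is a minimal sketch, assuming the credentials are exported as the standard libpq environment variables (`PGHOST`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`, optionally `PGPORT` and `PGSSLMODE`). The helper name `load_db_config` is hypothetical and not part of the original notebook.

```python
import os


def load_db_config(env=None):
    """Build keyword arguments for psycopg2.connect() from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    # Map psycopg2.connect() keyword -> environment variable name.
    required = {
        "host": "PGHOST",
        "dbname": "PGDATABASE",
        "user": "PGUSER",
        "password": "PGPASSWORD",
    }
    missing = [var for var in required.values() if var not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    config = {key: env[var] for key, var in required.items()}
    config["port"] = env.get("PGPORT", "5432")           # default PostgreSQL port
    config["sslmode"] = env.get("PGSSLMODE", "require")  # same mode as the cell above
    return config


# Usage (assumes the variables are set in the shell that launched Jupyter):
# conn = psycopg2.connect(**load_db_config())
```

With this approach the connection cell stays the same apart from the arguments, and the password never appears in the saved notebook.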

## 🔍 Step 3: Run a Test Query

In [56]:
query = "SELECT * FROM nyc_schools.high_school_directory LIMIT 5;"
df = pd.read_sql(query, conn)
df.head()


| | dbn | school_name | borough | building_code | phone_number | fax_number | grade_span_min | grade_span_max | expgrade_span_min | expgrade_span_max | ... | number_programs | Location 1 | Community Board | Council District | Census Tract | Zip Codes | Community Districts | Borough Boundaries | City Council Districts | Police Precincts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27Q260 | Frederick Douglass Academy VI High School | Queens | Q465 | 718-471-2154 | 718-471-2890 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.601989336', 'longitude': '-73... | 14 | 31 | 100802 | 20529 | 51 | 3 | 47 | 59 |
| 1 | 21K559 | Life Academy High School for Film and Music | Brooklyn | K400 | 718-333-7750 | 718-333-7775 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.593593811', 'longitude': '-73... | 13 | 47 | 306 | 17616 | 21 | 2 | 45 | 35 |
| 2 | 16K393 | Frederick Douglass Academy IV Secondary School | Brooklyn | K026 | 718-574-2820 | 718-574-2821 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.692133704', 'longitude': '-73... | 3 | 36 | 291 | 18181 | 69 | 2 | 49 | 52 |
| 3 | 08X305 | Pablo Neruda Academy | Bronx | X450 | 718-824-1682 | 718-824-1663 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.822303765', 'longitude': '-73... | 9 | 18 | 16 | 11611 | 58 | 5 | 31 | 26 |
| 4 | 03M485 | Fiorello H. LaGuardia High School of Music & A... | Manhattan | M485 | 212-496-0700 | 212-724-5748 | 9.0 | 12 | | | ... | 6 | {'latitude': '40.773670507', 'longitude': '-73... | 7 | 6 | 151 | 12420 | 20 | 4 | 19 | 12 |


## ✅ How many schools are there in each borough?

In [57]:
# Example: Count schools by borough
query = """
SELECT borough, COUNT(*) AS school_count
FROM nyc_schools.high_school_directory
GROUP BY borough;
"""
df_result = pd.read_sql(query, conn)
df_result


| | borough | school_count |
|---|---|---|
| 0 | Brooklyn | 121 |
| 1 | Queens | 80 |
| 2 | Staten Island | 10 |
| 3 | Manhattan | 106 |
| 4 | Bronx | 118 |


In [58]:
# Joining Tables: High School Directory + School Demographics
query = """
SELECT * FROM nyc_schools.high_school_directory
LEFT JOIN nyc_schools.school_demographics
ON nyc_schools.high_school_directory.dbn = nyc_schools.school_demographics.dbn;
"""
df_dir_demo = pd.read_sql(query, conn)
df_dir_demo


| | dbn | school_name | borough | building_code | phone_number | fax_number | grade_span_min | grade_span_max | expgrade_span_min | expgrade_span_max | ... | black_num | black_per | hispanic_num | hispanic_per | white_num | white_per | male_num | male_per | female_num | female_per |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01M292 | Henry Street School for International Studies | Manhattan | M056 | 212-406-9411 | 212-406-9417 | 6.0 | 12 | | | ... | 106.0 | 36.1 | 133.0 | 45.2 | 10.0 | 3.4 | 160.0 | 54.4 | 134.0 | 45.6 |
| 1 | 01M292 | Henry Street School for International Studies | Manhattan | M056 | 212-406-9411 | 212-406-9417 | 6.0 | 12 | | | ... | 137.0 | 31.6 | 208.0 | 47.9 | 14.0 | 3.2 | 241.0 | 55.5 | 193.0 | 44.5 |
| 2 | 01M292 | Henry Street School for International Studies | Manhattan | M056 | 212-406-9411 | 212-406-9417 | 6.0 | 12 | | | ... | 158.0 | 30.7 | 272.0 | 52.8 | 12.0 | 2.3 | 281.0 | 54.6 | 234.0 | 45.4 |
| 3 | 01M292 | Henry Street School for International Studies | Manhattan | M056 | 212-406-9411 | 212-406-9417 | 6.0 | 12 | | | ... | 138.0 | 29.4 | 264.0 | 56.2 | 14.0 | 3.0 | 264.0 | 56.2 | 206.0 | 43.8 |
| 4 | 01M292 | Henry Street School for International Studies | Manhattan | M056 | 212-406-9411 | 212-406-9417 | 6.0 | 12 | | | ... | 141.0 | 27.6 | 290.0 | 56.8 | 16.0 | 3.1 | 297.0 | 58.1 | 214.0 | 41.9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 463 | 14K685 | El Puente Academy for Peace and Justice | Brooklyn | K778 | 718-387-1125 | 718-387-4229 | 9.0 | 12 | | | ... | | | | | | | | | | |
| 464 | 22K555 | Brooklyn College Academy | Brooklyn | K917 | 718-853-6184 | 718-853-6356 | 9.0 | 12 | | | ... | | | | | | | | | | |
| 465 | 14K478 | The High School for Enterprise, Business and T... | Brooklyn | K450 | 718-387-2800 | 718-387-2748 | 9.0 | 12 | | | ... | | | | | | | | | | |
| 466 | 24Q296 | Pan American International High School | Queens | Q744 | 718-271-3602 | 718-271-4041 | 9.0 | 12 | | | ... | | | | | | | | | | |
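
Because the query uses `SELECT *`, the result contains the `dbn` column from both tables, and a school appears once per matching demographics row. The sketch below shows one way to keep the output narrower by using table aliases and an explicit column list. It runs against a toy in-memory SQLite database with made-up rows rather than the real `nyc_schools` schema, and the `schoolyear` column name is an assumption.

```python
import sqlite3

# Toy stand-ins for the two tables (made-up rows, not real school data).
join_demo = sqlite3.connect(":memory:")
join_demo.executescript("""
CREATE TABLE high_school_directory (dbn TEXT, school_name TEXT, borough TEXT);
CREATE TABLE school_demographics (dbn TEXT, schoolyear TEXT, ell_percent REAL);
INSERT INTO high_school_directory VALUES
    ('A1', 'School A', 'Manhattan'),
    ('B1', 'School B', 'Brooklyn');
INSERT INTO school_demographics VALUES
    ('A1', 'year 1', 10.0),
    ('A1', 'year 2', 12.0);
""")

# Aliases (d, s) shorten the ON clause; naming columns avoids a duplicate dbn.
join_query = """
SELECT d.dbn, d.school_name, d.borough, s.schoolyear, s.ell_percent
FROM high_school_directory AS d
LEFT JOIN school_demographics AS s
    ON d.dbn = s.dbn
ORDER BY d.dbn, s.schoolyear;
"""
join_rows = join_demo.execute(join_query).fetchall()
for row in join_rows:
    print(row)
```

School A appears twice (once per demographics row), while School B appears once with `None` in the demographics columns, which is the same pattern as the empty cells at the bottom of the output above.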


In [59]:
df_dir_demo['ell_percent'].isna().sum()

np.int64(428)

## What is the average % of ELL per borough?

In [60]:
# Compute the average % of ELL students per borough (mean) and the number of non-null rows (count)
df_ell_count = df_dir_demo.groupby('borough')['ell_percent'].agg(['mean', 'count']).sort_values('mean', ascending=False)
df_ell_count

| borough | mean | count |
|---|---|---|
| Manhattan | 7.5725 | 40 |
| Bronx | | 0 |
| Brooklyn | | 0 |
| Queens | | 0 |
| Staten Island | | 0 |


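Only Manhattan rows have ELL values in the joined data: every other borough has a count of 0 non-null values, so its mean is missing. This points to missing demographics matches rather than an absence of English Language Learners outside Manhattan. For Manhattan, the average ELL percentage is about 7.57, computed over rows (a school can appear in several rows).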
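
The same aggregation can be done directly in SQL, which also makes the missing-data pattern explicit. This is a minimal sketch on a toy in-memory SQLite table with made-up values. The table name `joined` and its rows are illustrative only. `COUNT(column)` skips NULLs, and `AVG` over a group with only NULLs returns NULL.

```python
import sqlite3

ell_demo = sqlite3.connect(":memory:")
ell_demo.executescript("""
CREATE TABLE joined (borough TEXT, dbn TEXT, ell_percent REAL);
INSERT INTO joined VALUES
    ('Manhattan', 'A1', 10.0),
    ('Manhattan', 'A1', 20.0),
    ('Brooklyn',  'B1', NULL);   -- school without a demographics match
""")

ell_summary = ell_demo.execute("""
SELECT borough,
       COUNT(*)           AS total_rows,      -- all joined rows
       COUNT(ell_percent) AS rows_with_ell,   -- non-NULL values only
       AVG(ell_percent)   AS avg_ell          -- NULL when no values exist
FROM joined
GROUP BY borough
ORDER BY borough;
""").fetchall()

for row in ell_summary:
    print(row)
```

Comparing `total_rows` with `rows_with_ell` distinguishes a borough with no data from a borough whose ELL share is genuinely low.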

## Top 3 schools in each borough with the highest percentage of special education students

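This uses the joined data from the school demographics and high school directory tables. The join returns several rows per school (DBN), so sped_percent is averaged per school before ranking.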

In [90]:
# Show all rows that have duplicate DBNs (including both original and duplicate)
duplicates = df_dir_demo[df_dir_demo.duplicated(subset=['dbn'], keep=False)]
duplicates[['dbn', 'sped_percent']]

| | dbn | dbn | sped_percent |
|---|---|---|---|
| 0 | 01M292 | 01M292 | 19.0 |
| 1 | 01M292 | 01M292 | 22.4 |
| 2 | 01M292 | 01M292 | 24.1 |
| 3 | 01M292 | 01M292 | 25.1 |
| 4 | 01M292 | 01M292 | 21.9 |
| 5 | 01M292 | 01M292 | 23.7 |
| 6 | 01M292 | 01M292 | 24.9 |
| 7 | 01M448 | 01M448 | 15.7 |
| 8 | 01M448 | 01M448 | 15.6 |
| 9 | 01M448 | 01M448 | 19.0 |


In [None]:
# Remove duplicate columns if present
df_dir_demo_fixed = df_dir_demo.loc[:, ~df_dir_demo.columns.duplicated()]

# Remove duplicates by keeping the average sped_percent for each DBN
df_dir_demo_clean = df_dir_demo_fixed.groupby('dbn').agg({
    'school_name': 'first',  
    'borough': 'first',      
    'sped_percent': 'mean',  # Calculate average of sped_percent
})
df_dir_demo_clean

| dbn | school_name | borough | sped_percent |
|---|---|---|---|
| 01M292 | Henry Street School for International Studies | Manhattan | 23.014286 |
| 01M448 | University Neighborhood High School | Manhattan | 19.942857 |
| 01M450 | East Side Community School | Manhattan | 26.285714 |
| 01M509 | Marta Valle High School | Manhattan | 22.214286 |
| 01M539 | New Explorations into Science, Technology and ... | Manhattan | 3.714286 |
| ... | ... | ... | ... |
| 32K545 | EBC High School for Public Service - Bushwick | Brooklyn | |
| 32K549 | Bushwick School for Social Justice | Brooklyn | |
| 32K552 | Academy of Urban Planning | Brooklyn | |
| 32K554 | All City Leadership Secondary School | Brooklyn | |


In [103]:
# Get top 3 schools per borough with highest special education percentages
top_3_sped = (
    df_dir_demo_clean
    .reset_index()  # 'dbn' as a column again
    .groupby('borough')
    .apply(lambda x: x.nlargest(3, 'sped_percent')[['dbn', 'school_name', 'sped_percent']])
)
top_3_sped


| borough | | dbn | school_name | sped_percent |
|---|---|---|---|---|
| Bronx | 106 | 07X221 | South Bronx Preparatory: A College Board School | |
| Bronx | 107 | 07X223 | M.S. 223 The Laboratory School of Finance and ... | |
| Bronx | 108 | 07X259 | H.E.R.O. High (Health, Education, and Research... | |
| Brooklyn | 224 | 13K265 | Dr. Susan S. McKinney Secondary School of the ... | |
| Brooklyn | 225 | 13K350 | Urban Assembly High School of Music and Art | |
| Brooklyn | 226 | 13K412 | Brooklyn Community High School of Communicatio... | |
| Manhattan | 2 | 01M450 | East Side Community School | 26.285714 |
| Manhattan | 0 | 01M292 | Henry Street School for International Studies | 23.014286 |
| Manhattan | 3 | 01M509 | Marta Valle High School | 22.214286 |
| Queens | 338 | 24Q236 | International High School for Health Sciences | |
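
Since this notebook is about SQL, the "top 3 per borough" step could also be pushed into the query with a window function, dropping schools that have no special education value so they do not show up as empty rows. The following is a sketch on a toy in-memory SQLite table. The table name `school_sped` and its rows are made up, and it assumes a SQLite build with window function support (3.25 or later). `ROW_NUMBER() OVER (PARTITION BY ...)` is also available in PostgreSQL.

```python
import sqlite3

sped_demo = sqlite3.connect(":memory:")
sped_demo.executescript("""
CREATE TABLE school_sped (dbn TEXT, borough TEXT, sped_percent REAL);
INSERT INTO school_sped VALUES
    ('A1', 'Manhattan', 20.0), ('A1', 'Manhattan', 30.0),  -- two rows, one school
    ('A2', 'Manhattan', 10.0),
    ('A3', 'Manhattan', 40.0),
    ('A4', 'Manhattan', 5.0),
    ('B1', 'Brooklyn', NULL);                               -- no demographics data
""")

top_sped_rows = sped_demo.execute("""
WITH per_school AS (
    -- Collapse the repeated rows per school into one average.
    SELECT dbn, borough, AVG(sped_percent) AS avg_sped
    FROM school_sped
    GROUP BY dbn, borough
    HAVING AVG(sped_percent) IS NOT NULL
),
ranked AS (
    -- Rank schools within each borough, highest average first.
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY borough ORDER BY avg_sped DESC
    ) AS rn
    FROM per_school
)
SELECT borough, dbn, avg_sped
FROM ranked
WHERE rn <= 3
ORDER BY borough, rn;
""").fetchall()

for row in top_sped_rows:
    print(row)
```

In this toy data only Manhattan schools are returned, because the Brooklyn school has no value to rank. Whether to hide such schools or report them as missing is a design choice that depends on the question being asked.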


## 🧠 Insights

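1) There are 121 schools in Brooklyn, 80 in Queens, 10 in Staten Island, 106 in Manhattan, and 118 in the Bronx.

2) ELL values are available only for Manhattan schools in the joined data, where the average ELL percentage is about 7.57. The other boroughs have no non-null values, so no average can be computed for them.

3) The top 3 schools in Manhattan by average percentage of special education students are the schools with 'dbn' 01M450 (26.3), 01M292 (23.0), and 01M509 (22.2). The rows shown for the other boroughs have no sped_percent value, so those boroughs cannot be ranked from this data.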