# 🧠 Day 3 – SQL via Python: NYC School Data Exploration
In this notebook, you'll connect to a PostgreSQL database and execute SQL queries to explore NYC school data.

## 🔌 Step 1: Import Libraries

In [10]:
!pip install psycopg2-binary



In [11]:
import pandas as pd
import psycopg2

## 🔐 Step 2: Connect to the Database

In [12]:
# DB connection setup using hardcoded credentials (for onboarding only)
conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password="npg_CeS9fJg2azZD",
    host="ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech",
    port="5432",
    sslmode="require"
)
cur = conn.cursor()
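
The cell above hardcodes the credentials, which the comment flags as an onboarding-only shortcut. As a minimal sketch of an alternative, the connection settings could be read from environment variables instead. `load_db_config` is a hypothetical helper (not part of psycopg2), and the variable names are assumed to follow the libpq convention (`PGHOST`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGPORT`, `PGSSLMODE`).

```python
import os


def load_db_config(env=os.environ):
    """Build keyword arguments for psycopg2.connect() from environment variables.

    Hypothetical helper for illustration; it only reads settings and does not
    open a connection itself.
    """
    required = ("PGHOST", "PGDATABASE", "PGUSER", "PGPASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "host": env["PGHOST"],
        "dbname": env["PGDATABASE"],
        "user": env["PGUSER"],
        "password": env["PGPASSWORD"],
        "port": env.get("PGPORT", "5432"),           # same default port as above
        "sslmode": env.get("PGSSLMODE", "require"),  # same SSL setting as above
    }
```

With the variables set in the shell or the notebook environment, the connection would then be opened with `conn = psycopg2.connect(**load_db_config())`, keeping the password out of the notebook file.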

## 🔍 Step 3: Run a Test Query

In [13]:
query = "SELECT * FROM nyc_schools.high_school_directory LIMIT 5;"
df = pd.read_sql(query, conn)
df.head()



Unnamed: 0,dbn,school_name,borough,building_code,phone_number,fax_number,grade_span_min,grade_span_max,expgrade_span_min,expgrade_span_max,...,number_programs,Location 1,Community Board,Council District,Census Tract,Zip Codes,Community Districts,Borough Boundaries,City Council Districts,Police Precincts
0,27Q260,Frederick Douglass Academy VI High School,Queens,Q465,718-471-2154,718-471-2890,9.0,12,,,...,1,"{'latitude': '40.601989336', 'longitude': '-73...",14,31,100802,20529,51,3,47,59
1,21K559,Life Academy High School for Film and Music,Brooklyn,K400,718-333-7750,718-333-7775,9.0,12,,,...,1,"{'latitude': '40.593593811', 'longitude': '-73...",13,47,306,17616,21,2,45,35
2,16K393,Frederick Douglass Academy IV Secondary School,Brooklyn,K026,718-574-2820,718-574-2821,9.0,12,,,...,1,"{'latitude': '40.692133704', 'longitude': '-73...",3,36,291,18181,69,2,49,52
3,08X305,Pablo Neruda Academy,Bronx,X450,718-824-1682,718-824-1663,9.0,12,,,...,1,"{'latitude': '40.822303765', 'longitude': '-73...",9,18,16,11611,58,5,31,26
4,03M485,Fiorello H. LaGuardia High School of Music & A...,Manhattan,M485,212-496-0700,212-724-5748,9.0,12,,,...,6,"{'latitude': '40.773670507', 'longitude': '-73...",7,6,151,12420,20,4,19,12
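
The cursor `cur` created in Step 2 is never used, because every query goes through `pd.read_sql`. As a rough sketch of the plain DB-API route, the helper below runs a query on a cursor and uses `cursor.description` to recover the column names. `fetch_dicts` is a hypothetical name, and an in-memory SQLite table with two rows copied from the output above stands in for the real PostgreSQL database so the example runs offline.

```python
import sqlite3


def fetch_dicts(cursor, sql):
    """Run a query on a DB-API cursor and return rows as dicts keyed by column name."""
    cursor.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database (assumption): SQLite instead of the notebook's PostgreSQL server.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute(
    "CREATE TABLE high_school_directory (dbn TEXT, school_name TEXT, borough TEXT)"
)
demo_cur.executemany(
    "INSERT INTO high_school_directory VALUES (?, ?, ?)",
    [
        ("27Q260", "Frederick Douglass Academy VI High School", "Queens"),
        ("08X305", "Pablo Neruda Academy", "Bronx"),
    ],
)

rows = fetch_dicts(
    demo_cur,
    "SELECT dbn, borough FROM high_school_directory ORDER BY dbn LIMIT 5",
)
print(rows)
```

The same helper should work with the psycopg2 cursor, since both drivers follow the Python DB-API and expose `execute`, `description` and `fetchall`.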


## ✅ Task Queries Below

In [14]:
# Example: Count unique dbn by borough
query = """
SELECT borough, COUNT(DISTINCT dbn) AS unique_school_count
FROM nyc_schools.high_school_directory
GROUP BY borough;
"""
df_result = pd.read_sql(query, conn)
df_result



Unnamed: 0,borough,unique_school_count
0,Bronx,118
1,Brooklyn,121
2,Manhattan,106
3,Queens,80
4,Staten Island,10


## 🧠 Insights

## 1. What is the average % of English Language Learners (ELL) per borough?

## ANS: Only Manhattan has matching demographics data. Its ELL share is 5.83%, computed as total ELL students ÷ total enrollment across all schools and school years.

In [15]:
query = """SELECT 
    sch.dbn, 
    sch.borough, 
    dem.ell_num, 
    dem.ell_percent,
    dem.schoolyear,
    dem.total_enrollment
FROM nyc_schools.high_school_directory AS sch
LEFT JOIN nyc_schools.school_demographics AS dem
    ON sch.dbn = dem.dbn;
"""
df_language = pd.read_sql(query, conn)
df_language.head(50)




Unnamed: 0,dbn,borough,ell_num,ell_percent,schoolyear,total_enrollment
0,01M292,Manhattan,29.0,9.9,20052006.0,294.0
1,01M292,Manhattan,46.0,10.6,20062007.0,434.0
2,01M292,Manhattan,52.0,10.1,20072008.0,515.0
3,01M292,Manhattan,50.0,10.6,20082009.0,470.0
4,01M292,Manhattan,81.0,15.9,20092010.0,511.0
5,01M292,Manhattan,97.0,21.7,20102011.0,448.0
6,01M292,Manhattan,94.0,22.3,20112012.0,422.0
7,01M448,Manhattan,37.0,7.7,20052006.0,478.0
8,01M448,Manhattan,36.0,6.8,20062007.0,533.0
9,01M448,Manhattan,42.0,7.1,20072008.0,588.0


In [16]:
query = """
SELECT 
    sch.borough, 
    SUM(dem.ell_num) AS num_ell,
    AVG(dem.ell_percent) AS avg_ell_percent,
    SUM(dem.total_enrollment) AS total_enrollment
FROM nyc_schools.high_school_directory AS sch
LEFT JOIN nyc_schools.school_demographics AS dem
    ON sch.dbn = dem.dbn
GROUP BY sch.borough;
"""
df_ell = pd.read_sql(query, conn)
df_ell.head()



Unnamed: 0,borough,num_ell,avg_ell_percent,total_enrollment
0,Brooklyn,,,
1,Queens,,,
2,Staten Island,,,
3,Manhattan,1451.0,7.5725,24901.0
4,Bronx,,,
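
The empty values for four boroughs suggest that the `LEFT JOIN` found no usable demographics rows for their schools. One way to check this, sketched below, is to count per borough how many directory schools have at least one match. The example uses an in-memory SQLite database with a few toy rows (three of them taken from the output above) and unqualified table names; against the real database the same query would use the `nyc_schools.` schema prefix.

```python
import sqlite3

# Toy stand-in for the two tables (assumption: only the columns needed here).
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE high_school_directory (dbn TEXT, borough TEXT);
CREATE TABLE school_demographics (dbn TEXT, schoolyear INTEGER, ell_num REAL);
INSERT INTO high_school_directory VALUES
    ('01M292', 'Manhattan'), ('01M448', 'Manhattan'), ('27Q260', 'Queens');
INSERT INTO school_demographics VALUES
    ('01M292', 20052006, 29), ('01M292', 20062007, 46), ('01M448', 20052006, 37);
""")

coverage_sql = """
SELECT
    sch.borough,
    COUNT(DISTINCT sch.dbn) AS schools,
    COUNT(DISTINCT dem.dbn) AS schools_with_demographics
FROM high_school_directory AS sch
LEFT JOIN school_demographics AS dem
    ON sch.dbn = dem.dbn
GROUP BY sch.borough
ORDER BY sch.borough;
"""
# COUNT(DISTINCT dem.dbn) skips NULLs, so unmatched schools are not counted.
coverage = conn_demo.execute(coverage_sql).fetchall()
print(coverage)
```

A borough whose `schools_with_demographics` count is 0 has no joined rows at all, which would explain a fully empty aggregate like the ones above.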


In [None]:
# ELL share in % = total ELL students / total enrollment, using the sums from the query above
# Only Manhattan has data
1451/24901*100

5.8270752178627365
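
The two figures for Manhattan differ (7.57% from `AVG(ell_percent)` versus 5.83% from the ratio of sums) because a plain average gives every row the same weight regardless of school size. A small worked example with two rows from the `df_language` output above illustrates the effect; the variable names are chosen here for illustration only.

```python
# (ell_num, ell_percent, total_enrollment), copied from the df_language output
rows = [
    (97.0, 21.7, 448.0),  # dbn 01M292, schoolyear 20102011
    (42.0, 7.1, 588.0),   # dbn 01M448, schoolyear 20072008
]

# Unweighted: each row counts equally, however many students it covers.
mean_of_percents = sum(pct for _, pct, _ in rows) / len(rows)

# Weighted: total ELL students divided by total enrollment.
ratio_of_sums = 100 * sum(ell for ell, _, _ in rows) / sum(total for _, _, total in rows)

print(f"mean of percentages: {mean_of_percents:.2f}%")
print(f"ratio of sums:       {ratio_of_sums:.2f}%")
```

Here the smaller row has the higher ELL percentage, so the unweighted mean comes out above the ratio of sums, the same direction as in the borough-level result.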

## 2. Schools supporting special needs
## Find the top 3 schools in each borough with the highest percentage of special education students
## (using data from the school demographics and high school directory tables)

## ANS: In this database, only Manhattan has special education data. The three highest sped_percent values (28.8, 27.7 and 26.7) all belong to East Side Community School, because the query ranks one row per school and school year rather than one row per school.

In [17]:
query = """
WITH ranked_schools AS (
    SELECT
        sch.dbn,
        sch.school_name,
        sch.borough,
        dem.sped_percent AS sped_percent,
        ROW_NUMBER() OVER (PARTITION BY sch.borough ORDER BY dem.sped_percent DESC) AS rank
    FROM nyc_schools.high_school_directory AS sch
    LEFT JOIN nyc_schools.school_demographics AS dem
        ON sch.dbn = dem.dbn
    WHERE dem.sped_percent IS NOT NULL
)
SELECT *
FROM ranked_schools
WHERE rank <= 3
ORDER BY borough, sped_percent DESC;
"""
df_sped = pd.read_sql(query, conn)
df_sped.head(10)  



Unnamed: 0,dbn,school_name,borough,sped_percent,rank
0,01M450,East Side Community School,Manhattan,28.8,1
1,01M450,East Side Community School,Manhattan,27.7,2
2,01M450,East Side Community School,Manhattan,26.7,3
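
All three rows above are the same school, since the demographics table holds one row per school and school year and the ranking is applied to those rows directly. A possible variant, sketched here, first collapses each school to a single value and only then ranks within the borough. It takes the maximum `sped_percent` per school, which is one choice among several (the latest school year or the average would be alternatives). The data is a toy set in an in-memory SQLite database with made-up school names, and it needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Toy data (assumption): hypothetical schools, unqualified table names.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE high_school_directory (dbn TEXT, school_name TEXT, borough TEXT);
CREATE TABLE school_demographics (dbn TEXT, schoolyear INTEGER, sped_percent REAL);
INSERT INTO high_school_directory VALUES
    ('A1', 'School A', 'Manhattan'), ('B1', 'School B', 'Manhattan'),
    ('C1', 'School C', 'Manhattan'), ('D1', 'School D', 'Manhattan');
INSERT INTO school_demographics VALUES
    ('A1', 20092010, 26.7), ('A1', 20102011, 27.7), ('A1', 20112012, 28.8),
    ('B1', 20102011, 20.0), ('B1', 20112012, 21.0),
    ('C1', 20112012, 15.0), ('D1', 20112012, 10.0);
""")

top3_sql = """
WITH per_school AS (
    -- one row per school: its highest sped_percent over all school years
    SELECT sch.dbn, sch.school_name, sch.borough,
           MAX(dem.sped_percent) AS max_sped_percent
    FROM high_school_directory AS sch
    JOIN school_demographics AS dem
        ON sch.dbn = dem.dbn
    WHERE dem.sped_percent IS NOT NULL
    GROUP BY sch.dbn, sch.school_name, sch.borough
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY borough ORDER BY max_sped_percent DESC
           ) AS borough_rank
    FROM per_school
)
SELECT school_name, borough, max_sped_percent, borough_rank
FROM ranked
WHERE borough_rank <= 3
ORDER BY borough, borough_rank;
"""
top3 = demo.execute(top3_sql).fetchall()
for row in top3:
    print(row)
```

With this shape each school can appear at most once per borough, so the top 3 lists three distinct schools whenever the borough has at least three schools with data.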


## Summary
- ### It is better to calculate the ELL percentage as SUM(ell_num) ÷ SUM(total_enrollment), because a plain average of percentages gives small and large schools equal weight and can bias the result.
- ### The ELL share in Manhattan is 5.83%. The three highest special education percentages (28.8, 27.7 and 26.7) all belong to East Side Community School.