# 🧠 Day 3 – SQL via Python: NYC School Data Exploration
In this notebook, you'll connect to a PostgreSQL database and execute SQL queries to explore NYC school data.

## 🔌 Step 1: Import Libraries

In [1]:
import pandas as pd
import psycopg2

## 🔐 Step 2: Connect to the Database

In [2]:
# DB connection setup using hardcoded credentials (for onboarding only)
conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password="npg_CeS9fJg2azZD",
    host="ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech",
    port="5432",
    sslmode="require"
)
cur = conn.cursor()
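
Outside of onboarding, the credentials above would normally not live in the notebook. The sketch below is one possible way to read them from environment variables instead. It assumes the standard libpq variable names (`PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`, `PGSSLMODE`); `connection_params_from_env` is a hypothetical helper, not part of psycopg2, and the defaults for port and SSL mode simply mirror the values used in this notebook.

```python
import os

# Standard libpq environment variable names mapped to psycopg2.connect() keywords.
ENV_TO_KWARG = {
    "PGDATABASE": "dbname",
    "PGUSER": "user",
    "PGPASSWORD": "password",
    "PGHOST": "host",
    "PGPORT": "port",
    "PGSSLMODE": "sslmode",
}


def connection_params_from_env(env=None):
    """Build keyword arguments for psycopg2.connect() from environment variables.

    Hypothetical helper for illustration. Raises KeyError when the password
    is missing so that the failure is explicit rather than a silent fallback.
    """
    env = os.environ if env is None else env
    params = {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items() if name in env}
    if "password" not in params:
        raise KeyError("PGPASSWORD is not set")
    # Defaults mirror the values used in this notebook (assumption).
    params.setdefault("port", "5432")
    params.setdefault("sslmode", "require")
    return params


# Usage in the notebook (needs psycopg2 and a reachable database):
# conn = psycopg2.connect(**connection_params_from_env())
```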

## 🔍 Step 3: Run a Test Query

In [3]:
query = "SELECT * FROM nyc_schools.high_school_directory LIMIT 5;"
df = pd.read_sql(query, conn)
df.head()




Unnamed: 0,dbn,school_name,borough,building_code,phone_number,fax_number,grade_span_min,grade_span_max,expgrade_span_min,expgrade_span_max,...,number_programs,Location 1,Community Board,Council District,Census Tract,Zip Codes,Community Districts,Borough Boundaries,City Council Districts,Police Precincts
0,27Q260,Frederick Douglass Academy VI High School,Queens,Q465,718-471-2154,718-471-2890,9.0,12,,,...,1,"{'latitude': '40.601989336', 'longitude': '-73...",14,31,100802,20529,51,3,47,59
1,21K559,Life Academy High School for Film and Music,Brooklyn,K400,718-333-7750,718-333-7775,9.0,12,,,...,1,"{'latitude': '40.593593811', 'longitude': '-73...",13,47,306,17616,21,2,45,35
2,16K393,Frederick Douglass Academy IV Secondary School,Brooklyn,K026,718-574-2820,718-574-2821,9.0,12,,,...,1,"{'latitude': '40.692133704', 'longitude': '-73...",3,36,291,18181,69,2,49,52
3,08X305,Pablo Neruda Academy,Bronx,X450,718-824-1682,718-824-1663,9.0,12,,,...,1,"{'latitude': '40.822303765', 'longitude': '-73...",9,18,16,11611,58,5,31,26
4,03M485,Fiorello H. LaGuardia High School of Music & A...,Manhattan,M485,212-496-0700,212-724-5748,9.0,12,,,...,6,"{'latitude': '40.773670507', 'longitude': '-73...",7,6,151,12420,20,4,19,12
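
Passing a raw psycopg2 connection to `pd.read_sql` works, but recent pandas versions emit a `UserWarning` because only SQLAlchemy connectables are officially tested. One alternative is to use the DB-API cursor directly and build the DataFrame from the fetched rows. The sketch below assumes nothing beyond the DB-API 2.0 interface; `fetch_rows` is a hypothetical helper, and an in-memory SQLite table with two rows copied from the output above stands in for the real database so the example runs offline.

```python
import sqlite3


def fetch_rows(conn, sql, params=None):
    """Run a query on any DB-API 2.0 connection and return (column_names, rows)."""
    cur = conn.cursor()
    try:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        # The first element of each description entry is the column name.
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows


# Stand-in database so the sketch runs without network access.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE high_school_directory (dbn TEXT, borough TEXT)")
demo.executemany(
    "INSERT INTO high_school_directory VALUES (?, ?)",
    [("27Q260", "Queens"), ("21K559", "Brooklyn")],
)
columns, rows = fetch_rows(
    demo, "SELECT dbn, borough FROM high_school_directory ORDER BY dbn"
)
```

With the notebook's `conn`, the result could then be turned into a DataFrame with `pd.DataFrame(rows, columns=columns)`.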


## ✅ Task Queries Below

In [4]:
query = """
SELECT borough, COUNT(DISTINCT dbn) AS school_count
FROM nyc_schools.high_school_directory
GROUP BY borough;
"""
df_result = pd.read_sql(query, conn)
df_result




Unnamed: 0,borough,school_count
0,Bronx,118
1,Brooklyn,121
2,Manhattan,106
3,Queens,80
4,Staten Island,10


Column and Data Type Overview: `school_demographics`

In [5]:
# Column and data type overview for school_demographics
pd.read_sql("""
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'nyc_schools'
  AND table_name  = 'school_demographics'
ORDER BY ordinal_position;
""", conn)






Unnamed: 0,column_name,data_type
0,dbn,character varying
1,Name,character varying
2,schoolyear,integer
3,fl_percent,character varying
4,frl_percent,real
5,total_enrollment,integer
6,prek,character varying
7,k,character varying
8,grade1,character varying
9,grade2,character varying


Average ELL % per Borough

In [6]:
q_ell_by_borough = """
-- NOTE: despite its name, this CTE keeps every school year; it does not filter to the latest one.
WITH latest AS (
  SELECT
    sd.dbn,
    sd.ell_percent,
    sd.schoolyear
  FROM nyc_schools.school_demographics sd
)
SELECT
  d.borough,
  AVG(sd.ell_percent) AS avg_ell_percent
FROM latest sd
JOIN nyc_schools.high_school_directory d
  ON TRIM(sd.dbn) = TRIM(d.dbn)
GROUP BY d.borough
ORDER BY d.borough;
"""
df_ell_by_borough = pd.read_sql(q_ell_by_borough, conn)
df_ell_by_borough




Unnamed: 0,borough,avg_ell_percent
0,Manhattan,7.5725


Checking Borough Coverage in `school_demographics` (Only Manhattan Appears Above)

In [7]:
pd.read_sql("""
SELECT DISTINCT d.borough
FROM nyc_schools.school_demographics sd
JOIN nyc_schools.high_school_directory d
  ON TRIM(sd.dbn) = TRIM(d.dbn)
""", conn)




Unnamed: 0,borough
0,Manhattan


Top 3 SPED Schools per Borough

In [8]:
q_top_sped_by_borough = """
-- NOTE: despite its name, this CTE keeps every school year; it does not filter to the latest one.
WITH latest AS (
  SELECT
    sd.dbn,
    sd.sped_percent,
    sd.schoolyear
  FROM nyc_schools.school_demographics sd
),
ranked AS (
  SELECT
    d.borough,
    d.school_name,
    latest.dbn,
    latest.sped_percent,
    ROW_NUMBER() OVER (
      PARTITION BY d.borough
      ORDER BY latest.sped_percent DESC
    ) AS rn
  FROM latest
  JOIN nyc_schools.high_school_directory d
    ON TRIM(latest.dbn) = TRIM(d.dbn)
)
SELECT borough, school_name, dbn, sped_percent
FROM ranked
WHERE rn <= 3
ORDER BY borough, sped_percent DESC;
"""
df_top_sped_by_borough = pd.read_sql(q_top_sped_by_borough, conn)
df_top_sped_by_borough




Unnamed: 0,borough,school_name,dbn,sped_percent
0,Manhattan,East Side Community School,01M450,28.8
1,Manhattan,East Side Community School,01M450,27.7
2,Manhattan,East Side Community School,01M450,26.7


## 🧠 Insights

Summary of Observations and Findings

1. School Distribution 

• Brooklyn has the highest number of schools (121), followed by the Bronx (118) and Manhattan (106).  

• Staten Island has the fewest (10).  

• The count uses distinct school IDs (dbn), ensuring duplicate rows are excluded.

2. Average % of English Language Learners (ELL) per Borough

• The average is calculated across all available years, rather than using only the most recent year per school.

• Data limitation: In the current training database, joining `school_demographics` to `high_school_directory` on `dbn` returns only Manhattan schools.  

• As a result, the average ELL percentage could only be calculated for Manhattan: 7.6%.
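
Since the average currently spans all years, a variant restricted to each school's most recent year may be worth comparing. The sketch below is one way to do that, by joining each school to its own `MAX(schoolyear)`. It runs on an in-memory SQLite database with made-up rows (all values are hypothetical), and the same SQL should work in PostgreSQL once the `nyc_schools.` schema prefix is added.

```python
import sqlite3

conn_ell = sqlite3.connect(":memory:")
conn_ell.executescript("""
CREATE TABLE high_school_directory (dbn TEXT, borough TEXT);
CREATE TABLE school_demographics (dbn TEXT, schoolyear INTEGER, ell_percent REAL);
-- Toy rows for illustration only.
INSERT INTO high_school_directory VALUES ('A1', 'Manhattan'), ('B1', 'Brooklyn');
INSERT INTO school_demographics VALUES
  ('A1', 2011, 10.0), ('A1', 2012, 20.0), ('B1', 2012, 5.0);
""")

latest_ell_sql = """
WITH latest AS (
  -- Keep only the row from each school's most recent year.
  SELECT sd.dbn, sd.ell_percent
  FROM school_demographics sd
  JOIN (
    SELECT dbn, MAX(schoolyear) AS schoolyear
    FROM school_demographics
    GROUP BY dbn
  ) m ON m.dbn = sd.dbn AND m.schoolyear = sd.schoolyear
)
SELECT d.borough, AVG(latest.ell_percent) AS avg_ell_percent
FROM latest
JOIN high_school_directory d
  ON TRIM(latest.dbn) = TRIM(d.dbn)
GROUP BY d.borough
ORDER BY d.borough;
"""
latest_ell = conn_ell.execute(latest_ell_sql).fetchall()
```

In the toy data, school `A1` contributes only its 2012 value (20.0) instead of the all-years mean (15.0), which shows how the two definitions can diverge.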

3. Top 3 Schools per Borough by Special Education % (SPED) 

• The ranking includes all available years, so a single school can occupy more than one of the top 3 positions.

• Due to the same data limitation, results are available only for Manhattan.  

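• The top 3 rows all belong to the same school, East Side Community School (01M450), in three different years, with SPED percentages of 28.8%, 27.7%, and 26.7%. The query therefore does not yet identify three distinct schools.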
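
Because the ranking query returns the same `dbn` three times, one possible refinement is to reduce each school to a single row before ranking. The sketch below uses each school's most recent year for that (using the maximum value across years would be an equally reasonable choice). It runs on an in-memory SQLite database with made-up rows (school names and IDs are hypothetical) and needs SQLite 3.25 or later for `ROW_NUMBER()`.

```python
import sqlite3

conn_sped = sqlite3.connect(":memory:")
conn_sped.executescript("""
CREATE TABLE high_school_directory (dbn TEXT, school_name TEXT, borough TEXT);
CREATE TABLE school_demographics (dbn TEXT, schoolyear INTEGER, sped_percent REAL);
-- Toy rows for illustration only.
INSERT INTO high_school_directory VALUES
  ('S1', 'School 1', 'Manhattan'), ('S2', 'School 2', 'Manhattan'),
  ('S3', 'School 3', 'Manhattan'), ('S4', 'School 4', 'Manhattan');
INSERT INTO school_demographics VALUES
  ('S1', 2011, 28.8), ('S1', 2012, 27.7),
  ('S2', 2012, 20.0), ('S3', 2012, 15.0), ('S4', 2012, 10.0);
""")

top_sped_sql = """
WITH latest AS (
  -- One row per school: the most recent year only.
  SELECT sd.dbn, sd.sped_percent
  FROM school_demographics sd
  JOIN (
    SELECT dbn, MAX(schoolyear) AS schoolyear
    FROM school_demographics
    GROUP BY dbn
  ) m ON m.dbn = sd.dbn AND m.schoolyear = sd.schoolyear
),
ranked AS (
  SELECT d.borough, d.school_name, latest.dbn, latest.sped_percent,
         ROW_NUMBER() OVER (
           PARTITION BY d.borough
           ORDER BY latest.sped_percent DESC
         ) AS rn
  FROM latest
  JOIN high_school_directory d
    ON TRIM(latest.dbn) = TRIM(d.dbn)
)
SELECT borough, school_name, dbn, sped_percent
FROM ranked
WHERE rn <= 3
ORDER BY borough, sped_percent DESC;
"""
top_sped = conn_sped.execute(top_sped_sql).fetchall()
```

In the toy data, `S1` appears once (with its 2012 value) and the remaining two slots go to `S2` and `S3`, whereas ranking all years would have listed `S1` twice.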

Overall Insights: 

• Brooklyn and the Bronx have the highest school counts, while Staten Island has the fewest.  

