# 🧠 Day 3 – SQL via Python: NYC School Data Exploration

### In this notebook I connect to a PostgreSQL database and execute SQL queries to explore NYC school data.


## 🔌 Step 1: Import Libraries


In [11]:
import pandas as pd
import psycopg2


## 🔐 Step 2: Connect to the Database

In [12]:
# DB connection setup using hardcoded credentials (for onboarding only)
conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password="npg_CeS9fJg2azZD",
    host="ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech",
    port="5432",
    sslmode="require"
)
cur = conn.cursor()
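
Since the credentials above are hardcoded for onboarding only, here is a minimal sketch of how the same connection URL could be assembled from environment variables instead. The variable names follow libpq's `PGUSER` / `PGPASSWORD` / `PGHOST` / `PGPORT` / `PGDATABASE` convention; the helper `build_db_url` and the placeholder values are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Build a SQLAlchemy-style PostgreSQL URL from environment variables.

    Hypothetical helper: assumes the libpq-style variables are already set.
    """
    user = env["PGUSER"]
    # Percent-encode the password so characters such as "@" or "/" cannot break the URL
    password = quote_plus(env["PGPASSWORD"])
    host = env["PGHOST"]
    port = env.get("PGPORT", "5432")  # default PostgreSQL port
    dbname = env["PGDATABASE"]
    return (
        f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{dbname}"
        "?sslmode=require"
    )


# Demo with placeholder values only (no real credentials)
example_env = {
    "PGUSER": "example_user",
    "PGPASSWORD": "p@ss/word",
    "PGHOST": "db.example.com",
    "PGDATABASE": "exampledb",
}
print(build_db_url(example_env))
```

In a real session the call would simply be `build_db_url()`, which reads from `os.environ`, so the password never appears in the notebook itself.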

In [37]:
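# `engine` must already exist when this cell runs: the execution counts show that
# this cell (In [37]) was executed after the Step 3 cell (In [27]).
df = pd.read_sql("SELECT * FROM nyc_schools.school_demographics LIMIT 5;", engine)
print(df.columns.tolist())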

['dbn', 'Name', 'schoolyear', 'fl_percent', 'frl_percent', 'total_enrollment', 'prek', 'k', 'grade1', 'grade2', 'grade3', 'grade4', 'grade5', 'grade6', 'grade7', 'grade8', 'grade9', 'grade10', 'grade11', 'grade12', 'ell_num', 'ell_percent', 'sped_num', 'sped_percent', 'ctt_num', 'selfcontained_num', 'asian_num', 'asian_per', 'black_num', 'black_per', 'hispanic_num', 'hispanic_per', 'white_num', 'white_per', 'male_num', 'male_per', 'female_num', 'female_per']


In [39]:
pd.read_sql("SELECT * FROM nyc_schools.school_demographics LIMIT 5;", engine)

|  | dbn | Name | schoolyear | fl_percent | frl_percent | total_enrollment | prek | k | grade1 | grade2 | ... | black_num | black_per | hispanic_num | hispanic_per | white_num | white_per | male_num | male_per | female_num | female_per |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01M015 | P.S. 015 ROBERTO CLEMENTE | 20052006 | 89.4 |  | 281 | 15 | 36 | 40 | 33 | ... | 74 | 26.3 | 189 | 67.3 | 5 | 1.8 | 158 | 56.2 | 123 | 43.8 |
| 1 | 01M015 | P.S. 015 ROBERTO CLEMENTE | 20062007 | 89.4 |  | 243 | 15 | 29 | 39 | 38 | ... | 68 | 28.0 | 153 | 63.0 | 4 | 1.6 | 140 | 57.6 | 103 | 42.4 |
| 2 | 01M015 | P.S. 015 ROBERTO CLEMENTE | 20072008 | 89.4 |  | 261 | 18 | 43 | 39 | 36 | ... | 77 | 29.5 | 157 | 60.2 | 7 | 2.7 | 143 | 54.8 | 118 | 45.2 |
| 3 | 01M015 | P.S. 015 ROBERTO CLEMENTE | 20082009 | 89.4 |  | 252 | 17 | 37 | 44 | 32 | ... | 75 | 29.8 | 149 | 59.1 | 7 | 2.8 | 149 | 59.1 | 103 | 40.9 |
| 4 | 01M015 | P.S. 015 ROBERTO CLEMENTE | 20092010 |  | 96.5 | 208 | 16 | 40 | 28 | 32 | ... | 67 | 32.2 | 118 | 56.7 | 6 | 2.9 | 124 | 59.6 | 84 | 40.4 |


## 🔍 Step 3: Run a Test Query


In [27]:
%pip install SQLAlchemy psycopg2-binary

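from sqlalchemy import create_engine, text
import pandas as pd

DB_URL = "postgresql+psycopg2://neondb_owner:npg_CeS9fJg2azZD@ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech:5432/neondb?sslmode=require"
# SQLAlchemy engine reused by every pd.read_sql call in this notebook
engine = create_engine(DB_URL)
# Smoke test: returns a single row with ok = 1 if the connection works
pd.read_sql(text("SELECT 1 AS ok;"), engine)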


Note: you may need to restart the kernel to use updated packages.


|  | ok |
|---|---|
| 0 | 1 |


# 🧮 1. School distribution by borough

### Schools in each borough
#### Query counts unique high schools (`dbn`) in each borough to show how they are distributed across boroughs.
### Result:
- Brooklyn has the most high schools (121), followed by the Bronx (118) and Manhattan (106).
- Queens has 80.
- Staten Island has the fewest (10).

#### Insight:
High schools in this directory are concentrated in Brooklyn, the Bronx, and Manhattan, which together account for 345 of the 435 schools listed. Staten Island, with 10, offers students far fewer options.


In [24]:
pd.read_sql("""
SELECT
  borough,
  COUNT(DISTINCT dbn) AS total_schools
FROM nyc_schools.high_school_directory
GROUP BY borough
ORDER BY total_schools DESC;
""", engine)

|  | borough | total_schools |
|---|---|---|
| 0 | Brooklyn | 121 |
| 1 | Bronx | 118 |
| 2 | Manhattan | 106 |
| 3 | Queens | 80 |
| 4 | Staten Island | 10 |
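
To see why the query uses `COUNT(DISTINCT dbn)` rather than `COUNT(*)`, here is a small self-contained sketch on an in-memory SQLite database. The table `directory` and its rows are made up for illustration; whether the real `high_school_directory` table contains duplicate `dbn` rows is not something this notebook checks.

```python
import sqlite3

# In-memory toy database; table name and rows are illustrative only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE directory (dbn TEXT, borough TEXT)")
demo.executemany(
    "INSERT INTO directory VALUES (?, ?)",
    [
        ("X001", "Bronx"),
        ("X001", "Bronx"),  # duplicate row for the same school
        ("X002", "Bronx"),
        ("K001", "Brooklyn"),
    ],
)

borough_counts = demo.execute("""
    SELECT borough,
           COUNT(*)            AS row_count,      -- counts every row
           COUNT(DISTINCT dbn) AS total_schools   -- counts each school once
    FROM directory
    GROUP BY borough
    ORDER BY total_schools DESC, borough
""").fetchall()

print(borough_counts)  # [('Bronx', 3, 2), ('Brooklyn', 1, 1)]
```

If the two counts differ for a borough, some schools appear more than once, and only the distinct count reflects the number of schools.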


# 🎓 2. Average % of English Language Learners (ELL) per borough

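#### Query joins the high school directory with demographics and calculates the average percentage of English Language Learners (`ell_percent`) for each borough.

### Result:
- Manhattan: 7.57%, which appears to be the average over rows that have a value. The query shown here wraps the column in `COALESCE(d.ell_percent, 0)`, which counts missing values as 0, so its output reports 2.18% (40 rows with a value).
- Other boroughs: no available data (`rows_with_ell` is 0). Their averages would be `NULL`; the `COALESCE` turns them into 0.0.

### Insight:
The dataset appears incomplete for this metric. Only Manhattan has recorded values for `ell_percent`, so borough-level comparisons are not possible. The 0.0 shown for the other boroughs means missing data, not an absence of English Language Learners. This highlights a limitation of the dataset and shows why data availability must always be checked before drawing conclusions.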


In [42]:
pd.read_sql("""
SELECT
    h.borough,
    ROUND(AVG(COALESCE(d.ell_percent, 0))::numeric, 2) AS avg_ell_percent,
    COUNT(d.ell_percent) AS rows_with_ell
FROM nyc_schools.high_school_directory h
LEFT JOIN nyc_schools.school_demographics d USING (dbn)
GROUP BY h.borough
ORDER BY h.borough;
""", engine)

|  | borough | avg_ell_percent | rows_with_ell |
|---|---|---|---|
| 0 | Bronx | 0.0 | 0 |
| 1 | Brooklyn | 0.0 | 0 |
| 2 | Manhattan | 2.18 | 40 |
| 3 | Queens | 0.0 | 0 |
| 4 | Staten Island | 0.0 | 0 |
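
The effect of `COALESCE(..., 0)` inside `AVG` can be reproduced on toy data. The sketch below uses an in-memory SQLite table named `demographics` with invented values (not the real NYC data) to compare the plain average, the average with missing values counted as 0, and the number of rows that actually carry a value.

```python
import sqlite3

# In-memory toy database; values are invented for illustration
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demographics (borough TEXT, ell_percent REAL)")
demo.executemany(
    "INSERT INTO demographics VALUES (?, ?)",
    [
        ("Manhattan", 10.0),
        ("Manhattan", 5.0),
        ("Manhattan", None),  # missing value
        ("Manhattan", None),  # missing value
        ("Queens", None),     # borough with no data at all
        ("Queens", None),
    ],
)

ell_rows = demo.execute("""
    SELECT borough,
           AVG(ell_percent)              AS avg_non_null,       -- NULLs are skipped
           AVG(COALESCE(ell_percent, 0)) AS avg_nulls_as_zero,  -- NULLs count as 0
           COUNT(ell_percent)            AS rows_with_ell       -- non-NULL rows only
    FROM demographics
    GROUP BY borough
    ORDER BY borough
""").fetchall()

for row in ell_rows:
    print(row)
# ('Manhattan', 7.5, 3.75, 2)
# ('Queens', None, 0.0, 0)
```

Plain `AVG` skips missing values and returns `NULL` when a group has none, which keeps "no data" distinguishable from a true 0%. Keeping `COUNT(ell_percent)` next to the average, as the query above does, makes the coverage visible either way.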


# 🔗 3. Top-3 schools per borough by special-education share

In [45]:
pd.read_sql("""
SELECT borough, school_name, dbn, ROUND(sped_percent::numeric, 2) AS sped_percent
FROM (
  SELECT
    h.borough, h.school_name, h.dbn,
    MAX(d.sped_percent) AS sped_percent,
    DENSE_RANK() OVER (
      PARTITION BY h.borough
      ORDER BY AVG(d.sped_percent) DESC NULLS LAST
    ) AS rk
  FROM nyc_schools.high_school_directory h
  JOIN nyc_schools.school_demographics d USING (dbn)
  GROUP BY h.borough, h.school_name, h.dbn
) s
WHERE rk <= 3
ORDER BY borough, sped_percent DESC;
""", engine)


|  | borough | school_name | dbn | sped_percent |
|---|---|---|---|---|
| 0 | Manhattan | East Side Community School | 01M450 | 28.8 |
| 1 | Manhattan | Marta Valle High School | 01M509 | 25.9 |
| 2 | Manhattan | Henry Street School for International Studies | 01M292 | 25.1 |


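#### Query ranks schools within each borough by their average percentage of students in special education (`sped_percent`) across school years and returns the top 3 per borough. The `sped_percent` column in the output is each school's maximum yearly value, not the average used for the ranking.

#### Insight:
In Manhattan, these three schools have the highest average proportion of students receiving special education services. The query already partitions by borough, but only Manhattan appears in the output, most likely because the inner join finds demographics rows only for Manhattan schools, consistent with section 2. Once demographics are available for the other boroughs, the same query will identify their top schools as well.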
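
The top-N-per-group pattern can be tried out on toy data. The sketch below is a simplified variant, not the query above: it aggregates per school in a CTE first, then applies `DENSE_RANK()` per borough, and returns both the average used for ranking and the maximum. The table `demo_sped` and its values are invented, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory toy database; one row per school and school year (invented values)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_sped (borough TEXT, dbn TEXT, sped_percent REAL)")
demo.executemany(
    "INSERT INTO demo_sped VALUES (?, ?, ?)",
    [
        ("Manhattan", "M1", 20.0), ("Manhattan", "M1", 30.0),  # avg 25, max 30
        ("Manhattan", "M2", 28.0), ("Manhattan", "M2", 26.0),  # avg 27, max 28
        ("Manhattan", "M3", 10.0), ("Manhattan", "M3", 12.0),  # avg 11, max 12
        ("Queens", "Q1", 15.0),                                # avg 15, max 15
    ],
)

top_rows = demo.execute("""
    WITH per_school AS (
        -- Step 1: collapse the yearly rows to one row per school
        SELECT borough, dbn,
               AVG(sped_percent) AS avg_sped,
               MAX(sped_percent) AS max_sped
        FROM demo_sped
        GROUP BY borough, dbn
    ),
    ranked AS (
        -- Step 2: rank schools inside each borough by their average
        SELECT *,
               DENSE_RANK() OVER (
                   PARTITION BY borough
                   ORDER BY avg_sped DESC
               ) AS rk
        FROM per_school
    )
    SELECT borough, dbn, avg_sped, max_sped, rk
    FROM ranked
    WHERE rk <= 2            -- top 2 per borough in this toy example
    ORDER BY borough, rk, dbn
""").fetchall()

for row in top_rows:
    print(row)
# ('Manhattan', 'M2', 27.0, 28.0, 1)
# ('Manhattan', 'M1', 25.0, 30.0, 2)
# ('Queens', 'Q1', 15.0, 15.0, 1)
```

School `M1` has the highest maximum but ranks second on the average, so ranking by one statistic and displaying another can make the displayed order look inconsistent with the rank. Returning both columns avoids that ambiguity.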