# 🧠 Day 3 – SQL via Python: NYC School Data Exploration
In this notebook, you'll connect to a PostgreSQL database and execute SQL queries to explore NYC school data.

## 🔌 Step 1: Import Libraries

In [57]:
import pandas as pd
import psycopg2

## 🔐 Step 2: Connect to the Database

In [56]:
# DB connection setup using hardcoded credentials (for onboarding only)
conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password="npg_CeS9fJg2azZD",
    host="ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech",
    port="5432",
    sslmode="require"
)
cur = conn.cursor()
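
The comment in the cell above marks the hardcoded credentials as onboarding-only. Outside onboarding, one common alternative is to read the same settings from environment variables. The sketch below is an assumption, not part of the original notebook: `connection_params` and `ENV_KEYS` are hypothetical names, and the `PG*` variable names simply follow libpq's conventions.

```python
import os

# Hypothetical mapping from psycopg2.connect() keyword to environment variable.
# The PG* names follow libpq's conventions; any naming scheme would work.
ENV_KEYS = {
    "dbname": "PGDATABASE",
    "user": "PGUSER",
    "password": "PGPASSWORD",
    "host": "PGHOST",
    "port": "PGPORT",
}


def connection_params(env=os.environ):
    """Collect connection settings from `env`, failing loudly if any are missing."""
    missing = [var for var in ENV_KEYS.values() if var not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    params = {key: env[var] for key, var in ENV_KEYS.items()}
    # Default mirrors the sslmode used in the cell above.
    params["sslmode"] = env.get("PGSSLMODE", "require")
    return params


# Usage in the notebook (not executed here):
# conn = psycopg2.connect(**connection_params())
```

Passing the mapping as an argument keeps the helper easy to try with a plain dictionary before any real variables are set.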

## 🔍 Step 3: Run a Test Query

In [58]:
query = "SELECT * FROM nyc_schools.high_school_directory LIMIT 5;"
df = pd.read_sql(query, conn)
df.head()


| | dbn | school_name | borough | building_code | phone_number | fax_number | grade_span_min | grade_span_max | expgrade_span_min | expgrade_span_max | ... | number_programs | Location 1 | Community Board | Council District | Census Tract | Zip Codes | Community Districts | Borough Boundaries | City Council Districts | Police Precincts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27Q260 | Frederick Douglass Academy VI High School | Queens | Q465 | 718-471-2154 | 718-471-2890 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.601989336', 'longitude': '-73... | 14 | 31 | 100802 | 20529 | 51 | 3 | 47 | 59 |
| 1 | 21K559 | Life Academy High School for Film and Music | Brooklyn | K400 | 718-333-7750 | 718-333-7775 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.593593811', 'longitude': '-73... | 13 | 47 | 306 | 17616 | 21 | 2 | 45 | 35 |
| 2 | 16K393 | Frederick Douglass Academy IV Secondary School | Brooklyn | K026 | 718-574-2820 | 718-574-2821 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.692133704', 'longitude': '-73... | 3 | 36 | 291 | 18181 | 69 | 2 | 49 | 52 |
| 3 | 08X305 | Pablo Neruda Academy | Bronx | X450 | 718-824-1682 | 718-824-1663 | 9.0 | 12 | | | ... | 1 | {'latitude': '40.822303765', 'longitude': '-73... | 9 | 18 | 16 | 11611 | 58 | 5 | 31 | 26 |
| 4 | 03M485 | Fiorello H. LaGuardia High School of Music & A... | Manhattan | M485 | 212-496-0700 | 212-724-5748 | 9.0 | 12 | | | ... | 6 | {'latitude': '40.773670507', 'longitude': '-73... | 7 | 6 | 151 | 12420 | 20 | 4 | 19 | 12 |


## ✅ Step 4: Task Queries Below

**Q1. Count schools by borough**

In [64]:
query = """
SELECT borough, COUNT(DISTINCT school_name) AS school_count
FROM nyc_schools.high_school_directory
GROUP BY 
    borough
ORDER BY
    school_count DESC;
"""
df_result = pd.read_sql(query, conn)
df_result


| | borough | school_count |
|---|---|---|
| 0 | Brooklyn | 121 |
| 1 | Bronx | 118 |
| 2 | Manhattan | 106 |
| 3 | Queens | 80 |
| 4 | Staten Island | 10 |


**Insight:** *Brooklyn has the most high schools (121), closely followed by the Bronx (118) and Manhattan (106). Queens has 80 and Staten Island only 10, so high schools are unevenly distributed across the boroughs. Note that the query counts distinct school names, not rows.*
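
Q1 uses `COUNT(DISTINCT school_name)`, which can differ from a plain row count when two schools share a name. The sketch below illustrates the difference on made-up rows in an in-memory SQLite database; the `directory` table and its contents are invented for illustration and are not the real `nyc_schools.high_school_directory` data.

```python
import sqlite3

# In-memory toy database; these rows are invented for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE directory (dbn TEXT, school_name TEXT, borough TEXT)")
db.executemany(
    "INSERT INTO directory VALUES (?, ?, ?)",
    [
        ("01A001", "Alpha High", "Brooklyn"),
        ("01A002", "Alpha High", "Brooklyn"),  # same name, different dbn
        ("01A003", "Beta High", "Brooklyn"),
        ("02B001", "Gamma High", "Queens"),
    ],
)

# Compare three ways of counting schools per borough.
counts = db.execute(
    """
    SELECT borough,
           COUNT(*)                    AS row_count,
           COUNT(DISTINCT school_name) AS distinct_names,
           COUNT(DISTINCT dbn)         AS distinct_dbns
    FROM directory
    GROUP BY borough
    ORDER BY borough
    """
).fetchall()

for row in counts:
    print(row)
```

If the three counts agree on the real table, the choice does not matter; if they differ, counting distinct `dbn` values may be closer to "number of schools", assuming `dbn` identifies a school uniquely.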

**Q2. Average % of English Language Learners (ELL) per borough**

In [51]:
# Reuse the connection opened in Step 2; reconnecting and closing here
# would leave later cells without a usable `conn`.
ell_query = """
SELECT
    hsd.borough,
    ROUND(AVG(sd.ell_percent)::numeric, 2)::text || '%' AS avg_ell_percentage
FROM
    nyc_schools.school_demographics sd
INNER JOIN
    nyc_schools.high_school_directory hsd
ON
    sd.dbn = hsd.dbn
GROUP BY
    hsd.borough
ORDER BY
    hsd.borough;
"""

try:
    df_ell_result = pd.read_sql(ell_query, conn)
    print("\n Average % of English Language Learners (ELL) per Borough:")
    # index=False hides the DataFrame index in the printed table
    print(df_ell_result.to_string(index=False))
except Exception as e:
    print(f"\nError executing query: {e}")
    conn.rollback()  # reset the aborted transaction so later queries can run




 Average % of English Language Learners (ELL) per Borough:
  borough avg_ell_percentage
Manhattan              7.57%


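**Insight:** *Only Manhattan appears in the results, with an average ELL percentage of 7.57%. Because the query uses an INNER JOIN, any borough whose schools have no matching `dbn` in `school_demographics` is dropped, so this most likely reflects the coverage of the demographics table rather than an absence of ELL students in the other boroughs.*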
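
One way to check why a single borough survives an INNER JOIN is to count, per borough, how many directory schools have any matching demographics row. The sketch below demonstrates the idea with a LEFT JOIN on invented data in an in-memory SQLite database; the `directory` and `demographics` tables and their rows are assumptions for illustration, not the real schema contents.

```python
import sqlite3

# In-memory toy database; tables and rows are invented for illustration only.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE directory (dbn TEXT, borough TEXT);
    CREATE TABLE demographics (dbn TEXT, ell_percent REAL);
    INSERT INTO directory VALUES
        ('M1', 'Manhattan'), ('M2', 'Manhattan'),
        ('K1', 'Brooklyn'), ('Q1', 'Queens');
    -- Only Manhattan schools have demographics rows in this toy data.
    INSERT INTO demographics VALUES ('M1', 5.0), ('M1', 7.0), ('M2', 9.0);
    """
)

# LEFT JOIN keeps every directory row, so unmatched boroughs show a count of 0.
coverage = db.execute(
    """
    SELECT d.borough,
           COUNT(DISTINCT d.dbn)  AS schools,
           COUNT(DISTINCT sd.dbn) AS schools_with_demographics
    FROM directory d
    LEFT JOIN demographics sd ON sd.dbn = d.dbn
    GROUP BY d.borough
    ORDER BY d.borough
    """
).fetchall()

for row in coverage:
    print(row)
```

Running the same shape of query against the real tables would show whether the other boroughs truly have no matching `dbn` values, or whether something else (such as differently formatted keys) prevents the match.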

**Q3. Top 3 schools in each borough with the highest percentage of special education students (sped_percent)**

In [49]:
sped_query = """
    WITH RankedSpedSchools AS (
        SELECT
            hsd.borough,
            hsd.school_name,
            sd.sped_percent,
            ROW_NUMBER() OVER(PARTITION BY hsd.borough ORDER BY sd.sped_percent DESC) as rn
        FROM
            nyc_schools.school_demographics sd
        INNER JOIN
            nyc_schools.high_school_directory hsd
        ON
            sd.dbn = hsd.dbn
        WHERE
            sd.sped_percent IS NOT NULL 
    )
    SELECT
        borough,
        school_name,
        ROUND(sped_percent::numeric, 2)::text || '%' AS special_education_student_percentage
    FROM
        RankedSpedSchools
    WHERE
        rn <= 3
    ORDER BY
        borough,
        sped_percent DESC;
    """
df_sped_result = pd.read_sql(sped_query, conn)
print("\n Top 3 Schools in Each Borough with Highest Special Education Percentage:")
print(df_sped_result.to_string(index=False))





 Top 3 Schools in Each Borough with Highest Special Education Percentage:
  borough                school_name special_education_student_percentage
Manhattan East Side Community School                               28.80%
Manhattan East Side Community School                               27.70%
Manhattan East Side Community School                               26.70%


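**Insight:** *In Manhattan, the only borough returned, "East Side Community School" fills all three slots with different special education percentages (28.80%, 27.70%, 26.70%). This suggests that `school_demographics` holds several rows per school (for example, one per school year), so the query ranks rows rather than schools.*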
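
To rank schools rather than rows, one option is to collapse each school to a single value before applying `ROW_NUMBER()`. The sketch below does this on invented data in an in-memory SQLite database (window functions need SQLite 3.25 or newer). The `demographics` table, its rows, and the choice of `MAX(sped_percent)` as the per-school value are all assumptions; an average or the most recent year could be equally valid.

```python
import sqlite3

# In-memory toy database; the table and rows are invented for illustration only.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE demographics (
        dbn TEXT, school_name TEXT, borough TEXT, sped_percent REAL
    );
    INSERT INTO demographics VALUES
        ('M1', 'School A', 'Manhattan', 28.8),
        ('M1', 'School A', 'Manhattan', 27.7),
        ('M1', 'School A', 'Manhattan', 26.7),
        ('M2', 'School B', 'Manhattan', 20.0),
        ('M3', 'School C', 'Manhattan', 15.0),
        ('M4', 'School D', 'Manhattan', 10.0);
    """
)

top_schools = db.execute(
    """
    WITH per_school AS (
        -- Step 1: one row per school (assumption: use its highest value)
        SELECT borough, school_name, MAX(sped_percent) AS max_sped
        FROM demographics
        WHERE sped_percent IS NOT NULL
        GROUP BY borough, dbn, school_name
    ),
    ranked AS (
        -- Step 2: rank schools within each borough
        SELECT borough, school_name, max_sped,
               ROW_NUMBER() OVER (
                   PARTITION BY borough ORDER BY max_sped DESC
               ) AS rn
        FROM per_school
    )
    SELECT borough, school_name, max_sped
    FROM ranked
    WHERE rn <= 3
    ORDER BY borough, max_sped DESC
    """
).fetchall()

for row in top_schools:
    print(row)
```

With the aggregation step in place, each school can occupy at most one of the three slots per borough.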