# 🧠 Day 3 – SQL via Python: NYC School Data Exploration
In this notebook, you'll connect to a PostgreSQL database and execute SQL queries to explore NYC school data.

## 🔌 Step 1: Import Libraries

In [1]:
import pandas as pd
import psycopg2

## 🔐 Step 2: Connect to the Database

In [6]:
# DB connection setup using hardcoded credentials (for onboarding only)
conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password="a9Am7Yy5r9_T7h4OF2GN",
    host="ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech",
    port="5432",
    sslmode="require"
)
cur = conn.cursor()
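
Outside of onboarding, the credentials would normally not live in the notebook. A minimal sketch of one alternative, assuming the values are exported as the standard libpq environment variables (`PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`); the helper name `build_conn_params` is hypothetical and not part of this notebook:

```python
import os

# Maps libpq-style environment variable names to psycopg2.connect() keywords.
ENV_TO_PARAM = {
    "PGDATABASE": "dbname",
    "PGUSER": "user",
    "PGPASSWORD": "password",
    "PGHOST": "host",
    "PGPORT": "port",
}


def build_conn_params(env=os.environ):
    """Collect connection keyword arguments from environment variables.

    Hypothetical helper: raises KeyError if any expected variable is unset.
    """
    missing = [name for name in ENV_TO_PARAM if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    params = {param: env[name] for name, param in ENV_TO_PARAM.items()}
    params["sslmode"] = "require"  # same SSL setting as the cell above
    return params


# Usage (not executed here because it needs a reachable database):
#   conn = psycopg2.connect(**build_conn_params())
#   cur = conn.cursor()
```

The helper only builds a dictionary, so it can be checked without touching the database.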

## ✅ Task Queries Below

### 🧮 School Distribution
**Q1: How many schools are in each borough?**
* Used the high_school_directory table to stay consistent with yesterday's work (school_safety_table also has a borough column, but its values are very different).
* Counted only distinct dbn values so that each school is counted once.

In [44]:
query = """
SELECT borough, COUNT(DISTINCT dbn) AS school_count
FROM nyc_schools.high_school_directory
GROUP BY borough;
"""
df_result = pd.read_sql(query, conn)
df_result



| borough | school_count |
|---|---|
| Bronx | 118 |
| Brooklyn | 121 |
| Manhattan | 106 |
| Queens | 80 |
| Staten Island | 10 |
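
To see why the query counts distinct dbn values rather than rows, here is a small sketch on made-up data. It assumes an in-memory SQLite database as a stand-in for PostgreSQL so it runs without the course database; the rows are invented for illustration and are not from the NYC data set.

```python
import sqlite3

# Toy rows (invented): dbn "X1" appears twice in the Bronx.
rows = [
    ("X1", "Bronx"),
    ("X1", "Bronx"),
    ("X2", "Bronx"),
    ("K1", "Brooklyn"),
]

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE high_school_directory (dbn TEXT, borough TEXT)")
conn_demo.executemany("INSERT INTO high_school_directory VALUES (?, ?)", rows)

# COUNT(*) counts rows; COUNT(DISTINCT dbn) counts each school once.
counts = conn_demo.execute(
    """
    SELECT borough,
           COUNT(*) AS row_count,
           COUNT(DISTINCT dbn) AS school_count
    FROM high_school_directory
    GROUP BY borough
    ORDER BY borough;
    """
).fetchall()
conn_demo.close()

print(counts)  # [('Bronx', 3, 2), ('Brooklyn', 1, 1)]
```

If the directory has exactly one row per dbn, both counts agree and `DISTINCT` is simply a safeguard.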


### 🎓 Language Learners
**Q2: What is the average % of English Language Learners (ELL) per borough?**
- After joining with the high school directory, the school_demographics table only has matching rows for Manhattan, so the other boroughs show as NULL.
- Used a LEFT JOIN so that no directory rows are dropped and every borough appears in the result, even when its average is NULL.

In [3]:
query = """
    SELECT borough,
        AVG(sd.ell_percent) AS avg_ell_percent
    FROM nyc_schools.high_school_directory hsd
    LEFT JOIN nyc_schools.school_demographics sd
    USING(dbn)
    GROUP BY borough;
"""
df_result = pd.read_sql(query, conn)
df_result



| borough | avg_ell_percent |
|---|---|
| Brooklyn | NULL |
| Queens | NULL |
| Staten Island | NULL |
| Manhattan | 7.5725 |
| Bronx | NULL |
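
The effect of the LEFT JOIN choice can be checked on toy data. This is a sketch only: it uses an in-memory SQLite database in place of PostgreSQL, and the table contents are invented so that one borough has no demographics rows.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE high_school_directory (dbn TEXT, borough TEXT);
    CREATE TABLE school_demographics (dbn TEXT, ell_percent REAL);
    -- Toy rows, invented for illustration only.
    INSERT INTO high_school_directory VALUES
        ('M1', 'Manhattan'), ('M2', 'Manhattan'), ('Q1', 'Queens');
    INSERT INTO school_demographics VALUES ('M1', 5.0), ('M2', 10.0);
    """
)

# Same query shape for both join types; only the JOIN keyword changes.
template = """
    SELECT hsd.borough, AVG(sd.ell_percent) AS avg_ell_percent
    FROM high_school_directory hsd
    {join_type} JOIN school_demographics sd ON sd.dbn = hsd.dbn
    GROUP BY hsd.borough
    ORDER BY hsd.borough;
"""
left_rows = demo.execute(template.format(join_type="LEFT")).fetchall()
inner_rows = demo.execute(template.format(join_type="INNER")).fetchall()
demo.close()

print(left_rows)   # [('Manhattan', 7.5), ('Queens', None)]
print(inner_rows)  # [('Manhattan', 7.5)]
```

With the inner join, the borough without demographics rows disappears from the result. The left join keeps it and reports a NULL average, which is the pattern seen in the output above.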


### 🔗 Schools Supporting Special Needs
**Q3: Using the data from the school demographics and high school directory, write a query to find the top 3 schools in each borough with the highest percentage of special education students (sped_percent)**
- As in Q2, only Manhattan has matching demographics rows, so the top 3 is available for this borough only.

In [8]:
query = """
    SELECT *
    FROM (
        SELECT
            RANK() OVER (PARTITION BY borough ORDER BY sd.sped_percent DESC) AS rank,
            borough,
            hsd.school_name,
            sd.sped_percent AS special_ed_percent
        FROM nyc_schools.high_school_directory hsd
        JOIN nyc_schools.school_demographics sd
        USING(dbn)
    ) AS ranked
    WHERE rank <= 3
    ORDER BY rank;
"""
df_result = pd.read_sql(query, conn)
df_result



| rank | borough | school_name | special_ed_percent |
|---|---|---|---|
| 1 | Manhattan | East Side Community School | 28.8 |
| 2 | Manhattan | East Side Community School | 27.7 |
| 3 | Manhattan | East Side Community School | 26.7 |
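
The same school fills all three ranks, which suggests that school_demographics holds several rows per dbn (possibly one per school year, though the notebook does not confirm this). One way to get three different schools is to reduce each school to a single value before ranking. The sketch below assumes that interpretation, uses each school's maximum sped_percent, and runs against an in-memory SQLite database (3.25 or newer, for window functions) with invented rows. The `schoolyear` column and the school names are placeholders.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE high_school_directory (dbn TEXT, borough TEXT, school_name TEXT);
    CREATE TABLE school_demographics (dbn TEXT, schoolyear TEXT, sped_percent REAL);
    -- Toy rows, invented: School A has three yearly rows that all beat the others.
    INSERT INTO high_school_directory VALUES
        ('M1', 'Manhattan', 'School A'),
        ('M2', 'Manhattan', 'School B'),
        ('M3', 'Manhattan', 'School C'),
        ('M4', 'Manhattan', 'School D');
    INSERT INTO school_demographics VALUES
        ('M1', '2010', 30.0), ('M1', '2011', 29.0), ('M1', '2012', 28.0),
        ('M2', '2012', 20.0), ('M3', '2012', 15.0), ('M4', '2012', 10.0);
    """
)

# Step 1 (per_school): one row per school, keeping its highest sped_percent.
# Step 2 (ranked): rank schools within each borough on that single value.
top3_query = """
    SELECT borough, school_name, max_sped_percent
    FROM (
        SELECT borough, school_name, max_sped_percent,
               RANK() OVER (PARTITION BY borough
                            ORDER BY max_sped_percent DESC) AS sped_rank
        FROM (
            SELECT hsd.borough, hsd.school_name,
                   MAX(sd.sped_percent) AS max_sped_percent
            FROM high_school_directory hsd
            JOIN school_demographics sd ON sd.dbn = hsd.dbn
            GROUP BY hsd.borough, hsd.dbn, hsd.school_name
        ) AS per_school
    ) AS ranked
    WHERE sped_rank <= 3
    ORDER BY borough, sped_rank;
"""
top3 = demo.execute(top3_query).fetchall()
demo.close()

print(top3)
```

On the toy data this returns School A, School B and School C rather than School A three times. Whether the maximum, the latest year, or an average is the right per-school value depends on how the question is read.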


## 🧠 Insights

- Insight 1: In the school_demographics table, the average number of Hispanic students (189.39) is more than double that of the next-largest group (Asian, 76.48). This covers the Manhattan schools only.

In [38]:
query = """
    SELECT 'Asian' AS race, ROUND(AVG(asian_num), 2) AS avg_student_count
    FROM nyc_schools.school_demographics
    UNION ALL
    SELECT 'Black', ROUND(AVG(black_num), 2)
    FROM nyc_schools.school_demographics
    UNION ALL
    SELECT 'Hispanic', ROUND(AVG(hispanic_num), 2)
    FROM nyc_schools.school_demographics
    UNION ALL
    SELECT 'White', ROUND(AVG(white_num), 2)
    FROM nyc_schools.school_demographics
    ORDER BY avg_student_count DESC;
"""
df_result = pd.read_sql(query, conn)
df_result



| race | avg_student_count |
|---|---|
| Hispanic | 189.39 |
| Asian | 76.48 |
| Black | 70.99 |
| White | 44.36 |


- Insight 2: The average percentage of male students (51.5%) is slightly higher than that of female students (48.5%) in the Manhattan borough.

In [42]:
query = """
    SELECT 
        ROUND(AVG(male_per::NUMERIC), 2) as avg_male_per, 
        ROUND(AVG(female_per::NUMERIC), 2) as avg_female_per
    FROM nyc_schools.school_demographics;
"""
df_result = pd.read_sql(query, conn)
df_result



| avg_male_per | avg_female_per |
|---|---|
| 51.5 | 48.5 |
