# Database Exploration - Hakim Murphy

## Import Libraries

In [1]:
import pandas as pd
import psycopg2
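import warnings
# pandas warns when read_sql receives a raw psycopg2 connection instead of a SQLAlchemy engine
warnings.filterwarnings("ignore", message="pandas only supports SQLAlchemy", category=UserWarning)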

## Connect to Database

In [2]:
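import os

# Read the password from an environment variable instead of hard-coding it in the notebook
conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password=os.environ["PGPASSWORD"],
    host="ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech",
    port="5432",
    sslmode="require",
)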
cur = conn.cursor()
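`pd.read_sql` officially supports SQLAlchemy connectables and `sqlite3` connections; given a raw `psycopg2` connection, recent pandas versions emit a `UserWarning`, which is likely why warnings are silenced in the first cell. A minimal sketch of an alternative, assuming any DB-API 2.0 connection: build the DataFrame directly from a cursor. `query_df` is a hypothetical helper, not part of pandas, and the SQLite rows are toy data standing in for the Neon database. Note that `psycopg2` uses `%s` placeholders while `sqlite3` uses `?`.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def query_df(conn, sql, params=None):
    """Run `sql` on a DB-API 2.0 connection and return the result as a DataFrame."""
    # closing() works for psycopg2 and sqlite3 cursors alike
    with closing(conn.cursor()) as cur:
        # Only pass parameters when given, so literal % signs in psycopg2 queries stay untouched
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        columns = [desc[0] for desc in cur.description]
        rows = cur.fetchall()
    return pd.DataFrame(rows, columns=columns)


# Usage with an in-memory SQLite database (toy rows, stand-in for the Neon connection)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE schools (borough TEXT, dbn TEXT)")
demo.executemany(
    "INSERT INTO schools VALUES (?, ?)",
    [("Bronx", "a"), ("Bronx", "b"), ("Queens", "c")],
)
demo_counts = query_df(
    demo,
    "SELECT borough, COUNT(DISTINCT dbn) AS unique_school_count "
    "FROM schools GROUP BY borough ORDER BY borough",
)
print(demo_counts)
```

In the notebook this would read `num_school_boroughs = query_df(conn, query)`.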

## 🧮 School Distribution

In [4]:
query = """
SELECT 
    borough, 
    COUNT(DISTINCT dbn) AS unique_school_count
FROM nyc_schools.high_school_directory
GROUP BY borough;
"""
num_school_boroughs = pd.read_sql(query, conn)
num_school_boroughs

         borough  unique_school_count
0          Bronx                  118
1       Brooklyn                  121
2      Manhattan                  106
3         Queens                   80
4  Staten Island                   10


`Staten Island` has only 10 high schools in the directory, far fewer than any other borough, which seems low for a population of over 450,000 people.
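To put the Staten Island figure in proportion, a quick pandas calculation of each borough's share of the directory, using the counts copied from the output above (in the notebook the same line could be applied to `num_school_boroughs`). Of the 435 schools listed, Staten Island accounts for about 2.3%.

```python
import pandas as pd

# School counts copied from the query output above
counts = pd.DataFrame({
    "borough": ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"],
    "unique_school_count": [118, 121, 106, 80, 10],
})
total = counts["unique_school_count"].sum()
# Each borough's share of all high schools in the directory, in percent
counts["share_pct"] = (100 * counts["unique_school_count"] / total).round(2)
print(counts)
```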

## 🎓 Language Learners

In [7]:
query = """
SELECT
  dir.borough,
  AVG(sd.ell_percent) AS avg_ell_pct
FROM nyc_schools.high_school_directory AS dir
JOIN nyc_schools.school_demographics AS sd
  USING (dbn)
GROUP BY dir.borough;
"""
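ell_per_borough = pd.read_sql(query, conn)
# Format each borough's average with two decimals and a percent sign
ell_per_borough["avg_ell_pct"] = ell_per_borough["avg_ell_pct"].astype(float).map("{:.2f}%".format)
print(ell_per_borough)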

     borough  avg_ell_pct
0  Manhattan         7.57%


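Inspecting the data showed that the `dbn` join key is `[Null]` outside Manhattan, so the join matches only Manhattan schools and only that borough is returned. Because `COUNT(DISTINCT dbn)` ignores NULLs yet returned non-zero counts for every borough in the directory, the NULL keys are most likely on the `school_demographics` side. On average, `7.57%` of students in Manhattan high schools are classified as `English Language Learners (ELL)`.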
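To locate where the join drops rows, one option is to count, per borough, the directory schools whose `dbn` has no partner in `school_demographics`. In SQL this is a `LEFT JOIN ... WHERE sd.dbn IS NULL`; the sketch below does the same in pandas with `merge(..., indicator=True)`, assuming both tables have already been loaded into DataFrames (for example with `pd.read_sql`). `unmatched_keys_by_borough` is a hypothetical helper and the rows are toy data.

```python
import pandas as pd


def unmatched_keys_by_borough(directory: pd.DataFrame, demographics: pd.DataFrame) -> pd.Series:
    """Count directory schools per borough whose dbn has no match in demographics."""
    # Drop NULL keys first: unlike SQL, pandas would otherwise match NaN to NaN
    keys = demographics[["dbn"]].dropna().drop_duplicates()
    merged = directory[["borough", "dbn"]].merge(keys, on="dbn", how="left", indicator=True)
    # "_merge" is "left_only" for directory rows with no demographics row
    unmatched = merged[merged["_merge"] == "left_only"]
    # Report 0 for boroughs whose keys all matched
    return unmatched.groupby("borough").size().reindex(directory["borough"].unique(), fill_value=0)


# Toy frames (hypothetical keys) to show the output shape
directory = pd.DataFrame({
    "borough": ["Manhattan", "Manhattan", "Bronx", "Bronx"],
    "dbn": ["M-1", "M-2", "X-1", None],
})
demographics = pd.DataFrame({"dbn": ["M-1", "M-2", None]})
print(unmatched_keys_by_borough(directory, demographics))
```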

## 🔗 Schools supporting special needs

In [8]:
query = """
SELECT *
FROM (
  SELECT
    dir.borough,
    dir.school_name,
    sd.sped_percent,
    ROW_NUMBER() OVER (PARTITION BY dir.borough ORDER BY sd.sped_percent DESC) AS rn
  FROM nyc_schools.high_school_directory AS dir
  JOIN nyc_schools.school_demographics AS sd
    USING (dbn)
  WHERE sd.sped_percent IS NOT NULL
) ranked
WHERE rn <= 3;
"""
sped_per_borough = pd.read_sql(query, conn)
print(sped_per_borough)

     borough                 school_name  sped_percent  rn
0  Manhattan  East Side Community School          28.8   1
1  Manhattan  East Side Community School          27.7   2
2  Manhattan  East Side Community School          26.7   3
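The output repeats East Side Community School three times with different `sped_percent` values, which suggests `school_demographics` holds several rows per `dbn` (possibly one per school year; that is an assumption, since no year column appears in the query). A minimal pandas sketch that collapses to one row per school before ranking, using the maximum as one possible aggregate; `top_schools_per_borough` is a hypothetical helper that expects the joined rows to include `dbn`. In SQL the same idea is a subquery grouped by `dir.borough, dbn, dir.school_name` with `MAX(sd.sped_percent)`, placed ahead of the `ROW_NUMBER()` window.

```python
import pandas as pd


def top_schools_per_borough(rows: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the top-n schools per borough by sped_percent, one row per school.

    Assumes `rows` holds the joined columns borough, dbn, school_name and
    sped_percent, possibly with several rows per dbn.
    """
    per_school = (
        rows.dropna(subset=["sped_percent"])
        # Collapse repeated rows for a school into one; max is one possible aggregate
        .groupby(["borough", "dbn", "school_name"], as_index=False)["sped_percent"]
        .max()
    )
    ranked = per_school.sort_values(["borough", "sped_percent"], ascending=[True, False])
    top = ranked.groupby("borough").head(n).copy()
    # Number the schools within each borough, mirroring ROW_NUMBER() in the SQL query
    top["rn"] = top.groupby("borough").cumcount() + 1
    return top.reset_index(drop=True)


# Toy rows (hypothetical): "School A" appears three times, like the repeated school above
rows = pd.DataFrame({
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Manhattan", "Bronx"],
    "dbn": ["A", "A", "A", "B", "C"],
    "school_name": ["School A", "School A", "School A", "School B", "School C"],
    "sped_percent": [28.8, 27.7, 26.7, 20.0, 15.0],
})
top3 = top_schools_per_borough(rows)
print(top3)
```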


## Conclusion: NYC High School Data Analysis with PostgreSQL

This analysis explored NYC high school data. The key link between the `high_school_directory` and `school_demographics` tables is the `dbn` column.

However, only Manhattan schools have `dbn` codes that match between the two tables, so joined results for the other boroughs are missing. This limits borough-level insights for metrics like ELL and special education percentages. For a complete analysis, make sure every borough has valid, matching `dbn` codes in both tables.
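The recommendation above can be turned into a check. A sketch, assuming DBNs follow the usual NYC DOE layout of a two-digit district, a borough letter (`M` Manhattan, `X` Bronx, `K` Brooklyn, `Q` Queens, `R` Staten Island) and a three-digit school number; both the pattern and the letter mapping are assumptions to verify against the data. `check_dbns` is a hypothetical helper for any DataFrame with `dbn` and `borough` columns, such as the directory table loaded with `pd.read_sql`; the sample codes are made up.

```python
import pandas as pd

# Assumed DBN layout: two-digit district, borough letter, three-digit school number
DBN_REGEX = r"\d{2}[MXKQR]\d{3}"
# Assumed mapping from the DBN borough letter to the borough names used in the directory
BOROUGH_CODES = {"M": "Manhattan", "X": "Bronx", "K": "Brooklyn", "Q": "Queens", "R": "Staten Island"}


def check_dbns(df: pd.DataFrame) -> pd.DataFrame:
    """Flag rows whose dbn is missing, malformed, or points to a different borough."""
    dbn = df["dbn"].str.strip()
    well_formed = dbn.str.fullmatch(DBN_REGEX, na=False)
    borough_from_dbn = dbn.str[2].map(BOROUGH_CODES)
    return df.assign(
        dbn_missing=df["dbn"].isna(),
        dbn_well_formed=well_formed,
        # Only trust the borough letter when the whole code is well formed
        borough_matches=well_formed & borough_from_dbn.eq(df["borough"]),
    )


# Toy rows (hypothetical codes) covering each failure mode
sample = pd.DataFrame({
    "borough": ["Manhattan", "Bronx", "Queens", "Staten Island"],
    "dbn": ["01M100", None, "25X100", "31R1"],
})
checked = check_dbns(sample)
print(checked)
```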