# Final Project: Advanced SQL Techniques


## Scenario
- You have to analyse the following datasets for the city of Chicago, as available on the Chicago City data portal.

- - Socioeconomic indicators in Chicago
- - Chicago public schools
- - Chicago crime data

- Based on the information available in the different tables, you have to run specific queries using Advanced SQL techniques that generate the required result sets.

- The lab will be followed by a graded quiz that will have questions on all problems in this lab. Hence, remember to take screenshots of your SQL queries and their outputs for reference.

In [1]:
import mysql.connector as mysql 
import pandas as pd 
import os
from dotenv import load_dotenv

In [2]:
!docker stop mysql-container
!docker restart mysql-container
!docker ps 

mysql-container
mysql-container
CONTAINER ID   IMAGE     COMMAND                  CREATED      STATUS                  PORTS                                                  NAMES
50e33c579e3f   mysql     "docker-entrypoint.s…"   4 days ago   Up Less than a second   0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   mysql-container


In [5]:
load_dotenv('/workspaces/IBM-DS-Course/.env')
user=os.getenv('USER')
password=os.getenv('PASSWORD')
host = 'localhost'
port=3306

def get_db_connection(db=None):
    return mysql.connect(
        host=host, user=user, password=password,port=port, database=db

    )
db = 'chicago_data'
create = f'CREATE DATABASE IF NOT EXISTS {db};'
use = f'USE {db};'


conn = get_db_connection()
cursor = conn.cursor()
cursor.execute(create)
conn.commit()
cursor.close()
conn.close()

In [6]:
file_paths = {
    'chicago_socioeconomic_data': '/workspaces/IBM-DS-Course/Course 6 Db and SQL /Module 6 /6.3 advance sql asm /6.3 .sql /chicago_socioeconomic_data.sql',
    'chicago_public_schools': '/workspaces/IBM-DS-Course/Course 6 Db and SQL /Module 6 /6.3 advance sql asm /6.3 .sql /chicago_public_schools.sql',
    'chicago_crime': '/workspaces/IBM-DS-Course/Course 6 Db and SQL /Module 6 /6.3 advance sql asm /6.3 .sql /chicago_crime.sql'
}


In [7]:
# check table exists
conn = get_db_connection(db)
cursor = conn.cursor()
# cursor.execute(use)

for table_name, file_path in file_paths.items():
    
    table_exists_query = f'SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = "{table_name}" AND TABLE_SCHEMA="{db}";'
    cursor.execute(table_exists_query)
    rspn = cursor.fetchone()
    if not rspn:
        try:
            with open(file_path, 'r') as sql_script:
                sql_cmds = sql_script.read()
                
                for cmd in sql_cmds.split(';'):
                    if cmd.strip():
                        cursor.execute(cmd)
                    conn.commit()
                    print(f'{file_path} was preformed successfully')
    
        except Exception as err:
            print(f'{file_path} failed because {err}')
    else:
        table_info_query = f'SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = "{table_name}" AND TABLE_SCHEMA="{db}";'
        cursor.execute(table_info_query)
        rslt = cursor.fetchall()
        column_names_df = pd.DataFrame(rslt, columns=["COLUMN_NAME"])
        globals()[f'{table_name}_df'] = pd.DataFrame(column_names_df)
    
# finally:
cursor.close()
conn.close()

In [8]:
for table_name, file_path in file_paths.items():
    f'{table_name}_df'

In [9]:
chicago_socioeconomic_data_df
chicago_public_schools_df
chicago_crime_df


Unnamed: 0,COLUMN_NAME
0,ARREST
1,BEAT
2,BLOCK
3,CASE_NUMBER
4,COMMUNITY_AREA_NUMBER
5,DATE
6,DESCRIPTION
7,DISTRICT
8,DOMESTIC
9,FBICODE


## Ex 1 

Using Joins
You have been asked to produce some reports about the communities and crimes in the Chicago area. You will need to use SQL join queries to access the data stored across multiple tables.


### Question 1
Write and execute a SQL query to list the school names, community names and average attendance for communities with a hardship index of 98.


In [10]:
conn = get_db_connection(db)
cursor = conn.cursor()


In [11]:
q1_1 = '''
SELECT S.NAME_OF_SCHOOL, SE.HARDSHIP_INDEX, SE.COMMUNITY_AREA_NUMBER, SE.COMMUNITY_AREA_NAME FROM chicago_public_schools AS S LEFT JOIN chicago_socioeconomic_data AS SE ON S.COMMUNITY_AREA_NUMBER = SE.COMMUNITY_AREA_NUMBER WHERE SE.HARDSHIP_INDEX = 98;'''



In [12]:
cursor.execute(q1_1)
rslt1_1 = cursor.fetchall()
rslt1_1_df = pd.DataFrame(rslt1_1)


In [13]:
rslt1_1_df

Unnamed: 0,0,1,2,3
0,George Washington Carver Military Academy High...,98,54,Riverdale
1,George Washington Carver Primary School,98,54,Riverdale
2,Ira F Aldridge Elementary School,98,54,Riverdale
3,William E B Dubois Elementary School,98,54,Riverdale


### Question 2
Write and execute a SQL query to list all crimes that took place at a school. Include case number, crime type and community name.


In [14]:
q1_2 = '''
SELECT C.ID, C.PRIMARY_TYPE, C.LOCATION_DESCRIPTION, SE.COMMUNITY_AREA_NAME FROM chicago_crime AS C LEFT JOIN chicago_socioeconomic_data AS SE ON C.COMMUNITY_AREA_NUMBER = SE.COMMUNITY_AREA_NUMBER WHERE C.LOCATION_DESCRIPTION LIKE "%SCHOOL%" OR "SCHOOL%" OR "%SCHOOL";'''



In [15]:
cursor.execute(q1_2)
rslt1_2 = cursor.fetchall()
rslt1_2_df = pd.DataFrame(rslt1_2)
rslt1_2_df

Unnamed: 0,0,1,2,3
0,4006321,BATTERY,"SCHOOL, PUBLIC, GROUNDS",South Shore
1,4430638,BATTERY,"SCHOOL, PUBLIC, BUILDING",Lincoln Square
2,6644618,BATTERY,"SCHOOL, PUBLIC, BUILDING",Douglas
3,2341955,BATTERY,"SCHOOL, PUBLIC, BUILDING",Austin
4,11110571,BATTERY,"SCHOOL, PUBLIC, GROUNDS",Ashburn
5,7399281,CRIMINAL DAMAGE,"SCHOOL, PUBLIC, GROUNDS",Austin
6,3530721,NARCOTICS,"SCHOOL, PUBLIC, GROUNDS",Rogers Park
7,7502426,NARCOTICS,"SCHOOL, PUBLIC, BUILDING",Brighton Park
8,8082600,ASSAULT,"SCHOOL, PUBLIC, GROUNDS",East Garfield Park
9,7174283,CRIMINAL TRESPASS,"SCHOOL, PUBLIC, GROUNDS",Ashburn


In [16]:
cursor.close()
conn.close()

## Ex2: Creating a View


For privacy reasons, you have been asked to create a view that enables users to select just the school name and the icon fields from the CHICAGO_PUBLIC_SCHOOLS table. By providing a view, you can ensure that users cannot see the actual scores given to a school, just the icon associated with their score. You should define new names for the view columns to obscure the use of scores and icons in the original table.



Question 1
Write and execute a SQL statement to create a view showing the columns listed in the following table, with new column names as shown in the second column.
| Column name in CHICAGO_PUBLIC_SCHOOLS|	Column name in view|
|-----|----|
|NAME_OF_SCHOOL|	School_Name|
|Safety_Icon	|Safety_Rating|
|Family_Involvement_Icon|	Family_Rating|
|Environment_Icon|	Environment_Rating|
|Instruction_Icon|	Instruction_Rating|
|Leaders_Icon	|Leaders_Rating|
|Teachers_Icon|	Teachers_Rating|


Write and execute a SQL statement that returns all of the columns from the view.



In [17]:
view = '''
CREATE OR REPLACE VIEW P_SCHOOL_VIEW AS 
SELECT NAME_OF_SCHOOL AS School_Name,
Safety_Icon	AS Safety_Rating,
Family_Involvement_Icon AS	Family_Rating , 
Environment_Icon AS	Environment_Rating, 
Instruction_Icon AS	Instruction_Rating , 
Leaders_Icon AS	Leaders_Rating , 
Teachers_Icon AS	Teachers_Rating 
FROM chicago_public_schools;

''' 

In [18]:
conn = get_db_connection(db)
cursor = conn.cursor()
cursor.execute(view)
q_view = '''SELECT School_Name,Safety_Rating, Family_Rating, Environment_Rating, Instruction_Rating, Leaders_Rating , Teachers_Rating FROM P_SCHOOL_VIEW;'''
cursor.execute(q_view)
rslt2_1 = cursor.fetchall()
# first_row = rslt2_1[0]
rslt2_1_df = pd.DataFrame(rslt2_1, columns=['School_Name','Safety_Rating', 'Family_Rating', 'Environment_Rating', 'Instruction_Rating', 'Leaders_Rating' , 'Teachers_Rating' ])
rslt2_1_df

Unnamed: 0,School_Name,Safety_Rating,Family_Rating,Environment_Rating,Instruction_Rating,Leaders_Rating,Teachers_Rating
0,Abraham Lincoln Elementary School,Very Strong,Very Strong,Strong,Strong,Weak,Strong
1,Adam Clayton Powell Paideia Community Academy ...,Average,Strong,Strong,Very Strong,Weak,Strong
2,Adlai E Stevenson Elementary School,Strong,NDA,Average,Weak,Weak,NDA
3,Agustin Lara Elementary Academy,Average,Average,Average,Weak,Weak,Average
4,Air Force Academy High School,Average,Strong,Strong,Average,Weak,Average
...,...,...,...,...,...,...,...
561,William T Sherman Elementary School,Weak,NDA,Average,Average,Weak,NDA
562,William W Carter Elementary School,Very Weak,Average,Weak,Weak,Weak,Strong
563,Wolfgang A Mozart Elementary School,Average,NDA,Average,Weak,Weak,NDA
564,Woodlawn Community Elementary School,Strong,NDA,Very Strong,Strong,Weak,NDA


Write and execute a SQL statement that returns just the school name and leaders rating from the view.


In [19]:
cursor.execute(view)
q_view = '''SELECT School_Name, Leaders_Rating  FROM P_SCHOOL_VIEW;'''
cursor.execute(q_view)
rslt2_2 = cursor.fetchall()
# first_row = rslt2_1[0]
rslt2_2_df = pd.DataFrame(rslt2_2, columns=['School_Name','Leaders_Rating' ])
rslt2_2_df

Unnamed: 0,School_Name,Leaders_Rating
0,Abraham Lincoln Elementary School,Weak
1,Adam Clayton Powell Paideia Community Academy ...,Weak
2,Adlai E Stevenson Elementary School,Weak
3,Agustin Lara Elementary Academy,Weak
4,Air Force Academy High School,Weak
...,...,...
561,William T Sherman Elementary School,Weak
562,William W Carter Elementary School,Weak
563,Wolfgang A Mozart Elementary School,Weak
564,Woodlawn Community Elementary School,Weak


In [20]:
cursor.close()
conn.close()

## Ex 3: Creating a Stored Procedure


The icon fields are calculated based on the value in the corresponding score field. You need to make sure that when a score field is updated, the icon field is updated too. To do this, you will write a stored procedure that receives the school id and a leaders score as input parameters, calculates the icon setting and updates the fields appropriately.



### Question 1
Write the structure of a query to create or replace a stored procedure called UPDATE_LEADERS_SCORE that takes a in_School_ID parameter as an integer and a in_Leader_Score parameter as an integer.
Take a screenshot showing the SQL query.



### Question 2
Inside your stored procedure, write a SQL statement to update the Leaders_Score field in the CHICAGO_PUBLIC_SCHOOLS table for the school identified by in_School_ID to the value in the in_Leader_Score parameter.
Take a screenshot showing the SQL query.



In [21]:
conn = get_db_connection(db)
cursor = conn.cursor()


In [30]:
sp_3= '''
CREATE PROCEDURE UPDATE_LEADER_SCORE(IN in_School_ID INTEGER, IN in_Leader_Score INTEGER)
BEGIN
    IF in_School_ID IS NOT NULL AND in_Leader_Score IS NOT NULL THEN
        UPDATE chicago_public_schools
        SET Leaders_Score = in_Leader_Score
        WHERE School_ID = in_School_ID;
    END IF;
END;
'''
drop_sp_3 = '''DROP PROCEDURE IF EXISTS UPDATE_LEADER_SCORE;'''

In [31]:
cursor.execute(sp_3)


In [None]:
sp_3= '''
CREATE PROCEDURE UPDATE_LEADER_SCORE(IN in_School_ID INTEGER, IN in_Leader_Score INTEGER)
BEGIN
    IF in_School_ID IS NOT NULL AND in_Leader_Score IS NOT NULL THEN
        UPDATE chicago_public_schools
        SET Leaders_Score = in_Leader_Score
        WHERE School_ID = in_School_ID;
    END IF;
END;
'''

### Question 3
Inside your stored procedure, write a SQL IF statement to update the Leaders_Icon field in the CHICAGO_PUBLIC_SCHOOLS table for the school identified by in_School_ID using the following information.
|Score lower limit|	Score upper limit|	Icon|
|----|----|----|
|80	|99|	Very strong|
|60	|79	|Strong|
|40|	59|	Average|
|20	|39	|Weak|
|0|	19|	Very weak|



In [36]:
sp_3= '''
CREATE PROCEDURE UPDATE_LEADER_SCORE(IN in_School_ID INTEGER, IN in_Leader_Score INTEGER)
BEGIN
    IF in_School_ID IS NOT NULL AND in_Leader_Score IS NOT NULL THEN
        UPDATE chicago_public_schools
        SET Leaders_Score = in_Leader_Score
        WHERE School_ID = in_School_ID;
        
        IF in_Leader_Score >= 80 AND in_Leader_Score <= 99 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Very strong'
            WHERE School_ID = in_School_ID;
        
        ELSEIF in_Leader_Score >= 60 AND in_Leader_Score <=79 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Strong'
            WHERE School_ID = in_School_ID;  
            
        ELSEIF in_Leader_Score >=40 AND in_Leader_Score <= 59 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Average'
            WHERE School_ID = in_School_ID;
        
        ELSEIF in_Leader_Score >=20 AND in_Leader_Score <= 39 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Weak'
            WHERE School_ID = in_School_ID;
        
        ELSEIF in_Leader_Score >= 0 AND in_Leader_Score <= 19 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Very weak'
            WHERE School_ID = in_School_ID;
        
        END IF;
    END IF; 
END;
'''

### Question 4
Run your code to create the stored procedure.

Write a query to call the stored procedure, passing a valid school ID and a leader score of 50, to check that the procedure works as expected.


In [37]:
cursor.execute(sp_3)


In [39]:
get_school_id = '''SELECT * FROM chicago_public_schools where Leaders_Score != 50;'''
cursor.execute(get_school_id)
rows = cursor.fetchall()
# print(*[row for row in rows], sep ='\n')
#(610038, 'Abraham Lincoln Elementary School', 'ES', '615 W Kemper Pl', 'Chicago', 'IL', 60614, '(773) 534-5720', 'http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610038.pdf', 'Fullerton Elementary Network', 'NORTH-NORTHWEST SIDE COLLABORATIVE', 'No', 'Standard', 'Not on Probation', 'Level 1', 'Yes', 'Very Strong', '99', 'Very Strong', '99', 'Strong', '74', 'Strong', '66', 'Weak', '65', 'Strong', '70', 'Strong', '56', 'Average', '47', '96.00%', Decimal('2.0'), '96.40%', '95.80%', '80.1', '43.3', '89.6', '84.', '60.', '62.', '81.', '85.', '52', '62.', '66.', '77.', '69.7', '64.4', '0.2', '0.9', 'Yellow', 'Green', '67.1', '54.5', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 813, 33, 'NDA', Decimal('1171699.458'), Decimal('1915829.428'), Decimal('41.92449696'), Decimal('-87.64452163'), 7, 'LINCOLN PARK', 43, 18, '(41.92449696, -87.64452163)')


(610038, 'Abraham Lincoln Elementary School', 'ES', '615 W Kemper Pl', 'Chicago', 'IL', 60614, '(773) 534-5720', 'http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610038.pdf', 'Fullerton Elementary Network', 'NORTH-NORTHWEST SIDE COLLABORATIVE', 'No', 'Standard', 'Not on Probation', 'Level 1', 'Yes', 'Very Strong', '99', 'Very Strong', '99', 'Strong', '74', 'Strong', '66', 'Weak', '65', 'Strong', '70', 'Strong', '56', 'Average', '47', '96.00%', Decimal('2.0'), '96.40%', '95.80%', '80.1', '43.3', '89.6', '84.', '60.', '62.', '81.', '85.', '52', '62.', '66.', '77.', '69.7', '64.4', '0.2', '0.9', 'Yellow', 'Green', '67.1', '54.5', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 'NDA', 813, 33, 'NDA', Decimal('1171699.458'), Decimal('1915829.428'), Decimal('41.92449696'), Decimal('-87.64452163'), 7, 'LINCOLN PARK', 43, 18, '(41.92449696, -87.64452163)')
(610281, 'Adam Clayton Powell Paideia Community Academy Elementary School', 'ES', '7511 S South Shore Dr

In [43]:
test_call = "CALL UPDATE_LEADER_SCORE(610038, 50);"
cursor.execute(test_call)
sp_res = cursor.fetchall()


In [45]:
test_q = "SELECT School_ID, Leaders_Score, Teachers_Icon FROM chicago_public_schools WHERE School_ID = 610038"
cursor.execute(test_q)
sp_out = cursor.fetchall()
sp_out_df = pd.DataFrame(sp_out)

In [46]:
sp_out_df

Unnamed: 0,0,1,2
0,610038,50,Strong


In [47]:
cursor.close()
conn.close()

## Ex 4: Using Transactions


You realise that if someone calls your code with a score outside of the allowed range (0-99), then the score will be updated with the invalid data and the icon will remain at its previous value. There are various ways to avoid this problem, one of which is using a transaction.



### Question 1
Update your stored procedure definition. Add a generic ELSE clause to the IF statement that rolls back the current work if the score did not fit any of the preceding categories.


### Question 2
Update your stored procedure definition again. Add a statement to commit the current unit of work at the end of the procedure.

Run your code to replace the stored procedure.



In [57]:
drop_sp_3 = '''DROP PROCEDURE IF EXISTS UPDATE_LEADER_SCORE;'''
conn = get_db_connection(db)
cursor = conn.cursor()
cursor.execute(drop_sp_3)


In [58]:
sp_3 = '''
CREATE PROCEDURE UPDATE_LEADER_SCORE(IN in_School_ID INTEGER, IN in_Leader_Score INTEGER)
BEGIN
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;

    IF in_School_ID IS NOT NULL AND in_Leader_Score IS NOT NULL THEN
        UPDATE chicago_public_schools
        SET Leaders_Score = in_Leader_Score
        WHERE School_ID = in_School_ID;

        IF in_Leader_Score >= 80 AND in_Leader_Score <= 99 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Very strong'
            WHERE School_ID = in_School_ID;

        ELSEIF in_Leader_Score >= 60 AND in_Leader_Score <= 79 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Strong'
            WHERE School_ID = in_School_ID;  

        ELSEIF in_Leader_Score >= 40 AND in_Leader_Score <= 59 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Average'
            WHERE School_ID = in_School_ID;

        ELSEIF in_Leader_Score >= 20 AND in_Leader_Score <= 39 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Weak'
            WHERE School_ID = in_School_ID;

        ELSEIF in_Leader_Score >= 0 AND in_Leader_Score <= 19 THEN 
            UPDATE chicago_public_schools
            SET Leaders_Icon = 'Very weak'
            WHERE School_ID = in_School_ID;

        ELSE
            ROLLBACK;
        END IF;

        COMMIT;
    ELSE
        ROLLBACK;
    END IF; 
END
'''


In [59]:

cursor.execute(sp_3)


In [61]:

test_q = "SELECT School_ID, Leaders_Score, Teachers_Icon FROM chicago_public_schools WHERE School_ID = 610038"
cursor.execute(test_q)
sp_out = cursor.fetchall()
sp_out_df = pd.DataFrame(sp_out)
sp_out_df

Unnamed: 0,0,1,2
0,610038,65,Strong


Write and run one query to check that the updated stored procedure works as expected when you use a valid score of 38.



In [62]:
test_call = "CALL UPDATE_LEADER_SCORE(610038, 50);"
cursor.execute(test_call)
sp_res = cursor.fetchall()

In [64]:
test_q = "SELECT School_ID, Leaders_Score, Teachers_Icon FROM chicago_public_schools WHERE School_ID = 610038"
cursor.execute(test_q)
sp_out = cursor.fetchall()
sp_out_df = pd.DataFrame(sp_out)
sp_out_df

Unnamed: 0,0,1,2
0,610038,50,Strong


Write and run another query to check that the updated stored procedure works as expected when you use an invalid score of 101.



In [65]:
test_call = "CALL UPDATE_LEADER_SCORE(610038, 101);"
cursor.execute(test_call)
sp_res = cursor.fetchall()

In [66]:
test_q = "SELECT School_ID, Leaders_Score, Teachers_Icon FROM chicago_public_schools WHERE School_ID = 610038"
cursor.execute(test_q)
sp_out = cursor.fetchall()
sp_out_df = pd.DataFrame(sp_out)
sp_out_df

Unnamed: 0,0,1,2
0,610038,50,Strong


In [67]:
cursor.close()
conn.close()