# Issue 4: Business and Analytical Questions

#### Business Objective: What was the impact of the COVID-19 pandemic on the level of home court advantage?
- Question 1: How has the difference in win-loss ratio between home and away teams changed from the 2019-20 season to the 2022-23 season? (wl_home and wl_away columns in game table)
- Question 2: How has the difference in average points scored between home and away teams changed from the 2019-20 season to the 2022-23 season? (pts_home and pts_away columns in game table)
- Question 3: How has the difference in offensive and defensive rebounds between home and away teams changed from the 2019-20 season to the 2022-23 season? (oreb_home, oreb_away, dreb_home, dreb_away columns in game table)
- Question 4: How does the three-point field goal percentage of home teams compare to that of away teams from the 2019-20 season to the 2022-23 season? (fg3_pct_home and fg3_pct_away columns in game table)
- Question 5: How does the free throw percentage of home teams compare to that of away teams from the 2019-20 season to the 2022-23 season? (ft_pct_home and ft_pct_away in the game table)

# Issue 5: Descriptive Statistics

#### Question 1: How has the difference in win-loss ratio between home and away teams changed from the 2019-20 season to the 2022-23 season? (wl_home and wl_away columns in game table)

In [28]:
# Import pandas and sqlite3, then open a connection to the NBA database
import pandas as pd
import sqlite3
con = sqlite3.connect('C:/Users/antar/OneDrive/Documents/GitHub/UMD-INST627-Fall2024/data/nba.sqlite')

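# 2019-2020 season data
# Note: '_2019' matches any season_id made of one character followed by 2019,
# so GROUP BY returns one row per matching season_id and describe()
# summarizes those aggregated rows, not individual games.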

query_q1_2019 = '''
SELECT
    season_id,
    SUM(CASE WHEN wl_home = 'W' THEN 1 ELSE 0 END) AS home_wins,
    SUM(CASE WHEN wl_home = 'L' THEN 1 ELSE 0 END) AS home_losses,
    SUM(CASE WHEN wl_away = 'W' THEN 1 ELSE 0 END) AS away_wins,
    SUM(CASE WHEN wl_away = 'L' THEN 1 ELSE 0 END) AS away_losses
FROM game
WHERE season_id LIKE '_2019'
GROUP BY season_id

'''
q1_table_2019 = pd.read_sql_query(query_q1_2019, con)
print("2019-2020 season data")
print(q1_table_2019.describe())


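# 2022-2023 season data
# Note: '_2022' matches any season_id made of one character followed by 2022,
# so describe() again summarizes one aggregated row per matching season_id.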

query_q1_2022 = '''
SELECT
    season_id,
    SUM(CASE WHEN wl_home = 'W' THEN 1 ELSE 0 END) AS home_wins,
    SUM(CASE WHEN wl_home = 'L' THEN 1 ELSE 0 END) AS home_losses,
    SUM(CASE WHEN wl_away = 'W' THEN 1 ELSE 0 END) AS away_wins,
    SUM(CASE WHEN wl_away = 'L' THEN 1 ELSE 0 END) AS away_losses
FROM game
WHERE season_id LIKE '_2022'
GROUP BY season_id

'''
q1_table_2022 = pd.read_sql_query(query_q1_2022, con)
print("2022-2023 season data")
print(q1_table_2022.describe())


2019-2020 season data
        home_wins  home_losses   away_wins  away_losses
count    3.000000     3.000000    3.000000     3.000000
mean   208.000000   173.000000  173.000000   208.000000
std    326.239176   262.381402  262.381402   326.239176
min      0.000000     1.000000    1.000000     0.000000
25%     20.000000    22.000000   22.000000    20.000000
50%     40.000000    43.000000   43.000000    40.000000
75%    312.000000   259.000000  259.000000   312.000000
max    584.000000   475.000000  475.000000   584.000000
2022-2023 season data
        home_wins  home_losses   away_wins  away_losses
count    4.000000     4.000000    4.000000     4.000000
mean   199.000000   147.500000  147.500000   199.000000
std    343.955423   246.194368  246.194368   343.955423
min      0.000000     2.000000    2.000000     0.000000
25%     24.000000    26.000000   26.000000    24.000000
50%     41.000000    36.000000   36.000000    41.000000
75%    216.000000   157.500000  157.500000   216.000000
max 
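
Because each query groups by `season_id`, the `describe()` output above summarizes a handful of aggregated rows (three for 2019, four for 2022) rather than individual games. One possible alternative, sketched below, is to list the grouped rows directly and add a home win share per `season_id`. The constant name `HOME_WIN_SHARE_SQL`, the in-memory database, and its toy rows are hypothetical stand-ins for the real `game` table, and the block uses only `sqlite3` so it runs without `nba.sqlite`.

```python
import sqlite3

# One row per matching season_id, with the share of games won by the home team.
HOME_WIN_SHARE_SQL = '''
SELECT
    season_id,
    COUNT(*) AS games,
    SUM(CASE WHEN wl_home = 'W' THEN 1 ELSE 0 END) AS home_wins,
    SUM(CASE WHEN wl_away = 'W' THEN 1 ELSE 0 END) AS away_wins,
    AVG(CASE WHEN wl_home = 'W' THEN 1.0 ELSE 0.0 END) AS home_win_share
FROM game
WHERE season_id LIKE ?
GROUP BY season_id
ORDER BY season_id
'''

# Toy data for illustration only (not taken from nba.sqlite).
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE game (season_id TEXT, wl_home TEXT, wl_away TEXT)")
demo_con.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [
        ("22019", "W", "L"),
        ("22019", "W", "L"),
        ("22019", "L", "W"),
        ("42019", "L", "W"),
    ],
)

demo_q1 = demo_con.execute(HOME_WIN_SHARE_SQL, ("_2019",)).fetchall()
for row in demo_q1:
    print(row)
```

The same query string could be passed to `pd.read_sql_query(HOME_WIN_SHARE_SQL, con, params=("_2019",))` against the real connection, and printing the resulting table (instead of calling `describe()` on it) keeps each `season_id` visible.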

#### Question 2: How has the difference in average points scored between home and away teams changed from the 2019-20 season to the 2022-23 season? (pts_home and pts_away columns in game table)

In [29]:
#2019-2020 season data

query_q2_2019 = '''
SELECT season_id, AVG(pts_home), AVG(pts_away)
FROM game
WHERE season_id LIKE '_2019'
GROUP BY season_id
'''

q2_table_2019 = pd.read_sql_query(query_q2_2019, con)
print("2019-2020 season data")
print(q2_table_2019.describe())

#2022-2023 season data

query_q2_2022 = '''
SELECT season_id, AVG(pts_home), AVG(pts_away)
FROM game
WHERE season_id LIKE '_2022'
GROUP BY season_id
'''
q2_table_2022 = pd.read_sql_query(query_q2_2022, con)
print("2022-2023 season data")
print(q2_table_2022.describe())

2019-2020 season data
       AVG(pts_home)  AVG(pts_away)
count       3.000000       3.000000
mean      125.974440     125.597669
std        25.175908      27.208071
min       110.060241     109.060241
25%       111.461660     109.896504
50%       112.863078     110.732767
75%       133.931539     133.866383
max       155.000000     157.000000
2022-2023 season data
       AVG(pts_home)  AVG(pts_away)
count       4.000000       4.000000
mean      128.536919     128.368467
std        31.042568      37.177098
min       111.583333     107.595238
25%       111.617262     108.230952
50%       113.782172     110.939315
75%       130.701829     131.076829
max       175.000000     184.000000
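
Question 2 asks how the home-minus-away scoring gap changed, which the separate `AVG(pts_home)` and `AVG(pts_away)` summaries do not show directly. A minimal sketch of computing that gap per `season_id` follows. It assumes, without confirmation from the source, that `'22019'` and `'22022'` are the `season_id` values of interest. The constant `MARGIN_SQL` and the toy rows are hypothetical.

```python
import sqlite3

# Average home-minus-away point margin for two explicitly named season_ids.
MARGIN_SQL = '''
SELECT
    season_id,
    AVG(pts_home) AS avg_pts_home,
    AVG(pts_away) AS avg_pts_away,
    AVG(pts_home - pts_away) AS avg_home_margin
FROM game
WHERE season_id IN (?, ?)
GROUP BY season_id
ORDER BY season_id
'''

# Toy data for illustration only (not taken from nba.sqlite).
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE game (season_id TEXT, pts_home REAL, pts_away REAL)")
demo_con.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [
        ("22019", 110, 100),
        ("22019", 100, 104),
        ("22022", 120, 118),
        ("22022", 112, 112),
    ],
)

# Assumed season_id values; adjust to whatever ids the real table uses.
margins = {
    season_id: avg_margin
    for season_id, _, _, avg_margin in demo_con.execute(MARGIN_SQL, ("22019", "22022"))
}
margin_change = margins["22022"] - margins["22019"]
print(margins, margin_change)
```

Naming the `season_id` values explicitly avoids mixing several season types into one summary, which is what the `'_2019'` and `'_2022'` patterns do.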


#### Question 3: How has the difference in offensive and defensive rebounds between home and away teams changed from the 2019-20 season to the 2022-23 season? (oreb_home, oreb_away, dreb_home, dreb_away columns in game table)

In [30]:
#2019-2020 season data

query_q3_2019 = '''
SELECT season_id, team_name_home, AVG(oreb_home), AVG(oreb_away), AVG(dreb_home), AVG(dreb_away)
FROM game
WHERE season_id LIKE '_2019'
GROUP BY season_id, team_name_home
ORDER BY team_name_home
'''
q3_table_2019 = pd.read_sql_query(query_q3_2019, con)
print("2019-2020 season data:")
print(q3_table_2019.describe())

#2022-2023 season data

query_q3_2022 = '''
SELECT season_id, team_name_home, AVG(oreb_home), AVG(oreb_away), AVG(dreb_home), AVG(dreb_away)
FROM game
WHERE season_id LIKE '_2022'
GROUP BY season_id, team_name_home
ORDER BY team_name_home
'''

q3_table_2022 = pd.read_sql_query(query_q3_2022, con)
print("2022-2023 season data:")
print(q3_table_2022.describe())

2019-2020 season data:
       AVG(oreb_home)  AVG(oreb_away)  AVG(dreb_home)  AVG(dreb_away)
count       47.000000       47.000000       47.000000       47.000000
mean         9.987154        9.453378       35.032945       34.809570
std          1.792427        1.696943        2.980981        2.908626
min          5.500000        5.333333       28.000000       29.333333
25%          8.947253        8.722222       33.464286       33.027778
50%         10.138889        9.666667       35.051282       34.388889
75%         11.032258       10.319701       36.361111       35.985714
max         15.000000       13.333333       46.000000       46.500000
2022-2023 season data:
       AVG(oreb_home)  AVG(oreb_away)  AVG(dreb_home)  AVG(dreb_away)
count       77.000000       77.000000       77.000000       77.000000
mean        10.494083       10.618133       34.409153       33.427327
std          2.518303        2.380763        3.097153        3.966718
min          3.000000        5.000000       
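
The queries above average rebounds within each home team first, so `describe()` reports statistics over 47 and 77 team-level rows, and every team counts equally no matter how many home games it played. A game-weighted home-minus-away gap is one alternative, sketched here under stated assumptions. The helper `rebound_gaps` and the toy rows are hypothetical, and games with a missing rebound count are skipped.

```python
import sqlite3
from statistics import mean

def rebound_gaps(con, pattern):
    """Game-weighted mean home-minus-away rebound gaps for matching season_ids."""
    rows = con.execute(
        "SELECT oreb_home, oreb_away, dreb_home, dreb_away "
        "FROM game WHERE season_id LIKE ?",
        (pattern,),
    ).fetchall()
    # Keep only games where all four rebound counts are present.
    complete = [row for row in rows if None not in row]
    return {
        "games": len(complete),
        "oreb_gap": mean(row[0] - row[1] for row in complete),
        "dreb_gap": mean(row[2] - row[3] for row in complete),
    }

# Toy data for illustration only (not taken from nba.sqlite).
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE game (season_id TEXT, oreb_home REAL, oreb_away REAL, "
    "dreb_home REAL, dreb_away REAL)"
)
demo_con.executemany(
    "INSERT INTO game VALUES (?, ?, ?, ?, ?)",
    [
        ("22019", 10, 8, 35, 33),
        ("22019", 9, 11, 36, 34),
        ("22022", 12, 11, 34, 35),
    ],
)

demo_2019 = rebound_gaps(demo_con, "_2019")
demo_2022 = rebound_gaps(demo_con, "_2022")
print(demo_2019)
print(demo_2022)
```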

#### Question 4: How does the three-point field goal percentage of home teams compare to that of away teams from the 2019-20 season to the 2022-23 season? (fg3_pct_home and fg3_pct_away columns in game table)

In [31]:
#2019-2020 season data

query_q4_2019 = '''
SELECT season_id, team_name_home, fg3_pct_home, fg3_pct_away
FROM game
WHERE season_id LIKE '_2019'
'''

q4_table_2019 = pd.read_sql_query(query_q4_2019, con)
print("2019-2020 season data:")
print(q4_table_2019.describe())

#2022-2023 season data

query_q4_2022 = '''
SELECT season_id, team_name_home, fg3_pct_home, fg3_pct_away
FROM game
WHERE season_id LIKE '_2022'
'''

q4_table_2022 = pd.read_sql_query(query_q4_2022, con)
print("2022-2023 season data:")
print(q4_table_2022.describe())

2019-2020 season data:
       fg3_pct_home  fg3_pct_away
count   1143.000000   1143.000000
mean       0.361411      0.353990
std        0.085972      0.083539
min        0.118000      0.094000
25%        0.303000      0.300000
50%        0.360000      0.353000
75%        0.419000      0.406000
max        0.629000      0.680000
2022-2023 season data:
       fg3_pct_home  fg3_pct_away
count   1386.000000   1386.000000
mean       0.362962      0.351737
std        0.086659      0.083581
min        0.103000      0.105000
25%        0.303000      0.296000
50%        0.362000      0.349000
75%        0.421750      0.411250
max        0.636000      0.618000
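
The comparison the question asks for can be read off the reported means. The short sketch below subtracts the away mean from the home mean for each season, using only the `mean` rows printed above. The dictionary name `reported_fg3` is illustrative. These means pool every `season_id` matched by the `LIKE` pattern, so they are not limited to one season type.

```python
# Means copied from the describe() output above.
reported_fg3 = {
    "2019-20": {"home": 0.361411, "away": 0.353990},
    "2022-23": {"home": 0.362962, "away": 0.351737},
}

# Home-minus-away gap in three-point percentage for each season.
fg3_gap = {
    season: means["home"] - means["away"]
    for season, means in reported_fg3.items()
}
fg3_gap_change = fg3_gap["2022-23"] - fg3_gap["2019-20"]

for season, gap in fg3_gap.items():
    print(f"{season}: home minus away = {gap:.6f}")
print(f"change in gap = {fg3_gap_change:.6f}")
```

On these figures the home advantage in three-point percentage is roughly 0.7 percentage points in 2019-20 and 1.1 percentage points in 2022-23.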


#### Question 5: How does the free throw percentage of home teams compare to that of away teams from the 2019-20 season to the 2022-23 season? (ft_pct_home and ft_pct_away in the game table)

In [32]:
#2019-2020 season data

query_q5_2019 = '''
SELECT season_id, team_name_home, ft_pct_home, ft_pct_away
FROM game
WHERE season_id LIKE '_2019'
'''

q5_table_2019 = pd.read_sql_query(query_q5_2019, con)
print("2019-2020 season data:")
print(q5_table_2019.describe())

#2022-2023 season data

query_q5_2022 = '''
SELECT season_id, team_name_home, ft_pct_home, ft_pct_away
FROM game
WHERE season_id LIKE '_2022'
'''
q5_table_2022 = pd.read_sql_query(query_q5_2022, con)
print("2022-2023 season data:")
print(q5_table_2022.describe())

2019-2020 season data:
       ft_pct_home  ft_pct_away
count  1143.000000  1143.000000
mean      0.771852     0.772411
std       0.098059     0.103027
min       0.400000     0.375000
25%       0.708000     0.708000
50%       0.781000     0.778000
75%       0.840000     0.844000
max       1.000000     1.000000
2022-2023 season data:
       ft_pct_home  ft_pct_away
count  1384.000000  1386.000000
mean      0.782598     0.780615
std       0.096220     0.098461
min       0.400000     0.300000
25%       0.723500     0.714000
50%       0.789000     0.785000
75%       0.850000     0.846000
max       1.000000     1.000000
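
In the 2022-2023 output, `ft_pct_home` has a count of 1384 while `ft_pct_away` has 1386, so two home values are missing and the two means are computed over slightly different sets of games. A minimal sketch of a paired comparison that only uses games where both values are present is shown below. The constant `PAIRED_FT_SQL` and the toy rows are hypothetical. The sketch relies on SQLite's `AVG` and `COUNT(column)` ignoring `NULL` values.

```python
import sqlite3

# COUNT(*) counts all games; COUNT(column) counts only non-NULL values.
# ft_pct_home - ft_pct_away is NULL when either side is NULL, and AVG skips it,
# so avg_paired_gap only uses games with both percentages present.
PAIRED_FT_SQL = '''
SELECT
    COUNT(*) AS games,
    COUNT(ft_pct_home) AS home_non_null,
    COUNT(ft_pct_away) AS away_non_null,
    AVG(ft_pct_home - ft_pct_away) AS avg_paired_gap
FROM game
WHERE season_id LIKE ?
'''

# Toy data for illustration only (not taken from nba.sqlite).
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE game (season_id TEXT, ft_pct_home REAL, ft_pct_away REAL)"
)
demo_con.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [
        ("22022", 0.80, 0.75),
        ("22022", None, 0.70),  # missing home value
        ("22022", 0.70, 0.72),
    ],
)

games, home_non_null, away_non_null, avg_paired_gap = demo_con.execute(
    PAIRED_FT_SQL, ("_2022",)
).fetchone()
print(games, home_non_null, away_non_null, avg_paired_gap)
```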
