IM02 Online Section Group Notebook

#Question 1: Which teams have most consistently had the highest scores?

In [3]:
import pandas as pd
import sqlite3


#using entire database
con = sqlite3.connect("data/nba.sqlite")
games = pd.read_sql_query("SELECT * FROM game", con)
active_players = pd.read_sql_query("SELECT full_name FROM player WHERE is_active=TRUE", con)


# games data excluding all-star games (2019-2023)

games_recent_5yrs = pd.read_sql_query("""
	SELECT * FROM game 
	WHERE season_type NOT IN ('All Star', 'All-Star') 
	AND game_date BETWEEN '2019-01-01 00:00:00' AND '2023-12-31 23:59:59'
    ORDER BY game_date DESC
""", con)

print(games_recent_5yrs.head(5))



  season_id team_id_home team_abbreviation_home  team_name_home     game_id  \
0     42022   1610612743                    DEN  Denver Nuggets  0042200405   
1     42022   1610612748                    MIA      Miami Heat  0042200404   
2     42022   1610612748                    MIA      Miami Heat  0042200403   
3     42022   1610612743                    DEN  Denver Nuggets  0042200402   
4     42022   1610612743                    DEN  Denver Nuggets  0042200401   

             game_date matchup_home wl_home  min  fgm_home  ...  reb_away  \
0  2023-06-12 00:00:00  DEN vs. MIA       W  240      38.0  ...      44.0   
1  2023-06-09 00:00:00  MIA vs. DEN       L  240      35.0  ...      34.0   
2  2023-06-07 00:00:00  MIA vs. DEN       L  240      34.0  ...      58.0   
3  2023-06-04 00:00:00  DEN vs. MIA       L  240      39.0  ...      31.0   
4  2023-06-01 00:00:00  DEN vs. MIA       W  240      40.0  ...      43.0   

   ast_away  stl_away  blk_away  tov_away  pf_away  pts_away  

In [8]:
# mean, standard deviation of scores for each team
score_stats_home = games_recent_5yrs.groupby(['team_id_home','team_name_home'])['pts_home'].agg(['mean', 'std', 'count'])
score_stats_away = games_recent_5yrs.groupby(['team_id_away','team_name_away'])['pts_away'].agg(['mean', 'std', 'count'])

score_stats_home.index.names = ['team_id', 'team_name']
score_stats_away.index.names = ['team_id', 'team_name']

combined_stats = pd.concat([score_stats_home, score_stats_away])
score_stats_all = combined_stats.groupby(['team_id', 'team_name']).mean()
score_stats_all['total_games'] = combined_stats.groupby(['team_id', 'team_name'])['count'].sum()
score_stats_all = score_stats_all[score_stats_all['total_games'] >= 100]


# teams with highest average
print("Teams with highest average scores:")
print(score_stats_all.sort_values(by='mean', ascending=False).head(25))


Teams with highest average scores:
                                         mean        std  count  total_games
team_id    team_name                                                        
1610612749 Milwaukee Bucks         116.435716  12.726194  217.0          434
1610612762 Utah Jazz               114.124126  11.353901  196.5          393
1610612737 Atlanta Hawks           114.082051  12.425118  195.0          390
1610612750 Minnesota Timberwolves  114.046378  11.995903  184.0          368
1610612758 Sacramento Kings        113.760321  12.404773  186.5          373
1610612744 Golden State Warriors   113.553368  12.502250  207.5          415
1610612751 Brooklyn Nets           113.466575  12.269783  195.5          391
1610612740 New Orleans Pelicans    113.429579  11.357125  184.5          369
1610612763 Memphis Grizzlies       113.219401  12.564816  196.5          393
1610612756 Phoenix Suns            113.174049  11.591735  205.5          411
1610612743 Denver Nuggets          112.98

In [9]:
# teams with consistent scoring
print("Most consistent scoring teams (lowest standard deviation):")
print(score_stats_all.sort_values(by='std').head(25))

Most consistent scoring teams (lowest standard deviation):
                                         mean        std  count  total_games
team_id    team_name                                                        
1610612762 Utah Jazz               114.124126  11.353901  196.5          393
1610612739 Cleveland Cavaliers     107.244895  11.354891  182.0          364
1610612740 New Orleans Pelicans    113.429579  11.357125  184.5          369
1610612756 Phoenix Suns            113.174049  11.591735  205.5          411
1610612748 Miami Heat              108.997146  11.596273  217.5          435
1610612761 Toronto Raptors         111.561036  11.654833  203.0          406
1610612743 Denver Nuggets          112.986301  11.774490  219.0          438
1610612753 Orlando Magic           107.049138  11.822853  189.0          378
1610612750 Minnesota Timberwolves  114.046378  11.995903  184.0          368
1610612765 Detroit Pistons         106.855801  11.997323  183.0          366
1610612747 Los An

#Introduction
#A "good" game isn't just about who wins. However, when it comes to which team to bet our money on, their points certainly matter. In evaluating teams for investment potential, scoring ability has traditionally been a key metric. With NBA scoring increasing in recent seasons, investors need to look deeper than just offensive output. Is a high-scoring offense truly indicative of a team's value and future success? To identify the most promising investment opportunities for the upcoming season, we analyzed scoring patterns from 2019-2023, examining not just point totals but offensive consistency. This comprehensive analysis aims to uncover teams that demonstrate sustainable excellence.
#(visualization) NBA scoring increasing in recent seasons

#Rising Action
#Initial market analysis might attract investors to the Milwaukee Bucks, who lead with an average of 116.44 points per game. However, high performance must be sustainable to ensure reliable returns. Our analysis shows that the top 10 teams all average over 113 points, indicating intense competition and multiple investment opportunities. The key question now is “Which team's offensive success is most sustainable?"
#(visualization) average points per game by team (top 10) & average = 113 points

#Climax
#The most meaningful insight came when we examined scoring consistency. While the Milwaukee Bucks led in scoring power, the Utah Jazz stood out in offensive efficiency, maintaining a high scoring average (114.12 points) with the league's lowest standard deviation (11.35). This remarkable consistency, coupled with their scoring volume, suggests an optimal balance of return and risk.
#Additionally, teams like the New Orleans Pelicans, Phoenix Suns, and Denver Nuggets demonstrated exceptional consistency across over 400 games, despite not leading in raw scoring numbers. This highlights that offensive stability might be just as valuable as scoring power.

#Falling Action
#This discovery showed us that being the best isn't just about scoring the most points. It's about being able to perform well game after game. While teams with higher scoring averages might attract immediate attention, the Utah Jazz's combination of strong scoring and consistency supports long-term value appreciation. 
#(visualization) 1st axis 80~ / 2nd axis standard deviation

#Resolution
#(Add later: we selected these teams because point over 114, stds under 12) Our analysis suggests three investment opportunities:
#Premium Investment (Utah Jazz): Balance of high scoring and consistency, suggesting strong fundamentals and reliable returns
#Value Opportunities (New Orleans Pelicans, Phoenix Suns, Denver Nuggets): Strong consistency metrics suggest possible market undervaluation
#Growth Potential (Minnesota Timberwolves): Higher scoring but more variable The Utah Jazz’s combination of offensive excellence and consistency makes them a particularly attractive investment option.

#Question 2: Which team has had below average for the most consecutive seasons?

In [10]:
# average points per season
season_avg = pd.concat([games_recent_5yrs['pts_home'], games_recent_5yrs['pts_away']]).mean()

home_games = games_recent_5yrs[['season_id', 'team_id_home', 'team_name_home', 'pts_home']].rename(columns={
    'team_id_home': 'team_id',
    'team_name_home': 'team_name',
    'pts_home': 'points'
})
away_games = games_recent_5yrs[['season_id', 'team_id_away', 'team_name_away', 'pts_away']].rename(columns={
    'team_id_away': 'team_id',
    'team_name_away': 'team_name',
    'pts_away': 'points'
})
all_games = pd.concat([home_games, away_games])



    season_id     team_id       team_name      points  league_avg  \
60      12022       15019  Adelaide 36ers  116.000000  111.924325   
0       12020  1610612737   Atlanta Hawks  112.750000  111.924325   
30      12021  1610612737   Atlanta Hawks  103.250000  111.924325   
61      12022  1610612737   Atlanta Hawks  112.750000  111.924325   
92      22018  1610612737   Atlanta Hawks  116.391304  111.924325   
122     22019  1610612737   Atlanta Hawks  111.761194  111.924325   
152     22020  1610612737   Atlanta Hawks  113.694444  111.924325   
182     22021  1610612737   Atlanta Hawks  113.939024  111.924325   
212     22022  1610612737   Atlanta Hawks  118.426829  111.924325   
274     42020  1610612737   Atlanta Hawks  106.277778  111.924325   

     below_average  
60           False  
0            False  
30            True  
61           False  
92           False  
122           True  
152          False  
182          False  
212          False  
274           True  
Teams wit

In [13]:
# average points per team per season
team_season_avg = all_games.groupby(['season_id', 'team_id', 'team_name'])['points'].mean().reset_index()

# below average seasons
team_season_avg['league_avg'] = season_avg
team_season_avg['below_average'] = team_season_avg['points'] < team_season_avg['league_avg']
team_season_avg = team_season_avg.sort_values(['team_id', 'season_id'])
print(team_season_avg.head(10))

    season_id     team_id       team_name      points  league_avg  \
60      12022       15019  Adelaide 36ers  116.000000  111.924325   
0       12020  1610612737   Atlanta Hawks  112.750000  111.924325   
30      12021  1610612737   Atlanta Hawks  103.250000  111.924325   
61      12022  1610612737   Atlanta Hawks  112.750000  111.924325   
92      22018  1610612737   Atlanta Hawks  116.391304  111.924325   
122     22019  1610612737   Atlanta Hawks  111.761194  111.924325   
152     22020  1610612737   Atlanta Hawks  113.694444  111.924325   
182     22021  1610612737   Atlanta Hawks  113.939024  111.924325   
212     22022  1610612737   Atlanta Hawks  118.426829  111.924325   
274     42020  1610612737   Atlanta Hawks  106.277778  111.924325   

     below_average  
60           False  
0            False  
30            True  
61           False  
92           False  
122           True  
152          False  
182          False  
212          False  
274           True  


In [14]:

# consecutive streaks
def consecutive_streak(group):
   current_streak = 0
   max_streak = 0
   
   for below_avg in group:
       if below_avg:
           current_streak += 1
           max_streak = max(max_streak, current_streak)
       else:
           current_streak = 0
           
   return max_streak

team_streaks = team_season_avg.groupby('team_name')['below_average'].apply(consecutive_streak)
team_streaks = team_streaks.sort_values(ascending=False)

print("Teams with most consecutive below-average seasons:")
print(team_streaks.head(10))


team_streaks = team_streaks.sort_values(ascending=True)

print("Teams with least consecutive below-average seasons:")
print(team_streaks.head(10))


Teams with most consecutive below-average seasons:
team_name
Orlando Magic          9
Detroit Pistons        9
Miami Heat             7
Cleveland Cavaliers    7
Charlotte Hornets      5
Houston Rockets        5
San Antonio Spurs      4
Chicago Bulls          4
New York Knicks        4
Los Angeles Lakers     4
Name: below_average, dtype: int64
Teams with least consecutive below-average seasons:
team_name
Adelaide 36ers               0
Ra'anana Maccabi Ra'anana    1
Golden State Warriors        1
New Orleans Pelicans         1
Sacramento Kings             1
Utah Jazz                    2
Boston Celtics               2
Brooklyn Nets                2
Dallas Mavericks             2
Phoenix Suns                 2
Name: below_average, dtype: int64


#Introduction
#NBA scoring ability is often viewed as a key indicator of team success. However, it's not unheard of for a team to score poorly at the start of the playing year and still move on to the playoffs. Taking a look at consecutive underperformance can give us a better understanding of a team’s overall scoring ability. Analyzing game-point patterns across all season types—preseason, regular season, and playoffs—from 2019 to 2023, we investigated which teams have consecutively struggled to keep up with the league's scoring average.
#(Visual: line chart across seasons)

#Rising Action
#Our analysis showed that a league's average scoring was 111.92 points per game, establishing a clear benchmark for team performance evaluation. Using this threshold, our goal was to see which teams have had the most consecutive seasons averaged a score below the league standard  – potentially indicating systematic issues that could affect their value.
#(Visual: distribution and box plot)

#Climax
#We discovered a striking contrast in scoring consistency across the NBA. The Orlando Magic and Detroit Pistons have notably struggled with scoring, each recording 9 consecutive seasons below the league average - a persistent challenge rather than a temporary slump. The Miami Heat and Cleveland Cavaliers followed with 7-season streaks, suggesting potential value opportunities if these playing challenges can be addressed. Meanwhile, teams like the New Orleans Pelicans, Utah Jazz, and Phoenix Suns demonstrated remarkable scoring stability, rarely falling below the league average.
#(Visual: Bar chart)

#Falling Action
#There are many reasons a team can do poorly – from poor coaching, a poor player line-up, or insufficient financial support.  However, when we look for which teams had the least consecutive underperforming seasons, we can see that there are many teams that can turn it around within the 4 possible seasons types. This trend highlights a key market insight: a team's historical prestige or market size does not guarantee consistency in scoring capabilities, emphasizing that organization and player effectiveness is a critical factor in securing a game-win.
#(Visual: limited bar chart)

#Resolution
#From an investment perspective, persistent underperformers may present unique investment opportunities if that team demonstrates turnaround potential. There are many teams, including the Golden State Warriors, Sacramento Kings, and New Orleans Pelicans who appear as safer opportunities with one season or less of consecutive underperformance – implying an ability to recover from inevitable losses.


 Question 3: Which team has consistently made it to the playoffs?

In [7]:

# playoff games
playoff_games = games_recent_5yrs[games_recent_5yrs['season_type']=='Playoffs']

# playoff appearances (seasons)
playoff_home_seasons = playoff_games[['season_id', 'team_name_home']].rename(
   columns={'team_name_home': 'team_name'}).drop_duplicates()
playoff_away_seasons = playoff_games[['season_id', 'team_name_away']].rename(
   columns={'team_name_away': 'team_name'}).drop_duplicates()
all_playoff_seasons = pd.concat([playoff_home_seasons, playoff_away_seasons]).drop_duplicates()
playoff_appearances = all_playoff_seasons.groupby('team_name').size().sort_values(ascending=False)

# total playoff games (games)
playoff_home_games = playoff_games[['team_name_home', 'game_id']].rename(
   columns={'team_name_home': 'team_name'})
playoff_away_games = playoff_games[['team_name_away', 'game_id']].rename(
   columns={'team_name_away': 'team_name'})
all_playoff_games = pd.concat([playoff_home_games, playoff_away_games])
playoff_games_count = all_playoff_games.groupby('team_name').size()
print(playoff_games_count.head(5)) 

# playoff appearances, for both seasons & games
playoff_all = pd.DataFrame({
    'team_name': playoff_appearances.index,
    'seasons': playoff_appearances.values,
    'total_games': playoff_games_count[playoff_appearances.index].values
})  

print(playoff_all.head(5))

# average playoff game per season 
playoff_all['games_per_season'] = (playoff_all['total_games'] / playoff_all['seasons'])

# success rate
total_seasons = games_recent_5yrs['season_id'].nunique()
playoff_all['success_rate'] = (playoff_all['seasons'] / total_seasons * 100)

print("\nPlayoff appearances and games (2019-2023):")
print(playoff_all[['team_name', 'seasons', 'success_rate', 'total_games']].head(10))

# playoff appearance rate 
total_seasons = games_recent_5yrs['season_id'].nunique()
playoff_all['appearance_rate'] = (playoff_all['seasons'] / total_seasons * 100)


# playoff win percentage
home_wins = playoff_games[playoff_games['wl_home'] == 'W'][['team_name_home', 'game_id']].rename(
   columns={'team_name_home': 'team_name'})
# Playoff win 계산 (away games)
away_wins = playoff_games[playoff_games['wl_away'] == 'W'][['team_name_away', 'game_id']].rename(
   columns={'team_name_away': 'team_name'})
all_wins = pd.concat([home_wins, away_wins])
playoff_wins_count = all_wins.groupby('team_name').size()

playoff_all['win_percentage'] = (playoff_wins_count.reindex(playoff_all['team_name']).fillna(0).values / playoff_all['total_games'] * 100)

# average playoff game per season 
playoff_all['games_per_season'] = (playoff_all['total_games'] / playoff_all['seasons'])


print("\nPlayoff appearances and performance (2019-2023):")
print(playoff_all[['team_name', 'seasons', 'appearance_rate', 'total_games', 'games_per_season', 'win_percentage']]
     .sort_values(by='total_games',ascending=False).head(20))

team_name
Atlanta Hawks          29
Boston Celtics         75
Brooklyn Nets          29
Chicago Bulls           5
Cleveland Cavaliers     5
dtype: int64
            team_name  seasons  total_games
0       Brooklyn Nets        5           29
1      Denver Nuggets        5           68
2  Philadelphia 76ers        5           51
3     Milwaukee Bucks        5           65
4      Boston Celtics        5           75

Playoff appearances and games (2019-2023):
            team_name  seasons  success_rate  total_games
0       Brooklyn Nets        5     38.461538           29
1      Denver Nuggets        5     38.461538           68
2  Philadelphia 76ers        5     38.461538           51
3     Milwaukee Bucks        5     38.461538           65
4      Boston Celtics        5     38.461538           75
5          Miami Heat        4     30.769231           66
6           Utah Jazz        4     30.769231           29
7         LA Clippers        4     30.769231           43
8   Memphis Grizz

#Introduction
#In the NBA, playoff success is a crucial indicator of a team's value and investment potential. Making the playoffs not only validates a team's competitiveness but also drives revenue through ticket sales, merchandise, and media exposure. We analyzed playoff performance from 2019-2023 to identify which teams have demonstrated the most reliable postseason presence and how this might inform investment decisions.
#(Visual: bar chart, by season type)

#Rising action
#Our analysis revealed interesting patterns in playoff participation. Five teams - the Brooklyn Nets, Denver Nuggets, Philadelphia 76ers, Milwaukee Bucks, and Boston Celtics - achieved a perfect record, making the playoffs in all five seasons. This consistent postseason presence suggests strong organizational stability and reliable performance.
#However, playoff appearances alone don't tell the complete story. To truly understand a team's postseason success and potential investment value, we needed to examine how deep these teams went in the playoffs. This meant looking at the total number of playoff games played, which indicates both consistency and championship potential.
#(Visual: bar chart - seasons in playoffs by team)

#Climax
#During 2019-2023, the Boston Celtics led all teams with 75 playoff games, followed by the Denver Nuggets (68 games) and Miami Heat (66 games). Both the Boston Celtics and Denver Nuggets maintained perfect playoff attendance while achieving high win rates (54.7% and 54.4% respectively), demonstrating consistent excellence. 
#The Golden State Warriors’ performance is particularly impressive, reaching 57 games in just 3 playoff appearances (19 average games per season) and the highest win percentage (63%), suggesting they went deep in every playoff they were present.  The Milwaukee Bucks round out the top performers with 65 games and the win percentage (60%) among frequent playoff teams. These metrics showcase why these teams stand out as the NBA's most reliable playoff performers, making them particularly attractive from both competitive and investment perspectives.
#(Visual: bar chart, colored by season attendance and average games played)

#Rising Action
#On the opposite end of the spectrum, eight teams struggled with playoff consistency, making only one appearance in five years. These included established franchises like the Chicago Bulls (5 games) and Detroit Pistons (4 games). This stark contrast in performance highlights the significant gap between the league's most and least successful teams in terms of postseason achievement.
#The data also reveals that historical prestige doesn't guarantee current success, even teams with rich playoff histories can face extended periods of postseason struggles.
#(Visual: Bar chart, teams with one playoff appearance x games played)

#Resolution
#Based on our analysis, here are the key investment recommendations:
#Premier Investments (Boston Celtics, Denver Nuggets): Highest playoff game count with perfect attendance, Consistent win rates above 54%, Most reliable returns through consistent playoff success
#High-Efficiency Plays (Golden State Warriors, Miami Heat): Exceptional playoff performance when qualifying, League-leading win percentages (63% and 57.5%), Strong potential for deep playoff runs despite fewer appearances
#Stable Value (Milwaukee Bucks, Philadelphia 76ers): Consistent playoff presence, Solid win percentages (60% and 52.9%), Demonstrated potential to advance in postseason
#The Boston Celtics and Denver Nuggets stand out as the most attractive investment options, combining reliable playoff qualification with consistent deep runs. The Golden State Warriors' exceptional efficiency metrics suggest particularly strong returns during playoff years, though with slightly higher volatility due to fewer appearances.

#Question 4: Which teams have shown the greatest improvement over the past five years?

In [26]:
#Question 4: Which teams have shown the greatest improvement over the past five years?
# extract year from game_date
games_recent_5yrs['game_year'] = pd.to_datetime(games_recent_5yrs['game_date']).dt.year

# limit to regular season
regular_games = games_recent_5yrs[games_recent_5yrs['season_type']=='Regular Season']

# home and away game data 
reg_home_games = regular_games[['team_name_home', 'game_year', 'pts_home']].rename(
    columns={'team_name_home': 'team_name', 'pts_home': 'points'})
reg_away_games = regular_games[['team_name_away', 'game_year', 'pts_away']].rename(
    columns={'team_name_away': 'team_name', 'pts_away': 'points'})
reg_all_games = pd.concat([reg_home_games, reg_away_games])

# average points for each team by year
team_year_avg = reg_all_games.groupby(['team_name', 'game_year'])['points'].mean().reset_index()

# improvement
improvement_list = []

for team in team_year_avg['team_name'].unique():
    team_data = team_year_avg[team_year_avg['team_name'] == team].sort_values('game_year')
    
    total_improvement = team_data['points'].diff().sum() 
    
    improvement_list.append({
        'team_name_home': team, 
        'total_improvement': total_improvement
    })

team_improvement = pd.DataFrame(improvement_list)
team_improvement = team_improvement.sort_values(by='total_improvement', ascending=False)

print("Teams with the greatest improvement over the past five years:")
print(team_improvement.head(20))


Teams with the greatest improvement over the past five years:
            team_name_home  total_improvement
19         New York Knicks          14.695726
25        Sacramento Kings          12.650000
14       Memphis Grizzlies          10.526330
0            Atlanta Hawks           9.028261
11          Indiana Pacers           8.326020
9    Golden State Warriors           7.655134
5      Cleveland Cavaliers           7.627350
13      Los Angeles Lakers           6.813824
20   Oklahoma City Thunder           6.298844
21           Orlando Magic           5.524613
7           Denver Nuggets           5.377463
4            Chicago Bulls           5.149422
1           Boston Celtics           4.875494
28               Utah Jazz           4.740310
22      Philadelphia 76ers           4.458629
6         Dallas Mavericks           4.343460
3        Charlotte Hornets           4.276423
17  Minnesota Timberwolves           4.053968
12             LA Clippers           2.143659
15              Mi

#Introduction
#A team's ability to improve year after year can be used to predict their future potential. Our previous data has discussed which teams have done 'best' historically, this doesn’t take into consideration which teams may surprise us in the upcoming season . We analyzed scoring patterns from 2019-2023 to identify teams showing the strongest upward trajectories.
#(Visual: Line graph with range)

#Rising action
#Looking at regular season games, we discovered interesting patterns in scoring trends. According to the boxplot, the median score has shown a gradual increase from 2019 to 2023. Additionally, the lower end of the box, representing the 25th percentile, indicates that the scoring of the bottom 25% of teams has also risen steadily over the years.
#(Visual: Box plot)

#Climax
#Looking at the data, we uncovered remarkable developments in several NBA teams' scoring capabilities, both positive and negative. The New York Knicks indicated the most improvement with a 14.7-point increase in points-per-game. This was followed by Sacramento Kings and Memphis Grizzlies. In contrast, The Houston Rockets have indicated a decrease in point potential within the past 5 years.
#(Visual: box plot and/or bar chart)

#Falling action
#What this data is aiming to explain is a team’s ability to strategize and improve. Rather than showing gradual, year-over-year progress, both the New York Knicks and Sacramento Kings displayed sharp inflection points in their scoring output. Their leap from a bottom-tier offensive team to leading the league in improvement showcased successful strategic shifts that could rapidly alter a team's performance ceiling. 
#(Visual: multi-line graph)

#Resolution
#Teams who haven’t shown much improvement or have even decreased their scoring ability may still be historical winners. Improvements highlight teams that have successfully modernized their approach – teams that can be winners in the future. The NewYork Knicks and Sacramento Kings, in particular, have demonstrated the ability to significantly enhance their scoring output, suggesting potential for continued growth and value appreciation.


Question 5: Which teams have demonstrated the best defensive metrics (e.g., blocks and steals), and how has this impacted their winning consistency?

In [12]:
import pandas as pd
import sqlite3 
con = sqlite3.connect("data/nba.sqlite")
games_recent_5yrs = pd.read_sql_query("""
	SELECT * FROM game 
	WHERE season_type NOT IN ('All Star', 'All-Star') 
	AND game_date BETWEEN '2019-01-01 00:00:00' AND '2023-12-31 23:59:59'
    ORDER BY game_date DESC
""", con)

regular_games = games_recent_5yrs[games_recent_5yrs['season_type']=='Regular Season']

# home games stats 
home_defense = regular_games[['team_name_home', 'blk_home', 'stl_home', 'wl_home', 'oreb_home', 'fga_home', 'tov_home', 'fta_home', 'pts_home']].rename(
   columns={'team_name_home': 'team_name', 
           'blk_home': 'blocks',
           'stl_home': 'steals',
           'wl_home': 'win_loss',
           'oreb_home': 'off_rebounds',
           'fga_home': 'attempt_fg',
           'tov_home': 'turnovers',
           'fta_home': 'attempt_freethrow',
           'pts_home': 'total_points'})

# away games stats 
away_defense = regular_games[['team_name_away', 'blk_away', 'stl_away', 'wl_away', 'oreb_away', 'fga_away', 'tov_away', 'fta_away', 'pts_away']].rename(
   columns={'team_name_away': 'team_name',
           'blk_away': 'blocks', 
           'stl_away': 'steals',
           'wl_away': 'win_loss',
           'oreb_away': 'off_rebounds',
           'fga_away': 'attempt_fg',
           'tov_away': 'turnovers',
           'fta_away': 'attempt_freethrow',
           'pts_away': 'total_points'})

# defense metrics 
all_defense = pd.concat([home_defense, away_defense])
defense_stats = all_defense.groupby('team_name').agg(
    avg_blocks=('blocks', 'mean'),
    avg_steals=('steals', 'mean'),
    avg_rebound=('off_rebounds', 'mean'),
    avg_fieldgoal=('attempt_fg', 'mean'),
    avg_turnover=('turnovers', 'mean'),
    avg_freethrow=('attempt_freethrow', 'mean'),
    avg_pts=('total_points', 'mean'),
).reset_index()

# win rate 
wins = all_defense[all_defense['win_loss'] == 'W'].groupby('team_name').size()
total_games = all_defense.groupby('team_name').size()
win_rates = (wins / total_games).reset_index(name='win_rate')

# defense rating 
defense_analysis = pd.merge(defense_stats, win_rates, on='team_name')
defense_analysis['defense_rating'] = (defense_analysis['avg_pts'] / (.96 * defense_analysis['avg_fieldgoal'] + defense_analysis['avg_turnover'] + .44 * defense_analysis['avg_freethrow'] - defense_analysis['avg_rebound']))*100
defense_sorted = defense_analysis.sort_values(
   by=['win_rate', 'defense_rating'], 
   ascending=[False, True] #a lower defense rating is better than a higher one apparently
)

print("Teams with the best defensive metrics and their win consistency:")
print(defense_sorted[['team_name', 'avg_blocks', 'avg_steals', 'avg_pts', 'defense_rating']].head(10))
print("All Defensive Metrics")
print(defense_sorted.head(10))

Teams with the best defensive metrics and their win consistency:
             team_name  avg_blocks  avg_steals     avg_pts  defense_rating
16     Milwaukee Bucks    4.957865    7.359551  117.884831      116.599615
22  Philadelphia 76ers    5.336158    8.050847  112.870056      115.677022
7       Denver Nuggets    4.296919    7.627451  113.400560      116.524058
1       Boston Celtics    5.485876    7.502825  113.909605      116.288191
28           Utah Jazz    5.056657    6.603399  114.716714      116.798305
12         LA Clippers    4.564972    7.211864  113.166667      115.181762
23        Phoenix Suns    4.628895    7.824363  113.702550      115.370022
15          Miami Heat    3.823034    7.676966  109.252809      113.377720
27     Toronto Raptors    5.036932    8.914773  112.068182      114.303464
2        Brooklyn Nets    5.235795    6.821023  114.028409      114.775624
All Defensive Metrics
             team_name  avg_blocks  avg_steals  avg_rebound  avg_fieldgoal  \
16     Mil

#Introduction
#In the NBA, defensive prowess is often the foundation for championship success. To uncover the impact of defense on team performance, we analyzed key defensive metrics-blocks and steals- from 2019 to 2023. Our goal was to identify teams that not only excel defensively but also maintain consistent winning records. While offensive statistics often dominate headlines, our analysis reveals the role defense plays in driving victories. These insights provide a data-driven perspective, aiding more informed decisions about team valuation and investment strategies.

#Rising action
#Our analysis uncovered compelling defensive patterns across NBA teams. The Memphis Grizzlies demonstrated exceptional defensive metrics, leading with 5.62 blocks and 8.65 steals per game, while the Toronto Raptors excelled specifically in steals with 8.91 per game. These metrics stand out historically, but they raise critical investment questions.
#To truly understand the investment potential, we needed to look beyond raw defensive statistics. How well do these defensive metrics translate to actual team success? Are teams with elite defensive numbers consistently outperforming the market? By examining the correlation between defensive prowess and winning percentage, we aim to identify teams that combine defensive excellence with proven success - a potential indicator of sustainable long-term value.
#(Visual: bar chart)

#Climax
#The data reveals a compelling investment story through defensive metrics and winning consistency. The Milwaukee Bucks emerge as the top performer with a league-leading 69.1% win rate while maintaining solid defensive numbers (5.0 blocks, 7.4 steals). Following closely, the Philadelphia 76ers demonstrate exceptional defensive capabilities with 5.3 blocks and 8.1 steals, translating to a 63.6% win rate. The Denver Nuggets and Boston Celtics round out the top tier, each converting strong defensive metrics into win rates above 62%.
#Particularly noteworthy is that while teams like the Toronto Raptors show superior defensive statistics (ranking highest in total defense rating at 13.95), their lower win rate (56.8%) suggests challenges in converting defensive excellence into consistent victories.
#This pattern indicates that for investment purposes, teams successfully balancing defensive capabilities with winning consistency, like the Bucks and 76ers, represent more stable investment opportunities.
#Climax 2
#Analysis reveals surprisingly weak correlations between defensive metrics and winning performance, with blocks (0.223), steals (0.061), and overall defense rating (0.167) showing minimal correlation with win rates. The scatter plots visualize this disconnect clearly - teams with high defensive ratings frequently fail to achieve corresponding win rates.
#These findings challenge conventional wisdom about defensive metrics as team performance indicators. For instance, while the Toronto Raptors excel in defensive statistics, their lower win rate demonstrates that defensive prowess alone doesn't guarantee success. The scattered distribution across all metrics suggests investors need a more comprehensive evaluation framework beyond defensive capabilities to assess team potential.
#(Visual: 2 scatter plots with trendline – one on avg blocks and steals, one on defensive rating)

#Falling Action
#The disconnect between defensive metrics and win rates has significant market implications. Valuation heavily weighing defensive statistics may be overvaluing certain teams while missing other investment opportunities. This misalignment creates potential market inefficiencies, particularly in teams like the Milwaukee Bucks and Philadelphia 76ers, who demonstrate success beyond pure defensive excellence. Moreover, teams focusing solely on defensive improvement might be misallocating resources, suggesting a need for a more balanced approach to team development and valuation.

#Resolution
#Based on our analysis, we identify three distinct investment categories:
#1. Premium Investments (Milwaukee Bucks, Philadelphia 76ers):
# - Balance of defensive capability and winning consistency
 #- Proven ability to convert defensive skills into victories
 #- Most stable investment prospects
#2. Value Opportunities (Denver Nuggets, Boston Celtics):
 #- Strong overall performance metrics
 #- Effective translation of defensive capabilities into wins
#3. Cautionary Investments (Toronto Raptors, Memphis Grizzlies):
 #- Superior defensive statistics but lower win rates
 #- Need for additional performance factors beyond defense
 #- Higher risk profile despite strong defensive metrics
#The Milwaukee Bucks and Philadelphia 76ers stand out as particularly attractive investments, demonstrating the crucial balance between defensive capabilities and consistent winning performance.

