Permalink
Browse files

ballr shot charts

0 parents commit 252b96073244d3e9d1d7bf700cca2019631c0307 @toddwschneider committed Mar 8, 2016
Showing with 1,226 additions and 0 deletions.
  1. +3 −0 .gitignore
  2. +21 −0 LICENSE
  3. +64 −0 README.md
  4. +81 −0 fetch_shots.R
  5. +26 −0 heatmap_chart.R
  6. +11 −0 helpers.R
  7. +148 −0 hex_chart.R
  8. +14 −0 install_packages.R
  9. +80 −0 lebron.R
  10. +53 −0 players_data.R
  11. +111 −0 plot_court.R
  12. +16 −0 scatter_chart.R
  13. +306 −0 server.R
  14. +143 −0 ui.R
  15. +29 −0 www/ballr.js
  16. BIN www/basketball.png
  17. +102 −0 www/custom_styles.css
  18. +11 −0 www/flatly.css
  19. +7 −0 www/google-analytics.js
@@ -0,0 +1,3 @@
+.Rapp.history
+.DS_Store
+rsconnect/
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2016
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,64 @@
+# BallR: Interactive NBA Shot Charts with R and Shiny
+
+[BallR](http://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/) uses the [NBA Stats API](http://stats.nba.com/) to visualize every shot taken by a player during an NBA season dating back to 1996.
+
+[See this post for a live version of the app](http://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/)
+
+## Run your own local version
+
+You can run BallR as on your own machine by pasting the following code into the R console (you'll have to [install R](https://cran.rstudio.com/) first):
+
+```R
+packages = c("shiny", "ggplot2", "hexbin", "dplyr", "httr", "jsonlite")
+install.packages(packages, repos = "https://cran.rstudio.com/")
+library(shiny)
+runGitHub("ballr", "toddwschneider")
+```
+
+## Screenshot
+
+[![ballr](https://cloud.githubusercontent.com/assets/70271/13547819/b74dca58-e2ae-11e5-8f00-7c3c768e77e3.png)](http://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/)
+
+There are three chart types to choose from: **hexagonal**, **scatter**, and **heat map**
+
+### Hexagonal
+
+Hexagonal charts, which are influenced by the work of [Kirk Goldsberry at Grantland](https://grantland.com/contributors/kirk-goldsberry/), use R's `hexbin` package to bin shots into hexagonal regions. The size and opacity of each hexagon are proportional to the number of shots taken within that region, and the color of each hexagon represents your choice of metric, which can be one of:
+
+- FG% vs. league average
+- FG%
+- Points per shot
+
+There are two sliders to adjust the maximum hexagon sizes, and also the variability of sizes across hexagons, e.g. [here's the same Stephen Curry chart](https://cloud.githubusercontent.com/assets/70271/13547845/63f4101e-e2af-11e5-9a57-13a8a61b367a.png) but with larger hexagons, and plotting points per shot as the color metric.
+
+Note that the color metrics are not plotted at the individual hexagon level, but at the court region level, e.g. all hexagons on the left side of the court that are 16-24 feet from the basket will have the same color. If BallR were extended to, say, chart all shots for an entire team, then it might make sense to assign colors at the hexagon-level, but for single players that tends to produce excessive noise.
+
+### Scatter
+
+Scatter charts are the most straightforward option: they show the location of each individual shot, with color-coding for makes and misses
+
+![scatter](https://cloud.githubusercontent.com/assets/70271/13382173/dfae7f46-de3b-11e5-9ca6-1e2740904b60.png)
+
+### Heat map
+
+Heat map charts use [two-dimensional kernel density estimation](https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation) to show the distribution of shot attempts across the court.
+
+Anecdotally I've found that heat maps often show, unsurprisingly, that most shot attempts are taken in the restricted area near the basket. It might be more interesting to filter out restricted area shots when generating heat maps, for example here's the heat map of Stephen Curry's shot attempts *excluding* shots from within the restricted area:
+
+![heat map excluding restricted area](https://cloud.githubusercontent.com/assets/70271/13382185/0e3fce5a-de3c-11e5-93a7-e8a807dfbf83.png)
+
+### Filters
+
+BallR lets you filter shots along a few dimensions (zone, angle, distance, made/missed) by adjusting the inputs in the sidebar. When you apply filters, the shot chart and summary stats update automatically to reflect whatever subset of shots you have chosen.
+
+### Data
+
+The data comes directly from the NBA Stats API via the `shotchartdetail` endpoint. See [fetch_shots.R](fetch_shots.R) for the API call itself. The player select input lets you choose any player and season back to 1996, so you can compare, for example, Michael Jordan of 1996 to LeBron James of 2012.
+
+### Acknowledgments
+
+Posts by [Savvas Tjortjoglou](http://savvastjortjoglou.com/nba-shot-sharts.html) and [Eduardo Maia](http://thedatagame.com.au/2015/09/27/how-to-create-nba-shot-charts-in-r/) about making NBA shot charts in Python and R, respectively, served as useful resources
+
+## Questions/issues/contact
+
+todd@toddwschneider.com, or open a GitHub issue
@@ -0,0 +1,81 @@
+fetch_shots_by_player_id_and_season = function(player_id, season) {
+ req(player_id, season)
+
+ request = GET(
+ "http://stats.nba.com/stats/shotchartdetail",
+ query = list(
+ PlayerID = player_id,
+ Season = season,
+ ContextMeasure = "FGA",
+ DateFrom = "",
+ DateTo = "",
+ GameID = "",
+ GameSegment = "",
+ LastNGames = 0,
+ LeagueID = "00",
+ Location = "",
+ Month = 0,
+ OpponentTeamID = 0,
+ Outcome = "",
+ Period = 0,
+ Position = "",
+ RookieYear = "",
+ SeasonSegment = "",
+ SeasonType = "Regular Season",
+ TeamID = 0,
+ VsConference = "",
+ VsDivision = ""
+ )
+ )
+
+ stop_for_status(request)
+
+ data = content(request)
+
+ raw_shots_data = data$resultSets[[1]]$rowSet
+ col_names = tolower(as.character(data$resultSets[[1]]$headers))
+
+ if (length(raw_shots_data) == 0) {
+ shots = data.frame(
+ matrix(nrow = 0, ncol = length(col_names))
+ )
+ } else {
+ shots = data.frame(
+ matrix(
+ unlist(raw_shots_data),
+ ncol = length(col_names),
+ byrow = TRUE
+ )
+ )
+ }
+
+ shots = tbl_df(shots)
+ names(shots) = col_names
+
+ shots = mutate(shots,
+ loc_x = as.numeric(as.character(loc_x)) / 10,
+ loc_y = as.numeric(as.character(loc_y)) / 10 + hoop_center_y,
+ shot_distance = as.numeric(as.character(shot_distance)),
+ shot_made_numeric = as.numeric(as.character(shot_made_flag)),
+ shot_made_flag = factor(shot_made_flag, levels = c("1", "0"), labels = c("made", "missed")),
+ shot_attempted_flag = as.numeric(as.character(shot_attempted_flag)),
+ shot_value = ifelse(tolower(shot_type) == "3pt field goal", 3, 2)
+ )
+
+ raw_league_avg_data = data$resultSets[[2]]$rowSet
+ league_avg_names = tolower(as.character(data$resultSets[[2]]$headers))
+ league_averages = tbl_df(data.frame(
+ matrix(unlist(raw_league_avg_data), ncol = length(league_avg_names), byrow = TRUE)
+ ))
+ names(league_averages) = league_avg_names
+ league_averages = mutate(league_averages,
+ fga = as.numeric(as.character(fga)),
+ fgm = as.numeric(as.character(fgm)),
+ fg_pct = as.numeric(as.character(fg_pct)),
+ shot_value = ifelse(shot_zone_basic %in% c("Above the Break 3", "Backcourt", "Left Corner 3", "Right Corner 3"), 3, 2)
+ )
+
+ return(list(player = shots, league_averages = league_averages))
+}
+
+default_shots = fetch_shots_by_player_id_and_season(default_player$person_id, default_season)
@@ -0,0 +1,26 @@
+generate_heatmap_chart = function(shots, use_short_three = FALSE) {
+ if (use_short_three) {
+ base_court = short_three_court
+ } else {
+ base_court = court
+ }
+
+ base_court +
+ stat_density_2d(
+ data = shots,
+ aes(x = loc_x, y = loc_y,
+ fill = ..density..),
+ geom = "raster", contour = FALSE, interpolate = TRUE, n = 200
+ ) +
+ geom_path(data = court_points,
+ aes(x = x, y = y, group = desc, linetype = dash),
+ color = "#999999") +
+ scale_fill_gradientn(colors = inferno_colors, guide = FALSE ) +
+ scale_colour_gradientn("Shot frequency ",
+ limits = c(0, 1),
+ breaks = c(0, 1),
+ labels = c("lower", "higher"),
+ colours = inferno_colors,
+ guide = guide_colorbar(barwidth = 15)) +
+ theme(legend.text = element_text(size = rel(0.6)))
+}
@@ -0,0 +1,11 @@
+fraction_to_percent_format = function(frac, digits = 1) {
+ paste0(format(round(frac * 100, digits), nsmall = digits), "%")
+}
+
+bg_color = '#000004'
+
+# from viridis::inferno(100). hard-coded to avoid requiring the viridis package
+inferno_colors = c('#000004', '#010107', '#02020C', '#030312', '#050417', '#07051D', '#0A0723', '#0D0829', '#100A2F', '#140B35', '#170C3B', '#1B0C41', '#1F0C48', '#230C4E', '#280B53', '#2C0B58', '#310A5D', '#350960', '#3A0963', '#3E0966', '#430A68', '#470B6A', '#4B0C6B', '#4F0D6C', '#540F6D', '#58106E', '#5C126E', '#60136E', '#64156E', '#68166E', '#6C186E', '#70196E', '#741B6E', '#781C6D', '#7D1E6D', '#811F6C', '#85216B', '#89226A', '#8D2369', '#912568', '#952667', '#992865', '#9D2964', '#A12B62', '#A52D60', '#A92E5E', '#AD305C', '#B1325A', '#B53458', '#B93656', '#BD3853', '#C03A51', '#C43C4E', '#C83F4C', '#CB4149', '#CF4446', '#D24644', '#D54941', '#D84C3E', '#DB4F3B', '#DE5338', '#E15635', '#E45A32', '#E65D2F', '#E9612B', '#EB6528', '#ED6925', '#EF6D22', '#F1711E', '#F3751B', '#F47A18', '#F67E14', '#F78311', '#F8870E', '#F98C0A', '#FA9008', '#FB9506', '#FB9A06', '#FC9F07', '#FCA409', '#FCA80D', '#FCAD12', '#FCB217', '#FBB71C', '#FBBC22', '#FAC128', '#F9C72E', '#F8CC35', '#F7D13C', '#F6D643', '#F5DB4B', '#F4E054', '#F3E45D', '#F2E967', '#F1EE71', '#F2F27C', '#F3F587', '#F5F991', '#F8FC9B', '#FCFFA4')
+
+# viridis::viridis(100)
+viridis_colors = c('#440154', '#450558', '#46085C', '#470C5F', '#471063', '#481466', '#48176A', '#481B6D', '#481E70', '#482273', '#482575', '#482878', '#472B7A', '#472F7D', '#46327F', '#453580', '#453882', '#443B84', '#433E85', '#424186', '#414487', '#3F4788', '#3E4A89', '#3D4D8A', '#3C508B', '#3A538B', '#39558C', '#38588C', '#375B8D', '#355D8D', '#34608D', '#33638D', '#32658E', '#31688E', '#2F6A8E', '#2E6D8E', '#2D6F8E', '#2C728E', '#2B748E', '#2A778E', '#29798E', '#287C8E', '#277E8E', '#26808E', '#26838E', '#25858E', '#24888E', '#238A8D', '#228C8D', '#218F8D', '#20918C', '#20948C', '#1F968B', '#1F998A', '#1E9B8A', '#1F9D89', '#1FA088', '#1FA287', '#20A586', '#22A785', '#23A983', '#25AC82', '#28AE80', '#2BB07E', '#2EB37D', '#31B57B', '#35B779', '#39BA76', '#3DBC74', '#42BE72', '#46C06F', '#4BC26C', '#50C469', '#56C666', '#5BC863', '#61CA60', '#66CC5D', '#6CCE59', '#72CF56', '#79D152', '#7FD34E', '#85D44A', '#8CD646', '#92D742', '#99D93D', '#A0DA39', '#A6DB35', '#ADDC30', '#B4DD2C', '#BBDE27', '#C2DF23', '#C9E01F', '#D0E11C', '#D6E21A', '#DDE318', '#E4E418', '#EAE51A', '#F1E51D', '#F7E620', '#FDE725')
@@ -0,0 +1,148 @@
+# from ggplot2 hexbin.R: https://github.com/hadley/ggplot2/blob/master/R/hexbin.R
+hex_bounds <- function(x, binwidth) {
+ c(
+ plyr::round_any(min(x), binwidth, floor) - 1e-6,
+ plyr::round_any(max(x), binwidth, ceiling) + 1e-6
+ )
+}
+
+calculate_hex_coords = function(shots, binwidths) {
+ xbnds = hex_bounds(shots$loc_x, binwidths[1])
+ xbins = diff(xbnds) / binwidths[1]
+ ybnds = hex_bounds(shots$loc_y, binwidths[2])
+ ybins = diff(ybnds) / binwidths[2]
+
+ hb = hexbin(
+ x = shots$loc_x,
+ y = shots$loc_y,
+ xbins = xbins,
+ xbnds = xbnds,
+ ybnds = ybnds,
+ shape = ybins / xbins,
+ IDs = TRUE
+ )
+
+ shots = mutate(shots, hexbin_id = hb@cID)
+
+ hexbin_stats = shots %>%
+ group_by(hexbin_id) %>%
+ summarize(
+ hex_attempts = n(),
+ hex_pct = mean(shot_made_numeric),
+ hex_points_scored = sum(shot_made_numeric * shot_value),
+ hex_points_per_shot = mean(shot_made_numeric * shot_value)
+ )
+
+ hexbin_ids_to_zones = shots %>%
+ group_by(hexbin_id, shot_zone_range, shot_zone_area) %>%
+ summarize(attempts = n()) %>%
+ ungroup() %>%
+ arrange(hexbin_id, desc(attempts)) %>%
+ group_by(hexbin_id) %>%
+ filter(row_number() == 1) %>%
+ select(hexbin_id, shot_zone_range, shot_zone_area)
+
+ hexbin_stats = inner_join(hexbin_stats, hexbin_ids_to_zones, by = "hexbin_id")
+
+ # from hexbin package, see: https://github.com/edzer/hexbin
+ sx = hb@xbins / diff(hb@xbnds)
+ sy = (hb@xbins * hb@shape) / diff(hb@ybnds)
+ dx = 1 / (2 * sx)
+ dy = 1 / (2 * sqrt(3) * sy)
+ origin_coords = hexcoords(dx, dy)
+
+ hex_centers = hcell2xy(hb)
+
+ hexbin_coords = bind_rows(lapply(1:hb@ncells, function(i) {
+ data.frame(
+ x = origin_coords$x + hex_centers$x[i],
+ y = origin_coords$y + hex_centers$y[i],
+ center_x = hex_centers$x[i],
+ center_y = hex_centers$y[i],
+ hexbin_id = hb@cell[i]
+ )
+ }))
+
+ inner_join(hexbin_coords, hexbin_stats, by = "hexbin_id")
+}
+
+calculate_hexbins_from_shots = function(shots, league_averages, binwidths = c(1, 1), min_radius_factor = 0.6, fg_diff_limits = c(-0.12, 0.12), fg_pct_limits = c(0.2, 0.7), pps_limits = c(0.5, 1.5)) {
+ if (nrow(shots) == 0) {
+ return(list())
+ }
+
+ grouped_shots = group_by(shots, shot_zone_range, shot_zone_area)
+
+ zone_stats = grouped_shots %>%
+ summarize(
+ zone_attempts = n(),
+ zone_pct = mean(shot_made_numeric),
+ zone_points_scored = sum(shot_made_numeric * shot_value),
+ zone_points_per_shot = mean(shot_made_numeric * shot_value)
+ )
+
+ league_zone_stats = league_averages %>%
+ group_by(shot_zone_range, shot_zone_area) %>%
+ summarize(league_pct = sum(fgm) / sum(fga))
+
+ hex_data = calculate_hex_coords(shots, binwidths = binwidths)
+
+ join_keys = c("shot_zone_area", "shot_zone_range")
+
+ hex_data = hex_data %>%
+ inner_join(zone_stats, by = join_keys) %>%
+ inner_join(league_zone_stats, by = join_keys)
+
+ max_hex_attempts = max(hex_data$hex_attempts)
+
+ hex_data = mutate(hex_data,
+ radius_factor = min_radius_factor + (1 - min_radius_factor) * log(hex_attempts + 1) / log(max_hex_attempts + 1),
+ adj_x = center_x + radius_factor * (x - center_x),
+ adj_y = center_y + radius_factor * (y - center_y),
+ bounded_fg_diff = pmin(pmax(zone_pct - league_pct, fg_diff_limits[1]), fg_diff_limits[2]),
+ bounded_fg_pct = pmin(pmax(zone_pct, fg_pct_limits[1]), fg_pct_limits[2]),
+ bounded_points_per_shot = pmin(pmax(zone_points_per_shot, pps_limits[1]), pps_limits[2]))
+
+ list(hex_data = hex_data, fg_diff_limits = fg_diff_limits, fg_pct_limits = fg_pct_limits, pps_limits = pps_limits)
+}
+
+generate_hex_chart = function(hex_data, use_short_three = FALSE, metric = "bounded_fg_diff", alpha_range = c(0.85, 0.98)) {
+ if (use_short_three) {
+ base_court = short_three_court
+ } else {
+ base_court = court
+ }
+
+ if (length(hex_data) == 0) {
+ return(base_court)
+ }
+
+ if (metric == "bounded_fg_diff") {
+ fill_limit = hex_data$fg_diff_limits
+ fill_label = "FG% vs. League Avg"
+ label_formatter = scales::percent
+ } else if (metric == "bounded_fg_pct") {
+ fill_limit = hex_data$fg_pct_limits
+ fill_label = "FG%"
+ label_formatter = scales::percent
+ } else if (metric == "bounded_points_per_shot") {
+ fill_limit = hex_data$pps_limits
+ fill_label = "Points Per Shot"
+ label_formatter = scales::comma
+ } else {
+ stop("invalid metric")
+ }
+
+ base_court +
+ geom_polygon(data = hex_data$hex_data,
+ aes_string(x = "adj_x", y = "adj_y", group = "hexbin_id",
+ fill = metric, alpha = "hex_attempts"),
+ size = 0) +
+ scale_fill_gradientn(paste0(fill_label, " "),
+ colors = viridis_colors,
+ limit = fill_limit,
+ labels = label_formatter,
+ guide = guide_colorbar(barwidth = 15)) +
+ scale_alpha_continuous(guide = FALSE, range = alpha_range, trans = "sqrt") +
+ theme(legend.text = element_text(size = rel(0.6)))
+}
@@ -0,0 +1,14 @@
+required_packages = c(
+ "shiny",
+ "ggplot2",
+ "hexbin",
+ "dplyr",
+ "httr",
+ "jsonlite"
+)
+
+packages_to_install = required_packages[!(required_packages %in% installed.packages()[, 1])]
+
+if (length(packages_to_install) > 0) {
+ install.packages(packages_to_install, repos = "https://cran.rstudio.com")
+}
Oops, something went wrong.

0 comments on commit 252b960

Please sign in to comment.