---
title: "Analysis 2024"
---
## Goals of this notebook
::: callout-important
The most recent data for this analysis was released in May 2024. The MLS will update salaries again after the summer transfer window, usually in September.
:::
We'll explore MLS salaries through history, starting with the most recent data and then looking back. A few questions come to mind:
- Which players are getting paid the most this year?
- Which teams have the highest salary bill this year?
- How do team salary rankings compare over time?
[Per the MLS Players Association](https://mlsplayers.org/resources/salary-guide), "compensation" means: "The Annual Average Guaranteed Compensation (Guaranteed Comp) number includes a player's base salary and all signing and guaranteed bonuses annualized over the term of the player's contract, including option years."
## Setup
```{r}
#| label: setup
#| echo: true
#| results: hide
#| message: false
#| warning: false
library(tidyverse)
library(janitor)
library(scales)
library(ggrepel)
library(DT)
```
## Import
Getting the cleaned data.
```{r}
salaries <- read_rds("data-processed/mls-salaries.rds")
salaries |> glimpse()
```
### Setting the most recent year of data
I'm creating an object called `recent_year` because at some point I'll have new data and might want to just change the year.
```{r}
recent_year <- "2024"
```
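An alternative to hard-coding the value is to derive it from the data. The sketch below uses a made-up tibble (`toy_salaries` and its values are hypothetical, not the real data) and assumes `year` is stored as character, which the comparison with `"2024"` suggests but the import step doesn't confirm.

```r
library(dplyr)

# Hypothetical stand-in for the real `salaries` data
toy_salaries <- tibble(
  year = c("2022", "2023", "2024", "2024"),
  compensation = c(100000, 250000, 300000, 90000)
)

# Four-digit years stored as text still sort correctly, so max() works
toy_recent_year <- toy_salaries |>
  pull(year) |>
  max()

toy_recent_year
```

Hard-coding keeps the year under manual control, which matters if the takeaways below are written for a specific release.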
## Players with highest salaries
### Over all time
A searchable table of all players, all time.
```{r}
sal_high <- salaries |>
  arrange(compensation |> desc()) |>
  select(!c(club_long, conference))
sal_high |> datatable()
```
#### Data takeaway: Messi money
Upon signing on July 15, 2023, Lionel Messi became the highest-paid player in the history of the MLS, with total compensation of \$20.4 million. Lorenzo Insigne of Toronto was second at \$15.5 million, the only other player earning more than \$10 million in a single year.
### In most recent year
```{r}
sal_high_recent <- salaries |>
  filter(year == recent_year) |>
  mutate(rank = min_rank(desc(compensation))) |>
  relocate(rank) |>
  arrange(compensation |> desc()) |>
  select(!c(club_long, conference))
sal_high_recent
```
Count the top 10 earners on each team to see which teams have more than one.
```{r}
sal_high_recent |>
  filter(rank <= 10) |>
  count(club_short, sort = TRUE)
```
#### Data takeaway: Driussi 5th in early 2024
For the early 2024 release, Austin FC's Sebastián Driussi is the 5th highest-paid player in the MLS and Houston's Hector Herrera is the 8th.
Both Inter Miami and Toronto FC have two top 10 earners on their rosters.
### Difference with just base pay in 2024?
This looks at just the base salary as opposed to total compensation. No great changes at the top of the list.
```{r}
sal_high_base <- salaries |>
  arrange(base_salary |> desc()) |>
  select(!c(club_long, conference, compensation))
sal_high_base |> filter(year == recent_year, base_salary >= 2000000)
```
## Team salaries
We'll get per-year salaries by team, then look at just this year.
### Highest team salaries over time
First get the total compensation for each team in each year.
```{r}
sal_team <- salaries |>
  group_by(year, club_long) |>
  summarise(total_compensation = sum(compensation)) |>
  arrange(total_compensation |> desc())
# peek at the top
sal_team |> filter(total_compensation > 20000000)
```
Then find the top team for each year.
```{r}
#| message: false
top_sal_team_yr <- salaries |>
  group_by(year, club_long) |>
  summarise(total_compensation = sum(compensation)) |>
  slice_max(total_compensation)
top_sal_team_yr
```
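The step above works because `summarise()` leaves the result grouped by `year`, so `slice_max()` runs once per year. A minimal sketch of the same idea with the grouping spelled out, using made-up clubs and totals (`toy_team_totals` is hypothetical) and assuming dplyr 1.1.0 or later for the `by` argument:

```r
library(dplyr)

# Hypothetical team totals; clubs and amounts are made up
toy_team_totals <- tibble(
  year = c("2023", "2023", "2024", "2024"),
  club_long = c("Club A", "Club B", "Club A", "Club B"),
  total_compensation = c(30e6, 25e6, 20e6, 42e6)
)

# Keep the highest-spending club within each year
toy_top_team <- toy_team_totals |>
  slice_max(total_compensation, n = 1, by = year)

toy_top_team
```

Naming the grouping in the call makes the per-year behavior visible without relying on leftover groups from an earlier step.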
And note how often each team has been the top spender.
```{r}
top_sal_team_yr |>
  ungroup() |>
  tabyl(club_long) |>
  adorn_totals("row") |>
  adorn_pct_formatting() |>
  as_tibble()
```
#### Data takeaways: Toronto historically spends high
Looking at the most expensive rosters in the MLS of all time, Toronto FC has six of the 10 highest entries. Over the past 17 years, Toronto has been the top-spending team seven times, or about 41% of the time. The L.A. Galaxy is next with five highest-spending years.
### Highest salaries this year
And let's look at this year.
```{r}
sal_team_recent <- sal_team |> filter(year == recent_year)
# peek
sal_team_recent |> head(10)
```
Let's round the numbers for our chart.
```{r}
sal_team_recent_mil <- sal_team_recent |>
  mutate(total_millions = (total_compensation / 1000000) |> round(1))
sal_team_recent_mil
```
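Another way to get the same kind of label, sketched here as an option rather than what this notebook does, is to leave the totals in raw dollars and let `scales::label_dollar()` handle the conversion. The totals below are made up and `toy_label_millions` is a hypothetical name.

```r
library(scales)

# Build a labeller that converts raw dollars to millions with one decimal
toy_label_millions <- label_dollar(scale = 1e-6, accuracy = 0.1, suffix = "M")

# Made-up team totals in raw dollars
toy_totals <- c(41700000, 8300000)

toy_label_millions(toy_totals)
```

A labeller like this can be passed to `scale_x_continuous(labels = ...)`, which would keep the axis and the bar labels in the same format.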
Let's chart this.
```{r}
#| label: fig-2024-team-salary
#| fig-cap: "Messi drives Miami team salary"
#| fig-alt: "Bar chart showing MLS team salaries from highest to lowest. Inter Miami and Toronto FC top the list."
sal_team_recent_mil_plot <- sal_team_recent_mil |>
  filter(club_long != "Major League Soccer") |>
  ggplot(mapping = aes(
    x = total_millions,
    y = club_long |> reorder(total_compensation)
  )) +
  geom_col() +
  geom_text(aes(label = paste0("$", total_millions)), color = "white", hjust = 1.25) +
  scale_x_continuous(labels = label_dollar()) +
  labs(
    x = "Total team spending in $ millions",
    y = "",
    title = "Messi makes Miami top MLS spender in 2024",
    subtitle = str_wrap("Salaries include each player's base salary plus all signing and guaranteed bonuses annualized over the term of the player's contract, including option years."),
    caption = "By: Christian McDonald. Source: Major League Soccer Players Association"
  ) +
  theme_minimal()
ggsave("figures/team-salary-recent.png", plot = sal_team_recent_mil_plot)
```
![](figures/team-salary-recent.png)
One more look to see how many high-paid players are on each team.
```{r}
sal_high_recent |>
  filter(compensation >= 5000000) |>
  count(club_short, sort = TRUE)
```
#### Data takeaway: Miami, Toronto tops
Given the historic signing of Lionel Messi in 2023, it is no surprise that Inter Miami has the highest team salary for the 2024 season. Toronto ranks second on the strength of two players making over \$5 million, Lorenzo Insigne and Federico Bernardeschi.
### Team spending over time
Let's look at team spending over the past five years. To do this, we have to create a ranking for the spending.
- I'm removing players not affiliated with teams
- When I added a third column to the group because I wanted the long names available, the ranking broke. I had to drop the grouping with `ungroup()` and then rank within each year using the `.by` argument of `mutate()`.
```{r}
sal_team_rank <- salaries |>
  filter(club_short != "MLS" | club_short |> is.na()) |>
  group_by(year, club_short, club_long) |>
  summarise(
    total_comp = sum(compensation, na.rm = TRUE)
  ) |>
  arrange(year, total_comp |> desc()) |>
  ungroup() |> #<1>
  mutate(rank = rank(-total_comp), .by = year) #<2>
# peek
sal_team_rank |> head(20)
```
1. I break the `group_by` here.
2. Then I set the ranking to work by year.
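
The effect of those two steps can be seen on a small made-up example. The clubs and totals in `toy_team_comp` are hypothetical, and the sketch assumes dplyr 1.1.0 or later, where `mutate()` accepts `.by`.

```r
library(dplyr)

# Hypothetical totals for three clubs across two years
toy_team_comp <- tibble(
  year = c("2023", "2023", "2023", "2024", "2024", "2024"),
  club_short = c("AAA", "BBB", "CCC", "AAA", "BBB", "CCC"),
  total_comp = c(10, 30, 20, 50, 5, 40)
)

# Without `.by`, rank() runs across all rows, so ranks go from 1 to 6
toy_rank_all <- toy_team_comp |>
  mutate(rank = rank(-total_comp))

# With `.by = year`, ranks restart at 1 within each year
toy_rank_by_year <- toy_team_comp |>
  mutate(rank = rank(-total_comp), .by = year)

toy_rank_by_year
```

Negating `total_comp` makes the largest total rank first. `rank()` gives tied values their average rank, so ties can produce fractional ranks.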
Visualizing all of them would be tricky. Let's do the top five over the last five years.
```{r}
sal_team_rank_top <- sal_team_rank |>
  filter(
    rank <= 5,
    year >= (as.numeric(recent_year) - 4)
  )
sal_team_rank_top
```
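The window arithmetic is easy to get off by one: five seasons inclusive means subtracting 4 from the most recent year, not 5. A small sketch with made-up ranks (`toy_ranks` and `toy_recent_year` are hypothetical) that converts `year` to a number explicitly, on the assumption that it is stored as character:

```r
library(dplyr)

# Hypothetical value mirroring the notebook's character year
toy_recent_year <- "2024"

# Made-up ranks for two clubs in three seasons
toy_ranks <- tibble(
  year = rep(c("2019", "2020", "2024"), each = 2),
  club_short = rep(c("AAA", "BBB"), times = 3),
  rank = c(1, 6, 2, 1, 5, 7)
)

# Five seasons inclusive: 2020 through 2024
toy_first_year <- as.numeric(toy_recent_year) - 4

# Keep top-five ranks that fall inside the window
toy_window <- toy_ranks |>
  filter(rank <= 5, as.numeric(year) >= toy_first_year)

toy_window
```

Converting both sides to numbers avoids depending on how R coerces a mixed character and numeric comparison.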
Peek at this a different way, with the years as columns.
```{r}
sal_team_rank_top |>
  select(-total_comp) |>
  pivot_wider(names_from = year, values_from = rank)
```
### Let's visualize spending rank
> This has to be adjusted when new data is added.
I want to use a list of team-specific colors. I'm trying this manually first, based on the teams that appear in the chart. The colors were pulled [from here](https://teamcolorcodes.com/soccer/mls-team-color-codes/). I'd rather join with a data package that includes team colors, but I haven't found one that is up to date.
```{r}
#| label: club_colors
sal_team_rank_top |> count(club_short)
club_color_list <- c(
  "#80000A", # ATL
  "#7CCDEF", # CHI
  "#FE5000", # CIN or 003087
  "#FEDD00", # CLB
  "#00245D", # LA
  "#C39E6D", # LAFC
  "#F7B5CD", # MIA
  "#CE0E2D", # NE
  "#ECE83A", # NSH
  # "#5D9741", # SEA
  "#B81137" # TOR
  # "#EF3E42" # DC
  # "#00B140", # AUS
)
```
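Because the list above is unnamed, the colors are matched to clubs by position, which is why it needs adjusting whenever the set of teams changes. One possible way around that, sketched here on made-up ranks, is a named vector: `scale_colour_manual()` matches names in `values` to the values of the mapped variable. The three hex codes are taken from the list above; `toy_club_colors`, `toy_ranks` and the ranks themselves are hypothetical.

```r
library(ggplot2)

# Names must match the values found in `club_short`
toy_club_colors <- c(
  "LA" = "#00245D",
  "MIA" = "#F7B5CD",
  "TOR" = "#B81137"
)

# Made-up ranks for illustration only
toy_ranks <- data.frame(
  year = c(2023, 2024, 2023, 2024, 2023, 2024),
  club_short = c("LA", "LA", "MIA", "MIA", "TOR", "TOR"),
  rank = c(2, 3, 3, 1, 1, 2)
)

# A named vector matches colors to clubs by name, not by position
toy_plot <- ggplot(toy_ranks, aes(x = year, y = rank, color = club_short)) +
  geom_line() +
  scale_colour_manual(values = toy_club_colors)
```

With names in place, a full list of all clubs could stay in the notebook unchanged, since extra entries that don't appear in the data are ignored.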
And now the chart
```{r}
#| label: chart_rank_t5
sal_team_rank_top_plot <- sal_team_rank_top |>
  # filter()
  ggplot(aes(x = year, y = rank, group = club_short)) +
  geom_line(aes(color = club_short)) +
  # size is a fixed value, so it goes outside aes() and needs no legend
  geom_point(aes(color = club_short), size = 3) +
  geom_label_repel(aes(label = club_short), size = 3) +
  scale_y_reverse() +
  scale_colour_manual(values = club_color_list) +
  labs(
    title = "Miami's spending was increasing before Messi",
    subtitle = str_wrap("Inter Miami is the only MLS team to rank as a top five spender in each of the past five years."),
    color = "Club",
    x = NULL,
    y = "Spending Rank",
    caption = "By: Christian McDonald. Source: Major League Soccer Players Association"
  ) +
  theme_minimal() +
  guides(color = "none")
# sal_team_rank_top_plot
ggsave("figures/sal_team_rank.png", plot = sal_team_rank_top_plot)
```
![](figures/sal_team_rank.png)
Let's count how many times each team is in this list.
```{r}
sal_team_rank_top |>
  count(club_long, sort = TRUE)
```
#### Data takeaway: Miami and LA
Among the teams spending the most on their rosters over the past five years, Miami has ranked in the top five each year. The L.A. Galaxy and Toronto FC have been top-five spenders in four of the last five years.