Maybe move to more of a statistical position predictor? #375

Closed
nicidob opened this issue Mar 31, 2021 · 8 comments

Comments

@nicidob
Contributor

nicidob commented Mar 31, 2021

https://github.com/nicidob/bbgm/blob/master/real-pos.ipynb

Just using the simple trick from BPM.

image

I got weights like
const -0.973880
diq 0.000882
dnk -0.010013
drb -0.002438
endu 0.003022
fg -0.000388
ft 0.001457
hgt 0.074287
ins -0.002776
jmp -0.008852
oiq 0.003409
pss -0.022231
reb 0.019224
spd 0.002773
stre 0.007971
tp -0.004918
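
For illustration, here is a minimal sketch of how weights like these could be applied. The coefficients are copied from the list above; the function names, the 0 = PG through 4 = C ordering, and the round-to-nearest step are assumptions for this example, not necessarily what the notebook or the game does.

```typescript
// Coefficients copied from the list above (the intercept is "const").
const INTERCEPT = -0.97388;
const WEIGHTS: Record<string, number> = {
  diq: 0.000882, dnk: -0.010013, drb: -0.002438, endu: 0.003022,
  fg: -0.000388, ft: 0.001457, hgt: 0.074287, ins: -0.002776,
  jmp: -0.008852, oiq: 0.003409, pss: -0.022231, reb: 0.019224,
  spd: 0.002773, stre: 0.007971, tp: -0.004918,
};

type Position = "PG" | "SG" | "SF" | "PF" | "C";

// Linear predictor: intercept plus the weighted sum of the ratings.
// Ratings missing from the input are treated as 0 (hypothetical choice).
function positionScore(ratings: Record<string, number>): number {
  let score = INTERCEPT;
  for (const [key, weight] of Object.entries(WEIGHTS)) {
    score += weight * (ratings[key] ?? 0);
  }
  return score;
}

// Assumed mapping: snap the continuous score to the nearest of
// 0 = PG, 1 = SG, 2 = SF, 3 = PF, 4 = C, clamping out-of-range scores.
function nearestPosition(score: number): Position {
  if (score < 0.5) return "PG";
  if (score < 1.5) return "SG";
  if (score < 2.5) return "SF";
  if (score < 3.5) return "PF";
  return "C";
}
```

With these numbers, height dominates: a player rated 50 across the board scores roughly 2.1, which lands on SF under the assumed mapping.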

@dumbmatter
Member

This seems like a good idea... any downsides you can think of?

@nicidob
Contributor Author

nicidob commented Mar 31, 2021

Well this is a linear method so it has all the issues that linearity does: sometimes a nonlinear difference in ratings causes a position shift (which is how the current position estimator basically works). Adding extra polynomial feature expansions would fix some of that.
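
As a rough sketch of what a polynomial feature expansion means here: a degree-2 expansion keeps the original ratings and appends every pairwise product, so a linear fit on the expanded vector can pick up interactions such as height times passing. The function name `expandDegree2` is hypothetical and this is not taken from the notebook.

```typescript
// Degree-2 polynomial expansion: the original features, followed by
// every product x[i] * x[j] with i <= j (squares included).
function expandDegree2(x: number[]): number[] {
  const out = [...x];
  x.forEach((xi, i) => {
    // slice(i) starts at xi itself, so the square xi * xi is included
    x.slice(i).forEach((xj) => out.push(xi * xj));
  });
  return out;
}
```

With 15 ratings this produces 15 + 120 = 135 features, which is one reason regularization matters more once the expansion is added.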

My main issue is that I hate my training data.

  • I only have a single position per player (from 'bios'), despite having years of different ratings. It'd be nice to use the basketball ref per-year position estimates.
  • I don't have minutes played, and I'd love to weight this for "minutes played" not just uniform across all players
  • I wish I could adjust for "quality of player" (such as normalizing the ratings by their mean or ovr) but doing so led to worse predictions with this method (it's in the code, commented out)
  • I really think a linear 0 to 4 predictor isn't even the right formulation. I wish I had the DISTRIBUTIONS of positions played each year (again, as basketball ref has in the play-by-play section), then I'd try to match those 5 position probability distributions and then GF/F/G/FC would just be a heuristic on top of those estimates. This could even be user visible, like in basketball ref
  • All of this leads to kind of... a lack of clarity on whether it's good. You can see me testing a few players in the bottom of the code and while it works okay... I wish it was a little less sensitive to regularization and parameter settings. You can move Jordan from SG to GF to G to SF depending on the year and settings. Likewise with LeBron... anything from PG to GF to SF; sometimes he got SG and that drove me wild (although he shouldn't with the settings I posted... but linear estimates have SG between PG and SF so of course he got it sometimes)
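
To make the "distribution first, label second" idea concrete, here is one possible heuristic, sketched under assumptions: given probabilities for the five positions, keep the single position if it clearly dominates, otherwise fall back to the adjacent pair with the most combined mass. The 0.6 threshold, the pairing rule, and the name `labelFromDistribution` are all made up for illustration.

```typescript
type Label = "PG" | "SG" | "SF" | "PF" | "C" | "G" | "GF" | "F" | "FC";
type Scored = [Label, number];

// Return the entry with the largest probability mass.
function best(candidates: Scored[]): Scored {
  return candidates.reduce((a, b) => (b[1] > a[1] ? b : a));
}

// probs holds probabilities for [PG, SG, SF, PF, C], assumed to sum to 1.
function labelFromDistribution(
  probs: [number, number, number, number, number],
  threshold = 0.6, // hypothetical cutoff for a "pure" position
): Label {
  const [pg, sg, sf, pf, c] = probs;
  const singles: Scored[] = [
    ["PG", pg], ["SG", sg], ["SF", sf], ["PF", pf], ["C", c],
  ];
  // Hybrid labels cover adjacent pairs on the PG..C scale.
  const hybrids: Scored[] = [
    ["G", pg + sg], ["GF", sg + sf], ["F", sf + pf], ["FC", pf + c],
  ];
  const top = best(singles);
  return top[1] >= threshold ? top[0] : best(hybrids)[0];
}
```

A nice property of this formulation is that SG no longer has to sit "between" PG and SF on a line: a player split between PG and SF simply has mass at both ends.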

Second issue is this is all calibrated to real player leagues, and throwing in some random player data would be good (even if it's Guard/Wing/Big going 0, 2, 4 as labels).

Third issue is that, as with any statistical thing, you'd have to recalibrate this if you change stuff but that's already the case.

Fourth issue is (and this is true of the current system too), position labels should probably be based on team composition. BPM does something like this by driving the average position of the on-court lineups to be a SF, and nudging everyone as that changes. LeBron you could imagine shifting from PG on a KCP/Kuzma/AD/Gasol lineup to even C on a Dennis/Caruso/KCP/Kuzma. It'd be neat if roster changes updated positions. But, once again, that's external to this.
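
A very simplified sketch of the lineup-level nudge described above: shift every player's continuous position score by the same amount so the lineup average lands on SF. This only captures the centering idea, not BPM's actual procedure; the 0 to 4 scale (SF = 2) and the name `centerLineup` are assumptions.

```typescript
// Assumption: scores use 0 = PG ... 4 = C, so SF sits at 2.
const SF_SCORE = 2;

// Shift all scores in a lineup by a common offset so their mean is SF_SCORE.
// Relative ordering and spacing between players are preserved.
function centerLineup(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const shift = SF_SCORE - mean;
  return scores.map((s) => s + shift);
}
```

In a guard-heavy lineup every score moves up, so the biggest player drifts toward PF or C, which matches the LeBron example: the same ratings can read as a different position depending on who else is on the floor.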

@dumbmatter
Member

Well the good thing about basketball is it doesn't matter that much if the position labels are "perfect" because they aren't "perfect" IRL either. I think the main concern is how it applies to random players... which in the first test I did resulted in 6 PGs in the whole league :(

Could probably be fixed with additional training data from random players (labeling is a concern) or by better generation of real player ratings (so the distributions are the same)... but not sure if I care enough about this specific application to actually do something about it. Like if you give me a universally better position formula on a silver platter, I will gladly use it, but beyond that I'm not sure I want to spend time working on it :)

@nicidob
Contributor Author

nicidob commented Mar 31, 2021

I find that very surprising, since the distributions using those coefficients seem realistic:
image

@dumbmatter
Member

Commit e49b95b shows the code I used, maybe I messed something up...

@nicidob
Contributor Author

nicidob commented Mar 31, 2021

Maybe I made a mistake somewhere with the coefficients? I actually added all the random players and was looking to give you new coefficients (and show how much better they were), but found the distributions were about the same. I looked at your code and nothing jumped out at me.

EDIT: well this is maddening. What is wrong!?

@nicidob
Contributor Author

nicidob commented Apr 1, 2021

Plus signs.

@dumbmatter
Member

I swear I've done this before too... there is probably an ESLint rule that would catch things like this, but at some point it gets tiring to configure a million different finicky ESLint rules.
