Data Standardization before modeling #15
Use correlations and semi-correlations between the outcome variable and the predictors as a measure of feature importance.
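A minimal sketch of this idea in NumPy, on made-up data. It assumes "semi-correlation" means the semi-partial (part) correlation, i.e. the correlation between the outcome and the part of a predictor that the other predictors do not explain. The function names and the simulated variables are hypothetical, not part of this project.

```python
import numpy as np

def correlation_importance(X, y):
    """Pearson correlation between y and each column of X."""
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

def semipartial_importance(X, y):
    """Correlation between y and the residual of each predictor
    after regressing it on all the other predictors."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        out[j] = np.corrcoef(resid, y)[0, 1]
    return out

# Simulated data (assumption): x2 is strongly correlated with x1, x3 is noise.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

corr_imp = correlation_importance(X, y)
semi_imp = semipartial_importance(X, y)

# Rescaling the predictors (changing their units) leaves correlations unchanged.
scaled_corr_imp = correlation_importance(X * np.array([1000.0, 0.01, 5.0]), y)
```

In this setup the plain correlation of `x2` with `y` is large mostly because `x2` tracks `x1`, while its semi-partial correlation is much smaller. Both measures are unit-free, so they do not require the predictors to be standardized first.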
In regression, it is often recommended to center the variables so that the predictors have mean 0. This makes it easier to interpret the intercept term as the expected value of 𝑌𝑖 when the predictor values are set to their means. Otherwise, the intercept is interpreted as the expected value of 𝑌𝑖 when the predictors are set to 0, which may not be a realistic or interpretable situation (e.g. what if the predictors were height and weight?). Centering/scaling does not affect your statistical inference in regression models: the slope estimates are adjusted appropriately and their p-values will be the same. Other situations where centering and/or scaling may be useful:
Note that scaling is not necessary in the last two bullet points I mentioned and centering may not be necessary in the first bullet I mentioned, so the two do not need to go hand in hand at all times.
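The claims about the intercept and the inference can be checked numerically. The sketch below fits ordinary least squares with plain NumPy on simulated height and weight data (the data, coefficients, and the `ols` helper are assumptions for illustration only) and compares raw, centered, and standardized predictors.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns coefficients and their t statistics."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])   # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)       # covariance of the estimates
    t = beta / np.sqrt(np.diag(cov))
    return beta, t

# Simulated data (assumption): height in cm, weight in kg.
rng = np.random.default_rng(1)
n = 200
height = rng.normal(170, 10, size=n)
weight = rng.normal(70, 12, size=n)
X = np.column_stack([height, weight])
y = 5.0 + 0.3 * height - 0.2 * weight + rng.normal(size=n)

beta_raw, t_raw = ols(X, y)

Xc = X - X.mean(axis=0)            # centered predictors
beta_ctr, t_ctr = ols(Xc, y)

Xs = Xc / X.std(axis=0)            # centered and scaled predictors
beta_std, t_std = ols(Xs, y)
```

Centering leaves the slopes and their t statistics untouched and moves the intercept to the mean of `y`. Scaling changes the slope values (they are now per standard deviation) but not their t statistics, so the p-values for the slopes are the same in all three fits.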
https://stats.stackexchange.com/questions/86434/is-standardisation-before-lasso-really-necessary
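- Standardizing is needed when using regularization, because the penalty acts on the size of the coefficients and that size depends on the units of each predictor.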
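A small sketch of why the scale matters under a penalty. The linked question is about the lasso; ridge regression is used here instead only because it has a closed form that fits in a few lines of NumPy, and the same scale dependence applies to the lasso penalty. The data, the value of `lam`, and the helper names are assumptions.

```python
import numpy as np

def ridge_predict(X, y, lam):
    """Ridge regression with an unpenalized intercept (closed form).
    Returns the fitted values on the training data."""
    y_mean = y.mean()
    Xc = X - X.mean(axis=0)
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    return y_mean + Xc @ beta

def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Simulated data (assumption): two equally important predictors.
rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 2))
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

# Same information, but the second predictor is recorded in different units.
X_units = X * np.array([1.0, 0.001])

fit_a = ridge_predict(X, y, lam=10.0)
fit_b = ridge_predict(X_units, y, lam=10.0)

fit_a_std = ridge_predict(standardize(X), y, lam=10.0)
fit_b_std = ridge_predict(standardize(X_units), y, lam=10.0)

# lam = 0 reduces to ordinary least squares.
ols_a = ridge_predict(X, y, lam=0.0)
ols_b = ridge_predict(X_units, y, lam=0.0)
```

Without a penalty the fitted values do not depend on the units. With a penalty they do: the predictor recorded on the small scale needs a large coefficient, so it is shrunk much harder. Standardizing first makes the penalized fit independent of the original units.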
Summary:
Two seemingly conflicting goals: interpretability and feature importance.