-
Notifications
You must be signed in to change notification settings - Fork 37
FEAT - Basic GeneralizedLinearEstimatorCV #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT - Basic GeneralizedLinearEstimatorCV #311
Conversation
skglm/cv.py
Outdated
est = GeneralizedLinearEstimator( | ||
datafit=self.datafit, penalty=pen, solver=self.solver | ||
) | ||
est.solver.warm_start = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using warm start here will have no impact since you just intanciated the model
@floko can you provide some timings on datasets of reasonable sizes ? We should handle warm start better, by storing Right now warm start is not used since we recreate an estimator at each alpha. |
skglm/tests/test_cv.py
Outdated
from skglm.cv import GeneralizedLinearEstimatorCV | ||
|
||
|
||
@pytest.mark.parametrize("seed", [0, 42]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just fix a seed equal to your favorite number, testing 2 seems overkill
skglm/tests/test_cv.py
Outdated
"""Test GeneralizedLinearEstimatorCV matches sklearn GridSearchCV for ElasticNet.""" | ||
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=seed) | ||
|
||
alphas = np.array([0.001, 0.01, 0.1, 1.0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
picking absolute values for alphas is dangerous ; scale them by \Vert X.T @ y \Vert_\infty / n
which is the smallest alpha for which the solution is 0
…ferent datasets in unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @floriankozikowski for the great work, here are some remarks
For the example, what do you think having something similar to
https://contrib.scikit-learn.org/skglm/auto_examples/plot_ucurve.html#sphx-glr-auto-examples-plot-ucurve-py
Co-authored-by: Badr MOUFAD <65614794+Badr-MOUFAD@users.noreply.github.com>
…le_huber_quick_test.py, cv_scikit.py
@floriankozikowski, I'm unable to spot where btw, pin me when you are done with the PR, to make another review |
@Badr-MOUFAD Does this make sense to you? Other than that, I'm done for now (except that maybe in the examples file we could consider deleting a plot). Thanks for reviewing again! |
Make sense. I was asking because it was mentioned in the PR description. @floriankozikowski , can you pls clean up the |
@Badr-MOUFAD good catch, I removed it from the PR description (initially I had it like this, but then changed it at some point). Plus, I made the unit test shorter now. Lets me know if thats fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT @floriankozikowski, thanks for this amazing job 💪
just minor remarks before merging
@Badr-MOUFAD everything addressed and ready to merge. Thanks for the feedback! |
Context of the PR
Implements a cross-validation wrapper for
GeneralizedLinearEstimator
along with a standalone example script. Provides hyperparameter grids overalpha
and optionall1_ratio
, custom k-fold splitting, warm starts, scoring, and final full-data refit (closes Issue #308 ).Contributions of the PR
Checks before merging PR