
FEAT - Basic GeneralizedLinearEstimatorCV #311


Merged
21 commits merged into scikit-learn-contrib:main on Jun 26, 2025

Conversation

floriankozikowski
Contributor

@floriankozikowski floriankozikowski commented May 22, 2025

Context of the PR

Implements a cross-validation wrapper for GeneralizedLinearEstimator along with a standalone example script. Provides hyperparameter grids over alpha and optional l1_ratio, custom k-fold splitting, warm starts, scoring, and a final full-data refit (closes issue #308).

Contributions of the PR

  • class inherits from GeneralizedLinearEstimator, CV inspired by Celer
  • full pass over l1_ratio (if available) and alpha grids
  • warm_start: carry forward each fold’s solution (w) as the initial w_start for the next alpha
  • score & select: compute MSE (or user-provided scorer) per fold, track the mean loss in mse_path, and pick the (alpha_, l1_ratio_) with lowest average loss
  • final refit: once the best hyperparameters are found, update self.penalty in place and call super().fit(X, y) to train the returned model on the entire dataset
  • plot_generalized_linear_estimator.cv provides a simple example file
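The score-and-select loop described above can be sketched as follows. This is a hypothetical illustration using sklearn's ElasticNet as a stand-in for GeneralizedLinearEstimator; the variable names (`mse_path`, `best_alpha`) mirror the PR description but the code is not the PR's implementation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=0)
alphas = [0.01, 0.1, 1.0]
kf = KFold(n_splits=4, shuffle=True, random_state=0)

# Track the per-fold test MSE for each alpha.
mse_path = np.zeros((len(alphas), kf.get_n_splits()))
for i, alpha in enumerate(alphas):
    for k, (train, test) in enumerate(kf.split(X)):
        model = ElasticNet(alpha=alpha).fit(X[train], y[train])
        mse_path[i, k] = np.mean((model.predict(X[test]) - y[test]) ** 2)

# Pick the alpha with lowest mean loss, then refit on the full dataset.
best_alpha = alphas[int(np.argmin(mse_path.mean(axis=1)))]
final_model = ElasticNet(alpha=best_alpha).fit(X, y)
```

The final full-data refit is what lets the returned estimator be used for prediction directly, rather than keeping only per-fold models.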

Checks before merging PR

  • added documentation for any new feature
  • added unit tests
  • edited the what's new (if applicable)

skglm/cv.py (outdated)

```python
est = GeneralizedLinearEstimator(
    datafit=self.datafit, penalty=pen, solver=self.solver
)
est.solver.warm_start = True
```
Collaborator


Using warm start here will have no impact since you just instantiated the model.

@mathurinm
Collaborator

@floko can you provide some timings on datasets of reasonable sizes?

We should handle warm start better, by storing K estimators (one for each fold) and using warm start on them, so that on fold k, when fitting for one value of alpha, we use warm start with the previous value of alpha on the same fold.

Right now warm start is not used since we recreate an estimator at each alpha.
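The suggested scheme can be sketched like this; it is a hypothetical illustration using sklearn's ElasticNet (whose `warm_start=True` flag reuses `coef_` between `fit` calls) as a stand-in for the skglm estimator:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=200, n_features=50, noise=0.1, random_state=0)
alphas = np.geomspace(1.0, 1e-3, num=10)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# One warm-started estimator per fold, reused along the whole alpha path,
# instead of recreating an estimator at each alpha.
fold_models = [ElasticNet(warm_start=True) for _ in range(kf.get_n_splits())]

mse_path = np.zeros((len(alphas), kf.get_n_splits()))
for i, alpha in enumerate(alphas):
    for k, (train, test) in enumerate(kf.split(X)):
        est = fold_models[k]
        est.set_params(alpha=alpha)
        est.fit(X[train], y[train])  # starts from the previous alpha's solution
        mse_path[i, k] = np.mean((est.predict(X[test]) - y[test]) ** 2)
```

Because `kf.split` with a fixed `random_state` yields the same folds on every pass, fold k always sees the same training data, so warm starting from the previous alpha's solution is valid.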

@mathurinm mathurinm requested review from Badr-MOUFAD June 16, 2025 11:57
```python
from skglm.cv import GeneralizedLinearEstimatorCV


@pytest.mark.parametrize("seed", [0, 42])
```
Collaborator


Just fix a seed equal to your favorite number; testing 2 seems overkill.

```python
"""Test GeneralizedLinearEstimatorCV matches sklearn GridSearchCV for ElasticNet."""
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=seed)

alphas = np.array([0.001, 0.01, 0.1, 1.0])
```
Collaborator


Picking absolute values for alphas is dangerous; scale them by `||X.T @ y||_inf / n`, which is the smallest alpha for which the solution is 0.
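A minimal sketch of this scaling (`alpha_max` is the standard Lasso notation for this quantity; the geometric grid spacing below is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=0)
n_samples = X.shape[0]

# Smallest alpha for which the (Lasso) solution is identically zero.
alpha_max = np.max(np.abs(X.T @ y)) / n_samples

# Geometric grid scaled relative to alpha_max instead of absolute values.
alphas = alpha_max * np.geomspace(1.0, 1e-3, num=4)
```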

Collaborator

@Badr-MOUFAD Badr-MOUFAD left a comment


Thanks @floriankozikowski for the great work, here are some remarks

For the example, what do you think about having something similar to
https://contrib.scikit-learn.org/skglm/auto_examples/plot_ucurve.html#sphx-glr-auto-examples-plot-ucurve-py

@Badr-MOUFAD
Collaborator

@floriankozikowski, I'm unable to spot where _kfold_split is implemented, can you point me to it?

By the way, ping me when you are done with the PR, so I can do another review.

@floriankozikowski
Contributor Author

@Badr-MOUFAD
There is no function or method named kfold_split implemented in the code. Instead, I used scikit-learn's built-in KFold and StratifiedKFold classes directly for splitting the data into folds (around line 127 to 133).
For classification problems (e.g., Logistic, QuadraticSVC), I use:

```python
kf = StratifiedKFold(n_splits=self.cv, shuffle=True, random_state=self.random_state)
split_iter = kf.split(np.arange(n_samples), y)
```

For regression or other problems, I use:

```python
kf = KFold(n_splits=self.cv, shuffle=True, random_state=self.random_state)
split_iter = kf.split(np.arange(n_samples))
```

Does this make sense to you?

Other than that, I'm done for now (except that maybe in the examples file we could consider deleting a plot). Thanks for reviewing again!

@floriankozikowski floriankozikowski marked this pull request as ready for review June 18, 2025 14:34
@Badr-MOUFAD
Collaborator

Does this make sense to you?

Makes sense. I was asking because it was mentioned in the PR description.

@floriankozikowski, can you please clean up test_cv.py (prints, ...)? If it has something important for the review, maybe put it in a debug_script.py that we can delete afterward.

@floriankozikowski
Contributor Author

@Badr-MOUFAD good catch, I removed it from the PR description (initially I had it like this, but then changed it at some point). I also made the unit test shorter. Let me know if that's fine.
@mathurinm initially you added a part that also tests alpha = 1e-4. For now I removed it, as it was in the debugging sections of the script, but if you want it back let me know. When I tried it, the pytest still passed, but it was much slower and I got convergence warnings.

Collaborator

@Badr-MOUFAD Badr-MOUFAD left a comment


LGTM @floriankozikowski, thanks for this amazing job 💪

just minor remarks before merging

@floriankozikowski
Contributor Author

@Badr-MOUFAD everything addressed and ready to merge. Thanks for the feedback!

@Badr-MOUFAD Badr-MOUFAD dismissed mathurinm’s stale review June 26, 2025 09:29

All changes were addressed

@Badr-MOUFAD Badr-MOUFAD merged commit 4b2344c into scikit-learn-contrib:main Jun 26, 2025
4 checks passed
@floriankozikowski floriankozikowski deleted the GLM_CV_simple branch June 26, 2025 11:22