[TO_REVIEW] Add automatic target label masking to prevent data leakage #330

YanisLalou · 2025-06-25T08:36:45Z

This PR introduces a mechanism to automatically mask target labels in unsupervised domain adaptation settings. This feature prevents data leakage from the target domain while the estimators are being fitted.

Key Changes:

Automatic Label Masking: A new _auto_mask_target_labels method has been added to automatically replace target labels with a default masked value before they are passed to the estimators. This is enabled by default to ensure that no data leakage can occur.
Control via mask_target_labels parameter: The masking behavior can be controlled with the mask_target_labels parameter in make_da_pipeline and the selectors (Shared, PerDomain, etc.).
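
To make the idea concrete, here is a minimal pure-Python sketch of what masking target labels could look like. It is not the PR's `_auto_mask_target_labels` implementation. The function name, the sentinel values (-1 for classification, NaN for regression) and the sign convention (positive `sample_domain` ids for source, negative for target) are assumptions made for illustration.

```python
import math

# Assumed conventions for this sketch (not confirmed by the PR text):
# source samples carry a positive sample_domain id, target samples a negative one.
MASKED_CLASSIFICATION_LABEL = -1    # hypothetical sentinel for classification
MASKED_REGRESSION_LABEL = math.nan  # hypothetical sentinel for regression


def mask_target_labels(y, sample_domain, masked_value=MASKED_CLASSIFICATION_LABEL):
    """Return a copy of y in which every target-domain label is replaced."""
    if len(y) != len(sample_domain):
        raise ValueError("y and sample_domain must have the same length")
    return [
        masked_value if domain < 0 else label
        for label, domain in zip(y, sample_domain)
    ]


y = [0, 1, 1, 0, 1]
sample_domain = [1, 1, 1, -2, -2]

masked = mask_target_labels(y, sample_domain)
print(masked)  # [0, 1, 1, -1, -1]
```

The input list is left untouched, so a caller that opts out of masking (the role `mask_target_labels=False` plays in this PR) can keep passing the original labels.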

codecov · 2025-06-25T12:20:28Z

Codecov Report

Attention: Patch coverage is 98.21429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.77%. Comparing base (7880eb1) to head (91a8925).

❗ There is a different number of reports uploaded between BASE (7880eb1) and HEAD (91a8925): HEAD has 1 upload less than BASE.

Uploads: 2 on BASE (7880eb1), 1 on HEAD (91a8925).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
- Coverage   96.41%   88.77%   -7.65%     
==========================================
  Files          63       63              
  Lines        6919     7020     +101     
==========================================
- Hits         6671     6232     -439     
- Misses        248      788     +540

tgnassou · 2025-06-29T17:47:46Z

examples/plot_how_to_use_skada.py

@@ -263,6 +263,7 @@
    PCA(n_components=2),
    SelectSource(SVC()),
    default_selector=SelectSourceTarget,
+    mask_target_labels=False,


Why is it false here?

I don't get why SelectSourceTarget is used here.

When you do that, you have one PCA for source and one for target, but the SVC is trained only on source.

So we should be able to mask target labels with SelectSourceTarget, no? We don't want data leakage even if we have one PCA for source and one for target.
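
The concern above can be illustrated with a small sketch of per-domain routing. This is not skada's SelectSourceTarget code. The helper `split_by_domain`, the -1 sentinel and the sign convention for `sample_domain` are hypothetical and used only to show which labels a target-side step would receive with and without masking.

```python
MASKED = -1  # assumed sentinel for a masked classification label


def split_by_domain(X, y, sample_domain):
    """Split rows into (source, target) subsets using the sign of sample_domain."""
    source = [(x, label) for x, label, d in zip(X, y, sample_domain) if d > 0]
    target = [(x, label) for x, label, d in zip(X, y, sample_domain) if d < 0]
    return source, target


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 1, 0]
sample_domain = [1, 1, -2, -2]

# Without masking, the target-side step receives the true target labels.
_, target_raw = split_by_domain(X, y, sample_domain)
labels_seen_raw = [label for _, label in target_raw]

# With masking applied first, the target-side step only sees the sentinel.
y_masked = [MASKED if d < 0 else label for label, d in zip(y, sample_domain)]
_, target_masked = split_by_domain(X, y_masked, sample_domain)
labels_seen_masked = [label for _, label in target_masked]

print(labels_seen_raw)     # [1, 0]
print(labels_seen_masked)  # [-1, -1]
```

An unsupervised step such as PCA ignores the labels either way, which is why masking should be harmless in that pipeline.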

tgnassou · 2025-06-29T17:50:39Z

skada/_utils.py

-        unmasked_idx = y != _DEFAULT_MASKED_TARGET_CLASSIFICATION_LABEL
-    elif y_type == Y_Type.CONTINUOUS:
-        unmasked_idx = np.isfinite(y)
+    if "sample_domain" in params:


With these two lines, we rule out semi-supervised DA. I think they are a leftover from an earlier version, no?
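
For readers unfamiliar with the removed lines, the sketch below mirrors what they compute: a boolean index of the samples whose label is not masked. It is written in plain Python rather than NumPy, and the sentinel value -1 and the function name `unmasked_index` are assumptions for illustration. In a semi-supervised setting, some target samples would keep a real label and therefore appear as unmasked.

```python
import math

MASKED_CLASSIFICATION_LABEL = -1  # assumed value of the classification sentinel


def unmasked_index(y, continuous=False):
    """Return one boolean per sample: True when the label is not masked."""
    if continuous:
        # Regression: masked labels are assumed to be non-finite (e.g. NaN).
        return [math.isfinite(value) for value in y]
    # Classification: masked labels are assumed to equal the sentinel.
    return [value != MASKED_CLASSIFICATION_LABEL for value in y]


y_cls = [0, 1, -1, -1, 1]      # last sample: a labeled target sample
y_reg = [0.5, math.nan, 2.0]

print(unmasked_index(y_cls))                   # [True, True, False, False, True]
print(unmasked_index(y_reg, continuous=True))  # [True, False, True]
```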

tgnassou · 2025-06-29T17:53:23Z

skada/tests/test_reweight.py

@@ -307,7 +308,11 @@ def predict(self, X, sample_weight=None):
            assert sample_weight is None
            return X

-    clf = make_da_pipeline(DensityReweightAdapter(), mediator, FakeEstimator())
+    clf = make_da_pipeline(
+        Shared(DensityReweightAdapter(), mask_target_labels=False),


Why is it false here?

If we mask the target labels, it breaks when we use SelectTarget for the standard scaler. That means that with the SelectTarget selector, the source domain is not propagated through the pipeline :/

Something to fix in another issue, I think.

YanisLalou and others added 8 commits June 25, 2025 10:30

Add _auto_mask_target_labels to prevent data leakage

7e4de8f

Merge branch 'main' into auto_mask_target_labels

a1adf4e

remove mask_target_labels attribute to SelectSourceTarget, seems irrelevant

9b5c4f0

Disable automatic target label masking for supervised selectors

b126315

Merge branch 'main' into auto_mask_target_labels

9fdae8b

Disable masking for SelectSourceTarget

d202b2a

Fix doc with the new mask_target_labels attribute when instantiating a da_pipeline with SelectSourceTarget

2a8302f

Merge branch 'main' into auto_mask_target_labels

a5f3f6a

YanisLalou changed the title ~~[WIP] Add automatic target label masking to prevent data leakage~~ [TO_REVIEW] Add automatic target label masking to prevent data leakage Jun 25, 2025

rflamary and others added 3 commits June 25, 2025 17:08

Merge branch 'main' into auto_mask_target_labels

7317915

Merge branch 'main' into auto_mask_target_labels

35f0057

Merge branch 'main' into auto_mask_target_labels

38716df

tgnassou reviewed Jun 29, 2025

tgnassou and others added 2 commits July 10, 2025 16:43

rm useless line

798873e

Merge branch 'main' into auto_mask_target_labels

91a8925

github-actions bot added Examples shallow tests-shallow base labels Jul 15, 2025
