Improve tutorial classification on imbalanced data #2169


Conversation

lorentzenchr
Contributor

@lorentzenchr lorentzenchr commented Dec 29, 2022

This PR improves the tutorial for classification on imbalanced data, https://www.tensorflow.org/tutorials/structured_data/imbalanced_data:

  • Add proper scoring rules
  • Add choice of threshold
  • Remove oversampling (considered a bad practice)

See also https://discuss.tensorflow.org/t/improvements-to-the-tutorial-classification-on-imbalanced-data/13520.

@github-actions

Preview

Preview and run these notebook edits with Google Colab: Rendered notebook diffs available on ReviewNB.com.

Format and style

Use the TensorFlow docs notebook tools to format for consistent source diffs and lint for style:
$ python3 -m pip install -U --user git+https://github.com/tensorflow/docs

$ python3 -m tensorflow_docs.tools.nbfmt notebook.ipynb
$ python3 -m tensorflow_docs.tools.nblint --arg=repo:tensorflow/docs notebook.ipynb
If commits are added to the pull request, synchronize your local branch: git pull origin improve_imbalanced_classification

@8bitmp3 8bitmp3 self-assigned this Dec 29, 2022
@8bitmp3 8bitmp3 added the 'review in progress' label Dec 29, 2022
@lorentzenchr
Contributor Author

@8bitmp3 Is there any chance to get some initial feedback?

Member

@MarkDaoust MarkDaoust left a comment


Sorry about the delay, a lot of people were out for the Christmas holidays.

Thanks for taking the time to make the PR. Generally I support these changes; we just have a few little things to discuss.

Mainly: I'm not convinced that removing the resampling example is the right approach here.

Yes, on the training data, resampling/reweighting almost never beats the straight classifier. On the validation set, resampling does show improvements compared to the baseline, and does much better than reweighting. Given that resampling is working better than reweighting, I'm against removing it.

Would it make sense to emphasize the cross entropy / log loss a bit more?

I don't think you should stop at CrossEntropy, because in many applications you do need to return a 0 or 1, and that has real-world values/costs, and those are what you care about. I think the right thing to emphasize is the PRC curve and the relative values/costs of the different types of errors.

"#### Metrics for probability predictions\n",
"\n",
"As we train our network with the cross entropy as a loss function, it is fully capable of predicting class probabilities, i.e. it is a probabilistic classifier.\n",
"Metrics that assess probabilistic predictions and that are, in fact, **proper scoring rules** are:\n",
Member


proper scoring rules

This is the first time I've seen this term; if it's worth mentioning, we should give a brief description of what it means and why it's important.

Contributor Author


I'll try to add a single sentence. Under "Read more" there is a canonical reference. On top of that, I can recommend reading https://arxiv.org/abs/0912.0902 (knowing that scoring rules and scoring functions coincide for binary classification).
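
To make this concrete, a quick sketch of the two proper scoring rules I have in mind, log loss (cross entropy) and the Brier score. Hypothetical numbers; scikit-learn is used here only for illustration and is not part of the tutorial:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

# Hypothetical labels and predicted class probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.10, 0.40, 0.80, 0.65, 0.20, 0.90])

# Both are proper scoring rules: in expectation they are minimized by the
# true conditional class probabilities, so they reward calibrated
# probabilistic predictions, not just a good ranking.
print("log loss   :", log_loss(y_true, y_prob))
print("Brier score:", brier_score_loss(y_true, y_prob))
```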

"\n",
"#### Other metrices\n",
"\n",
"The following metrics take into account all possible choices of thresholds $t$, but they are not proper scoring rules and only assess the ranking of predictions, not their absolute values.\n",
Member


This sentence is hard to understand without a little more context on "proper scoring rules".

Contributor Author


Do you have a suggestion?
I added one sentence for proper scoring rules above. Then this clearly says that AUC only assesses the ranking of predictions. Put differently: the best AUC does not guarantee that the predicted probabilities are close to the true ones.

Member


Good call introducing proper scoring earlier.

only assess the ranking of predictions, not their absolute values.

I don't understand this. Which ranking & values are we talking about?

The best AUC does not guarantee that the predicted probabilities are close to the true ones.

Right, but if all you want is a deterministic classifier, we don't care about the true probabilities.

@lorentzenchr
Contributor Author

lorentzenchr commented Jan 14, 2023

@MarkDaoust Thanks for looking into this PR and your feedback.

Probabilistic classifier

Would it make sense to emphasize the cross entropy / log loss a bit more?

I don't think you should stop at CrossEntropy, because in many applications you do need to return a 0 or 1, and that has real-world values/costs, and those are what you care about. I think the right thing to emphasize is the PRC curve and the relative values/costs of the different types of errors.

I would divide it into 2 steps. The first is modelling: find a good probabilistic classifier. The statistical forecasting literature clearly states that this is to be preferred over deterministic ones. The second step is then to make a decision, i.e. predict 0 or 1, given the predicted class probability. Note that, given a good probabilistic classifier, there does not exist a (systematically/in expectation) better decision than one based on it.

Without knowing the true costs (or cost ratio), the best one can do is, like in this tutorial, to demonstrate different thresholds and plot ROC curves.
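
To sketch the decision step with hypothetical costs (illustration only, not from the tutorial): if the costs of the two error types were known, the expected-cost-minimizing threshold for a calibrated probabilistic classifier would follow directly from the cost ratio.

```python
import numpy as np

# Hypothetical misclassification costs (illustration only).
cost_fp = 1.0   # cost of flagging a legitimate transaction
cost_fn = 20.0  # cost of missing a fraudulent transaction

# Predicting "positive" has lower expected cost whenever
# cost_fp * (1 - p) <= cost_fn * p, i.e. p >= cost_fp / (cost_fp + cost_fn).
threshold = cost_fp / (cost_fp + cost_fn)

y_prob = np.array([0.01, 0.03, 0.10, 0.60, 0.95])  # predicted probabilities
y_pred = (y_prob >= threshold).astype(int)          # hard 0/1 decisions
print(f"threshold = {threshold:.3f}", y_pred)
```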

In this regard, I noticed that the EarlyStopping callback is monitoring the PRC AUC ('val_prc'). I think this is not the best choice, and indeed, setting it to 'val_loss' gives better results in the end (with a lower patience=5 instead of 10 to prevent overfitting). I would like to change this, too.
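
Roughly, the callback I have in mind looks like this (a sketch; the exact arguments would need to be aligned with the rest of the tutorial):

```python
import tensorflow as tf

# Stop on the validation loss (the proper scoring rule we train on) instead
# of 'val_prc', with a shorter patience to limit overfitting.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    mode='min',
    restore_best_weights=True)
```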

I also think that the differences in the final results, in particular for the over-sampling case, are due to estimation uncertainty, i.e. due to chance and not systematic (confidence intervals would prove it).

@lorentzenchr
Contributor Author

@MarkDaoust Any chance to get this merged?

@MarkDaoust
Member

Thanks for the ping, I'll give it a final look and try to get it merged.

@github-actions github-actions bot added the 'lgtm' label Feb 24, 2023
MarkDaoust
MarkDaoust previously approved these changes Feb 24, 2023
@MarkDaoust MarkDaoust added the 'ready to pull' label and removed the 'review in progress' label Feb 24, 2023
@lorentzenchr
Contributor Author

lorentzenchr commented Feb 24, 2023

IMHO, 885aead drops an important piece of information: "AUC and AUPRC only assess the ranking of predictions, not their absolute values", i.e. they are insensitive to (bad) calibration. That's a real deficiency of those metrics.

@MarkDaoust
Member

IMHO, 885aead drops an important piece of information: "they only assess the ranking of predictions, not their absolute values", i.e. they are insensitive to (bad) calibration. That's a real deficiency of those metrics.

Thanks for the feedback. Could you help clarify and give a little more detail here? What do you mean concretely?

"they only assess the ranking of predictions, not their absolute values"

I'm stumbling here on the fact that an AUPRC of 1.0 includes a perfect deterministic classifier, and a random classifier would give .. 0.5? 0.0? Those seem like absolute reference points to me.

insensitive to (bad) calibration

I'm still lost here, can you give an example?

@lorentzenchr
Contributor Author

Let's concentrate on AUC: if you add a constant to (or multiply by a positive constant) the probability predictions of a model, the AUC does not change. More visually: AUC tells nothing about a reliability diagram, which assesses (auto-)calibration.
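
A small numerical sketch of what I mean (hypothetical numbers; scikit-learn only for illustration):

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1])
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.30, 0.90])

# Multiplying by a positive constant preserves the ranking, hence the AUC,
# but the absolute probabilities are now badly calibrated.
y_scaled = 0.5 * y_prob

print("AUC      original vs. scaled:",
      roc_auc_score(y_true, y_prob), roc_auc_score(y_true, y_scaled))
print("log loss original vs. scaled:",
      log_loss(y_true, y_prob), log_loss(y_true, y_scaled))
```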
References off the top of my head (you'll quickly notice why they are in my head):

@MarkDaoust MarkDaoust requested a review from a team as a code owner March 7, 2023 12:23
@MarkDaoust MarkDaoust added the 'ready to pull' label and removed the 'ready to pull' label Mar 7, 2023
@copybara-service copybara-service bot merged commit 775470f into tensorflow:master Mar 9, 2023
@lorentzenchr lorentzenchr deleted the improve_imbalanced_classification branch March 12, 2023 10:24