Add warning messages when full data-set is used #441

Open
ArturoAmorQ opened this issue Aug 24, 2021 · 0 comments
ArturoAmorQ (Collaborator) commented Aug 24, 2021

The full data-set (with no train-test split or cross-validation) is used for modeling in the following notebooks:

This has been a source of confusion (see for instance this forum question).

We should add a Warning message similar to the one in logistic_regression_non_linear.py, adapted to each case:

Warning: Be aware that we fit and will check the boundary decision of the classifier on the same dataset without splitting the dataset into a training set and a testing set. While this is a bad practice, we use it for the sake of simplicity to depict the model behavior. Always use cross-validation when you want to assess the generalization performance of a machine-learning model.
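To make the point of that warning concrete, here is a minimal sketch contrasting a score computed on the data used for fitting with a cross-validated score. The synthetic dataset, the choice of a decision tree, and all variable names are assumptions made for illustration; they do not come from any of the notebooks mentioned in this issue.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data; flip_y adds label noise so that
# memorizing the samples does not help on unseen data.
X, y = make_classification(
    n_samples=300, n_features=10, flip_y=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)

# Fit and score on the very same data: an unpruned tree memorizes it.
same_data_score = model.fit(X, y).score(X, y)

# Estimate generalization performance on held-out folds instead.
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"Accuracy on the data used for fitting: {same_data_score:.3f}")
print(
    f"Cross-validated accuracy: {cv_scores.mean():.3f} "
    f"+/- {cv_scores.std():.3f}"
)
```

The first number only says that the tree can reproduce the labels it has already seen; the second is the one to report as generalization performance.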

Additionally, a Warning message should be added in the following notebooks

where we remind the user that scoring the model on the full data-set is not necessarily wrong, but provides no information about under- or over-fitting.
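As a possible illustration for such a reminder, the sketch below uses `cross_validate` with `return_train_score=True` so that train and test scores can be compared, which is what a single score on the full data-set cannot offer. The dataset, the two tree depths, and the `results` dictionary are hypothetical choices made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy synthetic data, for illustration only.
X, y = make_classification(
    n_samples=300, n_features=10, flip_y=0.2, random_state=0
)

results = {}
for max_depth in (1, None):  # a very shallow tree vs. an unpruned tree
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    cv_results = cross_validate(
        model, X, y, cv=5, return_train_score=True
    )
    results[max_depth] = {
        "train": cv_results["train_score"].mean(),
        "test": cv_results["test_score"].mean(),
    }
    print(
        f"max_depth={max_depth}: "
        f"train={results[max_depth]['train']:.3f}, "
        f"test={results[max_depth]['test']:.3f}"
    )
```

A large gap between the train and test scores hints at over-fitting, while low scores on both hint at under-fitting; neither diagnosis is available when only the full data-set score is computed.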

What do you think?

@lesteve lesteve added this to the MOOC 3.0 milestone Jan 6, 2022
@lesteve lesteve modified the milestones: MOOC 3.0, MOOC 4.0 Oct 18, 2022