v1.5.0

@thomasmeissnercrm thomasmeissnercrm released this 30 Jul 14:24
f002de6

Release v1.5.0 is another step towards model explainability. It adds two main new features:

ModelMatchMaker

So far, BlueCast provided tools to measure data drift, but nothing to deal with it.
This release fills the gap by adding ModelMatchMaker, a simple utility that lets users store
multiple training datasets together with their BlueCast instances. When users provide a new dataset,
ModelMatchMaker returns the stored dataset that shows the least data drift relative to it, along with
the associated BlueCast instance. From there, users can either add the matching dataset to the new
dataset or reuse the best-matching model on the unseen data instead of training a new one.
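
The matching idea can be illustrated with a minimal, standard-library-only sketch. This is not the BlueCast implementation: `ToyMatchMaker`, `add`, `best_match` and the choice of a mean two-sample Kolmogorov-Smirnov statistic as drift score are all assumptions made for illustration, and datasets are plain dicts of numeric columns rather than DataFrames.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical dataset type: column name -> numeric values.
Dataset = Dict[str, Sequence[float]]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def drift_score(reference: Dataset, new: Dataset) -> float:
    """Assumed drift measure: mean KS statistic over shared columns."""
    shared = [col for col in reference if col in new]
    if not shared:
        raise ValueError("datasets share no columns")
    return sum(ks_statistic(reference[c], new[c]) for c in shared) / len(shared)


@dataclass
class ToyMatchMaker:
    """Illustrative stand-in that stores (model, dataset) pairs."""

    entries: List[Tuple[Any, Dataset]] = field(default_factory=list)

    def add(self, model: Any, dataset: Dataset) -> None:
        self.entries.append((model, dataset))

    def best_match(self, new: Dataset) -> Tuple[Any, Dataset]:
        """Return the stored pair whose dataset drifts least from `new`."""
        if not self.entries:
            raise ValueError("no datasets stored")
        return min(self.entries, key=lambda entry: drift_score(entry[1], new))


# Usage: strings stand in for fitted model instances.
matcher = ToyMatchMaker()
matcher.add("winter_model", {"temp": [0, 1, 2, 3, 4]})
matcher.add("summer_model", {"temp": [20, 21, 22, 23, 24]})
model, dataset = matcher.best_match({"temp": [19, 20, 22, 25]})
```

In this toy example the new data sits in the summer temperature range, so the summer entry is returned. The real utility may use a different drift measure and interface.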

See the docs for more information.

ErrorAnalyser

So far, BlueCast already provided a lot of information about the model:

  • see all hyperparameter sets and their evaluation scores (optional)
  • see most important hyperparameters
  • feature importance
  • evaluation on unseen data (when using fit_eval)
  • it was also possible to plot the decision trees

However, BlueCast offered no tooling for out-of-fold datasets. With version 1.5.0, users can
change the training config and set a path where out-of-fold data is stored. ErrorAnalyser then helps
evaluate that out-of-fold data. It offers two core insights:

  • plot prediction error distributions for all categories or bins of numerical features, for each target class
    or target bin
  • return a preprocessed DataFrame that shows the mean absolute prediction error for each subsegment of the data
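
The second insight can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the ErrorAnalyser API: the function name, the `target` and `prediction` column names and the row-of-dicts input are assumptions. Binning of numerical features is left out, so every distinct feature value is treated as its own segment.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def mean_abs_error_by_segment(
    rows: Iterable[Dict[str, Any]],
    feature_cols: Sequence[str],
    target_col: str = "target",          # assumed column name
    pred_col: str = "prediction",        # assumed column name
) -> Dict[Tuple[str, Any], float]:
    """Mean absolute prediction error per (feature, value) segment."""
    errors: Dict[Tuple[str, Any], List[float]] = defaultdict(list)
    for row in rows:
        abs_err = abs(row[target_col] - row[pred_col])
        # Each row contributes its error to one segment per feature.
        for col in feature_cols:
            errors[(col, row[col])].append(abs_err)
    return {segment: mean(vals) for segment, vals in errors.items()}


# Usage with made-up out-of-fold rows for a regression target.
oof_rows = [
    {"city": "A", "target": 10, "prediction": 12},
    {"city": "A", "target": 20, "prediction": 19},
    {"city": "B", "target": 30, "prediction": 40},
]
segment_errors = mean_abs_error_by_segment(oof_rows, feature_cols=["city"])
```

Segments with a much higher mean error than the rest, such as city "B" in this made-up data, are candidates for closer inspection.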

See the docs for more information.

Additional changes

  • update poetry environment for developers
  • add max_bin to tuning
  • set 5 folds as new default for more robust single-model instances