[WB-7886] Add CatBoost Integration #2975

ayulockin · 2021-12-01T22:28:19Z

Description

The PR adds a simple WandbCallback for CatBoost.

The PR currently enables:

logging the iteration number
logging training and validation metrics
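The two logging features above can be illustrated with a minimal, self-contained sketch. It is not the PR's WandbCallback. The class name MetricsLoggingCallback and the log_fn parameter are hypothetical, and log_fn stands in for wandb.log so the snippet runs without wandb or catboost installed. The sketch assumes CatBoost's user-defined callback interface: after_iteration(info) is called after each boosting iteration, info carries an iteration attribute and a nested metrics dict of per-iteration value lists, and returning True lets training continue. The "learn-MultiClass" key style mirrors the summary key checked in the yea test.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


class MetricsLoggingCallback:
    """Hypothetical CatBoost-style callback sketch (not the PR's WandbCallback).

    Assumption: CatBoost calls after_iteration(info) on each user callback
    after every iteration, and a True return value means "keep training".
    """

    def __init__(self, log_fn: Callable[[Dict[str, Any]], None]) -> None:
        # log_fn is a stand-in for wandb.log so the sketch runs offline.
        self.log_fn = log_fn

    def after_iteration(self, info: Any) -> bool:
        # Assumed shape: info.metrics maps a dataset name ("learn",
        # "validation", ...) to {metric name: [value per iteration]}.
        record: Dict[str, Any] = {"iteration": info.iteration}
        for data_name, metrics in info.metrics.items():
            for metric_name, values in metrics.items():
                # Log only the latest value, keyed like "learn-MultiClass".
                record[f"{data_name}-{metric_name}"] = values[-1]
        self.log_fn(record)
        return True  # never request early stopping


# Simulate two iterations using an object shaped like CatBoost's `info`.
logged: List[Dict[str, Any]] = []
callback = MetricsLoggingCallback(logged.append)
history = {"learn": {"MultiClass": []}, "validation": {"MultiClass": []}}
for it, (train_loss, val_loss) in enumerate([(1.05, 1.08), (0.97, 1.01)]):
    history["learn"]["MultiClass"].append(train_loss)
    history["validation"]["MultiClass"].append(val_loss)
    keep_going = callback.after_iteration(
        SimpleNamespace(iteration=it, metrics=history)
    )

print(logged[-1])
# {'iteration': 1, 'learn-MultiClass': 0.97, 'validation-MultiClass': 1.01}
```

In a real run you would presumably pass log_fn=wandb.log and hand the instance to model.fit(..., callbacks=[...]); treat that wiring as an assumption rather than the integration's documented usage.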

Testing

Added a yea functional test; also tested manually.

* improved xgboost wandb_callback
* log config, better logging of metrics, typo
* cleanliness is good
* add feature importance plotting
* best score + iteration
* deprecated callback, docstring, fixes
* define_metric, rename, fixes
* test added
* define metrics fixes
* add command to yea

codecov · 2021-12-01T22:44:58Z

Codecov Report

Merging #2975 (37cc0de) into master (55b885c) will decrease coverage by 0.20%.
The diff coverage is 92.18%.

@@            Coverage Diff             @@
##           master    #2975      +/-   ##
==========================================
- Coverage   80.15%   79.95%   -0.21%     
==========================================
  Files         210      213       +3     
  Lines       27818    27872      +54     
==========================================
- Hits        22298    22285      -13     
- Misses       5520     5587      +67
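The Coverage Diff figures are internally consistent: coverage is hits divided by tracked lines, and hits plus misses equals lines in both columns (no partials are listed in this report). A quick check, assuming Codecov's default of rounding down to two decimals; codecov_percent is a hypothetical helper name:

```python
import math


def codecov_percent(hits: int, lines: int) -> float:
    """Coverage in percent, rounded down to two decimals (assumed Codecov default)."""
    return math.floor(hits / lines * 10000) / 100


master = {"lines": 27818, "hits": 22298, "misses": 5520}
pr = {"lines": 27872, "hits": 22285, "misses": 5587}

for name, stats in (("master", master), ("#2975", pr)):
    # Every tracked line is either a hit or a miss in this report.
    assert stats["hits"] + stats["misses"] == stats["lines"]
    print(name, codecov_percent(stats["hits"], stats["lines"]))
# master 80.15
# #2975 79.95
```

The +54 lines split into -13 hits and +67 misses, which is why coverage drops even though the new catboost files themselves are well covered.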

Flag	Coverage Δ
functest	`56.62% <89.06%> (-0.56%)`	⬇️
unittest	`69.58% <9.37%> (-0.13%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
wandb/sdk/data_types.py	`84.34% <ø> (ø)`
wandb/integration/catboost/catboost.py	`90.56% <90.56%> (ø)`
wandb/__init__.py	`91.34% <100.00%> (-0.25%)`	⬇️
wandb/catboost/__init__.py	`100.00% <100.00%> (ø)`
wandb/integration/catboost/__init__.py	`100.00% <100.00%> (ø)`
wandb/integration/lightgbm/__init__.py	`95.45% <100.00%> (ø)`
wandb/sdk/internal/profiler.py	`95.00% <100.00%> (ø)`
wandb/sdk/wandb_artifacts.py	`81.68% <100.00%> (ø)`
wandb/util.py	`85.58% <100.00%> (-0.13%)`	⬇️
wandb/integration/metaflow/metaflow.py	`52.29% <0.00%> (-32.76%)`	⬇️
... and 8 more

morganmcg1

Looks good so far. The new XGB callback code is included here too; that should be kept in a separate PR.

There is a log_function function to be added too, right?

morganmcg1 · 2021-12-03T23:47:10Z

wandb/integration/catboost/catboost.py

+        train_pool = Pool(train[features], label=train['label'], cat_features=cat_features)
+        test_pool = Pool(test[features], label=test['label'], cat_features=cat_features)
+
+        model = CatBoostRegressor(iterations=100,


Nit: maybe move the iterations argument to a new line.

👍🏻 In general, please re-format the example code. I'd copy it into a dummy file, then run tox -e format -- dummy.py, and copy it back into the docstrings.

wandb/integration/catboost/catboost.py

dmitryduev

Thanks @ayulockin! Mostly LGTM, a few minor comments.

Tested it out using their basic tutorial https://github.com/catboost/tutorials/blob/master/python_tutorial.ipynb: everything works well, including things like logging custom metrics.
Please rm the xgboost stuff from this PR.
Also, @raubitsj, I saw you were gonna add telemetry changes to the xgboost PR, would you mind doing the same here please?

dmitryduev · 2021-12-06T22:09:19Z


wandb/integration/catboost/catboost.py

dmitryduev · 2021-12-06T22:20:17Z

functional_tests/catboost/catboost.yea

+    - :wandb:runs[0][summary][learn-MultiClass]
+    - 0.0
+  - :wandb:runs[0][exitcode]: 0
+


functional_tests/catboost/test_catboost.py


dmitryduev · 2021-12-07T20:19:47Z

@raubitsj looking at wandb/proto/wandb_telemetry.proto, seems like the stuff to enable telemetry for catboost is already there, is that right?


wandb/integration/catboost/catboost.py

mypy.ini

tests/test_library_public.py

wandb/integration/catboost/catboost.py

wandb/sdk/data_types.py

wandb/integration/catboost/catboost.py

wandb/sdk/wandb_artifacts.py

wandb/integration/catboost/catboost.py

Co-authored-by: Katia Patkin <87335417+kptkin@users.noreply.github.com>

wandb/integration/catboost/catboost.py

functional_tests/catboost/test_catboost.py

…nto ayut-catboost

wandb/catboost/__init__.py

raubitsj · 2022-02-17T02:22:27Z

This is the kind of change needed for telemetry:
80afe3f

Let me know if you need help, hopefully this is a clean change to model after.

raubitsj · 2022-02-17T02:23:38Z

Responded at: #2975 (comment)

dmitryduev · 2022-02-17T22:28:26Z

@raubitsj: I added feature usage tracking to telemetry, plus a yea test for it, as you requested. It passes locally and CI is green, so merging.
Many thanks for this once again, @ayulockin!!

ayulockin · 2022-02-18T04:09:13Z

Thanks @dmitryduev for shipping it.

ayulockin and others added 6 commits November 17, 2021 23:01

code-check

4600ea1

test working

40bb8a7

type annotation

dc36b2c

basic catboost integration

1ec2b44

catboost yea test

219b035

ayulockin requested a review from morganmcg1 December 1, 2021 22:34

dmitryduev self-requested a review December 3, 2021 18:45

morganmcg1 reviewed Dec 3, 2021


dmitryduev reviewed Dec 6, 2021


ayulockin added 4 commits December 7, 2021 17:58

Revert "type annotation"

e7c1d22

This reverts commit dc36b2c.

Revert "test working"

2fd4c90

This reverts commit 40bb8a7.

Revert "code-check"

b2fad70

This reverts commit 4600ea1.

Revert "Improved xgboost wandb_callback (#1)"

890926f

This reverts commit 5559308.

ayulockin and others added 5 commits December 8, 2021 16:16

Update functional_tests/catboost/test_catboost.py

7fe71c1

Co-authored-by: Dmitry Duev <dmitryduev@users.noreply.github.com>

log_summary, test

ff82e8e

formatting

2d92320

feature importance

06d344b

one bad white trailing space

c862c89

ayulockin marked this pull request as ready for review December 8, 2021 21:32

ayulockin and others added 4 commits December 9, 2021 19:58

minor fix

c63e950

Merge branch 'master' into ayut-catboost

81092ed

Merge branch 'master' into ayut-catboost

9a3ae4a

Update functional_tests/catboost/test_catboost.py

1a2f2db

dmitryduev reviewed Feb 16, 2022


wandb/integration/catboost/catboost.py (outdated, resolved)

Update wandb/integration/catboost/catboost.py

7aa5912

dmitryduev reviewed Feb 16, 2022


wandb/integration/catboost/catboost.py (outdated, resolved)

Update wandb/integration/catboost/catboost.py

0efe10f

Update __init__.py

0668118