Skip to content

Conversation

@sisp
Copy link
Contributor

@sisp sisp commented May 27, 2023

I've added support for logging metrics for multiple data sets/splits (e.g. train and test/eval) with XGBoost. This is consistent with other supported frameworks, e.g. Keras. It's also a common use case to compare the metrics on multiple data sets/splits, e.g. to check overfitting.

This addition is backwards compatible.

I'll create a PR to the docs project once this addition has been approved.

@codecov-commenter
Copy link

codecov-commenter commented May 28, 2023

Codecov Report

Patch coverage: 96.00% and project coverage change: +0.05 🎉

Comparison is base (ce68fe1) 89.65% compared to head (85cdbeb) 89.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #587      +/-   ##
==========================================
+ Coverage   89.65%   89.71%   +0.05%     
==========================================
  Files          43       43              
  Lines        2938     2955      +17     
  Branches      242      245       +3     
==========================================
+ Hits         2634     2651      +17     
  Misses        264      264              
  Partials       40       40              
Impacted Files Coverage Δ
src/dvclive/xgb.py 95.83% <87.50%> (+0.83%) ⬆️
tests/test_frameworks/test_xgboost.py 96.29% <100.00%> (+1.17%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@sisp
Copy link
Contributor Author

sisp commented May 28, 2023

Or should we remove the metric_data argument and create a subdirectory for each evaluation set? It seems the other framework integrations work like that. It would be a breaking change though.

Alternatively, metric_data could also accept None and default to None in which case metrics for all evaluation sets would be logged. That would be backwards compatible if we kept omitting the subdirectory for the evaluation set when only one set is used. It seems other framework integrations always create this subdirectory though, so the XGBoost integration is the exception.

@daavoo daavoo self-requested a review May 29, 2023 08:01
@daavoo daavoo added A: frameworks Area: ML Framework integration feature labels May 29, 2023
Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sisp thanks for the contribution!

I took a look and I think what makes sense is to just go ahead and drop metric_data , as you suggested in #587 (comment)

@sisp
Copy link
Contributor Author

sisp commented May 29, 2023

@daavoo Shouldn't we provide at least a grace period until the breaking change takes effect, still violating SemVer but only after, e.g., 3 minor releases? We could allow None (i.e. Optional[str] and write metrics logs to those inferred subdirectories), default to None, and show a deprecation warning to notify users and give them time to migrate.

@daavoo
Copy link
Contributor

daavoo commented May 29, 2023

We could allow None (i.e. Optional[str] and write metrics logs to those inferred subdirectories), default to None, and show a deprecation warning to notify users and give them time to migrate.

Yes, sorry for the lack of clarity, I meant the option in your last paragraph

@sisp
Copy link
Contributor Author

sisp commented May 29, 2023

Great! I'll update the PR tomorrow.

@sisp sisp changed the title xgb: add support for multiple metrics data sets xgb: infer metric data names from evals and deprecate metric_data May 30, 2023
@sisp
Copy link
Contributor Author

sisp commented May 30, 2023

@daavoo I've updated the PR as we discussed and resolved merge conflicts with main. I've also submitted a PR to the dvc.org project to first document the metric_data parameter and also add a notice regarding its deprecation: treeverse/dvc.org#4578

@sisp sisp requested a review from daavoo May 30, 2023 09:37
sisp and others added 2 commits May 30, 2023 11:48
Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>
Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing and addressing all comments!

@daavoo daavoo merged commit 9088996 into treeverse:main May 30, 2023
@sisp sisp deleted the feat/xgb-multiple-metrics-data branch May 30, 2023 10:01
@sisp
Copy link
Contributor Author

sisp commented May 30, 2023

You're welcome! Thanks for the thorough review and your responsiveness! 🙏 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A: frameworks Area: ML Framework integration feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants