Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs, including PBCDataloader incorrect preprocessing #152

Merged
merged 4 commits into from
Mar 16, 2023

Conversation

DrShushen
Copy link
Member

Description

Fix three bugs:

  1. 1e3d6b4 fixes the "did not converge" error in tests involving XGBSurvivalAnalysis/XGBTimeSeriesSurvival. The tests that would previously (at least in some cases) fail include tests/metrics/test_performance.py::test_evaluate_performance_survival_analysis[PerformanceEvaluatorXGB-test_plugin0]. The fix was to change the strategy from "weibull" to the more stable "debiased_bce".
  2. f309351 fixes the failing check in WindowLinearLayer.forward, which would fail when a length zero tensor was provided (i.e. no static data case). The conditional only needs to run if static data is expected, so an extra check added to fix this. The tests that would previously (at least in some cases) fail include: tests/plugins/core/models/test_ts_tabular_vae.py::test_ts_vae_generation[GoogleStocksDataloader].
  3. 1e3d6b4 fixes incorrect preprocessing in PBCDataloader.
    • Steps to replicate:
      • If you add this sanity check, which checks that the static features extracted from the source data are indeed unchanging (over time steps) for each patient...
      • ...You find that this fails in the original implementation (run from synthcity.utils.datasets.time_series.pbc import PBCDataloader; PBCDataloader(as_numpy=True).load()).
    • Why is this happening? The dataframes used in preprocessing steps mix up the indexes of the samples at the sorting step, and at a later step attempt to reference the old indexes - which leads to mixing up different patients' data together. This is now fixed by resetting the index at the sorting step.

Affected Dependencies

None.

How has this been tested?

  • No new tests are required.

Checklist

@DrShushen DrShushen added the bug Something isn't working label Mar 15, 2023
@DrShushen DrShushen self-assigned this Mar 15, 2023
@DrShushen
Copy link
Member Author

@bcebere for 3. above, are there any performance tests that we expect to be affected? Since PBCDataloader now essentially returns different data.

@DrShushen DrShushen changed the title Dr shushen/fix pbc dataloader preprocessing Fix bugs, including PBCDataloader incorrect preprocessing Mar 15, 2023
@DrShushen
Copy link
Member Author

DrShushen commented Mar 15, 2023

@bcebere
Getting this following error in GH actions but not locally
https://github.com/vanderschaarlab/synthcity/actions/runs/4430650800/jobs/7772687428

2023-03-15T21:05:55.6778011Z FAILED tests/plugins/privacy/test_decaf.py::test_plugin_generate[test_plugin0] - RuntimeError: Training with multiple optimizers is only supported with manual optimization. Set `self.automatic_optimization = False`, then access your optimizers in `training_step` with `opt1, opt2, ... = self.optimizers()`.

Looks like possibly a pytorch-lightning v2 issue.

@DrShushen DrShushen merged commit 33a0556 into main Mar 16, 2023
@DrShushen DrShushen deleted the DrShushen/fix-PBCDataloader-preprocessing branch March 16, 2023 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants