
[ENH] Improve test framework for v1 models #1908


Draft · wants to merge 13 commits into main

Conversation

phoeenniixx
Contributor

@phoeenniixx phoeenniixx commented Jul 1, 2025

Reference Issues/PRs

Fixes #1906

What does this implement/fix? Explain your changes.

The PR makes the changes suggested in #1906:

  • Adds the tag "info:compatible_loss" to model_pkg classes, to keep track of the losses a model is compatible with.
  • Adds "loss classes":
    • "distr" - distribution losses like NormalDistributionLoss
    • "quantile" - QuantileLoss
    • "point" - point prediction losses like MAE
  • Adds tests that loop over models and each loss the model is compatible with, testing against common vignettes (see the sketch below this list).
  • Adds tests for MultiLoss containing only losses compatible with a given model.
  • Adds tests for output format contracts.
  • Adds the remaining models to the test framework.
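
A minimal sketch of the intended pattern: the tag name comes from the PR, but LOSSES_BY_CLASS, the DeepAR_pkg stub as written here, and get_compatible_losses are illustrative assumptions, not the PR's actual code.

import pytest

# Illustrative mapping from the "loss class" strings above to concrete losses.
LOSSES_BY_CLASS = {
    "distr": ["NormalDistributionLoss", "LogNormalDistributionLoss"],
    "quantile": ["QuantileLoss"],
    "point": ["MAE"],
}

class DeepAR_pkg:
    """Hypothetical model_pkg class carrying the new tag."""
    _tags = {"info:compatible_loss": ["distr"]}

def get_compatible_losses(pkg):
    """Resolve the tag into concrete loss names."""
    losses = []
    for loss_class in pkg._tags.get("info:compatible_loss", []):
        losses.extend(LOSSES_BY_CLASS[loss_class])
    return losses

@pytest.mark.parametrize("loss_name", get_compatible_losses(DeepAR_pkg))
def test_integration(loss_name):
    ...  # train and predict against the common vignettes with this combination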

@phoeenniixx
Contributor Author

For now, only the DeepAR tests are added (not the MultiLoss tests). I will add the rest of the models in subsequent commits.


    all_train_kwargs.append(final_params)
    train_kwargs_names.append(f"base_params_{i}_{loss_name}")
else:
Contributor Author

@phoeenniixx phoeenniixx Jul 1, 2025


This if-else block is only added so that the model_pkg classes that do not yet have the tag info:compatible_loss can also pass the tests; I will remove it once all the model_pkg classes have this tag. A sketch of the fallback pattern is below.
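
A hedged sketch of such a temporary fallback; reading the tag from a _tags dict and the "point" default are assumptions for illustration, not the PR's actual code.

def losses_for_pkg(pkg_cls):
    """Use the tag when present, else fall back to a default, so that
    model_pkg classes without the tag still pass the tests."""
    tags = getattr(pkg_cls, "_tags", {})
    compatible = tags.get("info:compatible_loss", None)
    if compatible is not None:
        return compatible  # e.g. ["distr", "point"]
    return ["point"]       # assumed default until all classes carry the tag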


codecov bot commented Jul 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@18f5f95).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1908   +/-   ##
=======================================
  Coverage        ?   86.61%           
=======================================
  Files           ?       97           
  Lines           ?     7865           
  Branches        ?        0           
=======================================
  Hits            ?     6812           
  Misses          ?     1053           
  Partials        ?        0           
Flag     Coverage Δ
cpu      86.61% <100.00%> (?)
pytest   86.61% <100.00%> (?)

Flags with carried forward coverage won't be shown.

final_params = deepcopy(base_params)
for key, value in loss_params.items():
    if (
        isinstance(value, dict)
Collaborator


5 or more tabs is a code smell - can we avoid this?

Contributor Author

@phoeenniixx phoeenniixx Jul 2, 2025


Sure, I moved the code into separate functions. One way such a refactor can look is sketched below.
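
A sketch under assumptions: the helper name _merge_loss_params and the exact merge semantics are inferred from the deepcopy/isinstance snippet above, not taken from the PR's actual code.

from copy import deepcopy

def _merge_loss_params(base_params, loss_params):
    """Merge loss-specific params into a copy of the base params,
    recursing one level into dict-valued entries so the call site
    stays flat instead of deeply nested."""
    final_params = deepcopy(base_params)
    for key, value in loss_params.items():
        if isinstance(value, dict) and isinstance(final_params.get(key), dict):
            final_params[key].update(value)
        else:
            final_params[key] = value
    return final_params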

@phoeenniixx
Contributor Author

This is strange: the tests pass on macOS and Ubuntu but fail on Windows. The error:

self = LogNormal(), value = tensor([[1., 1., 1.],
        [1., 1., 0.]])

    def _validate_sample(self, value: Tensor) -> None:
        """
        Argument validation for distribution methods such as `log_prob`,
        `cdf` and `icdf`. The rightmost dimensions of a value to be
        scored via these methods must agree with the distribution's batch
        and event shapes.
    
        Args:
            value (Tensor): the tensor whose log probability is to be
                computed by the `log_prob` method.
        Raises
            ValueError: when the rightmost dimensions of `value` do not match the
                distribution's batch and event shapes.
        """
        if not isinstance(value, torch.Tensor):
            raise ValueError("The value argument to log_prob must be a Tensor")
    
        event_dim_start = len(value.size()) - len(self._event_shape)
        if value.size()[event_dim_start:] != self._event_shape:
            raise ValueError(
                f"The right-most size of value must match event_shape: {value.size()} vs {self._event_shape}."
            )
    
        actual_shape = value.size()
        expected_shape = self._batch_shape + self._event_shape
        for i, j in zip(reversed(actual_shape), reversed(expected_shape)):
            if i != 1 and j != 1 and i != j:
                raise ValueError(
                    f"Value is not broadcastable with batch_shape+event_shape: {actual_shape} vs {expected_shape}."
                )
        try:
            support = self.support
        except NotImplementedError:
            warnings.warn(
                f"{self.__class__} does not define `support` to enable "
                + "sample validation. Please initialize the distribution with "
                + "`validate_args=False` to turn off validation."
            )
            return
        assert support is not None
        valid = support.check(value)
        if not torch._is_all_true(valid):
>           raise ValueError(
                "Expected value argument "
                f"({type(value).__name__} of shape {tuple(value.shape)}) "
                f"to be within the support ({repr(support)}) "
                f"of the distribution {repr(self)}, "
                f"but found invalid values:\n{value}"
            )
E           ValueError: Expected value argument (Tensor of shape (2, 3)) to be within the support (GreaterThan(lower_bound=0.0)) of the distribution LogNormal(), but found invalid values:
E           tensor([[1., 1., 1.],
E                   [1., 1., 0.]])

C:\hostedtoolcache\windows\Python\3.12.10\x64\Lib\site-packages\torch\distributions\distribution.py:317: ValueError

When I run it locally, this does not happen. Although I am on Ubuntu, so I cannot say much; maybe someone on Windows could test what the issue is?

@fkiraly
Collaborator

fkiraly commented Jul 2, 2025

Is it always the same test, test_integration[DecoderMLP-base_params-1-LogNormalDistributionLoss]?

I suspect there is something happening with numerical accuracy, e.g., values that are close to zero but not quite zero.

@phoeenniixx
Contributor Author

Is it always the same test, test_integration[DecoderMLP-base_params-1-LogNormalDistributionLoss]?

Yes

I suspect there is something happening with numerical accuracy, e.g., values that are close to zero but not quite zero.

I'll then switch to Windows and see locally what exactly is happening.

@fkiraly
Collaborator

fkiraly commented Jul 3, 2025

One trick is also to introduce prints or artificial raises in the code that trigger only when you expect the error - although in this specific case I fear that is not going to be easy, so trying to reproduce locally might be your best bet. A sketch of the trick is below.
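
As a minimal sketch of that trick: the helper name and the y_actual variable are assumptions for illustration, not pointers into the actual code.

import torch

def assert_positive(y_actual, where):
    """Artificial raise: fail at the first non-positive value, before it
    reaches LogNormal's support validation deep inside torch."""
    if (y_actual <= 0).any():
        raise RuntimeError(
            f"non-positive values reached {where}: "
            f"{y_actual[y_actual <= 0].tolist()}"
        )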

@phoeenniixx
Contributor Author

phoeenniixx commented Jul 3, 2025

From what I could see in dwc["target"], there are values like 0.0010 that might be causing the issue?
See here for the whole dwc["target"].
Whole dwc here.

@fkiraly
Collaborator

fkiraly commented Jul 3, 2025

Strange.

  • can you explain which object at which stage of the test this corresponds to?
  • do you have a hypothesis for how the 0.0010 values cause an issue?
  • do the 0.0010 values occur only on Windows? If yes, why?
  • if no, why are they a problem only on Windows?

@phoeenniixx
Contributor Author

phoeenniixx commented Jul 6, 2025

Hi, so dwc is the dataframe output of data_with_covariates() called inside _get_test_dataloaders_from of DecoderMLP_pkg. At first, I thought the issue was that dwc["target"] has some values like 0.0010 and that these are being taken as 0 somewhere, but I didn't find any concrete evidence for this.

Then maybe the issue was different float types, as you said on Discord, but as we know (and I checked) dwc["target"] is numpy.float64 on both Ubuntu and Windows. So this was also a dead end.

Then I found that in LogNormal of torch.distributions, scale is bounded to be positive, and it's the same bound that is failing. So I thought maybe this could be the issue? But rescale_parameters of LogNormalDistributionLoss [source] already adds F.softplus; still, to be sure, I added F.softplus to the scale in map_x_to_distribution (see here):

def map_x_to_distribution(self, x: torch.Tensor) -> distributions.LogNormal:
    scale = F.softplus(x[..., 1])
    return self.distribution_class(loc=x[..., 0], scale=scale)

And obviously this also didn't work.

After some time, I found that somehow y_actual (from here) is getting the value 0.0 in tensors. This is because y in training_step is getting this value 0 from the batch. So my current hypothesis is that the transformation of y to tensors is the actual issue. I still need some time to diagnose.

From this experience I learned at least one thing: debugging should be "targeted" and not just random guesses. I think my random guesses have made this process so complex.

Edit: just to make it clearer and add more context on how y ends up with 0, this is a value I copied from the debugger for y in training_step:

y = (tensor([[1., 1., 1.],
        [1., 0., 0.]]), None)

and in some cases (when I would rerun the debugger), the value is something like:

y = (tensor([[1., 1., 1.],
        [1., 1., 0.]]), None)

or

y = (tensor([[1., 1., 1.],
        [1., 1., 1.]]), None)

In the last case (when there is no 0), y would get a 0 in some other batch, so eventually it will fail again.
One thing I noticed, though, is that the 0 appears only in the second row; the first row of the tensor is always 1.
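
The failure mode itself reproduces in isolation; a standalone sketch (independent of the PR's code) showing that a single 0 in the scored tensor trips LogNormal's support check:

import torch
from torch import distributions

# LogNormal's support is GreaterThan(0.0), so one 0 in the value tensor
# raises the same ValueError as in the CI log above.
d = distributions.LogNormal(loc=torch.zeros(2, 3), scale=torch.ones(2, 3))
value = torch.tensor([[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]])
d.log_prob(value)  # ValueError: Expected value argument ... to be within the support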

@phoeenniixx
Contributor Author

Then maybe the issue was different float types, as you said on Discord, but as we know (and I checked) dwc["target"] is numpy.float64 on both Ubuntu and Windows

To see if the float type changes after the transformation to tensors, I first need to find the place where this transformation is done; there I can add an assert and check the float types. But I will need some time, as where exactly this transformation happens and how the data moves across TimeSeriesDataset is still a mystery to me in many ways :)
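
A minimal sketch of such an assert, with a stand-in for the data (the real dwc["target"] comes from data_with_covariates(); the conversion site still has to be located, as said above):

import numpy as np
import pandas as pd
import torch

# Stand-in for dwc["target"], which was checked to be numpy.float64 above.
target = pd.Series(np.array([1.0, 0.0010, 2.5], dtype=np.float64))

t = torch.as_tensor(target.to_numpy())  # torch.as_tensor preserves the numpy dtype
assert t.dtype == torch.float64, f"dtype changed during conversion: {t.dtype}"
assert (t > 0).all(), "non-positive values appeared during conversion"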

@fkiraly
Collaborator

fkiraly commented Jul 7, 2025

To quickly summarize our discussion in the tech meet today: the LogNormal issue looks like a cursed failure that might be hard to diagnose (and possibly even a problem in torch itself).

It also shows that the test framework works well! It seems to be a genuine issue rather than something to do with the framework - and the framework is intended to increase coverage so that exactly these combination cases get picked up.

The approach I would recommend is to park detected failures that are not immediately clear how to fix, and move them to issues - after all, the failures were there before, just not detected. Add some skipping mechanism to skip precisely the failure, open an issue with a description, and then we can deal with it later - and turn the test back on in the same PR as the fix (this pointer should be in the issue). A sketch of such a skip is below.
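
A minimal sketch of such a skipping mechanism; the registry, its contents, and the issue link are illustrative placeholders, not the PR's actual code.

import sys
import pytest

# Failures parked as issues, keyed by (model, loss); each entry is removed
# again in the same PR that fixes it, as described above.
KNOWN_FAILURES = {
    ("DecoderMLP", "LogNormalDistributionLoss"): "<link to tracking issue>",
}

def maybe_skip(model_name, loss_name):
    issue = KNOWN_FAILURES.get((model_name, loss_name))
    if issue is not None and sys.platform == "win32":
        pytest.skip(f"known failure on Windows, tracked in {issue}")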

This approach is a corollary of "test-driven development", where you start with tests that fail and then gradually change your code to make the tests pass - in separate PRs, step-wise. If you write new code, all tests fail initially; but here we are improving coverage and consistency of an existing code base, so only some tests will fail.
