Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

twister: testplan: toolchain_exclude filter is overridden by integration_platforms #49696

Closed
stephanosio opened this issue Aug 30, 2022 · 4 comments
Assignees
Labels
area: Twister Twister bug The issue is a bug, or the PR is fixing a bug

Comments

@stephanosio
Copy link
Member

stephanosio commented Aug 30, 2022

Describe the bug

When both integration_platforms and toolchain_exclude are specified for a test, the twister testplan logic fails to apply the toolchain_exclude filter.

For example (8e8b568):

  logging.add.log_user:
    tags: logging
    filter: CONFIG_USERSPACE
    extra_args: CONF_FILE=log_user.conf USERSPACE_TEST=1
    integration_platforms:
      - qemu_x86
    # FIXME: log_user test fails when compiled with the GCC 12 from Zephyr SDK.
    #        (see the GitHub issue zephyrproject-rtos/zephyr#49213)
    toolchain_exclude: zephyr

The status of the above test for qemu_x86 is set to error (In test case toolchain exclude but is one of the integration platforms) instead of filtered -- this results in the CI attempting to run this test when it should not.

To Reproduce

At 8e8b568, run the following:

scripts/twister --integration --save-tests test.json

test.json contains the following:

        {
            "name":"tests/subsys/logging/log_core_additional/logging.add.log_user",
            "arch":"x86",
            "platform":"qemu_x86",
            "run_id":"d463ea458733a7bf02d91c8f39cc0cb8",
            "runnable":true,
            "status":"error",
            "reason":"In test case toolchain exclude but is one of the integration platforms",
            ...

Expected behavior

status for tests/subsys/logging/log_core_additional/logging.add.log_user on qemu_x86 is set to filtered.

Impact

CI attempts to execute these tests because they are not "filtered."

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Toolchain: Zephyr SDK 0.15.0
  • Commit SHA: 8e8b568

Additional context

Related discussion on Discord: https://discord.com/channels/720317445772017664/733037944922964069/1014173813795209216

@stephanosio stephanosio added bug The issue is a bug, or the PR is fixing a bug area: Twister Twister labels Aug 30, 2022
@stephanosio
Copy link
Member Author

cc @nashif @PerMac

@stephanosio
Copy link
Member Author

Feel free to close this if you do not think this is a bug.

I still do not understand why toolchain filter needs to be mutually exclusive with platform filters (by that, I mean integration_platforms:).

@PerMac
Copy link
Member

PerMac commented Sep 2, 2022

TL;DR It's a feature, not a bug. IMO it is the only way to have a reliable scope in the CI.

I find it not a bug since this is the desired behavior and I deliberately made it work like this (skips on integration platforms getting changed to errors in the CI run). The background:
There are > 400 platforms in zephyr and > 1000 tests. This gives an enormous combination of all possible configurations to test. However, not all combinations are valid. Therefore, twister has a filtering logic to decide which configuration is viable to test and which should be filtered out. The viable scope is still huge and unfeasible to be used for running regression tests for PRs. Therefore, there are additional rules for selecting the scope: some platforms are marked in their yamls with testing: default: true https://github.com/zephyrproject-rtos/zephyr/blob/main/boards/arm/frdm_k64f/frdm_k64f.yaml#L29 It appeared that using all of those platforms is still not feasible. Moreover, this list was not updated for a while, and many platforms only where a certain feature can be tested are not on the "default" list.

We realized we need to be able to set the platform scope for the CI on the test level. I.e. the test (and/or the CI) maintainers should be able to decide on which platforms the test has to run in the CI. This led us to the introduction of the integration_platforms and the --integration/-G mode. Their definition: integration_platforms set the minimal scope for the on-PR CI (i.e. when --integration/-G is used). This translates into two statements:

  • set the scope for the CI
  • make sure builds on such platforms are not skipped in the CI (stresses on "minimal").

This issue questions the second statement, so I will elaborate a bit more on it. There are dozens of filters applied by twister to decide if a test can run on a given platform https://github.com/zephyrproject-rtos/zephyr/blob/main/scripts/pylib/twister/twisterlib/testplan.py#L553 Not applicable tests end as being skipped. With so many filters (with some of them being very obscured) there is a high risk for a test to switch from passed to skipped (e.g. a typo in depends_on entry). Skipped tests are not monitored in any way. Only errors/failures are marked as red and highlighted in the reports.
Summing the above: it is fairly easy to end up with a test that was executed and passed one day on a given platform in the CI and being completely skipped the next day. Then a change in a code is made, introducing a bug, but it is not detected since applicable tests were skipped. After half a year a maintainer is unpleasantly surprised by a report from a client/user that a given feature is broken. The surprise is big since the maintainer saw the CI passing on all of the changes for the last half a year. And the above is not just my imagination, but the real stories. I had to figure out several times "why the hell do we have a green CI if a component is broken". This is why I opted for introducing "skip->error" on integration platforms if the integration mode is on.

In the reported issue. If a certain scenario is deliberately skipped with "zephyr" toolchain (the one used in the CI) why would we want to deliberately ask the CI to execute it (vide definition of integration_platforms)? These are two mutually excluding options and hence the error. So in this case, if we know that this scenario cannot be executed in the CI, we can:

  • remove integration_platforms from the scenario (but then we can forgot about putting it back after fixing the issue)
  • add the failing scenario to the quarantine to mark the temporarliness of the solution (and start using this feature in the CI). The scenario will end up as skipped, but won't cause an error. Instead, It will get a special status in the report and remind us that it still needs a fix. https://docs.zephyrproject.org/latest/develop/test/twister.html#quarantine

@nashif
Copy link
Member

nashif commented Sep 6, 2022

not a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Twister Twister bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

No branches or pull requests

3 participants