Skip to content

provide unique seed to each test  #600

@brycedrennan

Description

@brycedrennan
Contributor

Description

Per #531 its unexpected that randomly generated values in different tests would be identical. It's common in test suites to use randomly generated identifiers to help maintain test isolation and provide easier-to-understand logs.

Examples:

  • one may be using a third party system that cannot be reset but where isolation can be provided by using new unique identifiers for each test.
  • one may be using unique s3 buckets per test (and want them to persist for review afterwords)
  • one may be reading test suite logs and want to find logs with a specific identifier related to a specific test failure

One could use --randomly-dont-reset-seed but then they would lose the reproducibility and repeatability.

I propose that the node_id of the test (and the test run seed) be used to deterministically generate a unique seed for that test (instead of all tests being reset to the same fixed value).

Activity

adamchainz

adamchainz commented on Jan 18, 2024

@adamchainz
Member

Using a random number per test does not guarantee isolation in remote systems. The birthday problem makes the chance of collision between many tests high, at least for 32-bit identifiers.

Why not use shared counter between tests to absolutely guarantee unique numbers?

I am not exactly against the proposal though. It might make sense for the reason that unique random seeds will make a test suite use more random data, increasing the amount of fuzz testing.

brycedrennan

brycedrennan commented on Feb 23, 2024

@brycedrennan
ContributorAuthor

Actual scenario I encountered:
Each test run spins up a new postgres database and populates it with the correct schema. There are about ten tests that need "users" in the database to work. The test fixture generates a random email address from random-first-name.random-last-name.random-number(1-100000).

Even accounting for the birthday problem, the chances of collision are very small.

If the email addresses are unique then understanding the logs is a lot easier. You just look for the email address that relates to the test. You could even keep the database around and look at records related to that account. But if the "random" email addresses are the same in every test, that's unexpected and makes debugging a lot harder.

bcm-at-zama

bcm-at-zama commented on Mar 18, 2024

@bcm-at-zama

I think I have the same request than @brycedrennan, but if not, I could make another issue, tell me: I would like to be able to call with --randomly-seed=1976373286, but to have that the seed used for all test test_xxx_i of all files path_j / file_j depend on the

  • randomly seed
  • test_xxx_i string
  • path_j / file_j

I think it allows to have tests with similar code, but to make sure that the real seeds that will be used in them are different, for more test coverage.

It's basically what I had done in our https://github.com/zama-ai/concrete-ml/blob/main/conftest.py#L171 autoseeding_of_everything function, before I realize that somehow, randomly overwrites my seeding just after, in its

def pytest_runtest_call(item: Item) -> None:
    if item.config.getoption("randomly_reset_seed"):
        _reseed(item.config)

I would be happy to participate or do a PR for that, I'll just need a bit of guidance to make the right implementation choice.

bcm-at-zama

bcm-at-zama commented on Mar 20, 2024

@bcm-at-zama

OK so actually, what I did with autoseeding_of_everything with --randomly-dont-reset-seed looks to make the job! We can:

  • seed everything
  • all seeds used in practice depends on the filename / the function name / the parameters, such that, even if you reuse the same kind of test body for different usages, the PRNG used in practice will be different in them

If requested, I'll be happy to try to add it to the project!

adamchainz

adamchainz commented on Mar 20, 2024

@adamchainz
Member

@bcm-at-zama That sounds like about the same request here. Rather than combine the filename and test name like you’ve done in your fixture, we should instead use the pytest “test ID”, which includes the path already, plus extra details like test parameters IIRC.

I agree with @brycedrennan that we can actually ignore the birthday problem. I still don't like the idea of trying to rely on differing random seeds to avoid collisions. But it does sound like a good idea to increase test coverage.

@bcm-at-zama , would you like to try creating a PR? Please include tests, a changelog note, and any relevant updates to the README.

brycedrennan

brycedrennan commented on Mar 21, 2024

@brycedrennan
ContributorAuthor

@adamchainz yeah its a good point. a global counter in a fixture (possibly combined with some more random data) would be a more thorough solution. On the other hand, sometimes tests are distributed among machines and a global counter would actually be a bit tricky to implement so it's nice to have options.

bcm-at-zama

bcm-at-zama commented on Mar 21, 2024

@bcm-at-zama

Great @adamchainz : I don't know about test ID but I'll have a look. I'll try to make this PR in April!

Edit: Shame on me, I never did a PR here and I'll certainly not do it, I moved to another position and I have not the time for this, sorry.

slafs

slafs commented on Apr 30, 2025

@slafs

I just came here to say this feature would be highly appreciated 😄

linked a pull request that will close this issueUse a different random seed per test #687on Sep 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @slafs@adamchainz@brycedrennan@bcm-at-zama

      Issue actions

        provide unique seed to each test · Issue #600 · pytest-dev/pytest-randomly