TestUserMetrics is flaky #13420

@kradalby

Description

We have several different metrics libraries in the clients that all use some variant of global state (expvar, clientmetrics, and soon usermetrics (expvar), #13309). This works fine when there is a single tailscale client or tsnet instance, but all of these metrics become unpredictable and sometimes wrong if multiple tailscale instances are run, for example if an application has multiple tsnet instances or tests are run in parallel (or in sequence, as the metrics are not cleared between tests).

@knyar gathered a list of the behaviour we are currently seeing:
If I understand things correctly, when we do this, such “global” metrics become unusable/inaccurate in a few different ways:

  • gauge metrics get overwritten by whichever instance set them last.
  • counter metrics remain somewhat “correct” and track the cumulative count across all instances, without giving us the ability to distinguish between them.
  • on the clientmetric collection side, each tsnet app reports the same “global” values to logz independently, so we end up tracking the same inaccurate set of metrics multiple times.
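
The first two behaviours can be reproduced with a minimal sketch using only the standard `expvar` package. The `instance` type, the metric variables, and the `simulate` helper are hypothetical names made up for illustration; this is not the actual clientmetrics or usermetrics code, just an assumed stand-in for two clients sharing process-wide metrics.

```go
package main

import (
	"expvar"
	"fmt"
)

// Process-wide metrics shared by every instance (hypothetical names).
// new(expvar.Int) is used instead of expvar.NewInt so nothing is
// published under a global name in this sketch.
var (
	globalPeers   = new(expvar.Int) // used as a gauge
	globalPackets = new(expvar.Int) // used as a counter
)

// instance stands in for one tailscale client or tsnet server
// (hypothetical type). It also keeps its own copy of each metric
// to show what per-instance state would report.
type instance struct {
	peers   expvar.Int
	packets expvar.Int
}

// report records the same observation in both the global and the
// per-instance metrics.
func (i *instance) report(peers, packets int64) {
	globalPeers.Set(peers)     // gauge: last writer wins
	globalPackets.Add(packets) // counter: sums over all instances
	i.peers.Set(peers)
	i.packets.Add(packets)
}

// simulate runs two instances in one process and returns the global
// gauge, the global counter, and each instance's own gauge.
func simulate() (gauge, counter, aPeers, bPeers int64) {
	// Reset so the sketch is repeatable. Code that skips a reset like
	// this sees values left over from the previous run, which is the
	// "in sequence" case described above.
	globalPeers.Set(0)
	globalPackets.Set(0)

	a, b := &instance{}, &instance{}
	a.report(3, 10)
	b.report(7, 5)
	return globalPeers.Value(), globalPackets.Value(), a.peers.Value(), b.peers.Value()
}

func main() {
	gauge, counter, aPeers, bPeers := simulate()
	fmt.Println("global gauge:", gauge)     // only b's value survives
	fmt.Println("global counter:", counter) // a and b summed together
	fmt.Println("per-instance gauges:", aPeers, bPeers)
}
```

In this sketch the global gauge ends up holding only the value from the second instance, while the global counter holds the sum of both. The per-instance fields keep the two values apart, which is one possible direction for a fix rather than a description of what the project does.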

Metadata

Labels

flaky-test, testing (Issues around tests in the codebase and the QA process)
