WIP Usage tracker #11816


Draft: colega wants to merge 60 commits into main

Conversation

@colega (Contributor) commented Jun 23, 2025

What this PR does

Adds the usage-tracker service. Creating a PR to see the CI checks.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

pracucci and others added 30 commits March 3, 2025 13:22
* Add usage-tracker module

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Add usage-tracker module

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add UsageTracker gRPC service

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Change from int64 to uint64

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Add partition and instance rings support to usage-tracker

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Usage-tracker: track series in the map

This starts tracking the series from the request in the internal map.

No cleanup is implemented yet. Also tests pending.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* License header

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Switch to standard map, extract store logic

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Inject a non-implemented events publisher

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Extract `minutes` type

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Configure idle timeout

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Implement cleanup, refactor, and happy case

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Remove unused testUser2

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Remove unused arg names

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* make reference-help

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* make doc

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update the comment

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* make license

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Added UsageTrackerClient

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed ring op

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Add UsageTrackerClient to distributor

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed compilation issue in distributor unit tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Rebuilt auto-generated doc

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Update pkg/mimir/modules.go

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
* UsageTracker: add methods to create & load snapshot

Still to be wired to Kafka/ObjectStorage.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Move constant

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Extract areInValidSpanToCompareMinutes

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix method usage and check

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Enforce max series limit in the distributor when usage-tracker is enabled (#10129)

* Enforce max series limit in the distributor when usage-tracker is enabled

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixes

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix dependencies

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix bug in UsageTrackerClient

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix check

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added cortex_usage_tracker_client_track_series_duration_seconds

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
When we're tracking series, the timestamp we're setting is `now`
truncated to minutes. It's VERY unlikely that some other goroutine has
already updated this timestamp to something newer.

Maybe a newer timestamp was set from a snapshot, in which case we should
overwrite it anyway, as we clean up based on *our* wall clock.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Don't preallocate MaxUint64/128 if the tenant has no limit set.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* UsageTracker: process created series events

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix and test

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Apply suggestions from code review

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Init usage-tracker Kafka client

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Move idle-timeout validation to Config.Validate()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Allow to configure usage-tracker zone

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix local dev env

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* UsageTracker: store.cleanup() test & refactor

The whole markForDeletion approach didn't work well, so I refactored it to
track the number of ongoing requests and rely on that count being 0 to
perform deletions.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Remove commented out test

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update test

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Make test shorter

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
#10144)

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* UsageTracker: publish and consume SeriesCreated events

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Implement batching and concurrent event publishing

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix bug and start at millis

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* More generous default for max pending events

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
One of the assumptions of trackerStore is that cleanup() is called at
least once in each idleTimeout: this is needed to make sure that the
"minutes" timestamp doesn't get so old that it starts looking like a
newer one.

We didn't ensure this in the test, so weird stuff was happening.

Now we increase the timestamp by one minute each time and set the idle
timeout to 1 minute, so that we force the race conditions.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* UsageTracker: register store as prometheus collector

This will export the number of active series tracked for each tenant.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Consistent metric name

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add license header

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* UsageTracker: decouple tests from implementation

This decouples the trackerStore tests from the specific implementation by
using a new seriesCounts() method and by decoding the snapshots rather
than accessing the data maps directly.

This will allow us to switch the implementation while keeping the same
tests.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add BenchmarkTrackerStoreTrackSeries

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This is still good enough for our use case (we don't need to touch the
1h idle timeout limit) but this means that we only need 7 bits to store
the timestamp.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
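
A minimal sketch of the idea, assuming a hypothetical `minutes` type and helper names purely for illustration (not the actual clock code): a wall-clock minute reduced modulo 128 fits in 7 bits, and comparisons stay meaningful as long as the two values are less than half the window apart.

```go
package clocksketch

import "time"

// minutes is a wall-clock timestamp truncated to the minute and reduced
// modulo 128, so it fits in 7 bits. Two values can only be compared
// reliably while they are less than half the window (64 minutes) apart,
// which is why the idle timeout has to stay well below that limit.
type minutes uint8

const window = 128 // 2^7 distinct values

// toMinutes converts a wall-clock time to its 7-bit representation.
func toMinutes(t time.Time) minutes {
	return minutes((t.Unix() / 60) % window)
}

// newerThan reports whether a is newer than b, assuming both were produced
// less than window/2 minutes apart (wrap-around safe under that assumption).
func (a minutes) newerThan(b minutes) bool {
	diff := (uint8(a) - uint8(b)) % window
	return diff != 0 && diff < window/2
}
```
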
* UsageTracker: extract clock.Minutes

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* make license

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix uint maths

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix comment and remove unused sub method

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* UsageTracker: use special swiss map & channels

This refactors the entire tracker.

There are two big changes here: how the map is implemented and how we
interact with it.

How the map is implemented:

We're getting rid of the usual maps storing atomic.Int32: it's overkill
to use a 32-bit int to store a byte, it's even more overkill to store a
64-bit pointer to it, and let's not even talk about the map overhead.

Instead, we're building a kind of swiss map. I based the implementation
on the code from dolthub/swiss; you can read a blog post about how that
works here:
https://www.dolthub.com/blog/2023-03-28-swiss-map/

However, most of the code was replaced.

What did I do? A swiss map has a slice of groups of hash suffixes, each one
byte, used to quickly find the bucket where our element is, and it also has
a slice of groups of keys. That's still a lot of overhead for our use case.

What are our restrictions:
- Our keys are uint64, so great, we don't need to hash: our keys are hashes already!
- All our keys in a single shard share the same 7-bit suffix (because
  we have 128 shards).

So, I changed the implementation to "index" based on the prefix, not the
suffix of the key (series hash), and since the suffix is always the
same, we could use that space for something else... Guess what? Our
values are also 7 bits, so we're storing the key AND the value in a
single uint64 field. Now each series is just uint64+uint8 in the map
implementation (plus the overallocation of the slice, of course).

I didn't just store the value in the suffix of the uint64 data, I also
negated it, so we can figure out whether a value is a real value or an
empty one without having to check for a tombstone in the index! Cleanup
is now just iterating over a single slice, isn't that awesome? (A hedged
sketch of this packing appears after this commit message.)

Well, but how do we coordinate all of that? We can't do atomic byte
operations (can we? I don't know), but we probably don't want to anyway.
I guessed (still to be verified) that a lot of the CPU usage comes from
having to do everything atomically.

I heard that Go is good at handling hundreds of goroutines (and I've
managed tens of thousands of them in the past, although not in a
high-performance environment), so I decided to just run one goroutine per
map shard.

Each map shard has its own worker which takes care of updating the
internal state: the synchronization overhead is now in pushing the data
to the workers through a channel. But once the data reaches the worker, a
single shard is always updated by the same goroutine.

Say we have 10k tenants in an env, that's 10K goroutines, not that
bad... I hope. We'll see.

I also considered lowering the number of shards to 64, but in order to
achieve that we would need to restrict the idleTimeout to 30 minutes, so
we can fit the minutes timestamp in a 6-bit value. I think we can do
that.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Shutdown tenants in the benchmark

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Cleanup in tests too

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update comments

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* make lint: remove unused

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix tenantshard.TestMap

* go fmt

* Redefine keyMask constant

* Make safer usage of channels and slices

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Make events topic configurable and change it

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* make doc reference-help

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Refactor events

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
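
A minimal sketch of the packing described above, with assumed names, masks, and bit layout; it illustrates the technique, not the actual tenantshard map:

```go
package tenantshardsketch

const (
	shards    = 128                // 2^7 shards
	valueMask = uint64(shards - 1) // low 7 bits, identical for every key within a shard
	keyMask   = ^valueMask         // high 57 bits carry the distinguishing part of the key
)

// pack overwrites the shared low 7 bits of the key (they only encode the
// shard, which is known from context) with the bitwise-inverted 7-bit value.
// Inverting the value lets an all-zero word stand for an empty slot, under
// the assumption that a live entry never packs to exactly zero.
func pack(key uint64, value uint8) uint64 {
	return (key & keyMask) | (valueMask &^ uint64(value))
}

// unpack recovers the key prefix and the 7-bit value from a packed entry.
func unpack(entry uint64) (keyPrefix uint64, value uint8) {
	return entry & keyMask, uint8(valueMask &^ entry)
}
```

With a layout like this, cleanup can walk the backing slice directly and treat a zero word as an empty slot, which matches the "iterate a single slice" idea above.
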
Just making sure everything is right if we end up productionizing this
and someone needs to understand the code.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
colega and others added 25 commits March 3, 2025 17:51
If a lot of requests arrive at the same time, we're going to
tentatively allocate too many overly large maps.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Usage-tracker-load-generator: pre-compute series hashes and fix random generator (#10187)

* Pre-compute series hashes in usage-tracker-load-generator

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Usage-tracker-load-generator: pre-compute series hashes and fix random generator

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* UsageTracker: replace goroutines with mutexes

The new map implementation is great, but the channel interaction seems to
be too expensive. This replaces the goroutines & channels with a simple
sync.Mutex (a minimal sketch of the approach follows this commit message).

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Lint and remove extra return

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
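
A minimal sketch of the mutex-per-shard approach, using a plain Go map and hypothetical names purely for illustration (the real store uses the custom packed map described earlier):

```go
package storesketch

import "sync"

// shard guards its series map with a plain mutex instead of funnelling
// updates through a dedicated worker goroutine and channel.
type shard struct {
	mtx     sync.Mutex
	entries map[uint64]uint8 // series hash -> minutes timestamp (simplified)
}

func newShard() *shard {
	return &shard{entries: make(map[uint64]uint8)}
}

// trackSeries records the given series hashes as seen at the given minute.
func (s *shard) trackSeries(hashes []uint64, now uint8) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	for _, h := range hashes {
		s.entries[h] = now
	}
}
```
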
This will use more memory, but also probably less CPU as the buckets
will be less full.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
UsageTracker will be able to run multiple partitions, handling each
partition's lifecycle separately.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Since one usage-tracker is going to serve multiple partitions, we should
be explicit about the partition we want to talk to.

I think that ideally we should group all requests to the same instance in
a single proto request, but I don't want to spend time on that now.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This reverts commit ffeaef831f8b1901c97238bcd1aad2f98983372c.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
When scaling up, we'll lose partitions gracefully.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This should enable graceful scale down by signalling other instances
that they should take over our partitions.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
PartitionInstanceLifecycler and Kafka metrics aren't properly
unregistered when we stop a partition, so we currently can't create the
same partition twice.

I tried cleaning up the instance lifecycler metrics, but the Kafka client
is out of my control, so I ended up exposing a WrapCollectorWith method in
prometheus/client_golang and leveraging that.

prometheus/client_golang@main...colega:prometheus-client_golang:wrap-collector

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Add partition assignment tests to usage tracker (#10973)

* Add partition assignment tests to usage tracker

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix waitgroup adding race

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix method name typo

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Try to make a test more robust by performing a controlled number of
reconciliations.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* s/ownds/owns/

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Ensure RUnlock() in tests

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix comment

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

github-actions bot commented Jun 23, 2025

colega added 4 commits June 27, 2025 14:40
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>