
proposal: sync: add M-local storage #73667

@valyala

Proposal Details

The issue

Sometimes one needs to develop concurrent code that must scale with the number of available CPU cores. I face such tasks quite frequently when working on VictoriaMetrics. A few examples:

  • VictoriaMetrics needs to accept incoming samples from many concurrent client connections in the most efficient way. Every connection is processed by a dedicated goroutine. These goroutines read the ingested samples and put them into an in-memory buffer, which is periodically converted to a searchable data block and saved to disk. Initially the buffer was implemented as a slice of samples protected by a mutex. This hit scalability issues on systems with many CPU cores, since goroutines spent a lot of time blocking on the mutex instead of doing useful work. So the buffer was later rewritten into sharded buffers, where every shard is protected by its own mutex. Samples are added to these shards in a round-robin, hash-based, or random manner in order to reduce mutex contention. This helped improve the scalability of the data ingestion path on systems with many CPU cores, but it still suffers from mutex contention :( The contention can be reduced further by increasing the number of shards, but that increases memory usage.
  • VictoriaLogs needs to accept incoming logs. It buffers them in memory and then periodically converts the buffered logs into searchable blocks. It faces the same scalability issues on systems with many CPU cores as described above.
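As an illustration of the sharded-buffer workaround described above, a minimal sketch might look like the following. This is not the actual VictoriaMetrics code; the sample type, shard count, and padding size are placeholder assumptions, and shard selection uses a simple round-robin counter:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// shardedBuffer reduces mutex contention by spreading writers
// across GOMAXPROCS shards, each guarded by its own mutex.
type shardedBuffer struct {
	shards []bufferShard
	next   atomic.Uint64 // round-robin shard selector
}

type bufferShard struct {
	mu      sync.Mutex
	samples []int    // placeholder sample type
	_       [32]byte // padding to reduce false sharing between adjacent shards
}

func newShardedBuffer() *shardedBuffer {
	return &shardedBuffer{shards: make([]bufferShard, runtime.GOMAXPROCS(0))}
}

// add appends a sample to the next shard in round-robin order.
func (b *shardedBuffer) add(sample int) {
	shard := &b.shards[b.next.Add(1)%uint64(len(b.shards))]
	shard.mu.Lock()
	shard.samples = append(shard.samples, sample)
	shard.mu.Unlock()
}

// drain collects all buffered samples, e.g. before converting
// them to a searchable block and flushing to disk.
func (b *shardedBuffer) drain() []int {
	var all []int
	for i := range b.shards {
		shard := &b.shards[i]
		shard.mu.Lock()
		all = append(all, shard.samples...)
		shard.samples = nil
		shard.mu.Unlock()
	}
	return all
}

func main() {
	b := newShardedBuffer()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				b.add(g*1000 + i)
			}
		}(g)
	}
	wg.Wait()
	fmt.Println(len(b.drain())) // 8000
}
```

Note how this sketch trades memory (one buffer and padding per shard) for reduced contention, which is exactly the trade-off the proposal wants the runtime to manage instead.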

The solution

Provide M-local storage in the sync package with the following API:

// MLocal is a storage for thread-local T items.
//
// It holds at most one item per thread (M).
type MLocal[T any] struct {
    // Init is an optional function for initializing a newly
    // created x for the current thread (M).
    Init func(x *T)
}

// Get returns the T item for the current thread.
//
// If the current thread has no item yet, it is created
// and initialized via the optional Init function.
//
// The caller is responsible for protecting access
// to the returned item from concurrent goroutines.
func (m *MLocal[T]) Get() *T

// All returns all the T items stored in m.
//
// It is expected that All is called when there are no
// other concurrent goroutines accessing m.
func (m *MLocal[T]) All() []*T

It is expected that the implementation allocates T items in a P-local area in order to avoid false sharing between items allocated for different threads.

The All method can be implemented in the simplest way by iterating over the existing items and putting them into a newly allocated slice. This is fine given the documented restriction on this method: it may only be called when there are no concurrent goroutines accessing m.

This API solves the practical tasks described above by scaling linearly with the number of CPU cores.
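Since sync.MLocal does not exist yet, the following is a rough user-space approximation, given only to illustrate the intended API shape. User code cannot pin items to OS threads, so this sketch falls back to a fixed GOMAXPROCS-sized slot array picked round-robin; a real runtime implementation would return the current M's slot instead. All names here are hypothetical:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// mLocal is a user-space approximation of the proposed sync.MLocal[T].
// Unlike the real proposal, Get cannot identify the current thread,
// so it hands out slots round-robin and the caller must still
// serialize access to the returned item.
type mLocal[T any] struct {
	Init  func(x *T)
	slots []*T
	next  atomic.Uint64
}

func newMLocal[T any](init func(x *T)) *mLocal[T] {
	m := &mLocal[T]{
		Init:  init,
		slots: make([]*T, runtime.GOMAXPROCS(0)),
	}
	for i := range m.slots {
		x := new(T)
		if m.Init != nil {
			m.Init(x)
		}
		m.slots[i] = x
	}
	return m
}

// Get returns one of the pre-allocated items.
func (m *mLocal[T]) Get() *T {
	return m.slots[m.next.Add(1)%uint64(len(m.slots))]
}

// All returns a copy of the slice holding every allocated item.
func (m *mLocal[T]) All() []*T {
	return append([]*T(nil), m.slots...)
}

func main() {
	type counter struct{ n int }
	// Init sets a starting value on every newly created item.
	m := newMLocal(func(c *counter) { c.n = 100 })
	m.Get().n++ // mutate one slot
	total := 0
	for _, c := range m.All() {
		total += c.n
	}
	fmt.Println(total == 100*len(m.All())+1) // true
}
```

A runtime-backed implementation would replace the round-robin selector with the current M's identity and allocate each slot in a P-local, cache-line-aligned area, as the proposal describes.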

Alternatives

There are other proposals for P-local storage in Go, but they are more complex and try to solve many different cases with different requirements and restrictions at once. That's why the chances they will be implemented in the near future are very low. Even if they are implemented, the implementation is unlikely to fit different practical use cases, such as this one, ideally.

Labels: Proposal, compiler/runtime