Proposal Details
The issue
It is sometimes necessary to develop concurrent code that must scale with the number of available CPU cores. I face such tasks quite frequently when working on VictoriaMetrics. A few examples:
- VictoriaMetrics needs to accept incoming samples from many concurrent client connections in the most efficient way. Every connection is processed by a dedicated goroutine. These goroutines read the ingested samples and put them into an in-memory buffer, which is periodically converted to a searchable data block and saved to disk. Initially the buffer was implemented as a slice of samples protected by a mutex. This hit scalability issues on systems with many CPU cores, since they spent a lot of time blocking on the mutex instead of doing useful work. So later this buffer was rewritten into sharded buffers, where every shard is protected by its own mutex. The samples are added to these shards in a round-robin, hash-based or random manner in order to reduce mutex contention. This improved the scalability of the data ingestion path on systems with many CPU cores, but it still suffers from mutex contention :( The mutex contention can be reduced further by increasing the number of shards, but this increases memory usage.
- VictoriaLogs needs to accept incoming logs. It buffers them in memory and then periodically converts the buffered logs into searchable blocks. It faces the same scalability issues on systems with many CPU cores as described above.
The solution
Provide M-local storage in the sync package with the following API:
```go
// MLocal is a storage for thread-local T items.
//
// It holds up to a single item per thread (M).
type MLocal[T any] struct {
	// Init is an optional function for initializing a newly created x
	// for the current thread (M).
	Init func(x *T)
}

// Get returns the T item for the current thread.
//
// If the current thread has no item yet, it is created
// and initialized via the optional Init function.
//
// The caller is responsible for protecting access
// to the returned item from concurrent goroutines.
func (m *MLocal[T]) Get() *T

// All returns all the T items at m.
//
// It is expected that All is called when there are no
// other concurrent goroutines accessing m.
func (m *MLocal[T]) All() []*T
```
The implementation is expected to allocate T items in a P-local area in order to avoid false sharing between items allocated for different threads.
The All method can be implemented in the simplest way: iterate over the existing items and put them into a newly allocated slice. This is OK given the documented restriction on this method - it may only be called when there are no concurrent goroutines accessing m.
This API solves the practical tasks described above by providing linear scalability with the number of CPU cores.
Alternatives
There are other proposals for P-local storage in Go, but they are too complex and try to solve many different cases with different requirements and restrictions at once. That's why the chances they will be implemented in the near future are very low. Even if they are implemented, the implementation won't ideally fit different practical use cases such as this one.