Description
Avoid repeated heap allocations by reusing thread-local buffers
Background
Currently, with every request processed by OtlpEncoder, we create fresh Vec buffers in hot paths such as:
- determine_fields (for building up the schema field list)
- write_row_data (for serializing per-log record rows)
- temporary vectors in batch assembly
Since the encoder can be invoked at high frequency and potentially from multiple threads (e.g., via a multi-threaded runtime), this results in repeated heap allocations per request, putting additional pressure on the allocator and causing unnecessary CPU and memory churn.
Example Hot Code Path
fn determine_fields(&self, log: &LogRecord) -> Vec<FieldDef> {
    let estimated_capacity = 7 + 4 + log.attributes.len();
    let mut fields = Vec::with_capacity(estimated_capacity); // <-- Allocates every call
    ...
}
Similarly, in write_row_data:
fn write_row_data(&self, log: &LogRecord, sorted_fields: &[FieldDef]) -> Vec<u8> {
    let mut buffer = Vec::with_capacity(sorted_fields.len() * 50); // <-- Allocates every call
    ...
}
Proposal
Introduce thread-local buffers for encoding temporary vectors.
Use Rust's std::thread_local! together with std::cell::RefCell to maintain per-thread Vec<FieldDef> and Vec<u8> buffers.
On each encode call, clear and reuse the thread-local buffer, instead of allocating a new one.
This approach is safe and effective, since each request is independent, and sharing across threads isn't needed.
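
The reuse step relies on Vec::clear dropping the elements while keeping the allocation. A minimal sketch of that mechanism follows, assuming a hypothetical `with_row_buf` helper and `write_and_inspect` function (neither exists in the encoder today) and the 4096-byte initial capacity suggested later in this issue:

```rust
use std::cell::RefCell;

thread_local! {
    // One reusable byte buffer per thread; the initial capacity is a guess.
    static ROW_BUF: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(4096));
}

/// Hypothetical helper: hands the closure an empty buffer that still owns
/// the allocation left over from the previous call on this thread.
fn with_row_buf<R>(f: impl FnOnce(&mut Vec<u8>) -> R) -> R {
    ROW_BUF.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.clear(); // length becomes 0, capacity is unchanged
        f(&mut *buf)
    })
}

/// Writes `payload` into the reused buffer and returns the buffer's data
/// pointer and capacity, which makes the reuse observable from outside.
fn write_and_inspect(payload: &[u8]) -> (usize, usize) {
    with_row_buf(|buf| {
        buf.extend_from_slice(payload);
        (buf.as_ptr() as usize, buf.capacity())
    })
}
```

Two consecutive calls on the same thread with payloads that fit in the existing capacity should report the same pointer and capacity, meaning the second call did not touch the allocator.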
Why thread-local?
Each thread maintains its own reusable buffer, so no synchronization is required.
Reusing a buffer's capacity removes the per-request allocate/free pair for these vectors, which should reduce allocator overhead for high-throughput encoding.
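
To illustrate the per-thread isolation, here is a small sketch in which two threads write to the same `thread_local!` name without any lock or atomic. The `push_byte` and `lengths_per_thread` functions are hypothetical names used only for this illustration:

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Each thread lazily gets its own, initially empty, instance.
    static ROW_BUF: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

/// Appends one byte to the calling thread's buffer and returns its new length.
fn push_byte(b: u8) -> usize {
    ROW_BUF.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.push(b);
        buf.len()
    })
}

/// Pushes twice on the current thread and once on a freshly spawned thread,
/// returning (current thread's length, spawned thread's length).
fn lengths_per_thread() -> (usize, usize) {
    push_byte(1);
    let here = push_byte(2);
    let there = thread::spawn(|| push_byte(3))
        .join()
        .expect("worker thread panicked");
    (here, there)
}
```

The spawned thread sees a length of 1 because its buffer starts out empty, regardless of what the calling thread has already written to its own buffer.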
Suggested Implementation
Add a thread-local pool for each kind of temporary vector used in encoding (e.g., fields, row data).
Refactor code to obtain the buffer from the thread-local storage, clear it, and use it during the encode call.
For example:
use std::cell::RefCell;

thread_local! {
    static FIELD_DEFS_BUF: RefCell<Vec<FieldDef>> = RefCell::new(Vec::with_capacity(32));
    static ROW_BUF: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(4096));
}
Usage example:
FIELD_DEFS_BUF.with(|buf| {
    let mut fields = buf.borrow_mut();
    fields.clear();
    // use fields as mutable Vec<FieldDef> instead of allocating a new one
});
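
One consequence of this pattern is that the borrowed buffer cannot escape the `with` closure, so a function like determine_fields could no longer return an owned Vec<FieldDef> without copying. One possible shape is sketched below under stated assumptions: `FieldDef` and `LogRecord` are simplified stand-ins for the real types, the "timestamp" field is a placeholder, and `with_fields` is a hypothetical name that lends the populated slice to a caller-supplied closure.

```rust
use std::cell::RefCell;

// Simplified stand-ins for the encoder's real types (assumptions for illustration).
#[derive(Debug, Clone, PartialEq)]
struct FieldDef {
    name: String,
}

struct LogRecord {
    attributes: Vec<(String, String)>,
}

thread_local! {
    static FIELD_DEFS_BUF: RefCell<Vec<FieldDef>> = RefCell::new(Vec::with_capacity(32));
}

/// Hypothetical replacement for a `determine_fields` that returns a fresh Vec:
/// fills the thread-local buffer, sorts it by name, and lends it to `f` as a slice.
fn with_fields<R>(log: &LogRecord, f: impl FnOnce(&[FieldDef]) -> R) -> R {
    FIELD_DEFS_BUF.with(|cell| {
        let mut fields = cell.borrow_mut();
        fields.clear();
        // Placeholder fixed field; the real encoder has its own set.
        fields.push(FieldDef { name: "timestamp".to_string() });
        for (key, _value) in &log.attributes {
            fields.push(FieldDef { name: key.clone() });
        }
        fields.sort_by(|a, b| a.name.cmp(&b.name));
        f(&fields[..])
    })
}
```

Two caveats apply to this sketch. The stand-in `FieldDef` owns a `String`, so names are still allocated per call and only the Vec's backing storage is reused. Also, calling `with_fields` again from inside `f` would panic, because the `RefCell` is still mutably borrowed at that point.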
Benefits
- Reduces per-request heap allocations by reusing buffers across requests within a thread.
- Lowers CPU usage and allocator contention under load.
- Zero risk of cross-thread races (buffers are per-thread).
- No external crate required; can use Rust std only.
Next Steps
- Audit code for all per-request allocations of Vec<FieldDef>, Vec<u8>, and other large temporaries.
- Refactor these to use thread-local buffers as described.
- Add a benchmark to demonstrate reduction in allocation count and performance win.
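
For the allocation-count half of that benchmark, one std-only option is a counting wrapper around the system allocator. The sketch below is an assumption about how this could be measured, not existing project code: `CountingAlloc`, `count_allocs`, `fresh_each_time`, and `reuse_thread_local` are all hypothetical names, and the loops stand in for the real encode paths.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::RefCell;
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide count of `alloc` calls made through the global allocator.
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

// Forwards to the system allocator, counting each allocation request.
unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

thread_local! {
    static ROW_BUF: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(4096));
}

/// Allocation calls observed while `f` runs (includes other threads' activity).
fn count_allocs(f: impl FnOnce()) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

/// Baseline: a fresh Vec per iteration, as in the current hot path.
fn fresh_each_time(iters: usize) {
    for i in 0..iters {
        let mut buf: Vec<u8> = Vec::with_capacity(4096);
        buf.push(i as u8);
        black_box(&buf); // keep the optimizer from removing the allocation
    }
}

/// Proposal: clear and reuse the thread-local buffer on every iteration.
fn reuse_thread_local(iters: usize) {
    ROW_BUF.with(|cell| {
        for i in 0..iters {
            let mut buf = cell.borrow_mut();
            buf.clear();
            buf.push(i as u8);
            black_box(&*buf);
        }
    });
}
```

Comparing `count_allocs(|| fresh_each_time(n))` with `count_allocs(|| reuse_thread_local(n))` should show roughly one allocation per iteration for the baseline and close to none for the reuse path. The counter is process-wide, so the comparison is cleanest when nothing else is running, and a separate timing harness would still be needed to demonstrate the performance win.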