Skip to content

Conversation

@a10y
Copy link
Contributor

@a10y a10y commented Feb 20, 2025

The PrimitiveBuilder currently exposes its internal components as pub. This puts the onus of handling the state of the null buffer and values buffers on the client.

The PrimitiveBuilder is built around append-only access to values and buffers. But for several array variants, random access is better suited. For example, unpacking FastLanes vectors, or scattering the patches of a SpareArray.

This PR removes values and nulls from the public API for the PrimitiveBuilder, and replaces it with an uninit_range(len: usize) method. This returns a mutable handle that provides scoped access to the nulls and the values. For simplicity, it impls Deref/DerefMut to &[MaybeUninit<T>] to make random access of values easy and also gives direct access to nice std::slice methods like fill or copy_from_slice.

The returned handle has a finish(self) method which should be called after value initialization, and will advance the internal buffer length to finish a chunk.

As an example of usage:

    let mut builder: PrimitiveBuilder<i32> = PrimitiveBuilder::with_capacity(Nullability::NonNullable, 5);
    
	// Get a range. In this case it's the full range but it can be a partial range.
    let mut range = builder.uninit_range(5);

	// Populate the uninitialized range in reverse order.
    // Even indices will store values, odd indices will be NULLs.
    for i in [4, 3, 2, 1, 0] {
        range[i] = MaybeUninit::new(i as i32);
    }
    range.finish();

You can also copy into the output from an already initialized set of values:

range.copy_from_init(0, 3, &[1,2,3]);

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 20, 2025

CodSpeed Performance Report

Merging #2443 will not alter performance

Comparing aduffy/scoped-builder (e33e879) with develop (93ba60a)

Summary

✅ 765 untouched benchmarks

builder
.values
.extend_from_slice(transmute(&decoded[offset..]));
uninit.copy_from_init(out_idx, 1024 - offset, transmute(&decoded[offset..]));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uninit isn't stateful, the whole point is that you have access to the whole range while the uninit is live. The downside is we need to keep track of the out_idx ourself

@a10y a10y force-pushed the aduffy/scoped-builder branch from d085e22 to 3fc68d8 Compare February 20, 2025 17:22
&mut transmute_mut(builder.values.deref_mut())
[my_offset_in_builder + i * 1024 - offset..][..1024],
);
// SAFETY: &[T] and &[MaybeUninit<T>] have the same layout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also we promise not to read the values right?

@a10y
Copy link
Contributor Author

a10y commented Feb 20, 2025

Documenting some discussion from Slack:

  • Why not have have finish() get called on Drop for the UninitRange?
    • In the case an error is thrown, we don't want to auto-finish, because it's possible that you didn't complete initializing the values, so that memory should not be treated as having completed initialization
  • Do we ever want to allow the user to set_len directly instead of calling finish()?
    • For all canonical Primitive types, the range is always going to be fixed, since all values (including nulls) occupy one T-sized slot in the output buffer. Having finish consume the handle and update the builder len is a safer and more ergonomic way to set the range as being initialized

@a10y a10y requested a review from joseph-isaacs February 20, 2025 18:22
@a10y a10y marked this pull request as ready for review February 20, 2025 18:49
@a10y a10y enabled auto-merge (squash) February 20, 2025 18:52
@a10y a10y merged commit 174e5b5 into develop Feb 20, 2025
25 checks passed
@a10y a10y deleted the aduffy/scoped-builder branch February 20, 2025 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants