
Update to Slice 2.0 #18471

Merged
merged 4 commits into from Aug 12, 2023

Conversation

@dain (Member) commented Jul 31, 2023

Description

This PR updates to the proposed Slice 2.0 apis.

This PR is based on #18461

Release notes

(x) This is not user-visible or docs only and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Jul 31, 2023
@github-actions github-actions bot added tests:hive iceberg Iceberg connector bigquery BigQuery connector mongodb MongoDB connector labels Aug 8, 2023
@dain dain force-pushed the slice-2.0 branch 5 times, most recently from e27c294 to 538d329 Compare August 10, 2023 04:02
ReadBuffer buffer = buffers[0];
int shortsRemaining = length;
while (shortsRemaining > 0) {
ensureReadable(min(Long.BYTES, shortsRemaining * Short.BYTES));
Member

Not sure I understand this line. If there are, say, 100 shorts remaining, this would do:

ensureReadable(min(8, 200)) -> ensureReadable(8)

Also, what's the significance of Long.BYTES here? Why would this method care about having at least 8 readable bytes, assuming that's the intention?

Member Author

Absolutely agree. This is the style of the other methods in this class. I had to ask @arhimondr how this code works, and this is what he said:

Hmm, it does look weird. I need to read it carefully to tell you more.
It basically says that "at least" 8 bytes (or fewer, if the remaining bytes are less than 8) should be available.
If not available, it will read an entire encryption / compression block.

I think we all agree it needs some renaming
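As a toy illustration of the contract described above (all names here are hypothetical sketches, not the actual Trino classes): the call guarantees that at least `min(Long.BYTES, remainingBytes)` bytes are readable, and refills a whole block when they are not.

```java
// Toy model of the ensureReadable(min(Long.BYTES, shortsRemaining * Short.BYTES)) pattern.
// Hypothetical sketch; the real ReadBuffer logic lives in the Slice 2.0 readers.
public class ReadBufferSketch {
    private static final int BLOCK_SIZE = 16; // stand-in for a compression-block size
    private byte[] buffer = new byte[0];
    private int position;

    // Guarantees at least `bytes` readable bytes; otherwise "reads" a whole new block.
    void ensureReadable(int bytes) {
        if (buffer.length - position < bytes) {
            buffer = new byte[BLOCK_SIZE]; // stand-in for decrypting/decompressing the next block
            position = 0;
        }
    }

    int readable() {
        return buffer.length - position;
    }

    public static void main(String[] args) {
        ReadBufferSketch b = new ReadBufferSketch();
        int shortsRemaining = 100;
        // ensureReadable(min(8, 200)) == ensureReadable(8): ask for at most one long's worth
        b.ensureReadable(Math.min(Long.BYTES, shortsRemaining * Short.BYTES));
        if (b.readable() < Long.BYTES) {
            throw new AssertionError("expected at least 8 readable bytes");
        }
    }
}
```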

int currentIndex = sourceIndex;
int shortsRemaining = length;
while (shortsRemaining > 0) {
ensureCapacityFor(min(Long.BYTES, shortsRemaining * Short.BYTES));
Member

Same comment as above.

Member Author

Same as above.

return XxHash64.hash(Slices.wrappedIntArray(positionLinks));
long hash = 0;
for (int positionLink : positionLinks) {
hash = XxHash64.hash(hash, positionLink);
Member

It probably doesn't matter, but be aware that this is not equivalent to the old code in terms of semantics and performance.

Member

We could add an update() method that accepts a long

Member Author

Yep, I agree to both comments. The code should be fine as this is only used during spilling, and only used as a checksum.
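A toy illustration of why the two forms are not semantically equivalent, using `java.util.zip.CRC32` as a stand-in for XxHash64 (the method names here are hypothetical): hashing the array's raw bytes in one pass and folding a running value through per-element updates walk different byte sequences, so the results generally differ; only determinism is preserved, which is all a spill checksum needs.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksumStyles {
    // One-pass hash over the raw bytes of the whole array
    // (the style of the old XxHash64.hash(Slices.wrappedIntArray(...)) call).
    static long bulk(int[] values) {
        ByteBuffer bytes = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            bytes.putInt(v);
        }
        CRC32 crc = new CRC32();
        crc.update(bytes.array());
        return crc.getValue();
    }

    // Chained per-element hash (the style of the new loop): each step
    // feeds the previous hash value back in alongside the next element.
    static long chained(int[] values) {
        long hash = 0;
        for (int v : values) {
            CRC32 crc = new CRC32();
            crc.update(ByteBuffer.allocate(Long.BYTES + Integer.BYTES).putLong(hash).putInt(v).array());
            hash = crc.getValue();
        }
        return hash;
    }
}
```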

return createSlicesBlock(IPADDRESS, generateListWithNulls(positionCount, nullRate, () -> randomSlice(random, 16)));
}

private static Slice randomSlice(Random random, int length)
Member

Should we add this to Slice?

Member Author

We could add this to Slices. I took a quick look at the code base, and there are about 20 places that do this.
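A minimal sketch of the helper being discussed. It returns a `byte[]` rather than `io.airlift.slice.Slice` so it runs without the library; the real helper would wrap the array, e.g. via `Slices.wrappedBuffer(bytes)`.

```java
import java.util.Random;

public class RandomSliceSketch {
    // Stand-in for the proposed randomSlice(Random, int) helper:
    // fills a fresh array of the requested length with random bytes.
    static byte[] randomSlice(Random random, int length) {
        byte[] bytes = new byte[length];
        random.nextBytes(bytes);
        return bytes;
    }
}
```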

// there might be a value split across the buffers
Slice slice = null;
if (remaining != 0) {
slice = Slices.allocate(Integer.BYTES);
Member

Do we need to cache this one-int buffer (in the class, or for the whole method) to avoid potentially reallocating on every boundary? Not sure how often remaining < Integer.BYTES is true.

Member Author

I added tempBuffer8 as a field and use it in the int[] and long[] methods.
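The shape of that fix, as a hedged sketch (hypothetical class; only the `tempBuffer8` name comes from the discussion): an 8-byte scratch array held as a field covers both the 4-byte and 8-byte boundary cases without allocating at every block boundary.

```java
public class ReaderSketch {
    // Reused scratch buffer for values split across block boundaries,
    // instead of allocating a fresh 4- or 8-byte buffer at each boundary.
    // Long.BYTES (8) is large enough for both the int[] and long[] paths.
    private final byte[] tempBuffer8 = new byte[Long.BYTES];

    byte[] scratch() {
        return tempBuffer8; // same instance on every call: no per-boundary allocation
    }
}
```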

@@ -414,19 +413,19 @@ public static long hash64(Slice data)
long k = 0;
switch (data.length() - current) {
case 7:
k ^= ((long) UnsafeSlice.getByteUnchecked(data, current + 6) & 0xff) << 48;
k ^= ((long) data.getByteUnchecked(current + 6) & 0xff) << 48;
Member

Slice.getByte() and Slice.getByteUnchecked() should be equivalent now, no?

Member Author

getByte performs a bounds check on the Slice's view of the underlying byte[].
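A toy view class (hypothetical, not the actual Slice implementation) illustrates that distinction: the checked accessor validates the index against the view's length, while the unchecked one can silently read backing-array bytes outside the view.

```java
public class SliceViewSketch {
    private final byte[] base;
    private final int offset;
    private final int length;

    SliceViewSketch(byte[] base, int offset, int length) {
        this.base = base;
        this.offset = offset;
        this.length = length;
    }

    // Checked: rejects indexes outside this view, even if the
    // backing array would happily serve them.
    byte getByte(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + " outside view of length " + length);
        }
        return base[offset + index];
    }

    // Unchecked: trusts the caller; may read valid backing bytes outside the view.
    byte getByteUnchecked(int index) {
        return base[offset + index];
    }
}
```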


core/trino-spi/pom.xml (outdated, resolved)
Labels
bigquery BigQuery connector cla-signed iceberg Iceberg connector mongodb MongoDB connector
Development

Successfully merging this pull request may close these issues.

None yet

3 participants