feat: ChunkedArray null methods (drop, count) #10

Open
wants to merge 2 commits into
base: main
from

Conversation

winding-lines
Collaborator

Implement some of the methods for null support: drop and count.

@kszucs
Owner

kszucs commented May 8, 2025

Thanks for the PR! Going to take a look tomorrow.

@winding-lines winding-lines force-pushed the chunked-array-iteration branch 2 times, most recently from 43eb47a to fb6f5a7 Compare May 8, 2025 00:59
"""

var dtype: DataType
var length: Int
Owner

We should store the null_count as well as the length.
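To illustrate the suggestion, here is a minimal sketch in Python (not the Mojo code under review) of a record that carries a cached `null_count` next to `length`, so that counting nulls does not need to rescan the validity data. The names `ArrayData` and `make_array_data` and the tuple-of-bools validity representation are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ArrayData:
    """Hypothetical stand-in for the struct under review."""
    dtype: str
    length: int
    null_count: int               # cached once, next to length
    validity: Tuple[bool, ...]    # True means the slot holds a value


def make_array_data(dtype: str, validity: Iterable[bool]) -> ArrayData:
    """Build the record, computing null_count a single time at construction."""
    bits = tuple(validity)
    return ArrayData(
        dtype=dtype,
        length=len(bits),
        null_count=bits.count(False),
        validity=bits,
    )


data = make_array_data("int32", [True, False, True, True, False])
print(data.length, data.null_count)
```

With the count stored, a `null_count()` accessor becomes a field read, and a `drop_nulls` implementation can return early when the count is zero.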

var buffer_start = 0
# Process each buffer.
for buffer_index in range(len(self.buffers)):
    var buffer = self.buffers[buffer_index]
Owner

What happens in the case of nested types? In case of List we have three buffers: validity, offsets, values.

Collaborator Author

Hm, the individual type will have to know that. Let me see what language features we have available.

Owner

Could we implement drop_nulls on the typed array interfaces ListArray, BinaryArray, PrimitiveArray[T]?
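The difference between the flat and nested cases in this thread can be sketched in Python (not Mojo). The function names and the plain-list representation of the validity, offsets and values buffers are assumptions for illustration; the point is only that a list-typed array has to rebuild its offsets, which a generic per-buffer loop cannot know.

```python
from typing import List, Sequence, Tuple


def drop_nulls_primitive(validity: Sequence[bool], values: Sequence[int]) -> List[int]:
    """Flat case: keep the values whose validity flag is set."""
    return [v for ok, v in zip(validity, values) if ok]


def drop_nulls_list(
    validity: Sequence[bool],
    offsets: Sequence[int],
    values: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Nested case: slot i spans values[offsets[i]:offsets[i + 1]].

    Dropping a slot removes its child values and shifts every later
    offset, so offsets and values are rebuilt together.
    """
    new_offsets = [0]
    new_values: List[int] = []
    for i, ok in enumerate(validity):
        if not ok:
            continue  # skip the null slot, including any child values it spans
        new_values.extend(values[offsets[i]:offsets[i + 1]])
        new_offsets.append(len(new_values))
    return new_offsets, new_values


# Three list slots: [1, 2], null (spanning one stale child value), [4, 5]
print(drop_nulls_list([True, False, True], [0, 2, 3, 5], [1, 2, 99, 4, 5]))
```

Putting the logic on each typed array keeps this layout knowledge local to the type that owns it.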

fn size(self) -> Int:
    return self.buffer.size

fn grow[I: Intable](mut self, target_length: I):
    return self.buffer.grow[DType.bool](target_length)

@always_inline
fn bit_count(self) -> Int:
Owner

Are these methods covered with tests?

Owner

Seems like partial_byte_set doesn't have a corresponding test case.
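As a rough idea of the kind of test cases being asked for, here is a Python sketch (not Mojo) of two bitmap helpers with edge-case checks. Both helpers are hypothetical: `set_bits_in_byte` is only a guess at what a partial-byte setter might do and may not match the PR's `partial_byte_set`, and `bit_count` here assumes least-significant-bit-first ordering within each byte.

```python
def set_bits_in_byte(byte: int, start: int, end: int, value: bool) -> int:
    """Return `byte` with bits [start, end) set or cleared (bit 0 is least significant)."""
    if not (0 <= start <= end <= 8):
        raise ValueError("expected 0 <= start <= end <= 8")
    mask = ((1 << (end - start)) - 1) << start
    return (byte | mask) if value else (byte & ~mask & 0xFF)


def bit_count(bitmap: bytes, length: int) -> int:
    """Count set bits among the first `length` bits of a packed bitmap."""
    full_bytes, remainder = divmod(length, 8)
    count = sum(bin(b).count("1") for b in bitmap[:full_bytes])
    if remainder:
        # Only the low `remainder` bits of the trailing byte are in range.
        count += bin(bitmap[full_bytes] & ((1 << remainder) - 1)).count("1")
    return count


# Cases worth covering: an interior range, clearing, an empty range,
# and a length that ends in the middle of a byte.
assert set_bits_in_byte(0b00000000, 2, 5, True) == 0b00011100
assert set_bits_in_byte(0b11111111, 0, 3, False) == 0b11111000
assert set_bits_in_byte(0b00001010, 4, 4, True) == 0b00001010
assert bit_count(bytes([0xFF, 0xFF]), 11) == 11
```

The trailing-byte case is the one most likely to hide an off-by-one error, so it is worth at least one dedicated assertion.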

@kszucs
Owner

kszucs commented May 8, 2025

Looks good in principle, but it would be nice if we could split this into 3 PRs:

  1. Add and test the Bitmap utility methods.
  2. Move ChunkedArray into its own module, without the drop_nulls method but with combine_chunks.
  3. Implement drop_nulls for Primitive/Binary/Nested/Chunked arrays.
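To make the proposed layering concrete, here is a small Python sketch (not Mojo) of how the chunked operations could reduce to per-chunk operations. The `ChunkedArray` class below is a hypothetical stand-in that stores each chunk as a plain list with `None` marking a null; the real arrays use validity bitmaps.

```python
from typing import List, Optional, Sequence


class ChunkedArray:
    """Hypothetical stand-in: a sequence of chunks, each a list of optional values."""

    def __init__(self, chunks: Sequence[Sequence[Optional[int]]]) -> None:
        self.chunks: List[List[Optional[int]]] = [list(c) for c in chunks]

    def null_count(self) -> int:
        """Total nulls is the sum of the per-chunk null counts."""
        return sum(chunk.count(None) for chunk in self.chunks)

    def drop_nulls(self) -> "ChunkedArray":
        """Drop nulls chunk by chunk, preserving chunk boundaries."""
        return ChunkedArray([[v for v in c if v is not None] for c in self.chunks])

    def combine_chunks(self) -> List[Optional[int]]:
        """Concatenate all chunks into one contiguous array."""
        return [v for chunk in self.chunks for v in chunk]


ca = ChunkedArray([[1, None, 3], [None, None], [4]])
print(ca.null_count())
print(ca.drop_nulls().combine_chunks())
```

In this shape the chunked `drop_nulls` only delegates, which fits the ordering of the plan: the typed-array versions land first and the chunked version composes them.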

@winding-lines winding-lines force-pushed the chunked-array-iteration branch from fb6f5a7 to 16c1420 Compare May 27, 2025 00:12

2 participants