@connortsui20 connortsui20 commented Oct 1, 2025

The way in which the modules are structured under vortex-array/src/arrays is quite inconsistent. This PR ensures that each kind of array consistently has the same structure, which is:

  • array.rs: contains the array struct definition and implementation
  • compute: contains the compute function implementations for the encoding, each in a separate file
  • vtable: contains the array vtable implementations, each in a separate file. The vtable itself is located in vtable/mod.rs
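
As a rough illustration of the layout listed above, here is a single-file sketch in which each inline `mod` stands in for a file or directory. All names (`example_encoding`, `ExampleArray`, `ExampleVTable`, `sum`, `scalar_at`) are hypothetical and are not taken from the Vortex codebase; the real vtable traits are not reproduced here.

```rust
// Hypothetical single-file sketch of the per-encoding layout. In a real crate
// each inline `mod` below would be its own file or directory.
pub mod example_encoding {
    // array.rs: the array struct definition and its inherent methods.
    pub mod array {
        #[derive(Debug, Clone, PartialEq)]
        pub struct ExampleArray {
            pub values: Vec<i32>,
        }

        impl ExampleArray {
            pub fn new(values: Vec<i32>) -> Self {
                Self { values }
            }

            pub fn len(&self) -> usize {
                self.values.len()
            }
        }
    }

    // compute/: one file per compute function.
    pub mod compute {
        pub mod sum {
            use super::super::array::ExampleArray;

            pub fn sum(array: &ExampleArray) -> i64 {
                array.values.iter().map(|&v| i64::from(v)).sum()
            }
        }
    }

    // vtable/: mod.rs holds the vtable type; each implementation gets a file.
    pub mod vtable {
        pub struct ExampleVTable;

        // vtable/operations.rs (stand-in for a per-trait implementation file).
        pub mod operations {
            use super::super::array::ExampleArray;
            use super::ExampleVTable;

            impl ExampleVTable {
                pub fn scalar_at(array: &ExampleArray, index: usize) -> Option<i32> {
                    array.values.get(index).copied()
                }
            }
        }
    }

    // Re-export the main types at the encoding's module root.
    pub use array::ExampleArray;
    pub use vtable::ExampleVTable;
}
```

With this shape, a reader looking for any encoding's operations code can go to the same relative path (`vtable/operations.rs`) every time, which is the consistency argument made in the description.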

Probably the most controversial thing here is the addition of a vtable module, where each helper trait implementation is in its own file. I agree that it isn't ideal, as it means we have a lot of files with the same name, but I feel it is far worse for every module to place its vtable implementations in slightly different files (it becomes incredibly hard to find where things are, even with a language server).

If we really care that the ArrayVTable and the OperationsVTable should be located in the same file, then I would argue that those traits should be combined.

I also snuck in a change (first commit) where we validate in new_unchecked in debug mode only.

Note: All modules have files moved around in the exact same way with the exception of primitive, where I create an array module instead of an array.rs because there are so many extra things we implement specifically for PrimitiveArray.

@connortsui20 connortsui20 added the chore Release label indicating a trivial change label Oct 1, 2025
@vortex-data vortex-data deleted a comment from codspeed-hq bot Oct 1, 2025
codecov bot commented Oct 1, 2025

Codecov Report

❌ Patch coverage is 89.43595% with 221 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.56%. Comparing base (62b57a6) to head (952a813).
⚠️ Report is 1 commit behind head on develop.

Files with missing lines | Patch % | Lines
vortex-array/src/arrays/varbin/array.rs | 69.54% | 53 Missing ⚠️
vortex-array/src/arrays/decimal/array.rs | 83.08% | 23 Missing ⚠️
vortex-array/src/arrays/varbinview/binary_view.rs | 85.16% | 23 Missing ⚠️
vortex-array/src/arrays/struct_/array.rs | 87.50% | 22 Missing ⚠️
vortex-array/src/arrays/varbinview/array.rs | 88.54% | 22 Missing ⚠️
...tex-array/src/arrays/primitive/array/conversion.rs | 68.25% | 20 Missing ⚠️
vortex-array/src/arrays/list/array.rs | 83.18% | 19 Missing ⚠️
vortex-array/src/arrays/primitive/array/mod.rs | 86.31% | 13 Missing ⚠️
...ortex-array/src/arrays/masked/vtable/operations.rs | 21.42% | 11 Missing ⚠️
vortex-array/src/arrays/primitive/array/cast.rs | 78.94% | 4 Missing ⚠️
... and 6 more

@connortsui20 connortsui20 requested review from gatesn, joseph-isaacs and robert3005 and removed request for gatesn, joseph-isaacs and robert3005 October 1, 2025 21:02
@connortsui20 connortsui20 force-pushed the ct/array-module-cleanup branch 2 times, most recently from e3032fa to 4559b3f Compare October 2, 2025 14:43
@connortsui20 connortsui20 marked this pull request as ready for review October 2, 2025 15:48
@connortsui20 connortsui20 requested a review from danking October 2, 2025 15:48
@connortsui20 (Contributor Author)

@claude review

claude bot commented Oct 2, 2025

Claude finished @connortsui20's task


Code Review: vortex-array modules cleanup ✅

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Examine the new module structure pattern
  • Review key changed files for consistency
  • Check code quality and potential issues
  • Provide comprehensive review feedback

Review Summary

This is an excellent refactoring PR that significantly improves the organization and consistency of the vortex-array module structure. The changes successfully standardize how array encodings are organized across the codebase.

🎯 Positive Aspects

Consistent Module Structure

The new standardized structure is well-designed and makes navigation much easier:

arrays/{encoding}/
├── array.rs          # Array struct and core implementation
├── compute/          # Compute function implementations
├── vtable/           # VTable trait implementations (one per file)
│   ├── mod.rs        # VTable definition and registration
│   ├── array.rs      # ArrayVTable implementation
│   ├── canonical.rs  # CanonicalVTable implementation
│   ├── operations.rs # OperationsVTable implementation
│   ├── serde.rs      # SerdeVTable implementation
│   ├── validity.rs   # ValidityVTable implementation
│   └── visitor.rs    # VisitorVTable implementation
└── tests.rs          # Tests (where extracted)

Primitive Array Special Treatment

The decision to make primitive/array/ a module instead of array.rs is well-justified given the complexity:

  • vortex-array/src/arrays/primitive/array/mod.rs:232 - The main array implementation
  • Additional specialized modules like accessor.rs, cast.rs, conversion.rs, patch.rs, top_value.rs
  • This approach prevents the single file from becoming unwieldy

Debug Validation Enhancement

The addition of debug validation in new_unchecked methods is a great defensive programming practice:

#[cfg(debug_assertions)]
Self::validate(&buffer, offset, len, &validity)
    .vortex_expect("[Debug Assertion]: Invalid `BoolArray` parameters");

This provides runtime validation in debug builds without affecting release performance.
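
To make the debug-only validation pattern concrete, here is a minimal, self-contained sketch. It assumes a hypothetical `ExampleArray` with a plain `String` error and an `unsafe` constructor; the actual Vortex types, error handling (`vortex_expect`), and constructor signatures may differ.

```rust
/// Hypothetical array type used only to illustrate the pattern.
#[derive(Debug)]
pub struct ExampleArray {
    values: Vec<u8>,
    len: usize,
}

impl ExampleArray {
    /// Checks the invariant that the buffer holds at least `len` values.
    fn validate(values: &[u8], len: usize) -> Result<(), String> {
        if values.len() < len {
            return Err(format!(
                "buffer holds {} values but len is {}",
                values.len(),
                len
            ));
        }
        Ok(())
    }

    /// Checked constructor: always validates, in every build profile.
    pub fn try_new(values: Vec<u8>, len: usize) -> Result<Self, String> {
        Self::validate(&values, len)?;
        Ok(Self { values, len })
    }

    /// Unchecked constructor: validates only when debug assertions are on.
    ///
    /// # Safety
    /// The caller must guarantee that `values.len() >= len`.
    pub unsafe fn new_unchecked(values: Vec<u8>, len: usize) -> Self {
        // This statement is compiled out when debug assertions are disabled.
        #[cfg(debug_assertions)]
        Self::validate(&values, len)
            .expect("[Debug Assertion]: invalid `ExampleArray` parameters");

        Self { values, len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn get(&self, index: usize) -> Option<u8> {
        if index < self.len {
            self.values.get(index).copied()
        } else {
            None
        }
    }
}
```

In a debug build, passing inconsistent parameters to `new_unchecked` panics at the call site, while a release build skips the check entirely and relies on the caller's guarantee.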

VTable Organization

The new vtable/ module structure with separate files for each trait implementation addresses a real pain point:

  • Makes it easy to locate specific trait implementations
  • Consistent across all array types
  • The vtable!(Encoding) macro usage in vtable/mod.rs maintains consistency

📋 Code Quality Assessment

✅ Adheres to Repository Guidelines

  • Follows the code style preferences from CLAUDE.md
  • Maintains proper module-scoped imports
  • Uses appropriate error handling with vortex_err! and vortex_bail!
  • Includes comprehensive doc comments for public APIs

✅ No Lint Issues

  • Cargo clippy passes cleanly with --all-targets --all-features
  • No warnings generated

✅ Maintains API Compatibility

  • All public interfaces remain unchanged
  • No breaking changes to existing functionality

🔍 Technical Details Reviewed

File Movements

The file reorganization is systematic and well-executed:

  • vortex-array/src/arrays/bool/array.rs:452 - Comprehensive implementation with proper validation
  • vortex-array/src/arrays/bool/vtable/mod.rs:42 - Clean vtable organization
  • vortex-array/src/arrays/list/vtable/mod.rs:42 - Consistent pattern across encodings

Test Preservation

Tests have been properly preserved and reorganized:

  • Some moved to dedicated tests.rs files (e.g., chunked/tests.rs, masked/tests.rs)
  • Test logic remains intact
  • Test harness files appropriately relocated

📊 Coverage Impact

The codecov report shows 88.63% patch coverage with 260 lines missing coverage. This is expected for a refactoring PR where:

  • Code is moved rather than newly implemented
  • Existing uncovered lines remain uncovered in their new locations
  • No degradation in overall project coverage (87.63%)

🎉 Recommendation

✅ APPROVE - This is a high-quality refactoring that significantly improves code organization without introducing any functional changes or regressions. The standardized structure will make the codebase much more maintainable and navigable for both current and future contributors.

The controversial vtable/ module separation mentioned in the PR description is actually well-justified - having predictable locations for trait implementations far outweighs the minor inconvenience of similar file names across modules.


@connortsui20 connortsui20 force-pushed the ct/array-module-cleanup branch 4 times, most recently from f703d51 to 41b7f58 Compare October 2, 2025 18:35
@danking danking left a comment

This is great. My only difference of opinion is that I dislike "tests.rs" files. I prefer the tests at the bottom of the file with the relevant functions. I guess when I'm reading a function I'm both interested in what it does but also why I should believe it does that correctly. If the tests are not in-line, the file doesn't serve as a boundary within which I search for evidence. Instead, the whole module or the whole repo must be searched for the relevant tests.
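
For readers unfamiliar with the alternative being described, this is roughly what in-line tests at the bottom of a source file look like in Rust. The function `count_valid` is a hypothetical helper invented for this sketch, not something from the PR.

```rust
/// Hypothetical helper used only to illustrate test placement.
pub fn count_valid(validity: &[bool]) -> usize {
    validity.iter().filter(|&&v| v).count()
}

// In-line tests live in the same file as the code they exercise and are
// compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::count_valid;

    #[test]
    fn counts_only_true_entries() {
        assert_eq!(count_valid(&[true, false, true]), 2);
    }

    #[test]
    fn empty_slice_counts_zero() {
        let empty: [bool; 0] = [];
        assert_eq!(count_valid(&empty), 0);
    }
}
```

The trade-off under discussion is locality versus scope: in-line tests keep the evidence next to the function, whereas a separate `tests.rs` suits tests that exercise several files at once.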

@robert3005 (Contributor)

Agree with @danking on the test part. Otherwise we can merge this

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Also changes the export pattern for `BinaryView` and related types.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
connortsui20 commented Oct 3, 2025

We discussed this briefly offline. Since the tests are not really specific to a single file (most tests cover several things that are split across the vtable trait, compute functions, and method implementations), for now we'll just keep them in tests.rs.

I will mention that I feel the tests should be a bit more targeted to specific things (sort of how I structured the tests in fixed_size_list and list_view), but I think for now this is good enough(TM).

@connortsui20 connortsui20 force-pushed the ct/array-module-cleanup branch from 41b7f58 to 952a813 Compare October 3, 2025 16:10
@connortsui20 connortsui20 enabled auto-merge (squash) October 3, 2025 16:13
@connortsui20 connortsui20 merged commit d7d45f2 into develop Oct 3, 2025
39 checks passed
@connortsui20 connortsui20 deleted the ct/array-module-cleanup branch October 3, 2025 16:21
gatesn pushed a commit that referenced this pull request Oct 29, 2025
Also renames `compress` to `bitpack_compress`, since it gets exported out
of `fastlanes` and it is otherwise unclear which method `compress`
refers to.

This is a purely cosmetic change that mimics the changes made in
#4818

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>