
pluggable arrow exec#7793

Draft
a10y wants to merge 2 commits into develop from aduffy/pluggable-arrow-exec

Conversation

Contributor

@a10y a10y commented May 5, 2026

Summary

Closes: #000

Testing

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y added the do not merge Pull requests that are not intended to merge label May 5, 2026

codspeed-hq Bot commented May 5, 2026

Merging this PR will degrade performance by 21.4%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 36 regressed benchmarks
✅ 1139 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | chunked_dict_primitive_into_canonical[u32, (1000, 10, 10)] | 117.8 µs | 138.3 µs | -14.83% |
| Simulation | bench_compare_primitive[(10000, 128)] | 103.3 µs | 124.9 µs | -17.29% |
| Simulation | bench_compare_primitive[(10000, 2)] | 102.8 µs | 123.6 µs | -16.79% |
| Simulation | bench_compare_primitive[(10000, 2048)] | 126.3 µs | 148 µs | -14.7% |
| Simulation | bench_compare_primitive[(10000, 32)] | 103.1 µs | 124 µs | -16.88% |
| Simulation | bench_compare_primitive[(10000, 4)] | 103.2 µs | 123.9 µs | -16.69% |
| Simulation | bench_compare_primitive[(10000, 512)] | 111.7 µs | 133.2 µs | -16.15% |
| Simulation | bench_compare_primitive[(10000, 8)] | 103 µs | 123.7 µs | -16.7% |
| Simulation | bench_compare_sliced_dict_primitive[(1000, 10000)] | 76.8 µs | 97.7 µs | -21.4% |
| Simulation | bench_compare_sliced_dict_primitive[(10000, 10000)] | 153 µs | 173.3 µs | -11.69% |
| Simulation | bench_compare_sliced_dict_primitive[(2000, 10000)] | 81.6 µs | 102.3 µs | -20.22% |
| Simulation | bench_compare_sliced_dict_primitive[(2500, 10000)] | 84.4 µs | 105.2 µs | -19.77% |
| Simulation | bench_compare_sliced_dict_primitive[(3333, 10000)] | 88.9 µs | 110.4 µs | -19.48% |
| Simulation | bench_compare_sliced_dict_primitive[(5000, 10000)] | 98.5 µs | 119.2 µs | -17.33% |
| Simulation | bench_compare_sliced_dict_primitive[(7500, 10000)] | 136.4 µs | 156.7 µs | -12.96% |
| Simulation | bench_compare_sliced_dict_primitive[(9999, 10000)] | 152.6 µs | 173.3 µs | -11.94% |
| Simulation | bench_compare_sliced_dict_varbinview[(1000, 10000)] | 107.3 µs | 128.4 µs | -16.41% |
| Simulation | bench_compare_sliced_dict_varbinview[(2000, 10000)] | 133.7 µs | 154.4 µs | -13.42% |
| Simulation | bench_compare_sliced_dict_varbinview[(2500, 10000)] | 147.4 µs | 168.3 µs | -12.44% |
| Simulation | bench_compare_sliced_dict_varbinview[(3333, 10000)] | 170.4 µs | 191.4 µs | -10.98% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing aduffy/pluggable-arrow-exec (fba88e0) with develop (d7c22ba)


Comment thread vortex-array/src/arrow/session.rs Outdated
Comment on lines +81 to +85
canonical_encoder: RwLock<Option<ArrowEncoderRef>>,
/// Fallback decoder used after the user chain has declined.
default_decoder: RwLock<Option<ArrowDecoderRef>>,
/// Fallback dtype reader used after the user chain has declined.
default_dtype_reader: RwLock<Option<ArrowDTypeReaderRef>>,
Contributor

An RwLock still has contention even if all accessors are readers; maybe an ArcSwap? I think we might want a wrapper for this pattern, though it's not needed here!
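For context, a minimal std-only sketch of the pattern under discussion (the types here are hypothetical stand-ins, not the vortex-array API): the read path clones the `Arc` out of the `RwLock` so the guard is held only momentarily; `arc_swap::ArcSwap` would remove even that brief critical section, at the cost of a dependency.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for ArrowEncoderRef.
type EncoderRef = Arc<dyn Fn(i64) -> i64 + Send + Sync>;

// A read-mostly slot: writers replace the Arc, readers snapshot it.
struct Slot {
    inner: RwLock<Option<EncoderRef>>,
}

impl Slot {
    fn new() -> Self {
        Slot { inner: RwLock::new(None) }
    }

    // Replace the registered encoder (rare, write-locked).
    fn store(&self, enc: EncoderRef) {
        *self.inner.write().unwrap() = Some(enc);
    }

    // Clone the Arc under the read lock; the guard drops immediately,
    // and the encoder is invoked outside any critical section.
    fn snapshot(&self) -> Option<EncoderRef> {
        self.inner.read().unwrap().clone()
    }
}

fn main() {
    let slot = Slot::new();
    slot.store(Arc::new(|x: i64| x * 2));
    let enc = slot.snapshot().expect("encoder registered");
    assert_eq!(enc(21), 42);
}
```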

fn to_arrow_array(
&self,
array: ArrayRef,
target: &DataType,
Contributor

I think we want this to be optional, so that if we don't mind the physical type, the conversion can return encoding-specific values.

Suggested change
target: &DataType,
target: Option<&DataType>,

Sorry if I missed a comment regarding this.

Contributor Author

the idea was that you must pass in an explicit target physical type, whether that's derived from the user or by calling preferred_arrow_type() first
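A tiny sketch of the two-step flow described here, using hypothetical stand-in types rather than the real vortex-array trait: the caller either supplies an explicit target `DataType` or derives one by calling `preferred_arrow_type()` first, so `to_arrow_array` itself never has to guess.

```rust
// Hypothetical stand-ins; the real types live in vortex-array / arrow.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

struct MyArray {
    preferred: DataType,
}

impl MyArray {
    // First pass: ask the array which physical Arrow type it prefers.
    fn preferred_arrow_type(&self) -> DataType {
        self.preferred.clone()
    }

    // Second pass: convert to an explicit, caller-chosen target type.
    fn to_arrow_array(&self, target: &DataType) -> Result<String, String> {
        match target {
            DataType::Int32 => Ok("int32 array".to_string()),
            DataType::Utf8 => Ok("utf8 array".to_string()),
        }
    }
}

fn main() {
    let arr = MyArray { preferred: DataType::Int32 };
    // With no caller opinion, derive the target from preferred_arrow_type().
    let target = arr.preferred_arrow_type();
    assert_eq!(arr.to_arrow_array(&target).unwrap(), "int32 array");
}
```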

Comment on lines +55 to +62
/// Returning [`Ok(None)`] passes the request to the next reader in the chain.
pub trait ArrowDTypeReader: 'static + Send + Sync + Debug {
/// Try to read a Vortex [`DType`] from an Arrow [`Field`].
///
/// Implementations typically inspect [`Field::metadata`] for the `ARROW:extension:name`
/// key and dispatch on it.
fn try_read_dtype(&self, field: &Field) -> VortexResult<Option<DType>>;
}
Contributor

why have both fields and data_types? I guess nullability?

Contributor Author

yeah, having both is kinda funny. I think if you want to cover extension dtypes then you need the Field, b/c that has the metadata. And also nullability, yeah.
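To illustrate why the `Field` is needed over a bare `DataType`, a sketch with a hypothetical stand-in `Field` (the real one is arrow's): the extension name lives in the field's metadata map under the `ARROW:extension:name` key, and nullability also rides along on the field. Returning `Ok(None)` declines and passes the request to the next reader in the chain.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for arrow's Field: carries the metadata map
// (where ARROW:extension:name lives) plus nullability, neither of
// which a bare DataType has.
struct Field {
    nullable: bool,
    metadata: HashMap<String, String>,
}

// Sketch of a chained dtype reader: Ok(None) means "not mine, ask the
// next reader"; Ok(Some(_)) means this reader recognized the field.
fn try_read_dtype(field: &Field) -> Result<Option<String>, String> {
    match field.metadata.get("ARROW:extension:name") {
        Some(ext) if ext.as_str() == "vortex.example" => {
            // Fold nullability from the Field into the resulting dtype.
            Ok(Some(format!("example(nullable={})", field.nullable)))
        }
        _ => Ok(None), // decline; the chain falls through
    }
}

fn main() {
    let mut md = HashMap::new();
    md.insert("ARROW:extension:name".to_string(), "vortex.example".to_string());
    let field = Field { nullable: true, metadata: md };
    assert_eq!(
        try_read_dtype(&field).unwrap().unwrap(),
        "example(nullable=true)"
    );
}
```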

Comment on lines +38 to +42
fn preferred_arrow_type(
&self,
array: &ArrayRef,
session: &ArrowSession,
) -> VortexResult<Option<DataType>>;
Contributor

two passes might be expensive?

Contributor

we need to make sure this doesn't regress any benchmarks

pub fn register_encoder_for_extension(
&self,
key: impl Into<Id>,
plugin: impl Into<ArrowEncoderRef>,
Contributor

I think we should make these take ArrowEncoderRef directly. I would think we end up using the same encoder for many ids, so we don't want a different instance for each? Using impl Into might lead to that?
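A small sketch of the sharing concern, with hypothetical stand-in types (not the vortex-array registry): registering the same `Arc`'d encoder under several extension ids keeps a single shared instance, rather than materializing a separate one per id.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for ArrowEncoderRef; the payload is a placeholder.
type ArrowEncoderRef = Arc<str>;

struct Registry {
    by_id: HashMap<String, ArrowEncoderRef>,
}

impl Registry {
    fn new() -> Self {
        Registry { by_id: HashMap::new() }
    }

    // Taking the Arc directly lets one encoder instance back many ids;
    // the caller clones the cheap Arc handle, not the encoder itself.
    fn register_encoder_for_extension(&mut self, key: &str, plugin: ArrowEncoderRef) {
        self.by_id.insert(key.to_string(), plugin);
    }
}

fn main() {
    let shared: ArrowEncoderRef = Arc::from("shared-encoder");
    let mut reg = Registry::new();
    reg.register_encoder_for_extension("ext.a", shared.clone());
    reg.register_encoder_for_extension("ext.b", shared.clone());
    // Both ids point at the same instance, not separate copies.
    assert!(Arc::ptr_eq(&reg.by_id["ext.a"], &reg.by_id["ext.b"]));
}
```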

Contributor

palaska commented May 5, 2026

looks like 3 people started doing the same thing in parallel 😄 #7726

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Contributor Author

a10y commented May 5, 2026

It's a hot topic


Labels

do not merge Pull requests that are not intended to merge


3 participants