v0.3.0

@github-actions github-actions released this 14 Mar 17:15
· 56 commits to main since this release

ExArrow 0.3.0 — Release Notes

Released: 2026-03-10
Hex: https://hex.pm/packages/ex_arrow/0.3.0
Docs: https://hexdocs.pm/ex_arrow/0.3.0
Changelog: https://github.com/thanos/ex_arrow/blob/main/CHANGELOG.md#030---2026-03-10


What's new

Arrow compute kernels (ExArrow.Compute)

Three native operations on RecordBatch values that run entirely in Rust Arrow
memory, with no BEAM-side data copy:

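| Function  | Description                                                  |
|-----------|--------------------------------------------------------------|
| filter/2  | Keep rows where the first column of a boolean batch is true  |
| project/2 | Select and reorder columns by name                           |
| sort/3    | Sort a batch by a named column, ascending or descending      |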

Results are new ExArrow.RecordBatch handles that compose with IPC writers,
Flight do_put, further compute calls, or the Parquet writer.

{:ok, filtered} = ExArrow.Compute.filter(batch, mask_batch)
{:ok, projected} = ExArrow.Compute.project(filtered, ["id", "score"])
{:ok, sorted} = ExArrow.Compute.sort(projected, "score", direction: :desc)

Backed by the arrow-select and arrow-ord Rust crates (both part of the
arrow-rs 56 release family, so no extra dependency resolution).
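
The three calls above can also be chained with `with`, so that the first step that does not return `{:ok, _}` short-circuits the rest. This is a minimal sketch: the `ScoreReport` module is hypothetical, and it assumes failures are reported as `{:error, reason}` tuples, which these notes do not state for ExArrow.Compute.

```elixir
defmodule ScoreReport do
  @moduledoc "Hypothetical helper; only the ExArrow.Compute calls come from the release notes."

  # Filters `batch` by `mask_batch`, keeps the "id" and "score" columns,
  # and sorts the result by "score" in descending order.
  # Any value that does not match {:ok, _} (assumed to be {:error, reason})
  # is returned to the caller unchanged.
  def top_scores(batch, mask_batch) do
    with {:ok, filtered} <- ExArrow.Compute.filter(batch, mask_batch),
         {:ok, projected} <- ExArrow.Compute.project(filtered, ["id", "score"]) do
      ExArrow.Compute.sort(projected, "score", direction: :desc)
    end
  end
end
```

Because each result is itself a RecordBatch handle, the value returned by `ScoreReport.top_scores/2` could be passed on to a writer or to another compute call.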


Parquet support (ExArrow.Parquet.Reader / ExArrow.Parquet.Writer)

Read and write the Apache Parquet format using the parquet Rust crate.
Both file paths and in-memory binaries are supported on both sides.

# Read
{:ok, stream}   = ExArrow.Parquet.Reader.from_file("/data/events.parquet")
{:ok, schema}   = ExArrow.Stream.schema(stream)
batches         = ExArrow.Stream.to_list(stream)

# Write
:ok = ExArrow.Parquet.Writer.to_file("/out/result.parquet", schema, batches)

# In-memory round-trip
{:ok, bytes}  = ExArrow.Parquet.Writer.to_binary(schema, batches)
{:ok, stream2} = ExArrow.Parquet.Reader.from_binary(bytes)

Parquet streams use the same ExArrow.Stream interface as IPC and ADBC
streams — schema/1, next/1, and to_list/1 work identically across all
three backends. No new consumption patterns to learn.
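
For files too large to collect with to_list/1, a stream can be drained one batch at a time with next/1. The sketch below is illustrative only: `BatchLoop` is a hypothetical module, and it assumes next/1 returns `nil` at end of stream and `{:error, reason}` on failure, which should be checked against the ExArrow.Stream docs. The `next` argument is injectable so the loop can be tried without a real stream.

```elixir
defmodule BatchLoop do
  @moduledoc "Hypothetical helper for consuming a stream one batch at a time."

  # Folds `fun` over every batch pulled from `stream`.
  # `next` defaults to ExArrow.Stream.next/1.
  # Assumption: next/1 returns nil when the stream is exhausted and
  # {:error, reason} on failure; anything else is treated as a batch.
  def reduce(stream, acc, fun, next \\ &ExArrow.Stream.next/1) do
    case next.(stream) do
      nil -> {:ok, acc}
      {:error, reason} -> {:error, reason}
      batch -> reduce(stream, fun.(batch, acc), fun, next)
    end
  end
end
```

With this helper, `BatchLoop.reduce(stream, 0, fn _batch, n -> n + 1 end)` would count the batches in a stream without holding them all in a list.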


Explorer bridge (ExArrow.Explorer)

One-call conversion between ExArrow.Stream / ExArrow.RecordBatch and
Explorer.DataFrame:

# ExArrow → Explorer
{:ok, stream} = ExArrow.IPC.Reader.from_file("/data/events.arrow")
{:ok, df}     = ExArrow.Explorer.from_stream(stream)

# Explorer → ExArrow (e.g. write to Parquet or send via Flight)
df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, stream} = ExArrow.Explorer.to_stream(df)

The bridge exposes four functions: from_stream/1, from_record_batch/1,
to_stream/1, and to_record_batches/1.

Opt in by adding {:explorer, "~> 0.8"} to your mix.exs. When Explorer is
absent, ExArrow still compiles and every bridge function returns
{:error, "Explorer is not available…"} at runtime.
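
As one way to combine the bridge with the Parquet writer, a DataFrame could be exported as sketched below. The `DataFrameExport` module is hypothetical; it only composes calls that appear elsewhere in these notes, and it relies on the documented `{:error, …}` return when Explorer is missing.

```elixir
defmodule DataFrameExport do
  @moduledoc "Hypothetical helper; composes calls shown elsewhere in these notes."

  # Converts an Explorer.DataFrame to an ExArrow stream and writes it to `path`.
  # If Explorer is not installed, to_stream/1 is expected to return
  # {:error, message}, which `with` hands straight back to the caller.
  def to_parquet(df, path) do
    with {:ok, stream} <- ExArrow.Explorer.to_stream(df),
         {:ok, schema} <- ExArrow.Stream.schema(stream) do
      batches = ExArrow.Stream.to_list(stream)
      ExArrow.Parquet.Writer.to_file(path, schema, batches)
    end
  end
end
```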


Nx bridge (ExArrow.Nx)

Zero-copy column extraction to Nx.Tensor by sharing the raw byte buffer:

{:ok, stream}  = ExArrow.Parquet.Reader.from_file("/data/features.parquet")
batch          = ExArrow.Stream.next(stream)
{:ok, tensor}  = ExArrow.Nx.column_to_tensor(batch, "price")

# Or extract all numeric columns at once
{:ok, tensors} = ExArrow.Nx.to_tensors(batch)
# %{"price" => #Nx.Tensor<f64[1000]>, "qty" => #Nx.Tensor<s32[1000]>, …}

Reverse path: from_tensor/2 writes an Nx.Tensor into a
single-column RecordBatch without going through an Elixir list.

Supported Arrow types: Int8/16/32/64, UInt8/16/32/64, Float32/64.
Non-numeric columns return {:error, …} from column_to_tensor/2 and are
silently skipped by to_tensors/1.

Opt in by adding {:nx, "~> 0.9"} to your mix.exs.
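
Once a column is an Nx.Tensor, ordinary Nx functions apply to it. The following is a small sketch under the assumption that Nx is installed; `ColumnStats` is a hypothetical module name, and only column_to_tensor/2 comes from ExArrow.

```elixir
defmodule ColumnStats do
  @moduledoc "Hypothetical helper built on ExArrow.Nx.column_to_tensor/2."

  # Returns {:ok, mean} for a numeric column. For a non-numeric column the
  # {:error, …} tuple from column_to_tensor/2 is passed back unchanged.
  def mean(batch, column) do
    with {:ok, tensor} <- ExArrow.Nx.column_to_tensor(batch, column) do
      # Nx.mean/1 reduces over all axes; Nx.to_number/1 unwraps the scalar tensor.
      {:ok, tensor |> Nx.mean() |> Nx.to_number()}
    end
  end
end
```

For the example batch above, `ColumnStats.mean(batch, "price")` would return the mean price as a plain Elixir float inside an `{:ok, _}` tuple.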


New public API surface

| Module                 | New functions                                                         |
|------------------------|-----------------------------------------------------------------------|
| ExArrow.Compute        | filter/2, project/2, sort/3                                           |
| ExArrow.Parquet.Reader | from_file/1, from_binary/1                                            |
| ExArrow.Parquet.Writer | to_file/3, to_binary/2                                                |
| ExArrow.Explorer       | from_stream/1, from_record_batch/1, to_stream/1, to_record_batches/1 |
| ExArrow.Nx             | column_to_tensor/2, to_tensors/1, from_tensor/2                       |

Optional dependencies

| Add to mix.exs        | Unlocks                 |
|-----------------------|-------------------------|
| {:explorer, "~> 0.8"} | ExArrow.Explorer bridge |
| {:nx, "~> 0.9"}       | ExArrow.Nx bridge       |

Both are optional. ExArrow compiles and works without them; the bridge modules
gracefully degrade to {:error, "… is not available…"} at runtime.
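
Callers who prefer to check up front, rather than match on the runtime error, could probe for the optional modules themselves. This is a caller-side sketch using the standard `Code.ensure_loaded?/1`; the `OptionalBridges` module is hypothetical, and these notes do not say how ExArrow performs its own check internally.

```elixir
defmodule OptionalBridges do
  @moduledoc "Hypothetical helper reporting which optional dependencies are present."

  # Returns a map of booleans saying whether each optional dependency's
  # main module can be loaded in the running VM.
  def available do
    %{
      explorer: Code.ensure_loaded?(Explorer.DataFrame),
      nx: Code.ensure_loaded?(Nx)
    }
  end
end
```

An application could call `OptionalBridges.available/0` at startup and log which bridges it expects to work.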


Upgrade guide from 0.2.0

# mix.exs — bump the version pin
{:ex_arrow, "~> 0.3.0"}

# Optional: add if you use the new bridges
{:explorer, "~> 0.8", optional: true}
{:nx, "~> 0.9", optional: true}

No breaking changes to existing IPC, Flight, or ADBC APIs. All existing calls
continue to work without modification.


Fixes

  • Dialyzer call_without_opaque errors in ExArrow.Explorer and ExArrow.Nx
    — function heads no longer pattern-match on the concrete struct behind
    @opaque types; resource extraction uses the dedicated resource_ref/1
    helpers instead.
  • Credo warnings in new test files (alias ordering, pipe chains, length/1
    vs != []).

What's next (v0.4.0)

v0.4.0 is already released and ships:

  • Arrow C Data Interface (ExArrow.CDI) — zero-copy batch transfer via
    raw C-struct pointers; foundation for a future zero-copy Explorer bridge.
  • ExArrow.Nx.from_tensors/1 — multi-column RecordBatch from a tensor
    map in one call.
  • Lazy Parquet streaming — row groups decoded on demand via
    Stream.next/1; peak memory proportional to the largest row group, not
    the full file.