Skip to content

Releases: Vitruves/nail-parquet

nail-parquet v1.8.0

30 May 13:04

Choose a tag to compare

nail 1.8.0

This release teaches nail to display nested and complex column types properly. Previously head/tail/preview printed raw Arrow type names (ListArray, StructArray, PrimitiveArray<Int8>, timestamp) for anything beyond a handful of scalar types; now they render real values, and nested List/Struct/Map columns expand into a readable tree.

Added

  • -L/--level — expand nested values. head, tail, and preview now expand List/Struct/Map columns instead of printing ListArray/StructArray. The depth budget controls how far they expand: -L 0 collapses each container to a compact tag ({…2 fields}, […3 items]), the default -L 1 shows one level, and higher values go deeper. Cards render an indented tree laid out in the value column (single-element lists drop the - bullet for less noise); --table keeps the compact single-line form; and -f json emits valid, properly-typed nested JSON. Binary blobs always render as <N bytes> regardless of depth, so image/embedding columns are safe to inspect at any level.

Fixed

  • Many column types showed their Arrow type name instead of a value. head/tail/preview now render real values for Int8/Int16/UInt8/UInt16/UInt32, Timestamp (including timezone-aware), Time32/Time64, Decimal128, Dictionary, LargeUtf8, Binary/LargeBinary, FixedSizeList, and other types that used to print PrimitiveArray<…>, timestamp, ListArray, etc. Leaf values now go through Arrow's formatter, so the set of correctly-displayed types matches Arrow's.
  • nail head -f json produced invalid JSON for some strings. Strings containing backslashes or control characters (e.g. LaTeX \angle, embedded newlines) are now properly escaped, and nested columns serialize to real JSON objects/arrays.
  • --table rows could be broken by nested values. Embedded newlines in nested string leaves are now flattened so a cell can never spill past the grid borders.

Changed

  • --table rendering optimized for wide tables. Cell content is capped at 30 characters with truncation, and columns paginate into multiple bordered blocks that each fit the terminal width. A # row-index column repeats on every block so rows stay aligned across pages, a cols A–B of N (page X/Y) header annotates each subsequent block, and column colors stay consistent across blocks.
  • nail correlations --matrix redesigned. Output is now a proper bordered matrix with headers derived from the row labels (so names containing dots/underscores like __index_level_0__ are no longer mangled), per-column auto-sizing, and color-coded cells (green for positive, red for negative, intensity by magnitude, bold white for the 1.0 diagonal).

Example

$ nail head data.parquet -n 1 -L 2
┌──────────────────────────────────────── Record 1 ────────────────────────────────────────
│
│ data_source          : synthetic_python
│ prompt               : content : Write a Python function `get_even_numbers(n)` that returns
│                      :           all even numbers from 1 to `n`.
│                      : role    : user
│ reward_model         : ground_truth : [2, 4, 6, 8, 10]
│                      : style        : rule
│ extra_info           : index     : 0
│                      : split     : train
│                      : synthetic : true
│ images               : - bytes : <10566 bytes>
│                      :   path  : img_diagram.png
└────────────────────────────────────────────────────────────────────────────────────────────

Upgrading

cargo install nail-parquet

Pre-built binaries (linux-musl x86_64/aarch64, macOS x86_64/aarch64, windows-msvc) with SHA256SUMS.txt are attached to this release.

Full Changelog: v1.7.1...v1.8.0

nail-parquet v1.7.1

26 May 17:34

Choose a tag to compare

[1.7.1] - 2026-05-21

Fixed

  • Linux release binaries no longer require glibc ≥ 2.38 (#6).
    Releases built on ubuntu-latest (24.04, glibc 2.39) failed to start on Ubuntu 22.04, Debian 11, RHEL 8, etc. (version 'GLIBC_2.38' not found).
    Linux artifacts are now statically linked against musl — no glibc dependency.
  • Windows stack overflow in many subcommands
    (STATUS_STACK_OVERFLOW, 0xC00000FD). The binary now reserves an 8 MiB stack on Windows targets via .cargo/config.toml, matching Linux/macOS.
  • Clippy lints under Rust 1.95: too_many_arguments, collapsible_match, nonminimal_bool, needless_range_loop.
  • Source formatting drift (cargo fmt) and pinned style via committedrustfmt.toml (hard_tabs = true).

Added

  • nail create math functions — column expressions now support a curated set of nail-flavored functions:
    • Scalar (per-row): abs, sign, floor, ceil, round, trunc, sqrt, cbrt, exp, ln, log, log10, log2, pow, sin, cos, tan, asin, acos, atan, atan2.
    • Aggregate (broadcast to every row, auto-wrapped as OVER ()): mean, sum, min, max, count, median, std/stddev, var/variance, stddev_pop, var_pop.
    • Example: -c "z=(value-mean(value))/std(value)".
    • Quote column names that collide with a function (mean("mean")) or that contain operator characters ("revenue-income"-cost).
  • -c is now repeatable on nail create. Pass multiple -c flags or comma-separate specs within one flag; commas inside pow(a, b) / round(x, 2) are correctly preserved.
  • nail clean — one-shot import cleanup. Snake_case headers, trim string whitespace, drop fully-empty rows by default; --drop-empty-cols opt-in. --keep-headers, --keep-whitespace, --keep-empty-rows to opt out.
  • Stdin/stdout via - on every command. Read with cat data.csv | nail head -, write with nail filter ... -o -. Format auto-detected from --format or by sniffing input bytes (Parquet magic, JSON brace, CSV otherwise). Excel is the only format that still requires a file path.
  • "Did you mean …?" suggestions on column-not-found errors. Powered by Levenshtein distance with length-scaled tolerance.
  • Examples in every --help. Each subcommand now ends its help with 2-3 concrete invocations.
  • Multi-platform release workflow (.github/workflows/release.yml) producing artifacts for:
    • x86_64-unknown-linux-musl (static)
    • aarch64-unknown-linux-musl (static, via cross)
    • x86_64-apple-darwin, aarch64-apple-darwin
    • x86_64-pc-windows-msvc
      Each release ships a SHA256SUMS.txt.

Full Changelog: v1.7.0...v1.7.1

nail-parquet v1.7.0

22 Apr 11:07

Choose a tag to compare

nail-parquet v1.7.0 — 2026-04-22

Major release: richer output, more expressive selectors, shell completions,
and a lighter dependency footprint.

New

  • --table (global): colored columnar output for any command.
  • --batch-size <N> (global): tune DataFusion's rows-per-record-batch.
  • completions <shell>: generate scripts for bash/zsh/fish/powershell/elvish.
    Supports --auto-install (standard user location) and --location <PATH>.
  • filter: OR via |, AND via , (AND binds tighter).
  • select / drop: --type <numeric|integer|float|string|boolean|temporal|binary>,
    combinable with --columns (pattern AND type).
  • search: multi-value OR via | in --value.
  • frequency: --head <N>, --tail <N>, and % column in table/file output.
  • preview: --rows <spec> (e.g. 1,3,5-10), works in interactive mode too.

Fixed

  • Float formatting trims trailing zeros, keeps one decimal (40.00040.0).
  • Better border visibility on dark terminal themes.

Internals

  • Streaming Parquet writes (ArrowWriter + SNAPPY); no full in-memory materialization.
  • Predicate pushdown enabled on reads; default batch sizes 32,768 / 8,192 (with --jobs).
  • reqwestureq for update checks; DataFusion built with minimal features.
  • Added rayon and futures; new GitHub Actions CI; README trimmed.

Upgrade

cargo install nail-parquet

Binaries: https://github.com/Vitruves/nail-parquet/releases

No breaking CLI changes. CommonArgs gained batch_size and table fields
(library embedders should update struct initializers).

nail-parquet v1.6.5

10 Aug 14:37

Choose a tag to compare

Release Notes - Version 1.6.5

New Features & Improvements 🎉

• New sort command: Added comprehensive sorting functionality with support for multiple date types and sorting strategies
• Kendall Tau correlation: Proper implementation replaces the previous approximation derived from Spearman correlation
• Matrix output for correlations: The --matrix command now includes formatted table output for improved readability
• Bug fixes and stability improvements: Various fixes to enhance overall performance and reliability

Looking for Your Input 🤝

As nail-parquet continues to mature with its extensive collection of subcommands and customization options, we're actively seeking fresh ideas from our community. Development has naturally slowed in recent weeks as we've addressed many of the most requested features.

We'd love to hear from you!

Whether you have suggestions for new functionality, improvements to existing features, or ideas we haven't considered, please don't hesitate to:

• Contact us directly
• Open an issue with your improvement suggestions on our repository

Your feedback helps shape the future of nail-parquet and ensures it continues to meet your data processing needs.

Happy data nailing!

v1.6.4

17 Jul 12:56

Choose a tag to compare

nail-parquet v1.6.4 - First Release

A fast, powerful command-line utility for working with Parquet files and other data formats, built with Rust and Apache DataFusion.

Features

Core Functionality:

  • 32+ Commands for comprehensive data manipulation and analysis
  • Multi-format Support: Parquet, CSV, JSON, Excel (read/write)
  • High Performance: Built on Apache DataFusion and Arrow for columnar processing
  • Interactive Mode: Terminal UI with ratatui for data preview and exploration

Key Commands:

  • head, tail, preview - Data inspection
  • stats, correlations, frequency - Statistical analysis
  • filter, select, drop - Data manipulation
  • merge, append, split - Data combination
  • convert, optimize - Format conversion and optimization
  • outliers, dedup - Data quality operations
  • sample, shuffle, binning - Data sampling and analysis

Advanced Features:

  • Parallel Processing: Configurable with --jobs flag
  • Verbose Logging: Debug output with --verbose flag
  • Multiple Output Formats: JSON, CSV, Parquet, formatted text
  • Smart Column Resolution: Case-insensitive column name matching
  • Statistical Operations: Correlation analysis, outlier detection, frequency analysis