Skip to content

Interactive Python TUI for visualizing and analyzing files with multipe formats

License

Notifications You must be signed in to change notification settings

sanspareilsmyn/parqv

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

parqv

Python Version License PyPI version Built with Textual


Supported File Formats: βœ… Parquet | βœ… JSON / JSON Lines (ndjson) | (More planned!)


parqv is a Python-based interactive TUI (Text User Interface) tool designed to explore, analyze, and understand various data file formats directly within your terminal. Initially supporting Parquet and JSON, parqv aims to provide a unified, visual experience for quick data inspection without leaving your console.

πŸ’» Demo

parqv.gif (Demo shows Parquet features; UI adapts for other formats)

πŸ€” Why parqv?

  1. Unified Interface: Launch parqv <your_data_file> to access metadata, schema, data preview, and column statistics all within a single, navigable terminal window. No more juggling different commands for different file types.
  2. Interactive Exploration:
    • πŸ–±οΈ Keyboard & Mouse Driven: Navigate using familiar keys (arrows, hjkl, Tab) or even your mouse (thanks to Textual).
    • πŸ“œ Scrollable Views: Easily scroll through large schemas, data tables, or column lists.
    • 🌲 Clear Schema View: Understand column names, data types, and nullability at a glance. (Complex nested structures visualization might vary by format).
    • πŸ“Š Dynamic Stats: Select a column and instantly see its detailed statistics (counts, nulls, min/max, mean, distinct values, etc.).
  3. Cross-Format Consistency:
    • 🎨 Rich Display: Leverages rich and Textual for colorful, readable tables and text across supported formats.
    • πŸ“ˆ Quick Stats: Get key statistical insights consistently, regardless of the underlying file type.
    • πŸ”Œ Extensible: Designed with a handler interface to easily add support for more file formats in the future (like CSV, Arrow IPC, etc.).

✨ Features (TUI Mode)

  • Multi-Format Support: Currently supports Parquet (.parquet) and JSON/JSON Lines (.json, .ndjson). Run parqv <your_file.{parquet,json,ndjson}>.
  • Metadata Panel: Displays key file information (path, format, size, total rows, column count, etc.). Fields may vary slightly depending on the file format.
  • Schema Explorer:
    • Interactive list view of columns.
    • Clearly shows column names, data types, and nullability.
  • Data Table Viewer:
    • Scrollable table preview of the file's data.
    • Attempts to preserve data types for better representation.
  • Column Statistics Viewer:
    • Select a column in the Schema tab to view detailed statistics.
    • Shows counts (total, valid, null), percentages, and type-specific stats (min/max, mean, stddev, distinct counts, length stats, boolean value counts where applicable).
  • Row Group Inspector (Parquet Specific):
    • This panel only appears when viewing Parquet files.
    • Lists row groups with stats (row count, compressed/uncompressed size).
    • (Planned) Select a row group for more details.

πŸš€ Getting Started

1. Prerequisites:

  • Python: Version 3.10 or higher.
  • pip: The Python package installer.

2. Install parqv:

  • Open your terminal and run:
    pip install parqv
    (This will also install dependencies like textual, pyarrow, pandas, and duckdb)
  • Updating parqv:
    pip install --upgrade parqv

3. Run parqv:

  • Point parqv to your data file:
    #parquet
    parqv /path/to/your/data.parquet
    
    # json
    parqv /path/to/your/data.json
  • The interactive TUI will launch. Use your keyboard (and mouse, if supported by your terminal) to navigate:
    • Arrow Keys / j,k (in lists): Move selection up/down.
    • Tab / Shift+Tab: Cycle focus between the main tab content and potentially other areas. (Focus handling might evolve).
    • Enter (in column list): Select a column to view statistics.
    • View Switching: Use Ctrl+N (Next Tab) and Ctrl+P (Previous Tab) or click on the tabs (Metadata, Schema, Data Preview).
    • Scrolling: Use PageUp / PageDown / Home / End or arrow keys/mouse wheel within scrollable areas (like Schema stats or Data Preview).
    • q / Ctrl+C: Quit parqv.
    • (Help Screen ? is planned)

πŸ“„ License

Licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

About

Interactive Python TUI for visualizing and analyzing files with multipe formats

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published