Shuffle is a command-line tool for working with and transforming delimiter-separated values (DSV) files, such as CSV (RFC 4180), tab-delimited TXT, and so on.
(Shuffle can correctly guess most common delimiters. The file above is pipe-delimited.)
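To give a feel for what delimiter guessing involves, here is a minimal Python sketch. It is not Shuffle's actual algorithm, which isn't described here. The function name `guess_delimiter` and the candidate list are assumptions made for illustration: each candidate is tried in turn, and the winner is the one that splits every sampled row into the same number of fields.

```python
import csv
import io

def guess_delimiter(sample, candidates=(",", "\t", "|", ";")):
    """Pick the candidate that splits every row into the same number of
    fields (more than one), preferring the widest split. Returns None if
    no candidate fits. Illustrative only, not Shuffle's real logic."""
    best, best_width = None, 1
    for cand in candidates:
        # csv.reader respects quoted fields; skip blank lines
        rows = [r for r in csv.reader(io.StringIO(sample), delimiter=cand) if r]
        widths = {len(r) for r in rows}
        if len(widths) == 1:  # every row has the same field count
            width = widths.pop()
            if width > best_width:
                best, best_width = cand, width
    return best

sample = "id|name|city\n1|Ann|Oslo\n2|Bob|Rome\n"
print(guess_delimiter(sample))  # prints |
```

A heuristic like this only needs the first few lines of a file, so it costs next to nothing even on large inputs.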
To see the full list of features, type `shuffle` in the terminal. They include:
- Pretty printing
- Joining, merging, and reordering/subsetting
- Calculating statistics and frequency counts for each column
- Converting files to JSON and SQLite databases (with type-casting!)
Because life is short and RAM isn't cheap. Unlike many other tools, Shuffle is designed for speed and memory efficiency and can take advantage of multi-core processors.
For example, on my machine it takes Python's pandas about 32 seconds to convert a 150MB comma-separated TXT file to a SQLite3 database, compared to 12 seconds for Shuffle. Shuffle also does this with at most 50MB of memory, whereas pandas eats your RAM for breakfast, lunch, and dinner.
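One common way to keep memory flat during a conversion like this is to stream rows into the database in fixed-size batches instead of loading the whole file first. The sketch below shows that idea with Python's standard library only. It is an illustration, not Shuffle's implementation, and `cast`, `csv_to_sqlite`, and `batch_size` are hypothetical names.

```python
import csv
import io
import sqlite3
from itertools import islice

def cast(value):
    """Naive type-casting: try int, then float, else keep the text.
    Empty strings become NULL."""
    if value == "":
        return None
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    return value

def csv_to_sqlite(lines, conn, table, batch_size=10_000):
    """Stream CSV lines into `table`, holding at most one batch in memory.
    Assumes a header row and rows that all match the header's width."""
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    insert = f'INSERT INTO "{table}" VALUES ({marks})'
    total = 0
    while True:
        batch = [tuple(cast(v) for v in row) for row in islice(reader, batch_size)]
        if not batch:
            break
        conn.executemany(insert, batch)
        total += len(batch)
    conn.commit()
    return total

conn = sqlite3.connect(":memory:")
data = io.StringIO("id,name,score\n1,Ann,9.5\n2,Bob,\n")
print(csv_to_sqlite(data, conn, "people", batch_size=1))  # rows inserted
print(conn.execute('SELECT * FROM "people" ORDER BY rowid').fetchall())
```

For a real file, pass `open(path, newline="")` in place of the `StringIO` object. Peak memory then depends on `batch_size`, not on the size of the file.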
Similarly, it takes Shuffle just under 3 seconds to generate summary statistics and frequency counts for this 80MB CSV. On the other hand, CSVKit, a popular Python package, still hasn't finished running even after 4 minutes.
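Summary statistics and frequency counts can both be computed in a single pass over the file, which is part of why this kind of job doesn't have to be slow. The Python sketch below uses Welford's running mean/variance update plus a `Counter` per column. It is only an illustration of the single-pass idea, not a description of how Shuffle does it, and `ColumnSummary` and `summarize` are hypothetical names.

```python
import csv
import io
import math
from collections import Counter

class ColumnSummary:
    """Running count, mean, and variance (Welford's method),
    plus a frequency table of raw values."""
    def __init__(self):
        self.counts = Counter()
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, value):
        self.counts[value] += 1
        try:
            x = float(value)
        except ValueError:
            return  # non-numeric values only feed the frequency table
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance; NaN with fewer than two numeric values."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan

def summarize(lines):
    """One pass over CSV lines; returns {column name: ColumnSummary}."""
    reader = csv.reader(lines)
    header = next(reader)
    summaries = {name: ColumnSummary() for name in header}
    for row in reader:
        for name, value in zip(header, row):
            summaries[name].add(value)
    return summaries

data = io.StringIO("city,temp\nOslo,2\nRome,4\nOslo,6\n")
result = summarize(data)
print(result["city"].counts.most_common(1))  # [('Oslo', 2)]
print(result["temp"].mean, result["temp"].variance())  # 4.0 4.0
```

The numeric state is a handful of floats per column. The frequency tables grow with the number of distinct values, so a column of unique IDs is the worst case for memory.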