A library of utilities for working with CSV, the king of tabular file formats.
Inspired by pdftk and the original csvcut utility (by Joe Germuska and Aaron Bycoffe).
THIS CODE IS VERY ALPHA
- Produce compatibility-oriented CSVs. This means that quoting is done with double-quotes and only when necessary, columns are separated with commas, and lines are terminated with unix style line endings (“\n”).
- Write to STDOUT in order to support piping. As an example, the following would display the top 10 rows, columns 1 and 3 only, of a Excel xls file that has been converted to csv:
in2csv data.xls | csvcut -c 1,3 | head 10
- Read and write rectangular data. Columns without headers are ignored.
- Standardize, standardize, standardize. Floats should end with “.0”, even if they are round. Dates and times should be in ISO8601 format. Etc.
git clone git://github.com/onyxfish/csvkit.git cd csvkit mkvirtualenv --no-site-packages csvkit pip install -r requirements.txt nosetests
For users (until first PyPI build is ready):
git://github.com/onyxfish/csvkit.git cd csvkit python setup.py install
Converts various formats into csv. Currently supported: fixed-width and xls (Excel).
Converting fixed width requires that you provide a schema file with the “-s” option. The schema file should have the following format:
column,start,length name,0,30 birthday,30,10 age,40,3
The header line is required and must match exactly.
All output from in2csv is written to standard out.
Convert a fixed-width file:
in2csv -s examples/testfixed_schema.csv examples/testfixed
Convert an xls file:
Filters and truncates CSV files. Like unix “cut” command, but for tabular data.
Note that csvcut does not include row slicing or filtering, for this you should pipe data to head, tail, or grep.
Print the indices and names of all columns:
csvcut -n examples/testfixed_converted.csv
Extract the first and third columns:
csvcut -c 1,3 examples/testfixed_converted.csv
Extract columns named “integer” and “date”:
csvcut -c integer,date examples/testfixed_converted.csv
Show the first ten values in column 1:
csvcut -c 1 -s examples/testfixed_converted.csv | head -n 10
Show the last value in column 6:
csvcut -c 6 -s examples/testfixed_converted.csv | tail -n 1
Show unique values in the fourth column:
csvcut -c 4 -s examples/testfixed_converted.csv | uniq
Search for rows about Chicago:
csvcut -s examples/testfixed_converted.csv | grep -i chicago
Add line-numbers to the csvcut output, then find the Chicago Tribune:
csvcut -l -s examples/testfixed_converted.csv | grep -i tribune
Generates a SQL “CREATE TABLE” statement for a given CSV file. Supports a variety of SQL dialects (execute “csvsql -h” for a complete list).
Generate a statement in the postgresql dialect:
csvsql -i postgresql examples/testfixed_converted.csv
Cleans a CSV file of common syntax errors. Outputs [basename]out.csv and [basename]err.csv, the former containing all valid rows and the latter containing all error rows along with line numbers and descriptions.
Test a file with known bad rows:
Prints descriptive statistics for all columns in a CSV file:
Merges two or more CSV tables together using a method analogous to SQL JOIN operation. By default it performs an inner join, but full outer, left outer, and right outer are all available via flags. Key columns are specified with the -c flag (either a single column which exists in all tables, or a comma-seperated list of columns with one corresponding to each). If the columns flag is not provided then the tables will be merged “sequentially”, that is they will be attached side-by-side with no concern for ordering or matching.
Stack up the rows from multiple CSV files, optionally adding a grouping value. Each CSV file must have the same number of columns and they must have the same column names.
Contrived example: joining a set of homogoenous files for different years:
csvstack -g 1984,1985 -n year examples/testfixed_converted.csv examples/testfixed_converted.csv
Renders a CSV to the command line in a readable, fixed-width format. If the table is too long, try piping the output to head, tail, more or less.
This is especially useful when piping through other utilities:
csvcut -c 2,3 -l examples/testfixed_converted.csv | csvlook