Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

A library of utilities for working with CSV, the king of tabular file formats.

branch: master
README.textile

csvkit

A library of utilities for working with CSV, the king of tabular file formats.

Inspired by pdftk and the original csvcut utility (by Joe Germuska and Aaron Bycoffe).

THIS CODE IS VERY ALPHA

Principles

  • Produce compatibility-oriented CSVs. This means that quoting is done with double-quotes and only when necessary, columns are separated with commas, and lines are terminated with unix style line endings (“\n”).
  • Write to STDOUT in order to support piping. As an example, the following would display the top 10 rows, columns 1 and 3 only, of a Excel xls file that has been converted to csv:
in2csv data.xls | csvcut -c 1,3 | head 10
  • Read and write rectangular data. Columns without headers are ignored.
  • Standardize, standardize, standardize. Floats should end with “.0”, even if they are round. Dates and times should be in ISO8601 format. Etc.

Installation

For developers:

git clone git://github.com/onyxfish/csvkit.git
cd csvkit
mkvirtualenv --no-site-packages csvkit
pip install -r requirements.txt
nosetests

For users (until first PyPI build is ready):

git://github.com/onyxfish/csvkit.git
cd csvkit
python setup.py install

Usage: in2csv

Converts various formats into csv. Currently supported: fixed-width and xls (Excel).

Converting fixed width requires that you provide a schema file with the “-s” option. The schema file should have the following format:

column,start,length
name,0,30
birthday,30,10
age,40,3

The header line is required and must match exactly.

All output from in2csv is written to standard out.

Convert a fixed-width file:

in2csv -s examples/testfixed_schema.csv examples/testfixed

Convert an xls file:

in2csv examples/test.xls

Usage: csvcut

Filters and truncates CSV files. Like unix “cut” command, but for tabular data.

Note that csvcut does not include row slicing or filtering, for this you should pipe data to head, tail, or grep.

Print the indices and names of all columns:

csvcut -n examples/testfixed_converted.csv

Extract the first and third columns:

csvcut -c 1,3 examples/testfixed_converted.csv

Extract columns named “integer” and “date”:

csvcut -c integer,date examples/testfixed_converted.csv

Show the first ten values in column 1:

csvcut -c 1 -s examples/testfixed_converted.csv | head -n 10

Show the last value in column 6:

csvcut -c 6 -s examples/testfixed_converted.csv | tail -n 1

Show unique values in the fourth column:

csvcut -c 4 -s examples/testfixed_converted.csv | uniq

Search for rows about Chicago:

csvcut -s examples/testfixed_converted.csv | grep -i chicago

Add line-numbers to the csvcut output, then find the Chicago Tribune:

csvcut -l -s examples/testfixed_converted.csv | grep -i tribune

Usage: csvsql

Generates a SQL “CREATE TABLE” statement for a given CSV file. Supports a variety of SQL dialects (execute “csvsql -h” for a complete list).

Generate a statement in the postgresql dialect:

csvsql -i postgresql examples/testfixed_converted.csv

Usage: csvclean

Cleans a CSV file of common syntax errors. Outputs [basename]out.csv and [basename]err.csv, the former containing all valid rows and the latter containing all error rows along with line numbers and descriptions.

Test a file with known bad rows:

csvclean examples/bad.csv

Usage: csvsummary

Prints descriptive statistics for all columns in a CSV file:

csvsummary examples/testfixed_converted.csv

Usage: csvjoin

Merges two or more CSV tables together using a method analogous to SQL JOIN operation. By default it performs an inner join, but full outer, left outer, and right outer are all available via flags. Key columns are specified with the -c flag (either a single column which exists in all tables, or a comma-seperated list of columns with one corresponding to each). If the columns flag is not provided then the tables will be merged “sequentially”, that is they will be attached side-by-side with no concern for ordering or matching.

Usage: csvstack

Stack up the rows from multiple CSV files, optionally adding a grouping value. Each CSV file must have the same number of columns and they must have the same column names.

Contrived example: joining a set of homogoenous files for different years:

csvstack -g 1984,1985 -n year examples/testfixed_converted.csv examples/testfixed_converted.csv

Usage: csvlook

Renders a CSV to the command line in a readable, fixed-width format. If the table is too long, try piping the output to head, tail, more or less.

csvlook examples/testfixed_converted.csv

This is especially useful when piping through other utilities:

csvcut -c 2,3 -l examples/testfixed_converted.csv | csvlook

Authors

Christopher Groskopf
Joe Germuska
Aaron Bycoffe

License

MIT

Something went wrong with that request. Please try again.