
csvstat

Description

Prints descriptive statistics for all columns in a CSV file. It infers the type of each column and then prints the analysis relevant to that type (ranges for dates, mean and median for integers, etc.):

usage: csvstat [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
               [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H]
               [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [--csv] [-n]
               [-c COLUMNS] [--type] [--nulls] [--unique] [--min] [--max]
               [--sum] [--mean] [--median] [--stdev] [--len] [--freq]
               [--freq-count FREQ_COUNT] [--count] [-y SNIFF_LIMIT]
               [FILE]

Print descriptive statistics for each column in a CSV file.

positional arguments:
  FILE                  The CSV file to operate on. If omitted, will accept
                        input as piped data via STDIN.

optional arguments:
  -h, --help            show this help message and exit
  --csv                 Output results as a CSV, rather than text.
  -n, --names           Display column names and indices from the input CSV
                        and exit.
  -c COLUMNS, --columns COLUMNS
                        A comma separated list of column indices, names or
                        ranges to be examined, e.g. "1,id,3-5". Defaults to
                        all columns.
  --type                Only output data type.
  --nulls               Only output whether columns contains nulls.
  --unique              Only output counts of unique values.
  --min                 Only output smallest values.
  --max                 Only output largest values.
  --sum                 Only output sums.
  --mean                Only output means.
  --median              Only output medians.
  --stdev               Only output standard deviations.
  --len                 Only output the length of the longest values.
  --freq                Only output lists of frequent values.
  --freq-count FREQ_COUNT
                        The maximum number of frequent values to display.
  --count               Only output total row count.
  -y SNIFF_LIMIT, --snifflimit SNIFF_LIMIT
                        Limit CSV dialect sniffing to the specified number of
                        bytes. Specify "0" to disable sniffing entirely.

See also: :doc:`../common_arguments`.
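
The ``-c`` argument mixes indices, names and ranges in one list. The sketch below shows one way such a spec could be resolved to column positions. ``parse_columns`` is a hypothetical helper, not csvkit's own parser; it assumes indices are 1-based (as in the numbered output shown in the examples) and that column names contain no commas.

```python
def parse_columns(spec, header):
    """Resolve a spec such as "1,id,3-5" to zero-based column positions.

    Hypothetical helper for illustration only; assumes 1-based indices.
    """
    positions = []
    for part in spec.split(","):
        part = part.strip()
        if part.isdigit():
            # A bare number is a 1-based column index.
            positions.append(int(part) - 1)
        elif "-" in part and all(p.strip().isdigit() for p in part.split("-", 1)):
            # "3-5" is an inclusive range of 1-based indices.
            start, end = (int(p) for p in part.split("-", 1))
            positions.extend(range(start - 1, end))
        else:
            # Anything else is treated as a column name.
            positions.append(header.index(part))
    return positions


header = ["state", "id", "code", "active", "reserve", "total"]
print(parse_columns("1,id,3-5", header))
```

A name that itself contains a hyphen, such as ``Montgomery GI Bill-Active Duty``, falls through to the name lookup because its two halves are not both numeric.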

Examples

Basic use:

csvstat examples/realdata/FY09_EDU_Recipients_by_State.csv
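
As a rough picture of the type-dependent analysis described above, the following sketch infers a type for each column and computes only the statistics that make sense for it. It uses the Python standard library and is not csvstat's implementation; the ``infer`` and ``describe`` helpers, the order in which types are tried, and the sample data are all assumptions made for illustration.

```python
import csv
import io
import statistics
from datetime import date


def infer(values):
    """Return (type_name, converted_values) for the non-empty values.

    Tries int, then float, then ISO dates; falls back to text.
    """
    present = [v for v in values if v != ""]
    if not present:
        return "text", []
    for name, convert in (("int", int), ("float", float), ("date", date.fromisoformat)):
        try:
            return name, [convert(v) for v in present]
        except ValueError:
            continue
    return "text", present


def describe(values):
    """Summarize one column with statistics relevant to its inferred type."""
    kind, converted = infer(values)
    summary = {"type": kind, "nulls": len(converted) < len(values)}
    if kind in ("int", "float"):
        summary.update(
            min=min(converted),
            max=max(converted),
            mean=statistics.mean(converted),
            median=statistics.median(converted),
        )
    elif kind == "date":
        # For dates only the range is reported.
        summary.update(min=min(converted), max=max(converted))
    else:
        # For text, report the length of the longest value.
        summary["len"] = max((len(v) for v in converted), default=0)
    return summary


# Made-up sample data, not taken from the example file.
SAMPLE = "state,code,joined\nAlabama,1,2009-01-15\nAlaska,2,2009-03-02\nArizona,4,\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for column in rows[0]:
    print(column, describe([row[column] for row in rows]))
```

Empty cells are treated as nulls and excluded before conversion, so a single blank value does not turn a numeric column into text.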

When a statistic name is passed, only that stat will be printed:

csvstat --min examples/realdata/FY09_EDU_Recipients_by_State.csv

    1. State Name: None
    2. State Abbreviate: None
    3. Code: 1
    4. Montgomery GI Bill-Active Duty: 435
    5. Montgomery GI Bill- Selective Reserve: 48
    6. Dependents' Educational Assistance: 118
    7. Reserve Educational Assistance Program: 60
    8. Post-Vietnam Era Veteran's Educational Assistance Program: 1
    9. TOTAL: 768
   10. j: None

If a single stat and a single column are requested, only a value will be returned:

csvstat -c 4 --mean examples/realdata/FY09_EDU_Recipients_by_State.csv

6,263.904
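
The two output shapes shown above (a numbered list for several columns, a bare value for one) can be sketched as follows. ``render`` is a hypothetical function written for this page, and the index padding width is an assumption; it only mirrors the behavior described here and is not csvstat's code.

```python
def render(stat_by_column):
    """Format one statistic for the selected columns.

    stat_by_column is a list of (index, name, value) tuples with 1-based
    indices. A single column yields just the value; several columns yield
    one numbered line each.
    """
    if len(stat_by_column) == 1:
        return str(stat_by_column[0][2])
    return "\n".join(
        f"{index:3d}. {name}: {value}" for index, name, value in stat_by_column
    )


# Several columns: numbered lines, as in the --min example.
print(render([(1, "State Name", None), (3, "Code", 1), (9, "TOTAL", 768)]))

# One column and one stat: only the value, as in the -c 4 --mean example.
print(render([(4, "Montgomery GI Bill-Active Duty", "6,263.904")]))
```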