Remove $ shell prompt from docs, related #418
James McKinney committed Jan 23, 2016
1 parent 2edf1b1 commit 0922802
Showing 17 changed files with 69 additions and 69 deletions.
2 changes: 1 addition & 1 deletion docs/scripts/csvclean.rst
@@ -30,7 +30,7 @@ Examples

Test a file with known bad rows::

- $ csvclean -n examples/bad.csv
+ csvclean -n examples/bad.csv

Line 3: Expected 3 columns, found 4 columns
Line 4: Expected 3 columns, found 2 columns
8 changes: 4 additions & 4 deletions docs/scripts/csvcut.rst
@@ -42,7 +42,7 @@ Examples

Print the indices and names of all columns::

- $ csvcut -n examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvcut -n examples/realdata/FY09_EDU_Recipients_by_State.csv
1: State Name
2: State Abbreviate
3: Code
@@ -56,13 +56,13 @@ Print the indices and names of all columns::

Extract the first and third columns::

- $ csvcut -c 1,3 examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvcut -c 1,3 examples/realdata/FY09_EDU_Recipients_by_State.csv

Extract columns named "TOTAL" and "State Name" (in that order)::

- $ csvcut -c TOTAL,"State Name" examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvcut -c TOTAL,"State Name" examples/realdata/FY09_EDU_Recipients_by_State.csv

Add line numbers to a file, making no other changes::

- $ csvcut -l examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvcut -l examples/realdata/FY09_EDU_Recipients_by_State.csv

4 changes: 2 additions & 2 deletions docs/scripts/csvformat.rst
@@ -51,9 +51,9 @@ Examples

Convert "standard" CSV file to a pipe-delimited one::

- $ csvformat -D "|" examples/dummy.csv
+ csvformat -D "|" examples/dummy.csv

Convert to ridiculous line-endings::

- $ csvformat -M "\r" examples/dummy.csv
+ csvformat -M "\r" examples/dummy.csv

4 changes: 2 additions & 2 deletions docs/scripts/csvgrep.rst
@@ -48,9 +48,9 @@ Examples

Search for the row relating to Illinois::

- $ csvgrep -c 1 -m ILLINOIS examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvgrep -c 1 -m ILLINOIS examples/realdata/FY09_EDU_Recipients_by_State.csv

Search for rows relating to states with names beginning with the letter "I"::

- $ csvgrep -c 1 -r "^I" examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvgrep -c 1 -r "^I" examples/realdata/FY09_EDU_Recipients_by_State.csv

4 changes: 2 additions & 2 deletions docs/scripts/csvjson.rst
@@ -48,7 +48,7 @@ Examples

Convert the veteran's education dataset to JSON keyed by state abbreviation::

- $ csvjson -k "State Abbreviate" -i 4 examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvjson -k "State Abbreviate" -i 4 examples/realdata/FY09_EDU_Recipients_by_State.csv

Results in a JSON document like::

@@ -72,7 +72,7 @@ Results in a JSON document like::

Converting locations of public art into GeoJSON::

- $ csvjson --lat latitude --lon longitude --k slug --crs EPSG:4269 -i 4 examples/test_geo.csv
+ csvjson --lat latitude --lon longitude --k slug --crs EPSG:4269 -i 4 examples/test_geo.csv

Results in a GeoJSON document like::

4 changes: 2 additions & 2 deletions docs/scripts/csvlook.rst
@@ -32,8 +32,8 @@ Examples

Basic use::

- $ csvlook examples/testfixed_converted.csv
+ csvlook examples/testfixed_converted.csv

This utility is especially useful as a final operation when piping through other utilities::

- $ csvcut -c 9,1 examples/realdata/FY09_EDU_Recipients_by_State.csv | csvlook
+ csvcut -c 9,1 examples/realdata/FY09_EDU_Recipients_by_State.csv | csvlook
4 changes: 2 additions & 2 deletions docs/scripts/csvpy.rst
@@ -35,14 +35,14 @@ Examples

Basic use::

- $ csvpy examples/dummy.csv
+ csvpy examples/dummy.csv
Welcome! "examples/dummy.csv" has been loaded in a reader object named "reader".
>>> reader.next()
[u'a', u'b', u'c']

As a dictionary::

- $ csvpy --dict examples/dummy.csv -v
+ csvpy --dict examples/dummy.csv -v
Welcome! "examples/dummy.csv" has been loaded in a DictReader object named "reader".
>>> reader.next()
{u'a': u'1', u'c': u'3', u'b': u'2'}
4 changes: 2 additions & 2 deletions docs/scripts/csvsort.rst
@@ -39,8 +39,8 @@ Examples

Sort the veteran's education benefits table by the "TOTAL" column::

- $ cat examples/realdata/FY09_EDU_Recipients_by_State.csv | csvsort -c 9
+ cat examples/realdata/FY09_EDU_Recipients_by_State.csv | csvsort -c 9

View the five states with the most individuals claiming veteran's education benefits::

- $ cat examples/realdata/FY09_EDU_Recipients_by_State.csv | csvcut -c 1,9 | csvsort -r -c 2 | head -n 5
+ cat examples/realdata/FY09_EDU_Recipients_by_State.csv | csvcut -c 1,9 | csvsort -r -c 2 | head -n 5
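
The same "sort descending, take the top N" pipeline shape can be sketched with plain coreutils on toy data. This is only an approximation: naive `sort`/`tail` are blind to CSV quoting, which is exactly what `csvsort` handles for you. The file name and values here are hypothetical.

```shell
# Sketch of the "sort descending, take the top N" pattern on toy data.
# sort(1) breaks on quoted fields containing commas; csvsort exists to
# handle real CSV quoting, but the pipeline shape is identical.
cat > /tmp/states.csv <<'EOF'
state,total
NE,100
CA,900
TX,500
EOF
tail -n +2 /tmp/states.csv | sort -t, -k2,2 -nr | head -n 2
```

This prints `CA,900` then `TX,500`.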
14 changes: 7 additions & 7 deletions docs/scripts/csvsql.rst
@@ -66,22 +66,22 @@ Examples

Generate a statement in the PostgreSQL dialect::

- $ csvsql -i postgresql examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvsql -i postgresql examples/realdata/FY09_EDU_Recipients_by_State.csv

Create a table and import data from the CSV directly into Postgres::

- $ createdb test
- $ csvsql --db postgresql:///test --table fy09 --insert examples/realdata/FY09_EDU_Recipients_by_State.csv
+ createdb test
+ csvsql --db postgresql:///test --table fy09 --insert examples/realdata/FY09_EDU_Recipients_by_State.csv

For large tables it may not be practical to process the entire table. One solution is to analyze a sample of the table. In this case it can be useful to turn off length limits and null checks with the ``--no-constraints`` option::

- $ head -n 20 examples/realdata/FY09_EDU_Recipients_by_State.csv | csvsql --no-constraints --table fy09
+ head -n 20 examples/realdata/FY09_EDU_Recipients_by_State.csv | csvsql --no-constraints --table fy09

Create tables for an entire folder of CSVs and import data from those files directly into Postgres::

- $ createdb test
- $ csvsql --db postgresql:///test --insert examples/*.csv
+ createdb test
+ csvsql --db postgresql:///test --insert examples/*.csv

You can also use CSVSQL to "directly" query one or more CSV files. Please note that this will create an in-memory SQL database, so it won't be very fast::

- $ csvsql --query "select m.usda_id, avg(i.sepal_length) as mean_sepal_length from iris as i join irismeta as m on (i.species = m.species) group by m.species" examples/iris.csv examples/irismeta.csv
+ csvsql --query "select m.usda_id, avg(i.sepal_length) as mean_sepal_length from iris as i join irismeta as m on (i.species = m.species) group by m.species" examples/iris.csv examples/irismeta.csv
2 changes: 1 addition & 1 deletion docs/scripts/csvstack.rst
@@ -39,4 +39,4 @@ Examples

Contrived example: joining a set of homogeneous files for different years::

- $ csvstack -g 2009,2010 examples/realdata/FY09_EDU_Recipients_by_State.csv examples/realdata/Datagov_FY10_EDU_recp_by_State.csv
+ csvstack -g 2009,2010 examples/realdata/FY09_EDU_Recipients_by_State.csv examples/realdata/Datagov_FY10_EDU_recp_by_State.csv
6 changes: 3 additions & 3 deletions docs/scripts/csvstat.rst
@@ -47,11 +47,11 @@ Examples

Basic use::

- $ csvstat examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvstat examples/realdata/FY09_EDU_Recipients_by_State.csv

When a statistic name is passed, only that stat will be printed::

- $ csvstat --freq examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvstat --freq examples/realdata/FY09_EDU_Recipients_by_State.csv

1. State Name: None
2. State Abbreviate: None
@@ -66,7 +66,7 @@ When a statistic name is passed, only that stat will be printed::
If a single stat *and* a single column are requested, only a value will be returned::

- $ csvstat -c 4 --freq examples/realdata/FY09_EDU_Recipients_by_State.csv
+ csvstat -c 4 --freq examples/realdata/FY09_EDU_Recipients_by_State.csv

3548.0
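
A bare value like this composes nicely with shell command substitution. A sketch of the pattern, with `awk` standing in for the `csvstat` stage so it runs anywhere (the file name and value are hypothetical; with csvkit installed the `awk` line would be the `csvstat -c 4 --freq` call above):

```shell
# Because the single-stat/single-column form prints a bare value, it can
# feed straight into a shell variable. awk stands in for csvstat here.
cat > /tmp/fy09_sample.csv <<'EOF'
state,total
NE,3548
EOF
top=$(awk -F, 'NR==2 {print $2}' /tmp/fy09_sample.csv)
echo "most frequent value: $top"
```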

12 changes: 6 additions & 6 deletions docs/scripts/in2csv.rst
@@ -60,7 +60,7 @@ Examples

Convert the 2000 census geo headers file from fixed-width to CSV and from Latin-1 encoding to UTF-8::

- $ in2csv -e iso-8859-1 -f fixed -s examples/realdata/census_2000/census2000_geo_schema.csv examples/realdata/census_2000/usgeo_excerpt.upl > usgeo.csv
+ in2csv -e iso-8859-1 -f fixed -s examples/realdata/census_2000/census2000_geo_schema.csv examples/realdata/census_2000/usgeo_excerpt.upl > usgeo.csv

.. note::

@@ -70,20 +70,20 @@ Convert the 2000 census geo headers file from fixed-width to CSV and from latin-

Convert an Excel .xls file::

- $ in2csv examples/test.xls
+ in2csv examples/test.xls

Standardize the formatting of a CSV file (quoting, line endings, etc.)::

- $ in2csv examples/realdata/FY09_EDU_Recipients_by_State.csv
+ in2csv examples/realdata/FY09_EDU_Recipients_by_State.csv

Fetch csvkit's open issues from the Github API, convert the JSON response into a CSV and write it to a file::

- $ curl https://api.github.com/repos/onyxfish/csvkit/issues?state=open | in2csv -f json -v > issues.csv
+ curl https://api.github.com/repos/onyxfish/csvkit/issues?state=open | in2csv -f json -v > issues.csv

Convert a DBase DBF file to an equivalent CSV::

- $ in2csv examples/testdbf.dbf > testdbf_converted.csv
+ in2csv examples/testdbf.dbf > testdbf_converted.csv

Fetch the ten most recent robberies in Oakland, convert the GeoJSON response into a CSV and write it to a file::

- $ curl "http://oakland.crimespotting.org/crime-data?format=json&type=robbery&count=10" | in2csv -f geojson > robberies.csv
+ curl "http://oakland.crimespotting.org/crime-data?format=json&type=robbery&count=10" | in2csv -f geojson > robberies.csv

12 changes: 6 additions & 6 deletions docs/scripts/sql2csv.rst
@@ -34,16 +34,16 @@ Examples

Load sample data into a table using :doc:`csvsql` and then query it using `sql2csv`::

- $ csvsql --db "sqlite:///dummy.db" --table "test" --insert examples/dummy.csv
- $ sql2csv --db "sqlite:///dummy.db" --query "select * from test"
+ csvsql --db "sqlite:///dummy.db" --table "test" --insert examples/dummy.csv
+ sql2csv --db "sqlite:///dummy.db" --query "select * from test"

Load data about financial aid recipients into PostgreSQL. Then find the three states that received the most, while also filtering out empty rows::

- $ createdb recipients
- $ csvsql --db "postgresql:///recipients" --table "fy09" --insert examples/realdata/FY09_EDU_Recipients_by_State.csv
- $ sql2csv --db "postgresql:///recipients" --query "select * from fy09 where \"State Name\" != '' order by fy09.\"TOTAL\" limit 3"
+ createdb recipients
+ csvsql --db "postgresql:///recipients" --table "fy09" --insert examples/realdata/FY09_EDU_Recipients_by_State.csv
+ sql2csv --db "postgresql:///recipients" --query "select * from fy09 where \"State Name\" != '' order by fy09.\"TOTAL\" limit 3"

You can even use it as a simple SQL calculator (in this example an in-memory sqlite database is used as the default)::

- $ sql2csv --query "select 300 * 47 % 14 * 27 + 7000"
+ sql2csv --query "select 300 * 47 % 14 * 27 + 7000"
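
As a sanity check on that expression, shell arithmetic applies the same left-to-right evaluation for `*` and `%` (equal precedence) that SQLite uses here:

```shell
# 300 * 47 = 14100; 14100 % 14 = 2; 2 * 27 = 54; 54 + 7000 = 7054.
# (* and % share precedence and associate left-to-right, matching SQLite.)
echo $(( 300 * 47 % 14 * 27 + 7000 ))
```

This prints `7054`, the same value the query returns.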

12 changes: 6 additions & 6 deletions docs/tricks.rst
@@ -7,8 +7,8 @@ Reading compressed CSVs

csvkit has built-in support for reading ``gzip`` or ``bz2`` compressed input files. This is automatically detected based on the file extension. For example::

- $ csvstat examples/dummy.csv.gz
- $ csvstat examples/dummy.csv.bz2
+ csvstat examples/dummy.csv.gz
+ csvstat examples/dummy.csv.bz2

Please note, the files are decompressed in memory, so this is a convenience, not an optimization.
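
Because detection is extension-based, a round trip needs no extra flags. A sketch with plain `gzip` (assumed available) showing the decompressed stream that a tool like `csvstat` would operate on; the file path is hypothetical:

```shell
# csvkit picks the codec from the .gz / .bz2 extension, so reading a
# compressed file needs no flags. This shows the decompressed bytes
# a csvkit tool would see.
printf 'a,b\n1,2\n' > /tmp/dummy.csv
gzip -f /tmp/dummy.csv          # leaves /tmp/dummy.csv.gz
gunzip -c /tmp/dummy.csv.gz     # the stream csvstat would read
```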

@@ -17,11 +17,11 @@ Specifying STDIN as a file

Most tools default to ``STDIN`` if no filename is specified, but tools like :doc:`scripts/csvjoin` and :doc:`scripts/csvstack` accept multiple files, so this is not possible. To work around this it is also possible to specify ``STDIN`` by using ``-`` as a filename. For example, these three commands are functionally identical::

- $ csvstat examples/dummy.csv
- $ cat examples/dummy.csv | csvstat
- $ cat examples/dummy.csv | csvstat -
+ csvstat examples/dummy.csv
+ cat examples/dummy.csv | csvstat
+ cat examples/dummy.csv | csvstat -

This convention allows you to, for instance, ``csvstack`` input on ``STDIN`` with another file::

- $ cat ~/src/csvkit/examples/dummy.csv | csvstack ~/src/csvkit/examples/dummy3.csv -
+ cat ~/src/csvkit/examples/dummy.csv | csvstack ~/src/csvkit/examples/dummy3.csv -
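
The ``-`` filename is the same convention ``cat`` itself follows, so the idea can be demonstrated without csvkit installed (toy files below are hypothetical):

```shell
# "-" means "read STDIN at this position in the argument list" — the
# same convention cat(1) uses, which is what lets you mix piped input
# with named files.
printf 'a,b\n1,2\n' > /tmp/dummy3.csv
printf 'a,b\n9,9\n' | cat /tmp/dummy3.csv -
```

The named file's two lines print first, then the two lines from the pipe.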

6 changes: 3 additions & 3 deletions docs/tutorial/2_examining_the_data.rst
@@ -11,7 +11,7 @@ In the previous section we saw how we could use ``csvlook`` and ``csvcut`` to pe

Let's examine summary statistics for some selected columns from our data (remember you can use ``csvcut -n data.csv`` to see the columns in the data)::

- $ csvcut -c county,acquisition_cost,ship_date data.csv | csvstat
+ csvcut -c county,acquisition_cost,ship_date data.csv | csvstat
1. county
<type 'unicode'>
Nulls: False
@@ -69,7 +69,7 @@ csvgrep: find the data you need

After reviewing the summary statistics you might wonder what equipment was received by a particular county. To get a simple answer to the question we can use :doc:`/scripts/csvgrep` to search for the county's name amongst the rows. Let's also use ``csvcut`` to just look at the columns we care about and ``csvlook`` to format the output::

- $ csvcut -c county,item_name,total_cost data.csv | csvgrep -c county -m LANCASTER | csvlook
+ csvcut -c county,item_name,total_cost data.csv | csvgrep -c county -m LANCASTER | csvlook
|------------+--------------------------------+-------------|
| county | item_name | total_cost |
|------------+--------------------------------+-------------|
@@ -100,7 +100,7 @@ csvsort: order matters

Now let's use :doc:`/scripts/csvsort` to sort the rows by the ``total_cost`` column, in reverse (descending) order::

- $ csvcut -c county,item_name,total_cost data.csv | csvgrep -c county -m LANCASTER | csvsort -c total_cost -r | csvlook
+ csvcut -c county,item_name,total_cost data.csv | csvgrep -c county -m LANCASTER | csvsort -c total_cost -r | csvlook
|------------+--------------------------------+-------------|
| county | item_name | total_cost |
|------------+--------------------------------+-------------|
26 changes: 13 additions & 13 deletions docs/tutorial/3_power_tools.rst
@@ -7,11 +7,11 @@ csvjoin: merging related data

One of the most common operations that we need to perform on data is "joining" it to other, related data. For instance, given a dataset about equipment supplied to counties in Nebraska, one might reasonably want to merge that with a dataset containing the population of each county. :doc:`/scripts/csvjoin` allows us to take those two datasets (equipment and population) and merge them, much like you might do with a SQL ``JOIN`` query. In order to demonstrate this, let's grab a second dataset::

- $ curl -L -O https://github.com/onyxfish/csvkit/raw/master/examples/realdata/acs2012_5yr_population.csv
+ curl -L -O https://github.com/onyxfish/csvkit/raw/master/examples/realdata/acs2012_5yr_population.csv

Now let's see what's in there::

- $ csvstat acs2012_5yr_population.csv
+ csvstat acs2012_5yr_population.csv
1. fips
<type 'int'>
Nulls: False
@@ -58,11 +58,11 @@ Now let's see what's in there::

As you can see, this data file contains population estimates for each county in Nebraska from the 2012 5-year ACS estimates. This data was retrieved from `Census Reporter <http://censusreporter.org/>`_ and reformatted slightly for this example. Let's join it to our equipment data::

- $ csvjoin -c fips data.csv acs2012_5yr_population.csv > joined.csv
+ csvjoin -c fips data.csv acs2012_5yr_population.csv > joined.csv
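
The underlying operation is the classic relational join on a shared key. Coreutils `join(1)` sketches it on toy data (toy files and values are hypothetical); note that `join(1)` requires sorted input and, unlike `csvjoin`, is blind to CSV quoting:

```shell
# Toy version of the fips-keyed join using coreutils join(1).
# Real CSVs need csvjoin: join(1) assumes sorted input and ignores quoting.
cat > /tmp/equip.csv <<'EOF'
31001,RIFLE
31002,TRUCK
EOF
cat > /tmp/pop.csv <<'EOF'
31001,4500
31002,9000
EOF
join -t, /tmp/equip.csv /tmp/pop.csv
```

Each output row carries the key once, followed by the remaining fields from both files, e.g. `31001,RIFLE,4500`.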

Since both files contain a fips column, we can use that to join the two. In our output you should see the population data appended at the end of each row of data. Let's combine this with what we've learned before to answer the question "What was the lowest population county to receive equipment?"::

- $ csvcut -c county,item_name,total_population joined.csv | csvsort -c total_population | csvlook | head
+ csvcut -c county,item_name,total_population joined.csv | csvsort -c total_population | csvlook | head
|-------------+----------------------------------------------------------------+-------------------|
| county | item_name | total_population |
|-------------+----------------------------------------------------------------+-------------------|
@@ -81,19 +81,19 @@ csvstack: combining subsets

Frequently large datasets are distributed in many small files. At some point you will probably want to merge those files for aggregate analysis. :doc:`/scripts/csvstack` allows you to "stack" the rows from CSV files with identical headers. To demonstrate, let's imagine we've decided that Nebraska and Kansas form a "region" and that it would be useful to analyze them in a single dataset. Let's grab the Kansas data::

- $ curl -L -O https://github.com/onyxfish/csvkit/raw/master/examples/realdata/ks_1033_data.csv
+ curl -L -O https://github.com/onyxfish/csvkit/raw/master/examples/realdata/ks_1033_data.csv

Back in :doc:`1_getting_started`, we used ``in2csv`` to convert our Nebraska data from XLSX to CSV. However, we named our output ``data.csv`` for simplicity at the time. Now that we are going to be stacking multiple states, we should re-convert our Nebraska data using a file naming convention matching our Kansas data::

- $ in2csv ne_1033_data.xlsx > ne_1033_data.csv
+ in2csv ne_1033_data.xlsx > ne_1033_data.csv

Now let's stack these two data files::

- $ csvstack ne_1033_data.csv ks_1033_data.csv > region.csv
+ csvstack ne_1033_data.csv ks_1033_data.csv > region.csv

Using ``csvstat`` we can see that our ``region.csv`` contains both datasets::

- $ csvstat -c state,acquisition_cost region.csv
+ csvstat -c state,acquisition_cost region.csv
1. state
<type 'unicode'>
Nulls: False
@@ -126,7 +126,7 @@ Sometimes (almost always), the command line isn't enough. It would be crazy to t

By default, ``csvsql`` will generate a create table statement for your data. You can specify what sort of database you are using with the ``-i`` flag::

- $ csvsql -i sqlite joined.csv
+ csvsql -i sqlite joined.csv
CREATE TABLE joined (
state VARCHAR(2) NOT NULL,
county VARCHAR(10) NOT NULL,
@@ -151,19 +151,19 @@ Here we have the sqlite "create table" statement for our joined data. You'll see

Often you won't care about storing the SQL statements locally. You can also use ``csvsql`` to create the table directly in the database on your local machine. If you add the ``--insert`` option the data will also be imported::

- $ csvsql --db sqlite:///leso.db --insert joined.csv
+ csvsql --db sqlite:///leso.db --insert joined.csv

How can we check that our data was imported successfully? We could use the sqlite command line interface, but rather than worry about the specifics of another tool, we can also use ``sql2csv``::

- $ sql2csv --db sqlite:///leso.db --query "select * from joined"
+ sql2csv --db sqlite:///leso.db --query "select * from joined"

Note that the ``--query`` parameter to ``sql2csv`` accepts any SQL query. For example, to export Douglas county from the ``joined`` table from our sqlite database, we would run::

- $ sql2csv --db sqlite:///leso.db --query "select * from joined where county='DOUGLAS';" > douglas.csv
+ sql2csv --db sqlite:///leso.db --query "select * from joined where county='DOUGLAS';" > douglas.csv

Sometimes, if you will only be running a single query, even constructing the database is a waste of time. For that case, you can actually skip the database entirely and ``csvsql`` will create one in memory for you::

- $ csvsql --query "select county,item_name from joined where quantity > 5;" joined.csv | csvlook
+ csvsql --query "select county,item_name from joined where quantity > 5;" joined.csv | csvlook

SQL queries directly on CSVs! Keep in mind that this loads the entire dataset into an in-memory database, so it is likely to be very slow for large datasets.

