Merge pull request #418 from cathydeng/master
remove $ for more beginner friendly docs
James McKinney committed Jan 23, 2016
2 parents 345a7c2 + ca6f6b7 commit 2edf1b1
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions docs/tutorial/1_getting_started.rst
@@ -16,7 +16,7 @@ Installing csvkit

Installing csvkit is easy::

- $ sudo pip install csvkit
+ sudo pip install csvkit

If you have problems installing, check out the common issues described in the :doc:`../install` section of the full documentation.

@@ -29,23 +29,23 @@ Getting the data

Let's start by creating a clean workspace::

- $ mkdir csvkit_tutorial
- $ cd csvkit_tutorial
+ mkdir csvkit_tutorial
+ cd csvkit_tutorial

Now let's fetch the data::

- $ curl -L -O https://github.com/onyxfish/csvkit/raw/master/examples/realdata/ne_1033_data.xlsx
+ curl -L -O https://github.com/onyxfish/csvkit/raw/master/examples/realdata/ne_1033_data.xlsx

in2csv: the Excel killer
========================

For purposes of this tutorial, I've converted this data to Excel format. (NPR published it in CSV format.) If you have Excel you can open the file and take a look at it, but really, who wants to wait for Excel to load? Instead, let's make it a CSV::

- $ in2csv ne_1033_data.xlsx
+ in2csv ne_1033_data.xlsx

You should see a CSV version of the data dumped into your terminal. All csvkit utilities write to the terminal output ("standard out") by default. This isn't very useful, so let's write it to a file instead::

- $ in2csv ne_1033_data.xlsx > data.csv
+ in2csv ne_1033_data.xlsx > data.csv

``data.csv`` will now contain a CSV version of our original file. If you aren't familiar with the ``>`` syntax, it literally means "redirect standard out to a file", but it may be more convenient to think of it as "save".
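If you'd like to see ``>`` in action without touching your tutorial files, here is a minimal sketch. It assumes a POSIX shell with ``mktemp``; ``printf`` stands in for ``in2csv``, and the file name ``sample.csv`` and its rows are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your tutorial folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# printf stands in for in2csv here: it writes two CSV lines to standard out.
# The ">" sends that output to sample.csv instead of the terminal.
printf 'county,quantity\nADAMS,1\n' > sample.csv

# Running the same kind of redirect again replaces the file's contents
# rather than adding to them.
printf 'county,quantity\nBUFFALO,8\n' > sample.csv

# Show what was saved.
cat sample.csv
```

Note that ``>`` overwrites an existing file, so "save" here really means "save over".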

@@ -56,7 +56,7 @@ csvlook: data periscope

Now that we have some data, we probably want to get some idea of what's in it. We could open it in Excel or Google Docs, but wouldn't it be nice if we could just take a look in the command line? Enter csvlook::

- $ csvlook data.csv
+ csvlook data.csv

At first the output of :doc:`/scripts/csvlook` isn't going to appear very promising. You'll see a mess of data, pipe characters and dashes. That's because this dataset has many columns and they won't all fit in the terminal at once. To fix this we need to learn how to reduce our dataset before we look at it.

@@ -65,7 +65,7 @@ csvcut: data scalpel

:doc:`/scripts/csvcut` is the original csvkit tool, the one that started the whole thing. With it, we can slice, delete and reorder the columns in our CSV. First, let's just see what columns are in our data::

- $ csvcut -n data.csv
+ csvcut -n data.csv
1: state
2: county
3: fips
@@ -83,28 +83,28 @@ csvcut: data scalpel

As you can see, our dataset has fourteen columns. Let's take a look at just columns ``2``, ``5`` and ``6``::

- $ csvcut -c 2,5,6 data.csv
+ csvcut -c 2,5,6 data.csv

Now we've reduced our output CSV to only three columns.
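To get a feel for what numbering and selecting columns means, here is a rough sketch using only standard Unix tools. It is not how csvkit works internally: ``cut`` splits on every comma, so it only behaves on simple data with no quoted commas. The four-column ``sample.csv`` and its rows are made up for illustration.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical four-column sample; the real dataset has fourteen columns.
printf 'state,county,item_name,quantity\nNE,ADAMS,RIFLE,1\nNE,BUFFALO,TRUCK,2\n' > sample.csv

# Rough analogue of "csvcut -n": put each header name on its own line
# and number the lines.
head -n 1 sample.csv | tr ',' '\n' | nl

# Rough analogue of "csvcut -c 2,4": keep the second and fourth columns.
cut -d, -f2,4 sample.csv > subset.csv
cat subset.csv
```

Unlike this sketch, ``csvcut`` parses the file as real CSV, so prefer it for anything beyond a quick experiment.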

We can also refer to columns by their names to make our lives easier::

- $ csvcut -c county,item_name,quantity data.csv
+ csvcut -c county,item_name,quantity data.csv

Putting it together with pipes
==============================

Now that we understand ``in2csv``, ``csvlook`` and ``csvcut`` we can demonstrate the power of csvkit when combined with the standard command line "pipe". Try this command::

- $ csvcut -c county,item_name,quantity data.csv | csvlook | head
+ csvcut -c county,item_name,quantity data.csv | csvlook | head

All csvkit utilities accept input on "standard in" as well as from a named file. This means that we can make the output of one csvkit utility become the input of the next. In this case, the output of ``csvcut`` becomes the input to ``csvlook``. This also means we can use this output with standard Unix commands such as ``head``, which prints only the first ten lines of its input. Here, the output of ``csvlook`` becomes the input of ``head``.
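The same standard-out-to-standard-in handoff can be tried with nothing but standard Unix tools. In this minimal sketch, ``cut`` is only a stand-in for ``csvcut`` (it splits on every comma and knows nothing about quoting), and the rows are made up for illustration.

```shell
# Made-up rows standing in for data.csv.
rows='county,item_name,quantity
ADAMS,RIFLE,1
BUFFALO,TRUCK,2
CASS,HELMET,3'

# printf writes the rows to standard out; cut reads them from standard in
# and keeps columns 1 and 3; head then keeps only the first two lines.
preview=$(printf '%s\n' "$rows" | cut -d, -f1,3 | head -n 2)
echo "$preview"
```

Each stage only sees the text handed to it by the stage before, which is why the tools can be chained in any order that makes sense for your data.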

Pipeability is a core feature of csvkit. Of course, you can always write your output to a file using ``>``, but many times it makes more sense to use pipes for speed and brevity.

Of course, we can also pipe ``in2csv``, combining all our previous operations into one::

- $ in2csv ne_1033_data.xlsx | csvcut -c county,item_name,quantity | csvlook | head
+ in2csv ne_1033_data.xlsx | csvcut -c county,item_name,quantity | csvlook | head

Summing up
==========