# Manipulating data
The commands you saw in the previous chapter allowed you to move things around in the filesystem. This chapter will show you how to work with the data in those files. The tools we’ll use are fairly simple, but are solid building blocks.

## How can I view a file's contents?
- Before you rename or delete files, you may want to view their contents
- A simple way to do that is with `cat`
    - `cat` prints the contents of files onto the screen


            cat agarwal.txt

            name: Agarwal, Jasmine
            position: RCT2
            start: 2017-04-01
            benefits: full

## How can I view a file's contents piece by piece?
- `cat` can be used to print large files and then scroll through the output
    - But it may be more convenient to **page** the output
- Use `less` to display one page at a time
    - You can press spacebar to page down
    - Or type `q` to quit
- If you give `less` the names of several files, you can
    - Use `:n` to move to the next file
    - `:p` to go back to the previous file
    - Or `:q` to quit

## How can I look at the start of a file?
- When given a new dataset, the first thing most data scientists do is figure out what fields and what values the dataset contains
- If a dataset has been exported from a database or spreadsheet, it will often be stored as **comma-separated values** (CSV)
- `head` can be used to print the first few lines ("few" as in 10)
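
As a minimal sketch of the default behavior, the commands below build a small made-up CSV and run `head` on it. The scratch directory, the file name `sample.csv`, and its contents are all hypothetical, chosen only so the example is self-contained.

```shell
# Build a small, made-up CSV in a scratch directory (all names are hypothetical)
workdir=$(mktemp -d)
csv="$workdir/sample.csv"

# Header row plus 12 data rows, 13 lines in total
echo "id,value" > "$csv"
i=1
while [ "$i" -le 12 ]; do
    echo "$i,$((i * 10))" >> "$csv"
    i=$((i + 1))
done

# With no flags, head prints only the first 10 lines:
# the header followed by rows 1 to 9
head "$csv"
```

The header row shows which fields the file contains, and the rows after it give a quick look at typical values.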

## How can I type less?
- One of the shell's power tools is **tab completion**
- If you start typing the name of a file and then press the `TAB` key, the shell will do its best to auto-complete the path
- If the path is ambiguous, pressing `TAB` a second time will display a list of possibilities
- Typing another character or two to make your path more specific and then pressing `TAB` will fill in the rest of the name

## How can I control what commands do?
- You may not always want to look at the first 10 lines of a file
- The shell lets you change `head`'s behavior by giving it a **command-line flag**
- If you run

        head -n 3 seasonal/summer.csv

    `head` will display only the first 3 lines

- A flag's name usually indicates its purpose
    - E.g. `-n` means "**n**umber of lines"
- Command flags don't always have a `-` followed by a single letter
    - But it is a widely used convention

## How can I list everything below a directory?
- You can give `ls` the flag `-R` (which means "recursive")
    - This shows every file and directory in the current level, then everything in each sub-directory, and so on
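
To see the recursive listing in action, here is a small sketch that builds a throwaway directory tree and lists it. The directory and file names (`project`, `data/raw`, `notes.txt`, `sample.csv`) are invented for illustration.

```shell
# Build a tiny, made-up directory tree in a scratch location
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data/raw"
touch "$workdir/project/notes.txt" "$workdir/project/data/raw/sample.csv"

# List the top level, then the contents of each sub-directory in turn
listing=$(ls -R "$workdir/project")
echo "$listing"
```

Each sub-directory appears under its own heading (its path followed by a colon). The exact layout of the headings can differ slightly between systems.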

## How can I get help for a command?
- To find out what commands do, use `man` (short for manual) in front of the command
    - `man` automatically invokes `less`
    - You may need to press spacebar to page through or `:q` to quit
- The one-line description under `NAME` tells you briefly what the command does
- The summary under `SYNOPSIS` lists all the flags it understands
    - Anything that is optional is shown in square brackets `[...]`
    - Either/or alternatives are separated by `|`
    - Things that can be repeated are shown by `...`

## How can I select columns from a file?
- `head` and `tail` allow you to select rows from a text file
- If you want to select columns, you can use the command `cut`
    - It has several options (use `man cut` to explore)
- The most common use case is:

        cut -f 2-5,8 -d , values.csv

    which means "select columns 2 through 5 and column 8, using comma as the separator"

- `cut` uses `-f` (meaning "fields") to specify columns and `-d` (meaning "delimiter") to specify the separator
    - You need to specify the latter because some files use spaces, tabs, or colons to separate columns
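
The following sketch runs the same `cut` command on a small made-up file so the effect of `-f` and `-d` is visible. The scratch directory and the contents of `values.csv` are hypothetical.

```shell
# Create a made-up, eight-column CSV in a scratch directory
workdir=$(mktemp -d)
csv="$workdir/values.csv"
printf '%s\n' \
    'a,b,c,d,e,f,g,h' \
    '1,2,3,4,5,6,7,8' > "$csv"

# Fields 2 through 5 and field 8, splitting on commas
cut -f 2-5,8 -d , "$csv"
# prints:
# b,c,d,e,h
# 2,3,4,5,8

# A different separator: the first field of colon-separated text
echo 'alpha:beta:gamma' | cut -f 1 -d :
# prints: alpha
```

Note that the selected fields are joined with the same delimiter that was used to split them.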

## What can't `cut` do?
- `cut` is a simple-minded command
- In particular, it doesn't understand quoted strings
- If, for example, your file is:

        Name,Age
        "Johel,Ranjit",28
        "Sharma,Rupinder",26

    then,

        cut -f 2 -d , everyone.csv

    will produce:

        Age
        Ranjit"
        Rupinder"

## How can I repeat commands?
- The shell makes it easy for you to do things over again
- If you run some commands, you can then press the up-arrow key to cycle back through them
- You can also use the left and right arrow keys and the delete key to edit them
- Pressing return will then run the modified command
- Even better, `history` will print a list of commands you have run recently
    - Just type `!55` to re-run the 55th command in your history (if you have that many)
    - You can also re-run a command by typing an exclamation mark followed by the command's name such as `!head` or `!cut`
        - This will re-run the most recent use of that command

## How can I select lines containing specific values?
- `grep` selects lines according to what they contain
- `grep` takes a piece of text followed by one or more filenames and prints all lines in those files that contain that text
- For example, `grep bicuspid seasonal/winter.csv` prints lines from `winter.csv` that contain "bicuspid"
- `grep` can search for patterns too 
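
As a brief sketch of both uses, the commands below search a small made-up file first for fixed text and then for a simple pattern. The file name `teeth.csv` and its rows are invented for this example and are not the contents of `seasonal/winter.csv`.

```shell
# Create a made-up CSV in a scratch directory
workdir=$(mktemp -d)
csv="$workdir/teeth.csv"
printf '%s\n' \
    'Date,Tooth' \
    '2017-01-25,wisdom' \
    '2017-02-19,canine' \
    '2017-02-24,bicuspid' \
    '2017-03-07,bicuspid' > "$csv"

# Fixed text: every line containing "bicuspid"
grep bicuspid "$csv"
# prints:
# 2017-02-24,bicuspid
# 2017-03-07,bicuspid

# A simple pattern: ^ anchors the match to the start of the line
grep '^2017-02' "$csv"
# prints:
# 2017-02-19,canine
# 2017-02-24,bicuspid
```

Quoting the pattern keeps the shell from interpreting any special characters before `grep` sees them.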