# Batch Processing
Most shell commands will process many files at once. This chapter shows you how to make your own pipelines do that. Along the way, you will see how the shell uses variables to store information.

## How does the shell store information?
- The shell stores information in variables
    - Some of these, called **environment variables**, are available all the time
- Environment variable names are conventionally written in upper case
- A few commonly-used ones:

![Environment Variables](imgs/env_variables.png)

- To get a complete list, type `set` in the shell

## How can I print a variable's value?
- `echo` prints its arguments
- If you try to print a variable's value by typing

        echo USER
    
    it will print the variable's name `USER`
- To get the variable's value, you must put a dollar sign `$` in front of it

        echo $USER

- This is so the shell can tell whether you mean "a file named X" or "the value of a variable named X"
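
To compare the two forms side by side, here is a minimal sketch that captures what each one prints. It uses `HOME` rather than `USER`, on the assumption that `HOME` is set in nearly every session; the names `name_only` and `value` are made up for this illustration.

```shell
# Without the dollar sign, echo simply repeats the word it was given.
name_only=$(echo HOME)

# With the dollar sign, the shell substitutes the variable's value first.
value=$(echo "$HOME")

echo "without \$: $name_only"
echo "with \$:    $value"
```

The first line of output is always the literal word `HOME`; the second depends on who is logged in.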

## How else does the shell store information?
- A **shell variable** is another kind of variable
    - This is like a local variable in a programming language
- To create a shell variable, you simply assign a value to a name:

        training=seasonal/summer.csv

    *without* any spaces before or after the `=` sign
- You can then check its value with `echo $training`
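
The rule about spaces is easy to trip over, so the sketch below shows both the working form and the failing one. It assumes there is no command called `training` on your system; the `status` variable is a hypothetical name used only to record what happened.

```shell
# Correct: no spaces around '=' creates a shell variable.
training=seasonal/summer.csv
echo "$training"

# Incorrect: with spaces, the shell tries to run a command named 'training'
# with '=' and the path as its arguments, which normally fails.
if training = seasonal/summer.csv 2>/dev/null; then
    status="ran a command called training"
else
    status="assignment with spaces failed"
fi
echo "$status"
```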

## How can I repeat a command many times?
- Shell variables are also used in **loops**
- If you run,

        for filetype in gif jpg png; do echo $filetype; done

    it produces

        gif
        jpg
        png

- Notice these things about a loop:
    1. The structure is `for` …variable… `in` …list… `; do` …body… `; done`
    2. The list of things the loop is to process (in our case, the words `gif`, `jpg`, and `png`).
    3. The variable that keeps track of which thing the loop is currently processing (in our case, `filetype`).
    4. The body of the loop that does the processing (in our case, `echo $filetype`).
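
The same loop can also be spread over several lines, which may be easier to read in a script. This is only a sketch of an equivalent layout, in which a line break stands in for each semicolon.

```shell
# Same loop as the one-line version: list, variable, and body are unchanged.
for filetype in gif jpg png
do
    echo "$filetype"
done
```

Once the loop has finished, `filetype` still holds the last item it processed (`png` here).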

## How can I repeat a command once for each file?
- You can always type in the names of files you want to process when writing a loop
- But it's usually better to use **wildcards**
- For example,

        for filename in seasonal/*.csv; do echo $filename; done

    can print,

        seasonal/autumn.csv
        seasonal/spring.csv
        seasonal/summer.csv
        seasonal/winter.csv
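
If you want to try this without the course data, the sketch below builds a scratch directory first. The directory layout and the extra file `notes.txt` are assumptions made for the illustration; `matches` is a hypothetical variable that collects the loop's output.

```shell
# Build a throwaway directory so the example does not depend on real data.
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
touch "$workdir/seasonal/spring.csv" "$workdir/seasonal/summer.csv" "$workdir/seasonal/notes.txt"
cd "$workdir"

# The shell expands the wildcard into the matching names before the loop runs,
# so notes.txt is skipped because it does not end in .csv.
matches=$(for filename in seasonal/*.csv; do echo "$filename"; done)
echo "$matches"

# Tidy up the scratch directory.
cd / && rm -rf "$workdir"
```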

## How can I record the names of a set of files?
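- People often set a variable using a wildcard expression to record a list of filenames
    - In bash, the variable holds the wildcard pattern itself, and the shell expands it into filenames when you use the variable without quotes
- For example, if you define

        datasets=seasonal/*.csv

    you can display the filenames later using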

        for filename in $datasets; do echo $filename; done
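
It can help to see what the variable actually contains. The sketch below assumes bash and a scratch directory with two invented files; `stored` and `expanded` are hypothetical names used to compare the quoted and unquoted forms.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
touch "$workdir/seasonal/autumn.csv" "$workdir/seasonal/winter.csv"
cd "$workdir"

datasets=seasonal/*.csv

# Quoted: prints the stored text, which is still the wildcard pattern.
stored=$(echo "$datasets")

# Unquoted: the shell expands the pattern into the matching filenames.
expanded=$(for filename in $datasets; do echo "$filename"; done)

echo "stored:   $stored"
echo "expanded:"
echo "$expanded"

# Tidy up the scratch directory.
cd / && rm -rf "$workdir"
```

Because the expansion happens at the point of use, files added to `seasonal` after the assignment would also show up in the loop.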

## A variable's name versus its value
- A common mistake is to forget to use `$` before the name of a variable
- When you do this, the shell uses the name you have typed rather than the value of the variable
- Another common mistake is to mis-type the variable's name
    - For example, if you define `datasets` as

            datasets=seasonal/*.csv

        and then type

            echo $datsets

        the shell prints only a blank line because `datsets` is not defined
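
The silent empty value is what makes this mistake hard to spot. As a sketch, the block below shows the default behaviour and then, going slightly beyond the text above, the shell's `set -u` option, which reports the use of an undefined variable as an error. The `result` and `strict` variables are hypothetical names for this illustration.

```shell
set +u                    # assumption: start from the default, where unset variables are allowed
datasets=seasonal/*.csv
unset datsets             # make sure the misspelled name really is undefined

# Default behaviour: the undefined name expands to an empty string.
result=$(echo "$datsets")
echo "default: [$result]"

# With 'set -u' (tried here in a subshell), the same typo is reported as an error.
if (set -u; echo "$datsets") 2>/dev/null; then
    strict="no error"
else
    strict="error"
fi
echo "with set -u: $strict"
```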

## How can I run many commands in a single loop?
- You can use pipelines to run multiple commands in a single loop
- For example,

        for file in seasonal/*.csv; do head -n 2 $file | tail -n 1; done

- The only difference from the earlier loops is that the body is a pipeline of two commands instead of a single command
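
To see what the pipeline selects, here is a sketch that runs it on two small files. The file contents are invented for this example (a header line followed by data rows), and `first_rows` is a hypothetical variable that collects the output.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
# Invented sample data: one header line, then two records per file.
printf 'Date,Tooth\n2017-01-25,wisdom\n2017-02-19,canine\n' > "$workdir/seasonal/spring.csv"
printf 'Date,Tooth\n2017-07-10,incisor\n2017-08-13,molar\n' > "$workdir/seasonal/summer.csv"
cd "$workdir"

# head -n 2 keeps the header and the first record; tail -n 1 then drops the header.
first_rows=$(for file in seasonal/*.csv; do head -n 2 "$file" | tail -n 1; done)
echo "$first_rows"

# Tidy up the scratch directory.
cd / && rm -rf "$workdir"
```

The result is one line per file: the first record of each, without its header.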

## Why shouldn't I use spaces in filenames?
- It's easy and sensible to give files multi-word names like `July 2017.csv` when you are using a graphical file explorer
    - This causes problems for the shell
- For example, suppose you wanted to rename `July 2017.csv` to `2017 July data.csv`. You cannot type,

        mv July 2017.csv 2017 July data.csv

    because it looks to the shell as though you are trying to move four files called `July`, `2017.csv`, `2017`, and `July` (again) into a directory called `data.csv`

- Instead, you have to quote the files' names so that the shell treats each as a single parameter:

        mv 'July 2017.csv' '2017 July data.csv'
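
The quoted rename can be tried safely in a scratch directory. This is a minimal sketch; the empty file it creates and the `listing` variable are assumptions made for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch 'July 2017.csv'

# The quotes make each multi-word name a single argument to mv.
mv 'July 2017.csv' '2017 July data.csv'

listing=$(ls)
echo "$listing"

# Tidy up the scratch directory.
cd / && rm -rf "$workdir"
```

The same concern applies to variables: writing `"$filename"` in double quotes keeps a name containing spaces together as one argument.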

## How can I do many things in a single loop?
- The loops you have seen so far all have a single command or pipeline in their body
- But a loop can contain any number of commands
- To tell the shell where one ends and the next begins, you must separate them with **semicolons**:

        for f in seasonal/*.csv; do echo $f; head -n 2 $f | tail -n 1; done
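
When a loop body grows, a multi-line layout may be easier to follow. The sketch below is the same loop with a line break in place of each semicolon, run on invented sample files; `report` is a hypothetical variable that collects the output.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
# Invented sample data: one header line and one record per file.
printf 'Date,Tooth\n2017-01-25,wisdom\n' > "$workdir/seasonal/spring.csv"
printf 'Date,Tooth\n2017-07-10,incisor\n' > "$workdir/seasonal/summer.csv"
cd "$workdir"

# Two commands per iteration: print the filename, then its first data row.
report=$(
    for f in seasonal/*.csv
    do
        echo "$f"
        head -n 2 "$f" | tail -n 1
    done
)
echo "$report"

# Tidy up the scratch directory.
cd / && rm -rf "$workdir"
```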