In [1]:
bind 'set enable-bracketed-paste off'

In [2]:
# unset PROMPT_COMMAND
# Ignore this block
# shopt -s expand_aliases
alias trace_on='set -x'
alias trace_off='{ set +x; } 2>/dev/null'
alias argtest='./argtest.py'
# Manually set the width of our terminal to 180 so that argtest can autosize appropriately
stty cols 180

# Bash Argument Parsing Demo

Welcome! Bash/shell scripting can be very confusing at first. This will hopefully serve as a gentle introduction to the power and pitfalls of a shell/command line interface (CLI).

Let's start with a "hello world". This is very simple in bash:

In [3]:
echo hello world

hello world


`echo` just prints every argument it is given. It might seem a bit silly -- why would you want to print out what you just wrote? Well, although `echo` can be used in an interactive shell, it is more commonly used in scripts, in order to provide information to the user of the script.

## Variables

In [15]:
trace_on

unset foo
# Print the value of the `foo` variable
echo $foo

trace_off

+ set -x
+ unset foo
+ echo



`foo` is unset here. In bash, an unset variable expands to an empty string (nothing), so `echo` receives no arguments at all. `echo` always prints a newline at the end of its output, leaving us with a single newline (`\n`) as the output.
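
An empty expansion is hard to see on its own, so here is a minimal sketch that wraps the variable in square brackets to make it visible. The variable names ending in `_output` are only there for illustration.

```bash
# An unset variable expands to nothing, so only the brackets remain
unset foo
unset_output="[$foo]"
echo "$unset_output"

# A variable set to the empty string looks exactly the same when expanded
foo=""
empty_output="[$foo]"
echo "$empty_output"

# Once the variable has a value, the value appears between the brackets
foo=bar
set_output="[$foo]"
echo "$set_output"
```

The first two lines printed are identical (`[]`), which is why an unset variable and an empty one are easy to confuse.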

In [19]:
trace_on

# Set variable `foo` to value `bar`
foo=bar
# Print the value of variable `foo`
echo $foo

trace_off

+ foo=bar
+ echo bar
bar


In [21]:
trace_on

# Print the value of variable `foo`
echo "$foo"

trace_off

+ echo bar
bar


In [20]:
trace_on

# Print the literal characters `'$foo'`
echo '$foo'

trace_off

+ echo '$foo'
$foo


`foo` is still set to value `bar`, but single quotes prevent variables from being expanded. So, the literal string `$foo` is passed to `echo`, and then printed.

It's important to understand that **`echo` has nothing to do with bash variables!** Bash expands variables _before_ the arguments are passed to a program, so `echo` only ever sees the resulting text.
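
The same thing happens with any command, not just `echo`. As a rough sketch, the example below uses `printf` with angle brackets around its argument so we can see exactly what text it received in each case. The variable names `unquoted`, `double`, and `single` are just for illustration.

```bash
foo=bar

# Unquoted: bash replaces $foo with bar before printf runs
unquoted=$(printf '<%s>' $foo)
# Double quotes: the variable is still expanded
double=$(printf '<%s>' "$foo")
# Single quotes: printf receives the four literal characters $foo
single=$(printf '<%s>' '$foo')

echo "$unquoted"
echo "$double"
echo "$single"
```

This prints `<bar>`, `<bar>`, and `<$foo>`. In no case does `printf` know that a variable was involved.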

## Wildcards (Globs)

In bash, some characters are considered wildcards. Here they are, in order of how often you will probably use them:
1. `*` matches zero or more of _any_ character
2. `?` matches exactly one of _any_ character
3. `[]` matches exactly one character from the set or range within it (e.g. `[abc]` or `[0-9]`)

Let's look at some examples! But first, let's create some files to work with

In [59]:
mkdir test_files
cd test_files

In [97]:
# Create empty test files. The contents don't matter; we just care about their filenames
touch file1.txt file2.txt file9.txt file10.txt other.txt data.fits file4.json fileZ.txt
# List the contents of test_files/
ls

data.fits  file10.txt  file1.txt  file2.txt  file4.json  file9.txt  fileZ.txt  other.txt


### Star Wildcard: `*`

`*` matches zero or more of _any_ character

What if we don't want to list _all_ of the files in this directory? What if we want to list only the `.txt` files?

In [67]:
# List all files that end in .txt
ls *.txt

file10.txt  file1.txt  file2.txt  file9.txt  fileZ.txt  other.txt


In [68]:
# List all files that start with 'file'
ls file*

file10.txt  file1.txt  file2.txt  file4.json  file9.txt  fileZ.txt


In [75]:
# List all files that start with 'file' and end with '.txt'
ls file*.txt

file10.txt  file1.txt  file2.txt  file9.txt  fileZ.txt
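
Just like variables, wildcards are expanded by bash before the command runs: `ls` never sees the `*`. Here is a small sketch that demonstrates this by counting the arguments a pattern expands to. It works in a throwaway directory created with `mktemp -d`, and the file names are made up for the example.

```bash
# Work in a throwaway directory so the sketch does not touch real files
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch file1.txt file2.txt other.txt data.fits

# `set --` replaces the positional parameters with the expanded pattern,
# so $# tells us how many arguments a command would have received
set -- *.txt
txt_count=$#
echo "*.txt expanded to $txt_count arguments: $*"

set -- file*
file_count=$#
echo "file* expanded to $file_count arguments: $*"

# Clean up
cd "$OLDPWD" && rm -rf "$workdir"
```

Here `*.txt` becomes three separate arguments and `file*` becomes two, exactly as if we had typed the file names by hand.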


### Question Mark Wildcard: `?`

`?` matches exactly one of _any_ character

Let's say that we wanted to list all the `file*.txt` files that have exactly one character after `file`, such as a single-digit number. We can't do this with `*`, but `?` will work.

In [76]:
# List all filenames that start with 'file', then contain one of any character, then end with '.txt'
ls file?.txt

file1.txt  file2.txt  file9.txt  fileZ.txt


This matches `file1.txt`, `file2.txt`, `file9.txt`, and `fileZ.txt`, because they each have a single character after 'file'. It does _not_ match `file10.txt`, because there are two characters (i.e. `10`), not just one (i.e. `1`, `2`, `9`, `Z`)

### Range Wildcard: `[]`

`[]` matches exactly one character from the set or range within it

What if we wanted to select just `file1.txt`, `file2.txt`, and `file9.txt`? That is, all `file[number].txt` files, but _not_ `fileZ.txt`. Obviously we could just explicitly list them all, but imagine we have thousands of files -- this won't be viable!

We can use the range wildcard to specify the explicit range of characters we want to match

In [79]:
ls file[1-9].txt

file1.txt  file2.txt  file9.txt
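
Each `[]` stands for exactly one character, so ranges can be repeated or rewritten to cover other cases. The sketch below shows a few variations in a throwaway directory: two ranges in a row, a negated range (`[!...]`), and an explicit set of characters. The file names mirror the ones above, and the variable names are only for illustration.

```bash
# Work in a throwaway directory so the sketch does not touch real files
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch file1.txt file2.txt file9.txt file10.txt fileZ.txt

# Two ranges in a row match exactly two digits
set -- file[0-9][0-9].txt
two_digit="$*"
echo "two digits: $two_digit"

# A leading ! negates the set: one character that is NOT a digit
set -- file[!0-9].txt
non_digit="$*"
echo "not a digit: $non_digit"

# A set can also list individual characters instead of a range
set -- file[12Z].txt
explicit_count=$#
echo "explicit set matched $explicit_count files: $*"

# Clean up
cd "$OLDPWD" && rm -rf "$workdir"
```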


Note that wildcards match directories, too. For example, let's say we have .txt files in a few sub-directories

In [100]:
mkdir {a,b,c}
touch {a,b,c}/{1,2}.txt
tree

.
├── a
│   ├── 1.txt
│   └── 2.txt
├── b
│   ├── 1.txt
│   └── 2.txt
├── c
│   ├── 1.txt
│   └── 2.txt
├── data.fits
├── deep
│   └── nesting
│       └── of
│           └── directories
│               ├── 1.txt
│               ├── 2.txt
│               └── 3.txt
├── file10.txt
├── file1.txt
├── file2.txt
├── file4.json
├── file9.txt
├── fileZ.txt
├── more
│   └── deep
│       └── nestings
│           ├── 1.txt
│           ├── 2.txt
│           └── 3.txt
├── more_and_even_more
│   └── deep
│       └── nestings
│           ├── 1.txt
│           ├── 2.txt
│           └── 3.txt
├── more_and_more
│   └── deep
│       └── nestings
│           ├── 1.txt
│           ├── 2.txt
│           └── 3.txt
└── other.txt

16 directories, 26 files


In [None]:
ls *.txt

Hmmm... our new files don't show up. This is because `*` never matches a `/`, so `*.txt` is restricted to the current directory. We can add a second `*` to match the sub-directories explicitly, though.

In [102]:
ls ./*/*.txt

./a/1.txt  ./a/2.txt  ./b/1.txt  ./b/2.txt  ./c/1.txt  ./c/2.txt


## Expansions

### Curly Brackets

Curly brackets (`{}`) have special meaning in bash: any comma-separated values within them are _expanded_ during command pre-processing. This is called brace expansion.

Imagine that we want to list files `file9.txt` and `file10.txt`. None of the wildcards above can cleanly select just these two! Again, we could obviously just list them both explicitly. But we can save some typing by using brace expansion.

In [78]:
ls file{9,10}.txt

file10.txt  file9.txt
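
Unlike wildcards, brace expansion is purely textual: bash generates every combination whether or not a matching file exists. A short sketch using `echo` makes this visible. Note that `file11.txt` is a made-up name that does not need to exist, and the `{1..3}` sequence form is a bash feature not used elsewhere in this demo.

```bash
# Brace expansion generates names in the order written; no files are consulted
expanded=$(echo file{9,10,11}.txt)
echo "$expanded"

# Bash also supports numeric sequences inside braces
sequence=$(echo file{1..3}.txt)
echo "$sequence"
```

This prints `file9.txt file10.txt file11.txt` and then `file1.txt file2.txt file3.txt`. In the `ls` example above the order was different only because `ls` sorts its own output.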


### Recursive Glob (globstar)

Recursive globs require the `globstar` shell option, which is off by default in bash and only available in bash 4.0 or later. Enable it with `shopt -s globstar`.

First, some test data:

In [98]:
# Create nested directories
mkdir -p deep/nesting/of/directories
# Create some .txt files in the "bottom" directory
touch deep/nesting/of/directories/{1,2,3}.txt
# Show our work
tree

.
├── data.fits
├── deep
│   └── nesting
│       └── of
│           └── directories
│               ├── 1.txt
│               ├── 2.txt
│               └── 3.txt
├── file10.txt
├── file1.txt
├── file2.txt
├── file4.json
├── file9.txt
├── fileZ.txt
└── other.txt

4 directories, 11 files


Great, so let's say we wanted to find _all_ of the .txt files "under" our current directory

In [82]:
ls *.txt

file10.txt  file1.txt  file2.txt  file9.txt  fileZ.txt  other.txt


In [86]:
# Use two wildcards to match .txt files in the current and nested directories
ls *.txt ./deep/nesting/of/directories/*.txt

./deep/nesting/of/directories/1.txt  ./deep/nesting/of/directories/2.txt  ./deep/nesting/of/directories/3.txt  file10.txt  file1.txt  file2.txt  file9.txt  fileZ.txt  other.txt


But if we have more than a few directories, this gets pretty tedious and error prone. 

In [99]:
mkdir -p {more,more_and_more,more_and_even_more}/deep/nestings
touch {more,more_and_more,more_and_even_more}/deep/nestings/{1,2,3}.txt

tree

.
├── data.fits
├── deep
│   └── nesting
│       └── of
│           └── directories
│               ├── 1.txt
│               ├── 2.txt
│               └── 3.txt
├── file10.txt
├── file1.txt
├── file2.txt
├── file4.json
├── file9.txt
├── fileZ.txt
├── more
│   └── deep
│       └── nestings
│           ├── 1.txt
│           ├── 2.txt
│           └── 3.txt
├── more_and_even_more
│   └── deep
│       └── nestings
│           ├── 1.txt
│           ├── 2.txt
│           └── 3.txt
├── more_and_more
│   └── deep
│       └── nestings
│           ├── 1.txt
│           ├── 2.txt
│           └── 3.txt
└── other.txt

13 directories, 20 files


We _could_ specify these manually: `deep/nesting/of/directories/*.txt {more,more_and_more,more_and_even_more}/deep/nestings/*.txt`

Luckily there's a better way: recursive globs!

> **globstar**
>
> If set, the pattern `**` used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a `/`, only directories and subdirectories match.

In [88]:
ls ./**/*.txt

./deep/nesting/of/directories/1.txt  ./deep/nesting/of/directories/3.txt  ./file1.txt  ./file9.txt  ./other.txt
./deep/nesting/of/directories/2.txt  ./file10.txt                         ./file2.txt  ./fileZ.txt
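
To see what the option actually changes, here is a minimal sketch that expands the same pattern with `globstar` on and then off. It assumes bash 4.0 or later, and the directory layout and file names are invented for the example.

```bash
# Work in a throwaway directory so the sketch does not touch real files
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir -p deep/nesting
touch top.txt deep/mid.txt deep/nesting/bottom.txt

# With globstar on, ** matches zero or more directory levels
shopt -s globstar
set -- ./**/*.txt
with_globstar=$#
echo "globstar on:  $with_globstar matches: $*"

# With globstar off, ** behaves like a single *, i.e. exactly one level
shopt -u globstar
set -- ./**/*.txt
without_globstar=$#
echo "globstar off: $without_globstar matches: $*"

# Clean up
cd "$OLDPWD" && rm -rf "$workdir"
```

With the option on, all three files match. With it off, only `./deep/mid.txt` does, so a forgotten `shopt -s globstar` silently returns fewer files rather than failing.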


## Quotes
 
You'll see both single (`'`) and double (`"`) quotes in bash. You will also see backticks (``` ` ```), but these are _not_ quotes.

Single and double quotes are similar in many ways, but distinct in others, and this can be very confusing at first

### Whitespace

Whitespace in commands is handled differently if it is within quotes. 

Here we print "hello world" using no quotes, single quotes, then double quotes

In [4]:
# No quotes
echo hello world
# Single quotes
echo 'hello world'
# Double quotes
echo "hello world"

hello world
hello world
hello world


This cell runs without tracing, so only the output of `echo` is shown. In cells wrapped in `trace_on`/`trace_off`, the extra lines prefixed by `+` come from Bash's command trace mode (`set -x`) and show each command after Bash has processed it. We can use this extra information to learn about how Bash handles quotes!

As for the output itself:

- `echo hello world` sends two arguments to `echo`: `hello` and `world`
- `echo 'hello world'` and `echo "hello world"` have the same effect: they both result in a single argument being sent to `echo`: `hello world` (the quotes themselves are removed by bash and never reach `echo`)

**In all three cases, the output is identical!** This is because `echo` prints out all of its arguments, separated by a space (` `).

We can see this easily via a second set of examples, this time using `hello` and `world` separated by 4 spaces:

In [5]:
trace_on
# No quotes
echo hello    world
# Single quotes
echo 'hello    world'
# Double quotes
echo "hello    world"
trace_off

+ echo hello world
hello world
+ echo 'hello    world'
hello    world
+ echo 'hello    world'
hello    world


This time, the output is **not** identical. In the first example, `hello` and `world` are passed to `echo` as two separate arguments. `echo` then prints them both out, with a single space in between.

In the single- and double-quote examples, a single argument is passed to `echo`: `hello    world`. This single argument is printed out exactly as it is received.

In [6]:
trace_on
argtest hello world -v
trace_off

+ ./argtest.py hello world -v
Received 2 arguments:
hello world

                  hello                                    world
 ┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓  ┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
 ┃ Char. ┃ Repr. ┃ Name                 ┃  ┃ Char. ┃ Repr. ┃ Name                 ┃
 ┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩  ┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩ 
 │ h     │ 'h'   │ LATIN SMALL LETTER H │  │ w     │ 'w'   │ LATIN SMALL LETTER W │ 
 │ e     │ 'e'   │ LATIN SMALL LETTER E │  │ o     │ 'o'   │ LATIN SMALL LETTER O │ 
 │ l     │ 'l'   │ LATIN SMALL LETTER L │  │ r     │ 'r'   │ LATIN SMALL LETTER R │ 
 │ l     │ 'l'   │ LATIN SMALL LETTER L │  │ l     │ 'l'   │ LATIN SMALL LETTER L │ 
 │ o     │ 'o'   │ LATIN SMALL LETTER O │  │ d     │ 'd'   │ LATIN SMALL LETTER D │ 
 └───────┴───────┴──────────────────────┘  └───────┴───────┴──────────────────────┘

In [7]:
trace_on
argtest 'hello world' -v
trace_off

+ ./argtest.py 'hello world' -v
Received 1 arguments:
hello world

               hello world
 ┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
 ┃ Char. ┃ Repr. ┃ Name                 ┃
 ┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩ 
 │ h     │ 'h'   │ LATIN SMALL LETTER H │ 
 │ e     │ 'e'   │ LATIN SMALL LETTER E │ 
 │ l     │ 'l'   │ LATIN SMALL LETTER L │ 
 │ l     │ 'l'   │ LATIN SMALL LETTER L │ 
 │ o     │ 'o'   │ LATIN SMALL LETTER O │ 
 │       │ ' '   │ SPACE                │ 
 │ w     │ 'w'   │ LATIN SMALL LETTER W │ 
 │ o     │ 'o'   │ LATIN SMALL LETTER O │ 
 │ r     │ 'r'   │ LATIN SMALL LETTER R │ 
 │ l     │ 'l'   │ LATIN SMALL LETTER L │ 
 │ d     │ 'd'   │ LATIN SMALL LETTER D │ 
 └───────┴───────┴──────────────────────┘ 
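
The `argtest.py` script itself is not shown here. If you want to experiment without it, a rough stand-in can be written as a bash function. The `argcount` name and its output format below are hypothetical; it only reports how many arguments arrived and prints each one in brackets, without the character tables.

```bash
# Hypothetical stand-in for argtest.py: report each argument received
argcount() {
    echo "Received $# arguments:"
    for arg in "$@"; do
        echo "  [$arg]"
    done
}

# Unquoted: the run of spaces separates two arguments
unquoted=$(argcount hello    world)
# Quoted: the spaces are part of a single argument
single=$(argcount 'hello    world')
double=$(argcount "hello    world")

echo "$unquoted"
echo "$single"
echo "$double"
```

The unquoted call reports 2 arguments, while both quoted calls report 1 argument with the four spaces preserved inside the brackets.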


The table below summarizes how each kind of quoting affects the features covered above:

|  | No Quotes | Single Quotes | Double Quotes |
| --- | --- | --- | --- |
| Variables (`$foo`) | Expand | Literal | Expand |
| Wildcards (`*`, `?`, `[]`, `**`) | Expand | Literal | Literal |
| Brace expansion (`{}`) | Expand | Literal | Literal |
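
Each cell of this table can be checked directly. The sketch below runs every combination through `echo` in a throwaway directory containing two made-up files, `a.txt` and `b.txt`. The variable names are only for illustration.

```bash
# Work in a throwaway directory so the sketch does not touch real files
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch a.txt b.txt
foo=bar

# Variables: expanded unless single-quoted
var_none=$(echo $foo)
var_single=$(echo '$foo')
var_double=$(echo "$foo")

# Wildcards: expanded only when unquoted
glob_none=$(echo *.txt)
glob_single=$(echo '*.txt')
glob_double=$(echo "*.txt")

# Brace expansion: expanded only when unquoted
brace_none=$(echo {a,b}.txt)
brace_single=$(echo '{a,b}.txt')
brace_double=$(echo "{a,b}.txt")

printf '%-10s | %-12s | %-12s | %s\n' \
    "" "none" "single" "double" \
    "variable" "$var_none" "$var_single" "$var_double" \
    "wildcard" "$glob_none" "$glob_single" "$glob_double" \
    "braces" "$brace_none" "$brace_single" "$brace_double"

# Clean up
cd "$OLDPWD" && rm -rf "$workdir"
```

Double quotes are the only case where the rows differ from single quotes: variables still expand inside them, but wildcards and braces do not.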

## Putting it all together

Let's look at all of this working together

In [105]:
# Show all text files named either 1 or 2, but only if they are within a "deep" directory at any point in their ancestry
ls ./**/deep/**/[1-2].txt

./deep/nesting/of/directories/1.txt  ./more_and_even_more/deep/nestings/1.txt  ./more_and_more/deep/nestings/1.txt  ./more/deep/nestings/1.txt
./deep/nesting/of/directories/2.txt  ./more_and_even_more/deep/nestings/2.txt  ./more_and_more/deep/nestings/2.txt  ./more/deep/nestings/2.txt
