# The Command Line - An Introduction
--------------------------------------------------------------

Graphical user interfaces are fast, often more than fast enough to suit our needs. GUIs are feature rich, can be intuitive, and often filter out a lot of stuff we don't need to know about and aren't interested in. Nearly everything we need to do can be done simply and quickly using a GUI.

__Plus:__

* The command line is old fashioned
* Potential efficiency gains take time to manifest
* Even Neal Stephenson says it's obsolete
* ...

## But some things are just tedious... ##

For example, cleaning all the PDFs off of a cluttered desktop. Backing up files and data. Getting file and permissions info. Routine stuff.

All that clicking and dragging. We have better things to do with our time. Plus, for some activities, GUIs use an unnecessary amount of resources and can quickly add up to a cluttered workspace.

__The command line is a great resource for speeding up and automating routine activities without using a lot of processing power. In some cases, it can be better than a GUI for:__

* Searching for files
* Searching _within_ files
* Reading and writing files and data
* Network activities

Some file and data recovery processes can __only__ be executed from the command line.

Finally, the concepts and tools used in UNIX command line environments carry directly into other programming languages, including R (`system`) and Python (`os`).

* There are libraries in most programming languages that enable interaction with the host operating system to perform basic operating system commands and complex processes that can be run from the command prompt. 
* The *read-eval-print loop* (REPL), the foundational concept behind command line interaction, is also common to many scripting languages like R and Python: *read* an entered command, *evaluate* the command, and *return/print* the output of the command. These individual commands can then be combined in sequences to complete more complex tasks.

## Overview ##

This morning we will use a command line client running the `bash` command interpreter to navigate the filesystem, read and write files, and practice stringing together commands using pipes. For various good reasons, the emphasis will be on UNIX shell commands, which are used on the Linux and macOS operating systems.

**Windows Notes**

Most if not all of the commands covered will not work within the standard Windows command line client. Windows users can and are encouraged to avail themselves of UNIX command line clients:

* Git provides an excellent client: [https://git-scm.com/](https://git-scm.com/)
* Cygwin is another popular option: [https://www.cygwin.com/](https://www.cygwin.com/)
* For Windows 10 Anniversary Update and later, try out the [Windows Subsystem for Linux](https://msdn.microsoft.com/en-us/commandline/wsl/about)

**Mac Notes**

The release of Mac OS 10.15 (Catalina) marked the transition from the use of the `bash` shell to the `zsh` shell as the default command line interpreter for the Mac terminal application. To ensure that you are running the `bash` interpreter you can enter `bash` at the command prompt and hit return. This will run the `bash` interpreter and execute subsequent commands using `bash`. 


### Live Coding

Today we are going to be executing commands together and therefore need to have a **command line** environment on our systems. If you are already running the Mac OS or Linux you already have one - the **terminal**. If you are running Windows you need to install a UNIX command environment such as those listed above. 

**Check for a working command line environment**
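One way to run this check is sketched below. It assumes a UNIX-style shell is already open; the variable name `shell_check` and the messages are illustrative only.

```bash
# The SHELL variable usually holds the current user's default (login) shell
echo "Default shell: ${SHELL:-not set}"

# BASH_VERSION is only set when the commands are being run by bash itself
if [ -n "${BASH_VERSION:-}" ]; then
    shell_check="Running bash version $BASH_VERSION"
else
    shell_check="Not running bash; enter 'bash' to start it"
fi
echo "$shell_check"
```

If the second message appears, entering `bash` at the prompt starts the `bash` interpreter for the rest of the session.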

### Command line syntax ###

Command line programs all follow a similar syntax:

```bash
command -options arguments
```

where the behavior of a _command_ can be modified using one or more different _options_. We will see several examples of commonly used options for commands like `ls`. The _argument_ is generally the object that the command is being run against. Many commands have default arguments that do not need to be specified. 
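As a concrete illustration of this pattern, the sketch below runs `ls` with two options and one argument. The choice of `/` as the argument is only an example; any directory path would do.

```bash
# command: ls    options: -l and -a    argument: / (the directory to list)
ls -l -a /

# Single-letter options can usually be combined behind one dash
ls -la /

# With no argument, ls falls back to its default: the current working directory
ls
```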

### Getting Help and More Information ... ###

Many shell commands have flags or options that can be utilized to refine the execution of the command. In order to find out more about a specific command and its options, three resources (one or more of which may be available for most commands) are the `--help` flag, the manual pages available through the `man` command, and command information available through the `info` command. 

```bash
ls --help

```

OR

```bash
man ls
```
    
OR

```bash
info ls
```

#### Give it a try ####

1. Follow along with me executing the above commands
2. Look up the help information for the `pwd` and `cd` commands
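In `bash`, `pwd` and `cd` are built into the shell itself rather than being separate programs, so `--help` and `man` may not give useful results for them on every system. A minimal sketch, assuming you are running `bash`: the builtin `type` command reports what kind of command a name refers to, and the builtin `help` command prints the shell's own documentation for a builtin.

```bash
# 'type' reports whether a name is a shell builtin or a separate program
type cd
type pwd
type ls

# 'help' prints bash's own documentation for a builtin command
help cd
help pwd
```

For commands that are separate programs, such as `ls`, the `man` pages remain the most widely available resource.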


### Navigation ### 

__Where am I and what's here?__

#### The relationship between your GUI view of your file system and the command line view ####

([full resolution image](graphics/shellPathFigure.png))

![View of how the Windows Explorer and Mac Finder views of the file system relate to the same locations in shell paths](graphics/shellPathFigure_sm.png)

#### Interaction with the file system in `bash` ####

```bash
pwd
```

`pwd` is short for 'print working directory.' The output of this command is the absolute path to the directory a user is currently in. Often, knowing the absolute path is necessary in order to move or copy files, or to run scripts.

```bash
ls
```

`ls` lists the contents of a directory or directory tree. It is commonly executed with the '-l' (for long output) and/or the '-a' (for a listing of all files - including hidden ones) flag. If `ls` is executed without providing a specific location (path) it will list the contents of the current working directory. If a path is provided it will provide the list of files from that location.  

#### Give it a try ####

* Use the `pwd` command to see the path of the current *working directory*
* Use the `ls` command to see the contents of the *current directory*
* Use the `ls ..` command to see the content of the *parent* of the current directory

The `..` character sequence is a special reference to the parent of the current directory. This character combination can be used multiple times to represent higher levels relative to the current directory. For example `../..` refers to the parent of the current directory's parent (grandparent), and `../../..` refers to the great grandparent of the current directory. This shorthand reference can be used in many other commands where you would provide a *path* for a command to operate on. 
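The sketch below shows `..` being chained. It builds a small throwaway directory tree so the result does not depend on where you start; the directory names `grandparent`, `parent`, and `child` are hypothetical.

```bash
# Build a nested set of directories in a temporary location
scratch=$(mktemp -d)
mkdir -p "$scratch/grandparent/parent/child"
cd "$scratch/grandparent/parent/child"

ls ..          # contents of the parent directory: child
ls ../..       # contents of the grandparent directory: parent
ls ../../..    # contents of the temporary directory: grandparent
```

Note that the example leaves you inside the temporary directory; `cd` on its own returns you to your home directory.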

* Use the `ls -l` command to see the detailed listing of the contents of the current directory

The `-l` flag used above stands for "long list" when used with `ls`. Compared with the list of filenames produced by the unflagged `ls` command, the long list provides some useful information, including whether a list item is a file or directory, file/directory permissions, the owning user, the owning group, the size and the time when the file was last modified.

![elements of the `ls -l` command output](lsOutput.png)

* Use the `ls -la` command to see the detailed listing of the contents of the current directory - including hidden and special files and pointers

The `-a` option (or just `a` when combined with another option, as in this example) tells the `ls` command to display all files, including hidden files and the special reference shortcuts `.` (for the current directory) and `..` (for the parent of the current directory). Any file or directory whose name begins with a `.` (such as `.ipynb_checkpoints`) is treated as hidden and won't be displayed by `ls` unless the `-a` flag is provided.

Notice the `.` and `..` items that are now in the file listing. These items are not displayed by default, as they are *system*-provided representations of the current directory location (`.`) and the parent directory location (`..`).
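To see the effect of the leading `.` in isolation, the following sketch creates one ordinary file and one hidden file in a throwaway directory. The file names are hypothetical.

```bash
# Work in a temporary directory so none of your own folders are touched
scratch=$(mktemp -d)
cd "$scratch"

# Create one ordinary file and one hidden file (name starts with a dot)
touch visible.txt .hidden.txt

ls       # lists only visible.txt
ls -a    # also lists ., .., and .hidden.txt
```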

__How do I get out?__

There are some shortcuts to reference specific locations in the computer's file system:

`/` is a shortcut for the top directory in the file system

`~` (tilde) is a shortcut for the current user's home directory

`.` is a shortcut for the current working directory

`..` is a shortcut for the parent of the current working directory

```bash
cd
```

`cd` is one command that will also work in the Windows shell, and stands for... _change directory_.

The fastest way to move up within a folder hierarchy is to use the _dot-dot_ notation for the parent directory:

#### Give it a try ####

* Use the `cd ..` command to move up one directory
* Use the `cd ../..` command to move up two directories
* Use the `cd /` command to take you to the topmost directory
* Use the `cd ~` (or just plain `cd`) command to take you to the current user's home directory


__Quick Check:__

__How far up can we go from here?__

```bash
pwd
cd ..
pwd
```

__Higher?__

```bash
pwd
cd ../..
pwd
```

__Getting moving__

```bash
# Navigate to the root directory ('/'), generate a list of its contents, and then navigate back to our working directory.
pwd

# Save the output of pwd in a variable for later use
DEMO=$(pwd)
echo "$DEMO"

cd /
pwd

ls -la

# Quote the variable so that paths containing spaces still work
cd "$DEMO"
pwd
```

Three good things to know:

* `clear`: Clear the screen.
* _tab completion:_  Auto-complete file and directory names. 
* _up- and down-arrow:_ Scroll through previous commands.

#### Relative versus Absolute Paths ####

At first, navigating across directories within the shell may seem slower and more cumbersome than simply using mouse clicks and a GUI. Using tab completion with absolute OR relative paths is an excellent way to increase efficiency.

The _relative path_ is the path to a directory or file from the context of a specific starting point (usually the current working directory).

The _absolute path_ is the path to a directory or file from the filesystem root ('/').

As an example, to navigate to the root directory from our current working directory, we can use

`cd ../../../` (using the relative path, assuming we are three levels below the root)

**OR**

`cd /` (using the absolute path)

The second option is plainly faster in this case. In other contexts, the relative path would be faster. 

Relative and absolute paths can also be used to list the contents and read, write, or delete files in other directories.
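As a small illustration, the sketch below lists the same directory twice, once by relative path and once by absolute path. The `project/data` layout and the file name are hypothetical and are created in a temporary location.

```bash
# Create a small example layout in a temporary directory
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
touch "$scratch/project/data/notes.txt"
cd "$scratch/project"

# Relative path: interpreted from the current working directory
ls data

# Absolute path: interpreted from the filesystem root, works from anywhere
ls "$scratch/project/data"
```

Both commands list `notes.txt`. Only the absolute form would still work after changing to a different directory.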

__Give it a try__

```bash
# what is my current directory?
pwd

# save the current directory for use later
DEMO=$(pwd)

# Move up two directories in the file structure using the relative path.
cd ../..
pwd

# Return to the saved directory and then navigate to the root directory using the absolute path.
# Quoting "$DEMO" keeps paths that contain spaces intact.
cd "$DEMO"
pwd
cd /
pwd

# Go home
cd
pwd

# Go to a specific directory with an absolute path
# (this example path is specific to one machine; substitute a directory that exists on yours)
cd /Users/kbene
pwd
```

### Read and Write Files ###

If we consider the way we use file explorers, navigating the file system is just one of several common operations we execute using GUIs. We also do a lot of file management through these applications. In addition to creating directories and files, we often use these tools to move, copy, rename, and delete files as well as directories. All of these things can be done using the command line.

The quickest way to create (empty) directories and files is using `mkdir` (make directory) and `touch` (create/update). 

__Make a directory__

```bash
# Make a directory to experiment in and change into that directory
mkdir demo
cd demo

# The `-p` flag creates any missing parent directories, so a nested path can be made in one step:
mkdir -p classes/econ/readings/week_1
ls -FR
```

__Make a directory and add a file to that directory__

```bash
mkdir drafts
cd drafts
touch report.txt # Use 'touch' to create an empty file
ls -l            # Note that 'report.txt' has a file size of 0 bytes.
```

__Print the contents of a file to screen using `cat`__

```bash
cat report.txt   # There is no output because the file is empty.
```

There are multiple ways to edit or add information to files.

__Use `echo` to copy some text into a new or existing file__

First let's see what `echo` does:

```bash
echo "I would gladly pay you Tuesday for a hamburger today!"
```

The `echo` command acts like a print statement. In the example above it prints its argument, in this case the sentence _I would gladly pay you Tuesday for a hamburger today!_, to the screen.

Using `echo` to write some text to our "report.txt" file is a good way to demonstrate _redirection_ in the command line. By default a command prints its output to the screen (also known as the standard output). The `>` and `>>` redirection operators send that output to a file instead. A related tool, the _pipe_ (`|`), passes the output of one command to another command.

```bash
cat report.txt
# Note: the `>` operator redirects the output of the `echo` statement to the file 'report.txt'
echo "From: Jon Smith" > report.txt
cat report.txt
```

Note that the `>` operator will overwrite any text already in a file:

```bash
echo "To: Dept. of Human Resources" > report.txt
cat report.txt
```

If we have text in a file that we don't want to overwrite, we can append to the existing text using the `>>` operator:

```bash
echo "From: Jon Smith" > report.txt
cat report.txt
echo "To: Dept. of Human Resources" >> report.txt
cat report.txt
```
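The pipe operator (`|`) passes the output of one command to another command as its input. A minimal sketch, using a scratch copy of "report.txt" in a temporary directory so that the file from the steps above is left alone; `wc -l` counts lines and `grep` keeps only the lines that match a pattern.

```bash
# Recreate the two-line report in a temporary directory
scratch=$(mktemp -d)
cd "$scratch"
echo "From: Jon Smith" > report.txt
echo "To: Dept. of Human Resources" >> report.txt

# Pass the output of cat to wc -l, which counts the lines (2 here)
cat report.txt | wc -l

# Pipes can be chained: keep only lines containing "From", then count them (1 here)
cat report.txt | grep "From" | wc -l
```

Unlike `>` and `>>`, which write to a file, a pipe connects two commands directly and creates no intermediate file.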

__Use a command line text editor such as Nano or Vim to edit a file__

We can also edit files using text editors with command line interfaces.

```bash
nano report.txt
cat report.txt
```

### Example Use Case: Documenting data sources for a text analysis ###

The workflow described above is not artificial or only for example purposes. It has been used to script and document the harvest of data sources for files used in text analysis. Say, for example, we wanted to create a concordance of work by Charles Baudelaire.

A list of works can be found at Project Gutenberg: <https://www.gutenberg.org/ebooks/search/?query=baudelaire&submit_search=Go%21>

We could click through the links and download individual files. That approach doesn't scale well, and we'd have to keep a separate record of which files we downloaded.

An alternative approach using a command line utility might look like:

```bash
mkdir sources
cd sources
touch baudelaire_pg_sources.txt
echo "https://www.gutenberg.org/cache/epub/6099/pg6099.txt" >> baudelaire_pg_sources.txt
echo "https://www.gutenberg.org/cache/epub/13792/pg13792.txt" >> baudelaire_pg_sources.txt
# Note - this next command requires 'wget' to be installed on your system
# Wget is an open source web harvester: <https://www.gnu.org/software/wget/>
wget -r -i baudelaire_pg_sources.txt
```

While this example solution is lacking important documentation, we note that it scales well, can be shared with others, and is reproducible.
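Because the script depends on `wget`, it can help to confirm that the program is available before running it. A minimal sketch using the shell's `command -v`, which succeeds only if the named command can be found; the variable name `wget_status` and the messages are illustrative.

```bash
# command -v exits successfully only when wget can be found on this system
if command -v wget > /dev/null 2>&1; then
    wget_status="wget is installed"
else
    wget_status="wget was not found; install it before running the harvest"
fi
echo "$wget_status"
```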

### Copy, Move, and Delete Files ###

Remember `pwd`? Often, to copy, move, or delete a file it is necessary to know either the relative or absolute paths of the source and destination directories. `cp` and `mv` require source and destination paths. Note that it is possible to rename a file when copying it, but we use 'move' to rename a file without copying it.

* `cp`: Copy.
* `mv`: Move AND/OR rename.
* `rm`: Remove.

__Copy a file__

```bash
# Copy our Project Gutenberg sources to a different directory
cp baudelaire_pg_sources.txt ../baudelaire_pg_sources.txt

# It is possible to give the copy a different name
cp baudelaire_pg_sources.txt ../cb_sources_backup.txt
```

__Rename a file using `mv`__

This is a bit counterintuitive at first, but there is no standard Unix 'rename' command. Instead, we use the `mv` command to rename a file in place by "moving" it into the same directory with a new name:

```bash
# Rename our source file to add version info and shorten the filename
mv baudelaire_pg_sources.txt cb_pg_v1.txt
ls -l
```

__Move a directory and its contents__

Let's say we didn't intend to store our data sources for our text analysis with our drafts. We can use the `mv` command to move the whole directory.

```bash
pwd
cd ../../
pwd      # Make sure we are in the 'demo' directory
mv drafts/sources .     # Here '.' is shorthand for the current working directory, which saves us a lot of typing
ls -l
ls -FR 
```

### Exercise ###

1. Create a directory named "backup" in the "demo" directory.
2. Copy the file "report.txt" to the "backup" directory and rename it "reports_v1.txt"

__Delete files and directories using `rm`__

__NOTE:__ Deleting files and directories using the command line is permanent. Objects are not sent to a recycle bin - they are immediately removed from your system and cannot be retrieved.

```bash
# Make sure we are in the "demo" directory
pwd
ls
# Remove the copy of the text data sources file that is still in our 'drafts' directory
rm drafts/cb_sources_backup.txt
```

Try to remove a directory:

```bash
rm drafts
```

By default, `rm` will not remove a directory, even an empty one, and reports an error instead. An empty directory can be removed with `rmdir`. In cases where a directory holds only a few files, removing them one by one before removing the directory is recommended. If you have a lot of files and are certain you want to delete everything, `rm -rf` tells `rm` to recursively (`-r`) force (`-f`) the deletion of the directory along with any subdirectories or files it contains.

The following command should be used with extreme caution - it works quickly and cannot be undone:

```bash
rm -rf drafts
ls
```
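To compare how `rm`, `rmdir`, and `rm -r` treat directories without risking real files, the sketch below works entirely inside a temporary directory. The directory and file names are hypothetical, and the messages after `||` are only printed when the preceding command fails.

```bash
# Set up one empty and one non-empty directory in a temporary location
scratch=$(mktemp -d)
cd "$scratch"
mkdir empty_dir full_dir
touch full_dir/file.txt

# Plain rm refuses to remove a directory, even an empty one
rm empty_dir 2> /dev/null || echo "rm alone will not remove a directory"

# rmdir removes a directory only if it is empty
rmdir empty_dir
rmdir full_dir 2> /dev/null || echo "rmdir refuses: full_dir is not empty"

# rm -r removes the directory and everything inside it
rm -r full_dir

ls -A    # prints nothing: both directories are gone
```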

## What Next ##

Some additional learning materials

* O'Reilly Safari Books Online [*Bash Scripting Fundamentals*](https://learning.oreilly.com/videos/bash-scripting-fundamentals/9780134541730)
* [Webmonkey "Unix Guide"](http://www.webmonkey.com/2010/02/unix-guide/)

* [Linux Command Line Cheatsheet](http://www.cheatography.com/davechild/cheat-sheets/linux-command-line/)
* [Bash Guide for Beginners](http://www.tldp.org/LDP/Bash-Beginners-Guide/html/)
* [An A-Z Index of the Bash command line for Linux](https://ss64.com/bash/)



Some additional commands of immediate interest:

* [find](https://ss64.com/bash/find.html)
* [grep](https://ss64.com/bash/grep.html)
* [curl](https://ss64.com/bash/curl.html)