# Creating and navigating files and directories with the Unix Shell (Bash)

This lesson is an introduction to using the Unix shell and the Bash scripting language to create and navigate through files and directories (folders) on your local computer. It is an alternative to "clicking" and provides some significant advantages in documenting processes, automating tasks, and enabling reproducible workflows.

This lesson material is based upon the Software Carpentry lesson on [The Unix Shell](http://swcarpentry.github.io/shell-novice/), the [Introduction to the Command Line for Genomics](https://datacarpentry.org/shell-genomics/) lesson from Data Carpentry, and an [Intro to the Shell](https://datacarpentry.org/2015-11-04-ACUNS/shell-intro/) workshop from softwarecarpentry.org.

## Setup requirements

If you are running Windows, you will need to install Git for Windows following [these instructions](https://carpentries.github.io/workshop-template/#shell). *Participants should install Git for Windows (Git Bash) before day 1.*

Bash is the default shell in some versions of macOS and is available in all versions, so there is no need to install anything. You access Bash from the Terminal. The easiest way to find it is to click the magnifying glass symbol at the top right corner of your screen and search for "Terminal". Once it is open, it's a good idea to right-click its icon at the bottom of your screen and choose Options > Keep in Dock.

## Data and script for Day 1
Before today's workshop, we sent out a link to a folder called `workshop_data/` that you should have downloaded. You should save these files exactly as they are in a folder on your Desktop called `workshop_data/`.


## Background
The shell is a program that enables us to send commands to the computer and receive output. It is also referred to as the terminal or command line.
Humans and computers commonly interact in many different ways, such as through a keyboard and mouse, touch screen interfaces, or using speech recognition systems. The most widely used way to interact with personal computers is called a graphical user interface (GUI). With a GUI, we give instructions by clicking a mouse and using menu-driven interactions.

While the visual aid of a GUI makes it intuitive to learn, this way of delivering instructions to a computer scales very poorly. Imagine the following task: __. Using a GUI, you would not only be clicking at your desk for several hours, but you could potentially also commit an error in the process of completing this repetitive task. This is where we take advantage of the Unix shell. The Unix shell is both a command-line interface (CLI) and a scripting language, allowing such repetitive tasks to be done automatically and fast. With the proper commands, the shell can repeat tasks with or without some modification as many times as we want. Using the shell, the task in the example can be accomplished in seconds.

The most popular Unix shell is Bash (the Bourne Again SHell — so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.

Using the shell will take some effort and some time to learn. While a GUI presents you with choices to select, CLI choices are not automatically presented to you, so you must learn a few commands like new vocabulary in a language you’re studying. However, unlike a spoken language, a small number of “words” (i.e. commands) gets you a long way, and we’ll cover some of those essentials today.

The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically. Sequences of commands can be written into a script, improving the reproducibility of workflows.


## Running commands

The shell accepts commands one line at a time following a prompt, which is usually a symbol like `$` or `%`. When typing commands, you do not need to type this symbol, only the command that comes after it.

Let's try our first command. We are going to change our directory to the Desktop.

`cd` stands for Change Directory and can be used to navigate to another folder or workspace on your computer. Any folder you can open by clicking can also be entered through the shell using this command. Let's navigate to the Desktop by typing:

`$ cd Desktop`

Remember to capitalise the word Desktop.

You should now see that your command prompt has changed to indicate that you are in the Desktop directory of your computer. Any commands we run will now take place on the Desktop, which is what we want.

Now, let's say you want to see the contents of a directory after you enter it. That is accomplished with the `ls` command, short for "list". You can use it by itself like this:

`my_Desktop $ ls`

The output should be a list of the files and folders you have saved on your Desktop. This is also a good way of verifying that you are in the directory you think you are in after you run `cd`.
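To practise `cd` and `ls` without touching your real Desktop, here is a minimal sketch that works inside a throwaway directory. The `sandbox` variable, the practice `Desktop` folder and `notes.txt` are made up for illustration, and `pwd` (print working directory) is an extra standard command that this lesson does not otherwise cover.

```shell
# Create a throwaway practice area (hypothetical; not part of the workshop files)
sandbox=$(mktemp -d)
mkdir "$sandbox/Desktop"
touch "$sandbox/Desktop/notes.txt"   # an empty file so ls has something to show

cd "$sandbox"    # start in the practice area
cd Desktop       # move into the Desktop folder inside it
pwd              # print the full path of the directory you are now in
ls               # list its contents: notes.txt
```

Running `pwd` after `cd` is another way to confirm where you are, alongside checking the prompt and the output of `ls`.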

What if you type something wrong? That's no problem - you'll just see a "command not found" error which can happen if the command was mistyped or if the program that uses the command you tried is not installed. Let's try something on purpose.

`my_Desktop $ ks`

You should see the message "ks: command not found". This is what we'd expect, no problem - learning to respond to error messages is an important computational skill. :)

## Make a workshop folder called `world_cities_workshop/`

The same way you can make folders on your computer by selecting "New Folder" from a file browser window, you can also use the shell to make directories and files.

This is done using the `mkdir` command.

Now that you are on your Desktop, let's make a folder for the project we will be working on for this workshop. Let's call it `world_cities_workshop/`. Notice that we don't use spaces in naming any folders - this is helpful to avoid errors when scripting with Bash, because the shell uses spaces to separate the parts of a command. It's good practice to use underscores or dashes instead of spaces whenever you are naming files or folders.

In your shell, write the command:

`$ mkdir world_cities_workshop`

And hit enter. If you watch your Desktop, you should see a new folder arrive with that name! Of course you could click on it to see its contents (it should be empty), but let's use the shell to do that.

`$ cd world_cities_workshop`

Now, you are in the world_cities_workshop folder. Let's verify that it's empty:

`$ ls`

Your output should not show anything. Looks good!

Let's set up a folder structure for our project that is generally useful for any project working with data and code. It's good practice to make separate folders each for raw data, data outputs, scripts and documents in the context of any project you're working on. For now, let's create two new directories, `data/` and `scripts/` inside the world_cities_workshop/ folder.

We can make these directories at the same time using Bash. Make sure you are in your world_cities_workshop/ folder and run:

`$ mkdir data scripts`

You won't see anything in your shell, but if you run the ls command again you should see two new folders in your project directory. This is good!
Now let's try creating a folder for `documents/` in `world_cities_workshop/`. Try this on your own.
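The sketch below illustrates how `mkdir` treats each space-separated word as a separate directory name, which is why one command can create several folders and also why spaces in names cause surprises. It runs in a throwaway directory; `sandbox`, `my`, `project` and `my_project` are hypothetical names used only for this example.

```shell
# Work in a throwaway directory so nothing on your Desktop changes
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir data scripts   # two arguments, so two directories are created
mkdir my project     # the space separates arguments: this makes "my" AND "project"
mkdir my_project     # an underscore keeps the name as a single argument

ls                   # lists data, my, my_project, project and scripts
```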


## Moving files to the project folder

The data we want to use are contained in multiple files located in the workshop_data/ folder that we sent out prior to this workshop. Let's take a look at the contents of that folder by navigating there in the shell. If you are in your `world_cities_workshop/` folder, you can use a special notation that means "move one folder up", `../` and then move directly into the `workshop_data/` folder:

`$ cd ../workshop_data`

Now you should see your prompt change to let you know you are in the `workshop_data/` folder. Use `ls` to take a look at the files that are in there:

`$ ls`

You should see eight files in the folder, seven of which are csv files (Comma Separated Values) and one of which is a plain text file (.txt). We can take a look at the contents of each file using the cat command followed by the full name of one of the files including the extension:

`$ cat amsterdam-nl.csv`

You should see one line of text containing coordinates and population data for Amsterdam pop up right in your shell. This is the contents of the file (it's very short!) and the command prompt returns below it. You can cat into other files as well, give it a try on your own.
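The `../` notation and `cat` can be tried safely in a throwaway directory. This is only a sketch: the folder names `project` and `incoming`, the file `example.csv` and its contents are invented for illustration and are not the workshop data.

```shell
# Build two sibling folders in a throwaway location
sandbox=$(mktemp -d)
mkdir "$sandbox/project" "$sandbox/incoming"
echo "example line" > "$sandbox/incoming/example.csv"   # write one line of text into a file

cd "$sandbox/project"   # start inside one folder
cd ../incoming          # go up one level, then down into the sibling folder
ls                      # shows example.csv
cat example.csv         # prints the file contents: example line
```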

## Moving files from one folder to another

You will see that one of the files in the `workshop_data/` folder is a text file called `data-documentation.txt`. This looks like something we should save, but separately from our raw data. You can use the `mv` command to Move files from one place to another on your computer without using drag and drop. Let's try moving `data-documentation.txt` to our `world_cities_workshop/documents` folder that you created earlier.

`$ mv data-documentation.txt ../world_cities_workshop/documents`

Note how we use the `../` notation to move one folder "up" from `workshop_data/` to Desktop, and from there we can specify the exact path to the new folder.

You should also see a file called `map.py` - this extension lets us know that this is a Python script. We can `cat` into it the same way we did to other files because it is simply a text file with a very particular syntax that Python can read and understand. We don't know much else about it at this moment, and that's ok - let's just move it to the right subfolder within our `world_cities_workshop/` directory.

Try this on your own.

(`$ mv map.py ../world_cities_workshop/scripts`)
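For a risk-free rehearsal of the same kind of move, the following sketch uses a throwaway directory. The folder layout and `readme.txt` are hypothetical stand-ins, and `mkdir -p` (which also creates any missing parent folders) is an option not covered elsewhere in this lesson.

```shell
# Set up a small practice layout in a throwaway location
sandbox=$(mktemp -d)
mkdir -p "$sandbox/incoming" "$sandbox/project/documents"   # -p also creates parent folders
touch "$sandbox/incoming/readme.txt"

cd "$sandbox/incoming"
mv readme.txt ../project/documents   # the destination is an existing folder, so the file keeps its name
ls                                   # prints nothing: incoming/ is now empty
ls ../project/documents              # shows readme.txt
```

Note that the destination folder has to exist already. If `documents` were missing, `mv` would instead rename the file to `documents`, so it is worth running `ls` on the destination first.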

## Using wildcards to access multiple files at once

Use `ls` again to see the list of files in the `workshop_data/` directory.

You'll see that all the .csv files follow a naming convention that goes `<cityname>-<countryabbreviation>.csv`. This is important as a consideration when you're working with scripts, as there are very easy commands that can be used to quickly move, create and work with files if you know that their names follow a predictable pattern.

Here we'll work with an example. We only want data for cities that are in the Netherlands for this first part of our analysis. Of course, being familiar with this country we may recognize that Amsterdam and Rotterdam are in the Netherlands, but we can also take advantage of the naming convention to quickly choose and move all data files from Netherlands' cities to the `world_cities_workshop/data/` folder where we want them.

In Bash, `*` is a wildcard, which matches zero or more characters. Let’s consider the `workshop_data/` directory: `*.csv` matches amsterdam-nl.csv, eindhoven-nl.csv and every file that ends with ‘.csv’. On the other hand, `n*.csv` only matches newyorkcity.csv because the ‘n’ at the front only matches filenames that begin with the letter ‘n’.

`$ mv *-nl.csv ../world_cities_workshop/data`

If you now look in your `world_cities_workshop/data/` folder, you should see 5 files that contain data for cities in the Netherlands.
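Before moving files with a wildcard, it can help to preview what the pattern matches by passing it to `ls` first. The sketch below does this in a throwaway directory; the file names are hypothetical and only mimic the workshop's naming convention.

```shell
# Create a few empty practice files in a throwaway directory
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir data
touch amsterdam-nl.csv rotterdam-nl.csv paris-fr.csv notes.txt   # hypothetical file names

ls *-nl.csv        # preview: amsterdam-nl.csv and rotterdam-nl.csv
ls *.csv           # all three .csv files, but not notes.txt
mv *-nl.csv data   # move only the files the first pattern matched
ls data            # amsterdam-nl.csv and rotterdam-nl.csv
ls                 # data, notes.txt and paris-fr.csv remain
```

The shell expands the pattern into the list of matching file names before `ls` or `mv` runs, so both commands see exactly the same files.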


## Select and concatenate data files

Right now, the data for each city are stored in a separate file. That might be helpful for some applications, but we want all of the data listed in one .csv file. We can do this on the command line by combining the `awk` command (used for pattern scanning and processing) with `>`, which redirects a command's output into a file instead of printing it to the screen. Here, `awk 1` prints every line of every file it is given, and `>` saves that combined output to a new file that we will call `netherlands-cities.csv`.

We can use the wildcard `*` again, this time to grab all the .csv files in the `world_cities_workshop/data/` folder. We already know that these are all related to cities in the Netherlands, so `*.csv` alone matches every file we want. If you are still in `workshop_data/`, first navigate to the `data/` folder:

`$ cd ../world_cities_workshop/data`

Then run:

`$ awk 1 *.csv > netherlands-cities.csv`

This will save a new file called `netherlands-cities.csv` in the `data/` folder. Because `awk 1` ends every line it prints with a new line, each city lands on its own line. Check to see if it's there! You can use `ls` to see the files in the `data/` folder, and `cat` to check out the contents of the newly created `netherlands-cities.csv`.

`$ ls`

`$ cat netherlands-cities.csv`
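To see why `awk 1` is used here rather than plain `cat`, the sketch below compares the two on files that do not end with a newline. It assumes the city files are single lines without a trailing newline; the file names and the data values are invented for illustration and do not reflect the real workshop files.

```shell
# Work in a throwaway directory with two made-up one-line files
sandbox=$(mktemp -d)
cd "$sandbox"
# printf does not add a trailing newline, so each file is one line with no line ending
printf 'c,CityA,1.0,2.0,100' > citya-xx.csv
printf 'c,CityB,3.0,4.0,200' > cityb-xx.csv

cat citya-xx.csv cityb-xx.csv > joined-with-cat.txt     # the two lines run together
awk 1 citya-xx.csv cityb-xx.csv > joined-with-awk.txt   # each file's line ends with a newline

cat joined-with-cat.txt; echo   # one long line (echo only adds a final newline for display)
cat joined-with-awk.txt         # two separate lines
```

The outputs are written to `.txt` files here so that they would not be picked up by a `*.csv` pattern if the commands were run again.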

## Add road data to `netherlands-cities.csv`

While these are not geographically accurate to real roads, we may want to add connections between cities of interest. We can do this by editing the `netherlands-cities.csv` file we just created - you guessed it, using the command line!

Nano is a text editor that you can use directly in your shell interface. There are a few of these, including vi and vim, but today we'll just work with nano. These should come with your shell, so there's no need for a separate installation. To open a file with nano, use:

`$ nano <filename>`

Let's open the netherlands-cities.csv file to add some road data that we will specify now.

`$ nano netherlands-cities.csv`

Unlike using the `cat` command, nano opens the file contents in a new interface with a black bar at the top and some options at the bottom. Also new is that you can edit the contents of a file and save or 'write out' the file afterward. This is what we'll do now. Note that you can use your arrow keys to navigate the file to make changes, as clicking and inserting is not an option.

Let's add the following lines to the netherlands-cities.csv file:

r,Amsterdam,Rotterdam

r,Rotterdam,Utrecht

r,Amsterdam,Utrecht

r,The-Hague,Rotterdam

Make sure there is a new line between each of them.

Now, press Control + O to "WriteOut" the file (save it). Nano will ask you if you want to save this file with the same name, and we do, so press Enter. You can then press Control + X to exit the Nano text editor and get back to your Bash shell.

If you now use cat to check out the contents of your file, you should see that it has been updated:

`$ cat netherlands-cities.csv`
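If you ever need to add lines like these from a script rather than by hand, one possible approach is the `>>` operator, which appends output to the end of a file instead of overwriting it. This is only a sketch and not part of the workshop steps: it runs in a throwaway directory, and the "existing line" content is a stand-in for the real file.

```shell
# Work on a stand-in file in a throwaway directory
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'existing line\n' > netherlands-cities.csv   # > creates (or overwrites) the file

# >> appends to the end of the file and leaves the existing content alone
printf 'r,Amsterdam,Rotterdam\n' >> netherlands-cities.csv
printf 'r,Rotterdam,Utrecht\n'   >> netherlands-cities.csv

cat netherlands-cities.csv   # the original line followed by the two road lines
```

Be careful not to confuse the two operators: a single `>` replaces whatever the file held before.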
