# The most basic Commands

## Print the working directory

In [None]:
pwd

## List the files in the working directory

In [None]:
ls

### That's not that many files! Let's make a new one

## Print a message

In [None]:
echo "Some Message"

## Write output to a file - this is called "redirection"

In [None]:
echo "Some Message" > myNewFile.txt
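One detail worth knowing: `>` replaces the file's contents every time it is used, while `>>` adds to the end of the file. A minimal sketch, using a scratch directory from `mktemp -d` and a made-up file name (`notes.txt`) so nothing in your working directory is touched:

```shell
# Scratch directory so the example does not touch your own files
demo_dir=$(mktemp -d)

echo "first"  >  "$demo_dir/notes.txt"   # creates the file
echo "second" >  "$demo_dir/notes.txt"   # overwrites it: "first" is gone
echo "third"  >> "$demo_dir/notes.txt"   # appends a new line at the end

cat "$demo_dir/notes.txt"
```

The file ends up with two lines, `second` and `third`.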

### Let's see those files again. But let's also look at some details about the files this time:

In [None]:
ls -l

## Make a new directory

In [None]:
mkdir myDirectory

In [None]:
ls -l

## Change the current directory

In [None]:
cd myDirectory

In [None]:
ls -l

### Is this really empty? Let's view hidden files too!

In [None]:
ls -la

### That's so cool!
Yes, it is. Every directory contains two special entries: `./` refers to the current directory and `../` refers to the parent directory.

### So let's check those out!

In [None]:
ls -la .

In [None]:
ls -la ..

## Bash also has `~/`, which is your home directory

In [None]:
ls ~/

## Let's move `myNewFile.txt`, which you created earlier, to the current directory:

In [None]:
mv ../myNewFile.txt ./
ls -la

## Copying files:

In [None]:
cp myNewFile.txt ../

What did you just copy, and where?

## Deleting a file

In [None]:
ls ../

In [None]:
rm ../myNewFile.txt

In [None]:
ls ../

## Viewing a file

In [None]:
cat myNewFile.txt

### Let's make a longer file for this example

In [None]:
printf "a\nb\nc\nd\ne\nf\ng\n" > exampleView.txt

In [None]:
cat exampleView.txt

In [None]:
head -n 5 exampleView.txt

In [None]:
tail -n 5 exampleView.txt

### You can also use the `less` and `more` commands to preview a file interactively. This won't work in the notebook, but try it in the shell sometime! 

## Counting lines in a file

In [None]:
wc -l exampleView.txt

#  More basic commands

## Creating a link

There are two kinds of links: hard links and soft (symbolic) links. A hard link is a pointer to a file; a soft link is a pointer to a file name.

Think of a soft link like a shortcut, and a hard link like a semi-copy of a file.

To create a link, use `ln`. The default is a hard link; to create a soft link, use the `-s` flag.

In [None]:
echo "A new file" > ../originalFile.txt

In [None]:
ls ../

In [None]:
ln -s ../originalFile.txt myShortcut.txt

In [None]:
ls

In [None]:
ln ../originalFile.txt myHardlink.txt

In [None]:
ls -l

### So what happens if we delete the original file?

In [None]:
rm ../originalFile.txt
ls -l

### Oh no!  This shortcut is broken!

In [None]:
cat myShortcut.txt

### But this hard link still works, since we pointed to the file, not the file name

In [None]:
cat myHardlink.txt
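To see why the hard link survives, it can help to check that a hard link and the original name really are the same file. The sketch below uses made-up file names in a scratch directory. `-ef` is the shell's "same file" test, and `-e` checks whether a path resolves to something that exists.

```shell
demo_dir=$(mktemp -d)
echo "A new file" > "$demo_dir/original.txt"
ln    "$demo_dir/original.txt" "$demo_dir/hard.txt"   # second name for the same file
ln -s "$demo_dir/original.txt" "$demo_dir/soft.txt"   # pointer to the name original.txt

# Both names refer to one file on disk
[ "$demo_dir/original.txt" -ef "$demo_dir/hard.txt" ] && echo "hard link: same file"

rm "$demo_dir/original.txt"

# The data is still reachable through the hard link...
cat "$demo_dir/hard.txt"
# ...but the soft link now points at a name that no longer exists
[ -e "$demo_dir/soft.txt" ] || echo "soft link: broken"
```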

Alright, let's remove those links now.

In [None]:
rm myHardlink.txt
rm myShortcut.txt

## Creating a variable
This can be useful for storing strings, such as long filepaths. 

In [None]:
MY_VAR="this is a variable string"

In [None]:
echo $MY_VAR
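When a variable holds a path or any text containing spaces, wrapping the expansion in double quotes keeps it together as a single argument. A brief sketch; `DATA_DIR`, `PREFIX`, and the directory name are made up for illustration:

```shell
# Hypothetical directory name containing a space
DATA_DIR="$(mktemp -d)/my data"
mkdir -p "$DATA_DIR"                     # quotes keep "my data" as one argument
echo "hello" > "$DATA_DIR/greeting.txt"

# Braces mark where the variable name ends when text follows it directly
PREFIX="sample"
echo "${PREFIX}_01.txt"                  # prints sample_01.txt

cat "$DATA_DIR/greeting.txt"
```

Without the quotes, `mkdir -p $DATA_DIR` would receive two separate arguments, one ending in `my` and one named `data`.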

## Creating an alias
Aliases are like shortcuts to longer Unix commands. This can be helpful for storing long commands that you use often. 

Let's create an alias for examining disk usage. 

The `-h` flag specifies that sizes should be displayed in human-readable units (e.g. K for kilobytes, M for megabytes, G for gigabytes).

We will pipe the output of this command to a `sort` command. The `-r` flag sorts in reverse order, and the `-h` flag compares human-readable sizes such as `4.0K` and `1.2M` correctly (plain `-n` would compare only the leading numbers and ignore the unit suffix).

In [None]:
alias diskUse="du -h * | sort -rh"

In [None]:
cd ../
diskUse
cd myDirectory

## Creating a function

What are `$1` and `$2`? These are positional parameters: the first and second arguments you type after the function name when you call it. Think of them as a quick, lightweight version of command line arguments.

In [None]:
function lnhs { ln "$1" "$2"; ln -s "$1" "${2}.path"; }

What does this function do?

In [None]:
echo "blah" > ../blah.txt
lnhs ../blah.txt myLinks.txt
ls -l
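For comparison, here is a separate sketch showing a few related positional variables: `$#` holds the number of arguments and `"$@"` expands to all of them. The function name `describe_args` is made up for this illustration.

```shell
# Hypothetical helper: report how many arguments were given and list them
describe_args() {
    echo "number of arguments: $#"
    echo "first argument: $1"
    for arg in "$@"; do          # "$@" expands to every argument, one word each
        echo "  - $arg"
    done
}

describe_args alpha "two words" gamma
```

Because `"$@"` is quoted, `two words` stays a single argument and the count is 3.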

## Check if something is in your path and where it is

Yup!

In [None]:
which python

Nope!

In [None]:
which matlab
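Inside scripts, the shell built-in `command -v` is a common alternative to `which`: it prints the location when the command is found and returns a non-zero exit status otherwise. A small sketch; the helper name `check_tool` and the deliberately nonexistent command name are assumptions made for the example.

```shell
# Hypothetical helper: report whether a command can be found on the PATH
check_tool() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 is not on your PATH"
    fi
}

check_tool ls
check_tool surely-not-a-real-command-xyz
```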

## Download a file

`wget` is a very useful command for downloading large files from the web. Let's practice downloading results from a study by [Gasperini et al. 2019](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120861).

Under Supplementary Files, right click on the (ftp) link for GSE120861_all_deg_results.at_scale.txt.gz, and click on Copy Link Address. Paste the link in the command below.

In [None]:
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE120nnn/GSE120861/suppl/GSE120861_all_deg_results.at_scale.txt.gz

In [None]:
ls -l

#### Unzip the file.

In [None]:
gunzip GSE120861_all_deg_results.at_scale.txt.gz

#### Now let's preview the file!

In [None]:
head GSE120861_all_deg_results.at_scale.txt

## Zip a file

Let's zip that file back up.

In [None]:
gzip GSE120861_all_deg_results.at_scale.txt

In [None]:
ls -l

## Edit permissions

In [None]:
chmod 777 myLinks.txt
ls -l

### Whoa, check out all those letters! What do they mean?

4: Read

2: Write

1: Execute

The three digits following `chmod` are, from left to right: owner (u), group (g), everyone else (o). Each digit is the sum of the values of the permissions being granted.

<img src="./chmod_permissions.png">

So `chmod 777` grants read (r), write (w), and execute (x) permission to everyone.
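As a worked example with a different mode than the exercise that follows, the comments below add up the digits for `640`. The file is a throwaway created with `mktemp`, and `cut -c1-10` keeps just the permission letters from the `ls -l` output.

```shell
demo_file=$(mktemp)

# owner: read + write = 4 + 2 = 6
# group: read         = 4
# other: nothing      = 0
chmod 640 "$demo_file"
ls -l "$demo_file" | cut -c1-10     # -rw-r-----

# The same permissions written in symbolic form
chmod u=rw,g=r,o= "$demo_file"
ls -l "$demo_file" | cut -c1-10     # -rw-r-----
```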

How would we set `myLinks.txt` permissions to:

Group:  read, execute

Everyone: read

Owner: read, write, execute?

In [None]:
### Try it here






# Advanced Commands

## More advanced piping

So we saw how `>` can redirect command line output into a file. How do we pipe output from one command to another?

We use ` | `. We can do this as many times as we want to chain commands. Let's check it out.

In [None]:
ls -l

In [None]:
ls | wc -l

In [None]:
ls | wc -l | wc -l > someOutput.txt
cat someOutput.txt
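Each stage in a pipeline reads the previous stage's output as its input. A small sketch, with `printf` standing in for `ls` so the numbers are predictable:

```shell
# Three lines go in, so wc -l reports 3
printf 'a\nb\nc\n' | wc -l

# The first wc -l emits a single line (the count), so the second wc -l reports 1
printf 'a\nb\nc\n' | wc -l | wc -l
```

This is why chaining `wc -l` twice always gives 1, no matter how many files `ls` lists.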

## `tee`

`tee` lets us display output and write it to a file at the same time. As `tee` is a command, we pipe to it with ` | `.

In [None]:
ls | tee someOutput.txt

In [None]:
cat someOutput.txt
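By default `tee` overwrites its target file, much like `>` does; the `-a` flag appends instead. A short sketch using a throwaway file from `mktemp` (the variable name `log_file` is made up):

```shell
log_file=$(mktemp)

echo "run 1" | tee "$log_file"       # shown on screen and written to the file
echo "run 2" | tee -a "$log_file"    # shown on screen and appended to the file

cat "$log_file"
```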

#### Before we proceed, we need to get a file that has multiple columns. Let's download a file via git to mess around with going forward

In [None]:
git clone https://github.com/zrcjessica/gwas_tutorial.git

In [None]:
ls gwas_tutorial/*

In [None]:
mv gwas_tutorial/IBD_GWAS_summary_thinned.txt ./
rm -rf gwas_tutorial
ls -l

#### Before we proceed, let's check out what is in that file

In [None]:
head -n 5 IBD_GWAS_summary_thinned.txt

##  Cut

`cut` lets us pull specific columns out of a file. The default delimiter is a tab (`\t`), though we can set a different one with `-d` (e.g. for a .csv or a space-delimited file).

In [None]:
cut -f2 IBD_GWAS_summary_thinned.txt | head
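As an illustration of `-d`, here is a sketch on a few lines of made-up comma-separated data (the column names are invented, not taken from the GWAS file). `-f 1,3` picks the first and third fields.

```shell
# Made-up comma-separated data; -d sets the delimiter, -f picks the fields
printf 'snp,chr,pos\nrs1,chr1,100\nrs2,chr2,200\n' | cut -d ',' -f 1,3
```

The output keeps the same delimiter: `snp,pos`, `rs1,100`, `rs2,200`.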

In [None]:
### Try to print out the third column, store it to a file name "testingFile.txt", and display it at the same time






## Awk

`awk` is a more advanced language for text processing and data manipulation, and can be used on the command line or in a script. You can get pretty advanced with it.

Let's just print the first two columns with a tab between them.

In [None]:
awk '{print $1 "\t" $2}' IBD_GWAS_summary_thinned.txt | head 

Let's print the first two columns with "..." between them by overriding the default output field separator. We specify this with `OFS='...'`. `OFS` is one of many built-in variables that `awk` has. It stands for Output Field Separator, and by default it is a single space. The `-v` flag assigns a value to an `awk` variable before the program starts running.

In [None]:
awk -v OFS='...' '{print $1,$2}' IBD_GWAS_summary_thinned.txt | head
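`-v` can also set variables of your own, and `-F` sets the input field separator. The sketch below filters a made-up tab-separated table with a hypothetical variable named `cutoff`; neither the data nor the threshold comes from the GWAS file.

```shell
# Made-up tab-separated table: SNP id and p-value
printf 'snp\tp\nrs1\t0.20\nrs2\t0.01\nrs3\t0.04\n' |
awk -F '\t' -v OFS='\t' -v cutoff=0.05 \
    'NR == 1 || $2 < cutoff {print $1, $2}'
```

The header is kept because of `NR == 1`, and only the rows whose second column is below `cutoff` (here `rs2` and `rs3`) are printed.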

Let's play with another useful built-in variable, `NR`, the Number of Records variable. When reading a file line by line, this is simply the current line number. Let's do what we did above, but printing only the header and all even-numbered lines.

In [None]:
awk -v OFS='...' 'NR==1 || NR %2 == 0 {print $1, $2}' IBD_GWAS_summary_thinned.txt | head

Let's print the first two columns with "..." between them on every line except the first (the header), and also add a line number that starts counting at the second line. By the way, notice how we broke up the long command with `\`; this makes longer commands easier to read.

In [None]:
awk -v OFS='...' \
'{if (NR==1) print $1 "\t" $2 "\t" "line"; else print $1,$2, NR-1}' IBD_GWAS_summary_thinned.txt | head

In [None]:
### Try to print the first 5 columns and the line number, with the first 5 columns separated by ".." 
### and the line number separated by two tabs.  Then only show the first 7 lines of the output







## Grep

`grep` searches for a pattern (a plain string or a regular expression) and returns every line that contains a match.

In [None]:
wc -l IBD_GWAS_summary_thinned.txt

In [None]:
grep "chr10" IBD_GWAS_summary_thinned.txt | head -n 5

In [None]:
grep "chr10" IBD_GWAS_summary_thinned.txt | wc -l

In [None]:
grep "rs" IBD_GWAS_summary_thinned.txt | wc -l
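Because `grep` matches the pattern anywhere in a line, a short pattern can match more than intended. A few standard options narrow or summarize the matches. The sketch uses made-up lines rather than the real GWAS file:

```shell
# Made-up lines standing in for a GWAS table
sample=$(printf 'chr1\trs11\nchr10\trs22\nchr2\trs33\n')

echo "$sample" | grep "chr1"       # matches both chr1 and chr10
echo "$sample" | grep -w "chr1"    # -w: whole-word match, only the chr1 line
echo "$sample" | grep -c "chr1"    # -c: count matching lines instead of printing them
echo "$sample" | grep -v "chr1"    # -v: lines that do NOT match, only the chr2 line
```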

## Sed
Another way to edit files. `sed` has many commands, so it's hard to teach all at once. Let's look at one example where we add a prefix string to every line in the file.

In [None]:
sed -e 's/^/---PFX/' IBD_GWAS_summary_thinned.txt | head
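Two other `sed` forms come up often: substituting text within a line, and printing only selected lines. A sketch on made-up input (the replacement word is arbitrary):

```shell
# Made-up two-line input with "chr" appearing twice per line
sample=$(printf 'chr1\tchr1_rs1\nchr2\tchr2_rs2\n')

echo "$sample" | sed 's/chr/chromosome/'     # replaces the first match on each line
echo "$sample" | sed 's/chr/chromosome/g'    # g flag: replaces every match on each line
echo "$sample" | sed -n '2p'                 # -n with p: print only line 2
```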

## Paste
`paste` lets us join two files together horizontally instead of vertically (like `cbind` in R). Let's do this with the beginning of the file, looking at just the first five columns.

In [None]:
head IBD_GWAS_summary_thinned.txt | cut -f1-5 > tmp
paste tmp tmp

What if the files aren't the same length? It doesn't matter: once the shorter file runs out, `paste` keeps going with the longer file and leaves the missing fields empty.

In [None]:
paste tmp  myLinks.txt | head
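The behavior with files of different lengths is easier to see on tiny inputs. A sketch with two made-up files in a scratch directory:

```shell
demo_dir=$(mktemp -d)
printf '1\n2\n3\n' > "$demo_dir/long.txt"
printf 'a\n'       > "$demo_dir/short.txt"

# Rows 2 and 3 keep the tab separator but have an empty second field
paste "$demo_dir/long.txt" "$demo_dir/short.txt"
```

The output still has three rows, one per line of the longer file.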

In [None]:
rm tmp

## Sort a file
`sort` is a standard command line program that does exactly what its name implies.

Let's see what happens when we try sorting our file normally on the first five columns, without any options.

In [None]:
cut -f1-5 IBD_GWAS_summary_thinned.txt | sort - | head

By the way, you can sort in reverse as well by adding the `-r` option:

In [None]:
cut -f1-5 IBD_GWAS_summary_thinned.txt | sort -r - | head

Why does chr9 come out at the top? Aren't there 22 chromosomes? It's because the values are being compared as text, character by character, so `chr9` sorts after `chr22`.

You can specify the field to sort on using `-k` followed by the number of the column. You can also sort by numerical value using the flag `-n`.
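The difference between text order and numeric order is easy to see on three made-up values:

```shell
# Default sort compares text character by character, so "10" lands before "2" and "9"
printf '9\n10\n2\n' | sort

# -n compares the values as numbers: 2, 9, 10
printf '9\n10\n2\n' | sort -n
```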

Let's sort on the P-values of the GWAS SNPs (column 11) and show select columns for the top hits. 

In [None]:
sort -k 11 -n IBD_GWAS_summary_thinned.txt| head | cut -f1-5,11

Looks like there's a lot of hits on chromosome 10. Must be relevant for IBD! Also looks like there's a hit in the MHC region on chromosome 6 - no surprise, lots of research has corroborated the role of MHC in complex diseases. For extra practice, let's sort on chromosome, and then the p-values of the SNPs on that chromosome.

##### Sorting on multiple fields
Let's sort the table on chromosome, and then on the p-values of the SNPs on that chromosome. This requires the `-k m,n` format, where `m` and `n` are the start and end columns of a key that may span multiple columns. To sort on chromosome first, we use `-k 1,1n` to indicate that the key starts and ends in column 1 and is compared numerically. If we used only `-k 1n`, the key would run from column 1 to the end of the row. To sort on the p-value second, we use `-k 11,11n`. Note that `-n` reads only a leading number, so if the chromosome column holds values such as `chr10` rather than bare numbers, every chromosome compares as equal under `-k 1,1n`; GNU `sort` offers version sort (`-k 1,1V`) for values like that.

In [None]:
sort -k 1,1n -k 11,11n IBD_GWAS_summary_thinned.txt | head | cut -f1-5,11
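Multi-key sorting is easier to follow on a tiny table. In this sketch the data are made up (a bare chromosome number in column 1 and a p-value in column 2), so the keys are `-k 1,1n` and `-k 2,2n` rather than the column numbers of the GWAS file.

```shell
# Made-up table: chromosome number, then a p-value
printf '2\t0.5\n1\t0.9\n2\t0.1\n1\t0.3\n' |
sort -k 1,1n -k 2,2n
```

Rows are grouped by chromosome first, and within each chromosome the smaller p-value comes first.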

## For loops

Loop over values

In [None]:
for i in a b c d; do echo $i; done

Loop over a sequence

In [None]:
for i in `seq 1 10`; do echo $i; done
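The `$(...)` form is the modern equivalent of backticks, and the end of the sequence can come from a variable. A sketch that also keeps a running total; the variable names `n` and `total` are made up:

```shell
n=3
total=0
for i in $(seq 1 "$n"); do
    total=$((total + i))      # arithmetic expansion: add i to the running total
    echo "after $i: total is $total"
done
```

With `n=3` the loop runs three times and `total` ends at 6.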

Let's make a directory and write a series of dummy files to that directory.

In [None]:
mkdir ./dummy_files

In [None]:
ls -l

In [None]:
for i in a b c d; do echo $i > ./dummy_files/dummy_file_$i.txt ; done

In [None]:
ls -l dummy_files/

Now let's loop through all the dummy files and view their contents!

In [None]:
FILES=./dummy_files/*

for f in $FILES; do echo "contents of $f" && cat "$f" ; done
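A variation that also copes with file names containing spaces is to put the glob directly in the `for` statement and quote the loop variable. A sketch with made-up files in a scratch directory:

```shell
demo_dir=$(mktemp -d)
echo "x" > "$demo_dir/plain.txt"
echo "y" > "$demo_dir/with space.txt"

count=0
# Globbing directly in the for statement keeps each file name intact,
# and quoting "$f" passes it to cat as one argument
for f in "$demo_dir"/*.txt; do
    echo "contents of $f:"
    cat "$f"
    count=$((count + 1))
done
echo "$count files read"
```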