# Sequence alignment in the WSI (Sanger) compute environment

## Introduction

In this tutorial, we will align whole-genome sequencing data from a mouse zygote that was subjected to CRISPR-induced mutagenesis. We will then find the resulting engineered alleles and track down some other alleles in this mouse.

## Authors
This tutorial was written by [Vivek Iyer](https://www.sanger.ac.uk/people/directory/iyer-vivek) and [Thomas Keane](https://github.com/tk2) and adapted to be run in the WSI (Sanger) compute environment by [Vivek Iyer](https://www.sanger.ac.uk/people/directory/iyer-vivek).


## Tutorial sections
This tutorial comprises the following sections:   
 1. [Visualising the reference genome](reference_genome_visualisation.ipynb)   
 2. [Aligning paired FASTQ files with BWA](bwa_alignment.ipynb)   
 3. [Converting a SAM file to a BAM file](sam_bam_conversion.ipynb)   
 4. [Sorting and indexing the BAM file](sort_and_index_bam.ipynb)   
 5. [Using Unix pipes to combine the commands together](piping_commands.ipynb)
 6. [Marking PCR duplicates](pcr_duplicates.ipynb)
 7. [Generating QC stats](qc_stats.ipynb)
 8. [BAM visualisation](bam_visualisation.ipynb)

## Running the commands from this tutorial

The commands in this tutorial are designed to be run in the WSI (Sanger) compute environment. At Sanger, there are many different compute environments. For this tutorial, we will be logging into a server node called **vr-login**. 

We have pre-prepared a **working directory** for each user: 

    /lustre/scratch115/teams/hgi/WSI_NGS_tutorial/read_alignment/Exercise2/[user_name]  

You will need to replace **[user_name]** with *your* user name (e.g. "kj6" or "ls7" etc). For example, if your username was *abc* then your **working directory** for this tutorial would be:

    /lustre/scratch115/teams/hgi/WSI_NGS_tutorial/read_alignment/Exercise2/abc 
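The substitution described above can also be done in the shell rather than by hand. The following is a minimal sketch: `TUTORIAL_BASE` and `workdir_for` are hypothetical names introduced here for illustration, and the last line assumes that your login name on the machine (as reported by `id -un`) is the same as your Sanger user name.

```shell
# Base directory copied from the tutorial text; TUTORIAL_BASE is a
# hypothetical variable name, not something the course sets up for you.
TUTORIAL_BASE=/lustre/scratch115/teams/hgi/WSI_NGS_tutorial/read_alignment/Exercise2

# workdir_for is a hypothetical helper: it appends a user name to the base path.
workdir_for() {
    printf '%s/%s\n' "$TUTORIAL_BASE" "$1"
}

# Assumption: the current login name matches your Sanger user name.
# If it does not, pass your user name explicitly, e.g. workdir_for abc
workdir_for "$(id -un)"
```

If the printed path does not end in your Sanger user name, call `workdir_for` with the name typed out instead.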

All of the directories specified in this tutorial have been set up inside your **working directory**. If you're not sure what we mean by this, please ask one of the instructors and they'll be happy to go over what this means.

## Sanger user id, 'ssh gateway' and 'network' passwords

This tutorial assumes that you have a **Sanger user ID**, a **Sanger user password** and a **Sanger SSH password**, which you have previously used to:

* log into the Sanger SSH gateway
* log into individual servers


## Logging into the WSI server environment and running commands in the terminal

To run the commands in this tutorial, you first have to:

* open a terminal window
* log into the WSI (Sanger) compute environment
* navigate to the working directory

A terminal window is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

**Open a terminal window in your VM.**

For this tutorial, we will need to be connected to the WSI (Sanger) compute environment. To do this, we will be using **SSH** (Secure Shell) which allows us to connect to a remote server node, in this case **vr-login**, and execute shell commands.

**Type the following commands in your terminal window to connect to the WSI (Sanger) compute environment using SSH.**

    ssh -L localhost:3128:wwwcache.sanger.ac.uk:3128 vvi@ssh.sanger.ac.uk

*You will need to replace 'vvi' with your user name (e.g. "kj6" or "ls7" etc).*
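The `-L` option in the command above sets up local port forwarding: connections to port 3128 on your own machine are passed through the SSH gateway to port 3128 on `wwwcache.sanger.ac.uk`. To avoid typos when swapping in your user name, you could build the command from a variable first. This is only a sketch: `gateway_cmd` is a hypothetical helper that prints the command without running it, and the last line assumes your local login name matches your Sanger user name.

```shell
# gateway_cmd is a hypothetical helper: it prints (but does not run) the
# ssh command from this tutorial with the given user name substituted in.
gateway_cmd() {
    printf 'ssh -L localhost:3128:wwwcache.sanger.ac.uk:3128 %s@ssh.sanger.ac.uk\n' "$1"
}

# Assumption: the current login name matches your Sanger user name.
# Otherwise pass it explicitly, e.g. gateway_cmd kj6
gateway_cmd "$(id -un)"
```

Check the printed line, then copy it into your terminal to connect.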

**You should now see a password prompt. Enter your *Sanger SSH password*.** 

    vvi@ssh.sanger.ac.uk's password: 
    Last login: Sun Aug 25 08:30:51 2019 from cpc91218-cmbg18
                  Wellcome Sanger Institute
                         SSH Gateway

         ********************************************
         * This system is for authorised users only *
         ********************************************

For security reasons, the terminal won't show you the password as you type it. It may look as if nothing is happening, but the terminal is registering every keystroke. If you're really stuck, some people type their password into a text editor and copy and paste it into the terminal when prompted, but that isn't secure and we don't recommend it.

*Note: you need your **SSH** password here and not your **user** password. In most cases, these passwords will be different, and you need to use the one that corresponds to the prompt you are seeing. So, for this initial SSH Gateway prompt, you will need to use your **SSH** password.*

**After entering the SSH Gateway, you will be prompted "Where would you like to go today?". Enter the server node name *vr-login*.**

    Where would you like to go today (exit to logout)? 

        > vr-login

**You will then be prompted for your *Sanger user password*.**

    Connecting to vr-login.internal.sanger.ac.uk ...
    vvi@vr-login.internal.sanger.ac.uk's password: 
    
*This is the standard network password that you use to log into other Sanger resources like Helix, webmail etc.*

**Unless you see an error, you now have access to a command prompt on *vr-login*.**

    Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-105-generic x86_64)
     * Documentation:  https://help.ubuntu.com/
    This system is managed by CFEngine 3
    Last login: Sun Aug 25 11:11:46 2019 from ssh.sanger.ac.uk
    vvi@vr-2-2-02:~$ 

**Run the following command to set up your software environment to mirror NPG's software environment.**

    . /software/npg/etc/profile.npg

This will allow you to access the same software (samtools, bcftools etc) that NPG uses.
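One way to confirm that the profile took effect is to ask the shell where it finds each tool. The sketch below uses the standard `command -v` built-in; `check_tools` is a hypothetical helper written for this example, and the tool names are the two mentioned above.

```shell
# check_tools is a hypothetical helper: for each tool name given, print the
# path the shell would run, or a warning if the tool is not on the PATH.
# It returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if tool_path=$(command -v "$tool"); then
            printf '%s -> %s\n' "$tool" "$tool_path"
        else
            printf '%s not found on PATH\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# samtools and bcftools are the tools named in the tutorial text.
check_tools samtools bcftools || echo "Some tools are missing; try sourcing the NPG profile again."
```

If a tool is reported as missing, re-run the profile command in the same terminal session, since the environment it sets up only applies to the shell in which it was sourced.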

**Navigate to your working area of the course, replacing '[user_name]' with your user name.**    

    cd /lustre/scratch115/teams/hgi/WSI_NGS_tutorial/read_alignment/Exercise2/[user_name]

All of the commands from this point on assume that you are in this directory.
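Because later commands depend on your location, it can help to check it before moving on. The following is a rough sketch: `in_workdir` and `EXPECTED` are hypothetical names, the check compares against what `pwd` reports (which may differ if the file system resolves the path through symbolic links), and it assumes your login name on the server is your Sanger user name.

```shell
# in_workdir is a hypothetical helper: it succeeds if the current directory
# is the given directory or somewhere beneath it.
in_workdir() {
    case "$(pwd)" in
        "$1"|"$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the login name on the server matches your Sanger user name.
EXPECTED=/lustre/scratch115/teams/hgi/WSI_NGS_tutorial/read_alignment/Exercise2/$(id -un)

if in_workdir "$EXPECTED"; then
    echo "You are in your working directory."
else
    echo "You are in $(pwd), not $EXPECTED."
fi
```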

## Let’s get started!

To get started with the tutorial, head to the first section: [Visualising the reference genome](reference_genome_visualisation.ipynb).