# Visualising the reference genome

In this tutorial, we will use **BWA** (http://bio-bwa.sourceforge.net) to align one small region of Illumina sequencing data to the *Mus musculus* genome. 

The sequencing data (Illumina reads) that we'll be aligning come from Whole-Genome Sequencing (WGS) of a mouse embryo which has been mutagenised while at the one-cell stage using CRISPR-Cas9 and a gRNA targeting an exon of the *Tyr* gene. Successful mutation of the *Tyr* gene will delete one or both alleles. A bi-allelic null *Tyr* mouse will be albino, but otherwise healthy.

First, let's take a look at the reference genome, in this case *Mus musculus*.

**Go to the `ref` directory.**

In [None]:
cd ref

In this directory, you will find a FASTA file which contains the reference genome. FASTA files (`.fa` or `.fasta`) are used to store sequence data, such as our reference genome, before any reads are aligned to it. The mouse genome is in the file called `GRCm38.68.dna.toplevel.fa`.
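To make the FASTA layout concrete, here is a minimal sketch using a made-up two-record file. The file name `toy.fa` and its contents are hypothetical and are not part of the tutorial data. Each record starts with a header line beginning with `>`, followed by one or more lines of sequence.

```shell
# Write a tiny FASTA file with two records (hypothetical toy data)
printf '>seq1 first toy record\nACGTACGT\nACGT\n>seq2 second toy record\nTTGGCCAA\n' > toy.fa

# Header lines start with '>'; everything up to the next header is sequence
grep '^>' toy.fa

# Count the records by counting the header lines
n_records=$(grep -c '^>' toy.fa)
echo "records: $n_records"
```

The reference genome follows the same layout, only with far longer sequences, which is why we page through it with `less` rather than printing it.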

**Take a look at the reference genome (`GRCm38.68.dna.toplevel.fa`) with `less`.**

In [None]:
less GRCm38.68.dna.toplevel.fa

As with BAM files, index files are often required to allow fast retrieval of data from a reference genome. In this case, we have already created the indices for our reference genome:

* `GRCm38.68.dna.toplevel.fa.fai` - allows rapid sequence retrieval with `samtools`
* `GRCm38.68.dna.toplevel.fa.amb` - records appearances of N (or other non-ATGC) in the reference
* `GRCm38.68.dna.toplevel.fa.sa` - the suffix array, one of several BWA index files (others hold the Burrows-Wheeler transform etc.)
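
To see why an index such as the `.fai` file speeds up retrieval, the sketch below builds an equivalent five-column table (name, length, byte offset of the first base, bases per line, bytes per line) for a made-up file with plain `awk`, then uses the offset to jump straight to a region. This is only an illustration of the idea under simple assumptions (ASCII text, Unix line endings, equal line lengths within a record); it is not how `samtools` is implemented, and `demo.fa` is a hypothetical file, not part of the tutorial data.

```shell
# Toy FASTA with two records (hypothetical data)
printf '>chrA toy sequence\nACGTACGTAC\nGTACGT\n>chrB\nTTTTGGGG\n' > demo.fa

# Build a .fai-style table: name, length, offset, bases per line, bytes per line
awk '
  BEGIN { OFS = "\t" }
  /^>/ {
    # Flush the previous record before starting a new one
    if (name != "") print name, len, offset, linebases, linewidth
    name = substr($1, 2); len = 0; linebases = 0
    pos += length($0) + 1      # header bytes plus the newline
    offset = pos               # the first base starts right after the header
    next
  }
  {
    if (linebases == 0) { linebases = length($0); linewidth = length($0) + 1 }
    len += length($0)
    pos += length($0) + 1
  }
  END { if (name != "") print name, len, offset, linebases, linewidth }
' demo.fa > demo.fa.fai

cat demo.fa.fai

# Fetch bases 5-8 of chrB by seeking to its offset instead of reading chrA.
# This shortcut only works because the region sits on a single line; a real
# tool also uses the last two columns to step over newlines.
offset=$(awk '$1 == "chrB" { print $3 }' demo.fa.fai)
region=$(tail -c +"$((offset + 5))" demo.fa | head -c 4)
echo "$region"
```

On a genome-sized file, skipping directly to the right byte avoids reading through every preceding chromosome, which is the point of keeping the index alongside the FASTA.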

### Exercises

**Q1: What is the length of chromosome 1 in the reference (mouse) genome?**  
_Hint: look at the FASTA header for chromosome 1_

**Q2: Can you quickly check whether the assembly contains any sequences other than the 'standard' chromosomes?**  
_Hint: try `grep '>' GRCm38.68.dna.toplevel.fa`_

The answers to the questions on this page can be found [here](answers.ipynb).   

Now, continue to the next section of the tutorial: [Aligning paired FASTQ files with BWA](bwa_alignment.ipynb).   
You can also return to the [index page](index.ipynb).