# Whole Exome Sequencing Variant Calling

This tutorial provides an example pipeline for calling variants in whole exome sequencing (WES) data across multiple samples, assuming the samples are paired-end. It is strongly recommended that these steps be run on a large compute resource, such as a high-performance compute cluster (HPC) or a cloud resource. On an HPC, wrap these commands inside a shell script that uses the job-scheduler directives of your cluster (e.g. LSF, SLURM, or PBS). I will try to note which steps are critical to run on the cluster and which can be run locally.

<br/><br/>

---------------------------------------

## STEP0: Fastq Raw Quality Check

Before any sample processing can begin, it is critical to know the initial state of the raw sequences. Sequencers typically deliver reads in fastq format (.fastq/.fq), often gzip-compressed (.fastq.gz/.fq.gz) to reduce file size and storage use.


|Specifications            |                      |
|:------------------------:|:--------------------:|
| Recommended run location | local, HPC, or Cloud |
| Operating System         | Windows, MacOS, Linux |
| Software                 | [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)|
| Dependencies             | Perl, Java + JDK |
| Tested on                | FastQC==v0.11.9, Ubuntu 18.04.4 LTS (Bionic Beaver)| 
| Input File Type          | raw sequencing fastq file w/extension *.fastq.gz, *.fq.gz, *.fq, or *.fastq |  
| Output File Type         | html report and compressed images |

    
<br/>

For <span style="background:yellow;color:black">**_each fastq file_**</span>, you will want to run the following command:
<br/>

<p style="background:white">&emsp; &emsp; &emsp; &emsp;
<code style="background:grey;color:black;font-size:16px">fastqc myFile.fastq.gz -o /path/to/my/output/directory
</code>
</p>
    
<br/>

Example submission script <span style="background:lightgreen;color:black">**_if using an HPC with a scheduler and parallelizing with GNU parallel_**</span> for all available fastq files:
<p style="background:white;border:1px solid gray;margin-left:40px">&emsp; &emsp; &emsp; &emsp;
<code style="background:white">
    #!/bin/bash
    &lt;insert scheduler resources here -- #BSUB (LSF), #SBATCH (SLURM), #PBS (PBS)&gt;
    
    # dependency of fastqc
    export PERL5LIB=/path/to/perl
    
    
    # declare and define variables
    fastqc="/path/to/fastqc"
    listOfFastqs="/path/to/file/with/fastq/pathnames.txt"
    outDir="/path/to/output/directory/"
    gnuParallel="/path/to/GNUparallel --tag --memfree 5G --delay 0.2 --jobs 5 -u --progress --joblog STEP0_fastqc.parallel.log --resume"
    
    # store the list of fastq paths as an array (-t strips the trailing newline from each line)
    readarray -t allFiles < "$listOfFastqs"
    
    # run the command; GNU parallel starts a new job whenever a job slot and enough free memory are available
    # ($gnuParallel is left unquoted on purpose so its options split into separate words)
    $gnuParallel "time $fastqc {} -o $outDir" ::: "${allFiles[@]}"

</code>
</p>
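
The scripts in this tutorial load their inputs with `readarray` from a plain text file holding one path (or prefix) per line. Below is a minimal sketch for building the STEP0 list. It assumes all raw fastq files sit directly inside a single directory, and the function name `make_fastq_list` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastQC): write one fastq path per line,
# sorted, so the STEP0 script can load it with `readarray -t`.
# Assumes all raw fastq files live directly inside a single directory.
make_fastq_list() {
  local fastqDir=$1 outFile=$2
  find "$fastqDir" -maxdepth 1 -type f \
    \( -name '*.fastq.gz' -o -name '*.fq.gz' -o -name '*.fastq' -o -name '*.fq' \) \
    | sort > "$outFile"
}
```

For example, `make_fastq_list /path/to/raw/fastqs /path/to/file/with/fastq/pathnames.txt` produces the file that `listOfFastqs` points to.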


<br/><br/>

### Expected Output


For each fastq file, FastQC produces an HTML report and a zipped directory. The zipped directory contains the same information as the HTML report, but it also includes every report image as a separate image file that can be embedded into documents, presentations, etc.
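
If a job is killed partway through (for example, when the scheduler's wall-time runs out), it can help to confirm that every input actually produced a report before moving on. Below is a minimal sketch that assumes FastQC's usual naming, where `.gz` and then `.fastq`/`.fq` are stripped and `_fastqc.html` is appended (so `sample.fastq.gz` becomes `sample_fastqc.html`). The function name `missing_fastqc_reports` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check: print inputs from a fastq list file that have no
# matching FastQC HTML report in the output directory.
# Assumes FastQC names reports <name>_fastqc.html, where <name> is the file
# name with .gz and then .fastq/.fq removed.
missing_fastqc_reports() {
  local listFile=$1 outDir=$2 fq name
  while IFS= read -r fq || [ -n "$fq" ]; do
    [ -n "$fq" ] || continue              # skip blank lines
    name=$(basename "$fq")
    name=${name%.gz}                      # drop .gz if present
    name=${name%.fastq}                   # then drop .fastq or .fq
    name=${name%.fq}
    if [ ! -e "${outDir%/}/${name}_fastqc.html" ]; then
      echo "$fq"
    fi
  done < "$listFile"
}
```

Any path it prints can be re-run with the single-file `fastqc` command above.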

<br/><br/>

### Next Steps Checklist Status
- [x] Raw Fastq Check
- [ ] <span style="color:blue">**Sequence trimming and clean up**</span>
- [ ] Fastq Check on Trimmed Fastq
- [ ] Alignment/Mapping
- [ ] Alignment Quality Check
- [ ] Duplicate Removal
- [ ] Independent Variant Calling
- [ ] Independent Variant Filtering
- [ ] Multi-sample VCF Generation
- [ ] Joint Variant Calling/Backfill/Squaring
- [ ] Split multiallelic variants
- [ ] Joint Variant Filtering
- [ ] Annotation

<br/>

### Useful Resources
**FastQC**
* There is not a formal paper publication on this software, so cite the software as open-source code or webpage.
* For [FastQC Docs](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) click on the link.

<br/><br/>

---------------------------------------

## STEP1: Basic quality trimming and adapter removal

Based on the FastQC output, you will want to make sure that poor-quality bases are trimmed, that any adapter contamination from the library prep or sequencing linkers is removed, and that read pairs in which a read becomes too short are *removed together. This is critical for paired-end reads, because the forward (R1) and reverse (R2) reads must stay in the same order in both files.* Very short reads tend to map to multiple locations (multimappers) during alignment and can slow down the aligner.

|Specifications            |                      |
|:------------------------:|:--------------------:|
| Recommended run location | local, HPC, or Cloud |
| Operating System         | Windows, MacOS, Linux |
| Software                 | [cutadapt](https://cutadapt.readthedocs.io/en/stable/)|
| Dependencies             | Python3 |
| Tested on                | cutadapt==v3.2, Python>=3.6.9, Ubuntu 18.04.4 LTS (Bionic Beaver)| 
| Input File Type          | raw sequencing fastq file w/extension *.fastq.gz, *.fq.gz, *.fq, or *.fastq |  
| Output File Type         | trimmed fastq file w/extension *.fastq.gz, *.fq.gz, *.fq, or *.fastq |

    
<br/>

Given cutadapt is a python package, it can be installed using pip:

```
    pip3 install --user cutadapt==3.2
```

Assuming FastQC shows Illumina Universal Adapter contamination, we will want to remove those adapter sequences from the 3' ends of both reads, trim poor-quality bases, and discard read pairs that become too short. One thing to consider: if the samples were sequenced on a NovaSeq, quality scores are no longer granular! The NovaSeq bins Phred quality scores into 4 possible values: 2, 12, 23, and 37. Therefore, when selecting a quality cut-off, know that if you pick a value between two bins, only bases in the lower bin (and below) fall under the cut-off; for example, with `-q 30`, Q23 and lower bases count as low quality while Q37 bases do not. [Heng Li has a great blog post](https://lh3.github.io/2017/07/24/on-nonvaseq-base-quality) about this if you are interested.
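
To see how binned scores interact with `-q`, here is a minimal sketch of the 3' quality-trimming rule as the cutadapt documentation describes it: subtract the cutoff from each quality, sum from each position to the 3' end, and cut where that sum is lowest. It is a bash illustration, not cutadapt's own code, and `trim_pos` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch of cutadapt-style 3' quality trimming (partial-sum rule from the
# cutadapt docs); an illustration only, NOT cutadapt's implementation.
# Usage: trim_pos CUTOFF Q1 Q2 ... Qn
# Prints how many bases (from the 5' end) would be kept.
trim_pos() {
  local cutoff=$1; shift
  local -a quals=("$@")
  local n=${#quals[@]}
  local sum=0 best_sum=0 best_idx=$n i
  # Walk from the 3' end, summing (quality - cutoff) over the suffix.
  for (( i = n - 1; i >= 0; i-- )); do
    sum=$(( sum + quals[i] - cutoff ))
    # Keep the position with the lowest suffix sum; starting at best_sum=0
    # with best_idx=n means "no trimming" if no suffix sum drops below 0.
    if (( sum < best_sum )); then
      best_sum=$sum
      best_idx=$i
    fi
  done
  echo "$best_idx"
}
```

With NovaSeq-style binned qualities, `trim_pos 30 37 37 37 23 23 12 2 2` prints `3`. The three Q37 bases are kept, and everything from the first Q23 base onward is trimmed.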

<br/>

For <span style="background:yellow;color:black">**_each pair of fastq files_**</span>, you will want to run the following command:

<p style="background:white">
<code style="background:grey;color:black;font-size:16px;margin-left:40px">cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -q 30 --minimum-length=10 --pair-filter=any -o Read1Cleaned_1.fq.gz -p Read2Cleaned_2.fq.gz Read1.fq.gz Read2.fq.gz
</code>
</p>

<br/>

Example submission script <span style="background:lightgreen;color:black">**_if using an HPC with a scheduler and parallelizing with GNU parallel_**</span> for all available fastq files:
<p style="background:white;border:1px solid gray;margin-left:40px">
<code style="background:white">
    #!/bin/bash
    &lt;insert scheduler resources here -- #BSUB (LSF), #SBATCH (SLURM), #PBS (PBS)&gt;
    
    export PERL5LIB=/path/to/perl
    export PYTHONPATH=/path/to/python/site-packages
    
    # declare and define variables
    cutadapt="/path/to/cutadapt"
    listOfPEfastqs="/path/to/fastqPE/prefixs.txt"
    gnuParallel="/path/to/GNUparallel --tag --memfree 5G --delay 0.2 --jobs 5 -u --progress --joblog STEP1_cutadapt.parallel.log --resume"
    
    
    # store the prefix list as an array (-t strips the trailing newline from each line)
    readarray -t allFiles < "$listOfPEfastqs"
    
    
    # run the command; GNU parallel starts a new job whenever a job slot and enough free memory are available
    # {} is replaced by each prefix, so {}1.fq.gz and {}2.fq.gz are the two mates of one pair
    $gnuParallel "time $cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -q 30 --minimum-length=10 --pair-filter=any -o {}Cleaned_1.fq.gz -p {}Cleaned_2.fq.gz {}1.fq.gz {}2.fq.gz" ::: "${allFiles[@]}"

</code>
</p>
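
The STEP1 script expects a prefix file in which each line, followed by `1.fq.gz` or `2.fq.gz`, names the two mates of one pair. Below is a minimal sketch for building that file while checking that every R1 has its R2. It assumes the raw files are named `<sample>_1.fq.gz` / `<sample>_2.fq.gz` in one directory, and `make_pe_prefix_list` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one prefix per line (e.g. /data/SampleA_) for
# every <prefix>1.fq.gz that has a matching <prefix>2.fq.gz.
# Assumes raw files are named <sample>_1.fq.gz and <sample>_2.fq.gz.
make_pe_prefix_list() {
  local fastqDir=$1 outFile=$2 r1 prefix
  : > "$outFile"                                     # start from an empty list
  for r1 in "${fastqDir%/}"/*_1.fq.gz; do
    [ -e "$r1" ] || continue                         # glob matched nothing
    case $r1 in *Cleaned_1.fq.gz) continue ;; esac   # skip STEP1 outputs
    prefix=${r1%1.fq.gz}                             # strip the read-1 suffix
    if [ -e "${prefix}2.fq.gz" ]; then
      printf '%s\n' "$prefix" >> "$outFile"
    else
      echo "WARNING: no R2 mate for $r1" >&2
    fi
  done
}
```

Unpaired R1 files are reported on stderr rather than silently dropped, so a missing mate is noticed before trimming starts.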

<br/>

### Expected Output

<br/>

### Next Steps Checklist Status
- [x] Raw Fastq Check
- [x] Sequence trimming and clean up
- [ ] <span style="color:blue">**Fastq Check on Trimmed Fastq**</span>
- [ ] Alignment/Mapping
- [ ] Alignment Quality Check
- [ ] Duplicate Removal
- [ ] Independent Variant Calling
- [ ] Independent Variant Filtering
- [ ] Multi-sample VCF Generation
- [ ] Joint Variant Calling/Backfill/Squaring
- [ ] Split multiallelic variants
- [ ] Joint Variant Filtering
- [ ] Annotation

<br/>

### Useful Resources
**cutadapt Hallmark Paper**
* *Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp. 10-12. doi:https://doi.org/10.14806/ej.17.1.200*
* For [cutadapt Docs](https://cutadapt.readthedocs.io/en/stable/guide.html) click on the link.
        
<br/><br/>  
        
---------------------------------------


<br/><br/>

## STEP2: Fastq Cleaned Quality Check

This is the same as STEP0, except that you now feed in the cleaned fastq files generated in STEP1. This step confirms that the cleaning was performed as expected.


|Specifications            |                      |
|:------------------------:|:--------------------:|
| Recommended run location | local, HPC, or Cloud |
| Operating System         | Windows, MacOS, Linux |
| Software                 | [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)|
| Dependencies             | Perl, Java + JDK |
| Tested on                | FastQC==v0.11.9, Ubuntu 18.04.4 LTS (Bionic Beaver) | 
| Input File Type          | cleaned fastq file from STEP1 w/extension *.fastq.gz, *.fq.gz, *.fq, or *.fastq |  
| Output File Type         | html report and compressed images |

    
<br/>

For <span style="background:yellow;color:black">**_each fastq file_**</span>, you will want to run the following command:

<p style="background:white">&emsp; &emsp; &emsp; &emsp;
<code style="background:grey;color:black;font-size:16px">fastqc myCleanedSTEP1File.fastq.gz -o /path/to/my/output/directory/
</code>
</p>

<br/>

Example submission script <span style="background:lightgreen;color:black">**_if using an HPC with a scheduler and parallelizing with GNU parallel_**</span> for all available fastq files:
<p style="background:white;border:1px solid gray;margin-left:40px">
<code style="background:white">
    #!/bin/bash
    &lt;insert scheduler resources here -- #BSUB (LSF), #SBATCH (SLURM), #PBS (PBS)&gt;
    
    # dependency of fastqc
    export PERL5LIB=/path/to/perl
    
    
    # declare and define variables
    fastqc="/path/to/fastqc"
    listOfFastqs="/path/to/file/with/fastq/cleaned/pathnames.txt"
    outDir="/path/to/output/directory/"
    gnuParallel="/path/to/GNUparallel --tag --memfree 5G --delay 0.2 --jobs 5 -u --progress --joblog STEP2_fastqc.parallel.log --resume"
    
    # store the list of cleaned fastq paths as an array (-t strips the trailing newline from each line)
    readarray -t allFiles < "$listOfFastqs"
    
    # run the command; GNU parallel starts a new job whenever a job slot and enough free memory are available
    $gnuParallel "time $fastqc {} -o $outDir" ::: "${allFiles[@]}"

</code>
</p>
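
A quick way to confirm that cleaning worked is to compare which FastQC modules were flagged before (STEP0) and after (STEP2) trimming. Each FastQC zip contains a tab-separated `summary.txt` with the columns status, module name, and file name. Below is a minimal sketch; the function name `flagged_modules` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the FastQC modules marked WARN or FAIL in a
# summary.txt file (tab-separated: status, module name, input file name).
flagged_modules() {
  awk -F'\t' '$1 == "WARN" || $1 == "FAIL" { print $1 ": " $2 }' "$1"
}
```

Assuming the usual `<name>_fastqc/` folder inside the zip, `unzip -p sample_fastqc.zip sample_fastqc/summary.txt > summary.txt` extracts the file without unpacking the whole archive. Ideally, modules such as Adapter Content drop off the list between the STEP0 and STEP2 summaries.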


### Expected Output

<br/><br/>

### Next Steps Checklist Status
- [x] Raw Fastq Check
- [x] Sequence trimming and clean up
- [x] Fastq Check on Trimmed Fastq
- [ ] <span style="color:blue">**Alignment/Mapping**</span>
- [ ] Alignment Quality Check
- [ ] Duplicate Removal
- [ ] Independent Variant Calling
- [ ] Independent Variant Filtering
- [ ] Multi-sample VCF Generation
- [ ] Joint Variant Calling/Backfill/Squaring
- [ ] Split multiallelic variants
- [ ] Joint Variant Filtering
- [ ] Annotation

<br/>

### Useful Resources
**FastQC**
* There is not a formal paper publication on this software, so cite the software as open-source code or webpage.
* For [FastQC Docs](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) click on the link.

<br/><br/>  

---------------------------------------

## STEP3: Aligning Reads to Reference Genome

This is typically the most computationally expensive part of the analysis, so run it on a machine with enough memory (RAM) and CPU cores. Because exome reads come from genomic DNA rather than spliced RNA transcripts, there is no need for a splice-aware aligner; a standard DNA aligner such as BWA or Bowtie2 is fine.


|Specifications            |                      |
|:------------------------:|:--------------------:|
| Recommended run location | HPC or Cloud |
| Operating System         | Linux/Unix-based |
| Software                 | [BWA](http://bio-bwa.sourceforge.net/), [Samtools](http://www.htslib.org/) |
| Dependencies             |  |
| Tested on                | bwa==v0.7.11, samtools==v1.8 | 
| Input File Type          | cleaned fastq files from STEP1 w/extension *.fastq.gz, *.fq.gz, *.fq, or *.fastq |  
| Output File Type         | coordinate-sorted alignment file w/extension *.bam |


<br/>

For <span style="background:yellow;color:black">**_each pair of fastq files_**</span>, you will want to run the following command.  This assumes you have 12 cores available for threading -- 6 for alignment and 6 for sorting: 
<p style="background:white">&emsp;&emsp;
<code style="background:grey;color:black;font-size:16px">bwa mem -t 6 /path/to/myBWAIndex/filePrefix Read1Cleaned_1.fq.gz Read2Cleaned_2.fq.gz | samtools sort -@6 -o SampleID_trimmed_sorted.bam -
</code> 
</p>
<br/>
<div class="alert alert-block alert-info"><span style="color:black"><b>Tip:</b> In order to use BWA, you must index your reference genome first.  This only needs to be done once per reference genome, and can be accomplished with <code style="background:black;color:white">bwa index yourReferenceGenome.fasta</code>.  For more information, <a href="http://bio-bwa.sourceforge.net/bwa.shtml">look at the bwa documentation here</a>.</span></div>
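
`bwa index` writes five files next to the index prefix (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). By default, the prefix is the FASTA file name itself, for example `yourReferenceGenome.fasta.bwt`. Below is a minimal pre-flight check to run before submitting alignment jobs; the function name `check_bwa_index` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm every file written by `bwa index`
# (<prefix>.amb .ann .bwt .pac .sa) exists and is non-empty.
# Returns 0 if the index looks complete, 1 otherwise.
check_bwa_index() {
  local prefix=$1 ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -s "${prefix}.${ext}" ]; then
      echo "missing or empty: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_bwa_index "$bwaIndexGenome" || exit 1` at the top of the submission script stops the job before any alignment starts against an incomplete index.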

<br/>

Example submission script <span style="background:lightgreen;color:black">**_if using an HPC with a scheduler and parallelizing with GNU parallel_**</span> for all available fastq files:
<p style="background:white;border:1px solid gray;margin-left:40px">
<code style="background:white">
    #!/bin/bash
    &lt;insert scheduler resources here -- #BSUB (LSF), #SBATCH (SLURM), #PBS (PBS)&gt;
    
    module load samtools/1.8 # or declare variable to samtools executable
    
    # declare and define variables
    bwa="/path/to/bwa"
    bwaIndexGenome="/path/to/bwa_indices/including/prefix"
    listOfPEfastqs="/path/to/fastqPE/prefixs.txt"
    #samtools="/path/to/samtools/if/needed"
    gnuParallel="/path/to/GNUparallel --tag --memfree 50G --delay 0.2 --jobs 2 -u --progress --joblog STEP3_WES_bwa_alignment.parallel.log --resume"
    
    # store the prefix file list as an array (-t strips the trailing newline from each line)
    readarray -t allFiles < "$listOfPEfastqs"
    
    # align the STEP1 cleaned mates ({}Cleaned_1.fq.gz / {}Cleaned_2.fq.gz) and sort the output;
    # each job uses 6 bwa threads + 6 samtools sort threads, and --jobs 2 runs two pairs at once
    $gnuParallel "time $bwa mem -t 6 $bwaIndexGenome {}Cleaned_1.fq.gz {}Cleaned_2.fq.gz | samtools sort -@6 -o {}trimmed_sorted.bam -" ::: "${allFiles[@]}"
    
</code>
</p>


### Expected Output

<br/><br/>

### Next Steps Checklist Status
- [x] Raw Fastq Check
- [x] Sequence trimming and clean up
- [x] Fastq Check on Trimmed Fastq
- [x] Alignment/Mapping
- [ ] <span style="color:blue">**Alignment Quality Check**</span>
- [ ] <span style="color:blue">**Duplicate Removal**</span>
- [ ] Independent Variant Calling
- [ ] Independent Variant Filtering
- [ ] Multi-sample VCF Generation
- [ ] Joint Variant Calling/Backfill/Squaring
- [ ] Split multiallelic variants
- [ ] Joint Variant Filtering
- [ ] Annotation

<br/>

### Useful Resources
**Burrows-Wheeler Alignment Tool (BWA) Hallmark Paper**
* *Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England), 25(14), 1754–1760. https://doi.org/10.1093/bioinformatics/btp324*
* For [BWA Docs](http://bio-bwa.sourceforge.net/), click on the link


<br/><br/>

---------------------------------------

## STEP4: Collect and Assess Alignment Quality + Remove Duplicates

This step can easily be run on a local computer because its memory and storage footprint is relatively low. It collects alignment statistics, insert-size metrics, and duplicate metrics. Removing duplicate reads is critical for WES variant calling. Here, I physically remove the duplicate reads instead of only "marking" them, in case the GATK pipeline is not used for variant calling, since other variant callers may not ignore reads that are only flagged as duplicates.

Review all of the collected metrics and, if there are enough samples, plot them with your own custom code so that differences within and between samples can be explained. **After looking at these metrics, you may need to adjust the alignment parameters and re-run and re-assess STEP3 to fit the needs of the project.**


|Specifications            |                      |
|:------------------------:|:--------------------:|
| Recommended run location | Local, HPC or Cloud |
| Operating System         | Linux/Unix-based |
| Software                 | [Picard](https://broadinstitute.github.io/picard/), [Samtools](http://www.htslib.org/) |
| Dependencies             | Java + JDK, R (for the insert-size histogram PDF) |
| Tested on                | Picard==v2.21.1, samtools==v1.8 | 
| Input File Type          | alignment file w/extension *.bam or *.sam |
| Output File Type         | series of txt files and alignment file w/extension *.bam and index *.bai | 


<br/>

For <span style="background:yellow;color:black">**_each alignment file_**</span>, you will want to run the following sets of commands:
<p style="background:white">&emsp;&emsp;
<code style="background:grey;color:black;font-size:16px">samtools flagstat alignmentFileName.bam > alignmentFileName.flagstat
</code> 
</p>

<p style="background:white">&emsp;&emsp;
<code style="background:grey;color:black;font-size:16px">java -jar Picard.jar CollectAlignmentSummaryMetrics I=alignmentFileName.bam O=picard.metrics.txt R=refGenome.fasta
</code> 
</p>

<p style="background:white">&emsp;&emsp;
<code style="background:grey;color:black;font-size:16px">java -jar Picard.jar CollectInsertSizeMetrics I=alignmentFileName.bam O=insert_size_metrics.txt H=insert_size_histogram.pdf
</code> 
</p>

<p style="background:white">&emsp;&emsp;
<code style="background:grey;color:black;font-size:16px">java -jar Picard.jar MarkDuplicates I=alignmentFileName.bam O=alignmentNoDups.bam M=markDupMetrics.txt REMOVE_DUPLICATES=true
</code>
</p>

<p style="background:white">&emsp;&emsp;
<code style="background:grey;color:black;font-size:16px">samtools index alignmentNoDups.bam
</code> 
</p>

<br/>

Example submission script <span style="background:lightgreen;color:black">**_if using an HPC with a scheduler and parallelizing with GNU parallel_**</span> for all available alignment files:
<p style="background:white;border:1px solid gray;margin-left:40px">
<code style="background:white">
    #!/bin/bash
    &lt;insert scheduler resources here -- #BSUB (LSF), #SBATCH (SLURM), #PBS (PBS)&gt;
    
    module load java
    module load picard/2.21.1
    module load samtools/1.8
    export PERL5LIB=/home/bin/perl
    
    # declare and define variables
    picardJar="/path/to/picard.jar"   # required: every java -jar call below uses it
    #samtools="/path/to/samtools/if/needed"
    bamPath="/path/to/bam/output/location/directory/"
    outDir="/path/to/output/results/"
    bamList="/path/to/file/with/bamPrefix.txt"
    fastaFile="/path/to/reference/genome.fasta"
    
    # store the prefix file list as an array (-t strips the trailing newline from each line)
    readarray -t allBamFiles < "$bamList"
    
    # flagstat is quick, so a plain loop over every BAM in bamPath is enough
    echo "STEP4: 1/5...flagstats"
    for bamFile in "${bamPath}"*.bam;
    do
      time samtools flagstat "${bamFile}" > "${bamFile}.flagstat"
    done
    wait
    sleep 60
    
    # the Picard steps run through GNU parallel, which starts a new job whenever a job slot and enough free memory are available
    echo "STEP4: 2/5...picard alignment metrics"
    gnuParallel="/path/to/GNUparallel --tag --memfree 18G --delay 0.2 --jobs 5 -u --progress --joblog STEP4_alignmentMetrics_picard.parallel.log --resume"
    $gnuParallel "time java -jar $picardJar CollectAlignmentSummaryMetrics I=${bamPath}{}.bam O=${outDir}{}.picard.metrics.txt R=$fastaFile" ::: "${allBamFiles[@]}"
    wait
    sleep 60
    
    echo "STEP4: 3/5...collect insert size metrics"
    gnuParallel="/path/to/GNUparallel --tag --memfree 18G --delay 0.2 --jobs 5 -u --progress --joblog STEP4_collectInsertMetrics_picard.parallel.log --resume"
    $gnuParallel "time java -jar $picardJar CollectInsertSizeMetrics I=${bamPath}{}.bam O=${outDir}{}insert_size_metrics.txt H=${outDir}{}insert_size_histogram.pdf" ::: "${allBamFiles[@]}"
    wait
    sleep 60
    
    echo "STEP4: 4/5...mark duplicates"
    gnuParallel="/path/to/GNUparallel --tag --memfree 18G --delay 0.2 --jobs 5 -u --progress --joblog STEP4_markDuplicates_picard.parallel.log --resume"
    $gnuParallel "time java -jar $picardJar MarkDuplicates I=${bamPath}{}.bam O=${outDir}{}noDups.bam M=${outDir}{}markDupMetrics.txt REMOVE_DUPLICATES=true" ::: "${allBamFiles[@]}"
    wait
    sleep 60
    
    echo "STEP4: 5/5...index new bam"
    # the only BAMs written to outDir are the duplicate-removed {}noDups.bam files
    for newBam in "${outDir}"*.bam;
    do
        time samtools index "${newBam}"
    done
    wait
    sleep 60
    
</code>
</p>
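
To compare samples as suggested above, the MarkDuplicates metrics files can be collapsed into one table. Picard writes a `## METRICS CLASS` line, then a tab-separated header row, then one row per library. Note that `PERCENT_DUPLICATION` is reported as a fraction (0 to 1), not as a percentage. Below is a minimal sketch that reads only the first library row of each file. It assumes the `{}markDupMetrics.txt` naming used in the script above, and `summarize_dup_rates` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a sample / PERCENT_DUPLICATION table from Picard
# MarkDuplicates metrics files named <sample>markDupMetrics.txt.
# Only the first library row after "## METRICS CLASS" is read.
summarize_dup_rates() {
  local f sample pct
  printf 'sample\tPERCENT_DUPLICATION\n'
  for f in "$@"; do
    sample=$(basename "$f" markDupMetrics.txt)
    pct=$(awk -F'\t' '
      /^## METRICS CLASS/ { state = 1; next }      # next line is the header
      state == 1 {
        for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
        state = 2; next
      }
      state == 2 && col { print $col; exit }       # first library row
    ' "$f")
    printf '%s\t%s\n' "$sample" "$pct"
  done
}
```

For example, `summarize_dup_rates /path/to/output/results/*markDupMetrics.txt > dup_rates.tsv` gives a two-column file that is easy to plot with your own code.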

<br/>

### Expected Output

<br/><br/>

### Next Steps Checklist Status
- [x] Raw Fastq Check
- [x] Sequence trimming and clean up
- [x] Fastq Check on Trimmed Fastq
- [x] Alignment/Mapping
- [x] Alignment Quality Check
- [x] Duplicate Removal
- [ ] <span style="color:blue">**Independent Variant Calling**</span>
- [ ] Independent Variant Filtering
- [ ] Multi-sample VCF Generation
- [ ] Joint Variant Calling/Backfill/Squaring
- [ ] Split multiallelic variants
- [ ] Joint Variant Filtering
- [ ] Annotation

<br/>


### Useful Resources
**SamTools Hallmark Paper**
* *Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., & 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352*
* For [SamTools Docs](http://www.htslib.org/doc/samtools.html), click on the link
* [Dave Tang's Blog](https://davetang.org/wiki/tiki-index.php?page=SAMTools) also has a great wiki and tutorial for SamTools

**Picard Tools**
* For [Picard Docs](https://broadinstitute.github.io/picard/) click on the link
* There does not seem to be a formal publication, so please cite their GitHub page located [here](https://github.com/broadinstitute/picard)
    
<br/><br/>

---------------------------------------

## STEP5: Independent Variant Calling per Sample

|Specifications            |                      |
|:------------------------:|:--------------------:|
| Recommended run location | HPC or Cloud |
| Operating System         | Linux/Unix-based |
| Software                 | [Platypus](https://www.well.ox.ac.uk/research/research-groups/lunter-group/lunter-group/platypus-a-haplotype-based-variant-caller-for-next-generation-sequence-data) |
| Dependencies             | Python2.7 |
| Tested on                | Platypus==v0.8.1, Python==v2.7.6 | 
| Input File Type          | duplicate-removed alignment file w/extension *.bam, with index *.bai |
| Output File Type         | variant call file w/extension .vcf.gz | 
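
Below is a minimal sketch of a per-sample Platypus call, based on the basic `callVariants` usage (`--bamFiles`, `--refFile`, `--output`) in the Platypus documentation. The wrapper name, the `PYTHON2` override, and the file names are hypothetical. It also assumes that Platypus runs under Python 2.7 and that the reference FASTA has a `.fai` index next to it (for example, from `samtools faidx`):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: call variants in one sample with Platypus.
# Assumes Platypus runs under Python 2.7 and the reference FASTA is indexed
# (a .fai next to it, e.g. from `samtools faidx`).
# PYTHON2 can point to a specific interpreter; it defaults to python2.7.
call_variants_platypus() {
  local platypusPy=$1 refFasta=$2 bam=$3 outVcf=$4
  "${PYTHON2:-python2.7}" "$platypusPy" callVariants \
    --bamFiles="$bam" \
    --refFile="$refFasta" \
    --output="$outVcf"
}
```

For example, `call_variants_platypus /path/to/Platypus.py /path/to/reference/genome.fasta S1_noDups.bam S1.vcf` calls variants for one STEP4 BAM. If you want the `.vcf.gz` output listed in the table, one option is to compress and index the result with htslib (`bgzip S1.vcf`, then `tabix -p vcf S1.vcf.gz`).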



### Expected Output
<br/>

### Next Steps Checklist Status
- [x] Raw Fastq Check
- [x] Sequence trimming and clean up
- [x] Fastq Check on Trimmed Fastq
- [x] Alignment/Mapping
- [x] Alignment Quality Check
- [x] Duplicate Removal
- [x] Independent Variant Calling
- [ ] <span style="color:blue">**Independent Variant Filtering**</span>
- [ ] Multi-sample VCF Generation
- [ ] Joint Variant Calling/Backfill/Squaring
- [ ] Split multiallelic variants
- [ ] Joint Variant Filtering
- [ ] Annotation

<br/>


### Useful Resources

**Platypus Hallmark Paper**
* *Andy Rimmer, Hang Phan, Iain Mathieson, Zamin Iqbal, Stephen R. F. Twigg, WGS500 Consortium, Andrew O. M. Wilkie, Gil McVean, Gerton Lunter. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics (2014) doi:10.1038/ng.3036* 
* For the [Platypus Docs](https://www.rdm.ox.ac.uk/research/lunter-group/lunter-group/platypus-documentation), click on the link
