Comparison of Covid-19 Spike Protein Sequences Using Multiple Sequence Alignment

Introduction to the Workflow

workflow screenshot

The workflow compares two sets of peptide, DNA, or RNA sequences using the following steps:

  1. Perform a multiple sequence alignment (MSA) and calculate a distance matrix using the Bioconductor package msa.
  2. Generate and plot a phylogenetic tree by neighbor-joining using the packages ape and ggtree.

There are two required input parameters (fasta_1 and fasta_2), each representing one set of sequences in FASTA format. Optional parameters include the sequence type, the MSA method, and the distance type. The workflow's main output is a plot of a phylogenetic tree showing the relationship between the two sets.
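To make the "distance matrix" step more concrete, here is a minimal Python sketch of one simple distance measure, the p-distance (the fraction of aligned positions at which two sequences differ). This is an illustration only: the workflow itself runs in R, and its selectable distance types may be computed differently. The sequence names and the toy alignment below are made up for this example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distance matrix, keyed by (name, name)."""
    names = list(alignment)
    matrix = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[(a, b)] = d
        matrix[(b, a)] = d
    return matrix


# Hypothetical, already aligned toy sequences (not real patient data).
toy_alignment = {
    "china_1": "MFVFLVLLPL",
    "china_2": "MFVFLVLLPL",
    "europe_1": "MFVFLGLLPL",
    "europe_2": "MFVF-GLLPL",
}
toy_matrix = distance_matrix(toy_alignment)
```

A neighbor-joining method then uses such a matrix to build the tree: sequences with small pairwise distances end up close together.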

The workflow can be found at: https://github.com/CompEpigen/msa_group_compare

For further information, please also see: https://w3id.org/cwl/view/git/93d3f03cdd9c44bdc609a11f097a4bad9451be84/CWL/workflows/msa_group_compare.cwl

Aim of the Demo

Using the above workflow, we would like to investigate whether the sequence of the SARS-CoV-2 surface glycoprotein (S-protein) changed between the early outbreak in China and the subsequent wave in Europe.

To do so, we will compare sequences from two patient cohorts:

  • China, until the end of 2019, 11 patients
  • Europe, in January 2020, 10 patients

Source: NCBI Virus (04/06/2020), https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/

Please Note: The workflow and its application presented here are intended for demonstration purposes only. We do not claim that this analysis and its result represent best scientific practice. For this tutorial, you can run the analysis on our DEMO workflUX instance, which you can access with an ELIXIR account (see instructions below). Keep in mind that the DEMO workflUX instance is designed to run only small workflows for testing and training purposes.

Step 1: Register with Your ELIXIR Account

This step is only necessary if you want to access the DEMO workflUX instance. If you want to access your own installation of workflUX, connect to your own URL and continue with step 2.

  1. Connect to the DEMO version of workflUX using your browser of choice: https://cwlab.krini.ingress.rancher.computational.bio/

You should see a welcome screen like this:

welcome screenshot

  2. Press login / register in the top bar.

You will be forwarded to ELIXIR AAI.


ELIXIR AAI will ask you to register your account to be part of a specific group.

  3. Please fill in the registration form and submit it.

Upon successful registration, you should see a message like this:


  4. Please press continue to be redirected to workflUX.

Step 2: Login with Your ELIXIR Account

welcome screenshot

Now that you have registered, you may log in.

  1. Please press the login / register button again.

You will be forwarded to ELIXIR AAI.


  2. Please grant workflUX access to the requested information.

Step 3: Import a Workflow

After login, additional options will appear in the top bar.

  1. If you are using a workflow for the first time, please click on the Import CWL Workflow/Tool button.

import workflow screenshot

  2. In the drop-down menu, select URL to public CWL document (e.g. from github).

  3. Provide the URL to the raw file of the workflow by copying and pasting the following URL:

https://raw.githubusercontent.com/CompEpigen/msa_group_compare/master/CWL/workflows/msa_group_compare.cwl

  4. Name the workflow: MSA_Group_Compare.

Step 4: Create a New Job

  1. Please click on the Create Job button.

create job screenshot

  2. Select the MSA comparison workflow in the left panel.

create job screenshot

The first section shows a description of the workflow.

title screenshot

Please Note: This description has been directly extracted from the CWL workflow upon import.

  3. Please choose a title for your analysis job.

Please Note: You may submit batches of runs at once. This greatly simplifies the specification of large sample sets.

For this demo, we will leave the batch submission disabled.

scroll down

  4. To provide parameters, please select HTML form.

html screenshot

Step 5: Provide Input Parameters

An HTML form will appear that asks you to provide input parameters for this run. There are only two required parameters (fasta_1 and fasta_2). The remaining parameters are optional and pre-set with their defaults, but feel free to play around.

  1. Provide the S-protein sequences for the two patient cohorts by copying and pasting the following URLs:

fasta_1: https://raw.githubusercontent.com/CompEpigen/msa_group_compare/master/example_job/data/china_2019_covid_surf_prot_seq.fasta

fasta_2: https://raw.githubusercontent.com/CompEpigen/msa_group_compare/master/example_job/data/europe_2020_covid_surf_prot_seq.fasta
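Both inputs are plain-text FASTA files: each record starts with a header line beginning with `>`, followed by one or more lines of sequence. If you want to check what such a file contains before submitting it, the following Python sketch shows one way to read the format. It is a simplified reader written for this tutorial, not part of the workflow, and the record IDs and sequences in the example are made up.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} mapping.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records: dict[str, str] = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line without a record ID")
            current_id = parts[0]
            if current_id in records:
                raise ValueError(f"duplicate record ID: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current_id] += line
    return records


# Hypothetical two-record example (IDs and sequences are invented).
example_text = """>seq_A surface glycoprotein, example
MFVFLVLLPL
VSSQCVNLTT
>seq_B surface glycoprotein, example
MFVFLGLLPL
"""
example_records = parse_fasta(example_text)
```

Counting the entries of the returned dictionary is a quick way to confirm that a file holds the expected number of sequences for a cohort.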

parameter screenshot

Please Note: Your parameters are automatically validated to avoid bad surprises when starting the run.

Please Note: The entire parameter form has been created from information parsed from the CWL workflow itself. No additional configuration was needed upon import. If you don't know the meaning of a parameter, please click on the ℹ️ button to see its documentation.

scroll down

  2. Click on validate and create job.

validate screenshot

Please wait

  3. Once a green success message appears, you may click on Job Execution & Results in the top bar.

execute screenshot

Step 6: Submit to WES

Now it is time to submit your job to a WES endpoint for execution.

  1. Select the newly created job in the left panel.

select job screenshot

  2. Select the run.

select run screenshot

  3. Select a WES endpoint.

  4. Press start.

select running screenshot

After a few seconds, the run status should turn to RUNNING. If many executions are running in parallel, you might see QUEUED first.

Please wait

select complete screenshot

Once the status has turned to RUNNING, your execution should complete within 2 to 5 minutes.
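workflUX tracks the run status for you, so nothing needs to be scripted here. For readers curious about what happens behind the interface, the sketch below shows the general idea of polling a run until it reaches a final state. The state names follow the GA4GH WES specification; `get_state` is a hypothetical callable that would wrap a status request to your WES endpoint, and the polling interval and limit are arbitrary choices for illustration.

```python
import time
from typing import Callable

# Run states after which a WES run no longer changes.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until a terminal state is reported, then return it.

    get_state is a hypothetical function returning the current state string
    (e.g. "QUEUED", "RUNNING", "COMPLETE"). Raises TimeoutError if no
    terminal state is seen within max_polls attempts.
    """
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"run did not finish within {max_polls} polls")


# Simulated status sequence standing in for a real WES endpoint.
simulated_states = iter(["QUEUED", "RUNNING", "RUNNING", "COMPLETE"])
final_state = wait_for_run(
    lambda: next(simulated_states),
    poll_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

Passing the status lookup in as a function keeps the loop independent of any particular HTTP client or authentication scheme.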

Step 7: Explore the Output

Once the execution has finished, you are probably interested in retrieving and exploring the output.

  1. Click on Details for the completed run.

select details screenshot

  2. Click on Output Files.

select output screenshot

  3. Please select the file phylogenetic_tree.png.

  4. Press download selected file.

select output screenshot

Please open the PNG file with an image viewer of your choice.

Congratulations, you have reached the end of this demo!

You have successfully carried out an analysis based on public data using GA4GH-compliant infrastructure. All of this without a line of code.

With the obtained results, you may now try to answer the original question of this task: Is there evidence for a difference in the SARS-CoV-2 S-protein sequence between the early outbreak in China and the subsequent wave in Europe?