# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: During the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all the required software yet, please do so as soon as possible!

If you have already installed all the software, please start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

nf-core is a community effort to collect, develop and maintain bioinformatics pipelines that are reproducible and usable across institutions. The pipelines are designed to follow the FAIR principles (Findable, Accessible, Interoperable and Reusable) and are intended to simplify the process of finding and analysing data.

Source: Ewels, P.A., Peltzer, A., Fillinger, S. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38, 276–278 (2020). https://doi.org/10.1038/s41587-020-0439-x

2. How many pipelines are there currently in nf-core?

At the time of writing, there are 139 pipelines: 84 released, 43 under development and 12 archived.

Source: https://nf-co.re/pipelines/ 

3. Are there any non-bioinformatic pipelines in nf-core?

Yes, some pipelines are not strictly bioinformatics pipelines, although they belong to related or neighboring fields. For example, there are pipelines focused on data-handling tasks (e.g. datasync), on remote-sensing (satellite) imagery (rangeland), and on a simulation of the Industrial Revolution (spinningjenny).

Source: https://nf-co.re/pipelines/ 

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

A pipeline is a series of connected tools used to process and analyse data. The user provides the input only once, and the pipeline automatically passes the output of each tool on as the input of the next one.
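The idea of handing each tool's output on to the next tool can be illustrated with a toy Python sketch. The three "tools" below are hypothetical stand-ins written as plain functions; in an nf-core pipeline the steps are separate command-line tools wrapped in Nextflow processes and connected through channels, not Python function calls.

```python
from functools import reduce
from typing import Any, Callable, List

# Hypothetical stand-in "tools"; each one consumes the previous tool's output.
def clean(text: str) -> str:
    """Remove leading/trailing whitespace from the raw input."""
    return text.strip()

def split_records(text: str) -> List[str]:
    """Turn the cleaned text into one record per line."""
    return text.splitlines()

def count_records(records: List[str]) -> int:
    """Summarise the records, here simply by counting them."""
    return len(records)

def run_pipeline(data: Any, steps: List[Callable[[Any], Any]]) -> Any:
    """Give the input once; pass each step's output on as the next step's input."""
    return reduce(lambda result, step: step(result), steps, data)

result = run_pipeline("  geneA\ngeneB\ngeneC  ", [clean, split_records, count_records])
print(result)  # 3
```

The user only calls `run_pipeline` once with the raw input; the wiring between the steps is defined by the order of the list, much as a workflow definition fixes how tools are connected.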

5. Why do you think nf-core adheres to strict guidelines?

Adhering to strict guidelines keeps all pipelines consistent in structure, documentation and testing, which makes them easier to access and use for different users.

6. What are the main features of nf-core pipelines?

nf-core pipelines are documented, have stable releases, are open source, use continuous-integration testing whenever changes are made, can run on most computing infrastructures, and come with packaged software (dependencies are handled via Docker, Singularity or Conda). They also follow the FAIR principles described above.

Source: https://nf-co.re/ 

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

Searching for "differential abundance" on the https://nf-co.re/pipelines/ website leads to the pipeline called differentialabundance. 

In [None]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd


# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


In [None]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir out_day01


Run in the command line. Output:  
...  
executor >  local (21)  
[85/bb7fb7] NFC…E:DIFFERENTIALABUNDANCE:GUNZIP_GTF (Mus_musculus.GRCm38.81.gtf.gz) | 1 of 1 ✔  
[89/f45ee7] NFC…RENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GTF_TO_TABLE (Mus_musculus) | 1 of 1 ✔  
[b5/7b1f15] NFC…NDANCE:DIFFERENTIALABUNDANCE:VALIDATOR (SRP254919.samplesheet.csv) | 1 of 1 ✔  
[bb/86136e] NFC…UNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER ([id:SRP254919]) | 1 of 1 ✔  
[01/6bc588] NFC…_, variable:treatment, reference:mCherry, target:hND6, blocking:]) | 1 of 1 ✔  
[f9/546a3c] NFC…reatment, reference:mCherry, target:hND6, blocking:sample_number]) | 2 of 2 ✔  
[61/81518b] NFC…E_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (2) | 2 of 2 ✔  
[e8/1c8d5b] NFC…RENTIALABUNDANCE:CUSTOM_TABULARTOGSEAGCT (treatment_mCherry_hND6_) | 1 of 1 ✔  
[eb/cdae24] NFC…RENTIALABUNDANCE:CUSTOM_TABULARTOGSEACLS (treatment_mCherry_hND6_) | 2 of 2 ✔  
[b7/7f2df2] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:TABULAR_TO_GSEA_CHIP (1) | 1 of 1 ✔  
[bf/32c841] NFC…ERENTIALABUNDANCE:GSEA_GSEA (treatment_mCherry_hND6_sample_number) | 2 of 2 ✔  
[54/fa29e2] NFC…ENTIALABUNDANCE:DIFFERENTIALABUNDANCE:PLOT_EXPLORATORY (treatment) | 1 of 1 ✔  
[b1/cce3c2] NFC…ABUNDANCE:PLOT_DIFFERENTIAL (treatment_mCherry_hND6_sample_number) | 2 of 2 ✔  
[03/7c74b4] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:SHINYNGS_APP (SRP254919) | 1 of 1 ✔  
[47/320420] NFC…NTIALABUNDANCE:DIFFERENTIALABUNDANCE:RMARKDOWNNOTEBOOK (SRP254919) | 1 of 1 ✔  
[06/4e995c] NFC…TIALABUNDANCE:DIFFERENTIALABUNDANCE:MAKE_REPORT_BUNDLE (SRP254919) | 1 of 1 ✔  
Completed at: 29-Sep-2025 12:21:51  
Duration    : 10m 24s  
CPU hours   : 0.1  
Succeeded   : 21  

In [None]:
# repeat the run. What changed?

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir out_day01_run2


Output:  
...    
executor >  local (21)  
[fc/7784c8] NFC…E:DIFFERENTIALABUNDANCE:GUNZIP_GTF (Mus_musculus.GRCm38.81.gtf.gz) | 1 of 1 ✔  
[f7/47a68a] NFC…RENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GTF_TO_TABLE (Mus_musculus) | 1 of 1 ✔  
[84/c80c75] NFC…NDANCE:DIFFERENTIALABUNDANCE:VALIDATOR (SRP254919.samplesheet.csv) | 1 of 1 ✔  
[8e/7cb2d1] NFC…UNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER ([id:SRP254919]) | 1 of 1 ✔  
[5e/0e3ac8] NFC…_, variable:treatment, reference:mCherry, target:hND6, blocking:]) | 1 of 1 ✔  
[53/991ba3] NFC…_, variable:treatment, reference:mCherry, target:hND6, blocking:]) | 2 of 2 ✔  
[55/6389c6] NFC…E_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (2) | 2 of 2 ✔  
[68/470bcb] NFC…RENTIALABUNDANCE:CUSTOM_TABULARTOGSEAGCT (treatment_mCherry_hND6_) | 1 of 1 ✔  
[b9/ac38e3] NFC…RENTIALABUNDANCE:CUSTOM_TABULARTOGSEACLS (treatment_mCherry_hND6_) | 2 of 2 ✔   
[28/7ade8c] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:TABULAR_TO_GSEA_CHIP (1) | 1 of 1 ✔  
[44/95dbb2] NFC…BUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (treatment_mCherry_hND6_) | 2 of 2 ✔  
[35/03c07a] NFC…ENTIALABUNDANCE:DIFFERENTIALABUNDANCE:PLOT_EXPLORATORY (treatment) | 1 of 1 ✔  
[2b/8ca703] NFC…:DIFFERENTIALABUNDANCE:PLOT_DIFFERENTIAL (treatment_mCherry_hND6_) | 2 of 2 ✔  
[cc/5a70ce] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:SHINYNGS_APP (SRP254919) | 1 of 1 ✔  
[4b/e7962b] NFC…NTIALABUNDANCE:DIFFERENTIALABUNDANCE:RMARKDOWNNOTEBOOK (SRP254919) | 1 of 1 ✔  
[a8/b31e8b] NFC…TIALABUNDANCE:DIFFERENTIALABUNDANCE:MAKE_REPORT_BUNDLE (SRP254919) | 1 of 1 ✔  
Completed at: 29-Sep-2025 12:29:03  
Duration    : 4m 9s  
CPU hours   : 0.1  
Succeeded   : 21  
  
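What changed?  
- the second run took less than half the time of the first run (4m 9s vs. 10m 24s), most likely because the Docker images and the pipeline code had already been downloaded during the first run  
- all 21 tasks were executed again, each with a new task hash (e.g. GUNZIP_GTF ran as `fc/7784c8` instead of `85/bb7fb7`)  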


In [None]:
# now add -resume to the command. What changed?

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir out_day01_run3 -resume

Output:  
...  
executor >  local (3)  
[fc/7784c8] NFC…E:DIFFERENTIALABUNDANCE:GUNZIP_GTF (Mus_musculus.GRCm38.81.gtf.gz) | 1 of 1, cached: 1 ✔  
[f7/47a68a] NFC…RENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GTF_TO_TABLE (Mus_musculus) | 1 of 1, cached: 1 ✔  
[84/c80c75] NFC…NDANCE:DIFFERENTIALABUNDANCE:VALIDATOR (SRP254919.samplesheet.csv) | 1 of 1, cached: 1 ✔  
[8e/7cb2d1] NFC…UNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER ([id:SRP254919]) | 1 of 1, cached: 1 ✔  
[5e/0e3ac8] NFC…_, variable:treatment, reference:mCherry, target:hND6, blocking:]) | 1 of 1, cached: 1 ✔  
[00/e9ca25] NFC…reatment, reference:mCherry, target:hND6, blocking:sample_number]) | 2 of 2, cached: 2 ✔  
[52/a2ec0f] NFC…E_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (2) | 2 of 2, cached: 2 ✔  
[68/470bcb] NFC…RENTIALABUNDANCE:CUSTOM_TABULARTOGSEAGCT (treatment_mCherry_hND6_) | 1 of 1, cached: 1 ✔  
[b9/ac38e3] NFC…RENTIALABUNDANCE:CUSTOM_TABULARTOGSEACLS (treatment_mCherry_hND6_) | 2 of 2, cached: 2 ✔  
[28/7ade8c] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:TABULAR_TO_GSEA_CHIP (1) | 1 of 1, cached: 1 ✔  
[be/7a2226] NFC…ERENTIALABUNDANCE:GSEA_GSEA (treatment_mCherry_hND6_sample_number) | 2 of 2, cached: 2 ✔  
[35/03c07a] NFC…ENTIALABUNDANCE:DIFFERENTIALABUNDANCE:PLOT_EXPLORATORY (treatment) | 1 of 1, cached: 1 ✔  
[2b/8ca703] NFC…:DIFFERENTIALABUNDANCE:PLOT_DIFFERENTIAL (treatment_mCherry_hND6_) | 2 of 2, cached: 2 ✔  
[72/f64022] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:SHINYNGS_APP (SRP254919) | 1 of 1 ✔  
[63/4b990a] NFC…NTIALABUNDANCE:DIFFERENTIALABUNDANCE:RMARKDOWNNOTEBOOK (SRP254919) | 1 of 1 ✔  
[e2/dedeab] NFC…TIALABUNDANCE:DIFFERENTIALABUNDANCE:MAKE_REPORT_BUNDLE (SRP254919) | 1 of 1 ✔  
  
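What changed?  
- the run was significantly faster  
- only 3 of the 21 tasks were executed (SHINYNGS_APP, RMARKDOWNNOTEBOOK and MAKE_REPORT_BUNDLE)  
- the other 18 tasks are marked as cached and reuse the results of the previous run; e.g. GUNZIP_GTF keeps the hash `fc/7784c8` from the second run  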


Check out the current directory. Next to the outdir you specified, what else has changed?

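There is now a `work` directory. Nextflow runs every task in its own subdirectory of `work`, named after the task hash shown in the log (e.g. `fc/7784c8`), and keeps the task's intermediate files there. The directory is created by the first `nextflow run` in a folder, so it already appeared during the earlier runs. Nextflow also creates a hidden `.nextflow/` directory and `.nextflow.log` files.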
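To inspect the `work` directory from the notebook, a small `pathlib` sketch can list the task subdirectories in the same `xx/yyyyyy` form as the hash column of the log. It assumes the default layout `work/<first 2 hash characters>/<rest of the hash>` and that it is run from the folder in which `nextflow run` was launched; `task_label` and `list_task_dirs` are hypothetical helper names.

```python
from pathlib import Path
from typing import List

def task_label(task_dir: Path) -> str:
    """Shorten work/<2 chars>/<rest of hash> to the 'xx/yyyyyy' form used in the log."""
    return f"{task_dir.parent.name}/{task_dir.name[:6]}"

def list_task_dirs(work_dir: str = "work") -> List[str]:
    """Return the labels of all task directories below `work_dir`, sorted."""
    root = Path(work_dir)
    if not root.is_dir():
        return []  # no pipeline has been run in this folder yet
    return sorted(
        task_label(d)
        for d in root.glob("*/*")
        # keep only <2-char prefix>/<hash> directories, skip anything else
        if d.is_dir() and len(d.parent.name) == 2
    )

# Run from the launch directory of `nextflow run`.
for label in list_task_dirs():
    print(label)
```

Comparing this list with the hashes printed in the log is a quick way to see which task produced which folder.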

In [None]:
# delete the work directory and run the pipeline again using -resume. What changed?

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir out_day01_run4 -resume

Output:  
...  
executor >  local (21)  
[37/f42418] NFC…E:DIFFERENTIALABUNDANCE:GUNZIP_GTF (Mus_musculus.GRCm38.81.gtf.gz) | 1 of 1 ✔  
[1a/963054] NFC…RENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GTF_TO_TABLE (Mus_musculus) | 1 of 1 ✔  
[72/6377d4] NFC…NDANCE:DIFFERENTIALABUNDANCE:VALIDATOR (SRP254919.samplesheet.csv) | 1 of 1 ✔  
[12/98d1aa] NFC…UNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER ([id:SRP254919]) | 1 of 1 ✔  
[c2/3cea19] NFC…_, variable:treatment, reference:mCherry, target:hND6, blocking:]) | 1 of 1 ✔  
[dc/7196d7] NFC…_, variable:treatment, reference:mCherry, target:hND6, blocking:]) | 2 of 2 ✔  
[4e/39bed4] NFC…E_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (2) | 2 of 2 ✔  
[02/b6dc13] NFC…RENTIALABUNDANCE:CUSTOM_TABULARTOGSEAGCT (treatment_mCherry_hND6_) | 1 of 1 ✔  
[8d/0ffb12] NFC…NCE:CUSTOM_TABULARTOGSEACLS (treatment_mCherry_hND6_sample_number) | 2 of 2 ✔  
[ec/3ef786] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:TABULAR_TO_GSEA_CHIP (1) | 1 of 1 ✔  
[03/69706c] NFC…BUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (treatment_mCherry_hND6_) | 2 of 2 ✔  
[b8/58f340] NFC…ENTIALABUNDANCE:DIFFERENTIALABUNDANCE:PLOT_EXPLORATORY (treatment) | 1 of 1 ✔  
[d7/9268ef] NFC…:DIFFERENTIALABUNDANCE:PLOT_DIFFERENTIAL (treatment_mCherry_hND6_) | 2 of 2 ✔  
[fd/1b5171] NFC…FFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:SHINYNGS_APP (SRP254919) | 1 of 1 ✔  
[46/0b0153] NFC…NTIALABUNDANCE:DIFFERENTIALABUNDANCE:RMARKDOWNNOTEBOOK (SRP254919) | 1 of 1 ✔  
[6d/c9f27e] NFC…TIALABUNDANCE:DIFFERENTIALABUNDANCE:MAKE_REPORT_BUNDLE (SRP254919) | 1 of 1 ✔  
Completed at: 29-Sep-2025 12:47:35  
Duration    : 4m 2s  
CPU hours   : 0.1  
Succeeded   : 21  
  
What changed?  
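- the runtime (4m 2s) is similar to the second run and slower than the previous run with `-resume`, because the cached task results stored in the work directory no longer exist  
- no tasks were cached: all 21 tasks were executed again, with new hashes  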
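The behaviour of the runs with `-resume` can be illustrated with a toy model. This is a simplified Python sketch under stated assumptions, not Nextflow's actual implementation: a task's key is a hash of only its name, script and inputs (the real task hash covers more properties), a dictionary stands in for the `work` directory, and the task name and script string are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

def task_hash(name: str, script: str, inputs: List[str]) -> str:
    """Hash the things that determine a task's result (simplified)."""
    h = hashlib.md5()
    for part in [name, script, *inputs]:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator, so ("ab", "c") and ("a", "bc") differ
    digest = h.hexdigest()
    return f"{digest[:2]}/{digest[2:8]}"  # shortened, in the style of the log

def run_task(name: str, script: str, inputs: List[str],
             work_dir: Dict[str, str], resume: bool) -> Tuple[str, bool]:
    """Return (output, was_cached). `work_dir` stands in for ./work."""
    key = task_hash(name, script, inputs)
    if resume and key in work_dir:
        return work_dir[key], True          # reuse the stored result
    output = f"{name} output for {inputs}"  # pretend to run the tool
    work_dir[key] = output                  # store the result under its hash
    return output, False

work: Dict[str, str] = {}
task = ("GUNZIP_GTF", "gunzip -c $gtf", ["genes.gtf.gz"])  # hypothetical task
_, cached_first = run_task(*task, work, resume=False)
_, cached_resume = run_task(*task, work, resume=True)
work.clear()  # like deleting the work directory
_, cached_after_delete = run_task(*task, work, resume=True)
print(cached_first, cached_resume, cached_after_delete)  # False True False
```

With `-resume` and an intact cache the lookup succeeds, as in the third run; after clearing the dictionary the lookup misses and the task runs again, as in the fourth run.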


## Let's look at the results

### What is differential abundance analysis?

A differential abundance analysis aims to identify features whose abundance differs significantly between conditions, e.g. genes that are expressed at significantly different levels in a test group compared with a control group.

The test dataset compares two treatments, mCherry (the reference) and hND6 (the target). The most significantly differentially expressed genes seem to be associated with the hND6 treatment: Pou2af1, Jchain, Igkc and Ighg2b.
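A common way to express how much a gene's abundance differs between two groups is the log2 fold change. The minimal sketch below computes it from group means with a pseudocount; the counts are made up (not from the test dataset), and the pseudocount of 1 is an assumption. The pipeline itself relies on dedicated statistical tools (e.g. DESeq2 for count data) that estimate fold changes with a model and also test their significance, which this sketch does not do.

```python
import math
from statistics import mean
from typing import Sequence

def log2_fold_change(target: Sequence[float], reference: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2 of the ratio of mean abundances, target vs. reference.
    The pseudocount avoids dividing by zero or taking log2 of zero."""
    return math.log2((mean(target) + pseudocount) / (mean(reference) + pseudocount))

# Hypothetical normalised counts for one gene (NOT from the test dataset).
hnd6 = [90.0, 99.0, 108.0]     # target group, mean 99
mcherry = [20.0, 24.0, 28.0]   # reference group, mean 24

print(log2_fold_change(hnd6, mcherry))  # 2.0
```

A value of 2.0 means roughly four times higher abundance in the target group; negative values mean lower abundance than in the reference.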

Give the most important plots from the report:


The most informative plots for seeing which genes are differentially expressed are probably the volcano plots. For more information on gene counts or on how the samples relate to each other, other plots in the report might be helpful.

The volcano plots:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
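Each point in a volcano plot is one gene: the x-axis shows its log2 fold change and the y-axis its significance as -log10 of the (adjusted) p-value, so strongly and significantly changed genes end up in the upper left and upper right corners. A minimal sketch of that mapping follows; the gene values are hypothetical, not taken from the report, and the cut-offs (|log2FC| >= 1, p < 0.05) are common choices assumed here rather than the pipeline's own settings.

```python
import math
from typing import Dict, List, Tuple

def volcano_points(results: Dict[str, Tuple[float, float]],
                   lfc_cutoff: float = 1.0,
                   padj_cutoff: float = 0.05) -> List[Tuple[str, float, float, bool]]:
    """Map (log2FC, adjusted p) per gene to volcano coordinates.
    x = log2 fold change, y = -log10(adjusted p); a gene is flagged
    as significant only if it passes both cut-offs."""
    points = []
    for gene, (lfc, padj) in results.items():
        y = -math.log10(padj)  # smaller p-value -> higher point
        significant = abs(lfc) >= lfc_cutoff and padj < padj_cutoff
        points.append((gene, lfc, y, significant))
    return points

# Hypothetical values, NOT taken from the report.
example = {
    "gene_a": (3.0, 1e-6),   # large change, very significant
    "gene_b": (0.2, 0.8),    # small change, not significant
    "gene_c": (-1.5, 0.01),  # decreased, significant
}
for gene, x, y, sig in volcano_points(example):
    print(f"{gene}: x={x:+.1f}, y={y:.1f}, significant={sig}")
```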