# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: In the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not yet installed all required software, please do so as soon as possible!


If you have already installed all software, please go ahead and start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

nf-core is a community effort that collects a curated set of analysis pipelines built with Nextflow.

2. How many pipelines are there currently in nf-core?

Currently, there are 122 pipelines, of which 66 are released.

3. Are there any non-bioinformatic pipelines in nf-core?

Yes, there are non-bioinformatics pipelines in nf-core, e.g. for economics and astronomy.

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

A pipeline is an automated structure consisting of modular steps. Each step calls an existing tool to perform a specific task, and information (input/output) is passed from one step to the next according to a set of rules. We use pipelines to automate analyses and to make them accessible to many users.
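
The idea of modular steps that hand their output to the next step can be sketched in a few lines of Python. This is only a toy illustration: the step functions (`trim`, `filter_short`, `count_gc`) and the example reads are hypothetical and do not come from any nf-core pipeline.

```python
from typing import Callable


def trim(reads: list[str]) -> list[str]:
    """Hypothetical step 1: strip trailing 'N' bases from each read."""
    return [read.rstrip("N") for read in reads]


def filter_short(reads: list[str], min_len: int = 4) -> list[str]:
    """Hypothetical step 2: drop reads shorter than min_len."""
    return [read for read in reads if len(read) >= min_len]


def count_gc(reads: list[str]) -> dict[str, int]:
    """Hypothetical step 3: count G and C bases per read."""
    return {read: sum(base in "GC" for base in read) for read in reads}


def run_pipeline(data, steps: list[Callable]):
    """Pass the data through every step in order; each output is the next input."""
    for step in steps:
        data = step(data)
    return data


result = run_pipeline(["ACGTNN", "GGNNNN", "GCGCAT"], [trim, filter_short, count_gc])
print(result)
```

Because every step only depends on its input, individual steps can be swapped or reused without touching the rest, which is the property that workflow managers such as Nextflow build on.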

5. Why do you think nf-core adheres to strict guidelines?

nf-core adheres to strict guidelines to ensure reproducibility, in particular to obtain consistent results across different operating systems.

6. What are the main features of nf-core pipelines?

The main features are documentation, CI testing, stable releases, packaged software, portability and reproducibility, and cloud readiness.

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

In [1]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd

# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


/Users/weronikajaskowiak/Desktop/practical_course_2


In [2]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)

# Subtask 1
!nextflow run nf-core/differentialabundance -profile test,docker --outdir /Users/weronikajaskowiak/Desktop/practical_course_2

Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Pulling nf-core/differentialabundance ...
 downloaded from https://github.com/nf-core/differentialabundance.git
Launching `https://github.com/nf-core/differentialabundance` [romantic_church] DSL2 - revision: 3dd360fed0 [master]
Downloading plugin nf-validation@1.1.3
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
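
The task comment asks for an explicit pipeline version, while the log above shows that the default `master` revision was launched. Nextflow accepts a revision through its `-r` option. The following is a minimal sketch of how such a command could be assembled from a notebook; the helper `build_run_command`, the release tag `1.5.0` (inferred from the banner) and the `results` output directory are assumptions for illustration only.

```python
import shlex


def build_run_command(pipeline: str, revision: str, outdir: str,
                      profiles: tuple[str, ...] = ("test", "docker")) -> list[str]:
    """Assemble a `nextflow run` call with a pinned pipeline revision (-r)."""
    return [
        "nextflow", "run", pipeline,
        "-r", revision,                    # pin the pipeline version
        "-profile", ",".join(profiles),    # profiles are comma-separated
        "--outdir", outdir,
    ]


# "1.5.0" and "results" are assumed values, not taken from an actual run.
cmd = build_run_command("nf-core/differentialabundance", "1.5.0", "results")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal or prefixed with `!` in a notebook cell. Pinning the revision keeps repeated runs on the same pipeline code even after new releases appear.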

In [23]:
# repeat the run (subtask 2)
!nextflow run nf-core/differentialabundance -profile test,docker --outdir /Users/weronikajaskowiak/Desktop/test_repeated

Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [pensive_wescoff] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : master

What changed? The second run was faster, presumably because the pipeline and its plugin had already been downloaded during the first run. Physical memory usage and CPU usage also differed.

In [13]:
# now add -resume to the command (subtask 3)
!nextflow run nf-core/differentialabundance -profile test,docker --outdir /Users/weronikajaskowiak/Desktop/test_repeated -resume

Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [friendly_mahavira] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : master

Check out the current directory. Next to the outdir you specified, what else has changed?

Next to each process, the log now shows how many of its tasks were cached. According to the report, 1 task succeeded (it was executed during the current run because it had not been completed previously or had been marked as failed in the last execution) and 20 tasks were cached, meaning that their outputs were stored and reused without reprocessing. The runtime was much shorter (around 30 seconds).

In [14]:
# delete the work directory and run the pipeline again using -resume. What changed?
!nextflow run nf-core/differentialabundance -profile test,docker --outdir /Users/weronikajaskowiak/Desktop/test_repeated_without_work -resume

Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [nauseous_mcclintock] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : mast

What changed? The runtime was similar to that of the repeated run (subtask 2), and all tasks were reported as succeeded. Because the work directory had been deleted, Nextflow could not find any previously cached results, so it treated the run as a fresh execution and executed all tasks anew.
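
The resume behaviour observed in these runs can be mimicked with a small caching sketch. It is a conceptual illustration only, not Nextflow's actual implementation: the function `run_task`, the key format and the file layout are all assumptions. Each task result is stored under a hash-derived folder in a work directory and is reused as long as that folder still exists.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def run_task(name: str, inputs: str, work_dir: Path) -> str:
    """Run a toy task, reusing the stored result if the same key already exists."""
    key = hashlib.sha256(f"{name}:{inputs}".encode()).hexdigest()[:10]
    result_file = work_dir / key / "result.txt"
    if result_file.exists():
        return "cached"                      # result found, nothing to recompute
    result_file.parent.mkdir(parents=True)
    result_file.write_text(inputs.upper())   # stand-in for the real work
    return "executed"


work_dir = Path(tempfile.mkdtemp()) / "work"

first = run_task("align", "sample_a", work_dir)    # nothing stored yet
second = run_task("align", "sample_a", work_dir)   # same key, result is reused
shutil.rmtree(work_dir)                            # "delete the work directory"
third = run_task("align", "sample_a", work_dir)    # cache is gone, runs again
print(first, second, third)

shutil.rmtree(work_dir.parent)  # clean up the temporary directory
```

The second call is served from the stored result, while the call after deleting the directory has to recompute everything, mirroring the cached and fresh runs above.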

## Let's look at the results

### What is differential abundance analysis?

Differential abundance analysis is a method used to determine whether the abundance (presence and quantity) of specific features, such as genes, differs significantly between two or more groups or conditions.
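
To make the definition concrete, here is a toy calculation for two groups. It is a simplified sketch, not the statistical model used by the pipeline: the gene names and counts are invented, and the log2 fold change with a pseudocount and the Welch t-statistic merely stand in for the dedicated methods a real analysis would use.

```python
import math
from statistics import mean, variance


def log2_fold_change(control: list[float], treated: list[float],
                     pseudocount: float = 1.0) -> float:
    """Log2 ratio of group means; the pseudocount avoids division by zero."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def welch_t(control: list[float], treated: list[float]) -> float:
    """Welch's t-statistic: difference in means scaled by its standard error."""
    se = math.sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


# Invented example counts: (control replicates, treated replicates)
counts = {
    "gene_a": ([10, 12, 11], [40, 44, 48]),  # clearly higher after treatment
    "gene_b": ([20, 22, 21], [21, 20, 22]),  # same level in both groups
}

results = {
    gene: {"log2fc": log2_fold_change(ctrl, trt), "t": welch_t(ctrl, trt)}
    for gene, (ctrl, trt) in counts.items()
}

for gene, stats in results.items():
    print(f"{gene}: log2FC={stats['log2fc']:.2f}, t={stats['t']:.2f}")
```

A large absolute fold change combined with a large test statistic, as for the hypothetical `gene_a`, is what marks a feature as differentially abundant; `gene_b` shows no difference between the groups.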

Give the most important plots from the report:

![Boxplot](/Users/weronikajaskowiak/Desktop/practical_course_2/plots/exploratory/treatment/png/boxplot.png)

![Density plot](/Users/weronikajaskowiak/Desktop/practical_course_2/plots/exploratory/treatment/png/density.png)

![3D PCA plot](/Users/weronikajaskowiak/Desktop/practical_course_2/plots/exploratory/treatment/png/pca3d.png)