# Mitochondria removal tutorial (QIIME2 Command Line Interface)

This tutorial covers how to use the preconstructed extended taxonomies from Sonett *et al*. ([preprint](https://www.biorxiv.org/content/10.1101/2021.02.23.431501v2)) to remove mitochondria from 16S rRNA gene sequence data using the QIIME2 Amplicon Command Line Interface.

If you need to build your own custom extended taxonomy, see the [extended taxonomy creation tutorial](extended_taxonomy_construction_tutorial.ipynb).

As an example, we will use a 16S rRNA amplicon dataset from a study of how the activity of certain innate immunity proteins interacted with microbiome structure in corals affected by Montipora White Syndrome (MWS). For the experimental design, see [Brown *et al*., 2021](https://onlinelibrary.wiley.com/doi/full/10.1111/mec.15899). Here we use these data only to demonstrate mitochondria removal, so you don't need to read the paper to follow the tutorial.


## Requirements


This tutorial assumes that the QIIME2 amplicon distribution is installed. If you don't have QIIME2, see the [installation instructions](https://docs.qiime2.org/2023.9/install/). You can run this tutorial from a Jupyter notebook, from Terminal on macOS, or from any Bash command line on Linux or the Windows Subsystem for Linux, as long as QIIME2 is installed.

## Activating your QIIME2 virtual environment
You need to activate your QIIME2 virtual environment before running the steps in this tutorial.

I usually do this by first reminding myself of which virtual environments I have available:

`conda env list`

Here's an example of the output in my case:

```
# conda environments:
base                    *  /Users/zaneveld/opt/anaconda3
qiime2-amplicon-2023.9     /Users/zaneveld/opt/anaconda3/envs/qiime2-amplicon-2023.9
```

If I wanted to activate the `qiime2-amplicon-2023.9` virtual environment, I would then use: 

`conda activate qiime2-amplicon-2023.9`
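Before starting the slower steps, it can help to confirm that activation worked. The sketch below assumes that `conda activate` sets `CONDA_DEFAULT_ENV` to the environment name (conda does this on activation) and that your environment is called `qiime2-amplicon-2023.9`; `check_qiime_env` is a hypothetical helper, not part of QIIME2 or conda.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the expected conda environment is active
# and that the `qiime` executable is on PATH. Returns non-zero otherwise.
check_qiime_env() {
    # Assumed environment name; pass another name as the first argument.
    local expected="${1:-qiime2-amplicon-2023.9}"
    # conda sets CONDA_DEFAULT_ENV to the active environment's name.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Active environment is '${CONDA_DEFAULT_ENV:-none}', expected '$expected'" >&2
        return 1
    fi
    if ! command -v qiime >/dev/null 2>&1; then
        echo "'qiime' was not found on PATH" >&2
        return 1
    fi
    echo "OK: $expected is active"
}
```

After `conda activate qiime2-amplicon-2023.9`, running `check_qiime_env` should print `OK: qiime2-amplicon-2023.9 is active`. `qiime info` is another quick check; it lists the installed QIIME2 version and plugins.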



## Arrangement of tutorial files

This tutorial assumes that you've downloaded and unzipped the tutorial files into a single folder, and that you run all commands from within that folder (where this notebook and script are located).

With those files in place, the steps below use the extended reference databases to remove mitochondria from your 16S datasets in QIIME2.

You can check that your input files are in the right place by running `ls`. You should see the following files:

`ls`

```
feature_table_live_vs_dead.qza
organelle_removal_CLI.ipynb
organelle_removal_cli.sh
rep_seqs_merged.qza
sample_metadata_live_vs_dead_combo.tsv
taxonomy_references
```
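The same check can be scripted so that a missing file is reported before a long-running step fails. This is a minimal sketch; `check_inputs` is a hypothetical helper, and its file list is taken from the `ls` output above and from the reference paths used in the VSEARCH commands of this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report tutorial inputs missing from the current
# directory. Prints one "Missing: ..." line per absent file (to stderr) and
# returns 1 if anything is missing, 0 otherwise.
check_inputs() {
    local missing=0 f
    for f in feature_table_live_vs_dead.qza \
             rep_seqs_merged.qza \
             sample_metadata_live_vs_dead_combo.tsv \
             taxonomy_references/silva_sequences.qza \
             taxonomy_references/silva_taxonomy.qza \
             taxonomy_references/silva_extended_sequences.qza \
             taxonomy_references/silva_extended_taxonomy.qza; do
        if [ ! -e "$f" ]; then
            echo "Missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run `check_inputs` from the tutorial folder; it prints nothing and returns 0 when every file is present.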

## Classify taxonomy with VSEARCH
In order to remove mitochondrial and chloroplast sequences, we must first classify the taxonomy of all 16S rRNA sequences in the library to figure out which derive from organelles. In this tutorial, we'll use VSEARCH to search our 16S sequences against the reference databases. VSEARCH is included with QIIME2, so you shouldn't need to install any additional software to run it.
We will classify the sequences twice, once against each reference taxonomy: base SILVA, and SILVA supplemented with Metaxa2 and PhytoREF organelle reference sequences.

We'll run VSEARCH first against the 'traditional' base SILVA reference.

**Note:** On a full dataset this step can take a while. Increasing the VSEARCH `--p-threads` parameter can speed it up if you have enough memory to support multiple threads. (About 8 GB per thread has worked for us as a rule of thumb, but the scaling may not be linear; see https://forum.qiime2.org/t/vsearch-classifier-memory/8667/5 for discussion.)
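The rule of thumb above can be turned into a quick calculation for `--p-threads`. The sketch below assumes roughly 8 GB of memory per thread, as noted, rounds down, never goes below one thread, and optionally caps the value at your CPU count. `suggest_threads` is a hypothetical helper, and because memory use may not scale linearly, treat its output as a starting point only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a --p-threads value from available memory (GB),
# assuming ~8 GB per thread. An optional second argument caps the result at
# a CPU count. Integer division rounds down; the minimum is 1 thread.
suggest_threads() {
    local mem_gb="$1" cpus="${2:-}" gb_per_thread=8
    local threads=$(( mem_gb / gb_per_thread ))
    if (( threads < 1 )); then
        threads=1
    fi
    if [ -n "$cpus" ] && (( threads > cpus )); then
        threads="$cpus"
    fi
    echo "$threads"
}
```

For example, `suggest_threads 32` prints `4`, matching the `--p-threads 4` used in the commands here. You can pass your CPU count as a second argument, e.g. `suggest_threads 64 "$(nproc)"` on Linux or `suggest_threads 64 "$(sysctl -n hw.ncpu)"` on macOS.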

```
#base silva 138 annotation

qiime feature-classifier classify-consensus-vsearch \
  --i-query rep_seqs_merged.qza \
  --i-reference-reads taxonomy_references/silva_sequences.qza \
  --i-reference-taxonomy taxonomy_references/silva_taxonomy.qza \
  --p-threads 4 \
  --o-classification silva_classification_taxonomy.qza \
  --o-search-results silva_classification_results.qza
```

When that finishes, repeat the command using the extended SILVA reference:

```
#extended silva 138 annotation

qiime feature-classifier classify-consensus-vsearch \
  --i-query rep_seqs_merged.qza \
  --i-reference-reads taxonomy_references/silva_extended_sequences.qza \
  --i-reference-taxonomy taxonomy_references/silva_extended_taxonomy.qza \
  --p-threads 4 \
  --o-classification silva_extended_classification_taxonomy.qza \
  --o-search-results silva_extended_classification_results.qza
```
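Because the two classification commands differ only in the reference prefix, they can also be written as a loop. This sketch assumes the `<prefix>_sequences.qza` / `<prefix>_taxonomy.qza` naming used in `taxonomy_references/`; `classify_all` and the `DRY_RUN` switch are hypothetical conveniences, and the output names match those used in the commands in this section.

```shell
#!/usr/bin/env bash
# Sketch: classify rep_seqs_merged.qza against each reference set in turn,
# assuming the <prefix>_sequences.qza / <prefix>_taxonomy.qza naming in
# taxonomy_references/. Set DRY_RUN=1 to print the commands instead.
classify_all() {
    local threads="${1:-4}" prefix
    for prefix in silva silva_extended; do
        local cmd=(qiime feature-classifier classify-consensus-vsearch
            --i-query rep_seqs_merged.qza
            --i-reference-reads "taxonomy_references/${prefix}_sequences.qza"
            --i-reference-taxonomy "taxonomy_references/${prefix}_taxonomy.qza"
            --p-threads "$threads"
            --o-classification "${prefix}_classification_taxonomy.qza"
            --o-search-results "${prefix}_classification_results.qza")
        if [ -n "${DRY_RUN:-}" ]; then
            echo "${cmd[*]}"          # print only, do not run
        else
            "${cmd[@]}" || return 1   # stop at the first failure
        fi
    done
}
```

Running `DRY_RUN=1 classify_all 4` prints both commands for inspection; `classify_all 4` then runs them one after the other.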

## Remove mitochondria from samples

Now that we have created study-specific taxonomic annotations (e.g., `silva_extended_classification_taxonomy.qza`), we can use them to filter our feature tables, removing mitochondrial and chloroplast 16S sequences with QIIME2.
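Before filtering, it can be informative to count how many features each taxonomy labels as mitochondria or chloroplast, since that is where the base and extended references should differ. One approach, sketched here, is to export each classification with `qiime tools export`, which writes a `taxonomy.tsv` whose second column holds the taxon string, and then count matching rows. `count_organelles` is a hypothetical helper; its case-insensitive substring match is meant to approximate the default `contains` matching of `qiime taxa filter-table`, not reproduce that plugin's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count features in an exported taxonomy.tsv whose Taxon
# column (2nd column) contains "mitochondria" or "chloroplast", ignoring case.
# The header row is skipped and each feature is counted at most once.
# This approximates, but is not, the matching done by `qiime taxa filter-table`.
#
# Example usage (requires an active QIIME2 environment):
#   qiime tools export --input-path silva_extended_classification_taxonomy.qza \
#     --output-path exported_silva_extended
#   count_organelles exported_silva_extended/taxonomy.tsv
count_organelles() {
    awk -F'\t' 'NR > 1 {
        t = tolower($2)
        if (t ~ /mitochondria/) mito++
        else if (t ~ /chloroplast/) chloro++
    }
    END { printf "mitochondria\t%d\nchloroplast\t%d\n", mito, chloro }' "$1"
}
```

Running it on the export of each classification gives a side-by-side view of how many features the base and extended taxonomies flag as organelles.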



### Generate filtered feature tables with organelle sequences removed

Next, we'll use the QIIME2 `qiime taxa filter-table` command to remove features annotated as mitochondria or chloroplasts by each taxonomy, producing one filtered feature table per taxonomy.

```
#filter the feature table with each classification taxonomy to remove organelles

#base silva 138 filtering
qiime taxa filter-table \
--i-table feature_table_live_vs_dead.qza \
--i-taxonomy silva_classification_taxonomy.qza \
--p-exclude 'mitochondria,chloroplast' \
--o-filtered-table feature_table_live_vs_dead_silva_filtered.qza 


#extended silva 138 filtering
qiime taxa filter-table \
--i-table feature_table_live_vs_dead.qza \
--i-taxonomy silva_extended_classification_taxonomy.qza \
--p-exclude 'mitochondria,chloroplast' \
--o-filtered-table feature_table_live_vs_dead_silva_extended_filtered.qza
```
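To see how many reads each filtering step removed, `qiime feature-table summarize` can be run on the unfiltered table and on both filtered tables; each resulting QZV reports total and per-sample frequencies. The loop below is a sketch: `summarize_tables` and `DRY_RUN` are hypothetical, and the `_summary.qzv` output names are our own choice.

```shell
#!/usr/bin/env bash
# Sketch: summarize the unfiltered and both filtered feature tables so their
# total and per-sample read counts can be compared.
# Set DRY_RUN=1 to print the commands instead of running them.
summarize_tables() {
    local table
    for table in feature_table_live_vs_dead \
                 feature_table_live_vs_dead_silva_filtered \
                 feature_table_live_vs_dead_silva_extended_filtered; do
        local cmd=(qiime feature-table summarize
            --i-table "${table}.qza"
            --m-sample-metadata-file sample_metadata_live_vs_dead_combo.tsv
            --o-visualization "${table}_summary.qzv")
        if [ -n "${DRY_RUN:-}" ]; then
            echo "${cmd[*]}"          # print only, do not run
        else
            "${cmd[@]}" || return 1   # stop at the first failure
        fi
    done
}
```

Comparing the three summaries shows how many reads the organelle filter removed under each taxonomy.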

## Visualize the difference

Now that we've removed the mitochondria, let's see how our estimates of the relative abundance of microbial taxa vary based on whether we used the base SILVA or extended taxonomy.

If you don't see much difference, that's not unexpected: in our work, many studies were minimally affected, though some showed large changes in "Unassigned" annotations.

```
#base silva 138 barplot

qiime taxa barplot \
--i-table feature_table_live_vs_dead_silva_filtered.qza \
--i-taxonomy silva_classification_taxonomy.qza \
--m-metadata-file sample_metadata_live_vs_dead_combo.tsv \
--o-visualization feature_table_live_vs_dead_silva_filtered_barplot.qzv
```

```
#extended silva 138 barplot

qiime taxa barplot \
--i-table feature_table_live_vs_dead_silva_extended_filtered.qza \
--i-taxonomy silva_extended_classification_taxonomy.qza \
--m-metadata-file sample_metadata_live_vs_dead_combo.tsv \
--o-visualization feature_table_live_vs_dead_silva_extended_filtered_barplot.qzv
```

To view the results, go to [view.qiime2.org](https://view.qiime2.org) and either drag and drop or open each QZV (QIIME Zipped Visualization) file.