## Processing Biodata Using Conda, JupyterLab and CARC OnDemand

#### Conda

Conda is a package and environment manager primarily used for open-source data science packages for the Python and R programming languages. It also supports other programming languages like C, C++, FORTRAN, Java, Scala, Ruby, and Lua.

#### Note: Commands from this notebook should be run in a terminal. 

#### Note: Setting up Conda is a one-time process. Subsequent use requires only purging any loaded modules and activating the Conda environment.

#### Using Conda on CARC systems

Go to `File`->`New`->`Terminal` and launch it, then use it to run the commands below.

To use Conda, first load the corresponding module:
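A minimal sketch of this step, assuming the module is named `conda` on CARC systems (if it is not found, `module avail conda` lists the available names):

```shell
# Assumption: the module is named "conda"; adjust if `module avail` shows otherwise.
CONDA_MODULE="conda"

if command -v module >/dev/null 2>&1; then
    module purge                  # unload any modules that are already loaded
    module load "$CONDA_MODULE"   # make the conda and mamba commands available
else
    echo "The 'module' command was not found; run this on a CARC node."
fi
```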

This module is based on the minimal Miniconda installer which includes the package and environment manager Conda that installs and updates packages and their dependencies. This module also provides Mamba, which is a drop-in replacement for most conda commands that enables faster package solving, downloading, and installing.

The next step is to initialize your shell to use Conda and Mamba:
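A sketch of the initialization, assuming a bash shell and a Mamba version that provides `mamba init` (`conda init bash` is the Conda equivalent):

```shell
# Assumption: the login shell is bash, as implied by the ~/.bashrc reference.
SHELL_NAME="bash"

if command -v mamba >/dev/null 2>&1; then
    mamba init "$SHELL_NAME"   # add the Conda/Mamba setup block to ~/.bashrc
    source ~/.bashrc           # apply the change to the current terminal
else
    echo "mamba was not found; load the conda module first."
fi
```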

This modifies your `~/.bashrc` file so that Conda and Mamba are ready to use every time you log in (without needing to load the module).

If you want a newer version of Conda or Mamba than what is available in the module, you can also install them into one of your directories. We recommend installing either [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Mambaforge](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html).

Conda can also be configured with various options. Read more about Conda configuration [here](https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html).

#### Integrated development environments
JupyterLab, VSCode, RStudio, and other integrated development environments (IDEs) can be used on compute nodes via our [OnDemand](https://www.carc.usc.edu/user-guides/carc-ondemand) service. To install Jupyter kernels, see our guide [here](https://www.carc.usc.edu/user-guides/hpc-systems/software/jupyter-kernels.html).

### Installing Conda environments and packages
You can create new Conda environments in one of your available directories. Conda environments are isolated project environments designed to manage distinct package requirements and dependencies for different projects. We recommend using the `mamba` command for faster package solving, downloading, and installing, but you can also use the `conda` command.

The process for creating and using environments has a few basic steps:

1. Create an environment with `mamba create`
2. Activate the environment with `mamba activate`
3. Install packages into the environment with `mamba install`

To create a new Conda environment in your home directory, enter:
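One way to write this command is sketched below; `env_name` is a placeholder, and Mamba will ask for confirmation before creating the environment:

```shell
# env_name is a placeholder; replace it with the name you want.
ENV_NAME="env_name"

if command -v mamba >/dev/null 2>&1; then
    mamba create --name "$ENV_NAME"   # create a new, empty environment
else
    echo "mamba was not found; load the conda module first."
fi
```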

where `env_name` is the name you want for your environment. Then activate the environment:
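A sketch of the activation step, again using the placeholder name `env_name` and assuming the shell has already been initialized for Mamba:

```shell
# env_name is a placeholder; use the name chosen when creating the environment.
ENV_NAME="env_name"

if command -v mamba >/dev/null 2>&1; then
    mamba activate "$ENV_NAME"
    # CONDA_DEFAULT_ENV holds the name of the currently active environment.
    echo "Active environment: ${CONDA_DEFAULT_ENV:-none}"
else
    echo "mamba was not found; load the conda module first."
fi
```

When activation succeeds, the shell prompt is normally prefixed with the environment name in parentheses.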

Once activated, you can install packages in that environment. Software can be installed from various channels maintained by the conda community; the most popular are `conda-forge` and `bioconda`. Most of the time, you can use the `mamba` and `conda` commands interchangeably: they provide the same functionality and differ mainly in implementation language (Mamba is a C++ reimplementation of Conda, which is written in Python).

Let's install Python 3.10 in the freshly created, empty environment with `conda`:
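A sketch of the install command, assuming the new environment is already activated:

```shell
# Pin the Python minor version; conda resolves the latest matching patch release.
PYTHON_VERSION="3.10"

if command -v conda >/dev/null 2>&1; then
    conda install "python=$PYTHON_VERSION"
else
    echo "conda was not found; load the conda module first."
fi
```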

The above command installed pip as a dependency. Pip is often used to install Python packages. However, pip does not manage environments itself, so it should only be run inside an activated environment.

To connect the new environment to this JupyterLab session, we need to install the `ipykernel` package. This time let's use `mamba`:
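A sketch of this step, assuming the environment is still activated:

```shell
# ipykernel provides the IPython kernel that Jupyter uses to run Python code.
KERNEL_PACKAGE="ipykernel"

if command -v mamba >/dev/null 2>&1; then
    mamba install "$KERNEL_PACKAGE"
else
    echo "mamba was not found; load the conda module first."
fi
```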

Now, we need to run the following command to add the new environment to the list of JupyterLab kernels:
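One common way to register the kernel is sketched below. It assumes the environment is activated so that `python` resolves to the environment's interpreter; using `env_name` as both the kernel name and the display name is only an illustration:

```shell
# env_name is a placeholder for the environment created earlier.
ENV_NAME="env_name"

# Only register the kernel if ipykernel is importable by the current python.
if python -c "import ipykernel" >/dev/null 2>&1; then
    # --user writes the kernel spec to your per-user Jupyter directory.
    python -m ipykernel install --user --name "$ENV_NAME" --display-name "$ENV_NAME"
else
    echo "ipykernel is not installed in the current environment."
fi
```

After this, the new kernel should appear in the JupyterLab launcher and kernel picker; reloading the browser tab may be needed.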

To find package names, visit [anaconda.org](https://anaconda.org) and search for the package. When you install a missing package in an environment connected to JupyterLab, it is immediately available, and you can simply re-run the cell that reported the missing package.

1. `sra-tools`
2. `fastqc`
3. `openjdk`
4. `bwa`
5. `bcftools`
6. `vcftools`
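Assuming the packages listed above are the ones to install in the activated environment, they can be installed in a single command. The channel choice (`conda-forge` and `bioconda`) is an assumption based on the channels mentioned earlier:

```shell
# Package names as they appear in the list above.
PACKAGES="sra-tools fastqc openjdk bwa bcftools vcftools"

if command -v mamba >/dev/null 2>&1; then
    # $PACKAGES is intentionally unquoted so that each name becomes its own argument.
    mamba install -c conda-forge -c bioconda $PACKAGES
else
    echo "mamba was not found; load the conda module first."
fi
```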