# Installing Packages

If there is a Python or R package not available in the data science lab image, you can _temporarily_ install it.

You can do that by running a shell command inside a notebook cell that calls `mamba`, `conda`, or `pip`.

## Installing packages with Anaconda

```
!conda install -y -c conda-forge <list-of-packages>
```
or
```
!mamba install -q -y -c conda-forge <list-of-packages>
```

You can search for packages at https://anaconda.org/

Prefer packages from the `conda-forge` channel.

In [1]:
!mamba install -q -y -c conda-forge python-levenshtein fuzzywuzzy

  Package             Version  Build           Channel                   Size
───────────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────────

  fuzzywuzzy           0.18.0  pyhd8ed1ab_0    conda-forge/noarch       22 KB
  python-levenshtein   0.12.2  py39h3811e60_0  conda-forge/linux-64     80 KB

  Summary:

  Install: 2 packages

  Total download: 102 KB

───────────────────────────────────────────────────────────────────────────────

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done


## Installing Python packages with pip

`!pip install <list-of-packages>`

You can search for packages in https://pypi.org/

In [2]:
!pip install "elasticsearch>=5.0.0,<6.0.0"

Collecting elasticsearch<6.0.0,>=5.0.0
  Downloading elasticsearch-5.5.3-py2.py3-none-any.whl (119 kB)
     |████████████████████████████████| 119 kB 29.9 MB/s eta 0:00:01
Installing collected packages: elasticsearch
  Attempting uninstall: elasticsearch
    Found existing installation: elasticsearch 7.13.4
    Uninstalling elasticsearch-7.13.4:
      Successfully uninstalled elasticsearch-7.13.4
Successfully installed elasticsearch-5.5.3
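
Because a version constraint like the one above can replace a package that is already installed, it can help to confirm which version ended up in the environment. The sketch below uses the standard-library `importlib.metadata` module. The helper names `installed_version` and `major_version` are illustrative, not part of any library, and the major-version test is only a rough check that assumes a plain numeric release string such as `5.5.3`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Return the leading number of a version such as "5.5.3" (assumed format)."""
    return int(version_string.split(".")[0])


# Report whether the installed elasticsearch falls in the 5.x series.
found = installed_version("elasticsearch")
if found is None:
    print("elasticsearch is not installed")
elif major_version(found) == 5:
    print(f"elasticsearch {found} satisfies >=5.0.0,<6.0.0")
else:
    print(f"elasticsearch {found} is outside >=5.0.0,<6.0.0")
```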


## Installing CRAN R packages with Anaconda

If an R package exists on CRAN but not in the Anaconda repository, you can install it by running the R `install.packages` function through `mamba run`.

```
!mamba -q run R -e "install.packages(\
    '<package-name>',\
    lib='/opt/conda/lib/R/library',\
    repos='http://cran.us.r-project.org'\
)"
```

In [3]:
!mamba run R -e "install.packages(\
    'googleAnalyticsR',\
    lib='/opt/conda/lib/R/library',\
    repos='http://cran.us.r-project.org'\
)"


                  __    __    __    __
                 /  \  /  \  /  \  /  \
                /    \/    \/    \/    \
███████████████/  /██/  /██/  /██/  /████████████████████████
              /  / \   / \   / \   / \  \____
             /  /   \_/   \_/   \_/   \    o \__,
            / _/                       \_____/  `
            |/
        ███╗   ███╗ █████╗ ███╗   ███╗██████╗  █████╗
        ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗
        ██╔████╔██║███████║██╔████╔██║██████╔╝███████║
        ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║
        ██║ ╚═╝ ██║██║  ██║██║ ╚═╝ ██║██████╔╝██║  ██║
        ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝  ╚═╝

        mamba (0.14.1) supported by @QuantStack

        GitHub:  https://github.com/mamba-org/mamba
        Twitter: https://twitter.com/QuantStack

█████████████████████████████████████████████████████████████


R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Copyright (C) 2021 The R Foundation for Statistical Computing


## Running Anaconda commands to install packages inside an R kernel

The Jupyter R kernel doesn't support the `!` prefix for running terminal commands; that syntax is available only in the Python kernel.

In that case, you should use the `system` R function. Simply create a cell and call the function inside it:
```
system("mamba install -q -y -c conda-forge r-igraph r-multidplyr", intern=TRUE)
```

`intern=TRUE` makes `system` return the command's standard output as a character vector, so it is shown in the notebook cell output.

## conda vs mamba

You can use the `conda` and `mamba` commands interchangeably. 

[Mamba](https://github.com/mamba-org/mamba) is a reimplementation of the conda package manager in C++, running much faster than the original conda implementation. 

**We highly recommend using `mamba` instead of `conda`, since it is much faster.**

## Anaconda vs pip

Installing packages with `mamba` is the preferred way: it is less prone to errors because it uses pre-compiled packages and resolves dependencies correctly (unlike `pip`), and it is much faster than `conda`.

Anaconda works not only for Python, but for any software (C, C++, Scala, R, Julia, etc).

The disadvantage of `conda` is that, depending on the complexity of the package, it might take some time to install. This performance problem is alleviated by using `mamba` instead of `conda`.

If you want to speed up the installation of a **Python** package, you can use `pip`; just make sure that it installs correctly and does not break other packages.
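
One way to check that a `pip` install did not break other packages is `pip check`, which reports installed packages whose declared dependencies are missing or incompatible. This is a minimal sketch, assuming `pip` is available to the notebook's Python interpreter. The `pip_check` helper name is illustrative.

```python
import subprocess
import sys


def pip_check():
    """Run `pip check` and return (ok, report).

    ok is True when pip exits with status 0, meaning it found no
    missing or incompatible dependencies.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, (result.stdout + result.stderr).strip()


ok, report = pip_check()
print("Dependencies are consistent" if ok else "Problems found:")
print(report)
```

In a Python-kernel notebook cell, `!pip check` gives the same report directly.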

## Persistency

After installing a package, it will be available to be loaded and used in all of your notebooks.

The package remains installed as long as the Docker container is running.

If the container is restarted, the environment is reset to its default configuration, so the package will no longer be available and you will have to reinstall it.

**We recommend that you create a cell at the top of your notebook with the `!mamba`, `!conda`, or `!pip` commands that install all the extra libraries your notebook needs.**

This way your notebook will run correctly every time, even in a fresh environment.
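
A top-of-notebook cell can also skip packages that are already present, so re-running the notebook does not reinstall anything. The sketch below uses only the standard library. The `REQUIREMENTS` mapping and the helper names are illustrative assumptions, and the example only prints the install command rather than running it.

```python
import importlib.util
import shlex


def missing_packages(requirements):
    """Return the install names whose import name cannot be found.

    `requirements` maps a top-level import name (e.g. "Levenshtein")
    to the name used to install it (e.g. "python-levenshtein").
    """
    return [
        install_name
        for import_name, install_name in requirements.items()
        if importlib.util.find_spec(import_name) is None
    ]


def build_install_command(packages, channel="conda-forge"):
    """Build the mamba install command as a list of arguments."""
    return ["mamba", "install", "-q", "-y", "-c", channel, *packages]


# Hypothetical extra libraries needed by this notebook.
REQUIREMENTS = {
    "fuzzywuzzy": "fuzzywuzzy",
    "Levenshtein": "python-levenshtein",
}

to_install = missing_packages(REQUIREMENTS)
if to_install:
    command = build_install_command(to_install)
    print("Run in a cell: !" + shlex.join(command))
else:
    print("All extra packages are already installed.")
```

To run the command from Python instead of printing it, the argument list could be passed to `subprocess.run(command, check=True)`.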

## Requesting packages

If you would like a package to be included by default in the data science lab image (so that you don't need to install it yourself), create an issue in the GitHub repository asking for it!

https://github.com/seek-ai/shared-datascience-lab/issues

## Updating all packages

You can update all the Anaconda packages in the environment with one command.

In [None]:
!mamba update -q -y --all

  Package                      Version  Build                   Channel                    Size
─────────────────────────────────────────────────────────────────────────────────────────────────
  Install:
─────────────────────────────────────────────────────────────────────────────────────────────────

  brotli-bin                     1.0.9  h7f98852_5              conda-forge/linux-64      19 KB
  debugpy                        1.4.1  py39he80948d_0          conda-forge/linux-64       2 MB
  importlib_resources            5.2.2  pyhd8ed1ab_0            conda-forge/noarch        21 KB
  libbrotlicommon                1.0.9  h7f98852_5              conda-forge/linux-64      65 KB
  libbrotlidec                   1.0.9  h7f98852_5              conda-forge/linux-64      33 KB
  libbrotlienc                   1.0.9  h7f98852_5              conda-forge/linux-64     286 KB
  libsanitizer                   9.4.0  h79bfe98_9