# 04.2 conda
aka
## Never touch _base_

Although conda is not the official provider of Python packages, it has gained a large following. It installs most of the tools you need for data analysis out of the box, which makes it user friendly. If you prefer a more interactive way of working, the Navigator gives you a graphical interface.

<div class="alert alert-warning">
    <b>For this demo I am assuming you have Anaconda with the graphical user interface installed (it is possible you installed Miniconda alone).</b><br>
    <b>The working rationale behind conda is based on pip. Some commands will be very similar.</b>
</div>

On Windows, open your "Anaconda Prompt". On a Mac or Linux, open a terminal.

If you installed Anaconda properly, you should (on Windows) see a __(base)__ string before the path.

The __(base)__ string indicates you are working in the default anaconda environment.
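
If you are unsure which environment a script or notebook is running in, you can also ask Python itself. The sketch below is only an illustration: `describe_environment` is a hypothetical helper, and it assumes conda has set the `CONDA_DEFAULT_ENV` variable, which happens when an environment is activated.

```python
import os
import sys


def describe_environment(environ=None, prefix=None):
    """Return (environment name, interpreter prefix, is_base) for the current session."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    # conda sets CONDA_DEFAULT_ENV on activation; the variable is missing
    # when Python runs outside any activated conda environment.
    name = environ.get("CONDA_DEFAULT_ENV")
    return name, prefix, name == "base"


if __name__ == "__main__":
    name, prefix, is_base = describe_environment()
    print(f"environment: {name!r}")
    print(f"interpreter prefix: {prefix}")
    if is_base:
        print("You are in base: create a separate environment before installing anything.")
```

The interpreter prefix is the directory the running Python lives in, so it changes whenever you activate a different environment.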

Imagine having a factory that can build trucks. You need the trucks to carry construction material from one place to another.

The trucks can be upgraded, disassembled, reduced, lent to other construction workers, etc.

However, the factory must be able to always produce another truck when you see fit.

In this allegory, the factory is the __(base)__ environment, which must always be working. Your additional __virtual environments__ are the trucks.

Again, like most examples you encounter in this course, this is an oversimplification.

The inner workings and the methodology behind some of these tools can get very complicated: advanced programming concepts, databases, graph theory, and so on.

__HOWEVER__, these are tools. Their creators intended them to be easy for the user to access. At the end of the day, if a tool is not well designed and accessible, no one will use it.

<div class="alert alert-info">
    <b>Question: Conda is an attractive tool for Data Scientists to show off their skills. We discussed the Data Scientist's most valuable skill. What's the second most-valuable skill (debatable)?</b>
</div>

In [3]:
import matplotlib as mpl
mpl.__version__

'3.3.4'

In [4]:
import numpy as np
np.__version__

'1.20.0'

---

<div class="alert alert-info">
    <b>Let's do the demo in navigator first.</b>
</div>

<div class="alert alert-warning">
    <b>The navigator is really heavy! But don't shut it down just yet!</b>
</div>

<div class="alert alert-success">
    <b>The command line is much faster. Let's do some more advanced things in the command line.</b>
</div>

First, let's see what we have already installed on our system.

```shell
conda info --envs
```

Remember that we will probably each see a different output for the last command. It depends on what you have worked on so far.

On the prompt, let's create a new environment for ourselves to test.

```shell
conda create -n testenv
```

If you go to the navigator, you should see that a new virtual environment was created. It has almost nothing because you have not specified what you want to do with it.

This will remove the environment:

```shell
conda remove -n testenv --all
```

If you check the navigator again, testenv should be gone.

<div class="alert alert-success">
    <b>This exercise is also meant to familiarise you with the command line.</b>
</div>

Let's say we now have a project we want to develop to analyse some data. We know we will use pandas. Let's create a ```schoolgrade``` project with pandas.

```shell
conda create -n schoolgrade pandas
```
Examine what conda is proposing to install. Conda proposes to __Download__ some packages and then tells you __Some NEW packages will be installed__. The list of downloads may vary from person to person, because it depends on what is already cached on your machine. What matters is that the end result is the same: with the same command, you will end up with the same installed packages.

What about requirements? You see right away that numpy is there, as numpy is a requirement for pandas.

But python is also there (and pip).
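
Once the environment exists, you can inspect these requirements from Python as well. The following is a minimal sketch, not conda's own mechanism: it reads the metadata that the installed Python package ships (via the standard library's `importlib.metadata`), and `direct_requirements` is a hypothetical helper name. Conda keeps its own dependency records, so the two lists may not match exactly.

```python
from importlib import metadata


def direct_requirements(package):
    """List the requirement strings a package declares, or [] if it is not installed."""
    try:
        declared = metadata.requires(package)
    except metadata.PackageNotFoundError:
        return []
    # requires() returns None when the package declares no requirements.
    return list(declared or [])


if __name__ == "__main__":
    # Run this inside the schoolgrade environment to see pandas' declared requirements.
    for requirement in direct_requirements("pandas"):
        print(requirement)
```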

To list all available versions of a library, just do:

```shell
conda search numpy
```
Since the fundamental action of searching packages is available on conda, we will stick to conda for the rest of the class.

If you want to install a package, just do:
```shell
conda install numpy
```

If you want to install a specific version of a package, you can:
```shell
conda install numpy=1.19
```
__IMPORTANT__: in pip you force a version with two equals "==", in conda with just one "=".
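
If you are copying pinned versions from a pip requirements list, the difference is easy to get wrong. Here is a small sketch of the conversion; `pip_pin_to_conda` is a hypothetical helper, and it only handles the exact-pin `==` form. Note that, as far as conda's matching goes, a single `=` is a prefix match, so `numpy=1.19` can resolve to any `1.19.x` release.

```python
def pip_pin_to_conda(requirement):
    """Turn a pip-style exact pin ('numpy==1.19') into conda's single-'=' form."""
    name, separator, version = requirement.partition("==")
    if not separator:
        # No '==' present: nothing to convert (for example a bare package name).
        return requirement.strip()
    return f"{name.strip()}={version.strip()}"


if __name__ == "__main__":
    for spec in ["numpy==1.19", "pandas"]:
        print(spec, "->", pip_pin_to_conda(spec))
```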

To remove that package:
```shell
conda remove numpy
```

You can install several packages in one go, as long as conda can find all of them.
```shell
conda install matplotlib jupyterlab
```

To see what you have under the hood:
```shell
conda info
```

On another occasion (this might take too long for the class) you may wish to update your conda distribution. Not the packages you have installed, but the conda manager itself. This is one of the few times you deliberately change __(base)__, because that is where conda itself lives. To do so:
```shell
conda update -n base conda
```

To clean unused packages and caches you can do:
```shell
conda clean --all
```
But do so only if you have a very strong reason.

If you're lost, you can always do:
```shell
conda --help
```

Or go to [https://docs.conda.io/en/latest/](https://docs.conda.io/en/latest/)

---
## Saving time

<div class="alert alert-danger">
    <b>This step is not fundamental and requires a fairly recent Anaconda installation on Windows. It should work on Mac and Linux.</b>
</div>

Some advanced IDEs are smart enough to know you have either pip or conda venvs. You can change the working venv inside your IDE. This is also possible if you are using jupyter-lab.

First, change into your working venv and make sure you have the dependency:
```shell
conda activate venv
conda install -c anaconda ipykernel
```

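Now you tell jupyter-lab where your venv is. The `--name` is the kernel's identifier, so keep it free of spaces; `--display-name` is the label shown in jupyter-lab:
```shell
ipython kernel install --user --name=venv --display-name="name to show on jupyterlab"
```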
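
Kernel identifiers are safest when they contain only letters, digits, hyphens, underscores and periods (an assumption based on how Jupyter validates kernel names), while the label shown in jupyter-lab can be free text. A minimal sketch of deriving one from the other, where `safe_kernel_name` is a hypothetical helper and not part of Jupyter:

```python
import re


def safe_kernel_name(label):
    """Derive a conservative kernel identifier from a free-text label."""
    # Replace every run of characters outside [a-z0-9._-] with a single hyphen.
    slug = re.sub(r"[^a-z0-9._-]+", "-", label.lower()).strip("-")
    # Fall back to a generic identifier if nothing usable is left.
    return slug or "kernel"


if __name__ == "__main__":
    print(safe_kernel_name("School Grade (pandas)"))
```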

At this point, no matter where you run your jupyter-lab, it will always see the environment. You can just start a notebook from there.

To see what kernels you have installed:
```shell
jupyter kernelspec list
```

And should you wish to remove it:
```shell
jupyter kernelspec uninstall kernel_to_delete_without_airquotes
```

---
## Sharing environments

Much like pip, conda can create "snapshots" of an environment. You can share the snapshot files with colleagues and create the same exact venv, so everybody is working with the exact same distribution of packages. You can avoid a lot of pain in a collaborative environment if everyone is using the exact same libraries.

To create an environment snapshot, activate the environment you wish to share and run:

```shell
conda env export > environment.yml
```

To create an environment file without the specific builds, just run:

```shell
conda env export --no-builds > environment.yml
```
However, the file still contains a prefix line that reveals how the directories on your filesystem are organized. This could be a security risk. Conda actually does not care about the last line, the prefix line. You can build the environment file and drop the prefix line with a nice piece of shell scripting:

Windows:
```shell
conda env export --no-builds | findstr /V "prefix:" > environment.yml
```

Mac OS and Linux:
```shell
conda env export --no-builds | grep -v "prefix:" > environment.yml
```

Both commands do a "reverse select": they keep every line that does not contain the "prefix:" substring.
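
The same reverse select can be sketched in Python, which behaves identically on every operating system. This is an illustration only: `strip_prefix_line` is a hypothetical helper, and the example text is made up to resemble an export.

```python
def strip_prefix_line(yaml_text):
    """Return the exported environment text without any line containing 'prefix:'."""
    # Same rule as grep -v: keep the lines that do NOT contain the substring.
    kept = [line for line in yaml_text.splitlines() if "prefix:" not in line]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    example = (
        "name: schoolgrade\n"
        "dependencies:\n"
        "  - pandas\n"
        "prefix: /home/user/anaconda3/envs/schoolgrade\n"
    )
    print(strip_prefix_line(example), end="")
```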

The output is a yaml file with the libraries of the active environment. You can name the yaml file as you want, but it is important it retains the extension, to warn another user it is a configuration file (yaml files are commonly configuration files, so this is a simple convention).

If you share the yaml file with a colleague, the colleague can reproduce your environment with:

```shell
conda env create -f environment.yml
```
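
When two people suspect their environments have drifted apart, comparing the two exported files shows where. The sketch below is a deliberately simple line-based reader rather than a full YAML parser; `parse_dependencies` and `compare_environments` are hypothetical helpers, and they assume the layout that `conda env export` normally produces.

```python
def parse_dependencies(yaml_text):
    """Collect the '- item' entries listed under the top-level 'dependencies:' key."""
    found = set()
    inside = False
    for raw in yaml_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # A top-level key starts in column 0 and is not a list item.
        is_top_level_key = not raw[0].isspace() and not stripped.startswith("-")
        if is_top_level_key:
            inside = stripped.startswith("dependencies:")
            continue
        if inside and stripped.startswith("- "):
            item = stripped[2:].strip()
            if not item.endswith(":"):  # skip a nested header such as "pip:"
                found.add(item)
    return found


def compare_environments(mine, theirs):
    """Return (only in mine, only in theirs) as two sorted lists."""
    mine_deps, their_deps = parse_dependencies(mine), parse_dependencies(theirs)
    return sorted(mine_deps - their_deps), sorted(their_deps - mine_deps)
```

Two empty lists mean both files pin the same packages.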

Note that the created environment will have the same name as the environment you exported. If your colleague wants a different name, they must change the first line of the file (the `name:` line).
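
Editing that first line by hand is easy, but it can also be scripted. A minimal sketch, assuming the file has a top-level `name:` line as conda exports normally do; `rename_environment` is a hypothetical helper:

```python
def rename_environment(yaml_text, new_name):
    """Replace the value of the top-level 'name:' line in an environment file."""
    lines = yaml_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("name:"):
            lines[index] = f"name: {new_name}"
            break
    else:
        # No name line found: add one at the top.
        lines.insert(0, f"name: {new_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "name: schoolgrade\ndependencies:\n  - pandas\n"
    print(rename_environment(original, "gradebook"), end="")
```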




---
## Conda channels

Not all packages are in the repositories of the Anaconda corporation. Some are in __conda-forge__, for example, others are in dedicated repositories. For conda, the repositories are called __channels__ (in git it would be different repositories).

As we saw in a previous class, __make__ is not available in the main channel, but someone contributed it to the __conda-forge__ channel. To install __make__ we needed to tell conda to look for it in a specific channel.

```shell
conda install -c conda-forge make
```

If you remember, the original instruction to install was

```shell
conda install sphinx sphinx_rtd_theme make
```

But since __make__ is not part of the default channel, anyone who has not told conda to search other channels would get an error, and conda would install nothing at all. To add __conda-forge__ to the list of channels that conda searches by default, you can do:

```shell
conda config --add channels conda-forge
```

This will make conda-forge part of the channel list for conda to look for libraries.

When you add a new channel, it will have a higher priority than your current ones. This means that if a package with the same name exists in both __conda-forge__ and the __default__ channel, it will install the package it finds in __conda-forge__.
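
The priority rule can be pictured as an ordered list that is searched from the front. The sketch below is a toy model, not conda's solver: the real resolution also weighs versions and the `channel_priority` setting. `add_channel`, `pick_channel` and the `catalog` contents are hypothetical.

```python
def add_channel(channels, new_channel):
    """Mimic 'conda config --add channels X': the new channel goes to the front."""
    return [new_channel] + [c for c in channels if c != new_channel]


def pick_channel(package, channels, catalog):
    """Return the first channel, in priority order, that offers the package."""
    for channel in channels:
        if package in catalog.get(channel, set()):
            return channel
    return None


if __name__ == "__main__":
    # Made-up catalog for illustration only.
    catalog = {"defaults": {"numpy", "sphinx"}, "conda-forge": {"numpy", "make"}}
    channels = add_channel(["defaults"], "conda-forge")
    for package in ["numpy", "sphinx", "make"]:
        print(package, "->", pick_channel(package, channels, catalog))
```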

Since conda is also part social network, it is possible for you to create your own channel. You can share your packages with your colleagues with a conda install.

<div class="alert alert-warning">
    <b>In large companies, it is not uncommon for the company to have its own private package channel. You set it up much the same way you set up a local GitLab hosting service. Since you usually issue a standard laptop to employees, you only have to pre-compile the packages for the same chipset.</b>
</div>

<div class="alert alert-danger">
    <b>A package compiled on an IBM will probably not work on an Intel.</b>
</div>

---
## Mamba (Experimental)

Sometimes conda can take too long in the installation procedure. Mamba is a third-party alternative to conda __for the installation and removal of packages alone__. Remember, __mamba__ does not manage venvs (yet).

After creating a simple venv with just python=3.8 (for example), change into that venv. You can install __mamba__ right away with:

```shell
conda install -c conda-forge mamba
```

Mamba is an optimised implementation of conda in C++. It can download packages in parallel and it is much faster than conda. For very large projects it might come in handy. Use mamba for installations as you would use conda. Remember, this is still experimental, but it can speed up your installs by a lot. And if you somehow ruin the venv, it is easy to recover: remove it and create it again.

---
## Finally, an exercise!

### Tensorflow

[Tensorflow](https://www.tensorflow.org/) is one of the most popular open-source machine learning libraries, on par with [PyTorch](https://pytorch.org/).

We want to start to learn how to use Tensorflow, but the installation process is complicated. Whilst installing Tensorflow we might also "break" something from our base installation.

<div class="alert alert-info">
    <b>Exercise: Create a new virtual environment with Tensorflow. Make sure you have python 3.8 and jupyterlab 3.0. Also install numpy and matplotlib (don't specify a version). You can do this together with your group.</b>
</div>

After you install Tensorflow, create a directory for the project. Download the demo [notebook](https://www.tensorflow.org/tutorials/keras/classification). As soon as you are operating inside that environment, you can run the notebook.