# Conda - A package and environment management system for Python
[Conda](https://docs.conda.io/projects/conda/en/latest/index.html) is an open source package and environment management system that runs on Windows, macOS, and Linux. Conda installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer or an HPC cluster. While Conda was created for Python programs, it can package and distribute software for any language, such as R, Ruby, Lua, Scala, Java, JavaScript, C/C++, and FORTRAN. We encourage the use of Conda as a development tool for building and sharing project-specific software environments that facilitate reproducible (data) science workflows.

Material credit: [Introduction to Conda for (Data) Scientists](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/), a lesson in The Carpentries Incubator.

## Getting Started with Conda
From the official Conda documentation: Conda is an open source package and environment management system that runs on Windows, macOS, and Linux.

- Conda can quickly install, run, and update packages and their dependencies.
- Conda can create, save, load, and switch between project-specific software environments on your local computer.
- Although Conda was created for Python programs, it can package and distribute software for any language, such as R, Ruby, Lua, Scala, Java, JavaScript, C, C++, and FORTRAN.

As a package manager, Conda helps you find and install packages. If you need a package that requires a different version of Python, you do not need to switch to a different environment manager, because Conda is also an environment manager. With just a few commands, you can set up a totally separate environment to run that different version of Python, while continuing to run your usual version of Python in your normal environment.

### Conda vs. Miniconda vs. Anaconda
![CondaVsMiniCondaVsAnaconda](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/fig/miniconda_vs_anaconda.png)

Users are often confused about the differences between Conda, Miniconda, and Anaconda. Conda is a tool for managing environments and installing packages. Miniconda combines Conda with Python and a small number of core packages; Anaconda includes Miniconda as well as a large number of the most widely used Python packages.

### Why should I use a package and environment management system?
Installing software is hard. Installing scientific software is often even more challenging. To minimize the burden of installing and updating software, (data) scientists often install the software packages they need for their various projects system-wide.

Installing software system-wide has a number of drawbacks:

- It can be difficult to figure out what software is required for any particular research project.
- It is often impossible to install different versions of the same software package at the same time.
- Updating software required for one project can often “break” the software installed for another project.

Put differently, installing software system-wide creates complex dependencies between your research projects that shouldn’t really exist!

Rather than installing software system-wide, wouldn’t it be great if we could install software separately for each research project?

### Environment management
An environment management system solves a number of problems commonly encountered by (data) scientists.

- An application you need for a research project requires different versions of your base programming language or different versions of various third-party packages from the versions that you are currently using.
- An application you developed as part of a previous research project that worked fine on your system six months ago now no longer works.
- Code that you have written for a joint research project works on your machine but not on your collaborators’ machines.
- An application that you are developing on your local machine doesn’t provide the same results when run on your remote cluster.

An environment management system enables you to set up a new, project-specific software environment. That environment contains a specific Python version as well as specific versions of additional packages and required dependencies that are all mutually compatible.

- Environment management systems help resolve dependency issues by allowing you to use different versions of a package for different projects.
- They make your projects self-contained and reproducible by capturing all package dependencies in a single requirements file.
- They allow you to install packages on a host on which you do not have admin privileges.
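
Conda can write an environment's package list to a file and rebuild the environment from that file. The sketch below wraps the relevant commands in two shell functions. The function names `save_env` and `restore_env` and the default file name `environment.yml` are illustrative assumptions. `conda env export --from-history` and `conda env create --file` are standard Conda commands. `--from-history` records only the packages you explicitly requested, which tends to carry across platforms better than a full listing of every resolved dependency.

```shell
# Hypothetical helpers; the conda subcommands and flags they call are real.

# Write the packages you explicitly requested for an environment to a YAML file.
# Usage: save_env ENV_NAME [FILE]
save_env() {
  conda env export --name "$1" --from-history --file "${2:-environment.yml}"
}

# Recreate an environment (its name is read from the file) from a saved file.
# Usage: restore_env [FILE]
restore_env() {
  conda env create --file "${1:-environment.yml}"
}
```

For example, you might run `save_env basic-scipy-env` on your machine and commit `environment.yml` with your project. A collaborator could then run `restore_env` in their copy of the project.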

### Package management
A good package management system greatly simplifies the process of installing software by…

1. identifying and installing compatible versions of software and all required dependencies.
1. handling the process of updating software as more recent versions become available.

If you use some flavor of Linux, then you are probably familiar with the package manager for your Linux distribution (e.g., apt on Ubuntu, yum on CentOS). If you are a macOS user, then you might be familiar with the Homebrew project, which brings a Linux-like package management system to macOS. If you are a Windows user, then you may not be terribly familiar with package managers, as there isn’t really a standard package manager for Windows (although there is the Chocolatey project).

Operating system package management tools are great, but they actually solve a more general problem than the one you often face as a (data) scientist. As a (data) scientist you typically use one or two core scripting languages (e.g., Python, R, SQL). Each scripting language has multiple versions that can potentially be installed, and each will also have a large number of third-party packages that need to be installed. The exact versions of your core scripting language(s) and third-party packages will also probably change from project to project.

### Why use Conda (+pip)?
Whilst there are many different package and environment management systems that solve either the package management problem or the environment management problem, Conda solves both of these problems and is explicitly targeted at (data) science use cases.

- Conda provides prebuilt packages, avoiding the need to deal with compilers or to work out exactly how to set up a specific tool. Fields such as astronomy use Conda to distribute some of their most difficult-to-install tools, such as IRAF. TensorFlow is another tool that is nearly impossible to install from source, but Conda makes this a single step.
- Conda is cross-platform, with support for Windows, macOS, and GNU/Linux, and for multiple hardware platforms, such as x86 and POWER8 and POWER9. In future lessons we will show how to make your environment reproducible (reproducibility being one of the major issues facing science), and Conda allows you to provide your environment to other people across these different platforms.
- Conda allows you to use other package management tools (such as pip) inside Conda environments when a library or tool is not already packaged for Conda (we’ll show later how to get access to more Conda packages via channels).

Additionally, Anaconda provides commonly used data science libraries and tools, such as R, NumPy, SciPy, and TensorFlow. These are built using optimised, hardware-specific libraries (such as Intel’s MKL or NVIDIA’s CUDA), which provide a speedup without your having to change any of your code.
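
One common approach to a package missing from the default channels is to try a community channel such as conda-forge first, and to fall back to pip only if that also fails. The sketch below illustrates this with hypothetical function names. `--name`, `--channel`, and `conda run` are real Conda options. Running `python -m pip` through `conda run` ensures the environment's own pip does the installing, rather than some other pip found on your `PATH`.

```shell
# Hypothetical helpers illustrating "conda first, pip as a fallback".

# Install a package from the conda-forge channel into a named environment.
# Usage: conda_forge_install ENV_NAME PACKAGE_SPEC
conda_forge_install() {
  conda install --name "$1" --channel conda-forge "$2"
}

# Install a package with the environment's own pip (requires pip in that environment).
# Usage: pip_install_into ENV_NAME PACKAGE_SPEC
pip_install_into() {
  conda run --name "$1" python -m pip install "$2"
}
```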

### Key Points
- Conda is a platform agnostic, open source package and environment management system.
- Using a package and environment management tool facilitates portability and reproducibility of (data) science workflows.
- Conda (+pip) solves both the package and environment management problems and targets multiple programming languages. Other open source tools solve either one or the other, or target only a particular programming language.

## What is a Conda environment
A [Conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html) is a directory that contains a specific collection of Conda packages that you have installed. For example, you may be working on a research project that requires NumPy 1.18 and its dependencies. Meanwhile, another environment associated with a finished project has NumPy 1.12 (perhaps because version 1.12 was the most current version of NumPy at the time the project finished). If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them.

#### Avoid installing packages into your `base` Conda environment
Conda has a default environment called `base` that includes a Python installation and some core system libraries and dependencies of Conda. It is a **“best practice” to avoid installing additional packages into your `base`** software environment. Additional packages needed for a new project should always be installed into a newly created Conda environment.
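
One way to enforce this practice in your own scripts is a small guard that refuses to continue when `base` (or no environment at all) is active. The sketch relies on the fact that `conda activate` sets the `CONDA_DEFAULT_ENV` variable to the name of the active environment. The function name `require_non_base_env` is hypothetical.

```shell
# Hypothetical guard: fail if the active Conda environment is base, or if none is active.
require_non_base_env() {
  case "${CONDA_DEFAULT_ENV:-}" in
    "")
      echo "No Conda environment is active; activate a project environment first." >&2
      return 1 ;;
    base)
      echo "The base environment is active; activate a project environment first." >&2
      return 1 ;;
  esac
  # Any other environment name falls through, and the function returns success
}
```

For example, `require_non_base_env && conda install scikit-learn=0.22` only installs when a project environment is active.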

### Creating environments
To create a new environment for Python development using `conda` you can use the `conda create` command.
```
$ conda create --name python3-env python pip
```
### Always install `pip` in your Python environments
[Pip](https://pip.pypa.io/en/stable/), the default Python package manager, is already installed on many operating systems, where it is used to manage any packages needed by the OS Python. Pip is also included in the Miniconda installer. Including `pip` as an explicit dependency in your Conda environment avoids difficult-to-debug issues. Those issues can arise when installing packages into environments using some other `pip` installed outside your environment.

It is a good idea to give your environment a meaningful name in order to help yourself remember the purpose of the environment. While naming things can be difficult, `$PROJECT_NAME-env` is a good convention to follow.
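
If you keep one project per directory, you can apply this naming convention mechanically. The helper below is a hypothetical sketch that assumes the project's directory name is a suitable `$PROJECT_NAME`.

```shell
# Hypothetical helper: derive an environment name using the $PROJECT_NAME-env convention.
# Usage: project_env_name [PROJECT_DIR]   (defaults to the current directory)
project_env_name() {
  # basename keeps only the last path component (trailing slashes are ignored)
  printf '%s-env\n' "$(basename "${1:-$PWD}")"
}
```

Run from inside a project directory, `conda create --name "$(project_env_name)" python=3.6 pip=20.0` would then create an environment named after that directory.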

The `conda create` command above creates a new Conda environment called `python3-env` and installs the most recent versions of Python and pip. If you wish, you can specify a particular version of each package for `conda` to install when creating the environment.

```
$ conda create --name python36-env python=3.6 pip=20.0
```
### Specify a version number for each package you wish to install
To make your results more reproducible, and to make it easier for research colleagues to recreate your Conda environments on their machines, it is a “best practice” to always explicitly specify the version number for each package that you install into an environment. If you are not sure exactly which version of a package you want to use, you can use the `conda search` command to see which versions are available.
```
$ conda search $PACKAGE_NAME
```
So, for example, if you wanted to see which versions of [Scikit-learn](https://scikit-learn.org/stable/), a popular Python library for machine learning, were available, you would run the following.
```
$ conda search scikit-learn
```
As always you can run `conda search --help` to learn about available options.

You can create a Conda environment and install multiple packages by simply listing the packages that you wish to install.
```
$ conda create --name basic-scipy-env ipython=7.13 matplotlib=3.1 numpy=1.18 pip=20.0 scipy=1.4
```
When `conda` installs a package into an environment, it also installs any required dependencies. For example, Python is not listed as a package to install into the `basic-scipy-env` environment above. Even so, `conda` will still install Python into the environment, because it is a required dependency of at least one of the listed packages.

### Activating an existing environment
Activating environments is essential to making the software in environments work well (or sometimes at all!). Activation of an environment does two things.

- Adds entries to PATH for the environment.
- Runs any activation scripts that the environment may contain.

The second step is particularly important, as activation scripts are how packages can set arbitrary environment variables that may be necessary for their operation. You activate the `basic-scipy-env` environment by name using the `conda activate` command.
```
$ conda activate basic-scipy-env
```
You can see that an environment has been activated because the shell prompt will now include the name of the active environment.
```
(basic-scipy-env) $
```
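
Besides the prompt, you can check the active environment from the shell itself. `conda activate` sets `CONDA_DEFAULT_ENV` to the environment's name and `CONDA_PREFIX` to its directory. The helper name below is hypothetical.

```shell
# Hypothetical helper: report the active Conda environment using the variables
# that `conda activate` sets; prints <none> for a variable that is unset.
show_active_env() {
  printf 'CONDA_DEFAULT_ENV=%s\n' "${CONDA_DEFAULT_ENV:-<none>}"
  printf 'CONDA_PREFIX=%s\n' "${CONDA_PREFIX:-<none>}"
}
```

With `basic-scipy-env` active, `command -v python` should also print a path inside `$CONDA_PREFIX` (for example `$CONDA_PREFIX/bin/python` on Linux and macOS).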
### Deactivate the current environment
To deactivate the currently active environment, use the `conda deactivate` command as follows.
```
(basic-scipy-env) $ conda deactivate
```
You can see that an environment has been deactivated because the shell prompt will no longer include the name of the previously active environment.
```
$
```
### Returning to the base environment
To simply return to the `base` Conda environment, it’s better to call `conda activate` with no environment specified, rather than to use `deactivate`. If you run `conda deactivate` from your `base` environment, you may lose the ability to run `conda` commands at all. Don’t worry if you encounter this undesirable state! Just start a new shell.



### Installing a package into an existing environment
You can install a package into an existing environment using the `conda install` command. This command accepts a list of package specifications (e.g., `numpy=1.18`). It installs a set of packages consistent with those specifications and compatible with the underlying environment. If full compatibility cannot be assured, an error is reported and the environment is not changed.

By default the `conda install` command will install packages into the current, active environment. The following would activate the `basic-scipy-env` we created above and install Numba, an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code, into the active environment.
```
$ conda activate basic-scipy-env
$ conda install numba
```
As with the `conda create` command, if version numbers are not explicitly provided, Conda will attempt to install the newest versions of any requested packages. To accomplish this, Conda may need to update some packages that are already installed or install additional packages. It is always a good idea to explicitly provide version numbers when installing packages with the `conda install` command. For example, the following would install a particular version of Scikit-learn into the current, active environment.
```
$ conda install scikit-learn=0.22
```
### Freezing installed packages
To prevent existing packages from being updated when using the `conda install` command, you can use the `--freeze-installed` option. This may force Conda to install older versions of the requested packages in order to maintain compatibility with previously installed packages. Using the `--freeze-installed` option does not prevent additional dependency packages from being installed.
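
Combining `--freeze-installed` with `--dry-run` lets you review what the solver would do before anything changes. Both are real `conda install` options. The function name and its preview-then-install structure are an illustrative assumption.

```shell
# Hypothetical helper: preview, then perform, an install that keeps existing packages as they are.
# Usage: install_frozen ENV_NAME PACKAGE_SPEC
install_frozen() {
  # Show the solver's plan without modifying the environment; stop if the solve fails
  conda install --name "$1" --freeze-installed --dry-run "$2" || return
  # Perform the same request; conda asks for confirmation before changing anything
  conda install --name "$1" --freeze-installed "$2"
}
```

For example, `install_frozen basic-scipy-env scikit-learn=0.22` first prints the planned changes and then installs Scikit-learn 0.22 without updating packages already in the environment.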

### Where do Conda environments live?
By default, environments created with `conda` live in the `.conda/envs/` folder of your DartFS home directory, at the following path.
```
~/.conda/envs
```
Running `ls` on your `~/.conda/envs` directory will list out the directories containing the existing Conda environments.
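
The `~/.conda/envs` location is a default rather than a fixed rule. For example, a Miniconda install in a directory you own often keeps environments in the `envs/` folder of that installation instead. `conda config --show envs_dirs` prints the directories Conda searches for named environments, in order. The function name below is hypothetical.

```shell
# Hypothetical helper: show where named environments can live, then list those in ~/.conda/envs.
where_envs_live() {
  # Directories conda searches for named environments, in priority order
  conda config --show envs_dirs
  # Each sub-directory here is one environment (stay quiet if the folder does not exist yet)
  ls "$HOME/.conda/envs" 2>/dev/null || true
}
```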


### Listing existing environments
Now that you have created a number of Conda environments on your local machine you have probably forgotten the names of all of the environments and exactly where they live. Fortunately, there is a `conda` command to list all of your existing environments together with their locations.
```
$ conda env list
```
### Listing the contents of an environment
In addition to forgetting the names and locations of Conda environments, at some point you will probably forget exactly what has been installed in a particular Conda environment. Again, there is a `conda` command for listing the contents of an environment. To list the contents of the `basic-scipy-env` environment that you created above, run the following command.
```
$ conda list --name basic-scipy-env
```
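
You can also narrow `conda list` to a single package, which is a quick way to confirm a version. The trailing argument is a regular expression matched against package names, and `--full-name` restricts the match to the exact name. The wrapper function is a hypothetical convenience.

```shell
# Hypothetical helper: show one package (exact name match) installed in a named environment.
# Usage: env_package_version ENV_NAME PACKAGE
env_package_version() {
  conda list --name "$1" --full-name "$2"
}
```

For example, `env_package_version basic-scipy-env numpy` should show only the NumPy entry for that environment.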

### Deleting entire environments
Occasionally, you will want to delete an entire environment. Perhaps you were experimenting with `conda` commands and you created an environment you have no intention of using; perhaps you no longer need an existing environment and just want to get rid of cruft on your machine. Whatever the reason the command to delete an environment is the following.
```
$ conda remove --name my-first-conda-env --all
```
If you wish to delete an environment that you created with the `--prefix` option, then you will need to provide the prefix again when removing the environment.
```
$ conda remove --prefix /path/to/conda-env/ --all
```
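
The `--prefix` option also works with `conda create`. This is how you keep an environment inside its project directory, as recommended in the key points below. The following is a minimal sketch. The `env/` sub-directory name and the pinned versions are illustrative assumptions.

```shell
# Hypothetical helper: create an environment in PROJECT_DIR/env instead of the shared envs folder.
# Usage: create_project_env PROJECT_DIR
create_project_env() {
  conda create --prefix "$1/env" python=3.6 pip=20.0
}
```

From inside the project directory, you would then activate the environment by path with `conda activate ./env` and delete it with `conda remove --prefix ./env --all`.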

### Key Points
- A Conda environment is a directory that contains a specific collection of Conda packages that you have installed.

- You create (remove) a new environment using the `conda create` (`conda remove`) commands.

- You activate (deactivate) an environment using the `conda activate` (`conda deactivate`) commands.

- You install packages into environments using `conda install`; you install packages into an active environment using `pip install`.

- You should install each environment as a sub-directory inside its corresponding project directory.

- Use the `conda env list` command to list existing environments and their respective locations.

- Use the `conda list` command to list all of the packages installed in an environment.