---
title: "Set Up a Local Development Environment for Machine Learning"
author: "Vahram Poghosyan"
date: "2023-10-28"
categories: ["Python", "Conda", "Quarto", "Machine Learning"]
image: "set_up_a_local_development_environment_for_ML.png"
repo-url: https://www.example.com
format:
  html:
    code-fold: false
    code-line-numbers: true
    code-tools:
      source: repo
jupyter: python3
---

In this post, we'll go over how to set up a local development environment for building machine learning applications and blogging about the process. This is my attempt at installing the required software packages on a Windows machine. I'll do my best to keep things general, but some of the steps will be Windows-specific.

# Python and Conda Installation

Many systems these days come with Python pre-installed. In case ours doesn't, here are the steps to install it from scratch.

## Installing Python - System Level
First, we download and install the latest version of Python for our OS from the [official website](https://www.python.org/downloads/).

::: {.callout-tip title="💡 Tip" appearance="minimal" collapse="true"}
Make sure to tick the "add to PATH" box during the installation so that the path of the Python executable is added to our system's `PATH` environment variable. The path in question, by default, is `~\AppData\Local\Programs\Python\Python311\python.exe`.
:::

::: {.callout-important title="🔧 Troubleshooting" appearance="minimal" collapse="true"}
If the command `python` is unrecognized on Windows after installation, try `py`, which invokes the Python interpreter through the Python launcher for Windows. Running `py --version` should return the version number (e.g. `Python 3.11.5`).
:::
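
To see what each command actually resolves to, we can look the launchers up on `PATH` and ask each one for its version. The following is a minimal sketch using only the standard library. The helper name `launcher_versions` is made up for this post, and the output will differ from machine to machine.

```python
import shutil
import subprocess


def launcher_versions(names=("python", "py")):
    """Map each command name to (resolved path, version text), or None if not on PATH."""
    found = {}
    for name in names:
        path = shutil.which(name)  # None when the command is not on PATH
        if path is None:
            found[name] = None
            continue
        try:
            result = subprocess.run(
                [path, "--version"], capture_output=True, text=True, timeout=10
            )
        except (OSError, subprocess.TimeoutExpired):
            # The command exists but could not be run; keep the path only.
            found[name] = (path, None)
            continue
        # Very old interpreters print the version to stderr, so check both streams.
        found[name] = (path, (result.stdout or result.stderr).strip())
    return found


if __name__ == "__main__":
    for name, info in launcher_versions().items():
        print(name, "->", info if info else "not found on PATH")
```

If `python` shows up as not found while `py` resolves, the "add to PATH" box was most likely left unticked during installation.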

## Why Use Conda

While `pip` is Python's built-in package manager and `venv` is its built-in virtual environment manager, we use Conda because it covers what `pip` and `venv` do individually and goes further: it also manages library dependencies that are not written in Python.

Occasionally, when a Conda distribution of a package is not available but a [PyPI](https://pypi.org/) distribution exists, it makes sense to combine the use of `conda` and `pip`. This is done by:

1. Installing `pip` within a Conda environment: `conda install pip`
2. Installing the required package from inside the active Conda environment: `pip install <package_name>`

This way, the packages do *not* go to the system-level Python's packages directory `C:\Users\<username>\AppData\Local\Python\<version>\` (or `Roaming` instead of `Local`, if Python was installed only for a specific user on Windows). Instead, `pip` installs them in the Conda environment's package directory, `C:\ProgramData\anaconda3\Lib\site-packages` (or similar). We can check each package, along with its installation destination, by running `pip list -v`.
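
The same check can be done from inside Python, which is handy in a notebook. The sketch below is a rough stand-in for `pip list -v`, not a reimplementation of it: it uses the standard library's `importlib.metadata` (Python 3.8+) to list every distribution visible to the running interpreter along with the directory it was installed into. The function name `installed_packages` is made up for this post.

```python
import sysconfig
from importlib import metadata


def installed_packages():
    """Return sorted (name, version, location) tuples for the running interpreter."""
    rows = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name is None:  # skip broken or partial installs with no metadata
            continue
        version = dist.metadata.get("Version", "unknown")
        location = str(dist.locate_file(""))  # directory the package lives in
        rows.append((name, version, location))
    return sorted(rows, key=lambda row: row[0].lower())


if __name__ == "__main__":
    # Default destination for pure-Python packages installed by pip
    print("site-packages:", sysconfig.get_paths()["purelib"])
    for name, version, location in installed_packages():
        print(f"{name}=={version}  ({location})")
```

Run with the Conda environment's interpreter, the locations should point inside that environment rather than at the system-level Python.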

### Installing Anaconda Navigator (or Miniconda)
Next, we download and install [Anaconda Navigator](https://www.anaconda.com/download) (or Miniconda, which installs the Conda package and environment manager along with Python, but without the additional software and without the GUI navigator). The full Anaconda installation gives us access to tools like Jupyter Notebooks, Spyder, PyCharm, and other scientific packages and IDEs.

::: {.callout-note title="📖 Note" appearance="minimal" collapse="true"}
**Anaconda's built-in Python distribution:** Anaconda comes with its own Python distribution (installed by default at `c:\ProgramData\anaconda3\python.exe`). The installer will prompt us to select an option that enables third-party editors, such as VSCode, to recognize this Python distribution.
:::

::: {.callout-note title="📖 Note" appearance="minimal" collapse="true"}
**Different Python distributions can live on the same machine:** At the time of writing, running `python --version` in the Anaconda Prompt returns `Python 3.11.4`, which is the version of Python that Conda installed in its `base` environment. Crucially, running `py --version`, even in the Anaconda Prompt, still returns `Python 3.11.5`, which is the system's version of Python.
:::

## Choosing the Right Python Kernel in VSCode
In VSCode, we can open the **Command Palette** and run the command **Notebook: Select Notebook Kernel**. At first, this will prompt us to install the Jupyter and Python VSCode extensions. Once that's done, we can rerun the command and select the Python kernel in the desired Conda environment (by default `base`).
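
To confirm that the notebook is really running on the kernel we picked, we can ask the interpreter to describe itself from a notebook cell. This is a minimal sketch: the helper name `describe_interpreter` is made up for this post, and it relies on two Conda conventions, namely that an environment folder contains a `conda-meta` directory and that `conda activate` sets the `CONDA_DEFAULT_ENV` variable (which may be absent if the kernel was started without activation).

```python
import os
import sys
from pathlib import Path


def describe_interpreter():
    """Collect details identifying the interpreter that is running this code."""
    prefix = Path(sys.prefix)
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "prefix": str(prefix),
        # Conda environments keep their package records in a `conda-meta` folder.
        "is_conda_env": (prefix / "conda-meta").is_dir(),
        # Set by `conda activate`; None when no environment was activated.
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `executable` points at the system-level Python instead of the Conda environment, the wrong kernel is selected.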

## Initializing Conda in the Shell
Before we can use the full capabilities of Conda in the terminal, we need to initialize it by running the command:

```shell
conda init <bash|powershell|tcsh|...> # Depending on the shell we're using
```
We then restart the terminal for the changes to take effect.

::: {.callout-important title="🔧 Troubleshooting" appearance="minimal" collapse="true"}
For Windows users, PowerShell may throw the following error when trying to load the user profile: `execution of scripts is disabled on this system`. This is PowerShell's security measure against command hijacking: its way of controlling script execution and establishing identity. If this is the case, run `cmd.exe` as Administrator and execute the command `powershell Set-ExecutionPolicy RemoteSigned -Scope CurrentUser`. After reopening PowerShell, we should see the active environment in parentheses (e.g. `(base)`) to the left of the prompt.
:::

## Conda Commands
Some common Conda commands are:

| Command | Description |
| ------- | ----------- |
| `conda env list` | Shows all the Conda environments (the active environment is marked with *) |
| `conda list` | Shows all the packages installed in the currently active environment |
| `conda update --all` | Updates all packages in the active environment (frequently resolves `environment is inconsistent` errors) | 
| `conda info` | Shows, among other things, the directory where the environment is stored | 
| `conda activate <myenv>` | Activates the environment `<myenv>` |
| `conda deactivate` | Deactivates the currently active environment |
| `conda create --name <myenv>` | Creates a new empty environment |
| `conda create --name <myenv> --clone base` | Clones the base environment |
| `conda env export -f <path/to/envfile.yml>` | Exports the package list of the active environment (e.g. `conda env export -f /Users/<username>/Documents/MyFiles/personal-blog.yml`) |
| `conda compare <path/to/envfile.yml>` | Compares the active environment to the exported file of another environment |
| `conda remove --name <myenv> --all` | Deletes the environment |
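
Some of this information can also be read programmatically, since `conda info` accepts a `--json` flag. The sketch below is one hedged way to do that from Python. The function names are made up for this post, and the keys it reads (`conda_version`, `active_prefix`, `envs`) are the ones I expect in that JSON output; they are read with `.get` so that a missing key yields `None` rather than an error.

```python
import json
import shutil
import subprocess


def summarize_conda_info(info):
    """Pick a few fields out of the dictionary produced by `conda info --json`."""
    return {
        "conda_version": info.get("conda_version"),
        "active_prefix": info.get("active_prefix"),
        "envs": list(info.get("envs", [])),
    }


def read_conda_info():
    """Run `conda info --json`; return the parsed dict, or None if that fails."""
    conda = shutil.which("conda")
    if conda is None:  # conda is not on PATH in this shell
        return None
    try:
        result = subprocess.run(
            [conda, "info", "--json"], capture_output=True, text=True, timeout=60
        )
        return json.loads(result.stdout) if result.returncode == 0 else None
    except (OSError, subprocess.TimeoutExpired, json.JSONDecodeError):
        return None


if __name__ == "__main__":
    info = read_conda_info()
    print(summarize_conda_info(info) if info else "conda not available")
```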

### Comparing Conda Environments
Often we need to compare the packages between two environments. Here's the workflow to do that:

1. Activate one of the environments using `conda activate <myenv>`

2. Export its package list as a `.yml` file to a destination of our choice using `conda env export -f <path/to/envfile.yml>`

3. Activate the second environment

4. Execute `conda compare <path/to/envfile.yml>`, providing the path to the `.yml` file created in step 2
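
When both environments have been exported, the two files can also be compared directly. The sketch below is a hand-rolled approximation of that idea, not how `conda compare` works internally. It assumes the usual export layout (a top-level `dependencies:` key followed by `- name=version=build` items, with an optional nested `pip:` list) and uses a simple line-based parser instead of a full YAML parser. The function names are made up for this post.

```python
import sys
from pathlib import Path


def parse_dependencies(text):
    """Extract {package: version} from the `dependencies:` section of an export."""
    packages = {}
    in_dependencies = False
    for raw_line in text.splitlines():
        if not raw_line.strip():
            continue
        if raw_line[0] not in " -":
            # A new top-level key such as `name:` or `prefix:`
            in_dependencies = raw_line.startswith("dependencies:")
            continue
        if not in_dependencies:
            continue
        item = raw_line.strip().lstrip("-").strip()
        if not item or item.endswith(":"):  # e.g. the nested `pip:` header
            continue
        # Conda items use `name=version=build`; pip items use `name==version`.
        name, _, rest = item.replace("==", "=").partition("=")
        packages[name.lower()] = rest.split("=")[0] if rest else None
    return packages


def diff_environments(first, second):
    """Return (only in first, only in second, {name: (first version, second version)})."""
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    changed = {
        name: (first[name], second[name])
        for name in sorted(set(first) & set(second))
        if first[name] != second[name]
    }
    return only_first, only_second, changed


if __name__ == "__main__":
    if len(sys.argv) == 3:
        envs = [parse_dependencies(Path(p).read_text(encoding="utf-8")) for p in sys.argv[1:]]
        print(diff_environments(*envs))
    else:
        print("usage: python compare_envs.py first.yml second.yml")
```

Unlike the `conda compare` workflow, this does not require activating either environment, but it only looks at package names and version numbers.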

# Installing Quarto for Blogging with Jupyter Notebooks (Optional)

This blog, as mentioned in its description, uses [Quarto](https://quarto.org/) to create its static pages from Jupyter Notebooks. We can install Quarto from [Conda Forge](https://anaconda.org/conda-forge/quarto) into our environment, but the support for the Conda distribution of Quarto seems to be lacking at the moment. Instead, we can install the Quarto CLI using the provided [installer](https://github.com/quarto-dev/quarto-cli/releases/download/v1.3.450/quarto-1.3.450-win.msi). 

Since Quarto freezes the output of the code blocks in Jupyter Notebooks, we can run our notebooks locally in the `base` Conda environment, which has all the necessary scientific packages installed, then use the globally installed Quarto CLI to preview or publish our blog.