# An (opinionated) survey of tools and best practices for packaging Python code

# An example to guide our discussion

* We want to calculate the [effective hard-sphere reduced density](https://en.wikipedia.org/wiki/Noro%E2%80%93Frenkel_law_of_corresponding_states#cite_note-2), $\rho^*_{eff}(T)$, of a fluid of particles that interact through the classical [Mie](https://en.wikipedia.org/wiki/Mie_potential) potential:

$$
V(r) = C \, \varepsilon \left[ \left(\frac{\sigma}{r} \right)^{n}-  \left( \frac{\sigma}{r}\right)^m \right]
$$

for different values of $m$ and $n$.  For this we need to calculate the effective diameter $\sigma_{eff}(T)$.  Then

$$
\rho^*_{eff}(T) = \rho \sigma_{eff}(T)^3
$$

* Luckily, there's already a package, [analphipy](https://github.com/usnistgov/analphipy), which can calculate $\sigma_{eff}(T)$,
  so all we have to do is the simple calculation above.
* The potential parameters are in a CSV file `params.csv` of the form:

```csv
m,n,temp,dens
12,6,1.0,0.1
12,6,1.0,0.2
12,6,1.0,0.3
...
```

* One option is to write all the code we need in a jupyter notebook, and run it there.
* Another is to write a command line script/application to perform the calculation.
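Either way, the calculation itself is small. The following is a minimal sketch of it, not the actual notebook or script. It assumes the `dens` column holds $\rho$, and it takes the effective diameter as a callable with a hypothetical signature `sigma_eff(m, n, temp)`. A placeholder that always returns `1.0` stands in for the real `analphipy` calculation, whose API is not shown here.

```python
import csv
import io
from typing import Callable, Iterator

# Hypothetical signature: sigma_eff(m, n, temp) -> effective diameter.
SigmaEff = Callable[[float, float, float], float]


def effective_densities(csv_text: str, sigma_eff: SigmaEff) -> Iterator[dict]:
    """Yield one result per row of a params.csv-style table."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        m, n = float(row["m"]), float(row["n"])
        temp, dens = float(row["temp"]), float(row["dens"])
        sigma = sigma_eff(m, n, temp)
        # rho*_eff(T) = rho * sigma_eff(T)**3
        yield {"m": m, "n": n, "temp": temp, "dens": dens,
               "rho_eff": dens * sigma**3}


def placeholder_sigma_eff(m: float, n: float, temp: float) -> float:
    # Placeholder only: the real value would come from analphipy.
    return 1.0


sample = "m,n,temp,dens\n12,6,1.0,0.1\n12,6,1.0,0.2\n12,6,1.0,0.3\n"
results = list(effective_densities(sample, placeholder_sigma_eff))
for result in results:
    print(result)
```

Passing the diameter calculation in as a function keeps the CSV handling independent of which library supplies $\sigma_{eff}(T)$.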

# Installing dependencies

* We'll need to install `analphipy` and any additional packages needed.
* You could install all of this in one mega-environment, but this is a bad idea for a number of reasons:
  - Python packages have many interdependencies.
  - For example, most of the scientific stack is built on `numpy`.
  - Recently, `numpy` updated to version `2.0`, with some breaking changes.  Some newer packages only work with `numpy>=2.0`, while others only work with `numpy<2.0`.
  - So if you install all your packages in one environment, you'll eventually run into conflicts.
* Instead, it's better to create an isolated environment per project or group of related tasks.
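To see why the `numpy` split breaks a shared environment, here is a toy sketch of what a solver has to do. The two sets of constraints belong to hypothetical packages, and comparing versions as integer tuples is a simplification of real version ordering, so treat this as an illustration rather than how `conda` or `pip` actually resolve dependencies.

```python
import operator

# Only the two comparison operators needed for this illustration.
OPS = {">=": operator.ge, "<": operator.lt}


def parse(version: str) -> tuple:
    """Turn '2.0.1' into (2, 0, 1). Simplified: no pre-releases, etc."""
    return tuple(int(part) for part in version.split("."))


def allowed(candidates, constraints):
    """Return the candidate versions that satisfy every constraint."""
    return [
        version
        for version in candidates
        if all(OPS[op](parse(version), parse(bound)) for op, bound in constraints)
    ]


candidates = ["1.26.4", "2.0.0", "2.1.0"]
needs_numpy2 = [(">=", "2.0")]  # hypothetical newer package
needs_numpy1 = [("<", "2.0")]   # hypothetical older package

alone_new = allowed(candidates, needs_numpy2)
alone_old = allowed(candidates, needs_numpy1)
together = allowed(candidates, needs_numpy2 + needs_numpy1)
print(alone_new, alone_old, together)
```

Each package is installable on its own, but no single `numpy` version satisfies both, which is the situation a mega-environment eventually runs into.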

In [1]:
# Example of an environment that doesn't work.
# This is a made-up example, but you can run into this with mega-environments.
!conda create -p ./example_conda_env "python=3.12" "numpy>2.0" tensorflow

Channels:
 - conda-forge
Platform: osx-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides _numpy_rc needed by numpy-2.1.0rc1-py310he367959_0

Could not solve for environment specs
The following packages are incompatible
├─ numpy >2.0 is installable with the potential options
│  ├─ numpy [2.0.1|2.0.2] would require
│  │  └─ python_abi 3.9.* *_cp39, which can be installed;
│  ├─ numpy [2.0.1|2.0.2|...|2.2.3] would require
│  │  └─ python_abi 3.10.* *_cp310, which can be installed;
│  ├─ numpy [2.0.1|2.0.2|...|2.2.3] would require
│  │  └─ python_abi 3.11.* *_cp311, which can be installed;
│  ├─ numpy [2.0.1|2.0.2|...|2.2.3] would require
│  │  └─ python >=3.12,<3.13.0a0 with the potential options
│  │     ├─ python 3.12.0 would require
│  │     │  └─ python_abi 3.12.* *_cp312, whi

# Using conda

* First, don't use the [Anaconda](https://www.anaconda.com/download) distribution.  NIST isn't paying for it anymore, and it is bloated with packages you probably don't need.
* Instead, use [Miniforge](https://github.com/conda-forge/miniforge) or [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html).
* On that note, stop using `conda` and switch to `mamba`.  It's a drop-in replacement that takes the same commands, but it's faster.
* If using conda, you can (and should) use an `environment.yaml` file.

In [2]:
!cat ../../requirements/py312-example-env.yaml

name: python-packaging-tools-example-env
channels:
  - conda-forge
dependencies:
  - python=3.12
  - analphipy
  - ipykernel
  - pandas
  - pip


In [17]:
!mamba env create -y -f ../../requirements/py312-conda-env.yaml

conda-forge/noarch                                          Using cache
conda-forge/osx-64                                          Using cache
error    libmamba Could not create directory '/usr/local/pkgs': Permission denied

Transaction

  Prefix: /Users/wpk/.conda/envs/python-packaging-tools-conda-env

  Updating specs:

   - python=3.12
   - analphipy
   - ipykernel
   - pandas


  Package                       Version  Build                 Channel           Size
───────────────────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────────────────

  + analphipy                     0.4.1  pyhd8ed1ab_0          conda-forge       27kB
  + appnope                       0.1.4  pyhd8ed1ab_1          conda-forge       10kB
  + asttokens                     3.0.0  pyhd8ed1ab_1

# Better yet, use a locked environment

* It's a good idea to lock your dependencies if you can.
* You'll know exactly what you're using.
* You can regenerate your environment down the road.
* You can share your environment.
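As a rough illustration of "knowing exactly what you're using", the sketch below records the exact version of every distribution installed in the current environment, using only the standard library. This is not a real lock file: it has no hashes, no platform information, and no conda packages that lack Python metadata. The function name `pinned_requirements` is made up for this example.

```python
from importlib import metadata


def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for the current environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


pins = pinned_requirements()
print("\n".join(pins))
```

A proper lock tool goes further by resolving the full dependency graph and pinning it per platform.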

# What about Docker?

* Docker is awesome, but you still need to specify what you want to put into it.  You can still end up in dependency hell.

# If using conda, check out [conda-lock](https://github.com/conda/conda-lock)

* Lock an `environment.yaml` file for different platforms.

```
conda-lock lock -f {environment file}  --lockfile {output-lock-file}
```

* Create an environment from the lockfile using either `conda-lock` or `micromamba`.
* There's also [pixi](https://github.com/prefix-dev/pixi), which builds lock files into its workflow.

# An aside
* I recommend using a single environment to run jupyterlab/notebook.
* To make another environment available as a kernel, you just include `ipykernel` in it.
* If the new environment is a conda environment, I recommend using [nb_conda_kernels](https://github.com/anaconda/nb_conda_kernels)
  - this will show you all the kernels installed via conda/mamba
* Otherwise, use something like the following from the new environment:
```bash
python -m ipykernel install --user --name {kernel-name} --display-name {display-name}
```

# Do the calculation in jupyter
[calculation](./calculation.ipynb)

# An alternative way to create environments
* There's a different type of Python environment: virtual environments.
* These use an already installed version of Python, and use symlinks.
* The downside is that they can break if the Python installation they link to changes.
* The upside is that they are very small and very fast to create.
* Previously, you'd need to use something like `pyenv`, `brew`, etc. to manage your Python version.
* Now, there is a one-stop shop, [uv](https://docs.astral.sh/uv/), to handle virtual environments (and much more).
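To make the "already installed Python plus symlinks" point concrete, here is a small sketch that uses the standard library `venv` module, not `uv`, to create a virtual environment in a temporary directory and read back its `pyvenv.cfg`. The helper name `make_venv` is made up for this example, and `symlinks=True` assumes a platform where symlinks are available (typically Linux or macOS).

```python
import tempfile
import venv
from pathlib import Path


def make_venv(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    # symlinks=True links to the existing interpreter instead of copying it.
    # with_pip=False keeps the sketch fast and avoids any network access.
    venv.create(env_dir, symlinks=True, with_pip=False)
    return env_dir / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = make_venv(Path(tmp) / ".venv")
    cfg_text = cfg_path.read_text()

# The "home" entry records which Python installation the environment uses.
print(cfg_text)
```

The `home` entry in `pyvenv.cfg` is why such an environment can break: if that Python installation moves or is upgraded in place, the links no longer resolve as expected.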