# An (opinionated) survey of tools and best practices for packaging python code

# TL;DR

* If possible, use a lock file
* Prefer Python virtual environments (and `uv` in particular) over conda
* If using `conda`, try to use `miniforge` (with `mamba`) or `micromamba` (also check out `pixi`)
* Use `ruff` for linting and formatting code
* Use `pre-commit` to control all linters
* Check out `mypy`/`pyright`
* It's pretty simple to make code installable with `pip`/`uv`/`conda`


# A simple example to play with

* We want to calculate the second virial coefficient of square-well fluids for a table of parameters.

$$
B_2(T) = \frac{2 \pi \sigma^3}{3} \left[ 1 + \left( 1 - e^{-\epsilon / kT} \right) \left( \lambda^3 - 1 \right) \right]
$$

* The potential parameters are in a csv file `params.csv` of the form:

```csv
sig,eps,lam,temp
1.0,-1.0,0.1,0.5
1.0,-1.0,0.1,1.0
...
```

* We'll use pandas to read the csv file (not strictly needed, but we want a simple dependency for the example)
* Calculation options
    1. Use a jupyter notebook
    2. Use a script
    3. Make an application

# Using a notebook with an environment
Click [here](./calculation.ipynb) to open the notebook.

# Installing dependencies

* How did we get the environment to run the notebook?
* We need to install `pandas`, etc.
* You could install all of this in a mega-environment, but this is bad for a number of reasons
  - Python packages have many interdependencies.
  - For example, most of the scientific stack is based off `numpy`.
  - Recently, `numpy` updated to version `2.0`, with some breaking changes.  Some newer packages require `numpy>=2.0`, while others only work with `numpy<2.0`.
  - So if you install all your packages in one environment, you'll run into problems.
* Instead, it's better to create an isolated environment per project or related tasks.
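The `numpy` conflict described above can be sketched with a toy example. This is not how a real solver works; `new_pkg` and `old_pkg` are hypothetical packages, and the version list is just a handful of `numpy` releases picked for illustration.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '2.0.2' into (2, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# A few numpy releases the toy "solver" can choose from.
AVAILABLE = ["1.26.4", "2.0.2", "2.2.3"]

# Hypothetical packages and the numpy range each one accepts.
REQUIREMENTS = {
    "new_pkg": lambda v: v >= (2, 0),  # numpy>=2.0
    "old_pkg": lambda v: v < (2, 0),  # numpy<2.0
}


def compatible(packages: list[str]) -> list[str]:
    """Return the numpy versions that satisfy every requested package."""
    return [
        version
        for version in AVAILABLE
        if all(REQUIREMENTS[name](parse(version)) for name in packages)
    ]


print(compatible(["new_pkg"]))
print(compatible(["old_pkg"]))
print(compatible(["new_pkg", "old_pkg"]))  # empty: no single numpy works
```

Each package is fine in its own environment; only the combined environment has no solution.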

In [1]:
# Example of an environment that doesn't work.
# This is a made-up example, but you can run into this with mega-environments.
!conda create -p ./example_conda_env "python=3.12" "numpy>2.0" tensorflow

Channels:
 - conda-forge
Platform: osx-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides _numpy_rc needed by numpy-2.1.0rc1-py310he367959_0

Could not solve for environment specs
The following packages are incompatible
├─ numpy >2.0 is installable with the potential options
│  ├─ numpy [2.0.1|2.0.2] would require
│  │  └─ python_abi 3.9.* *_cp39, which can be installed;
│  ├─ numpy [2.0.1|2.0.2|...|2.2.3] would require
│  │  └─ python_abi 3.10.* *_cp310, which can be installed;
│  ├─ numpy [2.0.1|2.0.2|...|2.2.3] would require
│  │  └─ python_abi 3.11.* *_cp311, which can be installed;
│  ├─ numpy [2.0.1|2.0.2|...|2.2.3] would require
│  │  └─ python >=3.12,<3.13.0a0 with the potential options
│  │     ├─ python 3.12.0 would require
│  │     │  └─ python_abi 3.12.* *_cp312, whi

# Using conda

* First, don't use the [Anaconda](https://www.anaconda.com/download) distribution.  NIST isn't paying for it anymore, and it is bloated with packages you probably don't need.
* Instead, use [Miniforge](https://github.com/conda-forge/miniforge) or [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html).
* On that note, stop using `conda` and switch to `mamba`.  It has the same command-line interface, but is faster.
* If using conda, you can (and should) declare your dependencies in an `environment.yaml` file.

In [11]:
!cat example-conda-env/py312-example-env.yaml

name: python-packaging-tools-example-dev
channels:
  - conda-forge
dependencies:
  - python=3.12
  - ipykernel
  - pandas
  - pip


In [12]:
!mamba env create -y -f example-conda-env/py312-example-env.yaml


# Better yet, use a locked environment

* By locking, we mean pinning every dependency to a particular version.
* You'll know exactly what you're using
* Makes regenerating the environment simpler down the road
* Makes sharing/distributing the environment cleaner.
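To make "pinning" concrete, here is a minimal sketch that lists every package installed in the current environment as a `name==version` pin, using only the standard library. The function name `freeze` is made up for this illustration. Real lock files (from `conda-lock`, `uv lock`, etc.) record more than this, such as hashes and platform-specific builds, so treat it as an illustration of the idea rather than a replacement for those tools.

```python
from importlib.metadata import distributions


def freeze() -> list[str]:
    """Return sorted 'name==version' pins for the current environment."""
    pins = {
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }
    return sorted(pins, key=str.lower)


for pin in freeze():
    print(pin)
```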

# What about docker?

* Docker is awesome, but you still need to specify what goes into the image, so you can still end up in dependency hell.

# If using conda, check out [conda-lock](https://github.com/conda/conda-lock)

* Lock an `environment.yaml` file for different platforms.

```bash
conda-lock lock -f {environment file} --lockfile {output-lock-file}
```

* Create an environment from the lock file using either `conda-lock` or `micromamba`.
* There's also [pixi](https://github.com/prefix-dev/pixi), which manages conda environments with a built-in lock file.

# An aside
* I recommend using a single environment to run jupyterlab/notebook.
* To make another environment available as a kernel, just include `ipykernel` in it.
* If the new environment is a conda environment, I recommend installing [nb_conda_kernels](https://github.com/anaconda/nb_conda_kernels) in the jupyter environment
  - This will show you all the kernels installed via conda/mamba.
* Otherwise, run something like the following from the new environment:
```bash
python -m ipykernel install --user --name {kernel-name} --display-name {display-name}
```

# An alternative way to create environments
* There's a different type of Python environment: virtual environments.
* These reuse an already installed version of Python via symlinks.
* The downside is that they can break if the Python installation they link against changes.
* The upside is that they are very small and very fast to create.
* Previously, you'd need something like `pyenv`, `brew`, etc. to manage your Python versions.
* Now there is a one-stop shop, [uv](https://docs.astral.sh/uv/), that handles Python versions, virtual environments, and much more.
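To see how a virtual environment depends on an existing Python install, here is a minimal sketch using the standard library `venv` module (not `uv`, which does the equivalent much faster). The function name `describe_new_env` is made up for this illustration. It creates a throwaway environment and returns its `pyvenv.cfg`, whose `home` entry records the base installation.

```python
import os
import tempfile
import venv
from pathlib import Path


def describe_new_env() -> str:
    """Create a throwaway virtual environment and return its pyvenv.cfg."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / ".venv"
        # Symlink to the base interpreter where the platform supports it,
        # and skip installing pip so creation stays fast.
        venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=False)
        return (env_dir / "pyvenv.cfg").read_text()


# The "home" entry points at the Python install the environment relies on.
print(describe_new_env())
```

If the installation listed under `home` is removed or upgraded in place, the environment can stop working, which is the downside mentioned above.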

# Better still, use `uv lock`

* `uv` is set up to be a package manager for Python package development
* Create a package repo (`uv init ...`)
* Add dependencies (`uv add ...`)
* Lock and install dependencies (`uv lock ...` and `uv sync ...`)
* Run commands (`uv run ...`)

# Linters/code formatters
* Linters and code formatters are great!
* Don't worry about how your code looks (tabs vs spaces, multiline formatting, etc.).  The formatter takes care of this for you.
* The de facto standards for formatting and linting *were*:
  - [Black](https://github.com/psf/black)
  - [flake8](https://flake8.pycqa.org/en/latest/)
* But the new standard is [ruff](https://docs.astral.sh/ruff/)
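To give a feel for what a linter does, here is a toy check written with the standard library `ast` module. It flags imports that are never used, similar in spirit to ruff's `F401` (unused import) rule. This is only a sketch and not how ruff is implemented; the function name `unused_imports` and the sample source are made up for this illustration.

```python
import ast

# Hypothetical module source to check: `os` is imported but never used.
SOURCE = """
import os
import sys

def main():
    print(sys.argv)
"""


def unused_imports(source: str) -> list[str]:
    """Return names that are imported but never referenced."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    # Every bare name that appears anywhere in the code.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


print(unused_imports(SOURCE))
```

The point is that linters work on the structure of the code without running it, which is why they are fast enough to run on every save or commit.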

# There are linters/formatters for just about everything
* Make them a part of your workflow using [pre-commit](https://pre-commit.com/)
* This has the advantage of pinning the version of each tool, so everyone runs the same checks.

# Using templates
* When starting a new project (not just Python, but also LaTeX, data projects, etc.), consider using a template tool:
  - [cookiecutter](https://cookiecutter.readthedocs.io/en/stable/)
  - [cruft](https://github.com/cruft/cruft)
  - [copier](https://copier.readthedocs.io/en/stable/)
* I use my own template, [cookiecutter-nist-python](https://github.com/usnistgov/cookiecutter-nist-python), with cruft:
```bash
cruft create --checkout develop https://github.com/usnistgov/cookiecutter-nist-python.git
```

# Test your code

* Use [pytest](https://docs.pytest.org/en/stable/)
* Check out [nbval](https://nbval.readthedocs.io/en/latest/) for regression testing with jupyter notebooks.
* Use [nox](https://nox.thea.codes/en/stable/) or [tox](https://tox.wiki/en/4.24.1/) to test across Python versions.
* I personally use nox because its config file is a Python script, which makes it easy to extend to do new/interesting things.
* `tox` is also very powerful.  It is set up through declarative config files.
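As a minimal sketch of what a pytest test file could look like for the square-well example: pytest collects functions named `test_*` and reports any failing `assert`. The function `second_virial_sw` here is a stand-in written for this illustration (with `k = 1` and an assumed `sig**3` prefactor); in a real project you would import it from your package. The expected values come from the example output table in this document.

```python
import math


def second_virial_sw(sig: float, eps: float, lam: float, temp: float) -> float:
    """Stand-in for the function under test (k = 1, sig**3 is an assumption)."""
    prefactor = 2.0 * math.pi / 3.0 * sig**3
    return prefactor * (1.0 + (1.0 - math.exp(-eps / temp)) * (lam**3 - 1.0))


def test_zero_width_well_is_temperature_independent():
    # With lam = 1 the well term vanishes, leaving only the prefactor.
    for temp in (0.5, 1.0, 2.0):
        value = second_virial_sw(1.0, -1.0, 1.0, temp)
        assert math.isclose(value, 2.0 * math.pi / 3.0)


def test_matches_example_table():
    # First row of the example output (sig=1, eps=-1, lam=0.1, temp=0.5).
    value = second_virial_sw(1.0, -1.0, 0.1, 0.5)
    assert math.isclose(value, 15.462222, abs_tol=1e-5)
```

Saved as, say, `test_virial.py` (a hypothetical file name), this runs with `pytest test_virial.py`.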


# Managing tooling
* There are a host of python tools we've discussed already (formatters, pre-commit, cookiecutter/cruft/copier)
* How do you interact with these?
* I recommend using `uv tool install` and/or `uv tool run` (or its alias, `uvx`); see [here](https://docs.astral.sh/uv/concepts/tools/)

In [13]:
!which cowsay

cowsay not found


In [14]:
!uvx cowsay -t "Hello from uvx"

  ______________
| Hello from uvx |
              \
               \
                 ^__^
                 (oo)\_______
                 (__)\       )\/\
                     ||----w |
                     ||     ||


In [16]:
!uvx --from="git+https://github.com/wpk-nist-gov/python-packaging-tools.git@develop"  sw-second-virial ../examples/data/params.csv

100%|█████████████████████████████████████████| 12/12 [00:00<00:00, 8119.32it/s]
    sig  eps  lam  temp   dens_eff
0   1.0 -1.0  0.1   0.5  15.462222
1   1.0 -1.0  0.1   1.0   5.689557
2   1.0 -1.0  0.1   2.0   3.451715
3   1.0 -1.0  0.5   0.5  13.802952
4   1.0 -1.0  0.5   1.0   5.243311
5   1.0 -1.0  0.5   2.0   3.283239
6   1.0 -1.0  1.0   0.5   2.094395
7   1.0 -1.0  1.0   1.0   2.094395
8   1.0 -1.0  1.0   2.0   2.094395
9   1.0 -1.0  2.0   0.5 -91.574060
10  1.0 -1.0  2.0   1.0 -23.096932
11  1.0 -1.0  2.0   2.0  -7.416355


# Running scripts with `uv`
* If you have one-off Python scripts, you don't need to set up an environment yourself!
* Simply use `uv run {script}` and let `uv` handle the environment for you (see [here](https://docs.astral.sh/uv/guides/scripts/))

# Use a good editor

* If you don't have strong opinions, it's probably best to use VS Code.
* Regardless, try to use a language server like pyright.