# isolated-environment

[![Linting](https://github.com/zackees/isolated-environment/actions/workflows/lint.yml/badge.svg)](https://github.com/zackees/isolated-environment/actions/workflows/lint.yml)
[![MacOS_Tests](https://github.com/zackees/isolated-environment/actions/workflows/push_macos.yml/badge.svg)](https://github.com/zackees/isolated-environment/actions/workflows/push_macos.yml)
[![Ubuntu_Tests](https://github.com/zackees/isolated-environment/actions/workflows/push_ubuntu.yml/badge.svg)](https://github.com/zackees/isolated-environment/actions/workflows/push_ubuntu.yml)
[![Win_Tests](https://github.com/zackees/isolated-environment/actions/workflows/push_win.yml/badge.svg)](https://github.com/zackees/isolated-environment/actions/workflows/push_win.yml)

![image](https://github.com/zackees/isolated-environment/assets/6856673/8dab37f1-0c6e-42ec-9680-2013287baa98)

# Summary

Got pinned dependencies in your Python package that make it hard to install? Use `isolated-environment` to package them up in a runtime `venv` that only your package has access to.

This is a package isolation library originally designed for AI developers, to solve the dependency conflicts introduced by the various `pytorch`/`tensorflow`/etc. incompatibilities within and between AI apps.

*Install*
```bash
pip install isolated-environment
```

It moves the installation of your chosen dependencies from **install time** to **runtime**. The benefit is that you can query the system
and decide what needs to be installed. For example, `pip` can't conditionally install packages based on whether `nvidia-smi` has
been installed (indicating `cuda` acceleration), but with `isolated-environment` this is straightforward.
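
A runtime probe can be as simple as checking the `PATH`. For instance (a minimal sketch; `has_nvidia_smi` is a hypothetical helper along the lines used later in this README, not part of this package's API):

```python
import shutil


def has_nvidia_smi() -> bool:
    """True if nvidia-smi is on the PATH, suggesting CUDA is available."""
    return shutil.which("nvidia-smi") is not None
```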

It also works for any other complex dependency chain. I made this library because `conda` has significant problems (on Windows it messes up the system
with its own version of git-bash) and standard `pip` doesn't support an implicit `--extra-index-url`, so pretty much every AI app
ends up with a non-standard install process. This really sucks. This library
fixes all of that, so complex AI apps can be installed with plain old `pip`.

Instead of keeping your complex, version-conflicting dependencies in your `requirements.txt` file, you move them to runtime.

This also allows your dependency chain to be installed lazily. For example, your front-end app might have multiple backends (like `transcribe-anything`),
and which one to install depends on whether `cuda` is present on the system. With this library you can query the runtime and decide what you want to
install.

For example, if the computer supports `cuda` you may want to install `pytorch` with cuda support, a multi-gigabyte download. But
if the app is running on a CPU-only machine, you may opt for the much smaller CPU-only `pytorch`.
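
Concretely, the two installs differ only in the pinned build and the extra index URL (version numbers illustrative; check pytorch.org for current values):

```bash
# CUDA build, a multi-gigabyte download:
pip install "torch==2.1.2+cu121" --extra-index-url https://download.pytorch.org/whl/cu121
# CPU-only build, much smaller:
pip install torch==2.1.2
```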

In plain words, this package allows you to install your AI apps globally without having to worry about `pytorch`
dependency conflicts.

# Example

*Runtime*
```python
# Example of running "whisper --help" in an isolated-environment
from pathlib import Path
import subprocess

from isolated_environment import isolated_environment_run

cp: subprocess.CompletedProcess = isolated_environment_run(
    ...  # call arguments elided in this excerpt
)
print(cp.stdout)
```
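
If you need more control than the one-shot `isolated_environment_run(...)` call, the lower-level `isolated_environment(venv_dir, deps)` call shown in the Background section below returns an environment `dict` that you can pass as `env=` to your own `subprocess.run(...)`.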

# Why not just use `venv` directly?

You can! But this package is a better abstraction, and it handles the platform-specific footguns that `venv` makes you work around to behave correctly on all platforms.
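
As one concrete footgun (a minimal sketch, not this package's internals): even locating a venv's executables differs by platform:

```python
import sys
from pathlib import Path


def venv_bin_dir(venv: Path) -> Path:
    """Venv executables live in Scripts/ on Windows but bin/ elsewhere."""
    return venv / ("Scripts" if sys.platform == "win32" else "bin")
```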


# Background

After making my first major AI project, `transcribe-anything`, I quickly learned that `pytorch` ships many different builds of
its library, and globally installing the right one is an absolute nightmare, especially on Windows. The major problem is that out
of the box on Windows, `pytorch` does not support `cuda` acceleration; you have to use `pip` with an `--extra-index-url` parameter. If this isn't
done right the first time, you get a CPU-only version of `pytorch` that is tricky to remove from the `site-packages` directory, requiring
you to `pip uninstall` every package that uses `pytorch` and then purge the `pip` cache.

This is a real-world example of how I was able to purge the CPU-only `pytorch` from Windows, which took me a lot of trial and error to figure out.

*Without this library, you would have to do something like this to purge cpu-pytorch from the global `site-packages`:*

```python
import subprocess

uninstall = [
    "torch",
    "torchtext",
    "torchdata",
    "torchaudio",
    "torchvision",
    "torch-directml",
]
for package in uninstall:
    subprocess.run(["pip", "uninstall", "-y", package], check=True)
subprocess.run(["pip", "cache", "purge"], check=True)
```

...yuck

This means that if I install one tool and force the correct dependencies in, another tool relying on those dependencies will **BREAK**.

With `isolated-environment`, that decision moves to runtime instead. For example, here are options for installing a different version of `pytorch` depending on the runtime environment:

```python
from typing import Any

from isolated_environment import isolated_environment

# HERE (a pathlib.Path), TENSOR_VERSION, CUDA_VERSION, EXTRA_INDEX_URL and
# has_nvidia_smi() are assumed to be defined elsewhere in the module.


# This generates an environment that should be passed to subprocess.run(...)
def get_environment() -> dict[str, Any]:
    """Returns the environment suitable for subprocess.run(..., env=env, ...)."""
    venv_dir = HERE / "venv" / "whisper"
    deps = [
        "openai-whisper",
    ]
    if has_nvidia_smi():
        # This computer has nvidia cuda installed, so install cuda torch.
        deps.append(
            f"torch=={TENSOR_VERSION}+{CUDA_VERSION} --extra-index-url {EXTRA_INDEX_URL}"
        )
    else:
        # Install the CPU-only version.
        deps.append(f"torch=={TENSOR_VERSION}")
    env = isolated_environment(venv_dir, deps)
    return env
```

# Isn't this just yet another package manager?

If this is a package manager, then so are bash and cmd.exe. Let's get real here. Also, if this library had been part of the standard library, we might
never have needed `conda` or `pipx` or any of the other alt package managers that fill in the gaps of `pip`.

## `isolated-environment` vs `pipx`

`pipx` seems like a great solution but has major downsides. One downside is that `pipx` is pretty global: it wants to install a tool
into a global directory and link it into your local bin, which requires a restart or manually adding the path. Also, if you depend on
two different versions of a tool, there are going to be conflicts. Additionally, a tool in the `pipx` directory becomes independent
of the package that installed it and requires its own uninstall step, which must be performed manually. One last issue with `pipx`
is that creating a virtual environment requires at least one package before other packages can be injected into it; working around this
requires creating a dummy package just to get the initial virtual environment constructed. This is a big issue with
`whisper`, for example, which requires that cuda-pytorch be installed first in order to skip the cpu-pytorch it installs by default.
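
A sketch of that workaround (package names illustrative; `pipx inject` and `--pip-args` are real `pipx` commands and flags, but check your `pipx` version):

```bash
# pipx can't create an empty venv, so you seed it with a first package...
pipx install my-dummy-package
# ...then inject the real dependencies, pinning the CUDA build before
# anything can pull in the CPU-only torch (version/index illustrative).
pipx inject my-dummy-package "torch==2.1.2+cu121" \
    --pip-args="--extra-index-url https://download.pytorch.org/whl/cu121"
pipx inject my-dummy-package openai-whisper
```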

So, given all of these limitations of `pipx`, I created the `isolated-environment` library, which solves all of these problems. Specifically:

1. The virtual environment name and path can be specified by your code, and it starts out empty, as God intended.
2. The virtual environment can live within your `site-packages` directory, so if you uninstall your package the isolated environment is removed as well.
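
Concretely, point 2 just means anchoring the venv path next to your package's own files (a minimal sketch; the `venv/whisper` layout is merely the convention used in this README):

```python
from pathlib import Path

# Because this directory lives inside your installed package (and therefore
# inside site-packages), `pip uninstall` of your package removes the
# isolated venv along with it.
HERE = Path(__file__).parent
VENV_DIR = HERE / "venv" / "whisper"
```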

This solves the problem for `transcribe-anything`: all of its AI dependencies are now installed at runtime into a private environment accessible
only to its package, and that environment is removed when the tool is uninstalled. This means no conflicts with other libs due to `pytorch` CPU vs GPU installs.

The result was pure bliss. You can now install `transcribe-anything` into your global `python`/`pip` directory without having to be concerned
about global conflicts with `pytorch`. As far as I know, no other AI tool does this.

I hope that `isolated-environment` will help you write great AI software without the dependency conflicts that currently plague the Python ecosystem and that every other AI tool seems to suffer from.

# The downsides

The downside is that it gets a bit trickier to access a tool installed in an `isolated-environment`. For example, installing `transcribe-anything` no longer globally installs
`whisper`, which means that to try out `whisper` directly I have to `cd` into the correct private environment and activate it before invoking the tool.
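
Roughly like this (a sketch; the module name and venv layout are illustrative and depend on the package):

```bash
# Find the package's install directory, then activate its private venv.
cd "$(python -c 'import transcribe_anything, os; print(os.path.dirname(transcribe_anything.__file__))')"
cd venv/whisper
source bin/activate   # on Windows: Scripts\activate
whisper --help
```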

Another downside, one that also exists with `pipx`, is that you can't directly call into Python code within the `isolated-environment`. The only interface available
at this point is command-based APIs (anything that `subprocess.run` can invoke). But this is typical of all code that is isolated in its own environment.

# Development
