[Question] Why is there no pipenv / poetry for installing dependencies? #69

Closed
stratosgear opened this issue Sep 24, 2019 · 13 comments

Comments

@stratosgear

I would assume that one or the other (or even a requirements.txt) would be used for setting up the python dependencies.

I've seen so MANY nice libraries/abstractions already used throughout this cookiecutter that I'm surprised there is no strict way of controlling python package dependency versions.

@dmontagu
Contributor

dmontagu commented Sep 24, 2019

For what it's worth, I adapted this project to use poetry, and would recommend it! It's not really in a state that would be compatible with a pull request any more though 😑

@omrihar

omrihar commented Sep 24, 2019

@dmontagu Could you provide more details about that? I currently use pipenv for the editor and add commands to the dockerfiles to install dependencies as required. In a previous project I used pipenv to install globally in the docker container. Is this similar to what you did?

@dmontagu
Contributor

dmontagu commented Sep 24, 2019

@omrihar some discussion here of how I'm using pyproject.toml and poetry to generate a requirements.txt which I add to the container and use to install dependencies.

Basically, I use poetry export -f requirements.txt to generate a requirements.txt file I can install from for image caching, then add the project and run poetry install --no-dev to ensure it's 100% consistent with my dev environment, apart from the dev dependencies that --no-dev skips (this is substantially faster than just doing the poetry install directly).

(I think I can actually drop the final poetry install step, but I ran into some inconsistencies in the past where it helped.)

@stratosgear
Author

I use poetry locally for development, as per the instructions on the tin, and I have a set of Dockerfile instructions to install the python dependencies in the docker containers.

FROM some_image

ARG env=prod
ARG POETRY_VER=0.12.17

ENV APP_ENV=${env} \
    PYTHONFAULTHANDLER=1 \
    PYTHONUNBUFFERED=1 \
    PYTHONHASHSEED=random \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_VERSION=${POETRY_VER}

RUN pip install "poetry==$POETRY_VER"

WORKDIR /app/

# Copy only requirements to cache them in docker layer
COPY app/poetry.lock app/pyproject.toml /app/

RUN poetry config settings.virtualenvs.create false \
    && poetry install $(test $env = prod && echo "--no-dev") --no-interaction --no-ansi

# do more here

No need to go through a requirements.txt file.

If you would like to avoid installing poetry in the container, you can also generate a 'requirements.txt' file from inside a poetry shell with a:

pip freeze > requirements.txt

and use it with pip.

Latest poetry version (unreleased yet) should provide a way to generate a requirements file too (last I heard)

Bottom line is that I need to have full control of the version numbers being installed, and be sure that my builds are frozen on specific versions (that I have tested) and that accidental upstream changes will not creep into later builds (unless I manually bump up version numbers, after testing them).

Been bitten so many times by somepackage="*" declarations that I now cringe when I see one!
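
A minimal sketch of how one might check a requirements file for the kind of loose declarations described above. The helper name `unpinned_requirements` is hypothetical, and the sketch assumes a file exported without hashes, one requirement per line.

```python
import re

# Matches an exact pin such as "fastapi==0.42.0", optionally with extras
# and an environment marker after ";". Wildcards like "1.17.*" do not match.
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;*]+\s*(;.*)?$"
)


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to one exact version."""
    unpinned = []
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and pip options such as "--find-links=...".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = ["fastapi==0.42.0", "requests>=2.0", "somepackage", "# a comment"]
print(unpinned_requirements(sample))  # ['requests>=2.0', 'somepackage']
```

Run against the committed requirements file in CI, a non-empty result could be treated as a failure, so that an unpinned dependency is caught before a build picks up an untested upstream release.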

@dmontagu
Contributor

@stratosgear See this issue (which I linked above) for a discussion about precisely why I want to use requirements.txt instead of poetry.lock and pyproject.toml -- that approach would bust the docker cache every time you bump the version of your project in pyproject.toml. There is a feature request in that issue to allow installation from only the lockfile, which would mitigate the problem.

Also related, if you don't use a poetry-generated virtual environment (I see you are skipping that step) it is difficult to run poetry install without running the container as root (due to poetry not using pip install --user). If you have a workaround for this I'd be interested to see it. (E.g., if the project is installed as root using poetry then the environment is chowned and/or copied as part of a multi-stage build or similar.)
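
The cache-busting point above can be illustrated with a toy model. This is only a sketch: it assumes that a COPY layer is reused when the copied files are byte-identical, which simplifies Docker's real cache logic, and the function `layer_key` and the file contents are hypothetical.

```python
import hashlib


def layer_key(files, copied):
    """Toy stand-in for the cache key of a COPY step: a hash of the copied files."""
    digest = hashlib.sha256()
    for name in copied:
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(files[name].encode("utf-8") + b"\0")
    return digest.hexdigest()


# Hypothetical project state before and after bumping only the project version.
lock = "# pinned dependency data\n"
requirements = "fastapi==0.42.0\nuvicorn==0.9.0\n"
before_bump = {
    "pyproject.toml": '[tool.poetry]\nname = "app"\nversion = "0.1.0"\n',
    "poetry.lock": lock,
    "requirements.txt": requirements,
}
after_bump = dict(before_bump)
after_bump["pyproject.toml"] = '[tool.poetry]\nname = "app"\nversion = "0.1.1"\n'

# Copying pyproject.toml together with poetry.lock: the bump changes the key,
# so the dependency-install layer that follows would be rebuilt.
poetry_files = ["poetry.lock", "pyproject.toml"]
bump_busts_cache = layer_key(before_bump, poetry_files) != layer_key(after_bump, poetry_files)

# Copying only the exported requirements file: the bump does not touch it,
# so the key is unchanged and the install layer could be reused.
requirements_layer_reused = (
    layer_key(before_bump, ["requirements.txt"]) == layer_key(after_bump, ["requirements.txt"])
)

print(bump_busts_cache, requirements_layer_reused)  # True True
```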

@stratosgear
Author

Wow, that was a long thread. So, if I understood correctly, you are annoyed that every time you make a change to the pyproject.toml file the docker build takes longer to complete because the packages have to be reinstalled?

Yes, I am annoyed too, but never enough to change the above procedure much. It is a tricky task, and I never found easy-to-follow instructions on how to do it. Just convoluted workarounds.

I do use the poetry generated virtual environment on my local development machine (along with some dev dependencies) but the production builds are done with --no-dev.

Regarding using root for poetry install in the containers, I never even thought about it. I am installing everything in this full-stack-fastapi-postgresql project with my poetry instructions from above and it seems to work fine (I think)

It would be sweet if there was an easy straightforward way of addressing all the issues that you mentioned, but the complexity level is too much for my taste. I wish there was a way to get notified when you find the gold recipe for all that.

@dmontagu
Contributor

dmontagu commented Sep 25, 2019

Yes, I think you understand correctly.

The complexity level is too much for my taste too 😄.

I think the tweak to the process to do a pip install prior to poetry has already much-more-than-paid off for me, but maybe I do more builds (with changes to pyproject.toml) than most. (I also have hundreds of megs of dependencies due to scientific stack stuff, so it is painfully slow to re-download all packages.)

I also have some projects with "interesting" build processes (e.g., one that builds cython and cmake extensions); it is extremely painful to iteratively debug the build process if every change triggers a fresh pip install.

Also, I think using poetry install instead of pip install takes ~1-2 minutes longer per (uncached) build since it calls the pip install command for each package one at a time. Calling poetry install at the end, once the environment is (probably) correct only takes a few seconds, so most of the time is saved, but it can still fix things if the pip install didn't get it quite right.

@stratosgear
Author

Help me get this: If you have already installed with pip, what does a poetry install, on a second phase, offer you?

And you haven't even touched the case where, when installing on an alpine base image, pip cannot install from wheels and has to download and compile everything (numpy, healpy, etc.). Been there, done that, by biting the bullet and... waiting :( Multistage builds would surely help, but I never invested in them.

Do you have a current implementation of your install procedure in any public repo? I might be able to learn something from there...

@dmontagu
Contributor

dmontagu commented Sep 25, 2019

  1. In the past, I've run into issues where the generated requirements.txt isn't exactly right, and poetry resolves the dependencies slightly differently than pip does. Including the poetry install ensures poetry has the final say on dependencies. I don't think I have any good examples of this now, but since I set it up before, there's some inertia causing me to keep things set up this way.

  2. More significantly, I only install dependencies using pip; calling poetry install also ensures any cython/pybind11 extensions that are a part of the project get built. My extension-building system is more or less specific to poetry, so in its current state I don't have a way to build it with non-poetry tools. (A refactor wouldn't be hard, but why bother?)

  3. I think this article does a good job explaining why one might want to avoid using alpine linux images. In general, I don't think the tradeoffs are worth it, especially given that I occasionally use custom C/C++ extensions and don't have the entire userbase of numpy to troubleshoot compilation issues 😄.
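
For point 1 above, one way to look for disagreements between an exported requirements file and what actually ended up installed is sketched below. It is not the approach used in this thread (which relies on a final poetry install); all function names are hypothetical, and it assumes plain `name==version` lines and Python 3.8+ for `importlib.metadata`.

```python
import re
from importlib import metadata  # available in Python 3.8+


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines):
    """Map normalized package name -> pinned version for 'name==version' lines."""
    pins = {}
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop environment markers
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        parts = version.split()
        if not parts:
            continue
        name = name.split("[", 1)[0]  # drop extras such as "pkg[extra]"
        pins[normalize(name.strip())] = parts[0]
    return pins


def find_mismatches(pins, installed):
    """Return {name: (pinned, installed_or_None)} for every disagreement."""
    return {
        name: (version, installed.get(name))
        for name, version in pins.items()
        if installed.get(name) != version
    }


def installed_versions():
    """Collect versions of the distributions in the current environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            found[normalize(name)] = dist.version
    return found
```

Inside the container, calling `find_mismatches(parse_pins(open("requirements.txt")), installed_versions())` would list any package whose installed version differs from its pin, or that is missing entirely.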


I don't have any public repos using this implementation, but here is the dockerfile that is used to build a common base image in one of my projects:

FROM python:3.7

RUN pip install --upgrade pip
RUN useradd -m worker \
 && mkdir /app \
 && chown -R worker:worker /app

USER worker

ENV POETRY_VERSION=1.0.0b1 \
    POETRY_VIRTUALENVS_CREATE=false \
    PYTHONFAULTHANDLER=1 \
    PYTHONUNBUFFERED=1 \
    PYTHONHASHSEED=random \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100

# Install Poetry, and set up PATH
# See https://github.com/sdispater/poetry/issues/1301 for pros/cons of this approach
RUN curl -sSL https://raw.githubusercontent.com/sdispater/poetry/master/get-poetry.py | POETRY_PREVIEW=1 python
ENV PYTHONPATH=/app \
    HOME=/home/worker \
    PATH="/home/worker/.local/bin:/home/worker/.poetry/bin:${PATH}"

WORKDIR /app

COPY --chown=worker:worker ./app/requirements/requirements-poetry.txt /app/requirements/
RUN pip install \
    --user \
    -r requirements/requirements-poetry.txt


COPY --chown=worker:worker ./app/requirements /app/requirements
RUN pip install \
    --user \
    --find-links=requirements/wheels \
    -r requirements/requirements.txt

COPY --chown=worker:worker ./app /app

I generate requirements-poetry.txt via:

poetry export -f requirements.txt > requirements_tmp.txt
mv requirements_tmp.txt requirements/requirements-poetry.txt

# Remove "extra" flags not used by pip:
sed -i "" 's/ extra == "[^"]*"//g' requirements/requirements-poetry.txt

The requirements.txt file just contains references to some locally built wheels. I sometimes need to do iterative development against these wheels, so I have them in a separate step to prevent bigger cache busts.
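
The `sed -i ""` spelling above is the BSD/macOS form. A portable sketch of the same substitution in Python follows; the function name `strip_extra_markers` is hypothetical, and the pattern is copied from the sed expression rather than derived from any documented export format.

```python
import re

# Same pattern as the sed expression: a space, then extra == "<anything>".
_EXTRA_MARKER = re.compile(r' extra == "[^"]*"')


def strip_extra_markers(text):
    """Remove every ' extra == "..."' fragment, mirroring the sed command."""
    return _EXTRA_MARKER.sub("", text)


sample = 'requests==2.22.0\npkg==1.0; extra == "x"\n'
print(strip_extra_markers(sample), end="")
```

Like the sed command, this only deletes the matched fragment; anything else on the line, including the `;` separator, is left as it was.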

@dmontagu
Contributor

dmontagu commented Sep 25, 2019

You'll notice the poetry install is conspicuously missing, given my comments above. I haven't found the time yet to get a dockerfile working with each of the following three conditions:

  1. Don't have poetry make a virtualenv
  2. Don't run as root
  3. Run poetry install after installing dependencies

(There is a known issue with poetry where it tries to install globally if no virtualenv is being used; this causes permissions errors if not running as root.)

The project using the dockerfile above doesn't have any extensions, so it works okay without the poetry install.

There are various ways to get the poetry install at the end, e.g. adding a placeholder pyproject.toml at the start of the build so that poetry can build the virtual environment (and it won't be lost when the cache is reset), but right now I think I'd have to drop the requirement that poetry not make a virtualenv.

I also haven't tried using multistage builds for this stuff yet, but that also seems like it could be useful.

@stratosgear
Author

For what it's worth, I am posting my updated version of how I deal with dependencies, in the hope that might help someone (or even better improve upon it)

I use poetry locally for development.

I have a script (I use invoke to automate many such little things) that autogenerates a requirements.txt from my poetry pyproject.toml every time I update it:

poetry export -f requirements.txt -o requirements.txt --dev --without-hashes

The requirements.txt gets committed along with my source code.
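
A minimal sketch of such a regeneration script, using only the standard library's subprocess module in place of invoke. The function names are hypothetical; the poetry flags are the ones from the command quoted above.

```python
import subprocess


def export_command(output="requirements.txt", dev=True):
    """Build the `poetry export` command quoted above as an argument list."""
    cmd = ["poetry", "export", "-f", "requirements.txt", "-o", output]
    if dev:
        cmd.append("--dev")
    cmd.append("--without-hashes")
    return cmd


def export_requirements(output="requirements.txt", dev=True):
    """Run the export; raises CalledProcessError if poetry exits non-zero."""
    subprocess.run(export_command(output, dev), check=True)


print(" ".join(export_command()))
# poetry export -f requirements.txt -o requirements.txt --dev --without-hashes
```

Calling `export_requirements()` from a pre-commit hook or task runner is one way to keep the committed file from drifting away from pyproject.toml.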

Then on my CI pipeline I use a multistage build as such

FROM python:3.7-slim AS python-deps

RUN python -m venv /opt/venv

# Make sure we use the virtualenv:
ENV PATH="/opt/venv/bin:$PATH"

COPY src/requirements.txt .
RUN pip install -r requirements.txt


#############
## Stage #2
##
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7 AS base

# Install Ubuntu packages:
# postgresql-client: Used by worker to COPY data through cli (faster)
RUN apt-get update && apt-get install -y --no-install-recommends \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Copy downloaded python dependency files from python-deps
COPY --from=python-deps /opt/venv /opt/venv

# Make sure we use the virtualenv:
ENV PATH="/opt/venv/bin:$PATH:."

# do more things in your base image if you have to
...
...

I then use that base image to construct further images (with my fastapi server, celery workers, etc.).

Seems to be working fine so far, although it is indeed quite a process to get it right...

Thanks

@tiangolo
Owner

Thanks for the discussion everyone! ☕

The latest version is now based on Poetry, for local development and integrated into the Dockerfiles 🚀 🎉

@github-actions

Assuming the original issue was solved, it will be automatically closed now. But feel free to add more comments or create new issues.

br3ndonland added a commit to whythawk/full-stack-fastapi-postgresql that referenced this issue Jul 19, 2021
tiangolo#69
tiangolo#123
tiangolo#144
tiangolo/full-stack-fastapi-template@00297f9

Commit 00297f9 gitignored poetry.lock. This commit will add poetry.lock
to version control with Git to avoid dependency resolution errors during
Docker builds.

There is no established convention for working with Poetry in Docker,
so developers have to consider each use case individually. See:
python-poetry/poetry#1879 (comment)

In this project, the Dockerfile copies poetry.lock into the Docker image,
but there's no step to generate poetry.lock in the first place. Without
poetry.lock, dependency resolution errors are commonly seen, such as:

```text
❯ bash scripts/test.sh

WARNING: The following deploy sub-keys are not supported and have been ignored: labels
WARNING: The following deploy sub-keys are not supported and have been ignored: labels
WARNING: The following deploy sub-keys are not supported and have been ignored: labels
WARNING: The following deploy sub-keys are not supported and have been ignored: labels
WARNING: The following deploy sub-keys are not supported and have been ignored: labels
db uses an image, skipping
flower uses an image, skipping
pgadmin uses an image, skipping
proxy uses an image, skipping
queue uses an image, skipping
Building backend
[+] Building 15.3s (8/10)
 => [internal] load build definition from backend.dockerfile                                      0.2s
 => => transferring dockerfile: 797B                                                              0.1s
 => [internal] load .dockerignore                                                                 0.1s
 => => transferring context: 2B                                                                   0.0s
 => [internal] load metadata for ghcr.io/br3ndonland/inboard:fastapi-python3.9                    0.3s
 => [1/6] FROM ghcr.io/br3ndonland/inboard:fastapi-python3.9@sha256:5591f436a37490a1569afd9e55ae  0.0s
 => [internal] load build context                                                                 0.0s
 => => transferring context: 64.67kB                                                              0.0s
 => CACHED [2/6] COPY ./app/pyproject.toml ./app/poetry.lock* /app/                               0.0s
 => CACHED [3/6] WORKDIR /app/                                                                    0.0s
 => ERROR [4/6] RUN bash -c "if [ true == 'true' ] ; then poetry install --no-root ; else poetr  14.4s
------
 > [4/6] RUN bash -c "if [ true == 'true' ] ; then poetry install --no-root ; else poetry install --no-root --no-dev ; fi":
 Skipping virtualenv creation, as specified in config file.
 Installing dependencies from lock file
 Warning: The lock file is not up to date with the latest changes in pyproject.toml. You may be getting outdated dependencies. Run update to update them.

   SolverProblemError

   Because app depends on sqlalchemy-stubs (^0.3) which doesn't match any versions, version solving failed.

   at /opt/poetry/lib/poetry/puzzle/solver.py:241 in _solve
       237│             packages = result.packages
       238│         except OverrideNeeded as e:
       239│             return self.solve_in_compatibility_mode(e.overrides, use_latest=use_latest)
       240│         except SolveFailure as e:
     → 241│             raise SolverProblemError(e)
       242│
       243│         results = dict(
       244│             depth_first_search(
       245│                 PackageNode(self._package, packages), aggregate_package_nodes
------
executor failed running [/bin/sh -c bash -c "if [ $INSTALL_DEV == 'true' ] ; then poetry install --no-root ; else poetry install --no-root --no-dev ; fi"]: exit code: 1
ERROR: Service 'backend' failed to build : Build failed
```