anomaly in worker_per_core in kubernetes cluster #1235

@sabuhish

Description

Hello. First of all, I would like to thank @tiangolo for this awesome project, and to sincerely thank its contributors. As a team, we decided to build our microservices on FastAPI. More than 15 applications are ready, and we have now reached the difficult part: the deployment stage. Everything worked perfectly until we deployed the applications on Kubernetes.

We have read all the issues related to this topic that we could find, but we are still stuck with errors. We are hoping that someone with more experience can help us with this case.

For deployment we are using @tiangolo's tiangolo/uvicorn-gunicorn-fastapi:python3.7 image. At first, our Dockerfile looked like this:

FROM  tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY . /app

WORKDIR /app
ENV settings=prod
ENV WORKERS_PER_CORE=2
 
RUN apt-get update -y &&  pip install --upgrade pip &&  \
    pip install -r requirements.txt && \
    apt-get install -y postgresql-client

The problem appears after the pod starts successfully: the workers boot again every 2-3 seconds, and the application cannot handle HTTP requests. The log is attached below. Because the application has not finished starting, Gino raises an error: Gino engine is not initialized. Requests come in while the application is not yet ready, and at that moment the worker boots again, which is very strange. We use Gino as our ORM for queries.

The log, including the Gino error:

[2020-04-10 10:31:48 +0000] [124] [INFO] Waiting for application startup.
{"loglevel": "info", "workers": 8, "bind": "0.0.0.0:80", "workers_per_core": 2.0, "host": "0.0.0.0", "port": "80"}                                                                                                
10.44.6.8
10.44.6.8:56568 - "POST /token/generator/default HTTP/1.1" 500
[2020-04-10 10:31:48 +0000] [127] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi                                                                                                   
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__                                                                                                         
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 140, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 134, in __call__
    await self.error_middleware(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 178, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 156, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/usr/local/lib/python3.7/site-packages/starlette_prometheus/middleware.py", line 47, in dispatch
    raise e from None
  File "/usr/local/lib/python3.7/site-packages/starlette_prometheus/middleware.py", line 43, in dispatch
    response = await call_next(request)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/cors.py", line 76, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/app/app/main.py", line 63, in dispatch
    response = await call_next(request)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/gino/ext/starlette.py", line 72, in __call__
    scope['connection'] = await self.db.acquire(lazy=True)
  File "/usr/local/lib/python3.7/site-packages/gino/api.py", line 520, in acquire
    return self.bind.acquire(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gino/api.py", line 540, in __getattribute__
    raise self._exception
gino.exceptions.UninitializedError: Gino engine is not initialized.
[2020-04-10 10:31:49 +0000] [127] [INFO] Application startup complete.
[2020-04-10 10:31:49 +0000] [125] [INFO] Started server process [125]
[2020-04-10 10:31:49 +0000] [125] [INFO] Waiting for application startup.
[2020-04-10 10:31:50 +0000] [126] [INFO] Application startup complete.
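
To make the ordering in the log above easier to see, here is a minimal sketch that groups the Gunicorn/Uvicorn log lines by worker PID and reports workers that logged an error before their own "Application startup complete." line. The helper names (events_by_pid, errors_before_startup) are hypothetical and only for illustration; the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [2020-04-10 10:31:48 +0000] [127] [ERROR] Exception in ASGI application
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<pid>\d+)\] \[(?P<level>\w+)\] (?P<msg>.*)$"
)


def events_by_pid(lines):
    """Group (timestamp, level, message) tuples by worker PID, in log order."""
    events = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # skip traceback lines, JSON config dumps, access logs
            pid = int(match.group("pid"))
            events[pid].append(
                (match.group("ts"), match.group("level"), match.group("msg"))
            )
    return dict(events)


def errors_before_startup(events):
    """Return sorted PIDs that logged an ERROR before finishing startup."""
    flagged = []
    for pid, pid_events in events.items():
        started = False
        for _ts, level, msg in pid_events:
            if msg == "Application startup complete.":
                started = True
            elif level == "ERROR" and not started:
                flagged.append(pid)
                break
    return sorted(flagged)


sample = [
    "[2020-04-10 10:31:48 +0000] [124] [INFO] Waiting for application startup.",
    "[2020-04-10 10:31:48 +0000] [127] [ERROR] Exception in ASGI application",
    "[2020-04-10 10:31:49 +0000] [127] [INFO] Application startup complete.",
    "[2020-04-10 10:31:49 +0000] [125] [INFO] Started server process [125]",
    "[2020-04-10 10:31:49 +0000] [125] [INFO] Waiting for application startup.",
    "[2020-04-10 10:31:50 +0000] [126] [INFO] Application startup complete.",
]

print(errors_before_startup(events_by_pid(sample)))
```

In this excerpt, worker 127 logs the exception one second before it reports that startup is complete, which matches what we observe: the request is served before the Gino engine has been set up.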

We tried changing the timeout, as suggested in some other issues, so we overrode the Gunicorn config file as well, but it did not help.
We overrode start.sh and gunicorn_conf.py and added WORKERS_PER_CORE.
gunicorn_conf.py:

import json
import multiprocessing
import os

workers_per_core_str = os.getenv("WORKERS_PER_CORE", "1")
web_concurrency_str = os.getenv("WEB_CONCURRENCY", None)
host = os.getenv("HOST", "0.0.0.0")
port = os.getenv("PORT", "80")
bind_env = os.getenv("BIND", None)
use_loglevel = os.getenv("LOG_LEVEL", "info")
if bind_env:
    use_bind = bind_env
else:
    use_bind = f"{host}:{port}"

cores = multiprocessing.cpu_count()
workers_per_core = float(workers_per_core_str)
default_web_concurrency = workers_per_core * cores
if web_concurrency_str:
    web_concurrency = int(web_concurrency_str)
    assert web_concurrency > 0
else:
    web_concurrency = max(int(default_web_concurrency), 2)

# Gunicorn config variables
loglevel = use_loglevel
workers = web_concurrency
bind = use_bind
keepalive = 120
errorlog = "-"
timeout = 300

# For debugging and testing
log_data = {
    "loglevel": loglevel,
    "workers": workers,
    "bind": bind,
    # Additional, non-gunicorn variables
    "workers_per_core": workers_per_core,
    "host": host,
    "port": port,
}
print(json.dumps(log_data))
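
For reference, the worker count computed by the config above reduces to the small function below. This is only a sketch that mirrors that logic; the function name compute_workers and its parameters are hypothetical and not part of the image. As far as I know, multiprocessing.cpu_count() is not aware of container CPU limits, so inside a pod it may report the node's cores rather than the pod's CPU limit. The log above shows 8 workers with workers_per_core 2.0, which means cpu_count() returned 4.

```python
import multiprocessing


def compute_workers(workers_per_core_str="1", cores=None, web_concurrency_str=None):
    """Mirror the worker calculation from the config above.

    cores defaults to multiprocessing.cpu_count(); passing it explicitly
    makes it easy to try different values.
    """
    if cores is None:
        cores = multiprocessing.cpu_count()
    if web_concurrency_str:
        # WEB_CONCURRENCY, when set, overrides WORKERS_PER_CORE entirely.
        web_concurrency = int(web_concurrency_str)
        assert web_concurrency > 0
        return web_concurrency
    # Otherwise: workers per core times cores, with a floor of 2 workers.
    return max(int(float(workers_per_core_str) * cores), 2)


print(compute_workers("2", cores=4))                           # WORKERS_PER_CORE=2 on 4 cores
print(compute_workers("1", cores=4, web_concurrency_str="1"))  # WEB_CONCURRENCY=1
```

With WORKERS_PER_CORE=2 on 4 reported cores this gives 8 workers, and with WEB_CONCURRENCY=1 it gives a single worker regardless of the core count.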

Dockerfile:

FROM  tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY . /app
COPY start.sh /
COPY gunicorn_conf.py /
WORKDIR /app

ENV settings=prod
# ENV WORKERS_PER_CORE=2
ENV WEB_CONCURRENCY=1

RUN apt-get update -y &&  pip install --upgrade pip &&  \
    pip install -r requirements.txt && \
    apt-get install -y postgresql-client

If we comment out WORKERS_PER_CORE=2 and set ENV WEB_CONCURRENCY=1 instead, the app behaves normally.

Any suggestions? Thanks in advance!
[Image: photo_2020-04-10_15-40-45]
