Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long running task with importlib crashes workers #10

Closed
SansDK opened this issue Jun 4, 2019 · 3 comments
Closed

Long running task with importlib crashes workers #10

SansDK opened this issue Jun 4, 2019 · 3 comments

Comments

@SansDK
Copy link

SansDK commented Jun 4, 2019

Hey,

I'm running this image on a Kubernetes cluster and I have a simple FastAPI app in place. Everything basically works, except one thing. I'm using importlib to import some self made python file / lib as following:

    try:
        if 'CustomModule' not in sys.modules:
            sys.path.append('Subfolder/')
            main = importlib.import_module("CustomModule")
        else:
            main = importlib.reload("CustomModule")

        ...
    except:
        ...

This import runs a lot of code which takes a lot of time. But still, this is working fine when I'm running FastAPI locally, with this in the code:

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=80)

But this seems to crash when doing this same thing in combination with Gunicorn (so, this image). Logs from the kubernetes pod:

[2019-06-04 09:36:49 +0000] [1] [DEBUG] Current configuration:
  config: /gunicorn_conf.py
  bind: ['0.0.0.0:80']
  backlog: 2048
  workers: 2
  worker_class: uvicorn.workers.UvicornWorker
  threads: 1
  worker_connections: 1000
  max_requests: 0
  max_requests_jitter: 0
  timeout: 30
  graceful_timeout: 30
  keepalive: 120
  limit_request_line: 4094
  limit_request_fields: 100
  limit_request_field_size: 8190
  reload: False
  reload_engine: auto
  reload_extra_files: []
  spew: False
  check_config: False
  preload_app: False
  sendfile: None
  reuse_port: False
  chdir: /app
  daemon: False
  raw_env: []
  pidfile: None
worker_tmp_dir: None
  user: 0
  group: 0
  umask: 0
  initgroups: False
  tmp_upload_dir: None
  secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
  forwarded_allow_ips: ['127.0.0.1']
  accesslog: None
  disable_redirect_access_to_syslog: False
  access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
  errorlog: -
  loglevel: debug
  capture_output: False
  logger_class: gunicorn.glogging.Logger
  logconfig: None
  logconfig_dict: {}
  syslog_addr: udp://localhost:514
  syslog: False
  syslog_prefix: None
  syslog_facility: user
  enable_stdio_inheritance: False
  statsd_host: None
  statsd_prefix: 
  proc_name: None
  default_proc_name: main:app
  pythonpath: None
  paste: None
on_starting: <function OnStarting.on_starting at 0x7f7568b5ee18>
  on_reload: <function OnReload.on_reload at 0x7f7568b5ef28>
  when_ready: <function WhenReady.when_ready at 0x7f75688d30d0>
  pre_fork: <function Prefork.pre_fork at 0x7f75688d31e0>
  post_fork: <function Postfork.post_fork at 0x7f75688d32f0>
  post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f75688d3400>
  worker_int: <function WorkerInt.worker_int at 0x7f75688d3510>
  worker_abort: <function WorkerAbort.worker_abort at 0x7f75688d3620>
  pre_exec: <function PreExec.pre_exec at 0x7f75688d3730>
  pre_request: <function PreRequest.pre_request at 0x7f75688d3840>
  post_request: <function PostRequest.post_request at 0x7f75688d38c8>
  child_exit: <function ChildExit.child_exit at 0x7f75688d39d8>
  worker_exit: <function WorkerExit.worker_exit at 0x7f75688d3ae8>
  nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f75688d3bf8>
  on_exit: <function OnExit.on_exit at 0x7f75688d3d08>
  proxy_protocol: False
  proxy_allow_ips: ['127.0.0.1']
  keyfile: None
  certfile: None
  ssl_version: 2
  cert_reqs: 0
  ca_certs: None
  suppress_ragged_eofs: True
  do_handshake_on_connect: False
  ciphers: TLSv1
  raw_paste_global_conf: []
[2019-06-04 09:36:49 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2019-06-04 09:36:49 +0000] [1] [DEBUG] Arbiter booted
[2019-06-04 09:36:49 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
[2019-06-04 09:36:49 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2019-06-04 09:36:49 +0000] [8] [INFO] Booting worker with pid: 8
[2019-06-04 09:36:49 +0000] [9] [INFO] Booting worker with pid: 9
[2019-06-04 09:36:49 +0000] [1] [DEBUG] 2 workers
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
[2019-06-04 09:36:58 +0000] [9] [INFO] Started server process [9]
[2019-06-04 09:36:58 +0000] [9] [INFO] Waiting for application startup.
[2019-06-04 09:36:58 +0000] [8] [INFO] Started server process [8]
[2019-06-04 09:36:58 +0000] [8] [INFO] Waiting for application startup.
[2019-06-04 09:37:07 +0000] [34] [INFO] ('172.17.0.1', 49573) - "GET /docs HTTP/1.1" 200
[2019-06-04 09:37:08 +0000] [34] [INFO] ('172.17.0.1', 49573) - "GET /openapi.json HTTP/1.1" 200
Endpoint called
Reached function
Module is not in sysmodules
Executing sys append
[2019-06-04 09:37:15 +0000] [46] [INFO] Booting worker with pid: 46
Endpoint called
Reached function
Module is not in sysmodules
Executing sys append
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
[2019-06-04 09:37:23 +0000] [46] [INFO] Started server process [46]
[2019-06-04 09:37:23 +0000] [46] [INFO] Waiting for application startup.
[2019-06-04 09:37:25 +0000] [50] [INFO] Booting worker with pid: 50
[2019-06-04 09:37:25 +0000] [46] [DEBUG] ('172.17.0.1', 49580) - Connected
Endpoint called
Reached function
Module is not in sysmodules
Executing sys append
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
[2019-06-04 09:37:33 +0000] [50] [INFO] Started server process [50]
[2019-06-04 09:37:33 +0000] [50] [INFO] Waiting for application startup.
[2019-06-04 09:37:34 +0000] [54] [INFO] Booting worker with pid: 54
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
[2019-06-04 09:37:38 +0000] [54] [INFO] Started server process [54]
[2019-06-04 09:37:38 +0000] [54] [INFO] Waiting for application startup.

On the /docs page, I get to see TypeError: Failed to fetch after the workers restart.

I really don't know why this is happening. It looks like Importlib is not throwing any exception, and Gunicorn is also not logging any Critical Timeout error or anything like that.

I would greatly appreciate any help, thanks!

@SansDK
Copy link
Author

SansDK commented Jun 4, 2019

I rewrote the module which I previously imported with importlib, so that I can import it at the beginning of the file without it executing all code.
Now, when I call an endpoint, I call a method in the imported module ('run()') which executes all the (long running) code.

The result is still the same. I also already tried making everything async / await with no luck.
This is the log with just 1 worker when calling the endpoint which crashes the worker:

[2019-06-04 12:05:11 +0000] [8] [INFO] ('172.17.0.1', 53413) - "GET /info HTTP/1.1" 200
Start training
Called long-running method 'run()' in module
[2019-06-04 12:05:20 +0000] [17] [INFO] Booting worker with pid: 17
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
[2019-06-04 12:05:27 +0000] [17] [INFO] Started server process [17]
[2019-06-04 12:05:27 +0000] [17] [INFO] Waiting for application startup.
[2019-06-04 12:05:27 +0000] [17] [DEBUG] ('172.17.0.1', 53429) - Connected
Start training
Called long-running method 'run()' in module
[2019-06-04 12:05:33 +0000] [21] [INFO] Booting worker with pid: 21
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
[2019-06-04 12:05:39 +0000] [21] [INFO] Started server process [21]
[2019-06-04 12:05:39 +0000] [21] [INFO] Waiting for application startup.

@SansDK
Copy link
Author

SansDK commented Jun 4, 2019

Seems that it was a (out-of-)memory problem which caused the workers to be restarted. It was really difficult to debug because it wasn't indicated anywhere, but in hindsight it sounds very logical and I should've paid attention to that before.

For anyone who is running into similar problems; check your memory usage and allocation.

@SansDK SansDK closed this as completed Jun 4, 2019
@tiangolo
Copy link
Owner

tiangolo commented Jun 4, 2019

Thanks for reporting back and closing the issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants