gateway process was crashed and not auto restart #6224


Closed
iamfengdy opened this issue Jan 10, 2025 · 7 comments

@iamfengdy

To be honest, I really do not know how to describe this bug. The flow had been running for many days; I found the problem in the log yesterday.
(three images attached)

@JoanFM
Contributor

JoanFM commented Jan 10, 2025

There seems to have been a connection error, but did the gateway process just die?

@JoanFM
Contributor

JoanFM commented Jan 10, 2025

Is it possible that you were running close to OOM?

@iamfengdy
Author

> There seems to have been a connection error, but did the gateway process just die?

Yes, I cannot find the gateway process with the ps command or by its port.

> Is it possible that you were running close to OOM?

I do not think so. I did not find any OOM message in syslog.
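
For anyone checking the same thing, here is a minimal sketch of scanning log lines for OOM-killer traces. It assumes a Linux host where kernel messages end up in a readable text log; the function name `find_oom_lines` and the log path in the usage comment are illustrative assumptions, and the pattern only covers the phrases the kernel commonly prints, so an empty result does not rule out an OOM kill.

```python
import re

# Phrases the Linux kernel commonly logs when the OOM killer acts.
# This list is an assumption and may not cover every kernel version.
OOM_PATTERN = re.compile(
    r"out of memory|oom-kill|invoked oom-killer", re.IGNORECASE
)


def find_oom_lines(lines):
    """Return the lines (without trailing newline) that mention the OOM killer."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


# Hypothetical usage; the path differs between distributions:
# with open("/var/log/syslog", errors="replace") as log:
#     for hit in find_oom_lines(log):
#         print(hit)
```

On systems that log through journald, the kernel messages may not be in a plain file at all, so the same function would need to be fed the output of the journal instead.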

@JoanFM
Contributor

JoanFM commented Jan 10, 2025

Maybe the OS decided to kill the process because it was running low on memory?

@JoanFM
Contributor

JoanFM commented Jan 10, 2025

When running Jina locally, it will not restart automatically.

@jina-bot
Member

@jina-ai/product This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days

@jina-bot jina-bot added the Stale label Apr 11, 2025
@jigarpatel1007

I’d like to suggest a resolution direction that aligns with best practices in production-grade process supervision and high-availability design.


🧠 Root Cause Recap

It appears that the Jina Gateway process crashed silently after running for several days. There was:

  • No auto-restart behavior
  • No OOM indication in system logs
  • No gateway process or listening port visible afterwards (checked via ps)

This suggests the process failed with no recovery mechanism in place, which is expected when running Jina locally (outside K8s/Docker), where supervision isn't built in.


✅ Recommended Resolution

To prevent similar crashes from halting long-running Flows, I propose:

1. Add Native Process Supervision Support in Local Mode

  • Integrate a lightweight watchdog (e.g., supervisor, watchdog, or a background Python thread) that periodically checks if the gateway process is alive and restarts it if it fails.
  • Alternatively, expose a --restart-on-failure flag when launching the gateway via CLI to wrap it in a resilient subprocess runner.
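
As a rough illustration of the "resilient subprocess runner" idea, here is a minimal sketch using only the Python standard library. It is not part of Jina: the function name `supervise` and its parameters are hypothetical, and the `--restart-on-failure` flag mentioned above is a proposal, not an existing option. The sketch restarts a child command after every non-zero exit, with a linearly growing pause, and stops after a clean exit or once the restart budget is used up.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, backoff_seconds=1.0, sleep=time.sleep):
    """Run cmd and restart it after each non-zero exit.

    Returns the list of exit codes observed. The list ends with 0 on a
    clean exit, or has max_restarts + 1 entries if the budget ran out.
    """
    exit_codes = []
    while True:
        returncode = subprocess.call(cmd)  # blocks until the child exits
        exit_codes.append(returncode)
        if returncode == 0:
            break  # clean shutdown, nothing to restart
        if len(exit_codes) > max_restarts:
            break  # restart budget exhausted, give up
        sleep(backoff_seconds * len(exit_codes))  # linear backoff
    return exit_codes


if __name__ == "__main__":
    # Demo child that always fails with exit code 3.
    failing_child = [sys.executable, "-c", "raise SystemExit(3)"]
    print(supervise(failing_child, max_restarts=2, backoff_seconds=0.1))
    # prints [3, 3, 3]: one initial run plus two restarts
```

In a real setup the command would be whatever starts the Flow, and an external supervisor is generally more robust than a wrapper like this, because the wrapper itself can also be killed.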

2. Provide systemd or pm2 Integration Scripts

  • For system-level deployments, provide sample systemd unit files or pm2 startup scripts to ensure that gateway restarts on crash or system reboot.

3. Log a Critical Alert + Exit Code

  • Ensure that if the gateway crashes (not via SIGINT or CTRL+C), a clear exit code and error log are captured. This helps identify silent terminations in logs and third-party monitoring tools.
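
One way this could look from the outside, sketched with the standard library only. The helper names `describe_exit`, `is_crash`, and `run_and_log` are hypothetical and do not come from Jina. The sketch relies on the documented `subprocess` behavior that, on POSIX, a negative return code `-N` means the child was terminated by signal `N`.

```python
import logging
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "clean exit"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def is_crash(returncode):
    """Treat a clean exit and SIGINT (Ctrl+C) as intentional shutdowns."""
    return returncode not in (0, -signal.SIGINT)


def run_and_log(cmd):
    """Run cmd and log a critical message if it ended unexpectedly."""
    returncode = subprocess.call(cmd)
    if is_crash(returncode):
        logging.critical("%s: %s", cmd[0], describe_exit(returncode))
    return returncode


if __name__ == "__main__":
    # Demo child that fails with exit code 7.
    run_and_log([sys.executable, "-c", "raise SystemExit(7)"])
```

A process killed with SIGKILL (for example by the kernel's OOM killer) cannot log anything itself, so recording the exit status from a parent process like this is what makes such terminations visible.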

Thanks.

@jina-bot jina-bot closed this as completed May 2, 2025