handling of upstream zone breaks upsync module #215

Closed
hadret opened this Issue May 28, 2018 · 4 comments

hadret commented May 28, 2018

There is a memory-handling bug when the zone directive is specified in an upstream block. Regardless of the zone's size, upsync fails whenever a change to the upstreams involves replacing a server, adding a new one, or removing an old one (adding or removing the down flag does not trigger the bug). The problem is relatively easy to reproduce.

Sample upstream configuration:

upstream sample {
    zone upstream_grafana 64k;
    upsync 127.0.0.1:8500/v1/health/service/sample upsync_timeout=6m upsync_interval=5s upsync_type=consul_health strong_dependency=off;
    upsync_dump_path /tmp/sample.conf;
    include /tmp/sample.conf;
}

Once the above is set in the config file, the following workflow reveals the bug:

# prepare dummy server to be replaced once data from consul is loaded:
echo "server 127.0.0.1:11111;" > /tmp/sample.conf
# start nginx
service nginx start

Once the config from Consul kicks in and the dummy server is replaced with the proper one, CPU usage goes up and nginx's workers constantly exit and respawn, with the following message in the error log:

[...]
2018/05/27 14:27:00 [alert] 19040#19040: worker process 19617 exited on signal 11
2018/05/27 14:27:06 [alert] 19040#19040: worker process 19618 exited on signal 11
2018/05/27 14:27:11 [alert] 19040#19040: worker process 19619 exited on signal 11
2018/05/27 14:27:17 [alert] 19040#19040: worker process 19624 exited on signal 11
2018/05/27 14:27:23 [alert] 19040#19040: worker process 19625 exited on signal 11
2018/05/27 14:27:29 [alert] 19040#19040: worker process 19626 exited on signal 11
2018/05/27 14:27:34 [alert] 19040#19040: worker process 19627 exited on signal 11
2018/05/27 14:27:40 [alert] 19040#19040: worker process 19628 exited on signal 11
2018/05/27 14:27:45 [alert] 19040#19040: worker process 19631 exited on signal 11
2018/05/27 14:27:50 [alert] 19040#19040: worker process 19632 exited on signal 11
2018/05/27 14:27:55 [alert] 19040#19040: worker process 19633 exited on signal 11
2018/05/27 14:28:01 [alert] 19040#19040: worker process 19634 exited on signal 11
2018/05/27 14:28:06 [alert] 19040#19040: worker process 19635 exited on signal 11
2018/05/27 14:28:12 [alert] 19040#19040: worker process 19636 exited on signal 11
2018/05/27 14:28:17 [alert] 19040#19040: worker process 19641 exited on signal 11
2018/05/27 14:28:23 [alert] 19040#19040: worker process 19642 exited on signal 11
2018/05/27 14:28:28 [alert] 19040#19040: worker process 19643 exited on signal 11
[...]

Once the zone parameter is removed, everything returns to normal.
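For reference, a sketch of the configuration that avoids the crash, assuming the only change from the sample above is dropping the zone line (everything else is kept as-is):

```nginx
upstream sample {
    # "zone upstream_grafana 64k;" removed: with it present, workers
    # exit on signal 11 once upsync replaces the dummy server.
    upsync 127.0.0.1:8500/v1/health/service/sample upsync_timeout=6m upsync_interval=5s upsync_type=consul_health strong_dependency=off;
    upsync_dump_path /tmp/sample.conf;
    include /tmp/sample.conf;
}
```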

I have nginx compiled with the debug flag and can provide more logs on request.

Member

xiaokai-wang commented May 28, 2018

The zone directive is not supported. upsync manages memory in a different way.

Author

hadret commented May 28, 2018

OK, are there really no plans to implement this? The reason I ask is rather simple: not long ago nginx-vts merged this change: vozlt/nginx-module-vts#112. We use nginx-vts to collect metrics for our upstream servers, and to have their state updated dynamically we have to use zone in the upstream definition. If it's impossible to handle zone properly in the upsync module, can you think of any workaround or alternative for reliable metrics collection of the upstream servers (200s, 300s, 400s, 500s, etc.)?

Member

xiaokai-wang commented May 28, 2018

nginx-vts is cool. As an alternative, you can refer to the reqstat module.

Collaborator

gfrankliu commented May 28, 2018

I tried reqstat from tengine with standard nginx, and it works fine:
https://github.com/gfrankliu/nginx-http-reqstat
