Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The value of total connections is wrong #35

Closed
tkgeng opened this issue Mar 11, 2016 · 7 comments
Closed

The value of total connections is wrong #35

tkgeng opened this issue Mar 11, 2016 · 7 comments
Labels

Comments

@tkgeng
Copy link

tkgeng commented Mar 11, 2016

Hi, I'm back

The new problem is that the value of total connections in Server main of the status webpage is too big, it's wrong, it should be less than 15k, there's not so many connections in this one of our systems. after I restart Nginx, it's back to normal(about 10000 connections), but after about a week, it's into wrong again.

BTW, when it's in the abnormal state, the data of original module stub_status_module of Nginx is also abnormal, of course, they are equal.

SS:

561818f4-3cce-49e4-8513-2d40d5e475cf

nginx-module-vts:

d81a4716-353d-4543-8af1-005cbf410f9c

stub_status_module:

d55a9022-68e2-4718-9a0a-73427f584f35

@vozlt
Copy link
Owner

vozlt commented Mar 14, 2016

Hi, I’d like to ask you some questions.

  1. Do you use a SSL(https)?
  2. Is there anything unusual log in error_log(e.g. repeatedly)?

@tkgeng
Copy link
Author

tkgeng commented Mar 14, 2016

OH, my god, it maybe a serious problem, looks like a shared memory operation error:

  1. Don't.
  2. When the value of total connections is wrong, a error was repeatedly logged to ngx's error_log:
    2016/03/14 17:25:17 [alert] 2465#0: worker process 31188 exited on signal 11

The PID in error_log was keep changing.

I also find a error log below:
2016/03/11 16:14:27 [alert] 2395#0: *664085514 open socket #2213 left in connection 2186
2016/03/11 16:14:27 [alert] 2395#0: aborting

Looks like it happened when I restarted Nginx Last week.

@vozlt
Copy link
Owner

vozlt commented Mar 16, 2016

The abnormal values probably seem to happen because of the failure to decrease the status variable values. In other words, It seems to due to the terminating of worker process by signal 11 before running decrease process. By the way, more important thing is why worker process exited on signal 11.

Status variables

The status variables used in ngx_http_stub_status_module and nginx-module-vts are a shared memory variables(atomic). Some variables are only increased without decrease as follows:

stats increase decrease
accepted yes no
handled yes no
requests yes no
active yes yes
reading yes yes
writing yes yes
waiting yes yes

Increase function

(void) ngx_atomic_fetch_add($ngx_stat_variable, 1);

Decrease function

(void) ngx_atomic_fetch_add($ngx_stat_variable, -1);

Some questions

  1. Does the error(exited on signal 11) frequently appears?
  2. How about the system status when logged in error_log?(e.g memory usage, loadavg)
  3. How about the Req/s when logged in error_log if peak than usual?
  4. Is there anything unusual log in /var/log/messages when logged in error_log(e.g. memory...)?
  5. Does it also happen in mainline version(1.9.x)?
  6. Please try again backtrace(core dump)

backtrace

Check max core file size

See the nginx debugging.

  1. Please check if the soft limit of max core file size is 0 in worker process.
    It can see in /proc/$(worker_process_id)/limits.

    $ cat /proc/$(ps aux | awk '/worker/ {print $2}' | sed -n 1p)/limits

    Limit                     Soft Limit           Hard Limit           Units
    .
    .
    Max core file size        0                    unlimited            bytes
    .
    .
  2. If so, fix(> 0 or unlimited) and restart and check it.

    $ ulimit -S -c unlimited
    $ /usr/sbin/nginx -c /etc/nginx/nginx.conf
    $ cat /proc/$(ps aux | awk '/worker/ {print $2}' | sed -n 1p)/limits
    Limit                     Soft Limit           Hard Limit           Units
    .
    .
    Max core file size        unlimited            unlimited            bytes
    .
    .
    
  3. If you succeed, it looks like as follows:
    2016/... [alert] ... : worker process ... exited on signal 11 (core dumped)

Script to check max core file size

$ vi nginx-core-limit-check.sh

#! /bin/env bash

echo "# Max core file size of nginx worker process for core dump(must be soft > 0)"
printf "%-16s%-16s%-16s\n" "PID" "Soft Limit" "Hard Limit"
for pid in $(echo /proc/[0-9]*)
do
    [ -e "$pid/cmdline" ] && cmdline=$(<$pid/cmdline)
    if [[ $cmdline == *"nginx: worker"* ]]
    then
        limits=$(<$pid/limits)
        core_soft_limit=$(awk '/core/ {print $5" "$6}' $pid/limits)
        printf "%-16s%-16s%-16s\n" $pid $core_soft_limit
    fi  
done

$ bash nginx-core-limit-check.sh

# Max core file size of nginx worker process for core dump(must be soft > 0)
PID             Soft Limit      Hard Limit       
/proc/24262     unlimited       unlimited       
/proc/24263     unlimited       unlimited       

Source trace

Traced the nginx core source(1.9.12)

The signal 11 is as follows:

$ man 7 signal | grep 11
SIGSEGV      11       Core    Invalid memory reference

Trace the message exited on signal 11

2016/03/14 17:25:17 [alert] 2465#0: worker process 31188 exited on signal 11

$ vi src/os/unix/ngx_process.c

452 static void
453 ngx_process_get_status(void)
454 {
.
.
516 #ifdef WCOREDUMP
517             ngx_log_error(NGX_LOG_ALERT, ngx_cycle->log, 0,
518                           "%s %P exited on signal %d%s",
519                           process, pid, WTERMSIG(status),
520                           WCOREDUMP(status) ? " (core dumped)" : "");
521 #else
522             ngx_log_error(NGX_LOG_ALERT, ngx_cycle->log, 0,
523                           "%s %P exited on signal %d",
524                           process, pid, WTERMSIG(status));
525 #endif

Trace the message left in connection

2016/03/11 16:14:27 [alert] 2395#0: *664085514 open socket #2213 left in connection 2186
2016/03/11 16:14:27 [alert] 2395#0: aborting

$ vi src/os/unix/ngx_process_cycle.c

 726 static void
 727 ngx_worker_process_cycle(ngx_cycle_t *cycle, void *data)
 728 {
 .
 .
 737 
 738     for ( ;; ) {
 739 
 740         if (ngx_exiting) {
 741             ngx_event_cancel_timers();
 742 
 743             if (ngx_event_timer_rbtree.root == ngx_event_timer_rbtree.sentinel)
 744             {
 745                 ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0, "exiting");
 746 
 747                 ngx_worker_process_exit(cycle);
 748             }
 749         }
 750 
 751         ngx_log_debug0(NGX_LOG_DEBUG_EVENT, cycle->log, 0, "worker cycle");
 752 
 753         ngx_process_events_and_timers(cycle);
 754 
 755         if (ngx_terminate) {
 756             ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0, "exiting");
 757 
 758             ngx_worker_process_exit(cycle);
 759         }
 .
 .
 946 static void
 947 ngx_worker_process_exit(ngx_cycle_t *cycle)
 948 {
 .
 .
 958     if (ngx_exiting) {
 959         c = cycle->connections;
 960         for (i = 0; i < cycle->connection_n; i++) {
 961             if (c[i].fd != -1
 962                 && c[i].read
 963                 && !c[i].read->accept
 964                 && !c[i].read->channel
 965                 && !c[i].read->resolver)
 966             {
 967                 ngx_log_error(NGX_LOG_ALERT, cycle->log, 0,
 968                               "*%uA open socket #%d left in connection %ui",
 969                               c[i].number, c[i].fd, i);
 970                 ngx_debug_quit = 1;
 971             }
 972         }
 973 
 974         if (ngx_debug_quit) {
 975             ngx_log_error(NGX_LOG_ALERT, cycle->log, 0, "aborting");
 976             ngx_debug_point();
 977         }
 978     }

@tkgeng
Copy link
Author

tkgeng commented Mar 18, 2016

I have added the core dump to nginx.conf.
Now I am waiting the problem happens agin then I will post you something useful.

@tkgeng
Copy link
Author

tkgeng commented Mar 24, 2016

Hi, I have got the core dump file, below is a pice of gdb debug output, there has some <Address x out of bounds>

Core was generated by `nginx: worker process                                         '.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004706c9 in ngx_http_script_flush_complex_value (r=0x21983f0, val=0x2183d98) at src/http/ngx_http_script.c:44
44  src/http/ngx_http_script.c: No such file or directory.
    in src/http/ngx_http_script.c
Missing separate debuginfos, use: debuginfo-install nginx-kuyun-1.8.0-1.centos6.x86_64
(gdb) bt
#0  0x00000000004706c9 in ngx_http_script_flush_complex_value (r=0x21983f0, val=0x2183d98) at src/http/ngx_http_script.c:44
#1  0x0000000000470731 in ngx_http_complex_value (r=0x21983f0, val=0x2183d98, value=0x7ffc30142fe0) at src/http/ngx_http_script.c:71
#2  0x00000000004d0b20 in ngx_http_vhost_traffic_status_shm_add_filter_node (r=0x21983f0, filter_keys=0x2183d48)
    at ../nginx-module-vts-0.1.9/src/ngx_http_vhost_traffic_status_module.c:2159
#3  0x00000000004d0f18 in ngx_http_vhost_traffic_status_shm_add_filter (r=0x21983f0) at ../nginx-module-vts-0.1.9/src/ngx_http_vhost_traffic_status_module.c:2269
#4  0x00000000004cea1e in ngx_http_vhost_traffic_status_handler (r=0x21983f0) at ../nginx-module-vts-0.1.9/src/ngx_http_vhost_traffic_status_module.c:1184
#5  0x000000000045f2a5 in ngx_http_log_request (r=0x21983f0) at src/http/ngx_http_request.c:3516
#6  0x000000000045f116 in ngx_http_free_request (r=0x21983f0, rc=0) at src/http/ngx_http_request.c:3463
#7  0x000000000045dd6b in ngx_http_set_keepalive (r=0x21983f0) at src/http/ngx_http_request.c:2901
#8  0x000000000045d1e6 in ngx_http_finalize_connection (r=0x21983f0) at src/http/ngx_http_request.c:2538
#9  0x000000000045ce0f in ngx_http_finalize_request (r=0x21983f0, rc=0) at src/http/ngx_http_request.c:2434
#10 0x000000000044dc10 in ngx_http_core_content_phase (r=0x21983f0, ph=0x216fb50) at src/http/ngx_http_core_module.c:1392
#11 0x000000000044c731 in ngx_http_core_run_phases (r=0x21983f0) at src/http/ngx_http_core_module.c:873
#12 0x000000000044c6a8 in ngx_http_handler (r=0x21983f0) at src/http/ngx_http_core_module.c:856
#13 0x000000000045bcb4 in ngx_http_process_request (r=0x21983f0) at src/http/ngx_http_request.c:1902
#14 0x000000000045a5b0 in ngx_http_process_request_headers (rev=0x7fb28bc4e588) at src/http/ngx_http_request.c:1333
#15 0x0000000000459953 in ngx_http_process_request_line (rev=0x7fb28bc4e588) at src/http/ngx_http_request.c:1013
#16 0x000000000045858e in ngx_http_wait_request_handler (rev=0x7fb28bc4e588) at src/http/ngx_http_request.c:499
#17 0x000000000043d106 in ngx_epoll_process_events (cycle=0x20738c0, timer=48, flags=1) at src/event/modules/ngx_epoll_module.c:822
#18 0x000000000042db88 in ngx_process_events_and_timers (cycle=0x20738c0) at src/event/ngx_event.c:248
#19 0x000000000043ac36 in ngx_worker_process_cycle (cycle=0x20738c0, data=0x4) at src/os/unix/ngx_process_cycle.c:767
#20 0x0000000000437466 in ngx_spawn_process (cycle=0x20738c0, proc=0x43aa30 <ngx_worker_process_cycle>, data=0x4, name=0x4dcd3b "worker process", respawn=13)
    at src/os/unix/ngx_process.c:198
#21 0x000000000043a5fd in ngx_reap_children (cycle=0x20738c0) at src/os/unix/ngx_process_cycle.c:620
#22 0x00000000004392aa in ngx_master_process_cycle (cycle=0x20738c0) at src/os/unix/ngx_process_cycle.c:173
#23 0x00000000004070ec in main (argc=3, argv=0x7ffc30143b98) at src/core/nginx.c:419
(gdb) f 0
#0  0x00000000004706c9 in ngx_http_script_flush_complex_value (r=0x21983f0, val=0x2183d98) at src/http/ngx_http_script.c:44
44  in src/http/ngx_http_script.c
(gdb) f 1
#1  0x0000000000470731 in ngx_http_complex_value (r=0x21983f0, val=0x2183d98, value=0x7ffc30142fe0) at src/http/ngx_http_script.c:71
71  in src/http/ngx_http_script.c
(gdb) f 2
#2  0x00000000004d0b20 in ngx_http_vhost_traffic_status_shm_add_filter_node (r=0x21983f0, filter_keys=0x2183d48)
    at ../nginx-module-vts-0.1.9/src/ngx_http_vhost_traffic_status_module.c:2159
2159            if (ngx_http_complex_value(r, &filters[i].filter_name, &filter_name) != NGX_OK) {
(gdb) bt full 2
#0  0x00000000004706c9 in ngx_http_script_flush_complex_value (r=0x21983f0, val=0x2183d98) at src/http/ngx_http_script.c:44
        index = 0x1
#1  0x0000000000470731 in ngx_http_complex_value (r=0x21983f0, val=0x2183d98, value=0x7ffc30142fe0) at src/http/ngx_http_script.c:71
        len = 9
        code = 0x471ed1 <ngx_http_script_copy_var_code>
        lcode = 0x471e2d <ngx_http_script_copy_var_len_code>
        e = {ip = 0x2183e18 "", pos = 0x21b01a1 "Xr", sp = 0x0, buf = {len = 9, data = 0x21b0198 "localhostXr"}, line = {len = 0, data = 0x0}, args = 0x0, flushed = 1,
          skip = 0, quote = 0, is_args = 0, log = 0, status = 0, request = 0x21983f0}
(More stack frames follow...)
(gdb) l
2154
2155            if (ngx_http_complex_value(r, &filters[i].filter_key, &filter_key) != NGX_OK) {
2156                return NGX_ERROR;
2157            }
2158
2159            if (ngx_http_complex_value(r, &filters[i].filter_name, &filter_name) != NGX_OK) {
2160                return NGX_ERROR;
2161            }
2162
2163            if (filter_key.len == 0) {
(gdb) p filters[i].filter_name
$1 = {value = {len = 0, data = 0xd2e106dd02544516 <Address 0xd2e106dd02544516 out of bounds>}, flushes = 0x1, lengths = 0x12, values = 0x4eab77}
(gdb) p filter_name
$2 = {len = 140721115115968, data = 0x21b0188 "NO\037\061\067\062.21.19.126localhostXr"}
(gdb) bt full 3
#0  0x00000000004706c9 in ngx_http_script_flush_complex_value (r=0x21983f0, val=0x2183d98) at src/http/ngx_http_script.c:44
        index = 0x1
#1  0x0000000000470731 in ngx_http_complex_value (r=0x21983f0, val=0x2183d98, value=0x7ffc30142fe0) at src/http/ngx_http_script.c:71
        len = 9
        code = 0x471ed1 <ngx_http_script_copy_var_code>
        lcode = 0x471e2d <ngx_http_script_copy_var_len_code>
        e = {ip = 0x2183e18 "", pos = 0x21b01a1 "Xr", sp = 0x0, buf = {len = 9, data = 0x21b0198 "localhostXr"}, line = {len = 0, data = 0x0}, args = 0x0, flushed = 1,
          skip = 0, quote = 0, is_args = 0, log = 0, status = 0, request = 0x21983f0}
#2  0x00000000004d0b20 in ngx_http_vhost_traffic_status_shm_add_filter_node (r=0x21983f0, filter_keys=0x2183d48)
    at ../nginx-module-vts-0.1.9/src/ngx_http_vhost_traffic_status_module.c:2159
        p = 0x2 <Address 0x2 out of bounds>
        type = 32764
        rc = 140721115115680
        key = {len = 140405311070240, data = 0x21983f0 "HTTP"}
        dst = {len = 140405311070208, data = 0x20765b8 " \300\260\250\262\177"}
        filter_key = {len = 9, data = 0x21b0198 "localhostXr"}
        filter_name = {len = 140721115115968, data = 0x21b0188 "NO\037\061\067\062.21.19.126localhostXr"}
        i = 0
        n = 1
        filters = 0x2183d70
(More stack frames follow...)
(gdb) x p
0x2:    Cannot access memory at address 0x2
(gdb) l
2164                continue;
2165            }
2166
2167            if (filter_name.len == 0) {
2168                type = NGX_HTTP_VHOST_TRAFFIC_STATUS_UPSTREAM_NO;
2169
2170                rc = ngx_http_vhost_traffic_status_node_generate_key(r->pool, &key, &filter_key, type);
2171                if (rc != NGX_OK) {
2172                    return NGX_ERROR;
2173                }
(gdb)

@vozlt
Copy link
Owner

vozlt commented Mar 24, 2016

Thanks.
The nginx-module-vts has been updated.(0cb444e).
Please test and let me know result.

Source trace

Trace ngx_http_complex_value_t member variable flushes

$ vi src/http/ngx_http_script.c

  35 void
  36 ngx_http_script_flush_complex_value(ngx_http_request_t *r,
  37     ngx_http_complex_value_t *val)
  38 {
  39     ngx_uint_t *index;
  40 
  41     index = val->flushes;
  42 
  43     if (index) {
  44         while (*index != (ngx_uint_t) -1) {
  45 
  46             if (r->variables[*index].no_cacheable) {
  47                 r->variables[*index].valid = 0;
  48                 r->variables[*index].not_found = 0;
  49             }
  50 
  51             index++;
  52         }
  53     }
  54 }
  .
  .
 107 ngx_int_t
 108 ngx_http_compile_complex_value(ngx_http_compile_complex_value_t *ccv)
 109 {
  .
  .
 117     nv = 0;
 118     nc = 0;
 119 
 120     for (i = 0; i < v->len; i++) {
 121         if (v->data[i] == '$') {
 122             if (v->data[i + 1] >= '1' && v->data[i + 1] <= '9') {
 123                 nc++;
 124 
 125             } else {
 126                 nv++;
 127             }
 128         }
 129     }
 130 
  .
  .
 141 
 142     ccv->complex_value->value = *v;
 143     ccv->complex_value->flushes = NULL;
 144     ccv->complex_value->lengths = NULL;
 145     ccv->complex_value->values = NULL;
 146 
 147     if (nv == 0 && nc == 0) {
 148         return NGX_OK;
 149     }
  .
  .

@tkgeng
Copy link
Author

tkgeng commented Apr 20, 2016

It has been working fine for a long time since you fixed the bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants