Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to nginx 1.24 auto_push causes looping signal 29 and signal 17 crash #238

Closed
streamingsystems opened this issue May 13, 2023 · 6 comments

Comments

@streamingsystems
Copy link

Hi All,

We recently upgraded to nginx 1.24 and compiled this module properly (as we have always done). The module works great still, however after a "reload" of nginx we are seeing this in our nginx error.log and it keeps repeating forever and causes the module to be inoperable. When we do a nginx "restart" everything goes back to working fine.

When I turn auto_push "off" the problem does not happen, so it's something specific to the auto push.

This has been working fine for years up until a recent upgrade to nginx 1.24.

Any help would be appreciated.

2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310276
2023/05/13 10:27:04 [debug] 3310275#3310275: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310275#3310275: auto_push: create socket '/tmp/nginx-http-flv.9'
2023/05/13 10:27:04 [debug] 3310275#3310275: auto_push: delete existing socket '/tmp/nginx-http-flv.9'
2023/05/13 10:27:04 [debug] 3310276#3310276: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310276#3310276: auto_push: create socket '/tmp/nginx-http-flv.15'
2023/05/13 10:27:04 [debug] 3310276#3310276: auto_push: delete existing socket '/tmp/nginx-http-flv.15'
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 17 (SIGCHLD) received from 3310270
2023/05/13 10:27:04 [alert] 2190648#2190648: worker process 3310270 exited on signal 11
2023/05/13 10:27:04 [alert] 2190648#2190648: worker process 3310272 exited on signal 11
2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310277
2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310278
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 29 (SIGIO) received
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 17 (SIGCHLD) received from 3310272
2023/05/13 10:27:04 [alert] 2190648#2190648: worker process 3310269 exited on signal 11
2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310279
2023/05/13 10:27:04 [debug] 3310277#3310277: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310277#3310277: auto_push: create socket '/tmp/nginx-http-flv.10'
2023/05/13 10:27:04 [debug] 3310277#3310277: auto_push: delete existing socket '/tmp/nginx-http-flv.10'
2023/05/13 10:27:04 [debug] 3310278#3310278: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310278#3310278: auto_push: create socket '/tmp/nginx-http-flv.11'
2023/05/13 10:27:04 [debug] 3310278#3310278: auto_push: delete existing socket '/tmp/nginx-http-flv.11'
2023/05/13 10:27:04 [debug] 3310279#3310279: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310279#3310279: auto_push: create socket '/tmp/nginx-http-flv.12'
2023/05/13 10:27:04 [debug] 3310279#3310279: auto_push: delete existing socket '/tmp/nginx-http-flv.12'
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 29 (SIGIO) received
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 29 (SIGIO) received
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 17 (SIGCHLD) received from 3310271
2023/05/13 10:27:04 [alert] 2190648#2190648: worker process 3310271 exited on signal 11
2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310280
2023/05/13 10:27:04 [debug] 3310280#3310280: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310280#3310280: auto_push: create socket '/tmp/nginx-http-flv.14'
2023/05/13 10:27:04 [debug] 3310280#3310280: auto_push: delete existing socket '/tmp/nginx-http-flv.14'
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 17 (SIGCHLD) received from 3310276
2023/05/13 10:27:04 [alert] 2190648#2190648: worker process 3310276 exited on signal 11
2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310281
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 29 (SIGIO) received
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 29 (SIGIO) received
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 17 (SIGCHLD) received from 3310273
2023/05/13 10:27:04 [alert] 2190648#2190648: worker process 3310273 exited on signal 11
2023/05/13 10:27:04 [alert] 2190648#2190648: worker process 3310274 exited on signal 11
2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310282
2023/05/13 10:27:04 [notice] 2190648#2190648: start worker process 3310283
2023/05/13 10:27:04 [notice] 2190648#2190648: signal 17 (SIGCHLD) received from 3310274
2023/05/13 10:27:04 [debug] 3310281#3310281: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310281#3310281: auto_push: create socket '/tmp/nginx-http-flv.15'
2023/05/13 10:27:04 [debug] 3310281#3310281: auto_push: delete existing socket '/tmp/nginx-http-flv.15'
2023/05/13 10:27:04 [debug] 3310282#3310282: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310282#3310282: auto_push: create socket '/tmp/nginx-http-flv.8'
2023/05/13 10:27:04 [debug] 3310282#3310282: auto_push: delete existing socket '/tmp/nginx-http-flv.8'
2023/05/13 10:27:04 [debug] 3310283#3310283: auto_push: creating sockets
2023/05/13 10:27:04 [debug] 3310283#3310283: auto_push: create socket '/tmp/nginx-http-flv.13'
2023/05/13 10:27:04 [debug] 3310283#3310283: auto_push: delete existing socket '/tmp/nginx-http-flv.13'

@streamingsystems
Copy link
Author

streamingsystems commented May 13, 2023

More info. The stack trace:

#0 0x00007faa26ebe034 in ngx_rtmp_auto_push_init_process (cycle=) at flv-1.24.0/ngx_rtmp_auto_push_module.c:323
#1 ngx_rtmp_auto_push_init_process (cycle=) at flv-1.24.0/ngx_rtmp_auto_push_module.c:128
#2 0x00005596ef988776 in ngx_worker_process_init (cycle=cycle@entry=0x5596f1294660, worker=worker@entry=0) at src/os/unix/ngx_process_cycle.c:893
#3 0x00005596ef988c5f in ngx_worker_process_cycle (cycle=cycle@entry=0x5596f1294660, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:706
#4 0x00005596ef98759f in ngx_spawn_process (cycle=cycle@entry=0x5596f1294660, proc=0x5596ef988c30 <ngx_worker_process_cycle>, data=0x0, name=0x5596efa34c27 "worker process", respawn=respawn@entry=1) at src/os/unix/ngx_process.c:199
#5 0x00005596ef989d27 in ngx_reap_children (cycle=0x5596f1294660) at src/os/unix/ngx_process_cycle.c:598
#6 ngx_master_process_cycle (cycle=0x5596f1294660) at src/os/unix/ngx_process_cycle.c:174
#7 0x00005596ef95f698 in main (argc=, argv=) at src/core/nginx.c:383

I am wondering if it has to do with this fix in nginx:

nginx/nginx@d52e568

Or this fix:

nginx/nginx@593bb54

@winshining
Copy link
Owner

winshining commented May 14, 2023

Your nginx.conf file?

My nginx.conf:

worker_processes  auto;
...
rtmp_auto_push on;

Nginx version:

root@debian:~# /usr/local/nginx/sbin/nginx -V
nginx version: nginx/1.24.0
built by gcc 10.2.1 20210110 (Debian 10.2.1-6) 
built with OpenSSL 1.1.1n  15 Mar 2022
TLS SNI support enabled
configure arguments: --with-debug --add-module=../nginx-http-flv-module

Executed "reload" but didn't reproduce the bug:

2023/05/14 16:14:14 [notice] 14873#0: signal process started
2023/05/14 16:15:28 [notice] 14976#0: signal process started
2023/05/14 16:18:08 [notice] 15098#0: signal process started
2023/05/14 16:28:53 [notice] 15472#0: signal process started

@streamingsystems
Copy link
Author

streamingsystems commented May 14, 2023

I fixed this bug.

This is what I found:

In the nginx_rtmp_auto_push_module.c on line 323:

There is this code:

                old = ls->previous->connection;

                if (ngx_del_event(old->read, NGX_READ_EVENT, NGX_CLOSE_EVENT)
                    == NGX_ERROR)
                {
                    return NGX_ERROR;
                }

                old->fd = (ngx_socket_t) -1;

What I am seeing if that "old" is null, so when this code is executing it's crashing because it's trying to do old->read on null.

It happens on every server we have when we issue a "reload".

To fix it I made this change:

            old = ls->previous->connection;
            if (old) {
                if (ngx_del_event(old->read, NGX_READ_EVENT, NGX_CLOSE_EVENT)
                    == NGX_ERROR)
                {
                    return NGX_ERROR;
                }

                old->fd = (ngx_socket_t) -1;
            }

It now reloads perfectly in our entire environment.

I am wondering what conditions in the new 1.24 would cause ls->previous->connection to be null, which would mean that the old variable is null, which would crash the old->read.

/sbin/nginx -V
nginx version: nginx/1.24.0
built by gcc 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC)
built with OpenSSL 1.1.1k FIPS 25 Mar 2021
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-openssl-opt=enable-ktls --with-cc-opt='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'

We have our configs broken up into different files but this is the various configs:

rtmp_auto_push on;

rtmp {

out_queue           4096;
out_cork            8;
max_streams         128;
timeout             10s;
drop_idle_publisher 10s;
respawn off;

server {
    .....
}

}

user nginx;
worker_processes auto;
worker_rlimit_nofile 32768;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
worker_connections 2048;
use epoll;
multi_accept on;
}

@winshining
Copy link
Owner

I found the reason: if the module is compiled as a dynamic module, the bug will be 100% reproduced.
What you mentioned above doesn't perfectly resolve the issue and will lead to socket leak.
I will fix the issue soon.
Thank you for reporting the bug!

@streamingsystems
Copy link
Author

streamingsystems commented May 15, 2023 via email

@winshining
Copy link
Owner

Fixed, please fetch the latest code to test.
I will take a look at proxy protocol issue you have posted to me in my spare time.

Executor-Cheng added a commit to Executor-Cheng/nginx-http-flv-module that referenced this issue Aug 1, 2023
* [fix] fixed a crash bug resulted from 'reload' (winshining#238).

* [fix] fixed 'proxy protocol not working' bug (winshining#239).

* [fix] fixed a typo.

* [fix] fixed an endian bug in mp4 vod.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants