crash with ngx.io #2

Closed
alexandrlevashov opened this issue Mar 16, 2019 · 17 comments

alexandrlevashov commented Mar 16, 2019

Hi,

Your module looks very useful. Thank you for your work.

I built the library with OpenResty 1.15.8.1rc1
https://github.com/openresty/docker-openresty/blob/master/xenial/Dockerfile

The module configuration is the default (as on the main page).

And I constantly get crashes with concurrent requests.

My use case is many clients writing and reading files simultaneously (files range from 500 bytes to 1 GB), and every 1-5 minutes nginx crashes, always with the same stack trace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ngx_http_lua_io_file_finalize (r=0x3730333d646e6f63, ctx=0x7ff24166e4e0)
at /tmp/lua-io-nginx-module/src/ngx_http_lua_io_module.c:1492
1492 *ctx->cleanup = NULL;
[Current thread is 1 (Thread 0x7ff257c31740 (LWP 99))]

With the standard Lua io it works as expected.

I can provide more info if you tell me what else to gather.

Thank you. BR Alex.

alexandrlevashov changed the title from "crash with" to "crash with ngx.io" on Mar 16, 2019
alexandrlevashov (Author) commented:

Here are all the threads' stack traces:

(gdb) thread apply all where

Thread 17 (Thread 0x7ff245cde700 (LWP 108)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff245cde700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 16 (Thread 0x7ff2454dd700 (LWP 109)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2454dd700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 15 (Thread 0x7ff242cd8700 (LWP 114)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff242cd8700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 14 (Thread 0x7ff2434d9700 (LWP 113)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2434d9700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 13 (Thread 0x7ff2424d7700 (LWP 115)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2424d7700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 12 (Thread 0x7ff247ce2700 (LWP 104)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff247ce2700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7ff2444db700 (LWP 111)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2444db700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7ff243cda700 (LWP 112)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff243cda700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7ff246ce0700 (LWP 106)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff246ce0700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7ff244cdc700 (LWP 110)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff244cdc700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7ff2474e1700 (LWP 105)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2474e1700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7ff2464df700 (LWP 107)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2464df700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7ff248ce4700 (LWP 102)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff248ce4700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7ff2484e3700 (LWP 103)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2484e3700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ff2494e5700 (LWP 101)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2494e5700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ff249ce6700 (LWP 100)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff249ce6700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ff257c31740 (LWP 99)):
#0 ngx_http_lua_io_file_finalize (r=0x3730333d646e6f63, ctx=0x7ff24166e4e0) at /tmp/lua-io-nginx-module/src/ngx_http_lua_io_module.c:1492
#1 0x000000000049ea6e in ngx_http_free_request (r=r@entry=0xaac8bc0, rc=0) at src/http/ngx_http_request.c:3508
#2 0x000000000049ecb0 in ngx_http_close_request (rc=rc@entry=0, r=0xaac8bc0) at src/http/ngx_http_request.c:3479
#3 0x000000000049f10e in ngx_http_lingering_close_handler (rev=<optimized out>) at src/http/ngx_http_request.c:3346
#4 0x0000000000489a8b in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:902
#5 0x000000000048024b in ngx_process_events_and_timers (cycle=cycle@entry=0xd22a50) at src/event/ngx_event.c:252
#6 0x0000000000487bc2 in ngx_worker_process_cycle (cycle=0xd22a50, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:816
#7 0x00000000004860ec in ngx_spawn_process (cycle=cycle@entry=0xd22a50, proc=proc@entry=0x487b50 <ngx_worker_process_cycle>, data=data@entry=0x2, name=name@entry=0x787b5c "worker process",
respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#8 0x00000000004872c4 in ngx_start_worker_processes (cycle=cycle@entry=0xd22a50, n=8, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:397
#9 0x0000000000488674 in ngx_master_process_cycle (cycle=cycle@entry=0xd22a50) at src/os/unix/ngx_process_cycle.c:136
#10 0x000000000045ff8e in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:382

tokers (Owner) commented Mar 16, 2019

Thanks for the report. I will investigate it as soon as possible.

tokers added the "help wanted" (Extra attention is needed) label on Mar 16, 2019
tokers (Owner) commented Mar 17, 2019

@alexandrlevashov
Could you provide the test arguments? Like the headers, the request body size.

I cannot reproduce this problem for now.

tokers (Owner) commented Mar 17, 2019

@alexandrlevashov
By the way, did you enable the lingering_close directive? I see the ngx_http_lingering_close_handler function in the stack.

tokers pinned this issue on Mar 18, 2019
tokers unpinned this issue on Mar 18, 2019
alexandrlevashov (Author) commented Mar 19, 2019

Could you provide the test arguments? Like the headers, the request body size.

I could not capture the arguments yet. It is not clear, since the queries are not generalized and there are many internal queries with capture and proxy_pass/upstream both to itself and outside.

I tried to implement a synthetic test, but it worked without a problem. The system where it crashes is highly loaded; it constantly transfers ~1 Gbit/sec.

I will continue with synthetic tests later to figure out the case.

did you enable the lingering_close directive?

It is on by default. Should I disable it? As I remember, the nginx documentation does not recommend that for production systems.

tokers (Owner) commented Mar 20, 2019

It is on by default. Should I disable it? As I remember, the nginx documentation does not recommend that for production systems.

Well, it's friendly for your clients if you enable lingering_close.

tokers (Owner) commented Mar 20, 2019

@alexandrlevashov
Anyway, I will try to capture this issue with Mozilla rr on my local machine.

tokers (Owner) commented Mar 24, 2019

With the default configuration I can't reproduce this problem at all. More information is required to solve it; could you tell me how your clients send requests to Nginx?

z-trade commented Mar 31, 2019

The request model is the following. There are PUT/HEAD/GET/DELETE requests. Most PUT requests write file data locally and send it to another service using this library:
https://github.com/ledgetech/lua-resty-http

so chunks are read, written to a local file, and forwarded to the other service.

GET requests read the file and "print" the data to the output.

The chunk size is 65536 bytes.

PUT is the most interesting case, since the HTTP request that forwards the data is sent to a local "proxy_pass" location, which passes the query on to an external service.

So the PUT sequence is as follows (a rough Lua sketch follows the list):

  1. Receive the PUT request
  2. Open a local file for writing
  3. Connect to the local (127.0.0.1:80/local_proxy_pass) location, which uses proxy_pass, via the resty.http library
    3.1. An iterator function is passed as the body reader, which:
    3.1.1. receives a chunk of data with socket:receive(current_chunk_size)
    3.1.2. writes the data to the local file
    3.1.3. returns the data
  4. When all data has been read, close the file
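
To make this concrete, here is a minimal, hedged Lua sketch of that flow. It assumes lua-resty-http and this module's ngx.io as described above; filename, the chunk size, and the proxy URI are placeholders, and the real production handler also tracks the remaining content length and handles errors and HTTPS:

-- Hedged sketch of the PUT flow described above, not the production code.
local http   = require "resty.http"
local ngx_io = require "ngx.io"

local CHUNK_SIZE = 65536

local sock = assert(ngx.req.socket())          -- raw downstream request body
local file, err = ngx_io.open(filename, "w")   -- local copy of the body
if not file then
    return nil, err
end

-- Iterator handed to resty.http as the request body: read a chunk from the
-- client, mirror it into the local file, then return it to be forwarded.
local function body_reader()
    local chunk, _, partial = sock:receive(CHUNK_SIZE)
    chunk = chunk or partial                   -- the last chunk may be short
    if not chunk or chunk == "" then
        return nil                             -- end of the request body
    end
    local bytes, werr = file:write(chunk)
    if not bytes then
        ngx.log(ngx.ERR, "local write failed: ", werr)
    end
    return chunk
end

local httpc = http.new()
local res, cerr = httpc:request_uri("http://127.0.0.1/local_proxy_pass", {
    method = "PUT",
    body   = body_reader,                      -- streamed chunk by chunk
})
if not res then
    ngx.log(ngx.ERR, "forwarding failed: ", cerr)
end

file:close()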

So it might be related to this not-so-straightforward logic. But I didn't face the issue in my test environment either; it only occurs on highly loaded production servers.

Both incoming and external requests are secured with HTTPS.

tokers (Owner) commented Apr 1, 2019

@z-trade OK. I will try to strengthen the stress test.

tokers (Owner) commented Apr 9, 2019

@z-trade @alexandrlevashov
I have successfully reproduced this issue and found the culprit.

Due to a programming fault, an expected metatable wasn't set, so when the GC ran, the memory chunk was freed silently without calling the __gc function; therefore a "use-after-free" occurred during the finalization of the request.

I'm working on a fix; although the crash stack is not the same, I think the cause is similar.
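
To illustrate the failure mode in isolation, here is a tiny standalone Lua 5.1/LuaJIT sketch (for illustration only, not the module's actual C code): a userdata is only finalized if its metatable actually carries __gc; otherwise it is collected silently and any externally registered cleanup hook stays behind.

-- Standalone illustration only; the real bug is on the C side of this module.
local with_gc = newproxy(true)                  -- userdata with a metatable
getmetatable(with_gc).__gc = function()
    print("__gc ran: the cleanup handler would be detached here")
end
with_gc = nil
collectgarbage("collect")                       -- prints the message above

local without_gc = newproxy(false)              -- userdata with no metatable
without_gc = nil
collectgarbage("collect")                       -- freed silently, no finalizer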

tokers added the "bug" (Something isn't working) label and removed the "help wanted" (Extra attention is needed) label on Apr 9, 2019
tokers (Owner) commented Apr 9, 2019

I have fixed the problem on my local machine, but a network problem is preventing me from pushing the fix to GitHub. I will push it ASAP!

tokers added a commit that referenced this issue on Apr 9, 2019:
The ngx_http_lua_io_file_ctx_metatable_key doesn't hold the io file object ctx metatable correctly, so the expected __gc method wouldn't be called while freeing the file_ctx object (a userdata chunk); instead it was just freed silently. Consequently, our cleanup handler was still hanging on the r->cleanup linked list (it would have been discarded if the __gc method had been called), and it was invoked when Nginx was freeing the current request. The handler would try to access the already freed userdata chunk, causing the "use-after-free" problem, and the process might crash randomly.

See #2 on GitHub for the details.
tokers (Owner) commented Apr 9, 2019

@z-trade @alexandrlevashov
Hi!

I have pushed the fix to this repository; see ed5decf for the details. Just try it and see whether the crash still occurs!

tokers added the "good first issue" (Good for newcomers) label on Apr 9, 2019
alexandrlevashov (Author) commented:

Hi!

Just tried it for several hours on the highly loaded server. No crashes anymore. Good job!

I will run it for several days and then come back with the final results.

alexandrlevashov (Author) commented:

Hi!

So there are no crashes anymore, but the following error occurs very often (I even assume every time):
close() failed (9: Bad file descriptor)

This is produced at the end of the complex workflow described above, after the main data has already been written and re-sent and a small piece of data should be placed in a separate file. The exact code:

local io = require "ngx.io"

local file, err = io.open(filename, "w+")
if err then return nil, err end

for name, value in pairs(headers) do
    if value ~= nil then
        local res, err = file:write(name:lower() .. ": " .. value .. "\n")
        if not res then return nil, err end
    end
end
return file:close()

The final file:close() is what returns the "close() failed (9: Bad file descriptor)" error.

tokers (Owner) commented Apr 10, 2019

So there are no crashes anymore, but the following error occurs very often (I even assume every time): close() failed (9: Bad file descriptor)

Hmm, I think this is really a separate problem and we shouldn't continue in this thread. Would you mind creating a new issue so we can discuss it there?

alexandrlevashov (Author) commented:

Sure.
