crash with ngx.io #2

Closed
alexandrlevashov opened this issue Mar 16, 2019 · 17 comments

alexandrlevashov commented Mar 16, 2019

Hi,

Your module looks very useful. Thank you for your work.

I built the library with OpenResty 1.15.8.1rc1
https://github.com/openresty/docker-openresty/blob/master/xenial/Dockerfile

The module configuration is the default (as on the main page).

And I constantly get crashes with concurrent requests.

My use case is many clients writing and reading files simultaneously (files range from 500 bytes to 1 GB), and every 1-5 minutes nginx crashes, always with the same stack trace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ngx_http_lua_io_file_finalize (r=0x3730333d646e6f63, ctx=0x7ff24166e4e0)
at /tmp/lua-io-nginx-module/src/ngx_http_lua_io_module.c:1492
1492 *ctx->cleanup = NULL;
[Current thread is 1 (Thread 0x7ff257c31740 (LWP 99))]

With the standard Lua io it works as expected.

I can provide more info if you tell me what else to gather.

Thank you. BR Alex.

alexandrlevashov changed the title from "crash with" to "crash with ngx.io" on Mar 16, 2019
alexandrlevashov (Author) commented:

Here are all the threads' stack traces:

(gdb) thread apply all where

Thread 17 (Thread 0x7ff245cde700 (LWP 108)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff245cde700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 16 (Thread 0x7ff2454dd700 (LWP 109)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2454dd700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 15 (Thread 0x7ff242cd8700 (LWP 114)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff242cd8700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 14 (Thread 0x7ff2434d9700 (LWP 113)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2434d9700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 13 (Thread 0x7ff2424d7700 (LWP 115)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2424d7700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 12 (Thread 0x7ff247ce2700 (LWP 104)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff247ce2700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7ff2444db700 (LWP 111)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2444db700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7ff243cda700 (LWP 112)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff243cda700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7ff246ce0700 (LWP 106)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff246ce0700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7ff244cdc700 (LWP 110)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff244cdc700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7ff2474e1700 (LWP 105)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2474e1700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7ff2464df700 (LWP 107)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2464df700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7ff248ce4700 (LWP 102)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff248ce4700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7ff2484e3700 (LWP 103)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2484e3700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ff2494e5700 (LWP 101)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff2494e5700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ff249ce6700 (LWP 100)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000048b9a0 in ngx_thread_cond_wait (cond=cond@entry=0xd25098, mtx=mtx@entry=0xd25058, log=0xd22a68) at src/os/unix/ngx_thread_cond.c:63
#2 0x000000000048b26f in ngx_thread_pool_cycle (data=0xd25058) at src/core/ngx_thread_pool.c:315
#3 0x00007ff2575fe6ba in start_thread (arg=0x7ff249ce6700) at pthread_create.c:333
#4 0x00007ff25695941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ff257c31740 (LWP 99)):
#0 ngx_http_lua_io_file_finalize (r=0x3730333d646e6f63, ctx=0x7ff24166e4e0) at /tmp/lua-io-nginx-module/src/ngx_http_lua_io_module.c:1492
#1 0x000000000049ea6e in ngx_http_free_request (r=r@entry=0xaac8bc0, rc=0) at src/http/ngx_http_request.c:3508
#2 0x000000000049ecb0 in ngx_http_close_request (rc=rc@entry=0, r=0xaac8bc0) at src/http/ngx_http_request.c:3479
#3 0x000000000049f10e in ngx_http_lingering_close_handler (rev=<optimized out>) at src/http/ngx_http_request.c:3346
#4 0x0000000000489a8b in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:902
#5 0x000000000048024b in ngx_process_events_and_timers (cycle=cycle@entry=0xd22a50) at src/event/ngx_event.c:252
#6 0x0000000000487bc2 in ngx_worker_process_cycle (cycle=0xd22a50, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:816
#7 0x00000000004860ec in ngx_spawn_process (cycle=cycle@entry=0xd22a50, proc=proc@entry=0x487b50 <ngx_worker_process_cycle>, data=data@entry=0x2, name=name@entry=0x787b5c "worker process",
respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#8 0x00000000004872c4 in ngx_start_worker_processes (cycle=cycle@entry=0xd22a50, n=8, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:397
#9 0x0000000000488674 in ngx_master_process_cycle (cycle=cycle@entry=0xd22a50) at src/os/unix/ngx_process_cycle.c:136
#10 0x000000000045ff8e in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:382

tokers (Owner) commented Mar 16, 2019

Thanks for the report. I will investigate it as soon as possible.

tokers added the "help wanted" (Extra attention is needed) label on Mar 16, 2019
tokers (Owner) commented Mar 17, 2019

@alexandrlevashov
Could you provide the test arguments? Like the headers, the request body size.

I cannot reproduce this problem for now.

tokers (Owner) commented Mar 17, 2019

@alexandrlevashov
By the way, did you enable the lingering_close directive? I see the ngx_http_lingering_close_handler function in the stack.

tokers pinned this issue on Mar 18, 2019
tokers unpinned this issue on Mar 18, 2019
alexandrlevashov (Author) commented Mar 19, 2019

Could you provide the test arguments? Like the headers, the request body size.

I could not capture the arguments yet. It is not clear, since the queries are not generalized and there are many internal queries with capture and proxy_pass/upstream both to itself and outside.

I tried to implement a synthetic test, but it worked without a problem. The system where it crashes is highly loaded; it constantly transfers ~1 Gbit/sec.

I will continue with synthetic tests later to figure out the case.

did you enable the lingering_close directive?

It is on by default. Should I disable it? As I remember, the nginx documentation does not recommend that for production systems.

tokers (Owner) commented Mar 20, 2019

It is on by default. Should I disable it? As I remember, the nginx documentation does not recommend that for production systems.

Well, it's friendly for your clients if you enable lingering_close.

tokers (Owner) commented Mar 20, 2019

@alexandrlevashov
Anyway, I will try to capture this issue with Mozilla rr on my local machine.

tokers (Owner) commented Mar 24, 2019

With the default configuration I can't reproduce this problem at all. More information is required to solve it; could you tell me how your clients send requests to Nginx?

z-trade commented Mar 31, 2019

The request model is the following. There are PUT/HEAD/GET/DELETE requests. Most PUT requests write file data locally and send it to another service using this library:
https://github.com/ledgetech/lua-resty-http

so chunks are read, written to a local file, and forwarded to the other service.

GET requests read the file and "print" the data to the output.

The chunk size is 65536 bytes.

PUT is the most interesting case, since the HTTP request that forwards the data is sent to a local "proxy_pass" location, which passes the query on to an external service.

So the PUT sequence is as follows (a rough Lua sketch follows the list):

  1. Receive the PUT request
  2. Open a local file for writing
  3. Connect to the local (127.0.0.1:80/local_proxy_pass) location, which uses proxy_pass, via the resty.http library
    3.1. An iterator function is passed as the body reader, which:
    3.1.1. receives a chunk of data with socket:receive(current_chunk_size)
    3.1.2. writes the data to the local file
    3.1.3. returns the data
  4. When all data has been read, close the file
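
To make this concrete, here is a minimal, hedged Lua sketch of that flow. It assumes lua-resty-http and this module's ngx.io as described above; filename, the chunk size, and the proxy URI are placeholders, and the real production handler also tracks the remaining content length and handles errors and HTTPS:

-- Hedged sketch of the PUT flow described above, not the production code.
local http   = require "resty.http"
local ngx_io = require "ngx.io"

local CHUNK_SIZE = 65536

local sock = assert(ngx.req.socket())          -- raw downstream request body
local file, err = ngx_io.open(filename, "w")   -- local copy of the body
if not file then
    return nil, err
end

-- Iterator handed to resty.http as the request body: read a chunk from the
-- client, mirror it into the local file, then return it to be forwarded.
local function body_reader()
    local chunk, _, partial = sock:receive(CHUNK_SIZE)
    chunk = chunk or partial                   -- the last chunk may be short
    if not chunk or chunk == "" then
        return nil                             -- end of the request body
    end
    local bytes, werr = file:write(chunk)
    if not bytes then
        ngx.log(ngx.ERR, "local write failed: ", werr)
    end
    return chunk
end

local httpc = http.new()
local res, cerr = httpc:request_uri("http://127.0.0.1/local_proxy_pass", {
    method = "PUT",
    body   = body_reader,                      -- streamed chunk by chunk
})
if not res then
    ngx.log(ngx.ERR, "forwarding failed: ", cerr)
end

file:close()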

So it might be related to this not-so-straightforward logic. But I didn't face the issue in my test environment either; it only occurs on highly loaded production servers.

Both incoming and external requests are secured with HTTPS.

tokers (Owner) commented Apr 1, 2019

@z-trade OK. I will try to strengthen the stress test.

tokers (Owner) commented Apr 9, 2019

@z-trade @alexandrlevashov
I have successfully reproduced this issue and found the culprit.

Due to a programming fault, an expected metatable wasn't set, so when the GC ran, the memory chunk was freed silently without calling the __gc function; therefore a "use-after-free" occurred during the finalization of the request.

I'm working on a fix; although the crash stack is not the same, I think the cause is similar.
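
To illustrate the failure mode in isolation, here is a tiny standalone Lua 5.1/LuaJIT sketch (for illustration only, not the module's actual C code): a userdata is only finalized if its metatable actually carries __gc; otherwise it is collected silently and any externally registered cleanup hook stays behind.

-- Standalone illustration only; the real bug is on the C side of this module.
local with_gc = newproxy(true)                  -- userdata with a metatable
getmetatable(with_gc).__gc = function()
    print("__gc ran: the cleanup handler would be detached here")
end
with_gc = nil
collectgarbage("collect")                       -- prints the message above

local without_gc = newproxy(false)              -- userdata with no metatable
without_gc = nil
collectgarbage("collect")                       -- freed silently, no finalizer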

tokers added the "bug" (Something isn't working) label and removed the "help wanted" (Extra attention is needed) label on Apr 9, 2019
tokers (Owner) commented Apr 9, 2019

I have fixed the problem on my local machine, but a network problem is preventing me from pushing the fix to GitHub. I will push it ASAP!

tokers added a commit that referenced this issue on Apr 9, 2019:
The ngx_http_lua_io_file_ctx_metatable_key doesn't hold the io file object ctx metatable correctly, so the expected __gc method wouldn't be called while freeing the file_ctx object (a userdata chunk); instead it was just freed silently. Consequently, our cleanup handler was still hanging on the r->cleanup linked list (it would have been discarded if the __gc method had been called), and it was invoked when Nginx was freeing the current request. The handler would try to access the already freed userdata chunk, causing the "use-after-free" problem, and the process might crash randomly.

See #2 on GitHub for the details.
tokers (Owner) commented Apr 9, 2019

@z-trade @alexandrlevashov
Hi!

I have pushed the fix to this repository; see ed5decf for the details. Just try it and see whether the crash still occurs!

tokers added the "good first issue" (Good for newcomers) label on Apr 9, 2019
alexandrlevashov (Author) commented:

Hi!

Just tried it for several hours on the highly loaded server. No crashes anymore. Good job!

I will run it for several days and then come back with the final results.

alexandrlevashov (Author) commented:

Hi!

So there are no crashes anymore, but the following error occurs very often (I even assume every time):
close() failed (9: Bad file descriptor)

This is produced at the end of the complex workflow described above, after the main data has already been written and re-sent and a small piece of data should be placed in a separate file. The exact code:

local io = require "ngx.io"

local file, err = io.open(filename, "w+")
if err then return nil, err end

for name, value in pairs(headers) do
    if value ~= nil then
        local res, err = file:write(name:lower() .. ": " .. value .. "\n")
        if not res then return nil, err end
    end
end
return file:close()

The final file:close() is what returns the "close() failed (9: Bad file descriptor)" error.

tokers (Owner) commented Apr 10, 2019

So there are no crashes anymore, but the following error occurs very often (I even assume every time): close() failed (9: Bad file descriptor)

Hmm, I think this is really a separate problem and we shouldn't continue in this thread. Would you mind creating a new issue so we can discuss it there?

alexandrlevashov (Author) commented:

Sure.
