Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pubsub: Node process terminates "free(): double free detected in tcache 2" #212

Closed
dioptre opened this issue Oct 21, 2019 · 19 comments
Closed
Labels
bug Something isn't working

Comments

@dioptre
Copy link
Contributor

dioptre commented Oct 21, 2019

After about 24 hours I get this:

On Node

Aborted
free(): double free detected in tcache 2
npm ERR! code ELIFECYCLE
npm ERR! errno 134
npm ERR! XXX@0.0.1 debug: DEBUG=* node index.js
npm ERR! Exit status 134
npm ERR!
npm ERR! Failed at the XXX@0.0.1 debug script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

OS Linux 5.1.7
Node v10.15.3
Npm v6.12.0

13 verbose stack Exit status 134
13 verbose stack at EventEmitter. (/home/a/.nvm/versions/node/v10.15.3/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:332:16)
13 verbose stack at EventEmitter.emit (events.js:189:13)
13 verbose stack at ChildProcess. (/home/a/.nvm/versions/node/v10.15.3/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
13 verbose stack at ChildProcess.emit (events.js:189:13)
13 verbose stack at maybeClose (internal/child_process.js:970:16)
13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5)

Found this running gdb...

double free or corruption (fasttop)

Thread 1 "node" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) run index.js

@dioptre dioptre changed the title free(): double free detected in tcache 2 Node process terminates "free(): double free detected in tcache 2" Oct 21, 2019
@ghost
Copy link

ghost commented Oct 26, 2019

I cannot do anything with that information. You'll have to minimize the problem to the base minimum, for instance you should try without Cluster and you have to stress the server to crash earlier than 24 hours

@ghost
Copy link

ghost commented Oct 26, 2019

This bug report is missing essentially everything to go by -> I have no idea what features you are using, SSL or not, Cluster or not, pub/sub or not, etc. It really only says "I crashed". I can't make anything of this.

@ghost
Copy link

ghost commented Oct 26, 2019

You're not even telling what version you are using. How do you expect me to even begin looking at this?

@dioptre
Copy link
Contributor Author

dioptre commented Oct 28, 2019

Yeah I can see you need more info.

uWebsockets 16.1

Seems like it might be due to websocket expiring - im running sockets that dont timeout (=0).

I'll try and find more info in a trace when I have time.

@ghost
Copy link

ghost commented Oct 28, 2019

You need to provide exactly what you are doing. SSLApp or App, pub/sub or not, what functions you use, Cluster or not, etc. etc.

Then, the best would be to build the binary with AddressSanitizer which will tell me exactly what happens with these kind of issues.

I can tell you how to build with AddressSanitizer if you want to

@dioptre
Copy link
Contributor Author

dioptre commented Oct 28, 2019

SSLApp
No pub
No Cluster

Forgot to send backtrace too:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7aaf535 in __GI_abort () at abort.c:79
#2 0x00007ffff7b06508 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7c1128d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff7b0cc1a in malloc_printerr (str=str@entry=0x7ffff7c12f58 "free(): double free detected in tcache 2") at malloc.c:5341
#4 0x00007ffff7b0e6fd in _int_free (av=0x7ffff7c48c40 <main_arena>, p=0x29ca530, have_lock=) at malloc.c:4193
#5 0x00007ffff5172569 in ?? () from /home/a/projects/bb/bbs/node_modules/uWebSockets.js/uws_linux_x64_64.node
#6 0x00007ffff5172f2e in ?? () from /home/a/projects/bb/bbs/node_modules/uWebSockets.js/uws_linux_x64_64.node
#7 0x00007ffff517d09a in ?? () from /home/a/projects/bb/bbs/node_modules/uWebSockets.js/uws_linux_x64_64.node
#8 0x00007ffff5178505 in auto uWS::WebSocketContext<true, true>::init()::{lambda(auto:1*, char*, int)#1}::__invoke<us_socket_t>(us_socket_t*, char*, int) ()
from /home/a/projects/bb/bbs/node_modules/uWebSockets.js/uws_linux_x64_64.node
#9 0x00007ffff518e900 in ssl_on_data () from /home/a/projects/bb/bbs/node_modules/uWebSockets.js/uws_linux_x64_64.node
#10 0x0000000000a5a5c8 in uv__io_poll (loop=loop@entry=0x24adfe0 <default_loop_struct>, timeout=1445) at ../deps/uv/src/unix/linux-core.c:375
#11 0x0000000000a4a21b in uv_run (loop=0x24adfe0 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:370
#12 0x00000000008e6f45 in node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocatorstd::string > const&, std::vector<std::string, std::allocatorstd::string > const&) ()
#13 0x00000000008e5239 in node::Start(int, char**) ()
#14 0x00007ffff7ab109b in __libc_start_main (main=0x89ec50

, argc=2, argv=0x7fffffffd918, init=, fini=,
rtld_fini=, stack_end=0x7fffffffd908) at ../csu/libc-start.c:308
#15 0x000000000089ed85 in _start ()

Yes please tell me how to AddressSanitizer! Thanks! :)

@dioptre
Copy link
Contributor Author

dioptre commented Oct 28, 2019

Btw (for others too) I setup a debug branch for testing/developing:
https://github.com/dioptre/uWebSockets.js/tree/debug

And had to run:

LD_PRELOAD=libclang_rt.asan-x86_64.so gdb node

Then in gdb:

run index.js

@ghost
Copy link

ghost commented Oct 29, 2019

You don't need gdb but you need preload with nodejs. I made changes to your build

@ghost
Copy link

ghost commented Oct 29, 2019

What you report in this bug is not the cause but rather the consequences, asan will report the cause with lots of info in text format

@dioptre
Copy link
Contributor Author

dioptre commented Oct 29, 2019

Thanks mate found it:

https://gist.github.com/dioptre/08fb3feb56e91f827a331de423d9103e

For others who might get here (might be worthwhile to merge the debug branch in the main repo mate):

DEBUG=* LD_PRELOAD=/usr/lib/llvm-7/lib/clang/7.0.1/lib/linux/libclang_rt.asan-x86_64.so node index.js

@ghost
Copy link

ghost commented Oct 29, 2019

You are calling subscribe?

@dioptre
Copy link
Contributor Author

dioptre commented Oct 29, 2019

Yes, I subscribe to a channel, but I haven't updated the pub yet (I do not use it yet). Since you fixed it I only added sockets to a set and have been using send to each using a for loop temporarily. Has worked fine up until recently but I've also had to move to SSL. Once a ws disconnects, I remove them from the set. I've commented all this set logic out so don't think that's interfering with it.

I got another free error (looks like its trimtree):
https://gist.github.com/dioptre/aed0df0a58649128e734d01f590d209f

@dioptre
Copy link
Contributor Author

dioptre commented Oct 29, 2019

This also looks different (heap use after free):
https://gist.github.com/dioptre/9a1ed85440578670bd981e5994aed7df

@ghost
Copy link

ghost commented Oct 29, 2019

All of these are about pub sub. It makes sense, it is new and needs tests and more fixing

@dioptre
Copy link
Contributor Author

dioptre commented Oct 29, 2019

OK let me know how I can help. I'll be removing all subscribe actions again for now.

@dioptre
Copy link
Contributor Author

dioptre commented Oct 30, 2019

Also can test anytime you need after you make updates.

@dioptre
Copy link
Contributor Author

dioptre commented Oct 30, 2019

Also can confirm once the subscribe is removed the system is stable.

@ghost
Copy link

ghost commented Nov 1, 2019

Perfect. Pubsub functions needs tests and some more testing

@ghost ghost changed the title Node process terminates "free(): double free detected in tcache 2" Pubsub: Node process terminates "free(): double free detected in tcache 2" Nov 1, 2019
@ghost ghost mentioned this issue Nov 3, 2019
@ghost ghost added the bug Something isn't working label Nov 8, 2019
@ghost
Copy link

ghost commented Nov 8, 2019

This should be bug #222, could you check that?

@ghost ghost closed this as completed Nov 8, 2019
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant