High memory usage (or memleak?) #43

Closed
nicokaiser opened this Issue Mar 21, 2012 · 107 comments

@nicokaiser
Contributor

nicokaiser commented Mar 21, 2012

Hi all!

I created a small example for the memory leak my ws (or WebSocket.IO) server has.

This small server only accepts WebSocket connections and sends one message.
Client count and memory usage is displayed every 2 seconds.

The client opens 10,000 connections; then every minute 2,000 new connections are opened and 2,000 are closed, so the count stays at about 10,000.

https://gist.github.com/2152933
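The 2-second reporting mentioned above can be sketched like this (a simplified stand-in, not the exact gist code; `formatMemoryReport` and `clientCount` are illustrative names):

```javascript
// Sketch of the kind of 2-second reporter server.js uses: log the
// connected client count and the process memory figures node exposes.
function formatMemoryReport(clients, mem) {
  const mb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  return `clients=${clients} rss=${mb(mem.rss)}MB ` +
         `heapUsed=${mb(mem.heapUsed)}MB heapTotal=${mb(mem.heapTotal)}MB`;
}

// In the real server this would run on an interval, e.g.:
//   setInterval(() => console.log(formatMemoryReport(clientCount, process.memoryUsage())), 2000);
console.log(formatMemoryReport(0, process.memoryUsage()));
```

The rss figure is what the graphs below track; heapUsed/heapTotal are the V8-managed portion only.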

Memory Graph

The graph is the output of server.js (run on an EC2 "large" instance with node 0.6.13 and ws 0.4.9).

  • Server startup: RSS is about 13 MB (0 clients)
  • When the first 10,000 clients are connected, RSS = 420 MB
  • During the next 10 minutes clients come and go (see client.js), RSS grows to 620 MB
  • Heap usage stays stable
  • Client is stopped => RSS falls to 220 MB (waited 1 minute for GC)
  • After 1 minute, the client is started again => RSS jumps to 440 MB (10,000 clients)
  • During the next minutes, RSS grows again up to 630 MB (after 10 minutes)
  • Client is stopped => RSS falls to 495 MB
  • When I start the client again, RSS usage seems stable (at least compared to the two runs before).

Questions:

  1. 400 MB for 10,000 clients (with no data attached) is a lot. I accept that JS, as an interpreted language, is not as memory-optimized as C. BUT, why does opening and closing 20,000 connections (during the 10-minute period) consume another 200 MB?
  2. The process is at about 30% CPU, so the GC has the chance to kick in (and, ws uses nextTick, so the GC really has a chance)
  3. Why is the GC unable to free the memory after the second run? It can't be the Buffer/SlowBuffer problem (fragmented small Buffers in different 8k SlowBuffers), as there are no Buffers used anymore...
  4. Why does the RSS usage remain pretty stable after the first two runs?
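The Buffer/SlowBuffer effect mentioned in question 3 is easy to demonstrate in isolation. In 0.6-era Node, small Buffers were slices of shared 8 KB SlowBuffer slabs; the same pooling still exists in modern Node via `Buffer.poolSize`, so a tiny live Buffer can pin a whole slab, inflating RSS without showing up as collectible heap garbage. A minimal sketch:

```javascript
// Small Buffers are carved out of a shared pool slab (8 KB by default).
// Keeping even a 16-byte slice alive keeps the entire slab alive, which
// shows up as RSS / external memory rather than as V8 heap garbage.
const small = Buffer.allocUnsafe(16); // pooled allocation
console.log(Buffer.poolSize);          // 8192 by default
console.log(small.byteLength);         // 16
console.log(small.buffer.byteLength);  // 8192: the whole backing slab
```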

The effect is the same (but slower) with only 1,000 client connections per minute, but things get even worse when I don't stop the client after 10 minutes. Our production server runs at about 30-40% CPU constantly (30k connections), 1-5% at night (1-2k connections), but is never completely idle. The growth of RSS usage never seems to stop.

On the production server, RSS grows until node crashes (0.4.x crashes at 1 GB) or the process gets killed by the system (0.6 supports more than 1 GB).

I'll try two things tomorrow:

  • The same setup with Node 0.4 (as 0.4 seems much better than 0.6 with regard to memory consumption), and
  • with different WebSocket libraries, e.g. "websock", which is not 100% stable but only consumes one third (still leaking though), and a variant of WebSocket.IO with the original Socket.IO (not ws!) hybi parsers, see my "nows" branch.

The setup is a default Amazon EC2 large instance, so this must be an issue for anyone who runs a WebSocket server using ws (or most likely also WebSocket.IO with ws receivers) with some traffic. I refuse to believe Node is not capable of serving this.

Or am I missing something?

Nico

@nicokaiser

Contributor

nicokaiser commented Mar 21, 2012

Update: the 60-minute version is still running, I'll post the results tomorrow. RSS seems to stabilize around 900 MB if I don't stop the clients. Come on, 900 MB (500 of which is garbage!)?!

@nicokaiser

Contributor

nicokaiser commented Mar 22, 2012

This is 60 minutes with 1 minute pause after each 60 minute run:

ws 60 minutes

Maybe the "leak" is no leak but very very high memory consumption...

@einaros

Contributor

einaros commented Mar 22, 2012

Great report, Nico. You are onto something when you say that each connection consumes a lot of memory. I pool rather aggressively, in order to ensure high speed. The 'ws' library is written to be as fast as possible, and will at present outperform all other node.js websocket libraries, for small data amounts as well as large.

That said, I will deal with this, to push the memory use down. An alternative is to introduce a configuration option which allows the user to favor either speed or memory consumption.

In your test above, are you sending any data for the 10k clients? How large would you say the largest packet is?

@nicokaiser

Contributor

nicokaiser commented Mar 22, 2012

This is WebSocket.IO 10-minute (like the very first chart) with Node 0.6.
It's even worse than ws, as it never returns the memory during idle phases.

wsio-test-10min

@nicokaiser nicokaiser closed this Mar 22, 2012

@nicokaiser nicokaiser reopened this Mar 22, 2012

@nicokaiser

Contributor

nicokaiser commented Mar 22, 2012

@einaros The client is the one in the Gist – the client sends 10k at the beginning of each connection.

A configuration option would be amazing! I'll observe the production server, which is running Node 0.6 right now, to see whether the memory usage growth levels off somewhere ;)

I understand that much memory is needed to ensure fast connection handling, however the memory should be freed after some (idle?) time...

@einaros

Contributor

einaros commented Mar 22, 2012

For 10k users sending an initial 10k (in a single payload), the consumption would be at least 10,000 * 10 KB ≈ 100 MB.

I agree that the pools should be released. Adding a set of timers to do that without negatively affecting the performance for high connection counts will require thorough consideration before being implemented.

Thank you for your continued work on this. We'll get to the bottom of it :)
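A hypothetical sketch of what such a timer-based release could look like (this is not ws's actual code; the class name, API, and idle threshold are all made up for illustration):

```javascript
// Hypothetical idle-release pool: hold a slab while a connection is busy,
// drop it after `idleMs` of inactivity, and re-grow lazily on demand.
// This trades some allocation speed for lower steady-state memory.
class IdleReleasePool {
  constructor(size, idleMs) {
    this.size = size;
    this.idleMs = idleMs;
    this.slab = Buffer.allocUnsafe(size);
    this.timer = null;
  }
  acquire() {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    if (!this.slab) this.slab = Buffer.allocUnsafe(this.size); // lazy re-grow
    return this.slab;
  }
  release() {
    // schedule the slab to be dropped if nobody acquires it again in time
    this.timer = setTimeout(() => { this.slab = null; this.timer = null; }, this.idleMs);
    if (this.timer.unref) this.timer.unref(); // don't keep the process alive
  }
}

const pool = new IdleReleasePool(16 * 1024, 60 * 1000);
console.log(pool.acquire().length); // 16384
```

The tricky part einaros alludes to is that with tens of thousands of connections, tens of thousands of timers have their own cost, so the release scheduling itself needs care.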

@nicokaiser

Contributor

nicokaiser commented Mar 22, 2012

Well, I open 10,000 clients at startup, and then open and close 2,000 random clients every minute. So there should be 10,000 clients at all times, but about 20,000 "connection" events (and thus 20,000 initial 10k messages received) during the 10 minutes.

Isn't the pool released after a client disappears?

@einaros

Contributor

einaros commented Mar 22, 2012

Each pool is released as the client closes, yes.

@nicokaiser

Contributor

nicokaiser commented Mar 22, 2012

@einaros then it must be something different (or poor GC), because after each 10-minute period all clients are disconnected, so there should be no buffer left (so even the shared SlowBuffers should be freed by V8)...

@nicokaiser

Contributor

nicokaiser commented Mar 22, 2012

For comparison, this is WebSocket.IO without ws (backported the original Socket.IO hybi modules) and without client tracking (only clientsCount):

wsionows-test-10min

GC seems to run from time to time, but does not catch everything. Maybe interesting for @guille

@einaros

Contributor

einaros commented Mar 22, 2012

@nicokaiser, I wrote the "old" websocket.io hybi parsers as well, so this is all my thing to fix in either case :)

@einaros

Contributor

einaros commented Mar 30, 2012

I've found a couple of issues now, which take care of the leaks visible through the v8 profiler's heap dump. With my recent changes, I can spawn 5k clients to a server, send some data, then disconnect the users, and the resulting heap dump will look more or less as it did before any clients connected.

I can't yet get the RSS to fall back to a reasonable size, but I'm looking into it.

@3rd-Eden

Member

3rd-Eden commented Mar 30, 2012

And you are not leaking any buffers? They show up in RSS and not in the V8 heap.


@einaros

Contributor

einaros commented Mar 30, 2012

@3rd-Eden, js Buffer/SlowBuffer instances should show up in the heap dump. I figured it might be due to allocations in my native extensions, but the results were exactly the same with those disabled.

@nicokaiser

Contributor

nicokaiser commented Mar 30, 2012

@einaros that sounds great! That's exactly the kind of issue I get – growing RSS that doesn't fall back (the difference between heapTotal and RSS grows until the process crashes).

@einaros

Contributor

einaros commented Mar 30, 2012

I can't help but think that the remaining issues are caused by something within node leaking native resources, probably triggered by something I am doing. And I'm guessing that something is the buffer pool. Now the question is how the buffers are handled / used wrong, and how they wind up leaking native allocation blocks.

@nicokaiser

Contributor

nicokaiser commented Mar 30, 2012

On the production system, I disabled BufferPools (by using new Buffer – like BufferPool would do with an initial size of 0 and no growing/shrinking strategy) and validation (by, well, not validating utf8), but this does not solve the problem.

So it must be either the bufferutil (mask) (I cannot test the JS version from the Windows port, as I'm using Node 0.4) or something else...
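For reference, the pooled-versus-unpooled difference that "disabling BufferPools" targets can be seen directly. (This uses today's `Buffer.alloc`/`Buffer.allocUnsafe` API as a stand-in; in 2012 the equivalent was `new Buffer` versus the shared pool.)

```javascript
// allocUnsafe draws small allocations from the shared pool slab;
// alloc always gets its own zero-filled backing store, which is what
// allocating each buffer directly amounts to.
const pooled = Buffer.allocUnsafe(16);
const unpooled = Buffer.alloc(16);
console.log(pooled.buffer.byteLength);   // 8192: the shared pool slab
console.log(unpooled.buffer.byteLength); // 16: its own backing store
```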

@einaros

Contributor

einaros commented Mar 30, 2012

@nicokaiser, I'll have another look at the bufferutil and see what I can find.

@3rd-Eden

Member

3rd-Eden commented Mar 30, 2012

That could certainly be the issue here, as there are also countless reports that socket.io on node 0.4 leaks less memory than socket.io on node 0.6.
So it could be that node is leaking buffers somewhere as well.


@einaros

Contributor

einaros commented Mar 30, 2012

@nicokaiser, I disabled my native extensions again, to no avail.

Will go pick a fight with the buffers now.

@nicokaiser

Contributor

nicokaiser commented Mar 30, 2012

Another observation:

  • When I make the server do something expensive (e.g. a long for-loop, or sending all connected client objects over the REPL) while there are many clients (and many connection open/close operations), the RSS tends to be higher after the expensive (blocking) operation. So something might get queued up and never freed...

@einaros

Contributor

einaros commented Mar 30, 2012

Well it is definitely the buffers. The question is how / why they are released by v8, but the native backing isn't.

@nicokaiser

Contributor

nicokaiser commented Mar 30, 2012

Is there a way to completely avoid the Buffers and e.g. use Strings for testing (I know this is highly inefficient)?

@einaros

Contributor

einaros commented Mar 30, 2012

Well, I'm not saying it's my buffers, just node Buffer buffers. I'll have to find out whether this is something specific to what I'm doing, or whether it's an actual node bug.

@nicokaiser

Contributor

nicokaiser commented Mar 30, 2012

@einaros yes, but on a high-traffic websocket server, ws is the place that naturally generates many, many Buffer objects. So if we could avoid this (by using strings and doing some inefficient, slow operations with them), we could test whether the memory problem disappears.

@einaros

Contributor

einaros commented Mar 30, 2012

You can't avoid Buffer objects for binary network operations.

@3rd-Eden

Member

3rd-Eden commented Mar 30, 2012

Node will transform strings to Buffers automatically anyway, so even if we didn't do binary network operations it would still be impossible (if I remember correctly)


@nicokaiser

Contributor

nicokaiser commented Apr 8, 2012

Hm, ok, none of the optimizations had any effect on the very high memory usage – I assume Node is failing to release unused Buffer memory:

http://cl.ly/1R442g3t2d1T3S152s0i

Do you think compiling node for ia32 (instead of x64) might change something?

@nicokaiser

Contributor

nicokaiser commented May 8, 2012

@crickeys can this make a difference if I'm not using https? (only some crypto functions)

I'm using the bundled libssl (0.9.8 I think).

@crickeys

crickeys commented May 8, 2012

According to this: nodejs/node-v0.x-archive#2653 – yes.
Apparently the crypto functions rely on OpenSSL. However, I tried doing this and still have a gradual memory leak with socket.io and websockets enabled. Those also seem to use the crypto functions, but this didn't seem to help there :(

@crickeys

crickeys commented May 8, 2012

Actually, I may not have properly updated libcrypto, even though I successfully built node with OpenSSL 1.0.0i. It appears the older 0.9.8 libcrypto is being used by my node. Let me keep playing with this; you may want to try it too, as I really don't know what I'm doing here :)

@einaros

Contributor

einaros commented May 11, 2012

@nicokaiser Have you had any memory related crashes lately?

I think it's time to summarize where we're at with this again. With the latest node version and latest ws version – is anyone seeing any actual crashes due to out-of-memory errors?

@nicokaiser

Contributor

nicokaiser commented May 11, 2012

@einaros We changed some things in the infrastructure (split the server, sticky sessions with HAproxy), so that each server has no more than ~10,000 connections. The last 2 weeks look like this for one of the servers (node 0.6.16, ws 0.4.13), memory and clients:

Memory is at about 1.4 GB per server, with about 9k clients.

I'll try to keep the servers running (needed to restart them because of an update after these screenshots) to see if memory rises above 2 GB (when swapping kicks in).

@einaros

Contributor

einaros commented May 11, 2012

The fact that it doesn't keep growing beyond the numbers seen there could suggest that there isn't actually any leakage anymore. At the same time, there could be a tiny leakage (anywhere), which over a greater extent of time than measured here could cause a crash.

At some point I need to build a stress testing environment to see how it reacts to sustained loads from more than 50k clients..

@nicokaiser

Contributor

nicokaiser commented May 11, 2012

Update: the installed version I mentioned adds socket.destroy to every socket.end, see #64. Don't know if this makes a difference; I can change one of the servers to use vanilla ws 0.4.14 next week.

@einaros

Contributor

einaros commented Jun 14, 2012

Since this hasn't been brought back up, I'm (temporarily) concluding that the issue has been covered by the recent fixes. The lingering memory seen in the graphs right now is due to freed memory not being returned to the OS immediately.

Should anyone actually get a process crash due to running out of memory, we'll have to revisit it. Judging from the lack of talk here lately, that hasn't been the case in quite some time.

@einaros einaros closed this Jun 14, 2012

@nicokaiser

Contributor

nicokaiser commented Jun 14, 2012

Confirmed. The process takes a lot of memory over time, but this seems to be a case of "node.js is lazy about cleaning up memory". A server that actually uses the memory it gets is OK, as long as it does not take more than is available.
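One way to distinguish "lazy cleanup" from a real leak is to force a full collection and watch RSS (this is a sketch, assuming the process is started with `node --expose-gc`; RSS that stays high even after gc() is retained natively, not merely uncollected):

```javascript
// Allocate garbage, drop it, force a full GC, and compare RSS before/after.
function rssMB() {
  return Math.round(process.memoryUsage().rss / 1024 / 1024);
}

let junk = [];
for (let i = 0; i < 1000; i++) junk.push(Buffer.alloc(64 * 1024)); // ~64 MB
console.log('with garbage:', rssMB(), 'MB');

junk = null;
if (global.gc) {
  global.gc();
  console.log('after forced gc:', rssMB(), 'MB');
} else {
  console.log('run with --expose-gc to force a collection');
}
```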

@paiboonpa paiboonpa referenced this issue Oct 8, 2012

Closed

memory leak #1015

@nodenstuff

nodenstuff commented Oct 8, 2012

Confirmed that it's happening again. The server eventually crashes after running out of RAM.

node 0.8.9
socket.io 0.9.10
Express 3.0.0rc4

@kanaka

Contributor

kanaka commented Oct 8, 2012

I can confirm that our server uses more and more memory over time and eventually crashes when the memory use gets high. The more connections and data transferred the quicker the crash happens.

node 0.8.9
ws 0.4.21
AWS t1.micro running Ubuntu 12.04.1

@einaros

Contributor

einaros commented Oct 8, 2012

@nodenstuff, thanks for reporting. Since nothing much has been changed in ws over the last few months, I'm not immediately jumping to the conclusion that this is caused by the ws parsers. If there wasn't a crashing leak four months ago, but there is today, that's a regression in another component - or a breaking change in node which ws will have to be updated to work with.

@kanaka, are you using socket.io?

@nicokaiser, how has your situation been lately? Any crashes?

@kanaka

Contributor

kanaka commented Oct 8, 2012

@einaros, no socket.io, it's a very simple server that just uses the ws module directly:

https://github.com/n01se/1110/blob/6c90e0efc3a4afeb099f79d18d471a5936de1d3e/server.js

@nicokaiser

Contributor

nicokaiser commented Oct 8, 2012

@einaros No crashes, but this may be because we reduced the load from our servers and 2 GB memory seem to be enough.

No, I cannot check this at the moment, but I'm pretty sure that, while Node allocates 1.5 GB of memory, one process will crash if I start another process on this server that uses more than 512 MB.

The fact is that Node, starting from 0.6.x, fails to return unused memory to the kernel and keeps the RSS (!!) allocated. That may be no problem if this Node process is the only thing running on the server (and is thus allowed to eat all of its memory), but it's not nice and definitely a bug.

@nicokaiser

Contributor

nicokaiser commented Oct 8, 2012

@nicokaiser

Contributor

nicokaiser commented Oct 8, 2012

@paiboonpa

paiboonpa commented Oct 10, 2012

I have just tried with NodeJS 0.9.2 (with a little modification to socket.io 0.9.10, because it cannot find the socket.io.js client). This problem is still going on. It seems the only way to solve this is to use node 0.4.12. :( I want to use the native cluster more, but it seems I have no other choice...

@nodenstuff

nodenstuff commented Oct 10, 2012

I enabled flashsocket on 0.9.10 and I didn't run out of RAM today. Might just be a fluke, but it's currently holding steady at around 2 GB total in use, which includes redis and everything else running on the server. I will update if that changes.

@nicokaiser

Contributor

nicokaiser commented Oct 11, 2012

@paiboonpa

paiboonpa commented Oct 21, 2012

Good news on the buffer rewrite:
https://groups.google.com/forum/?fromgroups=#!topic/nodejs/Vjg7-VHGrnk

Branch:
https://github.com/joyent/node/tree/crypto-buffers

Bad news:
This branch cannot run with MySQL at all. It showed me "Access denied" even though the username & password were correct. A hack is needed to fix the MySQL issue, and I hope this branch can fix the memory issue.

@paiboonpa

paiboonpa commented Jan 4, 2013

I just set up stud to terminate SSL, so node does not need to process SSL anymore. My memory consumption decreased about 3x. Maybe this bug is related to using wss via https?

@damianobarbati

damianobarbati commented Jun 5, 2018

@nicokaiser how did you solve this in the end? I see very high memory consumption, and when sockets are closed the memory is still used (at least most of it).

@nicokaiser

Contributor

nicokaiser commented Jun 6, 2018

@damianobarbati I simplified the application code to avoid retained references, which already solved the problem. And I switched to uws, which also gave a huge memory advantage.

@damianobarbati

damianobarbati commented Jun 6, 2018

@nicokaiser could you show a simplified example about "avoiding retained references"?
I typically associate data with the ws object (i.e. ws.token = token), but shouldn't the close event and the terminate method be enough for the GC to clean up the references?

Thanks for helping!

@nicokaiser

Contributor

nicokaiser commented Jul 27, 2018
