
memory leak example #1122

Open
yannick opened this issue May 27, 2015 · 16 comments

Comments

@yannick (Contributor) commented May 27, 2015

I have created a vibe.d application which calls a few backend services (it shovels data into Amazon Kinesis). Unfortunately, once put into production on a heavy-traffic website, it starts to eat all my memory.

Example is here: https://github.com/yannick/vibed-crash-example/tree/memleak
(I just put it into a branch and reused the example from #1099.) More or less it just uses requestHTTP within a server to call backend services.

Any help appreciated.

@yannick (Contributor, Author) commented May 27, 2015

When you replace requestHTTP with std.net.curl.get, it's slow as hell, but memory usage stays more constant. So it might be a dupe of #1114.

@etcimon (Contributor) commented May 28, 2015

Are you calling HTTPServerResponse.dropBody()? The destructor doesn't actually finalize anything.
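For reference, a minimal sketch of the pattern being suggested, using the callback overload of requestHTTP (the backend URL is hypothetical):

```d
import vibe.http.client : requestHTTP;
import vibe.core.log : logInfo;

void callBackend()
{
    requestHTTP("http://backend.example/kinesis", // hypothetical backend URL
        (scope req) {
            // configure method/headers here if needed
        },
        (scope res) {
            logInfo("status: %d", res.statusCode);
            // Tell vibe.d the body will not be read, so the connection
            // and its buffers can be released/recycled immediately.
            res.dropBody();
        });
}
```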

@yannick (Contributor, Author) commented May 28, 2015

dropBody() solves it (mostly), but it still does not free memory.
Why is dropBody not called from ~this()?

@etcimon (Contributor) commented May 29, 2015

> dropBody() solves it (mostly), but it still does not free memory.

I have two ideas that could reduce the memory usage:

1. Using HTTPClient directly will avoid keeping the objects in a ConnectionPool, which prevents memory from being freed and is a side effect of requestHTTP.

2. I've suspected for a while that the GC pool size might be responsible. I would like to try capping it at 2 MB just for a test; I've never really played around with the GC config options, so it's just a guess:

https://github.com/D-Programming-Language/druntime/blob/001041216659779913cd445e8a7b60cfc67c5349/src/gc/config.d#L32
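A sketch of idea 1, driving an HTTPClient directly (the host and path are hypothetical, and the exact HTTPClient API may differ between vibe.d versions):

```d
import vibe.http.client : HTTPClient;

// Idea 1 as a sketch: drive an HTTPClient directly so that no
// ConnectionPool (the side effect of requestHTTP) keeps objects alive.
void callDirect()
{
    auto client = new HTTPClient;
    client.connect("backend.example", 80); // hypothetical backend host
    scope (exit) client.disconnect();

    client.request(
        (scope req) { req.requestURL = "/put-record"; }, // hypothetical path
        (scope res) { res.dropBody(); } // discard the body so buffers are released
    );
}
```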

> Why is dropBody not called from ~this()?

Not sure.

@yannick (Contributor, Author) commented May 29, 2015

Thanks @etcimon, I'll try those two.
However, in my real-world application the memory grows by about 2 MB per minute (handling around 8-10k requests per minute behind an nginx).

I tried to debug it with valgrind, which says that some memory is lost, but I'm not really familiar with the tool.
massif: https://gist.github.com/yannick/4dec9187254c6451b9d6#file-massif-log
leakcheck: https://gist.github.com/9bb15f847c09b9ad4c9b
leakcheck libasync: https://gist.github.com/yannick/4dec9187254c6451b9d6#file-lc-log

I suspect it's losing memory in multiple locations; what puts me off, e.g., is that it also mentions shared AsyncDNS losing stuff.

@etcimon (Contributor) commented May 29, 2015

The two biggest losses come from stream.operations:128 and http.server:1499, which are:

```d
auto bufferobj = FreeListRef!(Buffer, false)();
```

and

```d
FreeListRef!HTTPServerRequest req = FreeListRef!HTTPServerRequest(reqtime, listen_info.bindPort);
```

The rest of the losses probably stem from there, since these manually allocated objects are keeping GC pointers alive. Of course, you could also comment out the DebugAllocator from libasync at libasync.internal.memory; it seems to use some memory too.

A quick fix for the FreeListRef issue could be to use

```d
auto req = FreeListObjectAlloc!HTTPServerRequest.alloc(...);
scope(exit) FreeListObjectAlloc!HTTPServerRequest.free(req);
```

and the same for the buffer.

Can you run a new leak check with this change?

@etcimon (Contributor) commented May 29, 2015

This could probably be caused by the by-ref vs. by-value distinction in FreeListRef's opAssign; one of those should swap rather than increment, I think. I'm not sure ~this is always called when returning a freshly created struct.
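To illustrate the suspicion (a hand-rolled sketch, not vibe.d's actual FreeListRef code): a by-value opAssign on a ref-counted struct should swap ownership with the incoming copy rather than increment again, otherwise the count can drift when a temporary's destructor is elided:

```d
struct RefCounted
{
    int* count;

    this(this) { if (count) ++*count; }   // postblit: a by-value copy increments
    ~this()    { if (count) --*count; }

    void opAssign(RefCounted rhs)
    {
        // rhs arrived by value, so its postblit already incremented.
        // Swapping hands our old payload to rhs, whose ~this will
        // release it; no extra increment is needed here.
        auto tmp = count;
        count = rhs.count;
        rhs.count = tmp;
    }
}
```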

@etcimon (Contributor) commented May 29, 2015

If you can test this out, we'll see if it works. I'd also recommend using libasync master and merging the libasync-fix branch; I've been fixing lots of leaks lately.

@yannick (Contributor, Author) commented May 31, 2015

More massif/leakcheck outputs: https://gist.github.com/yannick/1ee72552c8f9fb804850
That one is with libevent and 3a00dcc.

@etcimon (Contributor) commented May 31, 2015

This test only shows info for about 85 kB of data:

https://gist.github.com/yannick/1ee72552c8f9fb804850#file-leakcheck31-log-L4080

Today I started testing everything again, and apart from two leaks in my http/2 branch (which I promptly fixed), I found nothing in vibe.d. I'm starting to suspect that you might be misusing something. Here are a few odd pieces of info I found:

https://gist.github.com/yannick/4dec9187254c6451b9d6#file-massif-log-L3159

The std.concurrency package should never be used with vibe.d; you must import vibe.core.concurrency instead as a drop-in replacement.

https://gist.github.com/yannick/4dec9187254c6451b9d6#file-massif-log-L3060

This array seems to be getting very large (155 kB); does it contain pointers?

Aside from that, let's say you have a hashmap that looks like this: Log[string] map, where the string key is an IP taken from the headers, i.e. a pointer tracked by the GC that keeps a header line alive. The proper way to remove a log item from the hashmap is map.remove(ip); if you're doing map[ip] = null or something of the sort, you'd be keeping the header line alive.
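A sketch of that pitfall (Log and the field names are stand-ins):

```d
struct Log { string line; }

void prune(ref Log[string] map, string ip)
{
    // Wrong (sketch): overwriting the value leaves the key in the AA,
    // and the key is a slice into the header buffer, so the whole
    // header line stays reachable by the GC:
    //     map[ip] = Log.init;

    // Right: remove() drops both the value and the key slice.
    map.remove(ip);
}
```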

So, to sum things up, you should probably review these particular points. Also, in valgrind you should make at least 1000 requests to get a useful output.

Finally, if you need help getting better throughput, there's the VibeManualMemoryManagement version flag, which might improve the req/s by 2-3x; I'd say it may be harder to get working, though. There are many more optimizations to make, but I guess we should track down the leak first. Let me know if you need me to read your code.

edit: One last point. If you're using destroy on a struct pointer, e.g. Log* log; log.destroy();, you're only nulling the pointer. If you want to destroy the contents, you need to do destroy(*log). I had a hard time with this one ;)
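A minimal sketch of that destroy pitfall (Log is a stand-in struct with a destructor):

```d
struct Log
{
    string line;
    ~this() { /* release resources here */ }
}

void example()
{
    Log* log = new Log("hello");

    // destroy(log) (or log.destroy() via UFCS) only resets the
    // *pointer* to null; the pointed-to struct is never finalized.
    // To actually run ~this on the contents, dereference first:
    destroy(*log);
}
```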

@etcimon (Contributor) commented Jun 1, 2015

We investigated this issue, and long story short: Valgrind is the recommended tool for evaluating whether a program leaks, and none were shown in the massif reports (they show 86 kB of memory usage for thousands of requests). I recommend that an FAQ entry be written about this, because vibe.d is an uncommon long-running single-process webserver and it's natural for people to watch its memory footprint. The virtual/physical size shown by Linux tools will indicate a false positive w.r.t. leaks, because the OS over-allocates on the motto "if your RAM isn't used, you're wasting your money". The reassuring part is that the memory will be recycled if another process competes for it, i.e. it won't trigger an OOM. In the valgrind logs (which track malloc/free), vibe.d averages a stable 86-300 kB of memory after thousands of requests.

@dariusc93 (Contributor)

@etcimon I do understand that vibe recycles the memory, but an attack, even a minor one, could push the pool higher than it should be. And not even an attack: for large-traffic websites this could become a problem, as one doesn't want to see nearly a third of memory being used and never freed when it's not in use, leaving no room for other processes.

@etcimon (Contributor) commented Jun 8, 2015

I've been debugging this on Windows for a while now; the memory usage in the task manager is as real as it gets. Even with the GC on, the memory shouldn't have to increase, contrary to what I had believed so far. I think the problem is circular references keeping objects alive. The GC is very unreliable in terms of destruction; my solution (on my fork) has been to destroy GC-allocated objects manually, though it takes a lot of path analysis, especially with the RAII/GC mix.

Currently, my biggest problem is the fact that the GC can collect from any thread. I don't know if libevent is thread-safe by default, but if thread #2 allocates on the GC, it may end up sweeping and finalizing all the objects from thread #1, which you'd better hope don't rely on thread-local storage. An invalid memory operation (due to destructors throwing) in a foreign thread will probably fail silently too, and maybe even disable the GC; worst case, it'll cause a deadlock. Because of this, I'm now experimenting with a __gshared Array!Destructor[Thread] destructor queue for the HTTPClient!

I think D as a whole should have documented this: all destructors should be written as nothrow shared @nogc ~this(). Otherwise, you're letting major bugs through in long-running applications.
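As a sketch of that convention (assuming the resource is released through nothrow/@nogc C functions; whether `shared` belongs on a destructor is debatable, so it is omitted here):

```d
import core.stdc.stdlib : free, malloc;

struct NativeBuffer
{
    void* data;

    this(size_t n) { data = malloc(n); }

    // A destructor the GC can safely run from any thread: it cannot
    // throw, does not allocate on the GC, and tolerates double calls.
    nothrow @nogc ~this()
    {
        if (data !is null)
        {
            free(data);
            data = null;
        }
    }
}
```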

@etcimon (Contributor) commented Jun 9, 2015

@s-ludwig, could the leak be due to the fiber stack not being cleared? I.e., do you think some pointers on the fiber stack could be kept visible to the GC because of the recycling process?

If so, it would be important to manually destroy any GC references on the stack, such as the appender dst.data in stream.operations.readAll.

@dariusc93 (Contributor)

Any update regarding this issue? I don't know whether it is still a real issue, but it would be nice to get it resolved, if at all possible.

@JustinVenus

tl;dr: adjusting (lowering) the maxPoolSize GC setting takes care of the perceived memory issues with the HTTP client.

For context, my app is an HTTP reverse proxy.

I just came across this issue while debugging long GC collect latency when using the HTTP client, and thought I'd help the next person out by commenting. Further up in this report it is recommended to try setting maxPoolSize to about 2 MB. I found I could set maxPoolSize to 16 MB (decreasing the number of collections) and keep throughput very close to the same as with 2 MB.

I added this to app.d:

```d
extern(C) __gshared string[] rt_options = [ "gcopt=maxPoolSize:16 profile:1" ];
```

My GC collections increase in frequency compared to the default GC settings, but overall the collect time went down:

```
Listening for requests on http://0.0.0.0:8080/
^CReceived signal 2. Shutting down.
    Number of collections:  187
    Total GC prep time:  1 milliseconds
    Total mark time:  49 milliseconds
    Total sweep time:  88 milliseconds
    Total page recovery time:  2 milliseconds
    Max Pause Time:  0 milliseconds
    Grand total GC time:  141 milliseconds
GC summary:    5 MB,  187 GC  141 ms, Pauses   50 ms <    0 ms
```

A nice side effect of the more frequent GC runs was increased throughput, which remained consistent:

```
Total transferred:      1910000 bytes
HTML transferred:       380000 bytes
Requests per second:    11774.08 [#/sec] (mean)
Time per request:       1.359 [ms] (mean)
Time per request:       0.085 [ms] (mean, across all concurrent requests)
Transfer rate:          2196.14 [Kbytes/sec] receive
```
