New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mem leak on UDP connections #6815
Conversation
ticker := time.NewTicker(c.timeout) | ||
defer ticker.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand at all why this would fix anything for now.
If we suppose (even though I don't know why or how) that the problem is that somehow ticker.Stop was not called, then I understand why L208 (VS L284) might make a difference.
But then why is L207 needed at all? Why can't L208 just be:
defer c.ticker.Stop()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we keep on assuming that the problem comes from ticker.Stop not being called, that probably means that c.Close was not always called, which is still a big problem in itself. Because, among other things, that would mean c.listener.conns is not being cleaned up either (which would also be a memleak).
This is all very puzzling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed c.Close was called (actually multiple times once the timeout was reached), so c.ticker.Stop() was actually being called as well but for some reason the routine and objects were not GCed. I'm puzzled with this as well, my best guess is that the conn, stored in c.listener.conns is not being cleaned up as well but I confirmed it gets removed from the map.
So you are right that we might have more to it than just the ticker, and actually I don't see memory getting back to the starting point level after experiencing a lot of new connections, maybe its conn that is not being GCed for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the initial code was correct, and I think the new code is also correct, and since it seems to fix the problem, I think we should merge ASAP.
However, we should take the time at some point to properly understand what was going on.
One coworker's hypothesis, is that there actually is no memleak, and that we're simply allocating too aggressively for the GC to cope. And this PR would have the side-effect of moving the memory footprint from the heap to the stack, which would avoid said allocation. We could test that hypothesis by e.g. running the initial code and problematic repro with GC set to be much more aggressive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
82cbedc
to
22036cb
Compare
What does this PR do?
Fix a memory / go routine leak on udp connections, specifically in allocating a new
time.Ticker
under theConn
struct.No expert here, but after taking a heap profile in a stress test which created lots of new udp connections it showed that we were leaking memory, and probably go routines (runtime malg), as we allocated a time.Ticker in
udp.(*Listener)newConn
. This is also consistent with the profile sent on #6761.It could be that the
Ticker
, which was referenced by theConn
, was not being collected by the GC, justifying the dead go routines and memory leak.Changing the implementation to allocate and stop the ticker in the main
readLoop
seems to solve the memory and go routine problems as demonstrated in the new profile taken after running the same stress test:Motivation
Issue #6761
More
Added/updated testsAdded/updated documentationAdditional Notes