
Process invalidations immediately for closed connections #185

Closed

Conversation

jmuchemb
Member

This fixes a memory leak when `pool-timeout` is not set.
@icemac
Member

icemac commented Dec 19, 2017

@jimfulton What do you think about this PR?

@jimfulton
Member

Thanks for the ping.

This PR doesn't make any sense to me. :)

I set it aside when it was submitted and failed to come back to it. I'll dig in this weekend.

@jimfulton
Member

This implementation seems rather hacky to me. The assignment of _invalidate on a connection's storage is especially unsavory.

A more detailed description of the problem would have made review easier.

Let me see if I understand:

Currently for the MVCCAdapterInstance case, invalidations are stored in the adapter and remain unprocessed until a closed connection is opened. This means that when a connection is closed, we store these invalidations longer than we want and keep data in the connection's cache longer than we want. I agree that this is wasteful. It would be nice to do these invalidations immediately, both to save memory and to do work while connections are idle.
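To make the pattern concrete, here is a minimal sketch of the accumulation being described. This is not the actual ZODB source; the names (MVCCAdapterInstance, _invalidations, poll_invalidations) simply follow this discussion.

```python
# Simplified sketch of the accumulation under discussion; not ZODB's
# real code, but the names follow this conversation.

class MVCCAdapterInstance:
    """Per-connection storage wrapper; one per pooled Connection."""

    def __init__(self):
        self._invalidations = set()

    def _invalidate(self, oids):
        # Called (via the adapter) for every committed transaction,
        # whether or not the owning connection is currently open.
        self._invalidations.update(oids)

    def poll_invalidations(self):
        # Only runs when the connection is (re)opened; until then the
        # accumulated oids -- and the cached objects they refer to --
        # stay in memory if pool-timeout is unset.
        oids, self._invalidations = self._invalidations, set()
        return oids
```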

Have I got this right?

Here's an alternative:

Change the MVCCAdapter constructor to take a database as a second argument. When an adapter gets invalidations, it will update its instances as it does now. It will then call a method on the database (invalidations_received?) that causes all closed connections to poll for invalidations. That will flush pending invalidations and invalidate cached objects sooner.
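A hedged sketch of what that alternative might look like; invalidations_received is only the tentative name floated above, and the pool bookkeeping and _flush_invalidations helper are hypothetical stand-ins, not ZODB's actual API:

```python
# Sketch of the proposal, not ZODB's real API. invalidations_received
# is the tentative name from the comment above; pool and
# _flush_invalidations are simplified, hypothetical stand-ins.

class MVCCAdapter:
    def __init__(self, storage, database):  # database: the proposed 2nd argument
        self._storage = storage
        self._database = database
        self._instances = set()

    def invalidate(self, tid, oids):
        for instance in self._instances:
            instance._invalidate(oids)       # unchanged behaviour
        # New: nudge the database so closed connections catch up now,
        # instead of when they are next opened.
        self._database.invalidations_received()

class DB:
    def __init__(self):
        self.pool = []                       # closed-but-cached connections

    def invalidations_received(self):
        for connection in self.pool:
            # Poll pending invalidations and evict stale cache entries.
            connection._flush_invalidations()
```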

Thoughts?

Also, @jamadden would likely be interested in this discussion. :)

@jamadden
Member

Change the MVCCAdapter constructor to take a database as a second argument. When an adapter gets invalidations, it will update its instances as it does now. It will then call a method on the database (invalidations_received?) that causes all closed connections to poll for invalidations. That will flush pending invalidations and invalidate cached objects sooner.

At first glance, that seems like it's going to (up to) double the number of invalidation polls that RelStorage makes, assuming any write activity at all, if you're using multiple connections. But RelStorage natively implements IMVCCStorage, so it wouldn't actually be involved in this, would it? So any potential memory savings wouldn't apply to it.

In general I'm not sure I like the idea, if I understand what it's trying to accomplish. It seems like this would only be an issue if you had a lot of connections in the pool that were active, and then all (or all but one) sat idle. How often does that happen? Maybe if that's a problem there could be a way to manually ask closed connections to process invalidations (or just clear their caches), and affected applications could do that at their leisure (presumably they'd know when they were "done" or "idle"). Or maybe that's what pool-timeout is for 😄

@jimfulton
Member

@jamadden You're right that this wouldn't affect RelStorage. Perhaps some other idea would, if we even want that.

An issue that the MVCCAdapter has that RelStorage doesn't is that invalidations are pushed to it and can pile up.

This is a potential issue whenever clients are idle for an extended time. I can think of lots of cases like this, such as special clients (e.g. SQS workers) that are idle much of the time, or web clients in generously provisioned applications. For many of these, you might want to keep the caches around but not retain invalidated objects, and not let invalidations pile up. (Again, not an issue for RelStorage.)

When I first looked at the PR a while ago, my reaction was that pool timeout probably shouldn't be disabled by default. But even if it were enabled, I'd think you would want to be conservative: you might want to keep at least one connection, and you might still have connections sitting idle for some time.

@jmuchemb
Member Author

Have I got this right?

yes

Change the MVCCAdapter constructor to take a database as a second argument. When an adapter gets invalidations, it will update its instances as it does now. It will then call a method on the database (invalidations_received?) that causes all closed connections to poll for invalidations. That will flush pending invalidations and invalidate cached objects sooner.

But the instances of an adapter include those that are used by closed connections. Even if invalidations are processed with what you suggest, I don't see how it prevents the memory leak (MVCCAdapterInstance._invalidations).

This implementation seems rather hacky to me. The assignment of _invalidate on a connection's storage is especially unsavory.

This reminds me of my comment about the addition of adapters. On ZODB 4, this PR would look much better.

@jimfulton
Member

jimfulton commented Jan 13, 2018 via email

@jmuchemb
Member Author

@jimfulton Now I understand what you suggest, but from a performance point of view, it's ugly. The oids would be copied to a set, only to be removed immediately. The flow would become quite labyrinthine:

  1. MVCCAdapter.invalidate -> MVCCAdapterInstance._invalidate
  2. MVCCAdapter.invalidate -> DB.invalidations_received? -> Connection.invalidations_received? -> MVCCAdapterInstance.poll_invalidations

poll_invalidations would also do useless work (updating _start).

Then to avoid code duplication, newTransaction would be split, which means yet another function call.

I admit that setting MVCCAdapterInstance._invalidate is not nice, mainly because it's not a defined interface and MVCC storages like RelStorage should not be modified this way.

About pool-timeout: I always prefer software that is smart enough to adapt automatically and doesn't bother users with obscure settings.

@jimfulton
Member

I don't see significant overhead in what I'm proposing. It has the advantage of having closed connections behave much like open connections. It also gives behavior similar to what we had in ZODB 4 for closed connections.

But whatever.

@jimfulton jimfulton closed this Jan 13, 2018
@jmuchemb
Member Author

This isn't even an issue for ZODB 4, IIRC, because it processes invalidations against closed connections. (If that's not true, then perhaps that should be fixed.)

ZODB 4 also has this memory leak. That's the version I was using when I got an OOM kill.

Closed connections also collected oids (in Connection._invalidated), and invalidations were processed when they were reopened.

What is simpler with ZODB 4 is that oids are collected on the Connection itself, whereas with ZODB 5 we have to link the MVCCAdapterInstance to the Connection.
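Roughly, the two shapes being contrasted look like this (simplified sketches only; attribute names follow the discussion, and the class names are made up for illustration):

```python
# ZODB 4 shape (simplified): oids accumulate directly on the connection.
class Connection4:
    def __init__(self):
        self._invalidated = set()   # flushed when the connection reopens

# ZODB 5 shape (simplified): oids accumulate on a per-connection storage
# adapter instance, which must be linked back to the Connection if we
# want to process invalidations while it is closed.
class MVCCAdapterInstance:
    def __init__(self):
        self._invalidations = set()

class Connection5:
    def __init__(self, instance):
        self._storage = instance    # the MVCCAdapterInstance
```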

@jimfulton
Member

jimfulton commented Jan 13, 2018 via email

@jmuchemb
Member Author

But whatever.

I do want ZODB to be smart enough that configuring pool-timeout is not required for the usual case. The current situation is bad enough that even your suggestion would be an improvement.

@jimfulton
Member

So I see 2 issues:

  • Pool timeout should be enabled by default.
  • Better handling of MVCCAdapter invalidations for closed connections.
    I'm open to other ideas besides my proposal.

Perhaps 2 new PRs, if you're up to it? (I sadly don't have time to work on this now. :()

@jmuchemb
Member Author

  • Pool timeout should be enabled by default.

If invalidations are processed immediately for closed connections, I don't see why.

new PRs, if you're up to it? (I sadly don't have time to work on this now. :()

Sure, maybe at the end of this month.

@jmuchemb
Member Author

jmuchemb commented Oct 2, 2019

@jamadden I am still looking for a solution to this issue, and I wonder whether RelStorage is affected or not. I don't see why it wouldn't be.

@jamadden
Member

jamadden commented Oct 2, 2019

I don't believe RelStorage is affected, and that's because invalidations are collected synchronously, when the connection is in use. A RelStorage that's not being used isn't receiving notifications or anything like that; it polls for them at particular points in the transaction lifecycle. (But see below.) That's what Jim meant by:

An issue that the MVCCAdapter has that RelStorage doesn't is that invalidations are pushed to it and can pile up.

On large, busy sites, we were seeing that sometimes poll queries could be very expensive if these older closed connections got used again.

Using the transaction manager in explicit mode cuts down on the number of poll queries that the Connection makes, which had some benefit, but not enough.

So RelStorage 3 introduces a sort of unified invalidation cache that cuts down on the total size of any given invalidation poll query and allows the information collected to be shared across connections, making each individual poll query much cheaper. Storing that information could be seen as a semi-asynchronous collection of invalidations. But there's not a tremendous amount of memory associated with storing the invalidation information because we move collected OIDs off-heap into native integers.
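The "native integers" point can be illustrated with the standard array module: packing 64-bit OIDs contiguously instead of keeping a set of Python int objects. This is only an illustration of the memory idea, not RelStorage's actual implementation:

```python
from array import array
import sys

# A set of Python ints costs tens of bytes per element (object headers
# plus hash-table slots); a typed array stores each OID as a raw
# 8-byte machine value, laid out contiguously.
oids_set = {2 ** 40 + i for i in range(100_000)}
oids_arr = array('Q', sorted(oids_set))  # unsigned 64-bit integers

print(sys.getsizeof(oids_arr))  # ~800 KB: 8 bytes per OID plus overhead
# sys.getsizeof(oids_set) counts only the hash table, not the int
# objects it references, so the real gap is larger than a direct
# comparison of the two numbers suggests.
```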

In the event that it does add up because the Connection pool is poorly configured, old RelStorage connections simply become ineligible to be part of the unified invalidation cache after some threshold, and the older invalidations get discarded. If they get used again, they'll invalidate all of the object cache and start fresh; at that point most of the object cache is likely no good anymore anyway, given how many writes there have been. These old connections tend to only get used during bursts of activity when we're doing a substantial number of writes, which take longer than reads, so we start scraping the bottom of the barrel of the Connection pool.

The approach here of assigning an _invalidate attribute wouldn't work on RelStorage because there's no such attribute.

@jmuchemb
Member Author

jmuchemb commented Oct 2, 2019

Thanks.

In the event that it does add up because the Connection pool is poorly configured, old RelStorage connections simply become ineligible to be part of the unified invalidation cache after some threshold and the older invalidations get discarded.

It's as if MVCCAdapterInstance._invalidate would stop adding oids to _invalidations once there are too many elements, and instead do self._invalidations = None. But the drawback is again a magical configuration parameter, which I don't like. Just to be sure I have understood: what is this parameter in RelStorage?
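For concreteness, a sketch of the capping idea just described; the threshold name is a hypothetical knob (not an existing ZODB or RelStorage setting), and _invalidations being None stands for "too far behind, flush the whole cache":

```python
# Sketch of the capping idea; INVALIDATION_LIMIT is a hypothetical knob.
INVALIDATION_LIMIT = 100_000

class MVCCAdapterInstance:
    def __init__(self):
        self._invalidations = set()

    def _invalidate(self, oids):
        if self._invalidations is None:
            return                       # already past the threshold
        self._invalidations.update(oids)
        if len(self._invalidations) > INVALIDATION_LIMIT:
            self._invalidations = None   # stop tracking individual oids

    def poll_invalidations(self):
        oids, self._invalidations = self._invalidations, set()
        return oids  # None tells the connection to flush its entire cache
```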

I started something, including a micro-optimization that causes invalidations to be processed when a connection is closed, but you may not like this in the case of RelStorage.

@jamadden
Member

jamadden commented Oct 2, 2019

But the drawback is again to have a magical configuration parameter, which I don't like. Just to be sure I have understood, what is this parameter in RelStorage ?

It's ultimately intended to be an adaptive heuristic, taking into account things like the number of old connections, the time between their polls, and the transactions per second in the environment. The goal is that it doesn't need to be configured: zodb/relstorage#338

But it's currently just a number specifying how many transactions (not invalidations) a connection is allowed to fall behind, while we watch how it behaves in our busy environments, just in case it doesn't actually need to be that complicated. Behaviour has actually been quite sane with that cutoff so far: the number of loads (which would indicate connections invalidating otherwise-valid objects) hasn't gone up or shown spikes.

I started something, including a micro-optimization that causes invalidations to be processed when a connection is closed but you may not like this in the case of RelStorage.

Yeah, that doesn't sound useful. It actually sounds like an anti-optimization if it involves extra polling.

@icemac
Member

icemac commented Oct 11, 2019

@jmuchemb Do you think it is okay to delete the branch of this PR?

@jmuchemb
Member Author

I didn't forget to delete it. It's just that the issue is still open. It's important for us and I'll fix it sooner or later; no idea when I'll work on it again.

If you insist, it's true that I have it locally.

@icemac
Member

icemac commented Oct 11, 2019

@jmuchemb I was only asking because the PR is closed. We can keep it if you wish, no problem.
