Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate and remove poll-interval #87

Closed
jamadden opened this issue Jun 29, 2016 · 2 comments
Closed

Deprecate and remove poll-interval #87

jamadden opened this issue Jun 29, 2016 · 2 comments
Assignees
Milestone

Comments

@jamadden
Copy link
Member

jamadden commented Jun 29, 2016

In the words of RelStorage's creator Shane Hathaway, " the "poll-interval" option is an unnecessary feature. I think it's harmful, in fact." ZODB creator Jim Fulton agrees, writing "I think just
polling in poll_invalidations would make it a lot easier for me to reason about. :) I'd also remove sync from this dance altogether at the IMVCCStorage layer."

Our organization has been using the recommended cluster configuration of a cache-server with a 60s poll-interval, as documented. We have run into a few rare, hard-to-debug instances across the cluster where it's obvious that the caches are out of sync, and so we looked into poll-interval more closely.

What the documentation doesn't mention is that if there's any problem with memcache at all, such as the connection being dropped, or a certain key being evicted, then the polling that's required won't take place. This is confounded by the fact that the python-memcached module, at least, nearly silently drops broken connections, and relstorage moves on to the next cache at that point.

Plus, when polling is required (which happens for every connection whenever there's been a commit by any other connection) it just adds overhead: we make a call to memcache, which says yes, then we go ahead and make the call to the DB; the memcache call is superfluous. The situation would be different if there were practically 0 writes, but in a large cluster given that there are hundreds of connections, even a tiny number of writes will cause lots of overhead for all the other connections.

So this proposes to remove the poll-interval option as a loaded footgun. For the 2.0 release, we will continue to accept it in the configuration but will warn if it's non-zero and ignore it. Future releases may silently ignore it and/or drop it altogether.

@jamadden jamadden added this to the 2.0.0b2 milestone Jun 29, 2016
@jamadden
Copy link
Member Author

jamadden commented Jun 29, 2016

It also occurs to me that the database poll query is trivial, and should almost certainly be cached by the DB anyway, so it should be inexpensive. I haven't tried to confirm this though.

BTW, the Oracle adapter uses the form I would expect: SELECT MAX(tid) FROM <table> while the others use the proprietary extensions SELECT tid FROM table ORDER BY tid DESC LIMIT 1. I would expect the Oracle form to be cheaper everywhere (the column is indexed). Does anyone have any recent data on this for PostgreSQL and MySQL?

Update: As this stackoverflow question states, I've verified that MAX does have the better plan on MySQL.

Update2: A separate stackoverflow question shows results for PostgreSQL, and I still replicate those query plans, so LIMIT may be better there still.

Update3: Nope, that was on a PostgreSQL database that was tiny and had no stats. If I vacuum it, I get much better plans.

@jamadden
Copy link
Member Author

jamadden commented Jun 29, 2016

Shane also writes:

Even if poll-interval works perfectly, it keeps connections idle in transactions by design. When I came up with the feature I didn't realize how bad it is to leave transactions idle all the time. It keeps Postgres from running autovacuum, which means table statistics fall out of sync, which means queries get slower over time (sometimes much slower). Other databases probably have similar issues.

This has come up on the mailing list before

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant