Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProgrammingError after reconnecting. #32

Open
luhn opened this issue Jul 18, 2014 · 7 comments
Open

ProgrammingError after reconnecting. #32

luhn opened this issue Jul 18, 2014 · 7 comments

Comments

@luhn
Copy link
Contributor

luhn commented Jul 18, 2014

I'm using ConnectionPool with DeadConnectionDetector. Here's my connection function, if it helps.

This morning, my primary database failed, which happened to be during a period of high-frequency querying. So of course, psycopg2.OperationalError appears several times, before it was replaced with ConnectionDead. Shortly thereafter, my standby server was promoted to master. txpostgres was then able to reconnect, which [I assume] it did. But then, immediately, the following error starts: psycopg2.ProgrammingError: execute cannot be used while an asynchronous query is underway. As far as I can tell, no queries were successfully executed until I restarted the app.

I've been digging around the txpostgres code to try to find how this could happen, but I haven't had any enlightenment so far. I was wondering if you might have any ideas.

Here are some tracebacks, but—as is usual for Twisted—they aren't very helpful.

@wulczer
Copy link
Owner

wulczer commented Jul 21, 2014

Hi,

First of all, thanks for the report!

Which version of Twisted are you using? The culprit seems to be this exception:

ValueError: file descriptor cannot be a negative integer (-1)

which is something that has been plaguing me for a long time - there's at least one Twisted ticket that has been opened in connection with this problem:

http://twistedmatrix.com/trac/ticket/4539

As you can see, the ticket is now solved, but I've seen this error happen on our production system once, even though we're running recent Twisted. OTOH, we've had a master/slave fallback recently and everything kept working.

It would be very good if you could come up with a way to reproduce the problem. At this point, I'm not even sure if it's Twisted's or psycopg2's fault... The -1 file descriptor error is most probably Twisted, but the asynchronous query underway one - that's coming from psycopg2. txpostgres calls close() on cursor detected to be in an error state, but perhaps under some circumstances that doesn't really clear the error...

Without a reproducible test case it's hard to debug; I might try to do some tests with hammering an instance with queries while restarting the database, but in the meantime I'm flying blind :(

@luhn
Copy link
Contributor Author

luhn commented Jul 21, 2014

Which version of Twisted are you using?

I'm using Twisted 13.2.0. Not the most recent, but within the past year.

OTOH, we've had a master/slave fallback recently and everything kept working.

I should note that I've also had failovers that have worked fine, this is the first time something went wrong.

Without a reproducible test case it's hard to debug; I might try to do some tests with hammering an instance with queries while restarting the database, but in the meantime I'm flying blind :(

I'll play around when I get a chance and see if I can come up with anything.

@luhn
Copy link
Contributor Author

luhn commented Jul 22, 2014

After doing some mild investigation: It looks like Ticket #4539 is broken again. The test_foo.py you wrote fails on Ubuntu. (Works fine on my mac. Different reactor?)

@luhn
Copy link
Contributor Author

luhn commented Jul 22, 2014

It appears that the tests created for the previous fix (removedFromReactor, negativeOneFileDescriptor) are passing.

@wulczer
Copy link
Owner

wulczer commented Jul 23, 2014

Thanks for the investigation! I just tried with Twisted trunk and test_foo.py failed on my Debian laptop.

I tried with various reactors and both epoll (the default on Linux) and poll failed, while select worked - and AFAICS select is what's used on OSX.

I'll try digging deeper into that, although I'm also a bit concerned about this problem being a red herring. I'm not 100% positive that it's what has been causing the reconnection problem you got; it would be great to have a reproducible test case for that so we can be sure that fixing the -1 FD problem fixes the txpostgres bug.

@wulczer
Copy link
Owner

wulczer commented Jul 23, 2014

OK, that one was easy: for epoll, fileno() is called from inside addReader, so it's obvious why my test_foo.py was failing. If I move self.insideReactor = True to just above self.reactor.addReader(self) the test passes.

I'll try to write a program to bombard a Postgres instance with queries and kill the database - let's see if I manage to get a similar error to yours.

@wulczer
Copy link
Owner

wulczer commented Jul 23, 2014

I did a few more tests with txpostgres and Twisted trunk and have been able to reproduce the -1 error, but it still managed to recover and continue querying. Here's the code:

https://gist.github.com/fa25fbae2855e8e5d2d6

I'm running it in one terminal and in another one doing:

sudo pg_ctlcluster -m fast 9.1 main restart

This gives me

Unhandled Error
Traceback (most recent call last):
  File "/home/wulczer/src/twisted.git/twisted/python/log.py", line 88, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/home/wulczer/src/twisted.git/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/home/wulczer/src/twisted.git/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/wulczer/src/twisted.git/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/wulczer/src/twisted.git/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
    why = selectable.doRead()
  File "/home/wulczer/src/txpostgres/txpostgres/txpostgres.py", line 683, in doRead
    self.reactor.addReader(self)
  File "/home/wulczer/src/twisted.git/twisted/internet/epollreactor.py", line 271, in addReader
    EPOLLIN, EPOLLOUT)
  File "/home/wulczer/src/twisted.git/twisted/internet/epollreactor.py", line 257, in _add
    self._poller.register(fd, flags)
exceptions.ValueError: file descriptor cannot be a negative integer (-1)

but after that I get a reconnection and the queries keep running.

I start to suspect that the -1 FD is indeed a red herring. We should focus on reproducing the actual problem with the asynchronous query underway error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants