Skip to content

Intermittent database errors in tests after updating to 4.2 #2118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sevdog opened this issue Nov 26, 2024 · 12 comments
Open

Intermittent database errors in tests after updating to 4.2 #2118

sevdog opened this issue Nov 26, 2024 · 12 comments

Comments

@sevdog
Copy link
Contributor

sevdog commented Nov 26, 2024

In my main project I have two websocket consumers which performs a lot of database-related actions.

With channels==4.1.0 my unittests with django were proceeding smoothly without any problem (I have already removed any references to database_sync_to_async since this comment, when I switched to async ORM methods).

After I updated to channels==4.2.0 it started randomly (see later) throwing database errors after the async tests are executed:

django.db.utils.OperationalError: the connection is closed

I use a multiprocess testing and I get errors only in the process which is executing consumer-related tests only after those are exeuted, due to how tests are picked up by workers and how the interpreter/host handles threads this error may not occurr.

Hunting down into channels' history I found that in #2101 it was introduced a work-around to prevent database_sync_to_async to close old connections by mocking close_old_connections calls in the communicator.

Shortly after, in #2090 the consumers handle/disconnect got an explicit call to aclose_old_connections in their workflow.

My doubt is that there is an execution path in which the mock performed in the communicator is not effective against the db-closure in the consumer, but I need to get more evidences ti get what is going on and how to fix it (currently I had 1 day of errors on my development machine and now I no longer get this error locally but only on github.actions).

@carltongibson
Copy link
Member

My doubt is that there is an execution path in which the mock performed in the communicator is not effective against the db-closure in the consumer...

Sounds likely from the description. Are you able to reduce it, even to an intermittent failure?

(pytest? Fixtures there have often shown these kind of errors, but I never dug in to see exactly why, since I don't use it myself.)

@sevdog
Copy link
Contributor Author

sevdog commented Nov 29, 2024

When I get more times to check on this I will try to introduce a bit of debug logs to check what is going on with connections and threads (currently I do not have enough tools).

I am using pytest with pytest-xdist/pytest-django to run tests on my project while I still use django-based test-cases because I prefer them to test django. pytest, in that project, is only used for its reporting capabilities which are better than django.

Apart from this I belive that the db-cleanup should have been implemented more like in django, using signals and not by putting a direct call to close_old_connections. Having two different way to handle this task in these projects seems odd, expecially because channels relies on django.
I understand that this would mean to drop support for django 4.2, because async signals were introduced in 5.0, maybe this could be done in a 5.0 release of channels?

@carltongibson
Copy link
Member

I understand that this would mean to drop support for django 4.2, because async signals were introduced in 5.0, maybe this could be done in a 5.0 release of channels?

I think we could drop Django 4.2 from the release of 5.2, and maybe even earlier. //cc @bigfootjon

It would be good to pin down exactly what's happening here. close_old_connections shouldn't really be a problem... — the connection handler should give you a new one when you next need it... 🤔

@francipvb
Copy link

Hi all,

Same issue here. I have a middleware for authentication and an async websocket consumer with database-related message handler to send data over the wire to a client.

However in the call to the database I get a database connection closed error.

I've also removed the database_sync_to_async decorator calls from my code to prevent closed connections but the issue is something other than this.

@bigfootjon
Copy link
Collaborator

@francipvb make sure you read this section of the docs: https://channels.readthedocs.io/en/latest/topics/databases.html#database-connections

Particularly the part about calling close_old_connections periodically

@francipvb
Copy link

Hello,

@francipvb make sure you read this section of the docs: https://channels.readthedocs.io/en/latest/topics/databases.html#database-connections

Particularly the part about calling close_old_connections periodically

I've read this documentation section and the consumers code directly without any success. The code just fails during tests without any good reason.

The database related code fails even inside a websocket middleware.

@bigfootjon
Copy link
Collaborator

Do you have a sample project that reproduces the issue? Or sample code at least? It's hard to debug without the code and reproduction steps

@francipvb
Copy link

Hello,

I will try to give you a sample project for this case.

Anyway I solved all connection related errors by writing a TransactionTestCase insthead of a TestCase.

@sevdog
Copy link
Contributor Author

sevdog commented Mar 26, 2025

Here is a sample project which I used for an other issue with channels and django TestCase.

I have setup a workflow which tests against channels 4.1.0 and 4.2.0, python 3.10-12 and using postgres or sqlite. Here are the results:
https://github.com/sevdog/test-websocker-user-lock/actions/runs/14088653132

When I run against postgres I get the error django.db.utils.OperationalError: the connection is closed.
With the current setup I always get the error, while before I was getting it in a more randomic way (dunno it is due to pycopg or django version).

@sevdog
Copy link
Contributor Author

sevdog commented Apr 2, 2025

Digging deeper, also adding a bit of stupid print statements (because if I try to use a debugger it will just get errors due to timeout) I have found the following hint:

# class/method threading.get_ident()
ApplicationCommunicator.send_input 133367373362880
ApplicationCommunicator.receive_output 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function no_op at 0x77c1fb000860> 133367373362880
[print statements for the connection phase, 2 threads execute code]
ApplicationCommunicator.send_input 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function close_old_connections at 0x77c1fb01d260>> 133367373362880

Even if everithing seems to be kept in the same thread, the first time it is executed the mock is applied while it is not in the second.

Goind deeper I found that the no_op method was the one setup in the receive_output method, because WebsocetCommunicator.connect calls it very shortly after sending data.

I belive that this may be bound to this issue python/cpython#114033 because mock.patch is a sync context manager and thus if used in an async context may not work as intended. An other reason to suspect this is the cause is my previous finding: the patch being applied to close_old_connection is that configured in the receive_output rather than that in send_input whereas the aclose_old_connections is called inside AsyncCommunicator.dispatch which is an input method.

IMO the current unittest.mock is not suitable for this task and the current implementation fails againsta a real database.

@francipvb
Copy link

Digging deeper, also adding a bit of stupid print statements (because if I try to use a debugger it will just get errors due to timeout) I have found the following hint:

# class/method threading.get_ident()
ApplicationCommunicator.send_input 133367373362880
ApplicationCommunicator.receive_output 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function no_op at 0x77c1fb000860> 133367373362880
[print statements for the connection phase, 2 threads execute code]
ApplicationCommunicator.send_input 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function close_old_connections at 0x77c1fb01d260>> 133367373362880

Even if everithing seems to be kept in the same thread, the first time it is executed the mock is applied while it is not in the second.

Goind deeper I found that the no_op method was the one setup in the receive_output method, because WebsocetCommunicator.connect calls it very shortly after sending data.

I belive that this may be bound to this issue python/cpython#114033 because mock.patch is a sync context manager and thus if used in an async context may not work as intended. An other reason to suspect this is the cause is my previous finding: the patch being applied to close_old_connection is that configured in the receive_output rather than that in send_input whereas the aclose_old_connections is called inside AsyncCommunicator.dispatch which is an input method.

IMO the current unittest.mock is not suitable for this task and the current implementation fails againsta a real database.

Hello,

Is there a PR to fix this already?

Thanks

@carltongibson
Copy link
Member

@bigfootjon bigfootjon marked this as a duplicate of #2150 Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants