
SyncConsumer blocking all other consumers on Django ORM access #2132


Open
JKasakyan opened this issue Jan 31, 2025 · 7 comments

Comments

@JKasakyan

Description
When a SyncConsumer performs a blocking action like using the Django ORM, all other actively connected consumers (both SyncConsumer and AsyncConsumer) are blocked and subsequent connections from any consumer are also blocked until the blocking action is completed.

Here 'blocked' in the context of actively connected consumers means that Daphne acknowledges incoming frames from the client but no consumer code is triggered:

daphne.ws_protocol DEBUG WebSocket incoming frame on ['127.0.0.1', 50176]

'blocked' in the context of subsequent connections means that Channels initiates the handshake and Daphne upgrades the connection to websocket, but no consumer code is triggered and ~5 seconds later the connection attempt times out:

django.channels.server INFO WebSocket HANDSHAKING /ws/chat/sync [127.0.0.1:50834]
daphne.http_protocol DEBUG Upgraded connection ['127.0.0.1', 50834] to WebSocket
daphne.ws_protocol DEBUG WebSocket closed for ['127.0.0.1', 50834]
django.channels.server INFO WebSocket DISCONNECT /ws/chat/sync [127.0.0.1:50834]

Expected behavior
My expectation is that only the thread for the SyncConsumer that is performing the blocking action should be blocked. Other connected consumers (both SyncConsumer and AsyncConsumer) should still be able to send and receive messages, and new consumers should be able to open connections. That expectation is based on this section of the documentation, which indicates that a SyncConsumer will run in a dedicated thread:

If you’re calling any part of Django’s ORM or other synchronous code, you should use a SyncConsumer, as this will run the whole consumer in a thread and stop your ORM queries blocking the entire server.

Environment
All tests were performed in Python 3.10.6 virtual environments where the only added packages are the ones explicitly listed below:

Django 4.2 + Channels 4.2 environment:

Django==4.2.18
channels==4.2.0
daphne==4.1.2
psycopg2-binary==2.9.6

I have also tested the exact same application in a Django 4.1 + Channels 4.0 environment:

Django==4.1.13
channels==4.0.0
daphne==4.0.0
psycopg2-binary==2.9.6

As well as in a Django 5.1 + Channels 4.2 environment:

Django==5.1.5
channels==4.2.0
daphne==4.1.2
psycopg2-binary==2.9.6

Interestingly, there is a slight change in behavior between the Django 4.1 + Channels 4.0 environment and the other two environments. In the Django 4.1 + Channels 4.0 environment, any actively connected AsyncConsumer can continue to send and receive messages while the SyncConsumer performs the blocking action (any other actively connected SyncConsumer is still blocked). In the other two environments, the blocking SyncConsumer blocks all other operations, including those from any actively connected AsyncConsumer. In all three environments, every consumer connection attempted while the blocking action runs is also blocked. This difference stood out because it appears to be a regression.

Steps to reproduce

  • Sample code is available via a public repo: https://github.com/JKasakyan/channels-ws-blocking-sample. This is a basic expansion of the Channels tutorial chat application that sets up two endpoints for accessing a SyncConsumer and an AsyncConsumer. The consumer logic is identical. All messages are echoed back by the server, and a message containing 'sleep:' will run pg_sleep for the specified time (e.g. 'sleep: 60' will run pg_sleep(60) in the consumer). The requirements.txt in that repo is the Django 4.2 + Channels 4.2 environment referenced above.
  • Follow the steps in the Reproduction section of that repo
  • Optionally run the same application in the other two environments referenced above (Django 4.1 + Channels 4.0 or Django 5.1 + Channels 4.2) and confirm behavior is as described above
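
For readers who want to see the message convention without opening the repo, here is a minimal sketch of how a 'sleep:' message could be recognized. The helper name parse_sleep_seconds and the regular expression are assumptions made for illustration; the sample repo may parse messages differently, and this sketch does not call pg_sleep itself.

```python
import re
from typing import Optional

# Hypothetical pattern: "sleep:" followed by optional spaces and a number.
_SLEEP_RE = re.compile(r"sleep:\s*(\d+(?:\.\d+)?)")


def parse_sleep_seconds(message: str) -> Optional[float]:
    """Return the requested sleep duration, or None for an ordinary message."""
    match = _SLEEP_RE.search(message)
    return float(match.group(1)) if match else None
```

A consumer built on this would echo every message and, when the helper returns a number, run the blocking call (pg_sleep in the sample repo) for that many seconds before replying.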

Use case
We have an application that uses Django + Channels for WS, and some of these consumers use the Django ORM. We've observed that when traffic is high on certain WS endpoints that heavily use the Django ORM, other active WS connections become less responsive and new WS connections fail more frequently. We believe the blocking behavior described in this post is the source of the issue. I can't imagine this is intended behavior, and the section of the documentation I highlighted earlier seems to describe different behavior. Any clarification would be greatly appreciated!

@carltongibson
Member

Thanks for the report @JKasakyan — Nice and detailed. Let me review.

@williamjburger

@carltongibson Any update on this?

@bigfootjon
Collaborator

@JKasakyan can you help us narrow this down a bit? The environments you shared change multiple variables at once and make it hard to see where the problem is coming from. Would you mind spending some time to identify whether the regression exists in Django or in Channels? And then if you have a "known good" and a "known bad" revision can you bisect the revision in that project? You can instruct pip to install a specific revision of a project with: https://stackoverflow.com/questions/13685920/install-specific-git-commit-with-pip

@JKasakyan
Author

@bigfootjon I spent a few more hours on this today but won't be able to dedicate much more time in the near future.

Let me restate the main issue, since to your point my original post/example has multiple variables/dependencies in play at once:

Across all environments I've tested, SyncConsumers do not behave during synchronous blocking actions as described by this statement, which has appeared in the documentation as far back as Channels 2.x:

If you’re calling any part of Django’s ORM or other synchronous code, you should use a SyncConsumer, as this will run the whole consumer in a thread and stop your ORM queries blocking the entire server.

As of today I've tested and confirmed SyncConsumer exhibiting unexpected blocking behavior as far back as the Channels 3.0.0 release (October 30, 2020). To be clear, I've yet to find an environment where SyncConsumer behaves as described in the documentation and I've tested the following environments in addition to the ones mentioned in the original post:

  • Channels latest (4.2.2)
    • Python 3.13.2
    • Django 5.1.8
    • Channels 4.2.2
  • Channels 4 release
    • Python 3.10.6
    • Django 4.0
    • Channels 4.0.0
  • Channels 3 release
    • Python 3.9.1
    • Django 3.0
    • Channels 3.0.0

I attempted to test a Channels 2.0.0 release (February 2nd, 2018) environment but ran into numerous issues setting up the old dependencies on my M1 Mac (many of which, like Python 3.5, are no longer officially supported).

I've created another repo with a simplified project exhibiting the issue. The dependencies are stripped back completely to just Django and channels, and there is no longer a database or AsyncConsumer component.

If what I'm describing is true, this should be reproducible across pretty much any modern project that's using Channels 3.0.0+. I'm seeing it in a completely stripped down testing environment that's closely mirroring the official documentation's tutorial application across every release version I'm able to test, and we're seeing it in our production applications as well.

@carltongibson
Member

carltongibson commented May 1, 2025

@JKasakyan From your test project's readme:

When a SyncConsumer performs a synchronous blocking action (sleep, infinite loop), all other actively connected SyncConsumers are blocked and subsequent connections from any SyncConsumer are also blocked until the blocking action is completed. See #2132

SyncConsumers using database_sync_to_async (thread_sensitive=True) are all going to run in the same thread, so yes, they're going to block each other. (Assuming that's what's actually happening.)

We'd need to use asgiref.sync.ThreadSensitiveContext to resolve this.
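
To make the "same thread" point concrete, here is a standard-library model of the situation. It does not use asgiref or Channels: a single-worker ThreadPoolExecutor stands in for the one shared thread-sensitive thread, and one executor per connection stands in for what a per-connection ThreadSensitiveContext is assumed to provide. The names blocking_handler, run_two_connections and compare are hypothetical.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def blocking_handler(seconds: float) -> None:
    """Stand-in for a handler that blocks (an ORM query, time.sleep)."""
    time.sleep(seconds)


async def run_two_connections(shared: bool, seconds: float) -> float:
    """Run two blocking handlers at once and return the elapsed wall time."""
    loop = asyncio.get_running_loop()
    if shared:
        # Model of the default: every connection uses the same single thread.
        pool = ThreadPoolExecutor(max_workers=1)
        executors = [pool, pool]
    else:
        # Model of per-connection contexts: one thread per connection.
        executors = [ThreadPoolExecutor(max_workers=1) for _ in range(2)]
    start = time.perf_counter()
    await asyncio.gather(
        *(loop.run_in_executor(ex, blocking_handler, seconds) for ex in executors)
    )
    elapsed = time.perf_counter() - start
    for ex in set(executors):
        ex.shutdown()
    return elapsed


def compare(seconds: float = 0.3) -> Tuple[float, float]:
    """Return (shared_elapsed, separate_elapsed) for two concurrent handlers."""
    shared = asyncio.run(run_two_connections(True, seconds))
    separate = asyncio.run(run_two_connections(False, seconds))
    return shared, separate
```

With a shared thread the two handlers run back to back, so the elapsed time is roughly twice the sleep. With separate threads they overlap and the elapsed time is close to a single sleep. This is only an analogy for the behavior discussed in this thread, not a description of asgiref's internals.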

@JKasakyan
Author

@JKasakyan From your test project's readme:

When a SyncConsumer performs a synchronous blocking action (sleep, infinite loop), all other actively connected SyncConsumers are blocked and subsequent connections from any SyncConsumer are also blocked until the blocking action is completed. See #2132

SyncConsumers using database_sync_to_async (thread_sensitive=True) are all going to run in the same thread, so yes, they're going to block each other. (Assuming that's what's actually happening.)

We'd need to use asgiref.sync.ThreadSensitiveContext to resolve this.

To be clear, in the simplified test project you’re quoting above I don’t explicitly use database_sync_to_async at all, and I’m not using any part of Django’s ORM. The artificial blocking comes from calling time.sleep. I do see database_sync_to_async applied as a wrapper on the dispatch method of the SyncConsumer class in the Channels source code. Based on your comment, this would mean that the synchronous blocking behavior is guaranteed to occur in any client consumer that uses or subclasses SyncConsumer, unless the client overrides the dispatch method.
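
The inheritance argument can be sketched with a standard-library model. Nothing here is Channels code: on_shared_thread is a hypothetical stand-in for database_sync_to_async, BaseSyncConsumer stands in for SyncConsumer, and the single-worker executor models the one shared thread. The point is only that a wrapper on the base class's dispatch method applies to every subclass that does not override it.

```python
import asyncio
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

# Models the single shared thread used for thread-sensitive calls.
_SHARED = ThreadPoolExecutor(max_workers=1)


def on_shared_thread(func):
    """Hypothetical decorator: run a sync method on the shared thread."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(_SHARED, call)

    return wrapper


class BaseSyncConsumer:
    @on_shared_thread
    def dispatch(self, message):
        # Report which thread handled the message.
        return threading.get_ident()


class ChatConsumer(BaseSyncConsumer):
    """Inherits the wrapped dispatch unchanged."""


class PresenceConsumer(BaseSyncConsumer):
    """Unrelated consumer, same inherited dispatch."""


async def dispatch_thread_ids():
    """Dispatch one message to each consumer and collect the thread ids."""
    return await asyncio.gather(
        ChatConsumer().dispatch("a"), PresenceConsumer().dispatch("b")
    )
```

Both consumers report the same worker thread, which is why a blocking call in one would delay the other in this model. A subclass that replaced dispatch with its own threading strategy would not share that thread.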

@carltongibson
Member

Or unless we update SyncConsumer itself to use ThreadSensitiveContext, yes.
