Perf: remove unnecessary locks, replace polling with channels, increase buffers #44

Merged
s3rius merged 1 commit into copilot/optimize-natspy-benchmarks from copilot/optimize-codebase-performance
Mar 27, 2026

Copilot AI commented Mar 27, 2026

Systematic pass to eliminate locking overhead and inefficient patterns that make natsrpy slower than nats.py in some benchmarks.

Remove unnecessary RwLock/Mutex from 7 types

Every async_nats method used on these types takes &self, so the locks were pure overhead. Changed Arc<RwLock<T>> → Arc<T>:

  • JetStreamMessage.acker — lock acquired on every ack/nack/term
  • KeyValue.store — lock acquired on every KV get/put/delete/watch
  • PullConsumer/PushConsumer — was cloning the entire consumer through a sync RwLock per operation
  • Stream, ConsumersManager, Counters — propagated from above

ObjectStore retains RwLock because seal() requires &mut self.
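
A minimal sketch of the before/after shape, assuming a hypothetical `Store` type in place of the real async_nats handle (the names `Store`, `LockedKeyValue`, and `describe` are illustrative, not natsrpy's API). It only shows why the lock is redundant when every method takes `&self`: `Arc<T>` already hands out `&T`.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for an async_nats handle whose methods all take &self.
struct Store {
    bucket: String,
}

impl Store {
    fn describe(&self, key: &str) -> String {
        format!("{}/{}", self.bucket, key)
    }
}

// Before: every call acquires a read lock even though nothing is mutated.
struct LockedKeyValue {
    store: Arc<RwLock<Store>>,
}

impl LockedKeyValue {
    fn describe(&self, key: &str) -> String {
        self.store.read().unwrap().describe(key)
    }
}

// After: Arc<T> derefs to &T, which is all a &self method needs.
struct KeyValue {
    store: Arc<Store>,
}

impl KeyValue {
    fn describe(&self, key: &str) -> String {
        self.store.describe(key)
    }
}

fn main() {
    let locked = LockedKeyValue {
        store: Arc::new(RwLock::new(Store { bucket: "cfg".to_string() })),
    };
    let plain = KeyValue {
        store: Arc::new(Store { bucket: "cfg".to_string() }),
    };
    // Same result, one fewer synchronization step per call.
    assert_eq!(locked.describe("a"), plain.describe("a"));
    println!("{}", plain.describe("a"));
}
```

A type that still needs `&mut self` somewhere, as ObjectStore does for seal(), cannot take this shortcut and keeps its lock.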

Replace IteratorSubscription polling with channel-based design

Old design: Arc<Mutex<Subscriber>> with 200ms timeout loop — constant lock churn and up to 200ms unsubscribe latency.

New design: spawned forwarder task + mpsc(128) channel + separate unsubscribe command channel. Same proven pattern already used by CallbackSubscription.
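
A rough, std-only sketch of the forwarder idea, not the code from this PR. Assumptions: a thread stands in for the spawned async task, `std::sync::mpsc` stands in for the async mpsc channel, and all names (`Event`, `subscribe`, `control`) are hypothetical. Because std has no way to wait on two channels at once, this sketch delivers unsubscribe through the forwarder's single inbox rather than through a separate command channel; the point it illustrates is that the forwarder blocks until something arrives instead of waking on a timeout.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, Sender};
use std::thread;

// Everything that can wake the forwarder.
enum Event {
    Message(String),
    Unsubscribe,
}

struct IteratorSubscription {
    messages: Receiver<String>, // bounded, capacity 128
    control: Sender<Event>,
}

impl IteratorSubscription {
    fn unsubscribe(&self) {
        // An error only means the forwarder has already exited.
        let _ = self.control.send(Event::Unsubscribe);
    }
}

// Returns the subscription plus a sender that plays the role of the server.
fn subscribe() -> (IteratorSubscription, Sender<Event>) {
    let (inbox_tx, inbox_rx) = channel::<Event>();
    let (msg_tx, msg_rx) = sync_channel::<String>(128);

    thread::spawn(move || {
        // Blocks until an event arrives; no timeout loop, no shared lock.
        for event in inbox_rx {
            match event {
                Event::Message(m) => {
                    if msg_tx.send(m).is_err() {
                        break; // consumer went away
                    }
                }
                Event::Unsubscribe => break,
            }
        }
        // msg_tx is dropped here, which ends the consumer's iteration.
    });

    let sub = IteratorSubscription {
        messages: msg_rx,
        control: inbox_tx.clone(),
    };
    (sub, inbox_tx)
}

fn main() {
    let (sub, feed) = subscribe();
    feed.send(Event::Message("a".to_string())).unwrap();
    feed.send(Event::Message("b".to_string())).unwrap();
    sub.unsubscribe();
    // Already-forwarded messages are still delivered, then iteration ends.
    let got: Vec<String> = sub.messages.iter().collect();
    println!("{got:?}");
}
```

In an async runtime the forwarder would typically select over the subscriber stream and the command channel, which is what makes the separate unsubscribe channel described above practical.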

Increase Streamer channel buffer 1 → 128

Used by KV watch, keys iterator, consumer list, object store list. Capacity of 1 meant the producer could never get ahead of the consumer.
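
To make the capacity effect concrete, here is a small sketch using `std::sync::mpsc::sync_channel` as a stand-in for the Streamer's channel (an assumption; the helper name `items_queued_before_full` is hypothetical). It counts how many items a producer can enqueue before it would have to wait for the consumer.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Counts how many items a producer can enqueue, with no consumer draining
// the channel, before the channel reports that it is full.
fn items_queued_before_full(capacity: usize) -> usize {
    // `_rx` keeps the receiver alive so sends do not fail as disconnected.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut queued = 0;
    loop {
        match tx.try_send(0) {
            Ok(()) => queued += 1,
            Err(TrySendError::Full(_)) => return queued,
            Err(TrySendError::Disconnected(_)) => {
                unreachable!("receiver is still alive")
            }
        }
    }
}

fn main() {
    println!("capacity 1:   {}", items_queued_before_full(1));
    println!("capacity 128: {}", items_queued_before_full(128));
}
```

With a capacity of 1 the producer stalls after a single item, so producer and consumer run in lockstep; with 128 the producer can run ahead by a full batch.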

Optimize Message owned conversion

TryFrom<async_nats::Message> was re-borrowing the owned value, forcing subject.to_string() allocations. Now uses into_string() for zero-copy extraction.
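
A sketch of the difference, using hypothetical stand-in types (`Subject`, `RawMessage`, and `Message` here are not the async_nats or natsrpy definitions, and `From` is used instead of `TryFrom` to keep it short). Comparing buffer pointers shows that the borrowing path allocates a new string while the consuming path reuses the original allocation.

```rust
// Hypothetical stand-ins; not the async_nats types.
struct Subject(String);

impl Subject {
    fn as_str(&self) -> &str {
        &self.0
    }
    fn into_string(self) -> String {
        self.0
    }
}

struct RawMessage {
    subject: Subject,
    payload: Vec<u8>,
}

struct Message {
    subject: String,
    payload: Vec<u8>,
}

// Borrowing path: the subject and payload must be copied.
fn convert_by_ref(raw: &RawMessage) -> Message {
    Message {
        subject: raw.subject.as_str().to_string(),
        payload: raw.payload.clone(),
    }
}

// Consuming path: fields are moved out, so no new allocation is made.
impl From<RawMessage> for Message {
    fn from(raw: RawMessage) -> Self {
        Message {
            subject: raw.subject.into_string(),
            payload: raw.payload,
        }
    }
}

fn main() {
    let raw = RawMessage {
        subject: Subject("orders.created".to_string()),
        payload: vec![1, 2, 3],
    };
    let original_ptr = raw.subject.as_str().as_ptr();

    let copied = convert_by_ref(&raw);
    assert_ne!(copied.subject.as_ptr(), original_ptr); // fresh allocation

    let moved = Message::from(raw);
    assert_eq!(moved.subject.as_ptr(), original_ptr); // same buffer reused
    println!("{} ({} bytes)", moved.subject, moved.payload.len());
}
```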

Use Arc<Py<PyAny>> for callback sharing

CallbackSubscription was acquiring the GIL per message just to clone_ref the callback. Replaced with Arc clone (atomic inc only).
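
The sharing pattern can be sketched without PyO3 by using a hypothetical `PyCallback` struct in place of `Py<PyAny>` (an assumption made so the example compiles with std alone). In PyO3, `Py<T>::clone_ref` takes a `Python<'_>` token, which is why it needs the GIL; cloning an `Arc` around the handle only bumps a reference count.

```rust
use std::sync::Arc;

// Hypothetical stand-in for Py<PyAny>; the real handle needs the GIL to be
// duplicated with clone_ref.
struct PyCallback {
    name: String,
}

// Hands one shared handle to each per-message task. Arc::clone performs an
// atomic increment and never touches the wrapped value.
fn share_with_tasks(callback: &Arc<PyCallback>, tasks: usize) -> Vec<Arc<PyCallback>> {
    (0..tasks).map(|_| Arc::clone(callback)).collect()
}

fn main() {
    let callback = Arc::new(PyCallback { name: "on_message".to_string() });
    let handles = share_with_tasks(&callback, 3);

    // All handles point at the same allocation.
    assert!(handles.iter().all(|h| Arc::ptr_eq(h, &callback)));
    println!("{} shared {} times", callback.name, Arc::strong_count(&callback));
}
```

The GIL is then only needed at the point where the callback is actually invoked.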

…crease buffers

- IteratorSubscription: Replace Mutex+200ms polling with channel-based approach
  (spawned forwarder task + mpsc channel, same pattern as CallbackSubscription)
- JetStreamMessage: Remove RwLock from Acker (ack methods take &self)
- KeyValue: Remove RwLock from Store (all methods take &self)
- PullConsumer/PushConsumer: Remove RwLock+clone (use Arc directly)
- Stream: Remove RwLock (all used methods take &self)
- ConsumersManager: Remove RwLock (propagated from Stream)
- Counters: Remove RwLock (all used methods take &self)
- Streamer: Increase channel buffer from 1 to 128
- Message: Optimize owned conversion to avoid re-borrowing (use into_string)
- CallbackSubscription: Use Arc<Py<PyAny>> for callback to avoid GIL per message

Agent-Logs-Url: https://github.com/taskiq-python/natsrpy/sessions/67772642-1ef4-4cf1-a068-9da30d75c528

Co-authored-by: s3rius <18153319+s3rius@users.noreply.github.com>
@s3rius s3rius marked this pull request as ready for review March 27, 2026 16:07
@s3rius s3rius merged commit 8408451 into copilot/optimize-natspy-benchmarks Mar 27, 2026
@s3rius s3rius deleted the copilot/optimize-codebase-performance branch March 27, 2026 16:07