
refactor(cbindings): Thread-safe communication between the main thread and the Waku Thread #1978

Merged: 7 commits into master, Sep 18, 2023

Conversation

@Ivansete-status (Collaborator) commented Aug 31, 2023

Description

This PR is part of a suite of PRs aimed at following Jacek's recommendations regarding thread-safe communication.

Concretely,

  • Start using ChannelSPSCSingle so that the main thread and the Waku Thread communicate safely.
  • The two threads communicate by sending each other a ptr whose memory is allocated in thread-shared space.
  • One thread allocates on transmission, and the other thread deallocates on reception.
  • A request-response communication takes place (see the sketch after this list).
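
A minimal sketch of this ownership rule is shown below. The SingleItemChannel type is only a stand-in for the ChannelSPSCSingle used in the PR, and the InterThreadRequest fields are simplified; what matters is that the sender allocates the request in the shared heap and the receiver deallocates it.

import std/atomics

type
  InterThreadRequest = object
    reqType: int                 # simplified payload; the real type carries more
  SingleItemChannel[T] = object  # stand-in for ChannelSPSCSingle[T]
    full: Atomic[bool]
    value: T

proc trySend[T](ch: var SingleItemChannel[T], v: T): bool =
  if ch.full.load(moAcquire): return false
  ch.value = v
  ch.full.store(true, moRelease)
  true

proc tryRecv[T](ch: var SingleItemChannel[T], v: var T): bool =
  if not ch.full.load(moAcquire): return false
  v = ch.value
  ch.full.store(false, moRelease)
  true

var reqChannel: SingleItemChannel[ptr InterThreadRequest]

# Sender (main thread): allocate in the shared heap, hand the pointer over.
let req = cast[ptr InterThreadRequest](allocShared0(sizeof(InterThreadRequest)))
req[].reqType = 1
doAssert reqChannel.trySend(req)

# Receiver (Waku Thread): take ownership and free the shared memory when done.
var incoming: ptr InterThreadRequest
if reqChannel.tryRecv(incoming):
  # ... process incoming[] ...
  deallocShared(incoming)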

This change is motivated by the following comment:
#1865 (comment)

Changes

Issue

Closes #1878

@github-actions bot commented Aug 31, 2023

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:1978

Built from eb1cd3a


case request[].reqType
of LIFECYCLE:
  waitFor cast[ptr NodeLifecycleRequest](request[].reqContent).process(node)
Contributor

I'm thinking that mixing pointer and async could cause problems but IDK how async works in Nim so 🤷

I guess blocking is the right thing to do here. Is blocking on an async proc the same as a sync proc?

Collaborator Author

Well, afaik, the waitFor is "sync": it stays there, making the dispatcher progress until the process is done. The reqContent pointer will only be deallocated by the process proc when it completes.

Contributor

waitFor turns async code into sync code indeed - but a problem with this setup is that the async loop will only be running while waitFor is running - this means it relies on a steady flow of requests in order to process networking buffers and other waku activity (timers, etc).

To solve this, process itself must be async and this should be turned into await process - the "main" loop then needs to call poll / runForever
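
A hedged sketch of that structure (process here is a placeholder for the PR's request handling, and the channel polling is elided):

import chronos

proc process() {.async.} =
  # placeholder for InterThreadRequest.process
  await sleepAsync(0.milliseconds)

proc run() {.async.} =
  while true:
    # tryRecv an InterThreadRequest here (omitted) and, when one arrives:
    await process()                   # await, not waitFor
    await sleepAsync(1.milliseconds)  # keeps timers and networking progressing

waitFor run()  # the single outer waitFor; alternatively poll()/runForever()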

Collaborator Author

To solve this, process itself must be async and this should be turned into await process - the "main" loop then needs to call poll / runForever

Thanks for the comment @arnetheduck!
The waitFor is meant to make the dispatcher progress so that the request can be handled.
That process proc is called in

let resultResponse = InterThreadRequest.process(request, addr node)

On the other hand, for periods with no requests (no calls to any libwaku function), the dispatcher progresses thanks to a direct call to poll in:

Kindly let me know if that's fine from your point of view :)

Contributor

right - this runs the risk of getting stuck in poll because poll itself will block until there is activity, and if there is no activity it will simply block forever - in fact, I'm not sure how this works at all - it should get stuck there and never perform any loop iterations - I'm probably missing some other detail which wakes up poll but this looks like a significant risk with the setup.

There are two ways to solve this: introducing a timer / sleepAsync (waitFor sleepAsync(1.millis) instead of poll) or using a ThreadSignal. Introducing a timer is the easier way to make this code correct.

Using a signal has the advantage of using fewer resources, but is slightly more difficult to implement. I recommend leaving it for a separate PR - the way to use that is to create a signal for every channel (this is the way channels are normally implemented, with a "notification mechanism"), and every time an item is added to the channel, the signal is fired - here's a simple example: https://gist.github.com/arnetheduck/b6a7ac8f4b85490d26d464674e09d57d#file-threadsynctask-nim
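
For reference, a hedged sketch of that signal-per-channel idea, assuming chronos's ThreadSignalPtr from chronos/threadsync (channel calls elided; this is not the PR's code):

import chronos, chronos/threadsync, results

let reqSignal = ThreadSignalPtr.new().expect("signal created")

proc sender() =
  # ... reqChannel.trySend(req) ...
  discard reqSignal.fireSync()  # fire the signal each time an item is enqueued

proc receiver() {.async.} =
  await reqSignal.wait()        # park here instead of spinning on tryRecv
  # ... reqChannel.tryRecv(req) ...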

Contributor

"in general" I would look for ways / strive to avoid calling waitFor in "inner" proc's and rather structure the code in such a way that the waitFor happens only at the "outer" layer - the waitFor strategy is not incorrect, but it's unusual and runs the risk of accidentally ending up with nested waitFor calls (calling waitFor from an async function) which is not supported.

Collaborator Author

"in general" I would look for ways / strive to avoid calling waitFor in "inner" proc's...

Okay, perfect. I addressed that in my last commit.

Collaborator Author

right - this runs the risk of getting stuck in poll because poll itself will block...

Ok, thanks! We'll apply waitFor sleepAsync(1) for now and implement the signal enhancement in follow-up PRs. I think poll() doesn't get stuck because the Relay protocol is continuously dispatching network events.

var ret = cast[ptr PeerManagementRequest](
  allocShared0(sizeof(PeerManagementRequest)))
ret[].operation = op
ret[].peerMultiAddr = peerMultiAddr
Contributor

string is a garbage-collected type - here, a copy must be taken with the shared allocator (and later it must be deallocated) - no garbage-collected types are allowed in objects constructed with create: https://status-im.github.io/nim-style-guide/interop.html#garbage-collected-types

Contributor

(type needs changing to cstring)

Collaborator Author

Okay, this is the trickiest part.

I was assuming that thread-safe communication was ensured by just sending ptr types, given that ptr types are not tracked by the GC by default. I also wanted to avoid the overhead of serializing/parsing JSON objects.

My assumption was that the following was safe:

  1. Thread A creates a ptr of the request in the shared space.
  2. Thread A sends the address of the request object to Thread B.
  3. Thread B handles the request and deallocates the memory from the shared space.

The following is used for communication between the two threads:

    reqChannel: ChannelSPSCSingle[ptr InterThreadRequest]
    respChannel: ChannelSPSCSingle[ptr InterThreadResponse]

The RelayRequest type is the most complex example:

type
  RelayRequest* = object
    operation: RelayMsgType
    pubsubTopic: PubsubTopic
    relayEventCallback: WakuRelayHandler
    message: WakuMessage

... which is created by Thread A in

... and deallocated by Thread B in:

Isn't that a thread-safe approach to sending ptr types over?

Contributor

peerMultiAddr is a string, which is a garbage-collected type (together with seq, ref and closures). Both the "main object" and all its fields need to be thread-safe, i.e. non-ref.

This means that we manually allocate a copy of everything on createShared and release it - I suggest implementing a destroyShared function for every createShared, which deallocates all fields and finally the main object - in the future, these can be turned into proper destructors, but they are good enough for now.
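
A hedged sketch of that createShared/destroyShared pairing (the single field is illustrative and toShared is a hypothetical helper; the PR's real PeerManagementRequest carries more fields):

type
  PeerManagementRequest = object
    peerMultiAddr: cstring  # cstring instead of string: no GC'd fields

proc toShared(s: string): cstring =
  # hypothetical helper: copy a Nim string into the shared heap, NUL-terminated
  result = cast[cstring](allocShared0(s.len + 1))
  if s.len > 0:
    copyMem(cast[pointer](result), unsafeAddr s[0], s.len)

proc createShared(T: type PeerManagementRequest,
                  peerMultiAddr: string): ptr PeerManagementRequest =
  result = cast[ptr PeerManagementRequest](allocShared0(sizeof(T)))
  result[].peerMultiAddr = toShared(peerMultiAddr)

proc destroyShared(self: ptr PeerManagementRequest) =
  deallocShared(cast[pointer](self[].peerMultiAddr))  # fields first...
  deallocShared(self)                                 # ...then the object itself

let req = PeerManagementRequest.createShared("/ip4/127.0.0.1/tcp/60000")
# ... send req across the channel; the receiving side ends with:
destroyShared(req)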

@github-actions bot commented Sep 6, 2023

You can find the experimental image built from this PR at

quay.io/wakuorg/nwaku-pr:1978-experimental

## Waiting for the response
var response: ptr InterThreadResponse
var recvOk = ctx.respChannel.tryRecv(response)
while recvOk == false:
Contributor

instead of this loop, a signal can be used here too

Collaborator Author

Okay! We'll apply the signal enhancements in a future PR.

resp = ctx.respChannel.tryRecv()
os.sleep(1)
## Sending the request
let sentOk = ctx.reqChannel.trySend(req)
Contributor

can it happen that requests are answered out-of-order because of async?

Collaborator Author

can it happen that requests are answered out-of-order because of async?

Good point! I believe this couldn't happen, as requests are handled sequentially thanks to the waitFor in

waitFor InterThreadRequest.process(request, addr node)

On the other hand, once we apply the "signal" approach, we'll have better synchronization between the two threads.

@arnetheduck (Contributor)

LGTM! looking forward to the next round

@Ivansete-status merged commit 72f9066 into master on Sep 18, 2023
16 checks passed
@Ivansete-status deleted the thread-safe-comms-libwaku branch on September 18, 2023 at 07:21