
Enable RX Fan-In #3

Closed
quetric opened this issue Sep 21, 2021 · 3 comments
@quetric
Collaborator

quetric commented Sep 21, 2021

Currently the receive pipeline supports fan-in == 1 and all collectives are ring-based as a result. Adding support for fan-in > 1 would enable tree collectives.

@quetric quetric added the enhancement New feature or request label Sep 21, 2021
@DanieleParravicini
Contributor

[image: RX datapath block diagram showing the notification AXIS]

As the image shows, the RX datapath provides a notification AXIS (AXI-Stream).
This is how it works:

  1. When bytes arrive at the stack and are stored in its buffer (window), ready to be read by the application (in our case the CCLO), the stack sends a notification carrying the session ID and the number of bytes ready.
  2. The user kernel (the CCLO in this case) replies with the session ID and the number of bytes it wants to read.
    As a side note, the high-level message we have now carries the total number of bytes of the message in bits 32-63.
    What I propose, to get past RX fan-in = 1 in the easy case, is the following (see the sketch after this list).
    The depacketizer waits for notifications from the stack.
    When the first one arrives, the depacketizer reads the first bytes (which include the header) and discovers the number of bytes to be read.
    It dequeues further stack notifications to learn when new data has been received.
    Data coming from other sessions is kept aside and waits.
    All data from the first active session is consumed, up to the last byte.
    Then we move on to the next session.
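A minimal HLS-style C++ sketch of that handshake and the drain-one-session-at-a-time policy. The stream names (`notif_in`, `read_req_out`, `data_in`), the struct layouts and `HEADER_BYTES` are assumptions for illustration, not the actual stack or CCLO interfaces:

```cpp
#include <ap_int.h>
#include <hls_stream.h>

// Assumption: one 64-byte header word whose bits 32-63 carry the message length.
const unsigned HEADER_BYTES = 64;

// Hypothetical notification from the stack: which session has data and how much.
struct rx_notification {
    ap_uint<16> session_id;
    ap_uint<32> bytes_ready;
};

// Hypothetical read request back to the stack: session and number of bytes to consume.
struct rx_read_request {
    ap_uint<16> session_id;
    ap_uint<32> bytes_to_read;
};

// Serve one message from one session end-to-end before switching to another session.
void depacketizer_handshake(hls::stream<rx_notification> &notif_in,
                            hls::stream<rx_read_request> &read_req_out,
                            hls::stream<ap_uint<512> > &data_in) {
    // The first notification selects the active session (assume it covers at least the header).
    rx_notification first = notif_in.read();
    ap_uint<16> active_session = first.session_id;

    // Fetch the header and extract the total message length from bits 32-63.
    read_req_out.write(rx_read_request{active_session, HEADER_BYTES});
    ap_uint<512> header = data_in.read();
    ap_uint<32> message_bytes = header(63, 32);

    ap_uint<32> consumed = 0;
    ap_uint<32> available = first.bytes_ready - HEADER_BYTES;

    while (consumed < message_bytes) {
        if (available == 0) {
            // Dequeue further notifications; only the active session makes progress,
            // data for other sessions stays buffered in the stack for now.
            rx_notification n = notif_in.read();
            if (n.session_id == active_session) available += n.bytes_ready;
            continue;
        }
        // Never request past the end of the current message (boundary handling).
        ap_uint<32> remaining = message_bytes - consumed;
        ap_uint<32> chunk = (available < remaining) ? available : remaining;
        read_req_out.write(rx_read_request{active_session, chunk});
        // Forward the requested bytes (as 64-byte words) toward the CCLO datapath.
        for (ap_uint<32> b = 0; b < chunk; b += 64) {
            ap_uint<512> word = data_in.read();
            (void)word; // downstream handling omitted in this sketch
        }
        consumed += chunk;
        available -= chunk;
    }
    // Message complete: the next notification picks the next session to serve.
}
```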

We need to:

  1. avoid running out of memory in the network stack. Is data saved in DDR/HBM at the moment?
  2. (related to the previous point) avoid limiting the number of ranks we can support
  3. avoid this approach leading to a deadlock (imagine you have to sum two buffers of 32 MB each coming from two external FPGAs. You can receive 1 MB from each and sum them to build the resulting 32 MB output. If instead we receive all 32 MB from one and then 32 MB from the other, we may run out of space in the CCLO staging area (spare buffers))
  4. ensure that we handle the end of a message properly (e.g. a message of 5 MB, where the stack notifies each time 4 MB has arrived: the depacketizer needs to respect message boundaries and fetch only the missing 1 MB after the first 4 MB are read; see the example after this list). This is because:
    a. we need to be fair and serve other sessions
    b. we must avoid mixing data coming from different sessions
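To illustrate point 4, the boundary rule boils down to never requesting more than what is left of the current message. A small stand-alone C++ illustration using the numbers from the example above (the function name is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Request the smaller of what the stack has buffered and what is still missing
// from the current message, so a read never crosses a message boundary.
uint64_t bytes_to_request(uint64_t bytes_ready, uint64_t message_total, uint64_t consumed) {
    return std::min(bytes_ready, message_total - consumed);
}

int main() {
    const uint64_t MB = 1024 * 1024;
    // 5 MB message, stack buffers/notifies 4 MB at a time.
    assert(bytes_to_request(4 * MB, 5 * MB, 0) == 4 * MB);      // first read: take the full 4 MB
    assert(bytes_to_request(4 * MB, 5 * MB, 4 * MB) == 1 * MB); // second read: only the missing 1 MB
    return 0;
}
```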

@DanieleParravicini
Contributor

I can sketch out an FSM for that if you want.
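For reference, one possible state breakdown for such an FSM (a C++ sketch only; the state names are assumptions, not taken from the repo):

```cpp
// Candidate depacketizer states for fan-in > 1: one message from one session
// is drained end-to-end before another session is served.
enum class RxFsmState {
    WAIT_NOTIFICATION, // idle: wait for the stack to announce data on some session
    READ_HEADER,       // request and parse the header to learn the total message length
    REQUEST_DATA,      // ask the stack for min(bytes buffered, bytes still missing)
    FORWARD_DATA,      // stream the requested payload words to the CCLO datapath
    END_OF_MESSAGE     // release the session and return to WAIT_NOTIFICATION
};
```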

@quetric
Collaborator Author

quetric commented Feb 23, 2022

Closing, feature implemented in dev

@quetric quetric closed this as completed Feb 23, 2022