soc: nordic: add cache helpers #71278
Conversation
hubertmis
left a comment
I suggested clarifications in the API description.
But the issue requiring changes is the algorithm description in dmm.c. The current one could fail in some scenarios (race conditions).
soc/nordic/common/dmm.h
Outdated
Since in dmm_dma_buffer_in_release you are also providing user_buffer, it seems that user_buffer is redundant here.
The thing is that dmm_dma_buffer_in_prepare needs to check whether the provided buffer has correct attributes (location, alignment) before performing dynamic allocation:
- if user_buffer is in a correct location, reachable by the DMA of the given device, and has correct attributes, then *buffer_in = user_buffer;
- otherwise *buffer_in = dynamically_allocate(user_length);
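A minimal sketch of that decision logic, assuming hypothetical helpers (is_dma_capable(), dmm_pool_alloc()) that are not part of the actual dmm implementation:

```c
/* Hedged sketch of the _prepare decision described above; is_dma_capable()
 * and dmm_pool_alloc() are assumed helpers, not the real dmm internals. */
int dmm_dma_buffer_in_prepare(const struct device *dev, void *user_buffer,
			      size_t user_length, void **buffer_in)
{
	if (is_dma_capable(dev, user_buffer, user_length)) {
		/* Buffer is already in a DMA-reachable, correctly aligned
		 * location: return it unchanged. */
		*buffer_in = user_buffer;
		return 0;
	}

	/* Otherwise fall back to a dynamically allocated bounce buffer. */
	*buffer_in = dmm_pool_alloc(dev, user_length);

	return (*buffer_in != NULL) ? 0 : -ENOMEM;
}
```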
soc/nordic/common/dmm.h
Outdated
I wonder if this function should not be split into alloc and prepare. Currently this API does not cover the case when a driver would like to allocate a buffer at compile time (e.g. UART for polling out or for RX flushing) and then use this API only for cache management (if we want to use dmm only for cache management). alloc might be common for both directions.
For a dynamic buffer it would look like:

```c
err = dmm_dma_buffer_alloc(dev, &dma_buffer, length);
if (err == 0) dmm_dma_buffer_out_prepare(dev, dma_buffer, user_buffer, length);
```

For statically allocated buffers:

```c
uint8_t dma_buffer[length] MEM_REGION(...);

// Prepare function just copies and performs cache writeback (if needed)
dmm_dma_buffer_out_prepare(dev, dma_buffer, user_buffer, length2); // length2 <= length
```
As a next step, the dmm API should be extended with macros for static buffer allocation. These macros can be used by the drivers covering the scenario you described, or by applications to improve the performance of the drivers.
If the buffers are allocated using the to-be-created dmm macros, they are guaranteed to be bypassed by dmm_dma_buffer_in/out_prepare() without dynamic buffer allocation and copying overhead.
I don't think the driver should have any indication of whether the buffer in use is to be bypassed by dmm or not. The driver should not track the qualities of the buffers, because that is dmm's responsibility. The driver should blindly call dmm_..._prepare() on any buffer.
> Currently this API does not cover the case when a driver would like to allocate a buffer at compile time (e.g. UART for polling out or for RX flushing) and then use this API only for cache management (if we want to use dmm only for cache management).

Actually this API covers that - if the buffer has correct attributes (location and cache alignment), then no dynamic allocation will happen and dmm will simply return what was provided to it.
soc/nordic/common/dmm.h
Outdated
Is this the device that will use DMA (e.g. UARTE) or a mempool device that is associated with the DMA device through a DT reference?
Imo, it cannot be the peripheral because there is no generic, runtime device API to get a mempool reference from the device.
Originally I wanted to use a reference to the device that will use DMA, but if we cannot obtain the mempool associated with it at runtime, then I guess it must be a mempool device provided directly by the context calling dmm (i.e. the UARTE driver). Is this feasible?
No. The device drivers must be generic: they must work correctly with the compatible peripherals regardless of how they are integrated into the system (which memory is DMA-able by them, what the properties of this memory are). The integration information should be included in the device tree properties of the compatible node and retrieved by the device driver. The device driver should use only the node it instantiates.
If a device tree node lacks a property like mempool, then maybe the driver should not use dmm at all.
Scratch my previous message. I misunderstood the proposal.
A device driver X uses a reference to the device tree node it instantiates (N), with no changes compared to the current device drivers.
What we would modify is to add a property M to node N, being a reference to a mempool. The device driver X would retrieve property M from node N and use the const struct device * (D) represented by M. When X calls dmm functions, it would use D as the dev argument.
If I correctly described the proposal, then it's feasible.
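A minimal sketch of that flow, assuming an illustrative property name (memory-regions) and node label (uart130) rather than the agreed-upon binding:

```c
/* Hedged sketch: driver X reads an assumed "memory-regions" phandle (property M)
 * from its own node N and passes the referenced device D to dmm. */
#include <zephyr/device.h>
#include <zephyr/devicetree.h>

#define UARTE_NODE DT_NODELABEL(uart130)

static const struct device *const dmm_dev =
	DEVICE_DT_GET(DT_PHANDLE(UARTE_NODE, memory_regions));

/* Later, in the driver: dmm_dma_buffer_out_prepare(dmm_dev, ...); */
```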
I see another issue here. In the dmm.c _prepare API we need to check whether the provided buffer is already in the correct memory region and, if not, use a mempool device to dynamically allocate a new buffer.
Does it mean we actually need to provide two devices to the _prepare API:
- a reference to the RAM region node associated with the specified peripheral node (to check the buffer location),
- a reference to the mempool device (to perform dynamic allocation if the buffer is in an incorrect location)?
If yes, I am wondering if these two could somehow be linked, so we provide a single device instead of two.
> Imo, it cannot be the peripheral because there is no generic, runtime device API to get a mempool reference from the device.

@nordic-krch would it be feasible to add such a runtime device API? I believe it would improve the abstraction and also solve this issue.
> Does it mean we actually need to provide two devices to the _prepare API

Good point. I don't think a device tree node should have an indication of software configuration, like whether a given memory region contains a pool for dynamically allocated memory. The device tree node should describe hardware properties, like which memory region is DMA-able by this peripheral. Something like:
```dts
ram30: memory@0x... {
	properties = NON_CACHEABLE;
};

uart130 {
	compatible = "nordic,...";
	memory = <&ram30 &ram31>;
};
```
And dmm should figure out internally which allocator to use to correctly allocate a buffer in the memory pointed to by &uart130.
Updated the PR - now dmm accepts a memory region address pointer as the first parameter. Device drivers (and other components) can easily obtain this value by peeking at a devicetree property. dmm has an internal list of regions and finds the one matching the region provided by the user of the API.
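A minimal sketch of what that could look like from the driver's point of view; the property, macro, and function names here are assumptions for illustration, not necessarily the exact merged API:

```c
/* Hedged sketch of the updated API shape: the driver derives the region
 * address from its devicetree node and passes it as the first dmm argument. */
#include <zephyr/devicetree.h>

#define UARTE_NODE   DT_NODELABEL(uart130)
#define UARTE_REGION ((void *)DT_REG_ADDR(DT_PHANDLE(UARTE_NODE, memory_regions)))

static int tx_prepare(const void *user_buf, size_t len, void **dma_buf)
{
	/* dmm looks UARTE_REGION up in its internal region list and either
	 * bypasses or bounce-buffers the user buffer accordingly. */
	return dmm_buffer_out_prepare(UARTE_REGION, user_buf, len, dma_buf);
}
```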
I've been experimenting with cache, MVDMA and access time to RAM (in particular RAM3, used for slow peripherals). I've noticed that access to RAM3 is very slow. On average it takes ~0.37 us to transfer a byte from global RAM (cacheable) to RAM3 and ~0.3 us to transfer from RAM3 to global RAM. The dependency is linear, so a memcpy of 100 bytes takes 30-37 us (100 cycles per byte, all with DCACHE enabled). It is a lot of time. On the other hand, setting up an MVDMA transfer takes ~4 us (it's mainly writing back data and job descriptors from DCACHE), and the MVDMA transfer itself takes approximately the same time as memcpy. Given that, I think it might be good to have an asynchronous API as well. Something like: … Depending on the threshold, the user handler would be called directly from the context of …
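(The code snippet from that comment is not preserved above; the following is only a hedged sketch of what such a threshold-based asynchronous API could look like. The names, signature, threshold Kconfig symbol, and start_mvdma_copy() helper are assumptions, not the proposed API.)

```c
/* Hedged sketch of a possible asynchronous prepare, illustrating the idea of
 * choosing between an immediate copy and an MVDMA job based on a threshold. */
typedef void (*dmm_prepare_cb_t)(void *dma_buffer, void *user_data);

int dmm_buffer_out_prepare_async(void *region, const void *user_buffer,
				 size_t user_length,
				 dmm_prepare_cb_t cb, void *user_data)
{
	/* For short buffers, copy synchronously and call the handler directly
	 * from the caller's context. */
	if (user_length < CONFIG_DMM_ASYNC_THRESHOLD) {
		void *dma_buf;
		int err = dmm_buffer_out_prepare(region, user_buffer,
						 user_length, &dma_buf);

		if (err == 0) {
			cb(dma_buf, user_data);
		}
		return err;
	}

	/* For longer buffers, start an MVDMA job and call the handler from its
	 * completion interrupt (not shown here). */
	return start_mvdma_copy(region, user_buffer, user_length, cb, user_data);
}
```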
@nordic-krch alternatively we could make the … My understanding is that adding …
Force-pushed from 5fb3132 to 50c31e3.
soc/nordic/common/dmm.h
Outdated
General comment - now I see that the _dma_ part could be removed from all of the APIs, as it feels redundant.
I don't know the details of the requirement to manage DMA buffers directly from ISRs. I would expect ISRs to be short and avoid …

BTW, for MVDMA the buffers must be prepared as well, potentially copied to correctly aligned and padded cacheable buffers. If we implemented MVDMA usage instead of …
We already have performance issues due to peripheral register access and buffer management, so imo having an option to use DMA and reduce CPU load is a plus, even if there will be cases when it is not used. Deferring to a thread context takes an additional amount of time (posting a semaphore is a few us), we already have an asynchronous API in UART where the next buffer is requested from the interrupt context, and there is rtio, which allows asynchronous i2c and spi transfers (not yet supported by nordic).
I think that at some point zephyr will get …
Sorry, I didn't get where the recursive execution would happen.
Please take a look at the pseudo-code for …
If the MVDMA driver would use …
I think simply adding …
I don't think this will happen, due to performance reasons. Let's discuss an example: we can get a network frame through a Bluetooth or Wi-Fi radio. The network frame is stored in network buffers. A part of the network frame structure is the payload, which we could want to transmit using I2S, UART, or whatever. This payload in network buffers is not aligned to anything DMA-related.

In Linux, a user-space application would take the network frame, parse it, and send it to the I2S or UART driver. This would require the data to cross the kernel/user space boundary two times, copying the buffers, and during each copy the buffer could get aligned with … But in Zephyr on MCUs, we prefer a zero-copy model, so if that's possible, I2S should DMA data out directly from the network buffers. Only in architectures in which that's not possible should the data be copied.

I can imagine application code like this: … As you can see, nothing in this code gets aligned by the application. The network buffer is aligned to the Bluetooth or Wi-Fi's DMA. The payload is potentially moved and aligned to the I2S DMA by the I2S driver. This data movement would happen on nRF54H, but never on nRF52 or nRF54L.

But I agree we can add the …
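(The application snippet from that comment is not preserved above; the following is only a hedged illustration of the zero-copy pattern being described. The forward_payload() helper and HEADER_LEN are hypothetical.)

```c
/* Hedged illustration of the zero-copy flow: the application hands the
 * unaligned payload straight to the driver; any DMA-related copying and
 * alignment happens inside the driver (via dmm), and only on targets that
 * need it. */
#include <zephyr/drivers/i2s.h>
#include <zephyr/net/buf.h>

#define HEADER_LEN 8 /* assumed protocol header length */

static int forward_payload(const struct device *i2s_dev, struct net_buf *frame)
{
	uint8_t *payload = frame->data + HEADER_LEN;
	size_t payload_len = frame->len - HEADER_LEN;

	/* No alignment done by the application; the driver prepares the buffer
	 * for its DMA internally if the target requires it. */
	return i2s_buf_write(i2s_dev, payload, payload_len);
}
```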
That's the example where it might … e.g. a sensor or a shell transport having a static buffer for transferring data can be tagged.
Sure, there are examples in which buffers can be tagged by the application. There is an idea to extend the dmm API with such tags. But we need to be prepared for tag-less buffers as well, because there are also scenarios in which tags won't be used.
@hubertmis @nordic-krch …
I understood tagging as adding preprocessor macros ensuring DMA-bility of the buffers, like: … The tag would expand to setting a linker section contained in RAM reachable by DMA, and to alignment/padding parameters if caching requires them.
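(The macro example from that comment is not preserved above; the following is only a hedged sketch of what such a tag could expand to. The macro name, section name, and Kconfig symbol are assumptions.)

```c
/* Hedged sketch of a buffer-tagging macro: place the buffer in a linker
 * section that ends up in DMA-reachable RAM, and align/pad it to the data
 * cache line size. DMM_DMA_BUF_DEFINE and ".dma_ram" are hypothetical. */
#include <zephyr/sys/util.h>
#include <zephyr/toolchain.h>

#define DMM_DMA_BUF_DEFINE(name, size)                                \
	static uint8_t name[ROUND_UP(size, CONFIG_DCACHE_LINE_SIZE)]  \
		__aligned(CONFIG_DCACHE_LINE_SIZE)                    \
		__attribute__((__section__(".dma_ram")))

/* Usage: */
DMM_DMA_BUF_DEFINE(uart_tx_buf, 64);
```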
This attribute denotes that a DMA operation can be performed from a given region. Signed-off-by: Nikodem Kastelik <nikodem.kastelik@nordicsemi.no>
DMM stands for Device Memory Management and its role is to streamline the process of allocating a DMA buffer in the correct memory region and managing the data cache. Signed-off-by: Nikodem Kastelik <nikodem.kastelik@nordicsemi.no>
Added tests verify output and input buffer allocation using the dmm component. Signed-off-by: Nikodem Kastelik <nikodem.kastelik@nordicsemi.no>
hubertmis
left a comment
Spotted potential memory leaks in the SPI driver. Other than that, it looks good!
I have dropped the SPI commits from this PR - I will add them in a separate PR.
nordic-krch
left a comment
minor comments which are not blocking.
(Simplified PR description from the original proposal, since the scope is vastly reduced.)
This PR adds cache helpers specific to nRF devices that streamline the process of allocating a DMA buffer in the correct memory region and managing the data cache on nRF54H20.
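(As a hedged end-to-end illustration of how a driver might use such helpers for a TX path; the function names follow the pattern discussed in this thread, but the exact signatures and the start_dma_tx() call are assumptions.)

```c
/* Hedged usage sketch: prepare a DMA-safe view of the user buffer, start the
 * transfer, then release the buffer when the transfer completes. */
static int uarte_tx(void *mem_region, const uint8_t *user_buf, size_t len)
{
	void *dma_buf;
	int err;

	/* Bypasses or bounce-buffers, and writes back the data cache as needed. */
	err = dmm_buffer_out_prepare(mem_region, user_buf, len, &dma_buf);
	if (err != 0) {
		return err;
	}

	start_dma_tx(dma_buf, len); /* hypothetical peripheral start */

	/* ... on TX completion: */
	return dmm_buffer_out_release(mem_region, dma_buf);
}
```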