Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uart_mcux_lpuart: Enable Asynchronous UART API. #42750

Merged

Conversation

NickolasLapp
Copy link
Contributor

This PR enables the Ansynchronous UART API for using the MCUX drivers.
It is tested on the RT1062EVKB. In order to make this functionality
possible, the MCUX EDMA driver is also exteneded to enable the
dma_reload functionality.

Signed-off-by: Nickolas Lapp nickolaslapp@gmail.com

Copy link
Collaborator

@teburd teburd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks ok to me, there's some stuff I'd like to see comments/changes on before my +1. The serial driver changes are large so I'm leaving it up to others to review those as I'm not as familiar with it.

@@ -33,16 +33,42 @@ struct dma_mcux_edma_config {
void (*irq_config_func)(const struct device *dev);
};

#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this could be rearranged somehow to create a define for an attribute that might be nicer. It's a bit hard to grok at first glance.

The end goal would be removing the repeated declarations of tcdpool here with one that looks more like...

static __aligned(32) EDMA_TCD_ATTR tcdpool[...];

Where your ifdefs instead define the EDMA_TCD_ATTR as needed (nothing, __nocache, __dtcm_noinit_section, etc) to use perhaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is better! Will make the change.

static __aligned(32) __nocache edma_tcd_t
tcdpool[DT_INST_PROP(0, dma_channels)][CONFIG_DMA_TCD_QUEUE_SIZE];
#else
#warning tcdpool may be located in cacheable memory which can cause EDMA issues
Copy link
Collaborator

@teburd teburd Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no cache flush/invalidate for imxrt? I imagine there is. Either it works and there's cache flush/invalidation work done in the driver to ensure it works, or it should be disallowed if it doesn't would be my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. using cacheable memory should not be allowed if the driver doesn't maintain cache coherency. I think it is best to force the tcdpool in non-cacheable memory, and this should be a #error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed--I was concerned about breaking the ability to compile projects which had used this in the past but did not have proper cache setup. I agree an error is more appropriate here so I will make this change.

/* Clear the done flag before the callback so a user can optionally
* install (reload) the next handle during the callback if desired
*/
if (transfer_done) {
Copy link
Collaborator

@teburd teburd Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an edma expert, but I'm wondering if there's more transfer descriptors to do in the tcd, will marking the current transfer as done like this cause the hardware to move to the next one before the callback is done? That seems to go against the prior intent here and could be problematic potentially.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does break my code, and DONE bit should not be cleared before the callback. I am updating an I2S driver to use scatter/gather mode of eDMA, and also added dma_reload() to the eDMA driver (I am working now to merge in your updates before submitting my PR). channel_irq() in this file is based on EDMA_HandleIRQ() from SDK eDMA driver, and we should be careful about diverging from the SDK driver. The SDK driver clears the DONE bit after the callback for a specific reason.

When using scatter/gather mode, EDMA_SubmitTransfer() has some logic for the coherency check when TCD registers are updated. That logic depends on the DONE bit set in some cases. The I2S test case I am working on is one of these cases, and the SubmitTransfer() does not work properly with this update where the DONE bit is cleared before the callback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of the UART ASYNC driver, the callback will trigger a call to DMA reload when the currently used buffer is full, and the callback requests from the user, via: UART_RX_BUF_REQUEST

If dma_reload is called when the done bit is set (as a result of uart_rx_buf_rsp) then the DMA transfer is not restarted and the ping-pong transfer stops. This problem was mitigated by clearing the DONE bit before calling into the user callback.

I'll make the change back now to clear the done bit where as done in the SDK driver, but will have to look at how to correctly modify the UART driver so that transfers continue in this case.

switch (config->channel_direction) {
case MEMORY_TO_MEMORY:
case MEMORY_TO_MEMORY: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These brackets in this switch case aren't necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought uncrustify added these for me, but I will remove for now.

Copy link
Collaborator

@teburd teburd Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncrustify does some unfortunate things. I know the docs still mention it. Please do try the newer clang format setup if you are interested in doing auto formatting like me. See #41307

LOG_ERR("not support transfer direction");
return -EINVAL;
}
}

if (!data_size_valid(config->source_data_size)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice cleanup of this check

@mmahadevan108
Copy link
Collaborator

@NickolasLapp Thank you for the contribution. Can you please share how this feature was tested.

@@ -33,16 +33,42 @@ struct dma_mcux_edma_config {
void (*irq_config_func)(const struct device *dev);
};

#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to change the name of this Kconfig symbol, to help avoid confusion. With the eDMA, there are 2 items concerned with cache coherency: the buffers that DMA read/write data to/from, plus the Transfer Control Descriptors (TCDs). For the lpuart driver, the buffers appear to be allocated by the app, and passed to the driver in mcux_lpuart_rx_enable() and mcux_lpuart_tx(). This symbol controls where the TCDs are placed, not the buffers.

Suggested change
#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER
#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_DESCRIPTORS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will make the change!

@@ -26,4 +26,11 @@ config DMA_MCUX_TEST_SLOT_START
help
test slot start num

config DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment in dma_mcux_edma.c

Suggested change
config DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER
config DMA_MCUX_USE_DTCM_FOR_DMA_DESCRIPTORS

@@ -26,4 +26,11 @@ config DMA_MCUX_TEST_SLOT_START
help
test slot start num

config DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER
bool "Use DTCM for DMA buffers"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool "Use DTCM for DMA buffers"
bool "Use DTCM for DMA descriptors"

bool "Use DTCM for DMA buffers"
default y
help
When this option is activated, the buffers for DMA transfer are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When this option is activated, the buffers for DMA transfer are
When this option is activated, the descriptors for DMA transfer are

Copy link
Contributor

@DerekSnell DerekSnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for implementing all these features. Clearing the eDMA DONE bit in the ISR must be done after the callback. Let me know if I can help with your code to work with that change. Other than that change, the rest of my feedback are comments.

/* Clear the done flag before the callback so a user can optionally
* install (reload) the next handle during the callback if desired
*/
if (transfer_done) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does break my code, and DONE bit should not be cleared before the callback. I am updating an I2S driver to use scatter/gather mode of eDMA, and also added dma_reload() to the eDMA driver (I am working now to merge in your updates before submitting my PR). channel_irq() in this file is based on EDMA_HandleIRQ() from SDK eDMA driver, and we should be careful about diverging from the SDK driver. The SDK driver clears the DONE bit after the callback for a specific reason.

When using scatter/gather mode, EDMA_SubmitTransfer() has some logic for the coherency check when TCD registers are updated. That logic depends on the DONE bit set in some cases. The I2S test case I am working on is one of these cases, and the SubmitTransfer() does not work properly with this update where the DONE bit is cleared before the callback.

@dleach02
Copy link
Member

dleach02 commented Feb 22, 2022

@NickolasLapp thank you for your contribution.

@DerekSnell and I have been discussing the cache coherency changes you put into the UART driver when interfacing with the EDMA. I'm not sure that the proper place for those changes are in the UART driver as it seems like they should be put down into the EDMA driver. My thinking is that multiple stacks use the EDMA driver and would potentially have the same coherency issues and if we centralized the code in the edma driver all of the other drivers would get the same benefit. Otherwise, the code you added to the UART driver would need to be duplicated to the other drivers.

Thoughts?

@NickolasLapp
Copy link
Contributor Author

Sorry all about the delay in getting back to your feedback. I got caught up in some other work things and just now got back to this. I appreciate the thorough review and will make the changes as requested

@NickolasLapp thank you for your contribution.

@DerekSnell and I have been discussing the cache coherency changes you put into the UART driver when interfacing with the EDMA. I'm not sure that the proper place for those changes are in the UART driver as it seems like they should be put down into the EDMA driver. My thinking is that multiple stacks use the EDMA driver and would potentially have the same coherency issues and if we centralized the code in the edma driver all of the other drivers would get the same benefit. Otherwise, the code you added to the UART driver would need to be duplicated to the other drivers.

Thoughts?

I agree that it would be preferable to include the cache clean / cache invalidate operations in the dma_mcux_edma.c instead of in the uart driver. I will put up a commit to move that logic from the UART driver into the EDMA driver.

@NickolasLapp
Copy link
Contributor Author

@NickolasLapp Thank you for the contribution. Can you please share how this feature was tested.

@mmahadevan108 I tested this on an RT1062EVKB. We have a zephyr project which is based closely on the RT1062EVK project in zephyr. I have 2 UART interfaces connected (lpuart0 and lpuart3) and I passed data with checksum back and forth between the two UARTs, confirming the correct data was received.

@NickolasLapp
Copy link
Contributor Author

@dleach02 and @DerekSnell , I was thinking a bit about this comment:

I'm not sure that the proper place for those changes are in the UART driver as it seems like they should be put down into the EDMA driver. My thinking is that multiple stacks use the EDMA driver and would potentially have the same coherency issues and if we centralized the code in the edma driver all of the other drivers would get the same benefit. Otherwise, the code you added to the UART driver would need to be duplicated to the other drivers.

and remembered why I had not done this in the first place. My concern is that the dcache operations have to operate on a full cache line (32 bytes for the processor I'm working on), and so we round up the length for invalidate / clean operations as appropriate to ensure we always pass a multiple of the DCACHE line length to the DCACHE_xxx operations.

This could be an issue, however, if the buffers passed to the EDMA module are not padded / aligned to the appropriate value. For example, if an end user did something like:

struct my_data {
   uint8_t data[16];
   double some_important_double;
}

and passed my_data.data as the rx buffer to the edma engine, some_important_double could get improperly invalidated (assuming it's only been written to cache, and the value hasn't been written back to main memory) in the call to DCACHE_Invalidate because we invalidate the full cache line which could include that extra value.

I think this is probably a fine tradeoff given that the alternative is the data that's received is possibly not correct because the cache operations haven't been performed after the DMA rx, but this is definitely non-obvious behavior and not documented in the zephyr DMA page (which makes no mention of aligning the DMA buffers).

I still think this functionality does belong inside the EDMA driver itself, but I'm not sure in general what the right behavior here is.

/* Invoke callback function. */
if (handle->callback != NULL) {
(handle->callback)(handle, handle->userData,
transfer_done, tcds_done);
}

if (transfer_done) {
handle->base->CDNE = handle->channel;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back through and figured out why it was an issue to call EDMA_SubmitTransfer() from within the callback ISR context.

If you look at fsl_edma.c, considering the case of a ping-pong buffer, we end up in this block:
https://github.com/NXPmicro/mcux-sdk/blob/79cdc5799191a7411479a2617e28a191f4383276/drivers/edma/fsl_edma.c#L1176-L1207

In this block, we've correctly loaded the next DMA descriptor to point to, and now we are trying to set the ESG bit for the active DMA descriptor to enable it to continue on into the reloaded buffer. However, because we haven't cleared the DONE bit yet, writes to ESG are disabled and so it looks to the driver like the previous transfer finished (see the comment here).

I get around this in the PR that I pushed where I just delay reloading the DMA reload until we're out of ISR context in the case that dma_reload is called from w/in ISR, but this feels a bit ugly to me. Happy to take suggestions if you have thoughts about a better way to handle this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @NickolasLapp , you are right about this. I have been working on my PR #43227 and ran into the same issue with eDMA. I learned that EDMA_SubmitTransfer() was not intended to be called within the eDMA callback. Working with the NXP SDK team to enable this, their solution was similar to yours, and clears the DONE bit in the ISR before the callback. I have submitted their updated driver in the hal_nxp PR fsl_edma driver fix scatter/gather mode. This fixes the issue I had, and I think it will work for you as well.

Our PRs overlap with the DMA changes, so I have been waiting for yours to merge before rebasing and submitting mine. One change we need to make to dma_mcux_edma.c is call the IRQ handler in fsl_edma.c, rather than the copy channel_irq(). This will help keep Zephyr's eDMA driver in sync with the SDK driver. I have this change in my PR. But if you want to resolve this DONE bit problem in your PR, then you can first use the attached patch drivers-dma-mcux_edma-usr-ISR-from-NXP-SDK.zip to update the IRQ handler in dma_mcux_edma.c, and then point the west.yml to hal_nxp PR. Or I am happy to handle this in my PR if you want :)

Thanks again for your contributions and the thought you have put into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I think I've got this correctly updated to use the changes from the hal_nxp PR (#144), and to use the HandleIRQ function from within the driver itself instead of the custom channel_irq. I tested with the driver fixes on my board and all seems to work as intended.

@github-actions github-actions bot added the area: Tests Issues related to a particular existing or missing test label Mar 1, 2022
@NickolasLapp
Copy link
Contributor Author

Thanks for the hint at @danieldegrasse! I was able to get this working with your tips.
(Especially cool is the chosen-> zephyr,sram = dtcm, I was using SCB_DisableDCace() to the same effect, but doing this via overlay file is definitely cleaner).

After getting the external loopback cable connected and resolving the cache issues, this unit tests exposed a couple bugs in the serial implementation, as well as a few functional differences to the nominal serial drivers as written for the unit tests. I fixed these up so that all unit tests are now passing and I get the desired: PROJECT EXECUTION SUCCESSFUL at the end of running.

Thanks a lot for your help, and please let me know if there's anything else you notice that should be tested/fixed!

@NickolasLapp
Copy link
Contributor Author

NickolasLapp commented Mar 14, 2022

@danieldegrasse looks like this PR is failing the twister unit test build because some of the RT parts don't have the DMA cells/channels defined for enabled UARTs.
After looking through a couple of the datasheets, I think the DMA channel assignments might be the same for at least LPUART1 and 3 across the NXP parts, and the DMA cell and channel assignments could be added directly to the nxp base .dtsi: https://github.com/zephyrproject-rtos/zephyr/blob/main/dts/arm/nxp/nxp_rt.dtsi.
It looks like this is what was done, at least for sai1, sai2 and sai3.

I can't be sure without checking all of the applicable reference manuals, however, that the dma mux channel assignments are the same. Additionally, I have no way of testing that these targets actually work since I don't have the applicable hardware.

What are your thoughts on best way forward here?

@danieldegrasse
Copy link
Collaborator

@NickolasLapp I've verified your latest push on the RT1060 EVK, and it works for me as well. Thank you for getting the implementation updated to pass that test.

What are your thoughts on best way forward here?

Given that this PR effectively enables the asynchronous UART API for all in tree boards with an LPUART peripheral, we'd like to run twister against it using our internal test range. To do that all boards that contain LPUART peripherals will need to have DMA definitions added and the correct pinmux settings applied.

What I was considering was opening a separate PR with the required DTS changes and pinmux settings for all boards, that you can base your PR off of. That PR could be merged into main without breaking the UART driver, since it would only be DTS and pinmux changes. From there, this PR would be updated with those changes, and then we would run CI against this PR. Is that ok with you? I appreciate your contribution, and want to keep your workload beyond supporting the RT1060 EVK as low as possible.

@NickolasLapp
Copy link
Contributor Author

@danieldegrasse That works great for me. I imagine it should be relatively quick to merge through the edma channel / pinmux settings, and then I can rebase as soon as that lands. Please just post here and let me know when the PR is available with these changes and I can rebase as appropriate.

@danieldegrasse
Copy link
Collaborator

@NickolasLapp Just wanted to give you an update- I'm finished the pinmux settings, but during testing I ran into issues with the async API on the RT1170 EVK, and I'm debugging those. Will update you once I've narrowed down the issue.

@danieldegrasse
Copy link
Collaborator

@NickolasLapp I've figured out a solution for the RT11xx series boards- moving buffers to noncacheable OCRAM results in the test passing. I've opened #43885 to enable that functionality, and provide appropriate pinmuxes for all RT series boards that should be tested with the uart async api. I'll update you once that PR merges, and we can proceed from there.

@danieldegrasse
Copy link
Collaborator

@NickolasLapp FYI- #43885 is merged, so this PR should be ready for testing once it is rebased against the latest main.

@NickolasLapp
Copy link
Contributor Author

Thanks for the update @danieldegrasse .
I rebased and fixed the merge conflicts, then re-ran the test suite as well as my local board tests and confirmed everything passed:

Running test suite uart_async_test
===================================================================
START - test_single_read_setup
 PASS - test_single_read_setup in 0.1 seconds
===================================================================
START - test_single_read
E: No buffers to release from RX DMA!
 PASS - test_single_read in 0.356 seconds
===================================================================
START - test_multiple_rx_enable_setup
 PASS - test_multiple_rx_enable_setup in 0.1 seconds
===================================================================
START - test_multiple_rx_enable
E: No buffers to release from RX DMA!
E: No buffers to release from RX DMA!
 PASS - test_multiple_rx_enable in 0.710 seconds
===================================================================
START - test_chained_read_setup
 PASS - test_chained_read_setup in 0.1 seconds
===================================================================
START - test_chained_read
E: No buffers to release from RX DMA!
 PASS - test_chained_read in 0.221 seconds
===================================================================
START - test_double_buffer_setup
 PASS - test_double_buffer_setup in 0.1 seconds
===================================================================
START - test_double_buffer
 PASS - test_double_buffer in 3.403 seconds
===================================================================
START - test_read_abort_setup
 PASS - test_read_abort_setup in 0.1 seconds
===================================================================
START - test_read_abort
 PASS - test_read_abort in 1.110 seconds
===================================================================
START - test_chained_write_setup
 PASS - test_chained_write_setup in 0.1 seconds
===================================================================
START - test_chained_write
E: No buffers to release from RX DMA!
E: No buffers to release from RX DMA!
 PASS - test_chained_write in 0.9 seconds
===================================================================
START - test_long_buffers_setup
 PASS - test_long_buffers_setup in 0.1 seconds
===================================================================
START - test_long_buffers
 PASS - test_long_buffers in 0.152 seconds
===================================================================
START - test_write_abort_setup
 PASS - test_write_abort_setup in 0.1 seconds
===================================================================
START - test_write_abort
 PASS - test_write_abort in 0.102 seconds
===================================================================
START - test_forever_timeout_setup
 PASS - test_forever_timeout_setup in 0.1 seconds
===================================================================
START - test_forever_timeout
E: No buffers to release from RX DMA!
E: No buffers to release from RX DMA!
 PASS - test_forever_timeout in 3.5 seconds
===================================================================
Test suite uart_async_test succeeded
===================================================================
PROJECT EXECUTION SUCCESSFUL

Looking forward to getting this merged finally! :)

@danieldegrasse
Copy link
Collaborator

@NickolasLapp I've tested this PR on the RT1170, RT1064, RT1050, and RT1024 EVKs, and it passes on all of them. Can you squash your commits appropriately so that they will pass CI?

@danieldegrasse danieldegrasse removed the DNM This PR should not be merged (Do Not Merge) label Apr 8, 2022
@NickolasLapp
Copy link
Contributor Author

Squashed and pushed @danieldegrasse ! Thanks for testing

@danieldegrasse
Copy link
Collaborator

@NickolasLapp just a heads up: you can check for issues like trailing whitespace using the checkpatch script, see here. Just don't want you to have to wait on the compliance check report from CI every time.

Also, can you split the changes to the EDMA driver into a separate commit? This will make traceability easier, since the changes within the EDMA driver are intended to properly support scatter/gather mode, which is required for this PR but a separate enhancement from enabling the UART async API itself.

This PR fixes up the Scatter-Gather EDMA mode for the MCUX EDMA Driver,
as well as enabling the dma reload feature for the same EDMA Driver.

Signed-off-by: Nickolas Lapp <nickolaslapp@gmail.com>
This PR enables the Ansynchronous UART API for using the MCUX drivers.
It is tested on the RT1062EVKB.

Signed-off-by: Nickolas Lapp <nickolaslapp@gmail.com>
@NickolasLapp
Copy link
Contributor Author

Got it. 🤞 that it runs now and all tests pass. Checkpatch passes locally:

- total: 0 errors, 0 warnings, 1235 lines checked
Your patch has no obvious style problems and is ready for submission.

Copy link
Contributor

@DerekSnell DerekSnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. Thanks for all your hard work, @NickolasLapp !

@dleach02 dleach02 merged commit f1b0b45 into zephyrproject-rtos:main Apr 14, 2022
@NickolasLapp NickolasLapp deleted the nlapp-add-uart-async-api-nxp branch April 14, 2022 20:57
NickolasLapp pushed a commit to NickolasLapp/zephyr that referenced this pull request Apr 15, 2022
NickolasLapp pushed a commit to NickolasLapp/zephyr that referenced this pull request Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Boards area: Devicetree area: DMA Direct Memory Access area: Modules area: Tests Issues related to a particular existing or missing test area: UART Universal Asynchronous Receiver-Transmitter manifest manifest-hal_nxp platform: NXP NXP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants