Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB CDC ACM UART driver's interrupt-driven API hangs when no host is connected #53319

Closed
xyzzy42 opened this issue Dec 23, 2022 · 9 comments
Closed
Assignees
Labels
area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug Stale

Comments

@xyzzy42
Copy link
Contributor

xyzzy42 commented Dec 23, 2022

Describe the bug
If there is no host connected, either there never was or it has disconnected, the CDC ACM UART driver will stop accepting input when using the interrupt-driven API.

The CDC ACM UART does not do this when using the UART polling API. If no host is connected, the polling API will drop transmitted data. It is only the CDC ACM UART with the interrupt-driven API that has this behavior.

Other UART drivers, which do not even have the concept of a connected host, do not block if there is no remote device to receive data. Any code written for a generic UART won't correctly handle this behavior of the CDC ACM UART.

To Reproduce
Steps to reproduce the behavior:

  1. Use the CDC ACM USB sample with this simple patch applied to produce some output to the CDC ACM UART without needing a connected host to supply input to be echoed back.
    0001-Add-output-to-cdc-acm-sample.patch.gz
  2. Connect to the CDC ACM UART to pass the sample app's DTR check.
  3. Disconnect to host. E.g., close minicom.

After disconnecting, the next ping will enable tx on the uart. The interrupt handler will be called once, call uart_fifo_fill() and not disable TX. The interrupt handler should be called again, but it is not. The app is basically hung in uart TX and will wait forever for the data to be sent to a USB host.

Expected behavior
The interrupt handler will continue to be called as long as it is enabled and more data is supplied to the FIFO.

Impact
Anything outputting data to a CDC ACM UART will hang if the USB host disconnects. Other UARTs don't behave this way.

Logs and console output

[00:00:11.480,590] <inf> cdc_acm_echo: Sending ping to uart
[00:00:11.480,621] <dbg> cdc_acm_echo: interrupt_handler: Interrupt handler called
[00:00:11.480,651] <dbg> cdc_acm_echo: interrupt_handler: ringbuf -> tty fifo 6 bytes
[00:00:11.480,804] <dbg> cdc_acm_echo: interrupt_handler: Interrupt handler called
[00:00:11.480,834] <dbg> cdc_acm_echo: interrupt_handler: Ring buffer empty, disable TX IRQ
[00:00:12.480,926] <inf> cdc_acm_echo: Sending ping to uart
[00:00:12.480,957] <dbg> cdc_acm_echo: interrupt_handler: Interrupt handler called
[00:00:12.480,987] <dbg> cdc_acm_echo: interrupt_handler: ringbuf -> tty fifo 6 bytes
[00:00:13.481,140] <inf> cdc_acm_echo: Sending ping to uart
[00:00:14.481,231] <inf> cdc_acm_echo: Sending ping to uart

The 1st ping, at 11.480, has a USB host connected and completes. The interrupt handler is called, data is sent to the UART, then the interrupt handler is called again and the ring buffer is now empty and TX is stopped.

The 2nd ping, at 12.480, is after the host disconnected. The interrupt handler is called, data is sent to the UART, and then we hang. The interrupt handler is never called again.

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Gcc 10.3.1
  • zephyr-v3.2.0-2637-gde1bd1fa04

Additional context
An obvious fix is to cdc_acm_fifo_fill() drop data immediately. But consider how the interrupt handler in the sample will call fifo_fill in a while loop until either it runs out of data or the UART stops accepting more. If the amount of data to send was very large or infinite, it could loop in the interrupt handler for a long time or forever. This is generally bad. On a real UART, eventually the hardware FIFO would fill and the irq handler would be throttled to the baud rate.

So it may be better to have cdc_acm_fifo_fill() add data to the queue with a 1ms delay, as is now done for the poll mode API, see PR #53003. The interrupt handler will be throttled if it fills the ring buffer during that 1ms delay. This also allows multiple calls to fifo_fill to be merged into a single USB URB, greatly increasing throughput if the data was provided in small chunks.

@xyzzy42 xyzzy42 added the bug The issue is a bug, or the PR is fixing a bug label Dec 23, 2022
@carlescufi carlescufi added the area: USB Universal Serial Bus label Dec 23, 2022
@desowin
Copy link

desowin commented Jan 2, 2023

Other UART drivers, which do not even have the concept of a connected host, do not block if there is no remote device to receive data. Any code written for a generic UART won't correctly handle this behavior of the CDC ACM UART.

What if there is hardware flow control? Wouldn't plain UART block when the FIFO fills and the CTS is inactive?

So it may be better to have cdc_acm_fifo_fill() add data to the queue with a 1ms delay, as is now done for the poll mode API, see PR #53003. The interrupt handler will be throttled if it fills the ring buffer during that 1ms delay. This also allows multiple calls to fifo_fill to be merged into a single USB URB, greatly increasing throughput if the data was provided in small chunks.

Relying on the throttling sounds like bad application design, however delayable work items are inherently throttled by reschedule points (won't execute until scheduler is invoked). While k_work_schedule_for_queue documentation mentions that K_NO_WAIT is equivalent to k_work_submit_to_queue, there are scheduling differences between the two. I am not sure if the documentation is wrong or the implementation, of it is way too low-level difference (@andyross what do you think?). The difference boils down to the fact k_work_submit_to_queue is effectively submit_to_queue_locked followed by z_reschedule_unlocked while k_work_schedule_for_queue called with K_NO_WAIT is only submit_to_queue_locked.

@xyzzy42
Copy link
Contributor Author

xyzzy42 commented Jan 2, 2023

What if there is hardware flow control? Wouldn't plain UART block when the FIFO fills and the CTS is inactive?

It would depend on hardware design. Without pull-ups or pull-downs, which are not standard on RS-232 control lines, CTS would be floating. So it would randomly transmit or not.

In cases of an external port with flow-control (which is very rare nowadays), usually the designer will have chosen to add a pull-up or pull-down to CTS. It could be either way, based on what the design calls for. A pull-down would cause the uart to always transmit with no external device connected. A pull-up would halt transmission.

I haven't seen RTS/CTS used outside of on-board connections to a BT HCI UART for years now. It would be by far the most common to see code using a UART to assume no control flow that will detect a connected host.

Relying on the throttling sounds like bad application design, however delayable work items are inherently throttled by

Suppose the FW wants to send a lot of data through the UART as fast as possible. Most CPUs are now more than capable of saturating a UART at 115.2 kbaud or more. So they write a loop in the interrupt callback that writes until uart_irq_tx_ready() returns false. Which is after all how the examples do it. Isn't this how one would normally write code to send data a UART as fast as it can accept it.

This works fine for any normal UART and CPU combination. It doesn't seem like bad design to me.

The important thing is that the hardware has a limit to the rate it can accept data, and the CPU can meet this rate.

Even for USB ACM, which on e.g. nRF52840 can do about 3-4 Mbaud, the CPU is faster than the USB hardware.

But if the UART immediately drops data, then the hardware is now infinity fast. Most of the uart examples or other users I see in Zephyr are not designed to handle the case of an infinitely fast uart.

Note that the throttling is not supposed to come from scheduling details. IMHO, that would be too hard to keep working with any changes to scheduling details.

The USB CDC ACM driver should have a ring buffer of a limited size. The size of the buffer multiplied by the 1 ms (or whatever) delay used produces a maximum throughput.

@stephanosio stephanosio added the priority: low Low impact/importance bug label Jan 3, 2023
@jfischer-no
Copy link
Collaborator

After disconnecting, the next ping will enable tx on the uart. The interrupt handler will be called once, call uart_fifo_fill() and not disable TX. The interrupt handler should be called again, but it is not. The app is basically hung in uart TX and will wait forever for the data to be sent to a USB host.

The only possible problem I see is "After disconnecting, ...The interrupt handler will be called once,". Ideally, interrupt handler should not even be called once after disconnected event. This does not look like a bug, rather a discussion.

Can you please tell what in the description of UART driver API makes you think "The interrupt handler should be called again"? What do you mean by "hung in uart TX"?

@xyzzy42
Copy link
Contributor Author

xyzzy42 commented Jan 9, 2023

On all other UART drivers, the interrupt handler will continue to be triggered as long as tx is enabled. Does it not seem like the API is supposed to work this way, when all other drivers do it?

It is also not how the CDC ACM driver works when the polling API is used.

That the CDC ACM driver in interrupt mode behaves differently in all other UART drivers and also from itself in polling mode should be proof that the behavior is not correct.

There is no description of how the UART API works. There are comments for individual functions. The one for uart_irq_tx_enable() is "Enable TX interrupt in IER." That's it. So the only way to define how the API should work is to look at the example code and the drivers and see how they work.

And all code expects than when uart_irq_tx_enable() is called the irq callback function will continue to be triggered at the rate at which the UART consumes the data.

@desowin desowin assigned tmon-nordic and unassigned desowin Jan 26, 2023
@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Mar 28, 2023
@xyzzy42
Copy link
Contributor Author

xyzzy42 commented Mar 28, 2023

@tmon-nordic Can we acknowledge that this driver behaves different than other UART drivers?

@tmon-nordic
Copy link
Contributor

@tmon-nordic Can we acknowledge that this driver behaves different than other UART drivers?

Well, it is operating on different "hardware" which than the other UART drivers. I don't quite agree on the fake throttling part. For the dropping data, please open PR fixing the issue so we can review it.

@github-actions github-actions bot removed the Stale label Mar 29, 2023
@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label May 28, 2023
@nashif
Copy link
Member

nashif commented May 30, 2023

closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug Stale
Projects
None yet
Development

No branches or pull requests

7 participants