drivers: serial: stm32: Data loss & possible hardfault

**Describe the bug**
The async_user_callback can be triggered from two places: the DMA transfer complete interrupt, and a timeout that runs from a k_work queue.
The timeout handler does not run in ISR context, so the DMA transfer complete interrupt can preempt it. When that happens, events are delivered out of sequence, and the DMA buffer offset and length can change while the timeout handler is still using them.

This can be seen in the call stack below. Note that the timeout handler is mid-copy (memcpy) when the interrupt triggers:

![image](https://github.com/user-attachments/assets/e586cb86-99ca-45df-a7da-595e66833392)
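
To make the interleaving concrete, here is a minimal sketch of the race in plain C. It is a simplified model, not the driver's code: `rx_state`, `rx_event`, `flush`, `dma_complete_isr` and `race_demo` are hypothetical names, the buffer size of 8 is arbitrary, and the preemption is simulated with a function pointer so that the result is deterministic.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the driver's shared RX stream state. */
struct rx_state {
	size_t offset;  /* start of the bytes not yet reported */
	size_t counter; /* total bytes received into the buffer */
};

/* Records what each simulated UART_RX_RDY event reported. */
struct rx_event {
	size_t offset;
	size_t len;
};

static struct rx_event events[4];
static size_t event_count;

/* Report the pending bytes; `preempt` models an ISR firing mid-flush. */
static void flush(struct rx_state *s, size_t hw_count,
		  void (*preempt)(struct rx_state *))
{
	size_t start = s->offset; /* read shared state */
	size_t len = hw_count - start;

	if (preempt != NULL) {
		preempt(s); /* the simulated interrupt lands here */
	}

	events[event_count++] = (struct rx_event){ start, len };
	s->counter = hw_count;
	s->offset = hw_count; /* stale write back to shared state */
}

/* Models the DMA transfer complete ISR: the 8-byte buffer is now full. */
static void dma_complete_isr(struct rx_state *s)
{
	flush(s, 8, NULL);
}

int race_demo(void)
{
	struct rx_state s = { 0, 0 };

	event_count = 0;

	/* The timeout path sees 7 bytes, then is preempted by the ISR. */
	flush(&s, 7, dma_complete_isr);

	for (size_t i = 0; i < event_count; i++) {
		printf("event %zu: offset=%zu len=%zu\n",
		       i, events[i].offset, events[i].len);
	}

	/* Returns the final offset left behind by the interrupted path. */
	return (int)s.offset;
}
```

In this model the ISR's event is recorded first and the interrupted timeout path then reports an overlapping range from its stale copy of the offset, which is the kind of out-of-sequence delivery the callback in this report detects.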

**To Reproduce**
We used the callback handler below to debug the issue, simply by setting a breakpoint at LOG_ERR("UART Fault"). The variables bad_message_last_buffer_ptr and expected_new_offset need to be stored outside the callback.
From what I can see, this issue is a race condition. For example, if the timeout triggers when the DMA buffer is almost full (say, 1 byte short) and the last bytes are then received, the DMA transfer complete interrupt triggers and interrupts the timeout handler.

```
static void dma_tx_done_callback(const struct device *dev, struct uart_event *evt, void *user_data) {
        auto *data  = static_cast<UART::dma_callback_data *>(user_data);
        int32_t err = 0;

        switch (evt->type) {
            case UART_RX_RDY: {
                IO::Message message;

                uint8_t *dataLocation = &evt->data.rx.buf[evt->data.rx.offset];

                if (data->bad_message_last_buffer_ptr != evt->data.rx.buf) {
                    data->expected_new_offset = 0U;
                }
                data->bad_message_last_buffer_ptr = evt->data.rx.buf;

                if (data->expected_new_offset != evt->data.rx.offset) {
                    LOG_ERR("UART Fault");
                } else {
                    data->expected_new_offset += evt->data.rx.len;

                    memcpy(message.data.begin(), dataLocation, evt->data.rx.len);
                    message.length = evt->data.rx.len;

                    err = k_msgq_put(&data->recv_buffer, &message, K_NO_WAIT);
                    if (err != 0) {
                        data->uart->rxError = err;
                    }
                }

                err = k_poll_signal_raise(&data->recv_signal, 1U);
                if (err != 0) {
                    data->uart->rxError = err;
                }

            } break;

// Rest of the callback...
```

**Expected behavior**
The timeout handler should not be interruptible by the DMA transfer complete interrupt while it is using the shared RX state.

**Impact**
For now we have had to patch the uart_stm32.c driver with an irq_lock/irq_unlock pair at the start and end of the uart_stm32_async_rx_timeout function:

```
static void uart_stm32_async_rx_timeout(struct k_work *work)
{
	// Added to stop the DMA transfer complete interrupt from interrupting while this is running,
	// since they share data
	unsigned int key = irq_lock();

	struct k_work_delayable *dwork = k_work_delayable_from_work(work);
	struct uart_dma_stream *rx_stream =
		CONTAINER_OF(dwork, struct uart_dma_stream, timeout_work);
	struct uart_stm32_data *data = CONTAINER_OF(rx_stream, struct uart_stm32_data, dma_rx);
	const struct device *dev = data->uart_dev;

	LOG_DBG("rx timeout");

	if (data->dma_rx.counter == data->dma_rx.buffer_length) {
		uart_stm32_async_rx_disable(dev);
	} else {
		uart_stm32_dma_rx_flush(dev);
	}
 
	irq_unlock(key);
}
```
Without this patch, however, the driver is completely broken: it can lead to hard faults (we have seen event->data.rx.offset be int max, which causes a hard fault when used with memcpy) or simply to lost data.
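
Independently of the driver patch, an application callback can guard against a corrupted offset before copying. The sketch below is an assumption about how such a check could look, not part of the driver or of our patch: `rx_chunk_is_valid` and `copy_rx_chunk` are hypothetical helpers, and `buf_size` is assumed to be the size of the buffer the application handed to the driver. It only avoids the out-of-bounds memcpy; it does not fix the race or recover lost data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns true if [offset, offset + len) lies inside a buffer of
 * buf_size bytes. Written so that offset + len cannot overflow.
 */
static bool rx_chunk_is_valid(size_t offset, size_t len, size_t buf_size)
{
	return offset <= buf_size && len <= buf_size - offset;
}

/*
 * Copies the reported chunk into dst only if it is in bounds and fits.
 * Returns the number of bytes copied, or 0 if the chunk was rejected.
 */
static size_t copy_rx_chunk(uint8_t *dst, size_t dst_size,
			    const uint8_t *buf, size_t buf_size,
			    size_t offset, size_t len)
{
	if (!rx_chunk_is_valid(offset, len, buf_size) || len > dst_size) {
		return 0;
	}

	memcpy(dst, buf + offset, len);
	return len;
}
```

In the UART_RX_RDY case, `offset` and `len` would come from evt->data.rx.offset and evt->data.rx.len, and a rejected chunk could be logged the same way as the "UART Fault" case above.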

**Environment (please complete the following information):**

- OS: Linux
- Toolchain: GCC 12.3.1
- Zephyr 3.6.0

**Additional context**
This was tested on the STM32H743


drivers: serial: stm32: Data loss & possible hardfault #76799
