drivers: CAN: Unify the TX error behavior of all CAN drivers #19502

alexanderwachter · 2019-10-01T07:12:36Z

Currently, the CAN drivers behave differently in case of a TX error.
For example, the stm32- and mcp2515-driver retry sending after arbitration lost, where the MCUX-driver abort.

Describe the solution you'd like
Unify the behavior for all CAN controllers and drivers.

This issue is created to collect opinions on the desired behavior of the CAN API.
Please comment on the proposals or issue your own opinion here.

alexanderwachter · 2019-10-01T07:14:00Z

ping @karstenkoenig @henrikbrixandersen @carlescufi

alexanderwachter · 2019-10-01T07:17:19Z

My proposals are:

Convert the actual mailbox-allocation-timeout to a general send timeout.
Arbitration lost should trigger an automatic retry (until timeout)
Error-Frames should trigger an automatic retry (until timeout or bus-off)
NACK should trigger an automatic retry (until timeout)

alexanderwachter · 2019-10-01T07:21:09Z

FYI: https://github.com/alexanderwachter/zephyr/tree/can_extend_api extends the API to have a propper bus-error management.

nixward · 2019-10-01T14:17:12Z

I would imagine aiming for parity with the features in the Linux CAN configuration with Kconfig would be a good ultimate goal:
https://www.kernel.org/doc/Documentation/networking/can.txt (see section 6.5.1)

If the TX automatic retry controller configuration can be defined in Kconfig then the user can configure the driver's behavior. If the user configures the CAN driver TX without retries then the user's CAN application protocol choices should define how to handle arbitration errors, or nack errors. From my understanding the CAN controller hardware would generally be configured to handle the retries.

Possibly a callback register to receive bus state changes for the following states (including bus-off) would be simple way to go:
https://github.com/torvalds/linux/blob/81160dda9a7aad13c04e78bb2cfd3c4630e3afab/include/uapi/linux/can/netlink.h#L67

alexanderwachter · 2019-10-01T14:29:28Z

The problem I see is that when a user enables automatic retransmission, there is no way to abort a transmission or have a timeout.
A Kconfig flag to disable automatic retransmission is nevertheless a good idea.

karstenkoenig · 2019-10-03T19:00:08Z

I like the approach of a general timeout on the send, would also match expectations more I'd think. The send call could still be non blocking though when providing a callback right? Should the can_tx callbacks then get a status that informs about timeout so a message can be assumed to not have been sent? Or would you always block until a message was sent?

Possibly a callback register to receive bus state changes for the following states (including bus-off) would be simple way to go:
https://github.com/torvalds/linux/blob/81160dda9a7aad13c04e78bb2cfd3c4630e3afab/include/uapi/linux/can/netlink.h#L67

@alexanderwachter has something like this in his https://github.com/alexanderwachter/zephyr/tree/can_extend_api can_register_state_change_isr

The problem I see is that when a user enables automatic retransmission, there is no way to abort a transmission or have a timeout.

Can you not cancel a transmission on a mailbox with a pending transmission? The abort/timeout might not be able to guarantee that a message was not sent at all, but at least that it will not be sent after the timeout has triggered/returned.

alexanderwachter · 2019-10-04T07:05:19Z

The send call could still be non blocking though when providing a callback right?

Exactly. I'll even try to introduce a second timeout for the locks and message box allocation. With that change, can_send() could be called with K_NOWAIT, the second timeout parameter as K_MSEC(100), and no callback. This combination would be save to call from an ISR

Should the can_tx callbacks then get a status that informs about timeout so a message can be assumed to not have been sent? Or would you always block until a message was sent?

The callback informs you about the reason why the timeout happened. For example TX-errors, arbitration lost, NACK. This is the same behavior as the blocking call. In my prototype, I INTRODUCED TIMERS FOR EACH MAILBOX.

The problem I see is that when a user enables automatic retransmission, there is no way to abort a transmission or have a timeout.

Can you not cancel a transmission on a mailbox with a pending transmission? The abort/timeout might not be able to guarantee that a message was not sent at all, but at least that it will not be sent after the timeout has triggered/returned.

Transmissions can be aborted. At least the stm32 and mcux can do that. For the MCP2515, I have to check the datasheet.

Messages that are in transmission can't be canceled. Messaged pending or need a retransmit can be canceled.

karstenkoenig · 2019-10-04T14:59:39Z

Sorry I wasn't clear, yes you can abort transmissions on the MCP2515, I was confused whether this was not a feature to be expected everywhere :-) Not so sure on the detail of the timeout one could provide. I can read back status bits on the mailbox but I am not sure if that can be as detailed as 'message was never acknowledged'. Will need to dig in deeper into the datasheet :-)

alexanderwachter · 2019-10-08T15:10:59Z

First prototype:
https://github.com/alexanderwachter/zephyr/tree/can_send_timeout

Edit:
Prototype PR:
#19707

Rework the transmit error handling in the NXP MCUX FlexCAN driver: - Frame transmission must be automatically retried in case of lost arbitration or missing acknowledge. - Abort any pending TX frames when bus-off state is entered. - Fail early in can_send() if in bus-off state. Fixes: zephyrproject-rtos#19502 Signed-off-by: Henrik Brix Andersen <hebad@vestas.com>

Rework the transmit error handling in the NXP MCUX FlexCAN driver: - Frame transmission must be automatically retried in case of lost arbitration or missing acknowledge. - Abort any pending TX frames when bus-off state is entered. - Fail early in can_send() if in bus-off state. Fixes: #19502 Signed-off-by: Henrik Brix Andersen <hebad@vestas.com>

alexanderwachter added Enhancement Changes/Updates/Additions to existing features area: CAN labels Oct 1, 2019

alexanderwachter added this to Triage in Architecture Project via automation Oct 1, 2019

alexanderwachter added the area: API Changes to public APIs label Oct 1, 2019

alexanderwachter self-assigned this Oct 1, 2019

carlescufi moved this from Triage to In Progress in Architecture Project Oct 2, 2019

alexanderwachter mentioned this issue Oct 9, 2019

[WIP] Drivers: CAN: Implementation of a send timeout for can_send(...) #19707

Closed

carlescufi moved this from In Progress to To Do in Architecture Project Feb 16, 2021

henrikbrixandersen mentioned this issue Jul 5, 2021

The road to a stable Controller Area Network driver API #36724

Closed

25 tasks

henrikbrixandersen mentioned this issue Jan 4, 2022

drivers: can: rename API functions for better consistency #41480

Merged

henrikbrixandersen assigned henrikbrixandersen and unassigned alexanderwachter Jan 14, 2022

henrikbrixandersen mentioned this issue Jan 15, 2022

drivers: can: unify TX behaviour #41845

Merged

henrikbrixandersen added the has-pr label Jan 15, 2022

henrikbrixandersen moved this from To Do to In Progress in Architecture Project Jan 15, 2022

henrikbrixandersen added this to the v3.0.0 milestone Jan 15, 2022

nashif closed this as completed in #41845 Jan 18, 2022

Architecture Project automation moved this from In Progress to Done Jan 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

drivers: CAN: Unify the TX error behavior of all CAN drivers #19502

drivers: CAN: Unify the TX error behavior of all CAN drivers #19502

alexanderwachter commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

nixward commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

karstenkoenig commented Oct 3, 2019

alexanderwachter commented Oct 4, 2019

karstenkoenig commented Oct 4, 2019

alexanderwachter commented Oct 8, 2019 •

edited

drivers: CAN: Unify the TX error behavior of all CAN drivers #19502

drivers: CAN: Unify the TX error behavior of all CAN drivers #19502

Comments

alexanderwachter commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

nixward commented Oct 1, 2019

alexanderwachter commented Oct 1, 2019

karstenkoenig commented Oct 3, 2019

alexanderwachter commented Oct 4, 2019

karstenkoenig commented Oct 4, 2019

alexanderwachter commented Oct 8, 2019 • edited

alexanderwachter commented Oct 8, 2019 •

edited