Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: CAN: Unify the TX error behavior of all CAN drivers #19502

Closed
Tracked by #36724
alexanderwachter opened this issue Oct 1, 2019 · 9 comments · Fixed by #41845
Closed
Tracked by #36724

drivers: CAN: Unify the TX error behavior of all CAN drivers #19502

alexanderwachter opened this issue Oct 1, 2019 · 9 comments · Fixed by #41845
Assignees
Labels
area: API Changes to public APIs area: CAN Enhancement Changes/Updates/Additions to existing features
Milestone

Comments

@alexanderwachter
Copy link
Member

Currently, the CAN drivers behave differently in case of a TX error.
For example, the stm32- and mcp2515-driver retry sending after arbitration lost, where the MCUX-driver abort.

Describe the solution you'd like
Unify the behavior for all CAN controllers and drivers.

This issue is created to collect opinions on the desired behavior of the CAN API.
Please comment on the proposals or issue your own opinion here.

@alexanderwachter alexanderwachter added Enhancement Changes/Updates/Additions to existing features area: CAN labels Oct 1, 2019
@alexanderwachter alexanderwachter added this to Triage in Architecture Project via automation Oct 1, 2019
@alexanderwachter
Copy link
Member Author

@alexanderwachter
Copy link
Member Author

My proposals are:

  • Convert the actual mailbox-allocation-timeout to a general send timeout.
  • Arbitration lost should trigger an automatic retry (until timeout)
  • Error-Frames should trigger an automatic retry (until timeout or bus-off)
  • NACK should trigger an automatic retry (until timeout)

@alexanderwachter
Copy link
Member Author

FYI: https://github.com/alexanderwachter/zephyr/tree/can_extend_api extends the API to have a propper bus-error management.

@alexanderwachter alexanderwachter added the area: API Changes to public APIs label Oct 1, 2019
@alexanderwachter alexanderwachter self-assigned this Oct 1, 2019
@nixward
Copy link
Member

nixward commented Oct 1, 2019

I would imagine aiming for parity with the features in the Linux CAN configuration with Kconfig would be a good ultimate goal:
https://www.kernel.org/doc/Documentation/networking/can.txt (see section 6.5.1)

If the TX automatic retry controller configuration can be defined in Kconfig then the user can configure the driver's behavior. If the user configures the CAN driver TX without retries then the user's CAN application protocol choices should define how to handle arbitration errors, or nack errors. From my understanding the CAN controller hardware would generally be configured to handle the retries.

Possibly a callback register to receive bus state changes for the following states (including bus-off) would be simple way to go:
https://github.com/torvalds/linux/blob/81160dda9a7aad13c04e78bb2cfd3c4630e3afab/include/uapi/linux/can/netlink.h#L67

@alexanderwachter
Copy link
Member Author

The problem I see is that when a user enables automatic retransmission, there is no way to abort a transmission or have a timeout.
A Kconfig flag to disable automatic retransmission is nevertheless a good idea.

@carlescufi carlescufi moved this from Triage to In Progress in Architecture Project Oct 2, 2019
@karstenkoenig
Copy link
Collaborator

I like the approach of a general timeout on the send, would also match expectations more I'd think. The send call could still be non blocking though when providing a callback right? Should the can_tx callbacks then get a status that informs about timeout so a message can be assumed to not have been sent? Or would you always block until a message was sent?

Possibly a callback register to receive bus state changes for the following states (including bus-off) would be simple way to go:
https://github.com/torvalds/linux/blob/81160dda9a7aad13c04e78bb2cfd3c4630e3afab/include/uapi/linux/can/netlink.h#L67

@alexanderwachter has something like this in his https://github.com/alexanderwachter/zephyr/tree/can_extend_api can_register_state_change_isr

The problem I see is that when a user enables automatic retransmission, there is no way to abort a transmission or have a timeout.

Can you not cancel a transmission on a mailbox with a pending transmission? The abort/timeout might not be able to guarantee that a message was not sent at all, but at least that it will not be sent after the timeout has triggered/returned.

@alexanderwachter
Copy link
Member Author

The send call could still be non blocking though when providing a callback right?

Exactly. I'll even try to introduce a second timeout for the locks and message box allocation. With that change, can_send() could be called with K_NOWAIT, the second timeout parameter as K_MSEC(100), and no callback. This combination would be save to call from an ISR

Should the can_tx callbacks then get a status that informs about timeout so a message can be assumed to not have been sent? Or would you always block until a message was sent?

The callback informs you about the reason why the timeout happened. For example TX-errors, arbitration lost, NACK. This is the same behavior as the blocking call. In my prototype, I INTRODUCED TIMERS FOR EACH MAILBOX.

The problem I see is that when a user enables automatic retransmission, there is no way to abort a transmission or have a timeout.

Can you not cancel a transmission on a mailbox with a pending transmission? The abort/timeout might not be able to guarantee that a message was not sent at all, but at least that it will not be sent after the timeout has triggered/returned.

Transmissions can be aborted. At least the stm32 and mcux can do that. For the MCP2515, I have to check the datasheet.

Messages that are in transmission can't be canceled. Messaged pending or need a retransmit can be canceled.

@karstenkoenig
Copy link
Collaborator

Sorry I wasn't clear, yes you can abort transmissions on the MCP2515, I was confused whether this was not a feature to be expected everywhere :-) Not so sure on the detail of the timeout one could provide. I can read back status bits on the mailbox but I am not sure if that can be as detailed as 'message was never acknowledged'. Will need to dig in deeper into the datasheet :-)

@alexanderwachter
Copy link
Member Author

alexanderwachter commented Oct 8, 2019

First prototype:
https://github.com/alexanderwachter/zephyr/tree/can_send_timeout

Edit:
Prototype PR:
#19707

@carlescufi carlescufi moved this from In Progress to To Do in Architecture Project Feb 16, 2021
henrikbrixandersen added a commit to vestas-wind-systems/zephyr that referenced this issue Jan 15, 2022
Rework the transmit error handling in the NXP MCUX FlexCAN driver:
- Frame transmission must be automatically retried in case of lost
  arbitration or missing acknowledge.
- Abort any pending TX frames when bus-off state is entered.
- Fail early in can_send() if in bus-off state.

Fixes: zephyrproject-rtos#19502

Signed-off-by: Henrik Brix Andersen <hebad@vestas.com>
@henrikbrixandersen henrikbrixandersen moved this from To Do to In Progress in Architecture Project Jan 15, 2022
@henrikbrixandersen henrikbrixandersen added this to the v3.0.0 milestone Jan 15, 2022
Architecture Project automation moved this from In Progress to Done Jan 18, 2022
nashif pushed a commit that referenced this issue Jan 18, 2022
Rework the transmit error handling in the NXP MCUX FlexCAN driver:
- Frame transmission must be automatically retried in case of lost
  arbitration or missing acknowledge.
- Abort any pending TX frames when bus-off state is entered.
- Fail early in can_send() if in bus-off state.

Fixes: #19502

Signed-off-by: Henrik Brix Andersen <hebad@vestas.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: CAN Enhancement Changes/Updates/Additions to existing features
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants