openocd: Can't flash on various STM32 boards #50590

Closed
erwango opened this issue Sep 23, 2022 · 24 comments · Fixed by #50806
Labels: area: OpenOCD, area: Toolchains, bug, platform: STM32, priority: medium, Regression, Stale

@erwango
Member

erwango commented Sep 23, 2022

Describe the bug
Since Zephyr SDK 0.15.0, flashing with openocd fails under specific conditions on various STM32 boards.

Impacted boards identified so far (likely more to come):

  • disco_l475_iot1
  • nucleo_f429zi
  • stm32f3_disco
  • stm32f429i_disc1

Note: Given the nature of the issue, it potentially applies to non-STM32 boards as well.

The issue is seen when trying to flash just after the board is plugged in.
Subsequent flashing attempts work seamlessly (which is why it was not seen on the test bench).

Workarounds
Several workarounds are possible:

  • Run the flash command twice
  • Press the reset button at the beginning of the flash sequence
  • Use other runners (stm32cubeprogrammer, pyocd)
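
The first workaround (running the flash command twice) can be scripted. The following is only a minimal sketch, assuming `west` is on the PATH; the helper name `flash_with_retry` and its `run` parameter are hypothetical and are not part of west or Zephyr.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def flash_with_retry(
    cmd: Sequence[str] = ("west", "flash"),
    attempts: int = 2,
    run: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run the flash command, retrying while it fails.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt failed. `run` is injectable so the
    helper can be exercised without any hardware attached.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return attempt
    raise RuntimeError(f"{' '.join(cmd)} failed after {attempts} attempts")
```

The same helper could cover the third workaround by passing a different command, for example `flash_with_retry(("west", "flash", "--runner", "pyocd"))`, provided the board supports that runner.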

To Reproduce
Steps to reproduce the behavior:
Build and try to flash any of these boards using Zephyr SDK 0.15.0

Expected behavior
Flashing using openocd works seamlessly, every time.

Impact
Annoyance, but widespread. Could confuse new users.

Environment (please complete the following information):

  • OS: (e.g. Linux, MacOS, Windows): Linux
  • Toolchain (e.g Zephyr SDK, ...): Zephyr SDK 0.15.0
  • Commit SHA or Version used: v3.2-rc1

Additional context

Open On-Chip Debugger 0.11.0+dev-00724-g42b6471c1 (2022-08-17-18:23)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
srst_only separate srst_nogate srst_open_drain connect_deassert_srst

Info : clock speed 500 kHz
Info : STLINK V2J35M26 (API v2) VID:PID 0483:374B
Info : Target voltage: 3.225790
Warn : target stm32l4x.cpu examination failed
Info : starting gdb server for stm32l4x.cpu on 3333
Info : Listening on port 3333 for gdb connections
    TargetName         Type       Endian TapName            State       
--  ------------------ ---------- ------ ------------------ ------------
 0* stm32l4x.cpu       hla_target little stm32l4x.cpu       unknown

Info : Unable to match requested speed 500 kHz, using 480 kHz
Info : Unable to match requested speed 500 kHz, using 480 kHz
TARGET: stm32l4x.cpu - Not halted
FATAL ERROR: command exited with status 1: <redacted>/sdk/zephyr-sdk-0.15.0/sysroots/x86_64-pokysdk-linux/usr/bin/openocd -s  <redacted>/zephyrproject/zephyr/boards/arm/disco_l475_iot1/support -s  <redacted>/sdk/zephyr-sdk-0.15.0/sysroots/x86_64-pokysdk-linux/usr/share/openocd/scripts -f <redacted>/zephyrproject/zephyr/boards/arm/disco_l475_iot1/support/openocd.cfg '-c init' '-c targets' -c 'reset halt' -c 'flash write_image erase build/disco_l475_iot1/zephyr/zephyr.hex' -c 'reset halt' -c 'verify_image build/disco_l475_iot1/zephyr/zephyr.hex' -c 'reset run' -c shutdown
@erwango erwango added the bug The issue is a bug, or the PR is fixing a bug label Sep 23, 2022
@erwango erwango self-assigned this Sep 23, 2022
@erwango erwango added priority: medium Medium impact/importance bug platform: STM32 ST Micro STM32 area: OpenOCD Regression Something, which was working, does not anymore labels Sep 23, 2022
@erwango
Member Author

erwango commented Sep 23, 2022

Guilty commit in openocd: zephyrproject-rtos/openocd@98d9f11

Reverting this commit fixes the issue.

@henrikbrixandersen
Member

I can reproduce this on the stm32f3_disco as well.

@erwango
Member Author

erwango commented Sep 23, 2022

> I can reproduce this on the stm32f3_disco as well.

Thanks, I've updated the list.

@fabiobaltieri
Member

Tried a couple boards, can reproduce on stm32f429i_disc1 but not on nrf52dk_nrf52832 or nucleo_h745zi_q_m7. Could this only affect ST-Link v2 boards?

@erwango
Member Author

erwango commented Sep 23, 2022

> Tried a couple boards, can reproduce on stm32f429i_disc1 but not on nrf52dk_nrf52832 or nucleo_h745zi_q_m7. Could this only affect ST-Link v2 boards?

Thanks for the tests. nucleo_l073rz, which uses an ST-Link/V2-1, is not impacted, so it might be trickier than that.

@stephanosio
Member

stephanosio commented Sep 23, 2022

I will look into reverting zephyrproject-rtos/openocd@98d9f11 in the upcoming Zephyr SDK 0.15.1, unless a proper fix comes before that.

@fabiobaltieri fabiobaltieri added the area: Toolchains Toolchains label Sep 23, 2022
@fabiobaltieri
Member

About the workaround: does it really work if you press reset for you? For me it only works by retrying. That is: plug - reset - west flash <- does not work.

@henrikbrixandersen
Member

> About the workaround: does it really work if you press reset for you?

Yes. If I hold down reset, run west flash, then release reset after openocd is running, it works.

@erwango
Member Author

erwango commented Sep 23, 2022

> About the workaround: does it really work if you press reset for you? For me it only works by retrying. That is: plug - reset - west flash <- does not work.

I confirm this works on nucleo_f429zi. Note that you need to release once the flashing starts.

@fabiobaltieri
Member

> > About the workaround: does it really work if you press reset for you? For me it only works by retrying. That is: plug - reset - west flash <- does not work.
>
> I confirm this works on nucleo_f429zi. Note that you need to release once the flashing starts.

OK, that works. But re-running west flash a second time also works, right? (It may be worth noting in the workarounds.)

@erwango
Member Author

erwango commented Sep 23, 2022

> But also re-running west flash a second time right? (may be worth noting it in the workarounds)

You're right! And that might be the easiest one in most cases. Updating the description.

@tom-van

tom-van commented Sep 23, 2022

The main problem is the wrong OpenOCD configuration used in Zephyr.
Zephyr code puts the Cortex-M into a sleep mode in which the CPU clock is stopped.
In that state the debugger is not able to connect to the sleeping target.
The CPU wakes up from time to time. If this interval is long enough, the debugger may randomly succeed in connecting.
There is only one reliable way to connect: assert reset during connect!
To do so, OpenOCD needs the configuration command
reset_config connect_assert_srst
before 'init'. This acts much like pressing the reset button and releasing it
once OpenOCD starts. All STM32 devices support connect_assert_srst.
Unfortunately, some other devices do not. Check the target config file for
reset_config srst_nogate
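
The ordering requirement above (reset_config before 'init') can be illustrated with a small command-line builder. This is a sketch only: the function name `openocd_connect_args` and the config file name used in the example are hypothetical, and it relies on OpenOCD processing `-f` and `-c` options in the order given.

```python
from typing import List


def openocd_connect_args(config_file: str, assert_srst_on_connect: bool = True) -> List[str]:
    """Build OpenOCD arguments up to and including 'init'.

    The optional reset_config command is placed ahead of 'init' so that
    reset is asserted while the debugger connects.
    """
    args = ["openocd", "-f", config_file]
    if assert_srst_on_connect:
        # Only valid on targets that support connecting under reset.
        args += ["-c", "reset_config connect_assert_srst"]
    args += ["-c", "init"]
    return args


print(openocd_connect_args("openocd.cfg"))
```

The same line could instead be placed in the board's openocd.cfg; the builder is used here only to make the ordering explicit.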

Why does programming fail only at the first try after power-up?
OpenOCD sets DBGMCU_CR |= DBG_STANDBY | DBG_STOP | DBG_SLEEP.
This register is not cleared by reset, and it ensures that sleep mode doesn't stop
the clock, so the CPU can respond to the debugger from then on.
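
As a worked example of the register update above, here is a minimal sketch of the bit arithmetic. The bit positions (DBG_SLEEP = bit 0, DBG_STOP = bit 1, DBG_STANDBY = bit 2) are an assumption taken from the DBGMCU_CR layout of STM32F1/F4/L4-class parts; other series differ, so check the reference manual of the device in use.

```python
# Assumed DBGMCU_CR bit positions (STM32F1/F4/L4-class layout).
DBG_SLEEP = 1 << 0
DBG_STOP = 1 << 1
DBG_STANDBY = 1 << 2


def enable_low_power_debug(dbgmcu_cr: int) -> int:
    """Return DBGMCU_CR with the three low-power debug bits set.

    Mirrors 'DBGMCU_CR |= DBG_STANDBY | DBG_STOP | DBG_SLEEP'; all other
    bits of the register are left untouched.
    """
    return dbgmcu_cr | DBG_STANDBY | DBG_STOP | DBG_SLEEP


power_on_value = 0x0  # assumed value after power-up
print(hex(enable_low_power_debug(power_on_value)))  # 0x7
```

Because the register survives a reset, applying the update again on a later session is a no-op, which matches the observation that only the first attempt after power-up fails.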

There is another serious problem in the commands Zephyr sends to OpenOCD.
OpenOCD strictly requires 'reset init' to halt and prepare the device before flash programming.
See https://openocd.org/doc/html/Flash-Commands.html
chapter 12.2 Preparing a Target before Flash Programming
I don't know why Zephyr uses 'reset halt', but be aware that it's wrong.
On most devices it just results in slower flashing (the CPU clock is not boosted),
but on some devices it may result in programming failures (e.g. the watchdog is not stopped).
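
To make the difference concrete, the sketch below rebuilds the command sequence from the failing log in the issue description, with only the pre-flash step switched to 'reset init'. The function names `flash_commands` and `to_argv` are hypothetical, and the rest of the sequence is copied from the log rather than being a recommendation.

```python
from typing import List


def flash_commands(image: str, prepare: str = "reset init") -> List[str]:
    """Return the OpenOCD command sequence for flashing `image`.

    `prepare` is the command run before programming; the log in this issue
    shows 'reset halt' there, while the comment above asks for 'reset init'.
    """
    return [
        "init",
        "targets",
        prepare,
        f"flash write_image erase {image}",
        "reset halt",
        f"verify_image {image}",
        "reset run",
        "shutdown",
    ]


def to_argv(commands: List[str]) -> List[str]:
    """Turn each command into a '-c <command>' pair for the openocd CLI."""
    argv: List[str] = []
    for command in commands:
        argv += ["-c", command]
    return argv


print(to_argv(flash_commands("build/disco_l475_iot1/zephyr/zephyr.hex")))
```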

@erwango
Member Author

erwango commented Sep 27, 2022

Thanks @tom-van for these insights. We'll try to fix configurations accordingly.

@erwango
Member Author

erwango commented Sep 27, 2022

Here is the status on the tests/searches I've made so far.

| Board | Current config | Impacted | Flash with connect_assert_srst |
|---|---|---|---|
| nucleo_f429zi | st_nucleo_f4.cfg (srst_nogate srst_only) | Yes | ok |
| nucleo_f103rb | st_nucleo_f103rb.cfg (srst_nogate srst_only) | Yes | ok |
| nucleo_f030r8 | st_nucleo_f0.cfg (srst_nogate srst_only) | No | ok |
| nucleo_l073rz | st_nucleo_l0.cfg (srst_nogate srst_only) | No | |
| nucleo_l476rg | st_nucleo_l4.cfg (srst_nogate srst_only) | Yes | ok |
| nucleo_wb55rg | stm32wbx.cfg (srst_nogate srst_only) | No | |
| disco_l475_iot1 | srst_nogate srst_only | Yes | ok |
| stm32f3_disco | stm32f3discovery.cfg (srst_nogate srst_only) | Yes | ok |

First insights:

  • Not easy to guess which boards are impacted based on the current srst config
  • Systematic use of connect_assert_srst when initial config is srst_nogate srst_only seems harmless

Drawback:
Systematic use of connect_assert_srst breaks west debug.
One way to work around this is to provide board-specific configuration in board.cmake files (for instance, a --cmd-pre-init "reset_config connect_assert_srst") and have the openocd runner discard these commands for the debug and attach commands.
From what I see, there is no way to do this with the current version of the openocd runner.
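
The runner-side idea described above could look roughly like the following. This is only an illustration and not the actual Zephyr openocd runner: the function name `pre_init_commands` and the `NO_CONNECT_RESET` set are hypothetical.

```python
from typing import List

# west commands that should not reset the target on connect (assumption
# based on the drawback above: connect_assert_srst breaks west debug).
NO_CONNECT_RESET = {"debug", "attach"}


def pre_init_commands(west_command: str, board_pre_init: List[str]) -> List[str]:
    """Return the pre-init OpenOCD commands to use for a given west command.

    For 'debug' and 'attach', any command mentioning connect_assert_srst is
    dropped; for other commands (e.g. 'flash') the board list is used as-is.
    """
    if west_command in NO_CONNECT_RESET:
        return [c for c in board_pre_init if "connect_assert_srst" not in c]
    return list(board_pre_init)


board_cfg = ["reset_config connect_assert_srst"]
print(pre_init_commands("flash", board_cfg))
print(pre_init_commands("debug", board_cfg))
```

Filtering in the runner keeps the board files declarative, but it does not help out-of-tree boards that have not added the pre-init command in the first place.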

@erwango
Member Author

erwango commented Sep 27, 2022

Adding the following in the openocd.cfg solves the west debug issue:

rename init old_init
proc init {} {
        old_init
        reset halt
}

That works, but I'm a bit reluctant to add this blindly on all STM32 based boards.

@erwango
Member Author

erwango commented Sep 28, 2022

Current status:

Assuming openocd is correct and that the commit that reveals the issue should be kept, fixing the issue globally (fix flash and debug on all impacted boards) requires a fix (likely in openocd.cfg) on all boards.
Two issues with that approach:

  • we don't know for sure which boards are impacted (I don't have all the boards at hand, and even if we can guess rules of thumb, nothing is certain)
  • we can't fix out of tree boards

I'm not clear on what a clean solution should look like (it could be in openocd.cfg, or in board.cmake and the openocd runner), but today I don't see a clean solution that would not impact out-of-tree boards (if we assume openocd is correct, then the fix should be on the board side, whatever it is).

For now:
I would propose to revert the impacting commit from openocd for the current DV (3.2.X).
In the next DV, we can take time to find a clean board-side solution and communicate clearly to out-of-tree users that it should be applied.

@stephanosio Are you OK with this approach?

@stephanosio
Member

> For now: I would propose to revert the impacting commit from openocd for current DV (3.2.X). In next DV, we can take time to find a clean board side solution and communicate clearly to out of tree users that it should be applied.
>
> @stephanosio Are you ok with this approach ?

Sure, as long as the Zephyr-side configurations are going to be fixed.

I will revert zephyrproject-rtos/openocd@98d9f11 in the Zephyr SDK 0.15.1-rc2 (I already tagged rc1 today).

stephanosio added a commit to zephyrproject-rtos/openocd that referenced this issue Sep 29, 2022
This reverts commit 98d9f11 because
it causes flashing issues with some Zephyr targets (notably, STM32
family devices).

The patch in itself does not seem to be doing anything wrong in
particular; but, it uncovers various problems with the Zephyr-side
OpenOCD configurations -- mainly, not resetting the target device on
debugger connect, which may lead to connection failures because Zephyr
puts the target CPU to sleep while idling and the CPU cannot respond to
the debugger's request in this state.

For more details, refer to the GitHub issue
zephyrproject-rtos/zephyr#50590.

Revert this commit once the Zephyr-side OpenOCD configurations are
fixed.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
@stephanosio
Member

Here is a test build with zephyrproject-rtos/openocd@98d9f11 reverted:
https://github.com/zephyrproject-rtos/sdk-ng/suites/8525740612/artifacts/380154374

@erwango can you confirm if this fixes the issue?

@erwango
Member Author

erwango commented Sep 29, 2022

@stephanosio I confirm this works on the boards mentioned earlier in this issue and some more. I also confirm there is no regression on the boards that were not hit by this bug (tested >10 boards from various STM32 series).

With one exception: nucleo_f103rb
On this specific board, reset_config connect_assert_srst and the patch for west debug are required. It's probably impacted by another change. I'll try to dig into that when I have some time.
In the meantime, I'll propose a fix for this specific board.

Note that this exception reinforces my belief that reverting the commit for DV3.2 is safer, as the impact of openocd changes is hard to know for sure.

erwango added a commit to erwango/zephyr that referenced this issue Sep 29, 2022
With latest version of openocd delivered in Zephyr SDK 0.15.0,
a new configuration is required to flash and debug this board:
"reset_config connect_assert_srst" allows to flash the board
The new "init" function allows to run debug.

Fixes zephyrproject-rtos#50590 for this specific board

Note that other boards might be also impacted by this new version
of openocd but a revert of zephyrproject-rtos/openocd@98d9f11 allows
to get back to the previous status.
A new Zephyr SDK release (V0.15.1) will be available with a revert of
this commit. Unfortunately this has no impact on nucleo_f103rb. Hence
this change.


Signed-off-by: Erwan Gouriou <erwan.gouriou@linaro.org>
@tom-van

tom-van commented Sep 29, 2022

I learned from your conversation that you prefer to run OpenOCD without reset_config connect_assert_srst.
This approach is not as reliable as using connect_assert_srst, and OpenOCD inevitably shows some errors as it tries to talk to a sleeping, unresponsive target. The first reset halt should restore communication with the target, so this approach should be viable too if you tolerate the errors before that command. Unfortunately, it is not extensively tested by us at OpenOCD (if tested at all). That's why the problems started after merging zephyrproject-rtos/openocd@98d9f11
To be honest, I proposed reverting this change almost a year ago:
6753: Revert "target: reset target examined flag if target::examine() fails" | https://review.openocd.org/c/openocd/+/6753
Unfortunately, no other maintainer agreed with this, so I recently dropped that PR.

After a deep analysis of how debug is used in Zephyr, I proposed 4 new OpenOCD changes to harden the robustness of reset commands.
6745: target/cortex_m: make reset robust again | https://review.openocd.org/c/openocd/+/6745
7228: target/cortex_m: try to re-examine under reset in cortex_m_assert_reset() | https://review.openocd.org/c/openocd/+/7228
7229: target/hla_target: try to re-examine under reset in hl_assert_reset() | https://review.openocd.org/c/openocd/+/7229
7230: target: re-examine before arp_waitstate in ocd_process_reset_inner | https://review.openocd.org/c/openocd/+/7230

Although it's not necessary with the new changes, I also revived the request to revert the problematic commit - just to be safe from other regressions. Reverting it in your project makes good sense to me.

I would appreciate it if you tested the new changes and commented on the OpenOCD gerrit. Thanks

@erwango
Member Author

erwango commented Sep 29, 2022

@tom-van Thanks for your time on this issue

> I learned from your conversation that you prefer to run OpenOCD without reset_config connect_assert_srst.

In general we're quite open to changes and personally I don't have strong preference, but there are 2 points going against this change today:

  • It needs to be deployed on more than 100 in-tree boards (a good part of which can't be tested directly or easily) and on an unknown number of out-of-tree boards.
  • It requires some additional changes to be compatible with west debug (and some more maybe for west attach)

> Although it's not necessary with new changes I also revived the request to revert the problematic commit - just to be safe from other regressions.

Sounds safe indeed.

> Reverting it in your project makes good sense to me.

Thanks for this feedback.

> I would appreciate if you test the new changes and give comments to the OpenOCD gerrit. Thanks

Will do for sure.

fabiobaltieri pushed a commit that referenced this issue Sep 29, 2022
@erwango
Member Author

erwango commented Sep 29, 2022

I'm reopening this issue, as we're not done with it.

@github-actions

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Nov 29, 2022
@fabiobaltieri
Member

fabiobaltieri commented Nov 29, 2022

This should now be fixed with the latest SDK.

katyo pushed a commit to katyo/openocd that referenced this issue Jun 21, 2023