Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data corruption in STM32 SPI driver in Slave Mode #43115

Closed
kborowski2 opened this issue Feb 23, 2022 · 15 comments
Closed

Data corruption in STM32 SPI driver in Slave Mode #43115

kborowski2 opened this issue Feb 23, 2022 · 15 comments
Assignees
Labels
area: SPI SPI bus bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 priority: low Low impact/importance bug Stale Waiting for response Waiting for author's response
Milestone

Comments

@kborowski2
Copy link

Describe the bug
There is a problem with STM32 SPI driver using SLAVE mode with DMA. We can observe 2 problematic cases:

  1. Driver SPI transmission can be enabled during ongoing data transmission (when CS is active), which produces garbage data (data frame is shifted).
  2. Data transmission start can occur after enabling SPI bus and before enabling DMA - we can observe strange bits shifts, which may be caused because of that.

We are using STM32WB55.

To Reproduce
To reproduce this behavior extra device is needed, which will be in the role of SPI master device.
Bug can be seen when master device sends frame to the slave device and then is polling (with very small period) for an answer.

Expected behavior

  1. Master sends command to slave, and it starts polling for an answer (if slave do not put any data to SPI (DMA buffer) then master should read zeros on SPI bus).
  2. Slave device receives command from master, and it starts command processing (during that time master is sending polling frame with small period, for example 50 us).
  3. After slave has prepared an answer then answer content it is put for sending over SPI (in this case into DMA buffer), and it is transmitted during next polling frame (after CS is activated).
  4. Master receives unchanged answer from slave.

Logs and console output
We have captured our problem on oscillograms.

Channels description:

  1. SPI CS active low
  2. SPI CLK
  3. SPI MISO
  4. This is extra signal, which was added for debug purposes. It is controlled by STM32 SPI driver (spi_ll_stm32.c) in transceive_dma function. It is set to 1 when calling LL_SPI_Enable(spi); and set to 0 when calling LL_SPI_EnableDMAReq_RX(spi); and LL_SPI_EnableDMAReq_TX(spi); Thanks to that signal we can observe what is happening between enabling SPI and DMA and also we can notice if DMA is enabled during active transmission.

Case 1: DMA is enabled during active transmission
Result: Data shifted to right by 1 byte - we expected 0001000000010010 and received 0000000000010000

enable_dma
Figure 1: Enabling DMA during active transmission

Case 2. We can observe strange data shift by 5 bits - it can be caused by enabling SPI before DMA

enable_spi_shift_by_5_bits
Figure 2: Whole transmission appear between enabling SPI and enabling DMA

shift_by_5_bits
Figure 3: Enabling DMA and receiving data shifted by 5 bits - we expected 0001000000010010 and received 0000000010000000

Additional context
SPI settings:
CONFIG_SPI=y
CONFIG_SPI_ASYNC=y
CONFIG_SPI_STM32=y
CONFIG_SPI_STM32_USE_HW_SS=y
CONFIG_SPI_STM32_INTERRUPT=n

@kborowski2 kborowski2 added the bug The issue is a bug, or the PR is fixing a bug label Feb 23, 2022
@erwango erwango self-assigned this Feb 23, 2022
@erwango erwango added area: SPI SPI bus platform: STM32 ST Micro STM32 priority: low Low impact/importance bug labels Feb 23, 2022
@erwango erwango assigned FRASTM and unassigned erwango Feb 23, 2022
@FRASTM
Copy link
Collaborator

FRASTM commented Mar 10, 2022

Could the cs control, change the tx timing during DMA transfer like on #43411

@kborowski2
Copy link
Author

Not really because spi_stm32_cs_control shouldn't do anything in Slave Mode

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label May 10, 2022
@r2r0
Copy link
Member

r2r0 commented May 10, 2022

@FRASTM any chance to go forward?

@erwango erwango removed the Stale label May 10, 2022
@FRASTM
Copy link
Collaborator

FRASTM commented May 10, 2022

I have a branch here https://github.com/FRASTM/zephyr/tree/spi_using_dma
with the following recommended order for the SPI communication sequence:
When starting communication using DMA, to prevent DMA channel management raising
error events, these steps must be followed in order:

1. Enable DMA Rx buffer in the RXDMAEN bit in the SPI_CR2 register, if DMA Rx is
used.

2. Enable DMA streams for Tx and Rx in DMA registers, if the streams are used.
3. Enable DMA Tx buffer in the TXDMAEN bit in the SPI_CR2 register, if DMA Tx is used.
4. Enable the SPI by setting the SPE bit.

@kborowski2, @r2r0 Could you try if it changes your transmission ?

@kborowski2
Copy link
Author

It still does not work as expected. I suspect that now we have a problem that transmission can be started when NSS is already active, which causes data corruption. Is there any hardware mechanism, which prevent that kind of behavior?

@FRASTM
Copy link
Collaborator

FRASTM commented May 19, 2022

Could we fall in one pb with the BSY flag as we can read in the ES0394. Especially in the ll_func_spi_dma_busy the BSY flag is monitored ANDed with the TXE.
Then a work around could be in the static inline uint32_t ll_func_spi_dma_busy(SPI_TypeDef *spi) function of the spi_ll_stm32.h
:

#if defined(CONFIG_SOC_SERIES_STM32WBX)
	/* this is a WA to the BSY flag high : ErrateSheet ES0394 */
	int64_t timeout_time = k_uptime_get() + STM32_SPI_TIMEOUT;
	while (LL_SPI_IsActiveFlag_BSY(spi) == 1) {
		if (k_uptime_get() > timeout_time) {
			LOG_ERR("SPI Busy flag Timeout!");
			return -EIO;
		}
	}
	return 1;
#else 
...

@erwango
Copy link
Member

erwango commented May 24, 2022

@kborowski2 Would you be able to provide more details (bus configuration: mode, polarity, ..)? Also any ready to use sample would help investigations

@kborowski3
Copy link

Still doesn't work as expected ->

  • I am enabling SPI after data are already loaded into DMA buffers
  • I modified ll_func_spi_dma_busy function as you said
  • I deleted extra gpio config for CS - now it is only defined in cs-gpios array

About spi configuration:

  • We have two devices defined on this particular SPI bus: one is the master device and the other one is the slave device. Those devices are not working at the same time. Either we use SPI in the master mode or in the slave mode on a separate CS gpio.
  • On the DT we are using the cs-gpios array to define those two CS gpios: cs-gpios = <&gpioa 4 0>, <&gpiob 2 0>;
  • The spi config that we are passing to the driver looks like this:
    struct spi_config spi_cfg; spi_cfg.operation = SPI_WORD_SET(8) | SPI_OP_MODE_SLAVE | SPI_TRANSFER_MSB; spi_cfg.frequency = 4000000;
    I suppose that we shouldn't define any frequency in the Slave mode but the driver does not allow to pass the empty value (maybe it should be considered as a separate bug).

The rest of the configuration remains with default values.

@FRASTM
Copy link
Collaborator

FRASTM commented May 25, 2022

Could you please share your zephyr application for the slave.
I guess any SPI master device (not necessarily zephyr) could play that role.

@erwango erwango added the Waiting for response Waiting for author's response label Jun 29, 2022
@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Aug 29, 2022
@erwango erwango added this to the future milestone Aug 31, 2022
@kborowski3
Copy link

We have successfully solved our problem making some small changes in the zephyr source code. Additionally we had to add the extra feature, which is the software NSS control in the Slave mode. This allows us to stop the transmission if the master device changes NSS state. Thanks to that we are able to pass a big data buffer to the driver so we can receive either small or large data streams and we do not have to wait for a timeout to detect the end of the transmission. This was very ineffective in case of receiving small data streams.
Are you interested in that kind of feature and do you want us to share our changes?

@erwango
Copy link
Member

erwango commented Sep 19, 2022

We have successfully solved our problem

Thanks for the head's up !

Are you interested in that kind of feature and do you want us to share our changes?

For sure!

@erwango
Copy link
Member

erwango commented Sep 26, 2022

@kborowski3 can you tell if the issue you faced is similar to #50501 ?

@kborowski3
Copy link

For me using the interrupt mode wasn't working as well but I don't remember the exact problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: SPI SPI bus bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 priority: low Low impact/importance bug Stale Waiting for response Waiting for author's response
Projects
None yet
Development

No branches or pull requests

6 participants