drivers: spi: stm32h7: Use SPI FIFO #63173

Merged
merged 4 commits into from
Nov 30, 2023

Conversation

dgastonochoa
Contributor

@dgastonochoa dgastonochoa commented Sep 27, 2023

This is a proposal to support different "packet sizes" in the SPI driver. In this context, a packet size greater than one allows more than one frame to be written to the SPI's TxFIFO (if it has one) in a single pass, which has two benefits:

  • It removes the delays between frames.
  • If interrupts are enabled, the same data is sent using fewer interrupts, as each one sends several frames.
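
As a rough illustration of the second point, the sketch below counts TX interrupts under the simplifying assumption that exactly one interrupt is served per packet. `tx_interrupts_needed` is a hypothetical helper written for this example; it is not part of the driver.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not driver code): number of TX interrupts needed
 * to send `frames` frames when each interrupt writes up to `packet_size`
 * frames into the TxFIFO. Assumes one interrupt per packet.
 */
static size_t tx_interrupts_needed(size_t frames, size_t packet_size)
{
	if (packet_size == 0) {
		return 0; /* invalid configuration: nothing can be sent */
	}

	/* Round up: a final, partially filled packet still costs one interrupt. */
	return (frames + packet_size - 1) / packet_size;
}
```

Under that assumption, 64 frames cost 64 interrupts with a packet size of 1 and 8 interrupts with a packet size of 8.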

To illustrate the first point, here are some captures of a transmission with the current stm32h7 SPI driver:

with_gaps

As you can see above, there are delays between each frame. This can dramatically impair performance when sending large amounts of data.

If a bigger packet size is used for the same transmission, the delays disappear:

without_gaps

EDIT

Following the feedback received, this PR no longer uses the FIFO threshold. It now uses the FIFO without modifying the FIFO threshold (so it keeps its default of 1). This still fixes the performance problem described above.

Test plan

Connect an STM32H7 board to the PC with pins D11 and D12 connected.

Open a minicom or equivalent terminal to read the test report.

Execute the following commands:

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_test_packet_sizes.loopback
west flash

And verify all tests pass.

EDIT

The tests above are no longer necessary as the use of the FIFO threshold has been discarded. Thus, run the already existing spi loopback tests and verify they pass:

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi tests/drivers/spi/spi_loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_16bits_frames.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_dma.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_16bits_frames_dma.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_dma_no_nocache.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_16bits_frames_dma_no_nocache.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h753zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_interrupt.loopback
west flash




rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h743zi tests/drivers/spi/spi_loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h743zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_16bits_frames.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h743zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_dma.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h743zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_16bits_frames_dma.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h743zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_dma_no_nocache.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h743zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_16bits_frames_dma_no_nocache.loopback
west flash

rm -rf ~/zephyrproject/zephyr/build
west build -p auto -b nucleo_h743zi -T tests/drivers/spi/spi_loopback/drivers.spi.stm32_spi_interrupt.loopback
west flash

EDIT 2

This PR has also been tested with two separate Nucleo boards (H743), one acting as SPI master and the other as SPI slave.

@zephyrbot zephyrbot added area: SPI SPI bus platform: STM32 ST Micro STM32 labels Sep 27, 2023
@dgastonochoa dgastonochoa changed the title stm32h7: drivers: spi: Support different packet sizes in SPI drivers: spi: stm32h7: Support different packet sizes in SPI Sep 27, 2023
@dgastonochoa dgastonochoa force-pushed the stm32-spi-fifo-2 branch 3 times, most recently from 99aecd1 to 344be08 Compare September 27, 2023 17:15
Collaborator

@tbursztyka tbursztyka left a comment

FIFO handling needs more thought, I think.

static void spi_stm32_send_next_frame(SPI_TypeDef *spi,
				      struct spi_stm32_data *data)
{
	const uint8_t frame_size = SPI_WORD_SIZE_GET(data->ctx.config->operation);
Collaborator

perhaps it would be better to store the frame size in struct spi_stm32_data directly, so getting it would be faster

Contributor Author

It would be cleaner and easier to retrieve. However, the same information would be replicated in two places, with the inconsistency problems that could carry. I don't have a strong opinion on this so I'm happy to add a new attribute to the struct.

Collaborator

Each extra pointer dereference may increase the risk of a cache miss. Since you only need a small piece of information here, it is better to store it in data right away.

Contributor Author

Agreed.

Contributor Author

I have taken a look at this. There are other places in which ctx.config is accessed; to be consistent, all these references should be modified. This is out of this PR's scope as I conceived it.
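
For reference, the caching discussed in this thread could look roughly like the sketch below. All `demo_`/`DEMO_` names are stand-ins invented for the example (they only imitate the shape of `struct spi_stm32_data` and `SPI_WORD_SIZE_GET`), and the bit layout of `operation` is an assumption, not a statement about the real API.

```c
#include <stdint.h>

/* Stand-in macros for illustration only; the bit layout is assumed. */
#define DEMO_WORD_SIZE_SHIFT 5U
#define DEMO_WORD_SIZE_MASK  (0x3FU << DEMO_WORD_SIZE_SHIFT)
#define DEMO_WORD_SIZE_GET(op) \
	(((op) & DEMO_WORD_SIZE_MASK) >> DEMO_WORD_SIZE_SHIFT)

struct demo_spi_config {
	uint16_t operation; /* packed operation flags, word size included */
};

struct demo_spi_data {
	const struct demo_spi_config *config;
	uint8_t frame_size; /* cached copy of the word size, in bits */
};

/* Decode the frame size once, when the controller is configured. */
static void demo_configure(struct demo_spi_data *data,
			   const struct demo_spi_config *config)
{
	data->config = config;
	data->frame_size = (uint8_t)DEMO_WORD_SIZE_GET(config->operation);
}

/* Hot path: a single load from `data`, with no pointer chain to follow. */
static uint8_t demo_frame_size(const struct demo_spi_data *data)
{
	return data->frame_size;
}
```

The trade-off is the one raised above: the value now lives in two places, so the cached copy has to be refreshed whenever the configuration changes.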

LL_SPI_TransmitData16(spi, tx_frame);
spi_context_update_tx(&data->ctx, 2, 1);
break;
default:
Collaborator

you don't need that. Once you configured your controller, frame size will not change.
Actually, you could just have an if/else, since you only handle two frame sizes.

Contributor Author

you don't need that. Once you configured your controller, frame size will not change.

You mean the switch statement? I replaced the if ... else ... by switch because it leaves the code in a better position to support more frame sizes in the future (H7 has quite a lot of them). However, I don't mind reverting this back to if ... else ....

Collaborator

The frame size can be below a byte, or above a byte but below 2 bytes, etc. (i.e. 4, 6, ... 10, 12, ...). But in the end you handle everything on a byte basis. So even if it is below a byte, you feed one byte (the controller will only take the relevant n bits representing the frame). Above a byte but below 2 bytes, you feed 2 bytes, and so on.

Stm32's HAL does not seem to provide anything but LL_SPI_TransmitData<8/16/32>.

if you enable sizes above 16bits, then perhaps keep the switch. if/else if/else is less clean than a switch in such case.
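
The byte-basis rule described here can be sketched as a small lookup. `bytes_per_frame` is a hypothetical helper written for this example; the 32-bit branch is included only because the 8/16/32-bit transmit helpers are mentioned above, not because the driver supports it.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not driver code): number of bytes moved per frame.
 * Frames of up to 8 bits travel in one byte, 9 to 16 bits in two bytes,
 * and anything wider (up to 32 bits) in four bytes.
 * Returns 0 for sizes this sketch does not handle.
 */
static uint8_t bytes_per_frame(uint8_t frame_bits)
{
	if (frame_bits == 0 || frame_bits > 32) {
		return 0;
	}
	if (frame_bits <= 8) {
		return 1;
	}
	if (frame_bits <= 16) {
		return 2;
	}
	return 4;
}
```

With only the first two branches in use, an if/else is enough; a switch on the returned byte count becomes attractive once wider frames are enabled.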

Contributor Author

Agreed, I will revert to if else.

Contributor Author

Done

drivers/spi/spi_ll_stm32.c
	uint8_t packet_size = SPI_PACKET_SIZE_GET(data->ctx.config->operation);

	for (int i = 0; i < packet_size; i++) {
		spi_stm32_send_next_frame(spi, data);
Collaborator

I don't know how stm32's SPI controllers work, generally fifo interrupt threshold is meant to make sure that the controller is not going to starve of frames to send (if there are way more to be sent, that is). This means that between the time you get the interrupt and the time you handle it, the controller may have continued sending frames left in the FIFO, which in turn means that the FIFO level has moved further past the threshold, so you'll need to feed more frames than the configured threshold. This can also be true when receiving, in an RX-only transaction where the controller can send the dummy bytes by itself (I don't know if that applies to this controller though).

Contributor Author

@dgastonochoa dgastonochoa Oct 2, 2023

I don't know how stm32's SPI controllers work, generally fifo interrupt threshold is meant to make sure that the controller is not going to starve of frames to send (if there are way more to be sent, that is).

That's my understanding as well, yes.

[...] you'll need to feed more frames than the configured threshold.

If I understood you correctly, you are suggesting replacing this:

	if (ll_func_tx_is_not_full(spi)) {
		spi_stm32_send_next_packet(spi, data);
	}

by this:

	while (ll_func_tx_is_not_full(spi)) {
		spi_stm32_send_next_packet(spi, data);
	}

If this is done, the interrupt handler takes more time to finish, which might be undesirable. However, as you mention, there is less chance of "starving" the FIFO.

Collaborator

Yep, make it a while loop. The little extra time spent doing so is less of an issue for the system than handling more interrupts due to a threshold that could be hit more often.
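
To make the difference between the two variants concrete, here is a minimal host-side sketch against a mock FIFO. The `mock_` helpers, the `isr_fill_*` names and the FIFO depth of 8 are all assumptions made for the example; they only imitate the roles of `ll_func_tx_is_not_full` and `spi_stm32_send_next_packet`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock TxFIFO; the depth is an arbitrary value chosen for this demo. */
#define MOCK_FIFO_DEPTH 8U

struct mock_spi {
	size_t fifo_level;  /* frames currently queued in the TxFIFO */
	size_t frames_left; /* frames the driver still has to send */
};

static bool mock_tx_is_not_full(const struct mock_spi *spi)
{
	return spi->fifo_level < MOCK_FIFO_DEPTH;
}

static void mock_send_next_frame(struct mock_spi *spi)
{
	spi->fifo_level++;
	spi->frames_left--;
}

/* "if" variant: at most one frame per interrupt. Returns frames written. */
static size_t isr_fill_once(struct mock_spi *spi)
{
	size_t written = 0;

	if (spi->frames_left > 0 && mock_tx_is_not_full(spi)) {
		mock_send_next_frame(spi);
		written++;
	}
	return written;
}

/* "while" variant: keep writing until the FIFO is full or data runs out. */
static size_t isr_fill_all(struct mock_spi *spi)
{
	size_t written = 0;

	while (spi->frames_left > 0 && mock_tx_is_not_full(spi)) {
		mock_send_next_frame(spi);
		written++;
	}
	return written;
}
```

Starting from a FIFO holding 3 frames, the `if` variant adds one frame per interrupt while the `while` variant tops it up to the full depth in a single pass, at the cost of a longer handler.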

Contributor Author

Using a while loop here caused several problems, stemming from the fact that SPI supports transfers in which tx is bigger than rx or vice versa. The current implementation takes care of this, defaulting to the classic way of transmitting in the rx > tx case (which causes further problems that would require additional modifications).

Contributor Author

I reviewed this and found a clean way to support the rx > tx case, so this is no longer a problem.

@dgastonochoa
Contributor Author

Kind reminder to review this @tbursztyka. I agree with several of your comments, but others require a further discussion.

@nordicjm nordicjm added the dev-review To be discussed in dev-review meeting label Oct 23, 2023
@MaureenHelm
Member

Kind reminder to review this @tbursztyka. I agree with several of your comments, but others require a further discussion.

@dgastonochoa @tbursztyka @teburd can you attend the dev meeting this week?

@dgastonochoa
Contributor Author

dgastonochoa commented Oct 26, 2023

@MaureenHelm I won't be able to attend the dev meeting today, but I will next week. Maybe you want to consider postponing the review of this PR until then if it's more convenient for you, @tbursztyka and @teburd.

@MaureenHelm
Member

@MaureenHelm I won't be able to attend the dev meeting today, but I will next week. Maybe you want to consider postponing the review of this PR until then if it's more convenient for you, @tbursztyka and @teburd.

No problem, I'll push it out to next week.

@MaureenHelm MaureenHelm removed the dev-review To be discussed in dev-review meeting label Nov 8, 2023
@dgastonochoa
Contributor Author

@teburd @tbursztyka could you please take a look? The current proposal avoids the contentious points mentioned.

@dgastonochoa dgastonochoa force-pushed the stm32-spi-fifo-2 branch 3 times, most recently from 3ed1e09 to 27f3188 Compare November 16, 2023 13:11
@dgastonochoa dgastonochoa changed the title drivers: spi: stm32h7: Support different packet sizes in SPI drivers: spi: stm32h7: Use SPI FIFO Nov 16, 2023
Avoid calling startMasterTransfer multiple times in a
transaction by moving it to the transceive() function.

Signed-off-by: Daniel Gaston Ochoa <dgastonochoa@gmail.com>

In H7, TXP indicates when its FIFO has room for, at least, one
packet. Thus, rename ll_func_tx_is_empty as ll_func_tx_is_not_full,
to be consistent in all platforms.

Signed-off-by: Daniel Gaston Ochoa <dgastonochoa@gmail.com>

Simplify and clarify spi_stm32_shift_m by splitting it in
3 smaller functions with clear names.

Signed-off-by: Daniel Gaston Ochoa <dgastonochoa@gmail.com>

Use H7 SPI FIFO to improve performance.

Signed-off-by: Daniel Gaston Ochoa <dgastonochoa@gmail.com>
@dgastonochoa
Contributor Author

All feedback has been applied, @teburd and @tbursztyka please could you take a look?

@dgastonochoa
Contributor Author

@teburd, @tbursztyka please could you confirm whether or not you intend to review this at some point?

@henrikbrixandersen henrikbrixandersen dismissed tbursztyka’s stale review November 29, 2023 11:43

@tbursztyka I am dismissing your stale review here as it appears all your comments have been addressed. If not, please re-review and re-request changes.

@carlescufi carlescufi merged commit 50f64ea into zephyrproject-rtos:main Nov 30, 2023
16 checks passed
@dgastonochoa dgastonochoa deleted the stm32-spi-fifo-2 branch December 4, 2023 15:38
@GeorgeCGV
Collaborator

GeorgeCGV commented Dec 5, 2023

@dgastonochoa with CONFIG_NO_OPTIMIZATIONS everything seems to work.

However, with CONFIG_DEBUG_OPTIMIZATIONS some SPI devices stop working on H730 (including #57848).
Stuck within spi_stm32_complete in spi_stm32_isr as SPI is kept busy: CR2 and SR are 0. Disabling fifo by returning false from spi_stm32_can_use_fifo solves the problem for the optimized build.

Any ideas what it can be?

@dgastonochoa
Contributor Author

dgastonochoa commented Dec 5, 2023

Hi @GeorgeCGV, I have run the spi_loopback tests again with interrupts enabled and with CONFIG_DEBUG_OPTIMIZATIONS; they pass (on H743).

I imagine you are seeing this error not in the loopback tests, but in some other scenario, right? Would it be possible to run the loopback tests on H730 to see if the problem is reproduced there?

Does the problem reproduce without enabling SPI interrupts?

When you say "SPI is kept busy", do you mean that ll_func_spi_is_busy keeps returning true? Would it be possible to check if spi_context_tx_on returns false (all bytes have been sent) but spi_context_rx_on returns true (not all bytes have been read)?

@GeorgeCGV
Collaborator

GeorgeCGV commented Dec 6, 2023

@dgastonochoa

Does the problem reproduce without enabling SPI interrupts?

With disabled interrupts it is not getting into a freezing state. But I am observing overruns and some devices fail.

When you say "SPI is kept busy", do you mean that ll_func_spi_is_busy keeps returning true? Would it be possible to check if spi_context_tx_on returns false (all bytes have been sent) but spi_context_rx_on returns true (not all bytes have been read)?

Yes, it gets stuck in spi_stm32_complete after data reception. The CR2 and SR registers are 0. Therefore, ll_func_spi_is_busy returns true.

  • spi_context_tx_on - the context tx_len is 0.
  • spi_context_rx_on - the context rx_len is 0.

Therefore, both return false. The SPI devices fail not on every boot, but sometimes. Feels like it is not stable. Would need to do signal capture for details...

Would it be possible to run the loopback tests on H730 to see if the problem is reproduced there?

Would need to do some HW modifications.

@dgastonochoa
Contributor Author

dgastonochoa commented Dec 6, 2023

@GeorgeCGV

Does the problem reproduce without enabling SPI interrupts?

With disabled interrupts it is not getting into a freezing state. But I am observing overruns and some devices fail.

When you say "SPI is kept busy", do you mean that ll_func_spi_is_busy keeps returning true? Would it be possible to check if spi_context_tx_on returns false (all bytes have been sent) but spi_context_rx_on returns true (not all bytes have been read)?

Yes, it gets stuck in spi_stm32_complete after data reception. The CR2 and SR registers are 0. Therefore, ll_func_spi_is_busy returns true.

  • spi_context_tx_on - the context tx_len is 0.
  • spi_context_rx_on - the context rx_len is 0.

This seems to indicate that the SPI driver has sent/received all the expected bytes, as the tx/rx_on functions return 0. However, it seems that either TXC is not set (there is activity on the bus), or it is set and cleared before spi_stm32_complete is called, so the driver ends up in an endless loop.

Would it be possible to confirm the activity on the bus when the error occurs?

Also, is it possible to double-check that TXC is the flag that's being checked? (ll_func_spi_is_busy can check TXC, EOT or BSY depending on circumstances)

@GeorgeCGV
Collaborator

GeorgeCGV commented Dec 8, 2023

Also, is it possible to double-check that TXC is the flag that's being checked? (ll_func_spi_is_busy can check TXC, EOT or BSY depending on circumstances)

#if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi)
	if (LL_SPI_GetTransferSize(spi) == 0) {
		return LL_SPI_IsActiveFlag_TXC(spi) == 0;
	} else {
		return LL_SPI_IsActiveFlag_EOT(spi) == 0;
	}
...

it is H7, so we go and check the transfer size. The CR2 is 0, obviously the SPI_CR2_TSIZE is 0 as well.
Now we go into LL_SPI_IsActiveFlag_TXC, SR is 0x2 in one case or 0x0 in the other. The TXC bit is 0.
Because TXC is 0, the ll_func_spi_is_busy returns true.
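
The reasoning above can be reproduced on a host with a minimal sketch of the quoted branch written as a pure function. The `DEMO_SR_*` bit positions are assumptions taken from the STM32H7 SPI_SR register description and should be checked against the reference manual for the exact part; `demo_spi_is_busy` is a hypothetical stand-in for `ll_func_spi_is_busy`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SPI_SR bit positions for STM32H7; verify against the manual. */
#define DEMO_SR_TXP (1UL << 1)  /* TxFIFO has room for at least one packet */
#define DEMO_SR_EOT (1UL << 3)  /* end of transfer */
#define DEMO_SR_TXC (1UL << 12) /* transmission complete */

/*
 * Mirrors the quoted H7 branch: when TSIZE is 0 the TXC flag signals
 * completion, otherwise the EOT flag does. "Busy" means the relevant
 * flag is still clear.
 */
static bool demo_spi_is_busy(uint32_t sr, uint16_t tsize)
{
	if (tsize == 0) {
		return (sr & DEMO_SR_TXC) == 0;
	}
	return (sr & DEMO_SR_EOT) == 0;
}
```

With TSIZE equal to 0, both observed values (SR = 0x0 and SR = 0x2) leave TXC clear, so the function keeps reporting busy, which matches the endless wait in spi_stm32_complete.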

Different SPI peripherals fail at different times. To trigger the issue I have to power cycle the board multiple times. When it gets stuck it is always within spi_stm32_complete in spi isr. The TSIZE, TXC, and context rx/tx count, len are 0 as well as rx/tx buffers are set to 0x0.

What is also observed is the corruption of transmitted and/or received data. That doesn't happen every time... but often enough.

Would it be possible to confirm the activity on the bus when the error occurs?

Nothing happens on the bus when it gets stuck:

stuck_state

Maybe it is related somehow to having multiple slave devices...

@dgastonochoa
Contributor Author

dgastonochoa commented Dec 8, 2023

In the picture you sent there is a lot of cross-talk on CS; are you sure that it is caused by the logic analyzer probes? Could you try to reduce the SPI SCK frequency? Could you give more details about your setup? (SCK frequency, whether this is being tested on a stand-alone Nucleo board, etc.)

[...] SR is 0x2 in one case or 0x0 in the other.

This doesn't sound good either. SR = 0x00 means the TxFIFO has no room for more frames. That should never happen if all the frames have been sent; this flag can only be set and cleared by HW, so I can't see how this in particular can be caused by the driver, given that it is in an infinite loop in spi_stm32_complete and so cannot be writing bytes to the TxFIFO.

@GeorgeCGV
Collaborator

GeorgeCGV commented Dec 8, 2023

@dgastonochoa it is a zoomed out screen. The clocks are validated and set to be within supported range. It is a custom device based on H730. Multiple SPIs are used where each has multiple targets (slaves).

After reducing the clock the issue started to appear every time. With higher clock the issue required board restart to trigger it. Sized down connected zoo to one device per SPI (to eliminate possible issue(s) from having multiple devices).

Zephyr version is pretty much up to date: f69641f7d204864aa26f8bdd9fecab259e535da2.

SPI without FIFO, the device works:
fifo_off_works

SPI with FIFO, the device doesn't work and it is stuck as described above (within ll_func_spi_is_busy, SR=0):
fifo_on_fail

It is clearly visible that the CS (chip select) timing is dramatically off. It is software-controlled by the driver. Most probably that is where the problems are coming from.

@erwango
Member

erwango commented Dec 8, 2023

@GeorgeCGV please raise an issue to follow this up and make it visible

@avisconti
Collaborator

@dgastonochoa I opened an issue (#66486) that looks different from #66326, but I'm not sure. @erwango just closed it, but I kindly ask you to take a look at it and see if it might be the same. Thanks.
