Filesystem Write Fails with Some SD-Cards #52931
Comments
@clamattia Can you try running this test with |
Thanks for the quick response.
    for (int i = 0; i < 10; i++) {
        init_buffer();
        zassert_true(write_file(), "failed to write file");
    }
and added some data variation:
    buffer[i] += 3 * (1 + i) + buffer[i];
The bug must be timing-sensitive. If I enable the log output as you suggested or if I put a Debug Log for the
I was unable to reproduce it on the |
FYI: I have the same issue in production. Adding the debug log there made the bug go away. |
Ok, so it seems the card fails to return to a ready state. Can you try this patch? #53017 This change should poll the SD card for the full duration of |
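The idea of polling the card until it is ready, rather than giving up early, can be illustrated with a small host-side sketch. This is not the code from the patch: `fake_card`, `fake_card_ready`, and `wait_ready` are hypothetical names, and the timeout budget is counted in polls here, where a real driver would compare against a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: returns true once the card reports ready. */
typedef bool (*card_ready_fn)(void *ctx);

/* Hypothetical simulated card that stays busy for a fixed number of polls. */
struct fake_card {
	uint32_t busy_polls_left;
};

static bool fake_card_ready(void *ctx)
{
	struct fake_card *card = ctx;

	if (card->busy_polls_left > 0) {
		card->busy_polls_left--;
		return false;
	}
	return true;
}

/*
 * Poll until the card is ready or the poll budget is used up.
 * Returns 0 when the card became ready, -1 when the budget expired.
 */
static int wait_ready(card_ready_fn ready, void *ctx, uint32_t max_polls)
{
	assert(ready != NULL);

	for (uint32_t i = 0; i < max_polls; i++) {
		if (ready(ctx)) {
			return 0;
		}
	}
	return -1;
}
```

With a card that stays busy for five polls, a budget of three polls times out, while a budget of ten polls succeeds. A slow card therefore only fails if the wait is cut short.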
Thanks @danieldegrasse for the quick patch. At first glance it seems to improve things but not completely fix the problem.
At first glance it seems unrelated to |
I was able to reproduce the problem with both "bad" cards, even with the patches. Edit: Unfortunately, if debug log is enabled, both cards pass, probably for timing reasons. |
Do you see this behavior even with the patch? Without the patch, |
It is hard to tell now. To increase the likelihood of the bug I removed all debug output, so it is hard to tell when the timeout starts and ends. I re-enabled some debug output to check the timing, but then unfortunately the bug no longer occurs. However, I think I have found what caused the timeout to be ignored earlier (unrelated), and I believe we can assume that the timeout is now respected. If you have an idea how I could check this without debug logging, let me know. Edit: By the way, I think the patch helped, because the bug seems to have become harder to reproduce, but it is hard to say for sure. |
One way you can check the location of the error return would be to change the return code at the SD subsystem level. For example, I believe that your error is occurring on this line: https://github.com/nxp-zephyr/zephyr/blob/065b7ac02661d580b9300d4c1c1048874671f118/subsys/sd/sdmmc.c#L1421 (Note I've linked to code on the patched branch, as that is what I assume you're working from now). If you changed that return code to something obvious (say |
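As a rough illustration of this debugging technique: when several failure paths return the same generic code, temporarily giving each path its own distinctive value lets a test report which path was taken without any logging, so the timing is not disturbed. The sketch below is a host-side model only; the `ERR_AT_*` values, `write_steps`, and `write_block` are hypothetical and do not come from `sdmmc.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel values, chosen only to be easy to spot in a test result. */
#define ERR_AT_CMD    (-1001)
#define ERR_AT_DATA   (-1002)
#define ERR_AT_STATUS (-1003)

/* Hypothetical outcome of each stage of a block write. */
struct write_steps {
	bool cmd_ok;
	bool data_ok;
	bool status_ok;
};

/*
 * Model of a write routine with three failure paths. Each path returns
 * its own sentinel instead of one shared generic error code.
 */
static int write_block(const struct write_steps *steps)
{
	assert(steps != NULL);

	if (!steps->cmd_ok) {
		return ERR_AT_CMD;
	}
	if (!steps->data_ok) {
		return ERR_AT_DATA;
	}
	if (!steps->status_ok) {
		return ERR_AT_STATUS;
	}
	return 0;
}
```

The caller can then assert on, or simply print, the final return value once the test has finished. The sentinels should be reverted to the original error code afterwards.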
@danieldegrasse I can say confidently now that:
|
Just to be clear here, from what you are describing it sounds like the timeout is being respected, just that the card responds to a query after some time. In A few things that might help figure out the root cause here:
Also, I've added emulation of the retry behavior in v2.6.0 on this sha: https://github.com/nxp-zephyr/zephyr/tree/fdcf37a65a033d59081d8807f72866185d15ef59. Could you try this, and see if the error recovery logic can fix your issue? |
Thanks for the help. Yes, I did not express myself well. At first I thought that maybe the timeout was ignored. But in fact, as you point out, and as I later found out by tracking down the exact source of the problem, it is not the timeout causing the issue. The error code is -5, returned here: https://github.com/nxp-zephyr/zephyr/blob/065b7ac02661d580b9300d4c1c1048874671f118/subsys/sd/sdmmc.c#L1358 (I will double-check just to be sure). I will try the other suggestions today. Thank you. |
One more thing I observed: the "good" card supports UHS-I with speed class 3. One of the "bad" cards only supports UHS-I with speed class 1, while the other does not seem to support UHS at all (it seems to support only regular speed with class 10). Not sure if this is relevant. |
I'm interested to know more about this failure. Does the test get stuck at the same location the error previously occurred at? If so, where did you place |
I put the
Do you have a suggestion how to find out? Or where I could add a |
Can adding a call to
I expect the hang is occurring in the error recovery function, so I'd start there: nxp-zephyr@fdcf37a#diff-c4b06563dc5d9e3210c473243a2cdfbd20800c1ab33dcf71ccbc44233ea83a8eR596 |
I went back to before the proposed fixes. Note that this is not zephyr main but a fork based on Compare for example: https://github.com/nrfconnect/sdk-zephyr/commits/86893246053c50d34618f09ec6722cae8ba19472/drivers/sdhc and https://github.com/nrfconnect/sdk-zephyr/commits/86893246053c50d34618f09ec6722cae8ba19472/subsys/sd with https://github.com/nxp-zephyr/zephyr/commits/fdcf37a65a033d59081d8807f72866185d15ef59/drivers/sdhc and https://github.com/nxp-zephyr/zephyr/commits/065b7ac02661d580b9300d4c1c1048874671f118/subsys/sd/sdmmc.c Doing so, I can still get the write error. If I then add the Note on how I tested your proposed fixes: I cherry-picked them (and previous commits in that history) onto our fork based on |
One more note. Before the proposed fix (and the other cherry-picks from that history), I can reproduce the bug with the DEBUG_LOG enabled:
With the proposed fix (and the other cherry-picks from that history) and DEBUG_LOG enabled, the bug cannot be reproduced; it only occurs with debug logging disabled. This suggests that there might be two separate issues: one fixed by the proposal, and a separate, timing-sensitive one. |
Thanks for the clarification- there are two commits that might be useful since the branch you linked:
The |
Could you try applying these commits? The implementation of |
I went back to our fork of version With those I cannot reproduce the bug under conditions in which it consistently happens without them. Great job and congratulations on finding this 👍 Feel free to close this issue when appropriate. [ Out of curiosity, do you know why only some of the SD cards are affected? Might it be related to the speed class, such that some of them are always ready again before Zephyr can contact them, while the slower ones might not be ready? ] |
Absolutely, appreciate the prompt responses.
I believe that is the reason, yes. The changes present in the two commits you applied wait for the card to finish programming before checking its status, while the upstream SD implementation simply checks SD status directly after the write has completed. |
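The difference between the two orderings can be sketched with a simulated card. This is a simplified model under stated assumptions, not the Zephyr implementation: `sim_card`, `sim_busy`, `sim_status`, `check_immediately`, and `wait_then_check` are hypothetical names, programming time is counted in polls, and a status query made while the card is still programming is assumed to fail with -5, the error code reported earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical card model: busy programming for some polls after a write. */
struct sim_card {
	uint32_t programming_polls;
};

/* One busy poll; each poll lets the simulated programming advance. */
static bool sim_busy(struct sim_card *card)
{
	assert(card != NULL);

	if (card->programming_polls > 0) {
		card->programming_polls--;
		return true;
	}
	return false;
}

/* Assumption of this model: a status query only succeeds on an idle card. */
static int sim_status(const struct sim_card *card)
{
	assert(card != NULL);
	return card->programming_polls > 0 ? -5 : 0;
}

/* Ordering 1: query status directly after the write completes. */
static int check_immediately(struct sim_card *card)
{
	return sim_status(card);
}

/* Ordering 2: wait for programming to finish (bounded), then query status. */
static int wait_then_check(struct sim_card *card, uint32_t max_polls)
{
	for (uint32_t i = 0; i < max_polls; i++) {
		if (!sim_busy(card)) {
			break;
		}
	}
	return sim_status(card);
}
```

In this model a card with zero programming polls passes under either ordering, while a card that needs a few polls fails the immediate check and passes once the wait comes first. That matches the observation that only some cards are affected.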
Update sdmmc framework to use sdmmc_wait_ready when accessing card in SPI mode. This will allow cards that do not return to ready to be polled for busy status until the SD data timeout expires. Fixes #52931 Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
Writing data to a file on the SD card fails with the following version of Zephyr:
*** Booting Zephyr OS build v3.1.99-ncs1-9-g85edda1e989c ***
The failure depends on the card and on the data being written. It fails with the cards Lexar 300x 32GB microSD HC and Transcend 4GB microSD HC. It succeeds with Delkin Devices. Utility + 4GB microSD HC. If the data is all zeros instead of being initialized via the function provided below, the test always succeeds. See more details and a failing unit test on target below.
This is a regression; it worked fine with version 2.6.
The target board is a variant of nrf9160ns.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The test should pass independently of the data written to the file and of the SD card being used.
Impact
Show stopper. We are likely unable to update the Zephyr version without a workaround.
Logs and console output
Failing test result using v3.1.99 and Transcend 4GB microSD HC:
Succeeding test result using v3.1.99 and Delkin Devices. Utility + 4GB microSD HC:
Environment:
Furthermore
SYS_INIT. We do it early in the boot process (high priority).
Let me know if more information is needed to reproduce.