Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

usdhc disk_usdhc_access_write busy fail #39942

Closed
lgl88911 opened this issue Oct 30, 2021 · 7 comments · Fixed by #40522
Closed

usdhc disk_usdhc_access_write busy fail #39942

lgl88911 opened this issue Oct 30, 2021 · 7 comments · Fixed by #40522
Assignees
Labels
area: Disk Access area: Drivers bug The issue is a bug, or the PR is fixing a bug platform: NXP NXP priority: low Low impact/importance bug

Comments

@lgl88911
Copy link
Collaborator

lgl88911 commented Oct 30, 2021

Describe the bug
disk_usdhc_access_write fail because CDIHB reports busy fail

To Reproduce
Test on rt1052, Zephyr V2.7.0 & V2.4.0

  1. Test Code
	FILE *fp = fopen("/SD:/1", "w+");

	shell_print(shell, "fopen %x", fp);
	if (fp == NULL) {
		return 0;
	}

	ret = fseek(fp, 0, SEEK_END);
	shell_print(shell, "fseek END %d %s", ret, ret?strerror(errno):"");

	ret = ftell(fp);
	shell_print(shell, "file length ftell %d %s", ret, ret?strerror(errno):"");

	ret = fseek(fp, 0, SEEK_SET);
	shell_print(shell, "fseek START %d %s", ret, ret?strerror(errno):"");

	memset(patten_data, 0x5a, PATTEN_LENGTH);
	ret = fwrite(patten_data, 1, PATTEN_LENGTH, fp);
	shell_print(shell, "fwrite %d %s", ret, (ret<=0)?strerror(errno):"");

	memset(patten_data, 0xa5, PATTEN_LENGTH);
	ret = fwrite(patten_data, 1, PATTEN_LENGTH, fp);  // -> Write fail
        shell_print(shell, "fwrite %d %s", ret, (ret<=0)?strerror(errno):"");
  1. conf
CONFIG_DISK_DRIVER_SDMMC=y

CONFIG_FILE_SYSTEM=y
CONFIG_FAT_FILESYSTEM_ELM=y

CONFIG_FS_FATFS_CODEPAGE=936
CONFIG_FS_FATFS_LFN=y
CONFIG_FS_FATFS_LFN_MODE_BSS=y
CONFIG_FS_FATFS_NUM_FILES=20
CONFIG_FS_FATFS_NUM_DIRS=20
  1. make & run
  2. fwrite will fail at the second time

Additional context
For unknown reasons, calling disk_usdhc_access_write continuously will check that the CDIHB field of the register PRES_STATE is 1 after the previous transfer is completed and before the next transfer. This phenomenon is more likely to occur. May be a SOC issue, and more experiments are needed to confirm.
Below is a temporary patch:
madmachineio@5034b84

@lgl88911 lgl88911 added bug The issue is a bug, or the PR is fixing a bug area: Drivers area: Disk Access labels Oct 30, 2021
@nashif nashif added platform: NXP NXP priority: low Low impact/importance bug labels Nov 2, 2021
@danieldegrasse
Copy link
Collaborator

@lgl88911 I've reproduced the bug on my end. The issue results from poor support for 1.8V SD card signalling in NXP's USDHC driver. This is likely the same root cause as #32289

The latest version of main has this pr applied, as a temporary fix. I tested your fix as well, and it will resolve the issue. However, since our SDK USDHC driver does not use a while loop to check the CDIHB field, just an if statement, I believe the root cause of the problem lies elsewhere. You're experiencing this issue on a mmswiftio board, correct? I'm testing on an RT1050 EVK, and can reproduce it on v2.7.0

@lgl88911
Copy link
Collaborator Author

lgl88911 commented Nov 5, 2021

@danieldegrasse

You're experiencing this issue on a mmswiftio board, correct?

Yes, I tested it on mm_swiftio and mm_feather,can reproduce it .

The latest version of main has this pr applied, as a temporary fix.

Do you mean that using no-1-8-v can solve this problem? Currently mm_swiftio is configured with no-1-8-v, but there are still problems.

However, since our SDK USDHC driver does not use a while loop to check the CDIHB field, just an if statement, I believe the root cause of the problem lies elsewhere.

I agree, I think this is more like hardware of SOC instability. I did some tracking: The following is a simplified process of calling code when something goes wrong

//First write
disk_usdhc_access_write
{
    usdhc_write_sector
    {
        usdhc_xfer
        {
            usdhc_adma_table_cfg
            usdhc_data_xfer_cfg
            usdhc_send_cmd
            usdhc_data_sync_xfer
            {
                usdhc_write_data_port_sync
                //Check CDIHB == 0
            }
        }
    }
}

//Second write,No usdhc cmd occurred between the last usdhc_write_data_port_sync
disk_usdhc_access_write
{
    usdhc_write_sector
    {
        usdhc_xfer
        {
             //Check CDIHB == 1, No action, change!!
            usdhc_adma_table_cfg
            usdhc_data_xfer_cfg
            //usdhc_data_xfer_cfg check CDIHB == 1, retrur busy
            usdhc_send_cmd
            usdhc_data_sync_xfer
            {
                usdhc_write_data_port_sync
            }
        }
    }
}

In addition, under mm_swiftio(with no-1-8-v), this issue appears randomly. Once it is written correctly twice in a row, there will be no more errors afterwards.

@danieldegrasse
Copy link
Collaborator

Do you mean that using no-1-8-v can solve this problem? Currently mm_swiftio is configured with no-1-8-v, but there are still problems.

This is interesting. I have not been able to reproduce the issue on an RT1050 EVK with no-1-8-v set. I've tested other fixes for this issue, and found that it seems to come down to timing errors. A couple of things that might be worth checking:

  • Does the issue occur on all SD cards, or only some?
  • Check if the SD card is a 1.8V model. You can do this by temporarily removing no-1-8-v, and then adding a debug statement like so around line 2398 of usdhc.c:
if (priv->config->no_1_8_v == false) {
	/* 1.8V support */
	if (cmd->response[0U] & SD_OCR_SWITCH_18_ACCEPT_FLAG) {
		priv->card_info.card_flags |= SDHC_1800MV_FLAG;
		LOG_INF("SD card supports 1.8V");
	}
}

@lgl88911
Copy link
Collaborator Author

lgl88911 commented Nov 6, 2021

@danieldegrasse I tested a total of 7 cards from 4 manufacturers, and one of them is a card that does not support 1.8V, but this problem will occur no matter which one.

I've tested other fixes for this issue, and found that it seems to come down to timing errors.

Does it refer to this #39114

@danieldegrasse
Copy link
Collaborator

Does it refer to this #39114

@lgl88911 It does, yes. I tested your temporary patch on an RT1050 when the no-1-8-v flag was removed from the DTS, and it appears that the driver will lock up in that while loop, and the bit will never clear. There are quite a few differences between NXP's SDK driver and the Zephyr implementation, so it's difficult to narrow down where the bug appears. It may be that the Zephyr driver does not properly perform clock tuning, while the SDK driver does, but I'm unsure what the root cause is at this point.

@danieldegrasse
Copy link
Collaborator

danieldegrasse commented Nov 19, 2021

@lgl88911 Could you test the PR I just linked (#40522 ) with a MadMachine board? I was able to reproduce the issue you encountered on an RT1064 EVK with multiple writes to a variety of sectors on the disk, and I suspect that the fix may also work on the MadMachine boards.

@lgl88911
Copy link
Collaborator Author

@danieldegrasse Thanks, I test #40522 on MadMachine board, It can solve this problem. I think you have found the root cause:

This is due to the USDHC driver neglecting to wait for the SD card to be idle before attempting to send it new commands

danieldegrasse added a commit to nxp-zephyr/zephyr that referenced this issue Dec 10, 2021
USDHC driver did not pass disk test (failed during multiple writes). Add
logic to wait for the SD card to be idle before reading new data, so
that the test will pass. Also, add logic to reject OOB reads and writes
to the disk.

Fixes zephyrproject-rtos#39942

Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
cfriedt pushed a commit that referenced this issue Dec 14, 2021
USDHC driver did not pass disk test (failed during multiple writes). Add
logic to wait for the SD card to be idle before reading new data, so
that the test will pass. Also, add logic to reject OOB reads and writes
to the disk.

Fixes #39942

Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Disk Access area: Drivers bug The issue is a bug, or the PR is fixing a bug platform: NXP NXP priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants