-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SoundWire] Invalid register access for RT1308 #3924
Comments
Could be a hardware issue. I see an 'enumeration_complete' and 'initialization_complete' only for one of the two amplifiers
and all we get on link2 is:
So I would think the device never showed up on the bus. That said if that was the case there shouldn't be any register access. That's likely a corner case we didn't see before. Oh well, just what we needed, huh? |
@shumingfan any idea what this c070 register is? it's part of the regmap apparently: regmap_write(rt1308->regmap, 0xc070, 0x00); |
Last known good daily run: 16107 Start Time: 2022-10-11 21:36:57 UTC Kernel Branch: topic/sof-dev |
I don't see any PRs merged in the last few days that could explain this error. Can we run a test with plain vanilla 6.0 kernel on this device to see how this goes? |
@plbossart I checked the kernel log. I think the system wants to restore the control settings. The "RX Channel Select" control will access the 0xc070 register. |
This is a hardware issue. I've tried yesterday's daily kernel on this device and it can be still reproduced, and after a long power-off, this issue disappeared. Anyway, it would be better if the system can restore the settings automatically. |
The driver missed the default value of register 0xc070. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com>
@miRoox @plbossart |
@shumingfan thanks, but sorry I don't know the way to reproduce it after the register got recovered, so I can only check the PR won't introduce other issues, but it's hard to say whether it really fixes this issue. |
Humm, what would adding the default value change @shumingfan? If userspace wants to restore a non-default value, then there will be an access generated. I am wondering if we don't have a conceptual issue here. On boot, the card is registered when all required components are registered. That registration is done when we probe a driver, and that is done based on ACPI information. So it can very well happen that if the physical device is late in its enumeration we end-up exposing a non-functional device. Moving the component registration after the device initialization would not help. The deferred_probe mechanism only works when there's a new probe. if we did a driver probe when the physical device enumerates, that would create other issues, because we would need to wait for the probe to be finished to be able to call the initialization callback provided precisely by the driver. That gives me a migraine. @bardliao thoughts? |
@plbossart Adding the default value can ensure that driver will not try to access the register and can get the cached value properly when the regmap is in cache only mode. |
That sounds like a good move @bardliao, we do need to prevent access to the registers until the initialization completes. The suspend-resume case is good but it doesn't cover the boot case. |
The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com>
The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com>
The SoundWire codec drivers are probed based on ACPI/DT information. During the SoundWire probe, the ASoC component for each device is registered. This is enough for the card to be created. However if the physical SoundWire device becomes attached with a delay, it is possible that userspace tries to restore ALSA control settings. Those transactions would result either in a timeout or a paging error. This is not a normal behavior, but we can initialize the regmap to cache_only so that values are only restored when the device actually becomes ATTACHED. This should take care of transient issues. Note that if the hardware is truly broken we are missing a mechanism to signal that the codec and some PCM devices exposed by the card are not functional. Moving the component registration to the time when the device becomes attached would not be compatible with the deferred probe mechanism which is only triggered when new Linux devices are created. Additional helpers would be needed to nudge the deferred probe mechanism into re-evaluating dependencies. BugLink: thesofproject#3924 Suggested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
This issue happened again in our daily test. Test ID:16577 |
@shumingfan I just tested #3927 because this issue happened again, but looks like the issue is still there even with #3927. |
The SoundWire codec drivers are probed based on ACPI/DT information. During the SoundWire probe, the ASoC component for each device is registered. This is enough for the card to be created. However if the physical SoundWire device becomes attached with a delay, it is possible that userspace tries to restore ALSA control settings. Those transactions would result either in a timeout or a paging error. This is not a normal behavior, but we can initialize the regmap to cache_only so that values are only restored when the device actually becomes ATTACHED. This should take care of transient issues. Note that if the hardware is truly broken we are missing a mechanism to signal that the codec and some PCM devices exposed by the card are not functional. Moving the component registration to the time when the device becomes attached would not be compatible with the deferred probe mechanism which is only triggered when new Linux devices are created. Additional helpers would be needed to nudge the deferred probe mechanism into re-evaluating dependencies. BugLink: thesofproject#3924 Suggested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Another instance of the issue, just for the record: Also this causes failures in a handful of the following tests. |
The SoundWire codec drivers are probed based on ACPI/DT information. During the SoundWire probe, the ASoC component for each device is registered. This is enough for the card to be created. However if the physical SoundWire device becomes attached with a delay, it is possible that userspace tries to restore ALSA control settings. Those transactions would result either in a timeout or a paging error. This is not a normal behavior, but we can initialize the regmap to cache_only so that values are only restored when the device actually becomes ATTACHED. This should take care of transient issues. Note that if the hardware is truly broken we are missing a mechanism to signal that the codec and some PCM devices exposed by the card are not functional. Moving the component registration to the time when the device becomes attached would not be compatible with the deferred probe mechanism which is only triggered when new Linux devices are created. Additional helpers would be needed to nudge the deferred probe mechanism into re-evaluating dependencies. BugLink: thesofproject#3924 Suggested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Still happening in today's daily 17437?model=TGLH_SKU0A5E_SDW_ZEPHYR&testcase=verify-kernel-boot-log Draft PR #3941 not merged. I spotted some BIOS error too, dunno whether it's related
|
fwiw, I have a Dell Latitude 9520 (SKU: 0A3E) which have both rt1308, rt715 and rt711 (loads sof-tgl-rt715-rt711-rt1308-mono.tplg).
or
or the ones mentioned in this issue. |
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Source: Kernel.org MR: 123931 Type: Integration Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y ChangeID: bb3edbd09287e515fe289fbedbffed5661a0ebda Description: [ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Armin Kuster <akuster@mvista.com>
Source: Kernel.org MR: 123931 Type: Integration Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y ChangeID: bb3edbd09287e515fe289fbedbffed5661a0ebda Description: [ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Armin Kuster <akuster@mvista.com>
Source: Kernel.org MR: 123931 Type: Integration Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y ChangeID: bb3edbd09287e515fe289fbedbffed5661a0ebda Description: [ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Armin Kuster <akuster@mvista.com>
[ Upstream commit 75d8b1662ca5c20cf8365575222abaef18ff1f50 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: randomhydrosol <randomhydrosol@glassrom.org>
[ Upstream commit 75d8b1662ca5c20cf8365575222abaef18ff1f50 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: randomhydrosol <randomhydrosol@glassrom.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 7779efbb99bf4e97734d30851f419dde2e88ac39) Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
@marc-hb do you know if we still test this platform? It's been a while and I've lost track of whether this was a hardware issue or something we need to look into. |
I don't know, @keqiaozhang , @greg-intel ? |
Yes, the daily test uses TGLH_SKU0A5E_SDW_ZEPHYR |
On the last three daily tests for TGLH_SKU0A5E_SDW_ZEPHYR, two were not run (not available) and one passed. This makes it hard to figure out if this issue is still present. |
@plbossart This issue is not present in CI. I did a cold reboot recently(AC off, wait about 1~2 mins for the device to discharge fully, then power on). This issue has occurred several times before, and we all use the same method to restore it. |
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Author: Shuming Fan <shumingf@realtek.com> Date: Wed Oct 19 17:57:15 2022 +0800 Signed-off-by: Jaroslav Kysela <jkysela@redhat.com> (cherry picked from commit 75d8b1662ca5c20cf8365575222abaef18ff1f50) Bugzilla: https://bugzilla.redhat.com/2125540
BugLink: https://bugs.launchpad.net/bugs/2003122 [ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
[ Upstream commit 75d8b1662ca5c20cf8365575222abaef18ff1f50 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d8b1662ca5c20cf8365575222abaef18ff1f50 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
stable inclusion from stable-v5.10.156 commit bb3edbd09287e515fe289fbedbffed5661a0ebda category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7MCG1 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bb3edbd09287e515fe289fbedbffed5661a0ebda -------------------------------- [ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com>
stable inclusion from stable-v5.10.156 commit bb3edbd09287e515fe289fbedbffed5661a0ebda category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7MCG1 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bb3edbd09287e515fe289fbedbffed5661a0ebda -------------------------------- [ Upstream commit 75d8b16 ] The driver missed the default value of register 0xc070/0xc360. This patch adds that default value to avoid invalid register access when the device doesn't be enumerated yet. BugLink: thesofproject/linux#3924 Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20221019095715.31082-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> (cherry picked from commit 6cec85c)
Internal Intel test result/planresultdetail/16234?model=TGLH_SKU0A5E_SDW_ZEPHYR&testcase=verify-kernel-boot-log
Dmesg:
dmesg.txt
@bardliao FYI
The text was updated successfully, but these errors were encountered: