[BUG][TGL] failed to ready HDA controller gctl 0x0 (sof_probe_work failed) #2571

Closed
sathya-nujella opened this issue Nov 20, 2020 · 15 comments · Fixed by #2581
Labels
bug Something isn't working Chrome Related to Chrome integration HDA reset failure HD-Audio controller failed to reset P1 Blocker bugs or important features TGL Applies to Tiger Lake platform

Comments

@sathya-nujella

DUT: TGL Chrome DUT with Max98373_RT5682 I2S
BIOS: Coreboot
Kernel: Google - Kernel 5.4
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-5.4

Issue:
Sound card creation failed.

TEST SCENARIO:
The issue appears randomly during reboot tests.
It is reproducible, but there is no consistent method: rebooting repeatedly sometimes triggers it.

From logs:
1) error from dmesg:
[ 8.912119] sof-audio-pci 0000:00:1f.3: error: failed to ready HDA controller gctl 0x0
[ 8.922070] sof-audio-pci 0000:00:1f.3: error: get caps error
[ 8.929600] sof-audio-pci 0000:00:1f.3: display power disable
[ 8.929622] sof-audio-pci 0000:00:1f.3: error: failed to probe DSP -5
[ 8.937423] sof-audio-pci 0000:00:1f.3: error: sof_probe_work failed err: -5

2) lspci -vvvv (00:1f.3 section only):
    00:1f.3 Multimedia audio controller: Intel Corporation Device a0c8 (rev 20)
        Subsystem: Intel Corporation Device 7270
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at 7ff48000 (64-bit, non-prefetchable) [size=16K]
        Region 4: Memory at 7fe00000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [50] Power Management version 3
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
            Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Vendor Specific Information: Len=14 <?>
        Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Kernel driver in use: sof-audio-pci
        Kernel modules: snd_hda_intel, snd_sof_pci
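
For anyone automating the reboot loop, the dmesg signature above can be detected with a small script. This is only a sketch: the regular expressions are derived from the two log lines quoted in this report, and the function name `find_probe_failure` and the `SAMPLE` text are made up for illustration, not part of any SOF tooling.

```python
import re

# Patterns built from the dmesg lines quoted in this report (assumption:
# the message wording is the same on the kernel under test).
GCTL_RE = re.compile(
    r"sof-audio-pci (?P<dev>\S+): error: failed to ready HDA controller "
    r"gctl (?P<gctl>0x[0-9a-fA-F]+)"
)
PROBE_RE = re.compile(
    r"sof-audio-pci (?P<dev>\S+): error: sof_probe_work failed err: (?P<err>-?\d+)"
)


def find_probe_failure(dmesg_text):
    """Return a dict describing the failure, or None if no signature line is found."""
    gctl = GCTL_RE.search(dmesg_text)
    probe = PROBE_RE.search(dmesg_text)
    if gctl is None and probe is None:
        return None
    return {
        "device": (gctl or probe).group("dev"),
        "gctl": int(gctl.group("gctl"), 16) if gctl else None,
        "errno": int(probe.group("err")) if probe else None,
    }


# Hypothetical sample input, copied from the log excerpt in this report.
SAMPLE = """\
[ 8.912119] sof-audio-pci 0000:00:1f.3: error: failed to ready HDA controller gctl 0x0
[ 8.922070] sof-audio-pci 0000:00:1f.3: error: get caps error
[ 8.937423] sof-audio-pci 0000:00:1f.3: error: sof_probe_work failed err: -5
"""

if __name__ == "__main__":
    print(find_probe_failure(SAMPLE))
```

Feeding it the output of `dmesg` after each boot gives a simple pass/fail check per iteration.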
@sathya-nujella sathya-nujella added P1 Blocker bugs or important features TGL Applies to Tiger Lake platform labels Nov 20, 2020
@kv2019i kv2019i added bug Something isn't working Chrome Related to Chrome integration labels Nov 20, 2020
@kv2019i
Collaborator

kv2019i commented Nov 20, 2020

There are a few past bugs with a similar fingerprint:
#1399
#1556
#1735

Most had such low reproduction rates that no specific patch could be identified as making the problem go away.

One concrete thing to try is this:
#1735 (comment)

@kv2019i
Collaborator

kv2019i commented Nov 20, 2020

Findings today:

  • with snd-hda-intel, the bug is harder to trigger (on the same system)
  • when the bug occurs, unloading and reloading the SOF driver does not help
  • when the bug occurs, unloading SOF and loading snd-hda-intel instead does not help either

Yet to be tested:

  • try to reproduce in force-nocodec mode
  • always unload the driver before reboot
  • increase the delay while waiting for the HDA controller to come out of reset

@plbossart
Member

Right, we faced this before and never reached a conclusion as to what causes the capability parsing to fail. I had a discussion with hardware folks and there was no smoking gun.

A work-around was added in upstream commit f09e9c7 ('ASoC: SOF: Intel: hda-ctrl: add reset cycle before parsing capabilities')

@sathya-nujella can you check whether this was back-ported and, if so, try reverting it?

@ranj063
Collaborator

ranj063 commented Nov 20, 2020

> Right, we faced this before and never reached a conclusion as to what causes the capability parsing to fail. I had a discussion with hardware folks and there was no smoking gun.
>
> A work-around was added in upstream commit f09e9c7 ('ASoC: SOF: Intel: hda-ctrl: add reset cycle before parsing capabilities')
>
> @sathya-nujella can you check whether this was back-ported and, if so, try reverting it?

@plbossart I already suggested this last night and it didn't help.

@kv2019i
Collaborator

kv2019i commented Nov 20, 2020

More tests:

  • try to reproduce in force-nocodec mode -> still happens
  • increase the delay while waiting for the HDA controller to come out of reset -> done, did not help

Ongoing:

  • always unload the driver before reboot -> in progress; @sathya-nujella is running it, no errors so far but only a few iterations
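
The "increase delay" experiment above can be pictured as a poll-with-timeout loop on the controller's GCTL register. The sketch below is a user-space model only, not the kernel's code: `wait_for_bits` and `FakeController` are hypothetical names, and the only hardware fact assumed is that bit 0 of GCTL is the controller reset bit (CRST), which reads 1 once the controller is out of reset.

```python
import time

GCTL_CRST = 0x1  # bit 0 of GCTL: 0 = controller held in reset, 1 = out of reset


def wait_for_bits(read_reg, mask, expected, timeout_s, interval_s=0.001):
    """Poll read_reg() until (value & mask) == expected or the timeout expires.

    Returns (ok, last_value) so the caller can log the register on failure.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = read_reg()
        if (value & mask) == expected:
            return True, value
        if time.monotonic() >= deadline:
            return False, value
        time.sleep(interval_s)


class FakeController:
    """Hypothetical stand-in for the hardware.

    CRST reads back as 1 after `ready_after_polls` reads, or never if None.
    """

    def __init__(self, ready_after_polls):
        self.ready_after_polls = ready_after_polls
        self.polls = 0

    def read_gctl(self):
        self.polls += 1
        if self.ready_after_polls is not None and self.polls > self.ready_after_polls:
            return GCTL_CRST
        return 0x0


if __name__ == "__main__":
    healthy = FakeController(ready_after_polls=3)
    print(wait_for_bits(healthy.read_gctl, GCTL_CRST, GCTL_CRST, timeout_s=0.5))

    stuck = FakeController(ready_after_polls=None)
    ok, value = wait_for_bits(stuck.read_gctl, GCTL_CRST, GCTL_CRST, timeout_s=0.05)
    if not ok:
        print(f"failed to ready HDA controller gctl {value:#x}")
```

In this model a controller that never sets CRST fails regardless of the timeout value, which would be consistent with a longer delay not helping, although the model says nothing about why the real hardware stays in reset.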

@plbossart
Member

@ranj063 if unloading the driver before reboot solves the issue, wouldn't this point to a .shutdown() callback that's not doing the same thing as .remove()?

@ranj063
Collaborator

ranj063 commented Nov 20, 2020

> @ranj063 if unloading the driver before reboot solves the issue, wouldn't this point to a .shutdown() callback that's not doing the same thing as .remove()?

@plbossart I'd think so. I'll work with @sathya-nujella this morning to try it out.
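
The .shutdown() versus .remove() question above can be illustrated with a toy model. On reboot the kernel invokes a driver's .shutdown() hook, not .remove(), so any teardown that lives only in .remove() is skipped. Everything below is hypothetical (`FakeDsp`, `FakeDriverOps`, `fake_remove`, `reboot`); it models the callback wiring only and is not the SOF driver code, nor a claim about what state the real DSP is left in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeDsp:
    """Hypothetical device state; not the real SOF data structures."""
    powered: bool = True
    log: List[str] = field(default_factory=list)


def fake_remove(dev: FakeDsp) -> None:
    """Stand-in for the teardown performed when the driver is unloaded."""
    dev.log.append("teardown")
    dev.powered = False


@dataclass
class FakeDriverOps:
    """Hypothetical callback table with an optional shutdown hook."""
    remove: Callable[[FakeDsp], None]
    shutdown: Optional[Callable[[FakeDsp], None]] = None


def reboot(dev: FakeDsp, ops: FakeDriverOps) -> bool:
    """Model a reboot: only .shutdown() runs, .remove() is never called.

    Returns the power state the device is left in across the reboot.
    """
    if ops.shutdown is not None:
        ops.shutdown(dev)
    return dev.powered


if __name__ == "__main__":
    # No shutdown hook: the device is left powered across the reboot.
    print(reboot(FakeDsp(), FakeDriverOps(remove=fake_remove)))
    # Shutdown hook reuses the unload-time teardown.
    print(reboot(FakeDsp(), FakeDriverOps(remove=fake_remove, shutdown=fake_remove)))
```

If unloading before reboot makes the failure disappear, pointing the shutdown hook at the same teardown is the natural experiment this model suggests.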

@sathya-nujella
Author

Thank you @ranj063. In today's debug session, with Ranjani's support, we ran warm reboot tests.
Following her suggestion, we found that after reverting the SOF patch below, the issue did not reproduce in 100 iterations.
As it is a random issue, I will continue with more tests, but reverting that SOF patch clearly shows an improvement.

Patch reverted: thesofproject/sof@75561a9

SOF issue reported based on this: thesofproject/sof#3633
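
As a rough guide to how much a clean run of 100 iterations says about a random failure, the following sketch computes the largest per-reboot failure rate that is still plausible after N clean reboots. It assumes every reboot is independent with the same failure probability, which may not hold here; the function names are made up for this note.

```python
import math


def max_failure_rate(clean_runs, confidence=0.95):
    """Largest per-reboot failure probability still consistent, at the given
    confidence, with zero failures in `clean_runs` independent reboots.

    Solves (1 - p) ** clean_runs == 1 - confidence for p.
    """
    if clean_runs <= 0:
        raise ValueError("clean_runs must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)


def runs_needed(failure_rate, confidence=0.95):
    """Number of clean reboots needed before a bug with this per-reboot
    failure rate would have shown up at least once with the given confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    print(f"100 clean reboots: rate could still be up to {max_failure_rate(100):.2%}")
    print(f"reboots needed to rule out a 1% rate: {runs_needed(0.01)}")
```

Under those assumptions, 100 clean iterations only bound the failure rate to roughly 3% per reboot, so continuing the test run is the right call.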

@mengdonglin
Collaborator

Let's track the issue in thesofproject/sof#3633 for now.

keyonjie added a commit to keyonjie/linux that referenced this issue Nov 25, 2020
Invoke hda_dsp_shutdown() as the .shutdown() callback, this will help to
perform needed operation on TGL platforms before shutting down or
rebooting the system.

Fixes: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
keyonjie added a commit to keyonjie/linux that referenced this issue Nov 26, 2020
Invoke hda_dsp_shutdown() as the .shutdown() callback, this will help to
perform needed operation on TGL platforms before shutting down or
rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
@mengdonglin mengdonglin reopened this Nov 30, 2020
@mengdonglin
Collaborator

Reopening, since we also need a fix in the kernel driver.

@keyonjie

keyonjie commented Nov 30, 2020

We have a driver PR that aims to address this: #2581

Unfortunately, it does not fix the issue completely. There is still one tricky way to hit it: if we reboot the system at the moment WoV is switching the DSP from D0 to D0i3 (e.g. around 5 s (±0.5 s) after WoV is switched on), we can hit the issue within 10 iterations.

keyonjie added a commit to keyonjie/linux that referenced this issue Dec 2, 2020
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
keyonjie added a commit to keyonjie/linux that referenced this issue Dec 4, 2020
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
@keyonjie

PR #2600 will also fix this issue. @ranj063 FYI.

keyonjie added a commit to keyonjie/linux that referenced this issue Dec 24, 2020
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
keyonjie added a commit to keyonjie/linux that referenced this issue Dec 25, 2020
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
@keyonjie

@sathya-nujella PR merged, can you check if we can close this?

@sathya-nujella
Author

> @sathya-nujella PR merged, can you check if we can close this?

Hi @keyonjie,
The originally reported issue is fixed.
Later, however, we observed the same error prints in another scenario, as reported by email:
"WoV tests via command line + reboot".

For this scenario, as you and @ranj063 suggested, we ran the tests with both Linux PRs included: #2600 and #2581.
The tests passed 500+ cycles with the fixes from both PRs.
We will continue testing for one more day and post an update.

@jairaj-arava

We have also verified speaker and headset playback with the above PRs, and the functionality looks good.

bardliao pushed a commit that referenced this issue Dec 29, 2020
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: #2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Jan 5, 2021
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
kv2019i pushed a commit that referenced this issue Jan 8, 2021
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: #2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 13, 2021
Invoke hda_dsp_remove() as the .shutdown() callback. This will help to
perform shutdown of the DSP safely on TGL platforms before shutting down
or rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
sudipm-mukherjee pushed a commit to sudipm-mukherjee/linux-test that referenced this issue Jan 14, 2021
Invoke hda_dsp_remove() as the .shutdown() callback. This will help to
perform shutdown of the DSP safely on TGL platforms before shutting down
or rebooting the system.

BugLink: thesofproject/linux#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20210113152617.4048541-4-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
bardliao pushed a commit to bardliao/linux that referenced this issue Jan 15, 2021
Invoke hda_dsp_remove() as the .shutdown() callback, this will help to
perform shutdown the DSP safely on TGL platforms before shutting down or
rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
riverzhou pushed a commit to riverzhou/chromiumkernel that referenced this issue Feb 11, 2021
…back

Invoke hda_dsp_remove() as the .shutdown() callback. This will help to
perform shutdown of the DSP safely on TGL platforms before shutting down
or rebooting the system.

BugLink: thesofproject/linux#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20210113152617.4048541-4-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 44a4cfa
 git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next)

BUG=b:175915544
TEST=verified the audio

Signed-off-by: Jairaj Arava<jairaj.arava@intel.com>
Signed-off-by: Sathyanarayana Nujella <sathyanarayana.nujella@intel.com>
Change-Id: Ifb3a8dd1c8982dff7766063071020b9d826e0d95
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2628182
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Curtis Malainey <cujomalainey@chromium.org>
Tested-by: Jairaj Arava <jairaj.arava@intel.corp-partner.google.com>
Commit-Queue: Curtis Malainey <cujomalainey@chromium.org>
(cherry picked from commit 3916179)
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2643522
Tested-by: Curtis Malainey <cujomalainey@chromium.org>
jackpot51 pushed a commit to pop-os/linux that referenced this issue Apr 13, 2021
BugLink: https://bugs.launchpad.net/bugs/1919930

Invoke hda_dsp_remove() as the .shutdown() callback. This will help to
perform shutdown of the DSP safely on TGL platforms before shutting down
or rebooting the system.

BugLink: thesofproject#2571
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20210113152617.4048541-4-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 44a4cfa)
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Timo Aaltonen <tjaalton@ubuntu.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
@mengdonglin mengdonglin added the HDA reset failure HD-Audio controller failed to reset label Aug 23, 2021
7 participants