Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: HDA: power down all DSP cores at suspend and remove #2873

Closed
wants to merge 1 commit into from

Conversation

keyonjie
Copy link

Although the snd_sof_dsp_core_power_up/down will actually take effect on
the host managed DSP cores on Intel platforms, they are actually
designed for all DSP cores, and the enabled_cores_mask will update the
mask of those non host managed cores also.

In either hda_suspend() or hda_dsp_remove(), we should make sure all DSP
cores are power off and the enabled_cores_mask is reset to 0, so we
should call snd_sof_dsp_core_power_down() with mask of all cores, not
only the host_managed_cores_mask one.

Otherwise, those non host managed cores (e.g. core 1 on TigerLake) will
be preserved as 'enabled' in the enabled_cores_mask, in case those DSP
cores are 'ON' before _suspend() or_remove(). This misalignment will
lead to failure when we try to set up widgets or pipelines which rely
on those cores after resumed, as no _CORE_ENABLE IPC will be sent in
sof_core_enable() with enabled_cores_mask already cover it.

Signed-off-by: Keyon Jie yang.jie@linux.intel.com

Although the snd_sof_dsp_core_power_up/down will actually take effect on
the host managed DSP cores on Intel platforms, they are actually
designed for all DSP cores, and the enabled_cores_mask will update the
mask of those non host managed cores also.

In either hda_suspend() or hda_dsp_remove(), we should make sure all DSP
cores are power off and the enabled_cores_mask is reset to 0, so we
should call snd_sof_dsp_core_power_down() with mask of all cores, not
only the host_managed_cores_mask one.

Otherwise, those non host managed cores (e.g. core 1 on TigerLake) will
be preserved as 'enabled' in the enabled_cores_mask, in case those DSP
cores are 'ON' before _suspend() or_remove(). This misalignment will
lead to failure when we try to set up widgets or pipelines which rely
on those cores after resumed, as no _CORE_ENABLE IPC will be sent in
sof_core_enable() with enabled_cores_mask already cover it.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
@keyonjie keyonjie changed the title ASoC: SOF: topology: power down all DSP cores at suspend and remove ASoC: SOF: HDA: power down all DSP cores at suspend and remove Apr 27, 2021
@mengdonglin mengdonglin added the multicore Features and Issues related to multiple DSP core support label Apr 27, 2021
@keyonjie keyonjie changed the title ASoC: SOF: HDA: power down all DSP cores at suspend and remove fixup! ASoC: SOF: HDA: power down all DSP cores at suspend and remove Apr 27, 2021
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @keyonjie ! I'd polish the commit message a bit, but fix itself looks good. This is clearly a bug in hda-dsp.c.

/* power down DSP */
ret = snd_sof_dsp_core_power_down(sdev, chip->host_managed_cores_mask);
/* power down all DSP cores */
ret = snd_sof_dsp_core_power_down(sdev, GENMASK(chip->cores_num - 1, 0));
Copy link
Collaborator

@kv2019i kv2019i Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably warrants more cleanup (update: i.e. PR2850 is really needed) -- we've had multiple issues stemming from unclarity on the split between DSP ops and core. And this seems one more in this category.

But this specific fix is good. I'd reword the first paragraph of the commit a bit:
""The 'host_managed_cores' field is a Intel HDA specific concept and it should not be used when calling generic SOF core functions like snd_sof_dsp_core_power_down. By mixing the two, the SOF core view of the DSP core power status can gets out of sync.""

I'm still a bit puzzled why the core_power_down call need to be in Intel specific suspend call. Isn't this something that should be common to all SOF implementations? I see we only enable cores in topology.c -- this is a bit weird, right. @lyakh @ranj063 ? (update: this might be covered in PR2850).

I wouldn't block this fix though. Given current interface split between core and the DSP ops, this fix seems both correct and required.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit confusing here indeed I find. E.g. hda_dsp_core_reset_power_down() will only power down host-managed cores indeed. So, the comment "power down all cores" really isn't true. At least we should explain here in the comment, that we only have to update the enabled cores' mask. Otherwise I'm not sure we really need to send an additional IPC to power down those cores, not managed by the host. Isn't SOF_IPC_PM_CTX_SAVE a sufficient indication to the firmware, that we're about to shut it down?

/* power down DSP */
ret = snd_sof_dsp_core_power_down(sdev, chip->host_managed_cores_mask);
/* power down all DSP cores */
ret = snd_sof_dsp_core_power_down(sdev, GENMASK(chip->cores_num - 1, 0));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the patch looks good. But why is it a fixup? I think the patch is already upstream isnt it? Did you mean to add a Fixes tag instead for this patch?
commit 5607b18
Author: Bard Liao yung-chuan.liao@linux.intel.com
Date: Mon Oct 19 13:27:33 2020 +0800

ASoC: SOF: Intel: hda: use snd_sof_dsp_core_power_up/down API

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the patch looks good. But why is it a fixup? I think the patch is already upstream isnt it? Did you mean to add a Fixes tag instead for this patch?
commit 5607b18
Author: Bard Liao yung-chuan.liao@linux.intel.com
Date: Mon Oct 19 13:27:33 2020 +0800

ASoC: SOF: Intel: hda: use snd_sof_dsp_core_power_up/down API

Oh, thanks, didn't realize that, let me change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't bother @keyonjie this PR isn't going to be merged as is, it's fundamentally wrong on multiple levels.

/* power down DSP */
ret = snd_sof_dsp_core_power_down(sdev, chip->host_managed_cores_mask);
/* power down all DSP cores */
ret = snd_sof_dsp_core_power_down(sdev, GENMASK(chip->cores_num - 1, 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this doesn't look good to me.

On TGL, the driver cannot power up/down core1..3. Only Core0 can be powered up, and the other cores are powered using Core0 as a proxy through IPC. Setting the mask here is just not right.

And furthermore, if you look at the code of hda_dsp_core_reset_power_down(), there is a mask with no other than the host_managed_cores_mask, which on TGL is

const struct sof_intel_dsp_desc tgl_chip_info = {
	/* Tigerlake , Alderlake */
	.cores_num = 4,
	.init_core_mask = 1,
	.host_managed_cores_mask = BIT(0),
int hda_dsp_core_reset_power_down(struct snd_sof_dev *sdev,
				  unsigned int core_mask)
{
	struct sof_intel_hda_dev *hda = sdev->pdata->hw_pdata;
	const struct sof_intel_dsp_desc *chip = hda->desc;
	int ret;

	/* restrict core_mask to host managed cores mask */
	core_mask &= chip->host_managed_cores_mask;

The question is thus: what does this patch actually do?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart Please see my clarfified commit message proposal. "host managed cores" is an Intel SOF DSP adaptation concept that SOF core does not know about. So if you use SOF core functions, you have to power down all cores and not apply the host managed cores mask.

Current code applies the mask and the state of cores1...3 on TGL will be not uptodate in SOF core and when a new pipelne is later sent requireding cores1...3, the IPC is not sent as SOF core (incorrectly) thinks cores1...3 are already enabled.

So I'd say this patch is correct. It does reveal a conceptual issue in how we track core power state in SOF core, but I'd say that's a separate issue. Masking the cores this way in hda-dsp.c is not right.

@ranj063
Copy link
Collaborator

ranj063 commented Apr 28, 2021

@keyonjie I disucssed this with @plbossart and I think whats missing is that this PR only updates the enabled_cores_mask and doesnt actually power down the secondary enabled cores. So I think we should first send the IPC power down those cores before powering down the host_managed_cores.

@keyonjie
Copy link
Author

keyonjie commented Apr 28, 2021

@keyonjie I disucssed this with @plbossart and I think whats missing is that this PR only updates the enabled_cores_mask and doesnt actually power down the secondary enabled cores. So I think we should first send the IPC power down those cores before powering down the host_managed_cores.

exactly, if you like that decent cleanup and fix, we actually should not call snd_sof_dsp_core_power_up/down at all, we should go with _get/put() helpers in #2850 .

And here is the simple version, it could help e.g. for TGL Chrome kernel multi-core scenarios without dynamic pipeline support, in case they are in post-PV stage and don't want to take the risk of back-porting big series.

@keyonjie keyonjie changed the title fixup! ASoC: SOF: HDA: power down all DSP cores at suspend and remove ASoC: SOF: HDA: power down all DSP cores at suspend and remove Apr 28, 2021
@ranj063
Copy link
Collaborator

ranj063 commented Apr 28, 2021

@keyonjie I disucssed this with @plbossart and I think whats missing is that this PR only updates the enabled_cores_mask and doesnt actually power down the secondary enabled cores. So I think we should first send the IPC power down those cores before powering down the host_managed_cores.

exactly, if you like that decent cleanup and fix, we actually should not call snd_sof_dsp_core_power_up/down at all, we should go with _get/put() helpers in #2850 .

And here is the simple version, it could help e.g. for TGL Chrome kernel multi-core scenarios without dynamic pipeline support, in case they are in post-PV stage and don't want to take the risk of back-porting big series.

No @keyonjie. We need to fix it properly for the static pipeline case before adding refcounting for the cores with dynamic pipeline. The ask is simply to not update the enabled_cores_mask without powering off the cores.

@kv2019i
Copy link
Collaborator

kv2019i commented Apr 28, 2021

@ranj063 wrote:

No @keyonjie. We need to fix it properly for the static pipeline case before adding refcounting for the cores with dynamic pipeline. The ask is simply to not update the enabled_cores_mask without powering off the cores.

But isn't that an additional patch? I mean in the static case, we have no logic currently to power down DSP-managed cores from driver. If we want to add this, this would be a separate PR (but is this needed, is this actually creating problems now)?

Patch from @keyonjie does not power down anything, it just correctly updates the driver state so that upon resume, the IPC to enable a core, gets sent.

@keyonjie
Copy link
Author

keyonjie commented Apr 28, 2021

@ranj063 wrote:

No @keyonjie. We need to fix it properly for the static pipeline case before adding refcounting for the cores with dynamic pipeline. The ask is simply to not update the enabled_cores_mask without powering off the cores.

But isn't that an additional patch? I mean in the static case, we have no logic currently to power down DSP-managed cores from driver. If we want to add this, this would be a separate PR (but is this needed, is this actually creating problems now)?

Thanks for chime in @kv2019i . Yes, today after suspended, those slave cores are already powered off by the FW (or at least not functional), so the updating of the enabled_cores_mask with my PR is only make it aligned with the real status. Without the change, the slave cores won't be powered up anymore and no chance to run anymore.

Patch from @keyonjie does not power down anything, it just correctly updates the driver state so that upon resume, the IPC to enable a core, gets sent.

@plbossart
Copy link
Member

Thanks for chime in @kv2019i . Yes, today after suspended, those slave cores are already powered off by the FW (or at least not functional), so the updating of the enabled_cores_mask with my PR is only make it aligned with the real status. Without the change, the slave cores won't be powered up anymore and no chance to run anymore.

Well that looks even worse. It means the status of the cores were not updated at an earlier point, so we keep a window of time where this status is inconsistent. I say this is not the right fix, only a band-aid.

@kv2019i
Copy link
Collaborator

kv2019i commented Apr 28, 2021

@plbossart wrote:

Well that looks even worse. It means the status of the cores were not updated at an earlier point, so we keep a window of time where this status is inconsistent. I say this is not the right fix, only a band-aid.

It's true we are missing the whole "disable core" path in SOF core. Until the dynamic pipelines was merged, this was not really a problem as cores were incrementally powered up, and then at suspsend, all powered down.

Looking at the history again, it would seem this is actually not related to dynamic pipelines, but rather "ASoC: SOF: update dsp core power status in common APIs" from @bardliao . Before that patch, we had a "sdev->enabled_cores_mask = 0;" in SOF core sof_suspend(). So this basicly reset the enabled_cores_mask in SOF core for all platforms, and after resume, we could start incrementally powering up cores again. @keyonjie Can you confirm that? I.e. the bug with suspend-resume was there already before dynamic pipeline PR:

@ranj063
Copy link
Collaborator

ranj063 commented Apr 28, 2021

@keyonjie Can you confirm that? I.e. the bug with suspend-resume was there already before dynamic pipeline PR:

@kv2019i absolutely yes, this bug pre-dates the dynamic pipeline PR.

@kv2019i
Copy link
Collaborator

kv2019i commented Apr 28, 2021

@keyonjie @ranj063 @plbossart @bardliao Reproduced bug #2824 on TGL HDA machine and verified this patch fixes the problem. Also provided an alternative fix (PR2880) and verified that fixes the problem as well.

@keyonjie
Copy link
Author

@plbossart wrote:

Well that looks even worse. It means the status of the cores were not updated at an earlier point, so we keep a window of time where this status is inconsistent. I say this is not the right fix, only a band-aid.

It's true we are missing the whole "disable core" path in SOF core. Until the dynamic pipelines was merged, this was not really a problem as cores were incrementally powered up, and then at suspsend, all powered down.

Looking at the history again, it would seem this is actually not related to dynamic pipelines, but rather "ASoC: SOF: update dsp core power status in common APIs" from @bardliao . Before that patch, we had a "sdev->enabled_cores_mask = 0;" in SOF core sof_suspend(). So this basicly reset the enabled_cores_mask in SOF core for all platforms, and after resume, we could start incrementally powering up cores again. @keyonjie Can you confirm that? I.e. the bug with suspend-resume was there already before dynamic pipeline PR:

Confirmed, thanks for figuring out the culprit @kv2019i . Yes, we verified multi-core with TGL-010 release(last October) but with lack of CI coverage the regression happened without observation.

@keyonjie
Copy link
Author

@keyonjie @ranj063 @plbossart @bardliao Reproduced bug #2824 on TGL HDA machine and verified this patch fixes the problem. Also provided an alternative fix (PR2880) and verified that fixes the problem as well.

Good, let me close the one here.

@keyonjie keyonjie closed this Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multicore Features and Issues related to multiple DSP core support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants