agent: add option to disable system agent #1998

Merged (1 commit) on Oct 25, 2019


tlauda
Contributor

@tlauda tlauda commented Oct 24, 2019

Adds a config option to disable the system agent. This is helpful
on still-unstable systems that cannot guarantee that the
agent will execute on time.

Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>

@@ -245,4 +245,12 @@ config SYSTICK_PERIOD
	  as a timeout check value for system agent.
	  Value should be provided in microseconds.

config AGENT_DISABLE
	bool "Disable system agent"
Collaborator

Then we have here CONFIG_HAVE_AGENT which defaults to true.

Adds config option to enable system agent. It can be disabled
on the still unstable systems, which cannot guarantee that
agent will execute on time.

Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
@tlauda
Contributor Author

tlauda commented Oct 24, 2019

SOFCI TEST

Contributor

@xiulipan xiulipan left a comment

LGTM, tested with a local QEMU run and the CI tests.

@tlauda tlauda merged commit 6d6f61c into thesofproject:master Oct 25, 2019
@paulstelian97
Collaborator

paulstelian97 commented Oct 25, 2019

If 9e0d994 breaks my system on a long-running song (found via git bisect; no actual traces seen, including verbose ones), would this option fix it? (Leaving this here; will test it now.)

EDIT: Some quick testing shows that indeed it seems to fix it. But why don't I see anything in the etrace buffer related to the agent intervening?

@dbaluta
Collaborator

dbaluta commented Oct 26, 2019

@paul, disabling the agent is not an option. It should be there and report to us when there are problems.

Disabling the agent is an option for Qemu for example.

So perhaps we can have a look at the faulty commit.

@paulstelian97
Collaborator

@dbaluta What confuses me is this: disabling the agent fixes the problem, and I can play many times (I got to 193 plays before I had issues with the USB port again). But when the agent is enabled and playback stops working, I don't see any signs of an actual panic (that's why I had to bisect in the first place).

I see that PLATFORM_IDLE_TIME used to be 750 ms. Now CONFIG_SYSTICK_PERIOD seems to be used instead, which appears to be much more stringent, if I understood the refactor properly. CONFIG_SYSTICK_PERIOD seems to default to 1000 µs, which is 1 ms. So the agent didn't intervene before, but does now.

Of course I may be misunderstanding this but I need to look at it more.
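
To put numbers on the comparison above, here is a minimal C sketch. It only uses the two values quoted in this comment (750 ms and 1000 µs). The macro and function names are illustrative and are not taken from the SOF sources. Whether the two values are actually used the same way by the agent is exactly the open question here.

```c
#include <stdint.h>

/* Values quoted in the comment above. The names are illustrative only. */
#define OLD_IDLE_TIME_US  750000u /* PLATFORM_IDLE_TIME: 750 ms, in microseconds */
#define SYSTICK_PERIOD_US 1000u   /* CONFIG_SYSTICK_PERIOD default: 1000 us = 1 ms */

/* Returns how many times shorter the new time budget is than the old one. */
static uint32_t budget_ratio(uint32_t old_us, uint32_t new_us)
{
	return old_us / new_us;
}
```

With these inputs, `budget_ratio(OLD_IDLE_TIME_US, SYSTICK_PERIOD_US)` evaluates to 750, so if the values are comparable, the new check is 750 times tighter than the old one.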

@tlauda
Contributor Author

tlauda commented Oct 28, 2019

@paulstelian97 Previously, the agent's last_check was updated at the passive level after processing any interrupt. It was supposed to verify whether the DSP is alive, but the way it was done was faulty, and with long EDF tasks it caused panics. Now the update and the check are both done every systick, in the timer interrupt. The limit for panic is currently set to 100%, so if you're having trouble, it means some tasks run too long and block your timer IRQ.
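
The per-systick check described above could look roughly like the following C sketch. This is not the actual agent code. The struct and function names are hypothetical. It also assumes that a "100%" limit means the tick may arrive up to one full period late, so the allowed gap between checks is twice the period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; not the actual SOF agent structure. */
struct agent_sketch {
	uint64_t last_check_us;  /* time of the previous systick check */
	uint64_t panic_limit_us; /* largest allowed gap between two checks */
};

static void agent_sketch_init(struct agent_sketch *a, uint64_t now_us,
			      uint64_t period_us)
{
	a->last_check_us = now_us;
	/* Assumption: a 100% limit allows a delay of one extra period. */
	a->panic_limit_us = 2 * period_us;
}

/*
 * Meant to be called once per systick from the timer interrupt.
 * Updates last_check and returns true if the gap since the previous
 * check exceeded the limit, i.e. something blocked the timer IRQ.
 */
static bool agent_sketch_tick(struct agent_sketch *a, uint64_t now_us)
{
	uint64_t delta = now_us - a->last_check_us;

	a->last_check_us = now_us;
	return delta > a->panic_limit_us;
}
```

Under these assumptions, with a 1000 µs period a tick that arrives 2000 µs after the previous one still passes, while one that arrives 2001 µs later would trip the limit.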

@paulstelian97
Collaborator

paulstelian97 commented Oct 28, 2019

How should I best debug this? Should I just switch back to the default config and add a trace both in panic_rewind and in the timer irq handler? (or perhaps in the agent). This may flood my etrace buffer (dtrace isn't working yet on imx).

Considering that the pipeline task isn't EDF anymore (that also caused me some funny issues which we discussed and fixed by enabling caches) I'm not sure how the agent should work.

Perhaps the systick period is incorrect or something on my platform? But if so, why does it take two playthroughs of a 17-minute song to reproduce?

@tlauda
Contributor Author

tlauda commented Oct 28, 2019

@paulstelian97 Try switching platform_timer to IRQ_NUM_TIMER1. I see it's on a higher interrupt level. If that helps, it means your DMA transfers are blocking the timer interrupt, so increasing CONFIG_SYSTICK_PERIOD should probably help.

@paulstelian97
Collaborator

Will try now. The thing is, as I understand it, the entire pipeline runs within the context of the DMA interrupt. That's why, without caching, I had those serious issues only after your DMA scheduler domain was introduced.

Nothing seems to be broken by switching to it. I will probably report back after lunch: my issue shows up after 30 minutes on average, and lunch takes quite a bit longer than that, so by then I will know whether it is still playing or stuck.

@paulstelian97
Collaborator

paulstelian97 commented Oct 28, 2019

Reporting back: using TIMER1 instead of TIMER0 (I simply changed that data structure) delayed the issue. The DSP died during the third playthrough (song length: 17:06).

Will try various other ways to debug and if I find anything acceptable which fixes it I will submit it.

@tlauda tlauda deleted the topic/agent-config-disable branch November 22, 2019 11:22