agent: add option to disable system agent #1998
Force-pushed from e73b81b to d7feadb
Review on src/platform/Kconfig (outdated):
@@ -245,4 +245,12 @@ config SYSTICK_PERIOD
	  as a timeout check value for system agent.
	  Value should be provided in microseconds.

config AGENT_DISABLE
	bool "Disable system agent"
Could we instead have CONFIG_HAVE_AGENT here, defaulting to true?
Adds a config option to enable the system agent. It can be disabled on still-unstable systems that cannot guarantee the agent will execute on time.
Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
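The reviewer's suggestion avoids a double negative in the code that consumes the option. A minimal sketch, assuming the usual Kconfig convention that an enabled bool is emitted as a macro defined to 1 and a disabled one is left undefined; `agent_init()` and `platform_init()` here are hypothetical stand-ins, not the SOF functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the Kconfig-generated header. With "default y" the option is
 * defined to 1; delete this line to model CONFIG_HAVE_AGENT=n, since Kconfig
 * leaves disabled bool options undefined. */
#define CONFIG_HAVE_AGENT 1

/* Hypothetical agent state and init hook, for illustration only. */
static bool agent_started;

static void agent_init(void)
{
	assert(!agent_started); /* start the agent at most once */
	agent_started = true;
}

/* Hypothetical platform init; returns true if the agent was started. */
static bool platform_init(void)
{
	/* Positive option: "if we have an agent, start it". An undefined macro
	 * evaluates to 0 inside #if, so CONFIG_HAVE_AGENT=n compiles this out.
	 * The negative form would read #if !CONFIG_AGENT_DISABLE. */
#if CONFIG_HAVE_AGENT
	agent_init();
#endif
	return agent_started;
}
```

With the positive name, the default build starts the agent and a platform such as QEMU opts out by setting the option to n, rather than by enabling a "disable" switch.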
Force-pushed from d7feadb to 1f8f853
SOFCI TEST
LGTM, tested with local QEMU and CI.
If 9e0d994 breaks my system on a long-running song (found via git bisect; no actual traces seen, including verbose ones), would this option fix it? (Leaving this here; will test it now.) EDIT: Some quick testing shows that it does seem to fix it. But why don't I see anything in the etrace buffer related to the agent intervening?
@paul, disabling the agent is not a solution. It should stay enabled and report to us when there are problems. Disabling it is an option for QEMU, for example. So perhaps we should look at the faulty commit.
@dbaluta What confuses me is this: disabling the agent fixes the problem, and I can play many times (I saw 193 playthroughs, and then I got issues with the USB port again). But when the agent is enabled and playback stops working, I don't see any signs of an actual panic (that's why I had to bisect in the first place). I see that PLATFORM_IDLE_TIME used to be 750ms. Now CONFIG_SYSTICK_PERIOD seems to be used instead, which appears to be much more stringent, if I understood the refactor properly. CONFIG_SYSTICK_PERIOD seems to default to 1000, which is 1ms. So the agent didn't intervene before but does now. Of course I may be misunderstanding this, but I need to look at it more.
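Taking the two figures in the comment at face value (750 ms before, a 1000 µs default systick period now, with the period given in microseconds per the Kconfig help text), and assuming the new timeout is a single systick period, which the comment itself is unsure of, the window the agent allows shrank by a factor of 750:

```latex
\frac{750\ \text{ms}}{1000\ \mu\text{s}}
  = \frac{750 \times 10^{3}\ \mu\text{s}}{10^{3}\ \mu\text{s}}
  = 750
```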
@paulstelian97 Previously, the agent's last_check was updated at passive level after any interrupt was processed. It was supposed to verify whether the DSP is alive, but the way it was done was faulty, and with long EDF tasks it caused panics. Now the update and the check are both done every systick in the timer interrupt. The panic limit is currently set to 100%, so if you're having trouble, some tasks are running too long and blocking your timer IRQ.
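To make the per-systick scheme concrete, here is a minimal sketch of such a check. All names (`struct sys_agent`, `agent_on_systick()`, `panic_limit_pct`) are hypothetical, and the threshold formula (one period plus `panic_limit_pct` percent of it) is an assumption about how a "100%" limit might be applied, not SOF's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical agent state; all times are in timer ticks. */
struct sys_agent {
	uint64_t last_check;      /* timestamp of the previous systick */
	uint64_t period;          /* expected ticks between systicks */
	uint32_t panic_limit_pct; /* allowed extra delay, % of period */
};

static void agent_init(struct sys_agent *sa, uint64_t now, uint64_t period,
		       uint32_t panic_limit_pct)
{
	assert(period > 0);
	sa->last_check = now;
	sa->period = period;
	sa->panic_limit_pct = panic_limit_pct;
}

/* Called from the timer interrupt on every systick. Returns true if the
 * gap since the previous systick exceeds the panic threshold, i.e. the
 * timer IRQ was held off (for example by a long task or by an interrupt
 * at a higher level). The caller would panic in that case. */
static bool agent_on_systick(struct sys_agent *sa, uint64_t now)
{
	uint64_t delta = now - sa->last_check;
	uint64_t limit = sa->period + sa->period * sa->panic_limit_pct / 100;

	sa->last_check = now; /* update and check happen in the same tick */
	return delta > limit;
}
```

Under this model, with a period of 1000 ticks and a 100% limit, the agent tolerates gaps up to 2000 ticks; anything that keeps the timer IRQ masked for longer than one extra period trips it.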
How should I best debug this? Should I just switch back to the default config and add a trace in both panic_rewind and the timer IRQ handler (or perhaps in the agent)? This may flood my etrace buffer (dtrace isn't working yet on i.MX). Considering that the pipeline task isn't EDF anymore (which also caused me some funny issues that we discussed and fixed by enabling caches), I'm not sure how the agent is supposed to work. Perhaps the systick period is wrong on my platform? But if so, why does it take two playthroughs of a 17-minute song to reproduce?
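One way to trace the timer IRQ without flooding a small etrace buffer is to log only when the worst-case gap between timer interrupts grows. A minimal sketch; `struct gap_monitor`, `gap_monitor_tick()` and `trace_gap()` are hypothetical, and `trace_gap()` only records the value here where a real build would emit an etrace entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tracer: a real build would emit an etrace entry here. */
static unsigned int trace_count;
static uint64_t last_traced_gap;

static void trace_gap(uint64_t gap)
{
	trace_count++;
	last_traced_gap = gap;
}

/* High-water-mark state for the gap between consecutive timer IRQs. */
struct gap_monitor {
	uint64_t prev;    /* timestamp of the previous timer IRQ */
	uint64_t max_gap; /* largest gap seen so far */
};

/* Call from the timer IRQ handler. Emits a trace entry only when a new
 * maximum gap is observed, so steady-state ticks produce no output. */
static void gap_monitor_tick(struct gap_monitor *m, uint64_t now)
{
	uint64_t gap;

	assert(now >= m->prev); /* timer is expected to be monotonic */
	gap = now - m->prev;
	m->prev = now;
	if (gap > m->max_gap) {
		m->max_gap = gap;
		trace_gap(gap);
	}
}
```

Because the number of trace entries grows only with each new record, even a failure that takes half an hour to appear should leave its largest gap near the end of the buffer.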
@paulstelian97 Try switching platform_timer to IRQ_NUM_TIMER1. I see it's on a higher interrupt level. If that helps, it means your DMA transfers are blocking the timer interrupt, so increasing CONFIG_SYSTICK_PERIOD should probably help.
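Why a longer systick period should help can be put as arithmetic. If the previous tick was on time and the next one is held off by a DMA handler running for `block_us`, the observed gap is `period_us + block_us`. A minimal sketch, assuming (hypothetically, not necessarily SOF's formula) that the agent panics when that gap exceeds `period_us * (100 + limit_pct) / 100`; `would_panic()` and `min_systick_period_us()` are illustrative helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the agent panics when the gap between two systicks
 * exceeds period * (100 + limit) / 100. A handler that blocks the timer
 * IRQ for block_us stretches one gap to period + block_us. */
static int would_panic(uint64_t period_us, uint64_t block_us,
		       uint32_t limit_pct)
{
	uint64_t gap = period_us + block_us;
	uint64_t limit = period_us * (100 + limit_pct) / 100;

	return gap > limit;
}

/* Smallest period (us) that tolerates a block of block_us:
 *   period + block <= period * (100 + limit) / 100
 *   <=> block <= period * limit / 100
 *   <=> period >= ceil(block * 100 / limit) */
static uint64_t min_systick_period_us(uint64_t block_us, uint32_t limit_pct)
{
	assert(limit_pct > 0);
	return (block_us * 100 + limit_pct - 1) / limit_pct;
}
```

With a 100% limit this reduces to "the period must be at least as long as the longest blocking handler", so under this model the default 1000 us period would be tripped by any DMA interrupt that runs the pipeline for more than 1 ms at a level that masks the timer.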
Will try now. The thing is, from what I understand, the entire pipeline runs within the context of the DMA interrupt; that's why, without caching, I had those serious issues only after your DMA scheduler domain was introduced. Nothing seems to be broken by switching to it. I will probably report back after lunch: my issue shows up after 30 minutes on average, so after lunch (quite a bit more than 30 minutes) I'll report whether it's still playing or stuck.
Reporting back: using TIMER1 instead of TIMER0 (I simply changed that data structure) delayed the issue, but the DSP died during the third playthrough (song length: 17:06). I will try other ways to debug this, and if I find an acceptable fix I will submit it.
Adds a config option to disable the system agent. It's helpful
on still-unstable systems that cannot guarantee the agent will
execute on time.
Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>