-
Notifications
You must be signed in to change notification settings - Fork 59
case-lib: apause: Make sure the PCM is running before pausing #1287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
case-lib: apause: Make sure the PCM is running before pausing #1287
Conversation
marc-hb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR deals with an xrun that happened when in the pause/resume cycle?
Which xrun did you observe, underrun or overrun or both? Can you elaborate?
This looks like a fairly complex issue, I feel like a new https://github.com/thesofproject/sof-test/issues/new/choose would not hurt.
Shouldn't the test fail anyway when an xrun happened? Ideally, with an error message less cryptic than "file descriptor in bad state" but still fail? Doesn't this PR hide the xrun and make the test pass when it shouldn't?
No, an.xrun doesn't mean the test should fail immediately. Only when the application cannot recover successfully from an xrun should the test fail. When that happens, you'd see the "input/output error". |
|
Thanks, good to know. The code change makes sense to me. I still think it would be useful to have a short description of when the xrun happens and why. If not in a new sof-test bug, then in the commit message. |
This comment was marked as off-topic.
This comment was marked as off-topic.
|
So it looks like all the tests finally completed... more than 15h after the git push? 5be12d0 was pushed on July 24th, 23:50 UTC. The screenshot above with unfinished jobs was captured on July 25th, 15:08 UTC Maybe there was just a very long backlog... I would recommend taking at look at the test logs to make sure. |
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not so clearcut whether we should fail on the xruns, but given we don't have --fatal-errors in other tests at the moment so this PR is aligned with other tests we have. I'd say let's proceed with this change.
|
What is (or... should be) the "canonical" way to detect xruns? If it is for instance scanning logs (kernel or firmware), then there could be some sort "universal" toggle that works the same across all tests. Doesn't this test scan logs already? Most do. If it does, then isn't that PR not enough? |
|
|
||
| set _delay [substract_time_since_last_space $_record_for] | ||
|
|
||
| # wait 50ms for the PCM status to be RUNNING before pausing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # wait 50ms for the PCM status to be RUNNING before pausing | |
| # pool for approximately 50ms for the PCM status to be RUNNING before pausing |
I bet it's quite a lot more than 50ms in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd think it would be a lot less 50ms is really like the worst case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid we are talking across each other. From experience, after 1 waits at least 1 but a lot more in reality. Hence my suggested change.
Unlike me, I suspect you are talking about audio, not about expect?
BTW take a look at the comment on line 63.
case-lib/apause.exp
Outdated
| after 1 | ||
| } | ||
| if {$attempt >= $max_attempts} { | ||
| log 0 "ERROR: timeout waiting for PCM status to be RUNNING before pause" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should print some "WARN: likely XRUN" if $attempt > 0
Then this could more easily be converted to an error, either temporarily by editing the code locally, or based on some command line option or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ERROR is appropriate as we fail, but would be nice to print out the state which the PCM is in when we drop our hand in the air.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi updated with the current status print during the error
|
@marc-hb The application can detect xruns via ALSA API and it's upto the app to decide whether it's lethal or not (stremaing stopped, or does it proceed with a possible glitch in played/captured audio). I agree this PR is in effect is reducing the coverage of this test case and it can hide fails we had earlier flagged as fails. OTOH, the xruns are a smaller impact issue compared to an IPC timeout (= one might need to reboot the DUT to get any audio functionality back). In practise we have disabled pause in the SOF drivers towards applications and we primarily use this test case to stress test the pipeline state machines and root out IPC timeout scenarios. In this context, I think this PR makes sense as it will filter out the xruns (that we anyways are not fixing in context of pause), and will get more reliable pass/fail w.r.t. to IPC timeouts. |
I see nothing wrong with prioritizing some failures above others - but there should IMHO at least be a EDIT: a "flag" is a big ask. A comment in the source explaining how to quickly and locally edit it would be enough. |
|
@ranj063, do we have idea why we go to xrun during the pause_push/pause_release ? |
@ujfalusi its not that we're running into xruns during pause/release, the problem is with timing in the test I think. we pause for a a very very short time, release and then try to pause within a very short time right after. In this sequence, when we pause, I think it makes sense to wait until the PCM is actually in the correct RUNNING state before pausing. |
This change addresses the following error seen when doing pause with the multiple-pause-resume test: aplay: do_pause:1586: pause push error: File descriptor in bad state When an xrun happens during the test, the application tries to recover from the xrun by preparing and restarting the stream. There could be a race between when this happens and when the script tries to pause the stream. To avoid this, make sure that the stream state is RUNNING before going ahead with a subsequent pause. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
5be12d0 to
2ce6ddc
Compare
redzynix
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
|
SOFCI TEST |
|
|
||
| set _delay [substract_time_since_last_space $_record_for] | ||
|
|
||
| # wait 50ms for the PCM status to be RUNNING before pausing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid we are talking across each other. From experience, after 1 waits at least 1 but a lot more in reality. Hence my suggested change.
Unlike me, I suspect you are talking about audio, not about expect?
BTW take a look at the comment on line 63.
| log 0 "ERROR: timeout waiting for PCM to be in RUNNING state before pause" | ||
| log 0 "Current state: $pcm_status" | ||
| exit 1 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging the number of attempts (at a high log level) would not hurt.
|
|
||
| # wait 50ms for the PCM status to be RUNNING before pausing | ||
| # this is to make sure that in the case of an xrun the application | ||
| # successfully recovers and restarts the stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # successfully recovers and restarts the stream. | |
| # successfully recovers and restarts the stream. | |
| # change `max_attempts` to zero when observing xruns is desired. |
This change addresses the following error seen when doing pause with the multiple-pause-resume test:
aplay: do_pause:1586: pause push error: File descriptor in bad state
When an xrun happens during the test, the application tries to recover from the xrun by preparing and restarting the stream. There could be a race between when this happens and when the script tries to pause the stream. To avoid this, make sure that the stream state is RUNNING before going ahead with the pause.