case-lib: apause: Make sure the PCM is running before pausing #1287

ranj063 · 2025-07-24T23:50:03Z

This change addresses the following error seen when doing pause with the multiple-pause-resume test:
aplay: do_pause:1586: pause push error: File descriptor in bad state

When an xrun happens during the test, the application tries to recover from the xrun by preparing and restarting the stream. There could be a race between when this happens and when the script tries to pause the stream. To avoid this, make sure that the stream state is RUNNING before going ahead with the pause.
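As a rough illustration of the approach described here, the wait can be modelled as a bounded poll. This is a sketch in Python, not the Expect code in case-lib/apause.exp. The `read_state` callback, the default of 50 attempts and the 1 ms interval are assumptions chosen to match the "50ms" comment in the diff; how the real script reads the PCM state is not shown in this thread.

```python
import time


def wait_until_running(read_state, max_attempts=50, poll_interval_s=0.001):
    """Poll read_state() until it returns "RUNNING" or attempts run out.

    read_state is a hypothetical callback returning the PCM state as a string.
    Returns (ok, attempts_used, last_state).
    """
    state = read_state()
    attempts = 0
    while state != "RUNNING" and attempts < max_attempts:
        time.sleep(poll_interval_s)  # give the application time to recover
        attempts += 1
        state = read_state()
    return state == "RUNNING", attempts, state


# Simulated stream that recovers from an xrun after a few polls.
states = iter(["XRUN", "PREPARED", "PREPARED", "RUNNING"])
ok, attempts, last = wait_until_running(lambda: next(states))
print(ok, attempts, last)
```

In this model, passing `max_attempts=0` skips the wait entirely, so a stream that is not RUNNING is reported on the first check.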

marc-hb

This PR deals with an xrun that happened when in the pause/resume cycle?

Which xrun did you observe, underrun or overrun or both? Can you elaborate?

This looks like a fairly complex issue, I feel like a new https://github.com/thesofproject/sof-test/issues/new/choose would not hurt.

Shouldn't the test fail anyway when an xrun happened? Ideally, with an error message less cryptic than "file descriptor in bad state" but still fail? Doesn't this PR hide the xrun and make the test pass when it shouldn't?

ranj063 · 2025-07-25T06:32:41Z

> This PR deals with an xrun that happened when in the pause/resume cycle?
>
> Which xrun did you observe, underrun or overrun or both? Can you elaborate?
>
> This looks like a fairly complex issue, I feel like a new https://github.com/thesofproject/sof-test/issues/new/choose would not hurt.
>
> Shouldn't the test fail anyway when an xrun happened? Ideally, with an error message less cryptic than "file descriptor in bad state" but still fail? Doesn't this PR hide the xrun and make the test pass when it shouldn't?

No, an xrun doesn't mean the test should fail immediately. The test should fail only when the application cannot recover from the xrun. When that happens, you'd see the "input/output error".

marc-hb · 2025-07-25T15:05:10Z

Thanks, good to know.

The code change makes sense to me.

I still think it would be useful to have a short description of when the xrun happens and why. If not in a new sof-test bug, then in the commit message.

marc-hb · 2025-07-25T16:38:59Z

So it looks like all the tests finally completed... more than 15h after the git push?

5be12d0 was pushed on July 24th, 23:50 UTC.

The screenshot above with unfinished jobs was captured on July 25th, 15:08 UTC

Maybe there was just a very long backlog... I would recommend taking a look at the test logs to make sure.

kv2019i

It's not clear-cut whether we should fail on xruns, but since we don't have --fatal-errors in other tests at the moment, this PR is aligned with the other tests we have. I'd say let's proceed with this change.

marc-hb · 2025-07-28T19:03:02Z

What is (or... should be) the "canonical" way to detect xruns? If it is for instance scanning logs (kernel or firmware), then there could be some sort of "universal" toggle that works the same across all tests.

Doesn't this test scan logs already? Most do. If it does, then isn't this PR insufficient on its own?

marc-hb · 2025-07-25T15:02:42Z

case-lib/apause.exp


                set _delay [substract_time_since_last_space $_record_for]
+
+                # wait 50ms for the PCM status to be RUNNING before pausing


Suggested change

-                # wait 50ms for the PCM status to be RUNNING before pausing
+                # poll for approximately 50ms for the PCM status to be RUNNING before pausing

I bet it's quite a lot more than 50ms in practice.

ranj063 · 2025-10-08

I'd think it would be a lot less; 50ms is really the worst case.

marc-hb · 2025-07-28T19:04:57Z

case-lib/apause.exp

+                    after 1
+                }
+                if {$attempt >= $max_attempts} {
+                    log 0 "ERROR: timeout waiting for PCM status to be RUNNING before pause"


I think this should print some "WARN: likely XRUN" if $attempt > 0

Then this could more easily be converted to an error, either temporarily by editing the code locally, or based on some command line option or similar.

ujfalusi · 2025-10-08

I think the ERROR is appropriate as we fail, but it would be nice to print out the state the PCM is in when we throw our hands in the air.

ranj063 · 2025-10-14

@ujfalusi updated to print the current status along with the error.
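The reporting ideas in this thread (a warning whenever any retry was needed, plus the PCM state on timeout) could be combined roughly as follows. This is a Python model, not the Expect code from the PR; the function name and the exact WARN wording are assumptions, while the ERROR and "Current state" strings are taken from the diff.

```python
def report_wait_result(ok, attempts, last_state):
    """Return (log_lines, exit_code) for a finished wait-for-RUNNING poll.

    ok         -- True if the PCM reached RUNNING before the attempts ran out
    attempts   -- number of polls that were needed
    last_state -- last PCM state observed, as a string
    """
    lines = []
    if attempts > 0:
        # Any retry hints that the stream was recovering, most likely from an xrun.
        lines.append(f"WARN: likely XRUN, PCM needed {attempts} poll(s) to settle")
    if not ok:
        lines.append("ERROR: timeout waiting for PCM to be in RUNNING state before pause")
        lines.append(f"Current state: {last_state}")
        return lines, 1
    return lines, 0


for line in report_wait_result(False, 50, "XRUN")[0]:
    print(line)
```

Keeping the warning separate from the error makes it a one-line local change to promote the warning to a failure when observing xruns is the goal.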

marc-hb · 2025-07-29T22:51:48Z

> but given we don't have --fatal-errors in other tests at the moment

Actually: you can already do this with some tests: #489, #1120, SOF_ALSA_OPTS

Work in progress.

kv2019i · 2025-08-05T07:57:06Z

@marc-hb The application can detect xruns via the ALSA API and it's up to the app to decide whether an xrun is lethal or not (streaming stops, or it proceeds with a possible glitch in the played/captured audio).

I agree this PR in effect reduces the coverage of this test case and can hide failures that we previously flagged. OTOH, xruns are a lower-impact issue than an IPC timeout (where one might need to reboot the DUT to get any audio functionality back). In practice we have disabled pause in the SOF drivers towards applications, and we primarily use this test case to stress test the pipeline state machines and root out IPC timeout scenarios. In this context, I think this PR makes sense: it filters out the xruns (which we are not fixing in the context of pause anyway) and gives a more reliable pass/fail result w.r.t. IPC timeouts.
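To make the "up to the app" point concrete, here is a small Python model of a playback loop that either recovers from an xrun or treats it as fatal. `FakePcm` is a hypothetical stand-in and not the ALSA API; the only ALSA fact assumed is that an xrun is reported to the application as `-EPIPE`. The `fatal_errors` parameter is an assumed name, loosely mirroring the `--fatal-errors` option mentioned in this thread.

```python
import errno


class FakePcm:
    """Hypothetical stand-in for a PCM handle; models xruns on chosen writes."""

    def __init__(self, xrun_on_writes):
        self.xrun_on_writes = set(xrun_on_writes)
        self.writes = 0
        self.prepared = 0

    def write(self, frames):
        self.writes += 1
        if self.writes in self.xrun_on_writes:
            return -errno.EPIPE  # xrun reported to the application
        return frames

    def prepare(self):
        self.prepared += 1  # stream is re-prepared and can be restarted
        return 0


def play(pcm, periods, frames_per_period, fatal_errors=False):
    """Write `periods` periods; return (periods_written, xruns_seen)."""
    written = xruns = 0
    while written < periods:
        rc = pcm.write(frames_per_period)
        if rc == -errno.EPIPE:
            xruns += 1
            if fatal_errors or pcm.prepare() < 0:
                raise RuntimeError("xrun not recovered")
            continue  # retry the same period after recovery
        written += 1
    return written, xruns


print(play(FakePcm({2}), periods=4, frames_per_period=1024))
```

With `fatal_errors=True` the same xrun ends playback instead of being absorbed as a glitch, which is the trade-off discussed here.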

marc-hb · 2025-08-05T22:28:29Z

> I agree this PR in effect reduces the coverage of this test case and can hide failures that we previously flagged. OTOH, xruns are a lower-impact issue than an IPC timeout

I see nothing wrong with prioritizing some failures above others - but there should IMHO at least be a ~~flag~~ easy way to restore the previous behavior. Or at the very least a WARNING printed = literally just one line of code. Also, your (crystal-clear as usual) explanation of coverage priorities belongs in the commit message and in comments in the source. EDIT: not just lost in a PR comment.

EDIT: a "flag" is a big ask. A comment in the source explaining how to quickly and locally edit it would be enough.

ujfalusi · 2025-10-08T05:29:09Z

@ranj063, do we have any idea why we go to xrun during the pause_push/pause_release?

ranj063 · 2025-10-08T17:08:15Z

> @ranj063, do we have any idea why we go to xrun during the pause_push/pause_release?

@ujfalusi it's not that we're running into xruns during pause/release; the problem is with timing in the test, I think. We pause for a very short time, release, and then try to pause again very shortly afterwards. In this sequence, I think it makes sense to wait until the PCM is actually in the RUNNING state before pausing.

This change addresses the following error seen when doing pause with the multiple-pause-resume test:
aplay: do_pause:1586: pause push error: File descriptor in bad state

When an xrun happens during the test, the application tries to recover from the xrun by preparing and restarting the stream. There could be a race between when this happens and when the script tries to pause the stream. To avoid this, make sure that the stream state is RUNNING before going ahead with a subsequent pause.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

redzynix

Looks good.

redzynix · 2025-10-15T09:26:51Z

SOFCI TEST

marc-hb · 2025-10-15T18:20:23Z

case-lib/apause.exp


                set _delay [substract_time_since_last_space $_record_for]
+
+                # wait 50ms for the PCM status to be RUNNING before pausing


I'm afraid we are talking past each other. From experience, `after 1` waits at least 1 ms but a lot more in reality. Hence my suggested change.

Unlike me, I suspect you are talking about audio, not about expect?

BTW take a look at the comment on line 63.
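The timing point can be checked with a quick measurement. The sketch below is Python, not Expect, so the numbers only illustrate the general effect: a loop of 50 sleeps of 1 ms has a nominal budget of 50 ms, but each sleep lasts at least the requested time plus scheduling overhead, so the measured total is normally higher. The function name and defaults are assumptions.

```python
import time


def measure_poll_budget(attempts=50, interval_s=0.001):
    """Sleep `attempts` times and return (nominal_ms, elapsed_ms)."""
    start = time.monotonic()
    for _ in range(attempts):
        time.sleep(interval_s)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return attempts * interval_s * 1000.0, elapsed_ms


nominal_ms, elapsed_ms = measure_poll_budget()
print(f"nominal {nominal_ms:.0f} ms, measured {elapsed_ms:.1f} ms")
```

The gap between the two figures depends on the machine and its load, which is why "approximately 50ms" is the safer wording for the comment.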

marc-hb · 2025-10-15T18:21:40Z

case-lib/apause.exp

+                    log 0 "ERROR: timeout waiting for PCM to be in RUNNING state before pause"
+                    log 0 "Current state: $pcm_status"
+                    exit 1
+                }


Logging the number of attempts (at a high log level) would not hurt.

marc-hb · 2025-10-15T18:23:45Z

case-lib/apause.exp

+
+                # wait 50ms for the PCM status to be RUNNING before pausing
+                # this is to make sure that in the case of an xrun the application
+                # successfully recovers and restarts the stream.


Suggested change

-                # successfully recovers and restarts the stream.
+                # successfully recovers and restarts the stream.
+                # change `max_attempts` to zero when observing xruns is desired.

ranj063 requested review from a team, golowanow, lgirdwood and marc-hb as code owners July 24, 2025 23:50

ranj063 mentioned this pull request Jul 24, 2025

[BUG] multiple platforms and topologies: multiple-pause-resume failures with "nothing to copy" thesofproject/sof#10116

Closed

ranj063 requested review from kv2019i and lyakh July 24, 2025 23:51

marc-hb reviewed Jul 25, 2025

kv2019i approved these changes Jul 28, 2025

marc-hb reviewed Jul 28, 2025

ranj063 force-pushed the fix/multiple_pause_resume branch from 5be12d0 to 2ce6ddc Compare October 14, 2025 16:13

ranj063 requested review from kv2019i and ujfalusi October 14, 2025 16:14

ujfalusi approved these changes Oct 14, 2025

redzynix approved these changes Oct 15, 2025

marc-hb reviewed Oct 15, 2025

redzynix merged commit a9f04af into thesofproject:main Oct 16, 2025
4 of 8 checks passed

