[WIN] close/reopen the wasapi audio device when lost #731

Closed
wants to merge 12 commits into
from

Projects

None yet

7 participants

@CrystalP
Contributor

This is an attempt to fix http://forum.xbmc.org/showthread.php?t=121689

The error code returned by WASAPI is clear, so chances are pretty good this fix works (ie release the device interfaces and recreate them later)

Unfortunately I am unable to repro the issue, and don't receive the error code when unplugging outputs or changing device parameters while paused.

@elupus
Member
elupus commented Feb 26, 2012

Looks good to me.

@a11599
Contributor
a11599 commented Mar 5, 2012

I can reproduce the issue. My HTPC is connected to the TV through an AVR. If I turn the TV off, then back on, I get the device invalidated error. At this point, most of the times (but not always) the dvdplayeraudio thread crashes. Unfortunately the patch does not fix it. Although the patch is necessary I think, maybe something is still trying to mess with the audio buffer and that's why it crashes?

@a11599 a11599 and 1 other commented on an outdated diff Mar 5, 2012
xbmc/cores/AudioRenderers/Win32WASAPI.cpp
CLog::Log(LOGERROR, __FUNCTION__": ReleaseBuffer failed (%i)", hr);
+ if (hr == AUDCLNT_E_DEVICE_INVALIDATED)
+ Close();
@a11599
a11599 Mar 5, 2012 Contributor

Should not we also reset m_bIsAllocated to false here? In one case during testing when the audio thread didn't crash I got the error that the sink cannot be opened twice.

@CrystalP
CrystalP Mar 5, 2012 Contributor

I don't think so. I meant to not change the use of m_bIsAllocated so that it's true when dvdplayeraudio thinks the sink is initialized.
m_bInitialized is new and true when the references to the device are valid. Basically it's a shortcut to "m_pRenderClient != NULL && m_pAudioClient != NULL && m_pDevice != NULL).
Close() sets it to false and the following AddPackets() should try to reopen and eventually set it to true again,.

@a11599
a11599 Mar 5, 2012 Contributor

Actually your patch moved m_bIsAllocated = false to Deinitialize(). That's why I'm asking. Perhaps it should be just moved back to Close()?

@CrystalP
CrystalP Mar 5, 2012 Contributor

Argh yes obviously the problem is with m_bInitialized on reinitialization. Let me see what to do.
Still it shouldn't trip an exception later.

@CrystalP
CrystalP Mar 5, 2012 Contributor

For testing you can remove if(m_bIsAllocated) and its block at the beginning of Initialize(), I doubt it's really needed anyway 'cause WASAPI would return an error if we tried.

@a11599
a11599 Mar 5, 2012 Contributor

I tried setting m_bIsAllocated to false here to get around this earlier, but dvdplayeraudio still crashed (another thing: in paplayer audio is lost, but it does not seem to crash). Biggest issue now is to find a free timeslot in the living room for a more in-depth experiments but I doubt this will happen before the weekend.

@CrystalP
Contributor
CrystalP commented Mar 5, 2012

It's good that you can repro. Can the device eventually be successfully reopened? Does it take many tries (ie many calls to AddPackets() after the invalidation?
I changed the handling of errors in GetBuffer/ReleaseBuffer and instead of pretending it ate the buffer, a 0 is returned. I don't know if that's important.
Maybe dvdplayeraudio calls functions other than AddPackets() and are not ready to handle invalid device references?

@a11599
Contributor
a11599 commented Mar 5, 2012

Here is a log when the reinit was attempted, but failed due to sink believing the endpoint is already initialized: http://pastebin.com/vW2X2nSM
Is far as I remember when the thread dies there are no entries from CWin32WASAPI::Initialize in the log, the access violation is straight after "CWin32WASAPI::AddPackets: GetBuffer failed (-2004287484)". Unfortunately I have not kept such a log and family occupied the living room at the moment. :)
Will see what I can do because this is driving me nuts, I cannot turn off the TV even if I am just listening to some radio or music.

@CrystalP
Contributor
CrystalP commented Mar 8, 2012

These commits should fix the problem. I didn't want to mess much with the structure of the current code, being close to a release and with AE replacing all that anyway.
I messed up my rebase and ended up force-pushing, sorry about that.
Simulating device loss/recovery worked fine so I'm hopeful you'll be ok on actual hardware.

@a11599
Contributor
a11599 commented Mar 9, 2012

Unfortunately it does not fix the issue here. What I observed is when I turn the TV back on, the AVR passes the TV EDID for a split second, and then sends a new one with its own data. With some additional code in CWin32WASAPI I managed to work around it and now it can recover and works fine with paplayer (my previous report was wrong, paplayer also crashed). However dvdplayeraudio still crashes before CWin32WASAPI::AddPackets() has a chance to catch the INVALIDATED error.

Here is the current status (including your commits): http://pastebin.com/VwhJydur. Just ignore the patch quality at the moment, this is just a very rough work in progress version, but showing just to get an idea what I have done. I had to add a lot of checks for m_Initialized in various methods where m_bIsInitialized is checked, and also created m_Recover bool to decide in Initialize() how to handle errors. So for example if m_Recover is true, Initialize() will not fallback to the default device.

Issue is that I cannot replicate on my notebook (I think the display driver misses out on the TV EDID because it is busy switching to/from the notebook LCD), it only happens with my dedicated HTPC, but I am refusing to install VC++ on it. So I just change the code, compile, copy over the .exe and watch the debug log... A bit time consuming process.

But I am not sure if it is worth fixing this now if AE really changes this piece of code.

@CrystalP
Contributor
CrystalP commented Mar 9, 2012

Similar changes will be needed for AE, so knowing how to fix is still important. And who knows how long it will take for a release including AE :) In the mean time more and more people will get HDMI receivers and hit the problem in Eden.
I suggest remote debugging to avoid installing the whole VS suite on your HTPC and still have an easy-to-work-with environment.

When does the INVALIDATED error code happen? when switching the TV back on, from what you're saying?

I'm not surprised that more functions using one of the device pointers must be protected with a m_Initialized test. In addittion to that I hesitated about catching the INVALIDATED error message for all calls and call Close() in response. That would be better. Most m_Validated tests are probably redundant with m_Initialized.

Seems to me using m_Validated == true in Initialize() would remove the need for m_Recover.

How does DirectSound react to the TV switching by the way?

@CrystalP
Contributor

Here are some more commits, incorporating most of your ideas.

While reading your patch I wondered a few things:

  • there will be a problem if the wasapi sink fell back on the default device the first time around. It won't recover?
  • what's the point of the Sleep line 188

I'm unsure about the last commit, you may try with and without. It's necessary to catch all error codes, but I don't know what dvdplayeraudio expects/how it reacts.

@a11599
Contributor
a11599 commented Mar 15, 2012

Tried your latest commits. I only had to safeguard UpdateCacheStatus() to make this patch stable with paplayer: http://pastebin.com/wwatUqx9. While not strictly required, I also safeguarded CheckPlayStatus() for completeness sake.

However, dvdplayeraudio still fails, but not always. Sometimes it survives and reinitializes as it should. I am looking into this (although have no idea yet what is causing the access violation).

I however found a way to reproduce this on my notebook and this might work for you as well: select WASAPI output in XBMC, start playback, then in Control Panel > Hardware and Sound > Sound right click on the audio device and disable/enable until the thread crashes.

Fallback to default device was circumvented for convenience. When I turn the TV back on, the HDMI audio device becomes unavailable for a second and then Initialize() would fallback to the default device which is the onboard audio in my case. Alternatively I could just disable the onboard audio, but I think we should try to recover to the same playback device and not try any fallbacks in these scenarios (fallback is OK on playback start, however).

The Sleep's purpose was just to mute the log entries while debugging (my scenario adds 700KB to the debug log). There are other ways around but probably we should limit the time elapsed between reinit attempts (not necessarily with a sleep). Not only for log size, but also because it will likely take some time until the device becomes available again. A few (1-10) reinit attempts per second should suffice I guess.

@CrystalP
Contributor

Well DVDPlayerAudio died of an access violation a nanosecond after disabling the audio device from the control panel. Not a thing about lost devices though. We'll see what comes out of this.
Hey! how do I bring it back now? nvm, found it...

@CrystalP
Contributor

Adding an error code for UpdateCacheStatus is a good thing, as Close()ing sets m_uiChunkSize to 0, which caused a division by zero... Though it was reported as a different exception...
paplayer can still easily be crashed though for me, almost every time.

There are still a few different failure modes but the most common is in the WASAPI GetCurrentPadding method. Log statements before and after show that. It could be that something else corrupts the memory earlier and the crash only happens in GetCurrentPadding, but what...
The call stacks look weird, adding the critical section in more places result in different stack traces.
There is an event mechanism for WASAPI, I don't know if that would let us catch the invalidation early enough.

Same results with ATI HDMI audio and onboard realtek spdif. foobar with WASAP output is also crashable, though much less.

edit: just got a failure where GetBuffer returns success, yet the buffer pointer is to unaccessible memory! WTF!

@CrystalP
Contributor

With those final commits, I think the problem is handled as best as we can, and even though not perfect, it's better than nothing.
It's not clear what to do about the remaining crashes.

@CrystalP
Contributor

@DDDamian do you have an opinion/ideas about all this? If not in Eden, something like this will need to be done for AE in the WASAPI.
Maybe you can even repro the issue?

@DDDamian
Contributor

Hi CrystalP - I have followed the conversation and it will need to be done in AE as well. Things like an unplugged cable I guess we just need to fail gracefully (after all, we don't look for a second TV if you unplug the first), but the long pause disconnect I will try reproduce and report back. Chnaces are good that on some hardware or other a connection will eventually be lost on pause, so dealing with it on resume makes sense.

@jmarshallnz
Member

AE is in now, so this will need to be redone - have fun :)

@DDDamian
Contributor

CrystalP, this has been on the back of my mind since you submitted. Can I suggest we let some good feedback come in now that AE is in nightlies and then see where to go from there?

Obviously much has changed in the code, and I think a good deal of the work here has probably been duplicated but I'm not naive enough to think we're "done".

@jmarshallnz
Member

@CrystalP, @a11599 With AE in, all this needs reevaluating. Do either of you have the time or inclination to take a look?

@MartijnKaijser
Member

Is this PR still needed for current master (or Frodo+1)?

@DDDamian
Contributor
DDDamian commented Dec 7, 2012

I'm pretty sure I fixed this in July - HDMI devices can now be shut-off / switched mid-stream without crashing and a new device can then be selected. By forum reports it worked after the fix. Post-Frodo I have added a fallback to the "Default" device which will help further.

@a11599
Contributor
a11599 commented Dec 7, 2012

I can confirm with Frodo b2 there are no lockups on my system anymore (audio is still killed but that is just fine, it is happening with all Windows applications, not just XBMC).

@da-anda
Member
da-anda commented Mar 9, 2013

This PR can be closed, right? @DDDamian

@DDDamian
Contributor
DDDamian commented Mar 9, 2013

Yes, it's obsolete now.

@DDDamian DDDamian closed this Mar 9, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment