Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PVR] Fix crash in PVR::CPVRClient::GetDriveSpace (Trac #15942) #7216

Merged
merged 1 commit into from
Jun 2, 2015

Conversation

ksooo
Copy link
Member

@ksooo ksooo commented May 31, 2015

I'm not 100% sure about the root cause, but the various stack traces I got show that the CPVRClient instance on which GetDriveSpace gets called is NULL. Thus, my simple fix should prevent the crash.

Root cause could be a multithreading issue. One piece of code looks suspicious to me:

CSingleLock lock(m_critSection);
 ...
  {
    unsigned int iPrevious = m_iAddonInfoToggleCurrent;
    if (((int) ++m_iAddonInfoToggleCurrent) > m_iActiveClients - 1)
      m_iAddonInfoToggleCurrent = 0;

For a short moment m_iAddonInfoToggleCurrent can have a value that is too large.

This piece of code reads m_iAddonInfoToggleCurrent without a mutex.

/* safe to read unlocked */
    for (unsigned int i = 0; i < m_iAddonInfoToggleCurrent; i++)
      activeClient++;

    if (activeClient->second->GetDriveSpace(&iBackendkBTotal, &iBackendkBUsed) == PVR_ERROR_NO_ERROR)

So far so bad, activeClient might point activeClients.end() due to the potential multithreading issue here. But I was unable to actually find out the use case where the writing code runs in another thread than the reading code... strange.

@opdenkamp mind taking a look?

@ksooo ksooo added Type: Fix non-breaking change which fixes an issue Component: PVR v15 Isengard labels May 31, 2015
@stefansaraev
Copy link
Contributor

for the record: #6732

@Paxxi
Copy link
Member

Paxxi commented Jun 1, 2015

That comment in the second piece of code seems wrong, if it can be mutated by another thread it's not safe to read and absolutely no safe to loop over it

@Paxxi
Copy link
Member

Paxxi commented Jun 1, 2015

At best this fix will make the error be more random but it won't fix the underlying issue that you're looping over shared state without a lock

@ksooo
Copy link
Member Author

ksooo commented Jun 1, 2015

Sure. Protecting the reading code will avoid the iterator running out of scope. This is probably a better way to take with the problem. But what is the actual root cause?

Under which circumstances do multiple cguiinfo threads run in parallel? This imo must be the case to run into the crash. But having more than one of this threada makes no sense for me... Honestly, trying my best, but all this makes not too much sense for me. This drives me nuts!

@Paxxi
Copy link
Member

Paxxi commented Jun 1, 2015

The code in void CPVRGUIInfo::UpdateBackendCache(void) makes no sense.
It calls into AddonInfoToggle which locks before updating the variable, it then calls lots of methods on a pointer completely unprotected and then takes a lock again to update string variables. Why not protect that which will make us crash horribly?

Also I'm not really sure that the idea to use a counter to keep track of which client to update is a good idea when storing the clients in a map. Items can change places from under you if something's inserted depending on it's key.

activeClients comes from g_PVRClients and is using raw pointers so any change in g_PVRClients can kill this code.

That's what I've found so far

@Paxxi
Copy link
Member

Paxxi commented Jun 1, 2015

oh, just thought of it, since the data lives in g_PVRClients no locking in UpdateBackendCache will work, needs to be locked in g_PVRClients during the whole op or copied using smart pointers to avoid deletion
The typedef fooled me, I see that it's actually using std::shared_ptr

@Paxxi
Copy link
Member

Paxxi commented Jun 1, 2015

quick and dirty possible fix Paxxi/xbmc@83f29c4
AddonToggleInfo uses m_iActiveClients which may not match what we got as iActiveClients and we end up with a nullptr this is to show the issue, it's not a production ready fix as there's still a race condition present

@ksooo it seems your first fix could work fine, I was reading on the phone and it looked like it was part of the for loop.

@ksooo
Copy link
Member Author

ksooo commented Jun 2, 2015

@Paxxi Did a little more brain debugging and now I understand it. This is not a multithreading issue. Happens also if there is only one guiinfo thread involved and at some certain point in time count of active clients drops.

Following scenario:
Very first call to UpdateBackendCache:

  • assume iActiveClients is 2.
  • AddonInfoToggle gets called and sets m_iAddonInfoToggleCurrent to 0
  • back in UpdateBackendCache: activeClient will be set to first vector element, m_iActiveClients will be set to 2
  • => all good

2nd call to UpdateBackendCache:

  • assume iActiveClients is still 2.
  • AddonInfoToggle gets called and sets m_iAddonInfoToggleCurrent to 1
  • back in UpdateBackendCache: activeClient will be set to second vector element, m_iActiveClients will be set to 2 (no change in value)
  • => still all good

3rd call to UpdateBackendCache:

  • assume iActiveClients now is 1.
  • AddonInfoToggle gets not called, thus m_iAddonInfoToggleCurrent is unchanged (current value is 1!)
  • UpdateBackendCache: activeClient will be set to second vector element, which is not actually present
  • => deref of activeClients.end() => Booom!

@ksooo ksooo closed this Jun 2, 2015
@ksooo ksooo reopened this Jun 2, 2015
@ksooo
Copy link
Member Author

ksooo commented Jun 2, 2015

Oops! Close was not intended.

@Paxxi: Does my previous comment make sense for you?

@Paxxi
Copy link
Member

Paxxi commented Jun 2, 2015

yup @ksooo that's it, you're right that it's actually broken even if there's just one thread involved

@ksooo
Copy link
Member Author

ksooo commented Jun 2, 2015

@Paxxi Here is my take on actually fixing the root cause of this bug. Looking okay?

@Paxxi
Copy link
Member

Paxxi commented Jun 2, 2015

@ksooo it'll work but I think it would be nicer and easier to read to modify AddonInfoToggle to take the active clients as a param and store it in m_activeClients before doing it's stuff. Should probably be renamed to something else, GetNextUpdateTarget? not that familiar with the code so there's probably a more suitable name for it.

@ksooo
Copy link
Member Author

ksooo commented Jun 2, 2015

@Paxxi ofc you are right wrt "nicer and easier to read", but for Isengard at this stage we should avoid any code changes that are not absolutely necessary. Thus, may I suggest to beautify this fix in a separate PR for J****?

@Paxxi
Copy link
Member

Paxxi commented Jun 2, 2015

Yeah you can go ahead and merge

@ksooo
Copy link
Member Author

ksooo commented Jun 2, 2015

jenkins build this please

@ksooo
Copy link
Member Author

ksooo commented Jun 2, 2015

jenkins kodi build successful on all platforms, it was "just" audioencoder.wav that failed to build.

ksooo added a commit that referenced this pull request Jun 2, 2015
[PVR] Fix crash in PVR::CPVRClient::GetDriveSpace (Trac #15942)
@ksooo ksooo merged commit 6ce1b4d into xbmc:master Jun 2, 2015
@ksooo ksooo deleted the trac15942 branch June 3, 2015 07:37
@opdenkamp
Copy link
Member

broke due to earlier refactoring, I suppose when the shared ptrs were introduced here. the idea was that the same thread that registered add-ons were flipping the pointers, and then the rest would be safe to read unlocked. looks like that broke, and this "fix" is a work around for 1 of the situations that can occur now only.

@ksooo
Copy link
Member Author

ksooo commented Jun 6, 2015

@opdenkamp you are right. Got the crash today with latest master where this PR already got merged. Must have been one of the other cases you mentioned. :-(

@MilhouseVH
Copy link
Contributor

Also had a report on the forum using a tip of master build on RPi/OpenELEC.

crash log

@ksooo
Copy link
Member Author

ksooo commented Jun 7, 2015

#7239

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: PVR Type: Fix non-breaking change which fixes an issue v15 Isengard
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants