Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out of memory crash in CArchive #9225

Merged
merged 3 commits into from Jul 12, 2016
Merged

Fix out of memory crash in CArchive #9225

merged 3 commits into from Jul 12, 2016

Conversation

Paxxi
Copy link
Member

@Paxxi Paxxi commented Feb 27, 2016

This should fix this issue

Thread 1 (Thread 0x7f6369ff3700 (LWP 13304)):
#0  0x00007f63da306cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f63da30a0d8 in __GI_abort () at abort.c:89
#2  0x00007f63da90b535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f63da9096d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f63da909703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f63da909922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f63da909e0d in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f63da909ea9 in operator new[](unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x0000000000ff265c in CArchive::operator>>(std::string&) ()
#9  0x0000000000e6fb6f in CFileItemList::Archive(CArchive&) ()
#10 0x0000000000ff19ff in CArchive::operator>>(IArchivable&) ()
#11 0x0000000000e67bb3 in CFileItemList::Load(int) ()
#12 0x0000000000f5942f in UPNP::CUPnPServer::OnBrowseDirectChildren(NPT_Reference<PLT_Action>&, char const*, char const*, unsigned int, unsigned int, char const*, PLT_HttpRequestContext const&) ()

size_t is variable sized and if someone copies userdata from a 32-bit to a 64-bit system some string sizes could be quite large as 4 bytes from the string are added onto the size.

I limited everything to 32bit as we likely won't need large items in our archives, also it seems safest compatibility wise.

@Paxxi Paxxi added the Type: Fix non-breaking change which fixes an issue label Feb 27, 2016
@Paxxi
Copy link
Member Author

Paxxi commented Feb 27, 2016

@fritsch what do you think?

CArchive::CArchive(CFile* pFile, int mode)
{
m_pFile = pFile;
m_iMode = mode;

m_pBuffer = new uint8_t[CARCHIVE_BUFFER_MAX];
memset(m_pBuffer, 0, CARCHIVE_BUFFER_MAX);
m_pBuffer = std::make_unique<uint8_t[]>(CARCHIVE_BUFFER_MAX);

This comment was marked as spam.

This comment was marked as spam.

@Paxxi
Copy link
Member Author

Paxxi commented Feb 27, 2016

jenkins build this please

@fritsch
Copy link
Member

fritsch commented Feb 27, 2016

@Paxxi thx for looking into that!

@Paxxi
Copy link
Member Author

Paxxi commented Feb 28, 2016

jenkins build this please

@tamland you were right, we should up the level to c++14 on non-windows so we can use make_unique

@anaconda gave me some valuable info on why the issue arises and that got me thinking that we should redesign the format a bit.
I added a header tag as some basic verification and following that a version tag. The version is checked against the version in the interface, in the case of a mismatch an exception is thrown.
@MilhouseVH could you include this in your builds?

@Paxxi Paxxi added the Type: Improvement non-breaking change which improves existing functionality label Feb 28, 2016
@MilhouseVH
Copy link
Contributor

@MilhouseVH could you include this in your builds?

I can, although I already delete the cache files every time Kodi is started so there may be no benefit, or testing of whatever needs testing, unless I stop deleting the caches... which presumably this is meant to fix (make unnecessary)?

@MilhouseVH
Copy link
Contributor

I could be totally off-base here: adding a version (which is hard coded - shouldn't it be in version.txt?) is great, but remembering to bump the version just seems to be asking for future problems... what about basing the version on something unique about the version of Kodi that created the cache (hash of (platform + git hash + build date/time + filesize), maybe even MAC address) - then if a different version of Kodi is installed (or the cache files copied to a different device) they automatically won't be re-used.

@Paxxi
Copy link
Member Author

Paxxi commented Feb 28, 2016

@MilhouseVH yes this is supposed to make your cache deletion obsolete.

Having a version tag as you describe is more robust but also more work to implement and debug, I feel this is a reasonable compromise to protect against bad things and simplicity. It also allows for independent versioning of archives, not sure if it's worth anything in the real world though.

Hopefully people can get in the habit of keeping it up to date like we do with the db version numbers.

@MilhouseVH
Copy link
Contributor

Maybe Kodi needs a "GUID" property that you could access. Not sure if such a thing is needed for UPnP?

I'll drop the cache-clearing patch in the next build.

@wsnipex
Copy link
Member

wsnipex commented Mar 9, 2016

I'm pretty sure bumping this version will be forgotten very soon. Something automatic like @MilhouseVH suggested would really make sense.

@Paxxi
Copy link
Member Author

Paxxi commented Mar 10, 2016

After thinking about it after @MilhouseVH comments, yes it needs something better, something that's fixed length. Also the current design is bad and only partially solves it. Need to to rework it and try again

@MilhouseVH
Copy link
Contributor

OK I'll drop this form my builds and go back to my old delete-cache-on-start scheme - let me know if/when you have something else to test!

@popcornmix
Copy link
Member

CSysInfo::GetVersion() returns something like 17.0-ALPHA1 Git:2016-03-10-47675e4. Using that as a version check for the cache files seems like a reasonable option.
Just discard the cache files if the version of the cached file doesn't match the running version.

@Montellese
Copy link
Member

This issue has come up by some user who upgraded from 15.X to 16.0, see http://trac.kodi.tv/ticket/16668.

@ProfYaffle
Copy link

Merely commenting as I'm the originator of the ticket mentioned above - ill-informed opinion only, therefore!

From my perspective, doesn't this all depend on how much effort it takes to regenerate the cache? Making it persistent through restarts and upgrades is fine if it's a significant cost to regenerate - otherwise, just delete it as above and treat it just as in-flight/runtime only.

That's particularly the case for version upgrades if that's going to present a problem (as it did for me). I really don't see any downside with deleting and forcing a rebuild of a temp file after either major or minor upgrades - we already have a first-run pause to upgrade db tables after a major version bump, and it's acceptable to me to do some other housekeeping at the same time (maybe that's what the version-hash conversation above implies: identify a version change, trigger a deletion. Perhaps piggy-back onto a broader "it's an upgrade!" routine? Do you ever get a db upgrade without a version change, for example?).

I suppose it also depends on whether the cache files ever expire naturally and what else triggers a rebuild. If they ever go stale and get rebuilt, then that answers the "cost" question and forcing a rebuild more often or under other circumstances is fine.

Thanks for everything you guys do, though. Much appreciated.

</ $0.02 worth>

@Paxxi
Copy link
Member Author

Paxxi commented Jul 6, 2016

@ProfYaffle sorry about the radio silence, ran out of free time and forgot about this.
All valid points.
Recreating the list of file items can be expensive as we currently don't do any lazy loading so for large lists it can be noticeable impact.
Version upgrades is not the only issue to protect against. Because we serialize types which can vary in size between platforms it can crash even between same version if moving the whole userdata folder. This is part of this PR but a separate issue and could be fixed separately.

I agree that getting rid of the caching would be the best solution but it's probably a fair bit of work to speed up the file listing enough to consider it so fixing the cache is the short term stop gap solution for now.

I really should try and get some time to fix this before v17 is due.

@DaveTBlake
Copy link
Member

Having been caught out (yet again) with old cache causing odd behaviour, could not the simple solution of clearing the cache as part of installation of Kodi be implemented? Maybe not so simple, not an area I am familiar with. Or perhaps as LE does clearing cache on start-up?

Version control of cache sounds way too complicated when it could simply be deleted and recreated when (and if) needed. Cache is temporary after all. And we really do need something in v17 or users upgrading from Jarvis will hit problems and not everyone can cope with deleting *.fi files.

@Paxxi
Copy link
Member Author

Paxxi commented Jul 8, 2016

@daveblake deleting the cache only solves some issues, imagine someone restoring a userdata folder from before an upgrade boom, copy userdata to another system boom copy to another system with different version boom, also it's not very complicated, just need to do the work

@MilhouseVH
Copy link
Contributor

imagine someone restoring a userdata folder from before an upgrade boom, copy userdata to another system boom copy to another system with different version boom

Not a problem if *.fi is deleted at application startup.

@Paxxi
Copy link
Member Author

Paxxi commented Jul 8, 2016

@MilhouseVH true but that somewhat diminishes the use of the cache.
Anyway I have a redesigned solution now that's simpler and hopefully should solve all of the issues

@Montellese
Copy link
Member

Personally I think the cache should not survive restarts. For me it's ok to wait a few seconds for a list once after having started Kodi.
And copying cached userdata from a different machine which might be 10 times slower than the new machine doesn't make any sense.

@Paxxi
Copy link
Member Author

Paxxi commented Jul 8, 2016

What makes sense and what people do are not the same thing :) but concensus so far seems to be nuke the cache so I'll give it another go

@MartijnKaijser
Copy link
Member

In the very remote past the temp folder also got nuked after restart and it seems that has been "broken" along the way.

@Montellese
Copy link
Member

@Paxxi: agreed but I don't think users copy the cache on purpose. It just gets copied along with the userdata they care about (database, profiles, ...).

…s to avoid crazy memory allocations.

Also add some rudimentary boundary checking to avoid crashing on corrupt archives
@Paxxi
Copy link
Member Author

Paxxi commented Jul 10, 2016

To keep it simple I changed that path for *.fi files to live in special://temp/archive_cache and at startup we nuke that folder and recreate it empty.

@MilhouseVH
Copy link
Contributor

As a name, archive_cache suggests long term storage (ie. archive). Maybe session_cache would be more accurate if it is being nuked each time the app is started.

@Paxxi
Copy link
Member Author

Paxxi commented Jul 11, 2016

Picked the name because there's only CArchive serialized stuff on it

@DaveTBlake
Copy link
Member

CArchive is probably badly named too! Still prefer Milhouse's suggestion, but hey not going to argue over a name. Great to get the cache nuked :)

@fritsch
Copy link
Member

fritsch commented Jul 11, 2016

I don't care about the name too much, but much more about this bugfix. As
it is technically correct, the name can be easily changed with a follow PR
by whoever wants.

2016-07-11 10:40 GMT+02:00 Dave Blake notifications@github.com:

CArchive is probably badly named too! Still prefer Milhouse's suggestion,
but hey not going to argue over a name. Great to get the cache nuked :)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#9225 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/ABCfHcdlZo_OuXjk3NJG7XSkDq41eCh8ks5qUgFigaJpZM4HkiqL
.

               Key-ID:     0x1A995A9B
               keyserver: pgp.mit.edu

Fingerprint: 4606 DA19 EC2E 9A0B 0157 C81B DA07 CF63 1A99 5A9B

@Paxxi
Copy link
Member Author

Paxxi commented Jul 11, 2016

How about "transient files with a heightened risk of deletion"? :)
Session cache can be interpreted in the sense of browsers and users wondering why we delete their sessions all the time. In short naming is hard

@DaveTBlake
Copy link
Member

It is in a folder called "temp" doesn't that say it all? Everything under temp is temporary :)

Maybe I'm old fashioned but long folder names with spaces seem very naff to me. I get you want a folder to delete rather than have the *.fi under temp folder directly, so just call it "cache" maybe?

@Paxxi
Copy link
Member Author

Paxxi commented Jul 12, 2016

jenkins build this please

@Paxxi Paxxi merged commit 89ef699 into xbmc:master Jul 12, 2016
@Paxxi Paxxi deleted the archive branch July 12, 2016 20:35
@Paxxi Paxxi added this to the Krypton 17.0-alpha3 milestone Jul 12, 2016
@MilhouseVH
Copy link
Contributor

This PR is responsibel for all artwork being cached as f/ffffffff.jpg (or f/ffffffff.png).

With this PR:

000001|f/ffffffff.jpg|0720|0480|0001|2016-07-13 18:32:15|2016-07-13 19:32:15|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/The Adjustment Bureau (2011)[BDRip]-poster.jpg
000002|f/ffffffff.jpg|0720|0480|0001|2016-07-13 18:32:15|2016-07-13 19:32:15|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Classics/12 Angry Men (1957)[BDRip]-poster.jpg
000003|f/ffffffff.jpg|0720|0480|0001|2016-07-13 18:32:15|2016-07-13 19:32:15|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Transformers Collection/Transformers Age of Extinction (2014)[BDRip]-poster.jpg
000004|f/ffffffff.jpg|0720|0480|0001|2016-07-13 18:32:15|2016-07-13 19:32:15|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Rocky Collection/Creed (2015)[BDRip]-poster.jpg
000005|f/ffffffff.jpg|0720|0480|0001|2016-07-13 18:32:15|2016-07-13 19:32:15|nfs://192.168.0.3/mnt/share/media/Video/MoviesSD/Amy's Films/Paper Towns (2015)[BDRip]-poster.jpg
000006|f/ffffffff.jpg|0720|0480|0001|2016-07-13 18:32:16|2016-07-13 19:32:16|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Sunshine on Leith (2013)[BDRip]-poster.jpg
000007|f/ffffffff.jpg|0720|0480|0001|2016-07-13 18:32:16|2016-07-13 19:32:16|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/The Lady in the Van (2015)[BDRip]-poster.jpg

Without this PR:

000001|b/b2566e2f.jpg|0720|0480|0001|2016-07-13 18:30:45|2016-07-13 19:30:45|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/The Adjustment Bureau (2011)[BDRip]-poster.jpg
000002|c/c48961be.jpg|0720|0480|0001|2016-07-13 18:30:45|2016-07-13 19:30:45|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Classics/12 Angry Men (1957)[BDRip]-poster.jpg
000003|e/ea03a928.jpg|0720|0480|0001|2016-07-13 18:30:45|2016-07-13 19:30:45|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Transformers Collection/Transformers Age of Extinction (2014)[BDRip]-poster.jpg
000004|6/62e4595f.jpg|0720|0480|0001|2016-07-13 18:30:45|2016-07-13 19:30:45|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Rocky Collection/Creed (2015)[BDRip]-poster.jpg
000005|a/a6bb3aad.jpg|0720|0480|0001|2016-07-13 18:30:45|2016-07-13 19:30:45|nfs://192.168.0.3/mnt/share/media/Video/MoviesSD/Amy's Films/Paper Towns (2015)[BDRip]-poster.jpg
000006|c/cee61e74.jpg|0720|0480|0001|2016-07-13 18:30:45|2016-07-13 19:30:45|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Sunshine on Leith (2013)[BDRip]-poster.jpg
000007|8/86db36fe.jpg|0720|0480|0001|2016-07-13 18:30:45|2016-07-13 19:30:45|nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/The Lady in the Van (2015)[BDRip]-poster.jpg

Tested on Ubuntu, LibreELEC RPi2 and LibreELEC Generic.

@Paxxi
Copy link
Member Author

Paxxi commented Jul 13, 2016

fix in the works

@MilhouseVH
Copy link
Contributor

MilhouseVH commented Jul 28, 2016

Is this working (nuking the cache at startup?)

I've rebooted my RPi2 system many times since 24 July but this file remains:

rpi22:~ # ls -la /storage/.kodi/temp/archive_cache/
total 112
drwxr-xr-x    2 root     root          4096 Jul 28 21:31 .
drwxr-xr-x    6 root     root          4096 Jul 28 21:34 ..
-rw-r--r--    1 root     root        103419 Jul 24 03:06 vdb-57406aea.fi
rpi22:~ # date
Thu Jul 28 21:35:45 BST 2016

The above cache is stale which has been causing problems with the TV view (new series added yesterday wasn't appearing in the "TV shows" list, but was appearing in "Recently added").

I had to manually remove /storage/.kodi/temp/archive_cache/* and restart Kodi to fix the problem.

I've tested with latest master (84d55c1) in builds of LibreELEC RPi2 (build #728) and also Ubuntu x86_64. Neither is nuking archive_cache.

If I touch ~/.kodi/temp/archive_cache/vdb-57406aea.fi then restart Kodi (in LE or Ubuntu), the vdb-57406aea.fi file will still be there when Kodi has restarted so it doesn't look like the archive_cache is being nuked.

@MilhouseVH
Copy link
Contributor

From Ubuntu:

22:16:11 T:140346572294528  NOTICE: Host CPU: AMD FX(tm)-8350 Eight-Core Processor, 8 cores available
22:16:11 T:140346572294528  NOTICE: special://xbmc/ is mapped to: /home/neil/projects/kodi.git
22:16:11 T:140346572294528  NOTICE: special://xbmcbin/ is mapped to: /home/neil/projects/kodi.git
22:16:11 T:140346572294528  NOTICE: special://masterprofile/ is mapped to: /home/neil/.kodi/userdata
22:16:11 T:140346572294528  NOTICE: special://envhome/ is mapped to: /home/neil
22:16:11 T:140346572294528  NOTICE: special://home/ is mapped to: /home/neil/.kodi
22:16:11 T:140346572294528  NOTICE: special://temp/ is mapped to: /home/neil/.kodi/temp
22:16:11 T:140346572294528  NOTICE: special://logpath/ is mapped to: /home/neil/.kodi/temp
22:16:11 T:140346572294528  NOTICE: The executable running is: /home/neil/projects/kodi.git/kodi.bin
22:16:11 T:140346572294528  NOTICE: Local hostname: nm-linux

If I delete archive_cache with rm -fr /home/neil/.kodi/temp/archive_cache then it will be created when Kodi starts, so it appears that

  CDirectory::Create(archiveCachePath);

is working as expected.

However, when the archive_cache directory already exists it is not being deleted when Kodi is started, so it would seem that:

  CDirectory::Remove(archiveCachePath);

is not working as expected, or perhaps not working at all - I tried creating archive_cache as a file (rather than directory) with touch /home/neil/.kodi/temp/archive_cache and then started Kodi but the archive_cache file remained rather than be deleted then recreated as a directory.

@MilhouseVH
Copy link
Contributor

MilhouseVH commented Jul 28, 2016

OK, the problem appears to be that CDirectory::Remove(archiveCachePath); won't remove a directory that already contains files - https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/Directory.cpp#L333 will return false when the directory contains one or more files.

Although this should be logged as an error, https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/Directory.cpp#L344, nothing is actually logged, presumably because the logging system isn't fully initialised at this point so the error is silently ignored.

Note also that CDirectory::Remove(archiveCachePath); will not delete archive_cache when it is an ordinary file, which is an extremely unlikely (but potential) issue, and I don't know what or where the caching system will try and create its cache files in such a scenario.

Edit: I believe this to be the crux of the matter. When the archive_cache directory contains one or more files, rmdir() in https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/posix/PosixDirectory.cpp#L126 will - on Ubuntu - fail with ENOTEMPTY (see http://pubs.opengroup.org/onlinepubs/009695399/functions/rmdir.html). rmdir() will also fail with ENOTDIR - not a directory - when deleting an ordinary file.

So it looks like the CPosixDirectory::Remove() method needs to support removal of directories that are not empty OR the directory must be emptied of files using some other method (there'd then be no point calling CDirectory::Remove(), once the directory is already empty). Deletion of any ordinary file called archive_cache would be a nice-to-have, but not essential.

@Paxxi
Copy link
Member Author

Paxxi commented Jul 29, 2016

What I get for making assumptions about the method instead of reading the code. Thanks for the investigation, will fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Fix non-breaking change which fixes an issue Type: Improvement non-breaking change which improves existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet