CXBMCTinyXML: use fast direct file reading #1445

Merged
merged 16 commits into from Aug 6, 2013

Projects

None yet

6 participants

@Karlson2k
Member

This is a future improvement of #1301, now with some numbers:

Some test:
original code, 1126 KB XML file, 5 times "create object, open file":
6001 milliseconds
5573 milliseconds
6613 milliseconds
5746 milliseconds
5589 milliseconds
5032 milliseconds
6703 milliseconds
5511 milliseconds
5739 milliseconds
5950 milliseconds
6388 milliseconds
6011 milliseconds

patched code,1126 KB XML file, 5 times "create object, open file":
2528 milliseconds
2065 milliseconds
2086 milliseconds
2423 milliseconds
2122 milliseconds
2490 milliseconds
2122 milliseconds
1875 milliseconds
1897 milliseconds
2074 milliseconds
1935 milliseconds
1859 milliseconds
2051 milliseconds
Patched code is faster by 60-70%.

original code, 516 B XML file, 10000 times "create object, open file":
7605 milliseconds
5874 milliseconds
7998 milliseconds
7590 milliseconds
8030 milliseconds
8388 milliseconds
5789 milliseconds
7849 milliseconds
7455 milliseconds
5951 milliseconds
7691 milliseconds

patched code, 516 B XML file, 10000 times "create object, open file":
3511 milliseconds
3058 milliseconds
3116 milliseconds
3044 milliseconds
3034 milliseconds
3052 milliseconds
3033 milliseconds
3265 milliseconds
2935 milliseconds
2894 milliseconds
2919 milliseconds
2909 milliseconds
2934 milliseconds
So on small files we save about 50-60%.

@elupus
Member
elupus commented Sep 19, 2012

If file is 8 gig, this won't work too well. The point of streamin is to avoid reading the full file if it isn't a proper xml.

@elupus
Member
elupus commented Sep 19, 2012

Just put an upper limit on the allocation. like 1 meg or so.

@Karlson2k
Member

@elupus But we will load all file into "data", if xml is OK so 8 gig will not be loaded anyway and full size string will be allocated.
In normal situation we're working with correct XML file, but when using "streamIn", file is actually parsed twice.
"StreamIn" isn't optimized for reading files, I'm working on solution with much more optimized base LoadFile instead of streamIn.

@elupus
Member
elupus commented Sep 22, 2012

Oh right.. this stuff was already broken. Previously we could stream in from a file, and it aborted if it found it wasn't a proper xml. Seem that is not possible anymore. StreamIn function is not even part of public API anymore :/

@MartijnKaijser
Member

@jmarshallnz
Is this something we still want/need?
looking at the test numbers it looks promising

@jmarshallnz
Member

@Karlson2k: Compare the read method in JpegIO.cpp with this - perhaps the one in JpegIO.cpp could be moved to FileUtils.cpp and reused here? It's basically the same idea.

@wsoltys
Member
wsoltys commented Apr 8, 2013

The file read method in jpegio.cpp got obsolete with the implementation of the Image Factory. But IF uses the same code just at a different place (somewhere in textures AFAIR)

Am 08.04.2013 um 09:52 schrieb jmarshallnz notifications@github.com:

@Karlson2k: Compare the read method in JpegIO.cpp with this - perhaps the one in JpegIO.cpp could be moved to FileUtils.cpp and reused here? It's basically the same idea.


Reply to this email directly or view it on GitHub.

@Karlson2k
Member

@wsoltys Do you mean part of code starting from https://github.com/xbmc/xbmc/blob/master/xbmc/guilib/Texture.cpp#L298 ?
Yes, idea is basically the same. And I'm sure it can be used in many more places in XBMC code. So OK, I'll move it to FileUtils.cpp and reuse it in CXBMCTinyXML.

@jmarshallnz
Member

Yup - that part of the code.

@wsoltys OT: It's still in JpegIO.cpp as well by the looks, maybe a leftover?

@wsoltys
Member
wsoltys commented Apr 9, 2013

@jmarshallnz: yes. I left that in because there were some other unused methods in afair and I thought one want to reuse it one time. But they could be misleading indeed.

@MartijnKaijser
Member

ping

@Karlson2k
Member
@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 10, 2013
xbmc/utils/FileUtils.cpp
+#elif defined(TARGET_DARWIN_IOS)
+/* less limited memory, accurate allocation */
+#define MINCHUNKSIZE (32U*1024U)
+#define MAXCHUNKSIZE (512U*1024U)
+#elif defined(TARGET_ANDROID)
+/* avoid large allocation/deallocation */
+#define MINCHUNKSIZE (64U*1024U)
+#define MAXCHUNKSIZE (512U*1024U)
+#elif defined (__arm__) || defined (_M_ARM)
+/* ARM targets usually have less memory than x86 targets */
+#define MINCHUNKSIZE (128U*1024U)
+#define MAXCHUNKSIZE (1024U*1024U)
+#else
+#define MINCHUNKSIZE (128U*1024U)
+#define MAXCHUNKSIZE (2048U*1024U)
+#endif
@jmarshallnz
jmarshallnz May 10, 2013 Member

Honestly, we really don't need all this. Keep it simple with a reasonable initial chunksize (which is only used if we don't know the size of the file anyway) of say 64k, and a max chunksize to avoid memory running away of say 2MB or so.

@theuni
theuni May 11, 2013 Member

And drop the ifdefs. If you can't use numbers available at runtime, you shouldn't be guessing at build-time. My ARM phone has more mem than some of my older desktops.

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 10, 2013
xbmc/utils/FileUtils.cpp
+ 3. Some value smaller than the real size of file. This is the case for an expanding file.
+
+ In order to handle all three cases, we read the file in chunks, relying on Read()
+ returning 0 at EOF. To minimize (re)allocation of the buffer, the chunkSize in
+ cases 1 and 3 is set to one byte larger than the value returned by GetLength().
+ The chunksize in case 2 is set to the lowest value larger than MINCHUNKSIZE and aligned
+ to GetChunkSize().
+
+ We fill the buffer entirely before reallocation. Thus, reallocation never occurs in case 1
+ as the buffer is larger than the file, so we hit EOF before we hit the end of buffer.
+
+ To minimize reallocation, we double the chunkSize each read while chunkSize is lower
+ than MAXCHUNKSIZE.
+ */
+ int64_t fileSize = file.GetLength();
+ if (fileSize >= SIZE_MAX)
@jmarshallnz
jmarshallnz May 10, 2013 Member

magic constant not defined in the XBMC codebase?

@Karlson2k
Karlson2k May 11, 2013 Member

Usually defined in stdint.h (limits.h on windows).
On windows header is already included, not sure about it on other platforms.

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 10, 2013
xbmc/utils/FileUtils.cpp
+
+ if (totalRead + 1 < inputBuffSize)
+ {
+ unsigned char *tempPtr = (unsigned char *)realloc(inputBuff, (size_t)totalRead);
+ if (!tempPtr)
+ {
+ CLog::Log(LOGERROR, "%s unable to reallocate buffer for file \"%s\"", __FUNCTION__, filename.c_str());
+ free(inputBuff);
+ return -1;
+ }
+ }
+
+ outputBuffer = (void *) inputBuff;
+ return totalRead;
+}
+
@jmarshallnz
jmarshallnz May 10, 2013 Member

What was wrong with the original routine that made you decide you needed to make it 10 times more complicated? Keep code simple, even if very slightly inefficient, simpler is better.

@Karlson2k
Karlson2k May 11, 2013 Member

Original code didn't work with large file (no check for size overflow in several places) and didn't align chunk size to file chink size.
OK, I'll simplify code.

@jmarshallnz
Member

Please split the Add new function to FileUtils into at least 2 commits:

  1. Move the existing code in Texture.cpp.
  2. Add your improvements on top.

That way it's easy to check for screwups.

@jmarshallnz
Member

Code looks good other than those minor comments. Nice find on CFile::GetChunkSize() :)

@Karlson2k
Member

@jmarshallnz Done, with splitting.

@jmarshallnz
Member

Note that JpegIO.cpp still uses the old method (c+p) as well I think.

@Karlson2k
Member

At this leftover was intentional, I decide to left it as is.
Remove it too?

@jmarshallnz
Member

Just leave it - I can easily just chop it out later.

@Karlson2k
Member

@jmarshallnz Done, hopefully good now.

@jmarshallnz
Member

Thanks, but still not quite what I was intending. Have done up a branch based on yours here:

https://github.com/jmarshallnz/xbmc/tree/win32_xml_fix

This has all your commits minus a few things I thought weren't really needed, plus a couple of other cosmetics.

@Karlson2k
Member

@jmarshallnz Nice, thanks!
Is "while(true) -> if (!read) break" better than "do while(read)"?

@jmarshallnz
Member

I don't have a preference personally - the advantage of while (true) {} is that it's generally more familiar to most programmers than do...while(), as well as restricting the read param to within the loop. Minor considerations perhaps, but it's not as if it increases the amount of code or introduces inefficiences.

@Karlson2k
Member

OK, remove bf11261a42bfd55411f7f63dc541c0d45e5d355e ?

@jmarshallnz
Member

Yeah IMO i doesn't really add anything.

On Mon, May 13, 2013 at 9:10 PM, Karlson2k notifications@github.com wrote:

OK, remove bf11261bf11261?


Reply to this email directly or view it on GitHubhttps://github.com/xbmc/xbmc/pull/1445#issuecomment-17800577
.

@Karlson2k
Member

@jmarshallnz Done.

@MartijnKaijser
Member

ping

@Karlson2k
Member

Rebased, one cosmetics commit added.

@MartijnKaijser
Member

jenkins build this please

@MartijnKaijser
Member

I would like to get this in. Using it for several days now without any noticeable issues. embedded platform should benefit from this.
Any code comments left before pulling in?

@Karlson2k
Member

@MartijnKaijser I think it's ready to go.

@MartijnKaijser MartijnKaijser merged commit bd5f5bd into xbmc:master Aug 6, 2013

1 check passed

default Build #49 succeeded in 39 min
Details
@Karlson2k Karlson2k deleted the Karlson2k:win32_xml_fix branch Oct 15, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment