
Improve read_numeric() to vastly increase parse() performance for all tags #150

Open · Tracked by #156
MestreLion opened this issue Oct 27, 2021 · 4 comments

@MestreLion (Contributor) commented Oct 27, 2021

While profiling NBT file loading to optimize load times, I found read_numeric() at the top of the profile by a large margin. Taking a closer look, this seems to be the culprit:

```python
from struct import Struct

def get_format(fmt, string):
    """Return a dictionary containing a format for each byte order."""
    return {"big": fmt(">" + string), "little": fmt("<" + string)}

BYTE = get_format(Struct, "b")
SHORT = get_format(Struct, "h")
...

def read_numeric(fmt, fileobj, byteorder="big"):
    """Read a numeric value from a file-like object."""
    try:
        fmt = fmt[byteorder]
        return fmt.unpack(fileobj.read(fmt.size))[0]
        ...
```

And it is used universally across all the tag classes, following a similar pattern:

```python
tag_id = read_numeric(BYTE, fileobj, byteorder)
length = read_numeric(INT, fileobj, byteorder)
tag = cls.get_tag(read_numeric(BYTE, fileobj, byteorder))
data = fileobj.read(read_numeric(INT, fileobj, byteorder) * item_type.itemsize)
...
```

The problem is that read_numeric creates a new Struct instance on every read, which is a very expensive operation. There should probably be a way to pre-build (or cache) those instances, so that read_numeric, get_format, or even the BYTE/INT/... constants contain or return the same Struct instances, while still keeping the ability to select byteorder on a per-call basis.

I can submit a PR to fix this, and I'm sure reading (and writing) times will vastly improve. I'll do it in a way that does not change the API of any of the tag classes (i.e., keeping the Compound.parse(cls, fileobj, byteorder="big") signature for write/parse on all tags), and possibly keep the read_numeric() signature too, so the Tag classes need no changes at all. Most likely, though, get_format() will change its signature and/or internal structure, and the underlying BYTE/INT/... constants will change their internal values, but I'll do my best to keep them as byteorder-agnostic constants.

Is such an improvement welcome?

MestreLion changed the title from “read_numeric() is way more inefficient than it could be” to “Improve read_numeric() to vastly increase parse() performance for all tags” on Oct 27, 2021
@MestreLion (Contributor, Author)

I just noticed that pre-made Struct instances are already stored in BYTE/INT/..., so read_numeric() is not creating new instances per call. Great!

But, still, are improvements to this crucial function welcome?

vberlier mentioned this issue on Nov 2, 2021
@vberlier (Owner) commented Nov 3, 2021

At runtime, read_numeric should only perform a dictionary lookup to grab the appropriate struct format, then read and unpack the data. It's in the hot path when parsing, so performance improvements would be very welcome, but I'm not sure there's any opportunity for easy wins here. Feel free to experiment with it if you have something in mind, though!
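To put that per-call lookup in perspective, a minimal standalone micro-benchmark might look like this (a sketch with hypothetical names, not nbtlib code):

```python
# Micro-benchmark sketch: per-call byteorder dict lookup vs. a pre-bound
# Struct. Standalone illustration; INT/DATA/with_lookup are hypothetical.
import timeit
from struct import Struct

INT = {"big": Struct(">i"), "little": Struct("<i")}
DATA = b"\x00\x00\x00\x2a"  # 42, big endian

def with_lookup(byteorder="big"):
    fmt = INT[byteorder]      # dict lookup on every call
    return fmt.unpack(DATA)[0]

BOUND = INT["big"]            # lookup resolved once, up front

def pre_bound():
    return BOUND.unpack(DATA)[0]

print("dict lookup:", timeit.timeit(with_lookup, number=1_000_000))
print("pre-bound:  ", timeit.timeit(pre_bound, number=1_000_000))
```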

@MestreLion (Contributor, Author)

That's why I created the benchmarks against other NBT implementations: there's no point doing experiments if I can't accurately measure the gains, and little point trying to improve what is already pretty damn good. My initial assumption that it was slow and could be "vastly improved" turned out to be wrong.

But still, one experiment I might try: set the endianness once per File/Root.parse() with an (attribute?) assignment, instead of doing a run-time dictionary lookup for every tag. When Compound.parse() calls read_numeric(BYTE, ...), that BYTE would no longer be a big/little dictionary but already one of those Structs; the job of fmt[endian] would have already been performed by File. read_numeric would take not a dict but a Struct (or whatever) of a given endianness, set prior to the call. And Compound, as now, would be completely unaware of all of this.

The point is that there is little value in allowing endianness to be set on a per-tag basis. The whole file is either little endian or big endian, so we can take advantage of that assumption.

Hmm, perhaps Compound would have to be a little aware after all, as it might have to use self.BYTE instead of a module-level BYTE. Hmm, a class-attribute lookup per call. Bad tradeoff?
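A minimal sketch of that idea, with hypothetical names standing in for nbtlib's actual classes:

```python
# Sketch of resolving endianness once per File.parse() (hypothetical names,
# not nbtlib's actual API): tags receive an already-bound Struct, removing
# the per-tag byteorder dict lookup from the hot path.
import io
from struct import Struct

BYTE = {"big": Struct(">b"), "little": Struct("<b")}
INT = {"big": Struct(">i"), "little": Struct("<i")}

def read_numeric(fmt, fileobj):
    # fmt is already a concrete Struct; no byteorder lookup per call
    return fmt.unpack(fileobj.read(fmt.size))[0]

class File:
    @classmethod
    def parse(cls, fileobj, byteorder="big"):
        # the fmt[byteorder] job is done once here, for the whole file
        byte_fmt = BYTE[byteorder]
        int_fmt = INT[byteorder]
        tag_id = read_numeric(byte_fmt, fileobj)
        length = read_numeric(int_fmt, fileobj)
        return tag_id, length

# Fake payload: one signed byte (tag id 10) followed by a big-endian int (5)
print(File.parse(io.BytesIO(b"\x0a\x00\x00\x00\x05")))  # (10, 5)
```

In this shape the tags just pass along whatever Struct they were handed, which would also sidestep the class-attribute lookup worry above.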

Benchmarks. We need benchmarks.

Or skip all of that and go Cython. Please!

@MestreLion (Contributor, Author)

An interesting optimization approach taken by Minecraft: it caches all 256 possible Byte values as pre-built instances.
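A rough Python rendition of that trick, assuming a hypothetical ByteTag class (not nbtlib's actual implementation):

```python
# Sketch of Minecraft's Byte cache in Python (hypothetical ByteTag class):
# all 256 signed-byte values are built once at import time and reused,
# instead of constructing a new instance per parsed byte.
class ByteTag(int):
    __slots__ = ()

_CACHE = [ByteTag(value) for value in range(-128, 128)]

def byte_tag(value):
    # shift by 128 so the signed range -128..127 maps onto indices 0..255
    return _CACHE[value + 128]

assert byte_tag(5) is byte_tag(5)  # same cached instance every time
```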
