
Improve read_numeric() to vastly increase parse() performance for all tags #150

Open · Tracked by #156
MestreLion opened this issue Oct 27, 2021 · 4 comments

@MestreLion (Contributor) commented Oct 27, 2021

While profiling NBT file loading to optimize load times, I found read_numeric() at the top of the profile by a large margin. Taking a closer look, this seems to be the culprit:

```python
from struct import Struct

def get_format(fmt, string):
    """Return a dictionary containing a format for each byte order."""
    return {"big": fmt(">" + string), "little": fmt("<" + string)}

BYTE = get_format(Struct, "b")
SHORT = get_format(Struct, "h")
...

def read_numeric(fmt, fileobj, byteorder="big"):
    """Read a numeric value from a file-like object."""
    try:
        fmt = fmt[byteorder]
        return fmt.unpack(fileobj.read(fmt.size))[0]
        ...
```

And it is used universally across all the tag classes, following a similar pattern:

```python
tag_id = read_numeric(BYTE, fileobj, byteorder)
length = read_numeric(INT, fileobj, byteorder)
tag = cls.get_tag(read_numeric(BYTE, fileobj, byteorder))
data = fileobj.read(read_numeric(INT, fileobj, byteorder) * item_type.itemsize)
...
```

The problem is that read_numeric creates a new Struct instance on every read, which is a very expensive operation. There should probably be a way to pre-build (or cache) those instances, so that read_numeric, get_format, or even the BYTE/INT/... constants contain or return the same Struct instances, while still keeping the ability to select byteorder on a per-call basis.

I can submit a PR to fix this, and I'm sure reading (and writing) times will vastly improve. I'll do it in a way that does not change the API of any of the tag classes (i.e., keeping the Compound.parse(cls, fileobj, byteorder="big") signature for write/parse on all tags), and possibly keep the read_numeric() signature too, so the Tag classes need no changes at all. Most likely, though, get_format() will change its signature and/or internal structure, and the underlying BYTE/INT/... constants will change their internal values, but I'll do my best to keep them as byteorder-agnostic constants.

Is such an improvement welcome?

MestreLion changed the title from “read_numeric() is way more inefficient than it could be” to “Improve read_numeric() to vastly increase parse() performance for all tags” on Oct 27, 2021
@MestreLion (Contributor, Author)

I just noticed that pre-made Struct instances are already stored in BYTE/INT/..., so read_numeric() is not creating new instances per call. Great!

But, still, are improvements to this crucial function welcome?

vberlier mentioned this issue on Nov 2, 2021
@vberlier (Owner) commented Nov 3, 2021

At runtime, read_numeric should only perform a dictionary lookup to grab the appropriate struct format, then read and unpack the data. It's in the hot path when parsing, so performance improvements would be very welcome, but I'm not sure there's any opportunity for easy wins here. Feel free to experiment with it if you have something in mind, though!
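To put that per-call lookup in perspective, a minimal standalone micro-benchmark might look like this (a sketch with hypothetical names, not nbtlib code):

```python
# Micro-benchmark sketch: per-call byteorder dict lookup vs. a pre-bound
# Struct. Standalone illustration; INT/DATA/with_lookup are hypothetical.
import timeit
from struct import Struct

INT = {"big": Struct(">i"), "little": Struct("<i")}
DATA = b"\x00\x00\x00\x2a"  # 42, big endian

def with_lookup(byteorder="big"):
    fmt = INT[byteorder]      # dict lookup on every call
    return fmt.unpack(DATA)[0]

BOUND = INT["big"]            # lookup resolved once, up front

def pre_bound():
    return BOUND.unpack(DATA)[0]

print("dict lookup:", timeit.timeit(with_lookup, number=1_000_000))
print("pre-bound:  ", timeit.timeit(pre_bound, number=1_000_000))
```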

@MestreLion (Contributor, Author)

That's why I created the benchmarks against other NBT implementations: there's no point doing experiments if I can't accurately measure the gains, and little point trying to improve what is already pretty damn good. My initial assumption that it was slow and could be "vastly improved" turned out to be wrong.

But still, one experiment I might try: set the endianness once per File/Root.parse() with an (attribute?) assignment, instead of doing a run-time dictionary lookup for every tag. When Compound.parse() calls read_numeric(BYTE, ...), that BYTE would no longer be a big/little dictionary but already one of those Structs; the job of fmt[endian] would have already been performed by File. read_numeric would take not a dict but a Struct (or whatever) of a given endianness, set prior to the call. And Compound, as now, would be completely unaware of all of this.

The point is that there is little value in allowing endianness to be set on a per-tag basis. The whole file is either little endian or big endian, so we can take advantage of that assumption.

Hmm, perhaps Compound would have to be a little aware after all, as it might have to use self.BYTE instead of a module-level BYTE. Hmm, a class-attribute lookup per call. Bad tradeoff?
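A minimal sketch of that idea, with hypothetical names standing in for nbtlib's actual classes:

```python
# Sketch of resolving endianness once per File.parse() (hypothetical names,
# not nbtlib's actual API): tags receive an already-bound Struct, removing
# the per-tag byteorder dict lookup from the hot path.
import io
from struct import Struct

BYTE = {"big": Struct(">b"), "little": Struct("<b")}
INT = {"big": Struct(">i"), "little": Struct("<i")}

def read_numeric(fmt, fileobj):
    # fmt is already a concrete Struct; no byteorder lookup per call
    return fmt.unpack(fileobj.read(fmt.size))[0]

class File:
    @classmethod
    def parse(cls, fileobj, byteorder="big"):
        # the fmt[byteorder] job is done once here, for the whole file
        byte_fmt = BYTE[byteorder]
        int_fmt = INT[byteorder]
        tag_id = read_numeric(byte_fmt, fileobj)
        length = read_numeric(int_fmt, fileobj)
        return tag_id, length

# Fake payload: one signed byte (tag id 10) followed by a big-endian int (5)
print(File.parse(io.BytesIO(b"\x0a\x00\x00\x00\x05")))  # (10, 5)
```

In this shape the tags just pass along whatever Struct they were handed, which would also sidestep the class-attribute lookup worry above.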

Benchmarks. We need benchmarks.

Or skip all of that and go Cython. Please!

@MestreLion (Contributor, Author)

An interesting optimization approach taken by Minecraft: it caches all 256 possible Byte values as pre-built instances.
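A rough Python rendition of that trick, assuming a hypothetical ByteTag class (not nbtlib's actual implementation):

```python
# Sketch of Minecraft's Byte cache in Python (hypothetical ByteTag class):
# all 256 signed-byte values are built once at import time and reused,
# instead of constructing a new instance per parsed byte.
class ByteTag(int):
    __slots__ = ()

_CACHE = [ByteTag(value) for value in range(-128, 128)]

def byte_tag(value):
    # shift by 128 so the signed range -128..127 maps onto indices 0..255
    return _CACHE[value + 128]

assert byte_tag(5) is byte_tag(5)  # same cached instance every time
```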
