Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Root creates an spurious Compound that is not in actual NBT data #145

Open
Tracked by #156
MestreLion opened this issue Oct 19, 2021 · 19 comments
Open
Tracked by #156

Root creates an spurious Compound that is not in actual NBT data #145

MestreLion opened this issue Oct 19, 2021 · 19 comments

Comments

@MestreLion
Copy link
Contributor

I've mentioned this a few years back, and now I'm familiar enough with NBT in binary form to raise this again, more confident that this is indeed a design bug in this awesome library.

Currently, when loading an NBT file like this (uncompressed for clarity):

00000000: 0a00 000b 0008 506f 7369 7469 6f6e 0000  ......Position..
00000010: 0002 ffff fff0 ffff ffef 0300 0b44 6174  .............Dat
00000020: 6156 6572 7369 6f6e 0000 0aaa 0900 0845  aVersion.......E
00000030: 6e74 6974 6965 730a 0000 0001 0900 064d  ntities........M
00000040: 6f74 696f 6e06 0000 0003 0000 0000 0000  otion...........
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0200 0648 6561 6c74 6800 0501 000c  .....Health.....
00000070: 496e 7675 6c6e 6572 6162 6c65 0002 0003  Invulnerable....
00000080: 4169 7201 2c01 0008 4f6e 4772 6f75 6e64  Air.,...OnGround
00000090: 0003 000e 506f 7274 616c 436f 6f6c 646f  ....PortalCooldo
000000a0: 776e 0000 0000 0900 0852 6f74 6174 696f  wn.......Rotatio
000000b0: 6e05 0000 0002 4191 f848 0000 0000 0500  n.....A..H......
000000c0: 0c46 616c 6c44 6973 7461 6e63 6500 0000  .FallDistance...
000000d0: 000a 0004 4974 656d 0800 0269 6400 0e6d  ....Item...id..m
000000e0: 696e 6563 7261 6674 3a73 616e 6401 0005  inecraft:sand...
000000f0: 436f 756e 7401 0009 0003 506f 7306 0000  Count.....Pos...
00000100: 0003 c06f e0cb b28e cfd4 404b e000 0000  ...o......@K....
00000110: 0000 c070 8aaa 4f56 97f6 0200 0b50 6963  ...p..OV.....Pic
00000120: 6b75 7044 656c 6179 0000 0200 0446 6972  kupDelay.....Fir
00000130: 6500 0008 0002 6964 000e 6d69 6e65 6372  e.....id..minecr
00000140: 6166 743a 6974 656d 0b00 0455 5549 4400  aft:item...UUID.
00000150: 0000 045c 04d8 3949 9646 96b4 0160 7235  ...\..9I.F...`r5
00000160: acc2 cf02 0003 4167 6513 e900 00         ......Age....

We have this result:

{
    "": {
        Position: [I; -16, -17], 
        DataVersion: 2730, 
        Entities: [
            {
                Motion: [0.0d, 0.0d, 0.0d], 
                Health: 5s, 
                Invulnerable: 0b, 
                Air: 300s, 
                OnGround: 0b, 
                PortalCooldown: 0, 
                Rotation: [18.246231079101562f, 0.0f], 
                FallDistance: 0.0f, 
                Item: {
                    id: "minecraft:sand", 
                    Count: 1b
                }, 
                Pos: [-255.02486541645942d, 55.75d, -264.6665795691073d], 
                PickupDelay: 0s, 
                Fire: 0s, 
                id: "minecraft:item", 
                UUID: [I; 1543821369, 1234585238, -1274978190, 900514511], 
                Age: 5097s
            }
        ]
    }
}

It looks like we have a Compound as root, and then another (unnamed) Compound inside it. But that's not true, the binary data clearly shows there is a single (unnamed) Compound in the beginning. So the actual parsing result should be:

{
    Position: [I; -16, -17], 
    DataVersion: 2730, 
    Entities: [
        {
            Motion: [0.0d, 0.0d, 0.0d], 
            Health: 5s, 
            ...
            UUID: [I; 1543821369, 1234585238, -1274978190, 900514511], 
            Age: 5097s
        }
    ]
}

Notice the root name is not represented in the case. And it does not matter, as a tag's name is parsed by (and belongs to) a tag's parent. A tag by itself has no idea about its own name (and tags in lists don't even have one).

Ok, the root tag of an NBT data does have a name, even if 99% (all?) of real world NBT have it empty. But having a name does not make it a Compound with a single child named as itself. This is wrong! If forces some weird syntax to acess the content:

tag = nbtlib.load("somefile.dat")
tag['']["DataVersion"]  # or tag.root["DataVersion"]

Instead of a much simpler (and correct) tag["DataVersion"]

If Root is a Compound, it should not require extra syntax to access its contents. No other tag requires so. If preserving the root name is important for saving/loading integrity, it should be stored elsewhere (Root.name perhaps?)

If displaying this name whenever printing the root tag is needed (why would it be?), then I suggest this format:

"rootname": {
    Position: [I; -16, -17], 
    DataVersion: 2730, 
    Entities: [
        {
            Motion: [0.0d, 0.0d, 0.0d], 
            Health: 5s, 
            ...
            UUID: [I; 1543821369, 1234585238, -1274978190, 900514511], 
            Age: 5097s
        }
    ]
}

This does not imply there are two nested compounds in the beginning. Much cleaner, easier to use, and correctly reflects the NBT data. You can even completely omit the name for empty names, and start right away with { (as my previous example)

This format could also be used to display names for non-compound root tags too. A case that Minecraft seems not to use, and nbtlib does not support, but given the NBT spec there is no technical limitation:

"Age": 5097s

Anyway, allowing root to be a non-compound is a challenge for another day. But for now, removing the fake extra Compound would be very very nice!!!

MestreLion added a commit to MestreLion/mcworldlib that referenced this issue Oct 19, 2021
@MestreLion
Copy link
Contributor Author

The fix itself is quite simple:

class Root(Compound):
    __slots__ = (
        'root_name',
    )

    # Optional, but safer than relying on .parse() to be always invoked
    def __init__(self, root_name: str = "", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.root_name: str = root_name

    # Delete all other properties and its setters
    # or leave a temporary "DeprecationWarning":
    @property
    def root(self):
        print("Root.root is deprecated, just access its contents directly") 
        return self

    @classmethod
    def parse(cls, buff, byteorder='big'):
        tag_id = read_numeric(BYTE, buff, byteorder)
        if not tag_id == cls.tag_id:
            # Possible issues with a non-Compound root:
            # - root_name might not be in __slots__(), and thus not assignable
            # - not returning a superclass might be surprising and have side-effects
            raise TypeError("Non-Compound root tags is not supported:"
                            f"{cls.get_tag(tag_id)}")
        name = read_string(buff, byteorder)
        self = super().parse(buff, byteorder)
        self.root_name = name
        return self

    def write(self, buff, byteorder='big'):
        write_numeric(BYTE, self.tag_id, buff, byteorder)
        write_string(getattr(self, 'root_name', "") or "", buff, byteorder)
        super().write(buff, byteorder)

The biggest challenging in implementing this API breakage considerations. But if needed I'd happily provide a PR

@vberlier
Copy link
Owner

The documentation, at least at the time, was a bit fuzzy. It said that nbt files are implicitly always compound tags. In python since we have dictionaries nbtlib doesn't see nbt compounds as a collection of named tags, where each tag has an intrisic name, but rather as a proper dictionary where tags don't have any intrisic name. This makes the interpretation of the root compound ambiguous. Since in this implementation tags don't have any intrisic names, we can make the assumption that the file is an implicit compound in the sense that we can assume it's prefixed with the TAG_Compound marker. This makes the root name a key of the implicit compound.

That's the reasoning that led to the way this is currently handled, but I know this just ends up adding unnecessary noise so I'd be happy to get rid of it. But yeah due to API breakage this would need to be published in a 2.0 release.

@MestreLion
Copy link
Contributor Author

While I'm still digesting your great points, and elaborating an extended answer, I've noticed you dropped Root class as a whole. Just to note this wasn't the proposed fix to the spurious Compound. File, as it is, will still create one.

And Root is still welcome as a class, because some tags, namely the chunks in an mca file, are not a File but still not a regular Compound.

@MestreLion
Copy link
Contributor Author

There are 2 important and distinct design choices involved here, both are open to interpretation by the "official" docs:

1 - Who "owns" a tag's name? Itself or its parent?
2 - Is the root tag of an NBT data always a Compound?

For 1, some libraries, such as the old pymclevel and its Amulet-NBT successor, adopted the "tags have their own names" direct interpretation. It makes the API really clunky and awkward to use, having to set tag.name and the infamous tag.value everywhere.

nbtlib OTOH, thankfully adopted another interpretation: like in Python dicts, a tag's name is defined in (and by) its parent. A tag itself is unaware of its own name. Tags in a list don't even have names. And that's true in the NBT format too. It was a wise design decision that I'm really glad you did, it makes working with nbtlib a real joy, with its seamless and fluid integration with Python's builtins

@MestreLion
Copy link
Contributor Author

MestreLion commented Oct 23, 2021

But a tag not owning its own name leaves nbtlib with a (small) problem: What about the root? In NBT, the root tag (whatever type it is), does have a name. Minecraft might always set it to an empty string, but the data is still there and it has to be parsed.

In nbtlib's data model, that name belongs to the parent. But the root tag has no parent, so what to do with the parsed name? I see 2 options:

  • Discard it. It's always blank anyway. Assume so when writing. It'll work, but it feels uncomfortable to discard data, right?
  • Keep it. Somewhere. Just for the sake of using it again on write(). But where? That was the idea behind the Root class: a wrapper on Compound that simply stores the parsed name. A plain, read/write self.root_name: str attribute.

In my library (and in the example above), I did the latter. Root exists solely for parsing the initial NBT bytes and storing the root name.

Can that name be saved at File instead? Sure. But that would "force" all root tags to be a file. That's fine for .dat files. But a chunk in an mca file is not a file, not it a sense you could Chunk.load(otherchunk.filename). Chunks should not have self.filename. So it makes sense to have an intermediate Root between Fileand Compound, if even just for the sake of allowing a Compound to store a root_name while not having a filename attribute or a .load() method

@MestreLion
Copy link
Contributor Author

So handling a root's name while keeping nbtlib's data model is easy, and can be addressed in many ways. Now for the hard part, the other major design decision:

  • Is the root tag always a Compound?

In practice, yes. And virtually all libraries assume so, in one way or the other. nbtlib assumed so when it created the "extra" Compound in File.parse(): by invoking super().parse(buff, ...) prior to parsing any bytes, it will create a Compound that thinks the root tag is its (only) child (whatever the type root actually is). On write(), that extra Compound will do what Compounds do: loop and write only their children, not themselves. And thus that (sole) child, the actual root tag, gets written, and everything works fine.

@MestreLion
Copy link
Contributor Author

So, NBT-wise, nbtlib is not assuming the root is a compound. If it were something else, it would be handled properly and losslessly. But to do so it wraps the real root in that extra Compound, exposing that awkward tag[''][...] (or tag.root[...]) to clients, something this issue is proposing to eliminate. We should be able to handle a root tag like any other tag, the return value of File.load(...) should be a tag we can directly say tag[...]. Yes, it would be nameless, like all tags are. If you want to retrieve the original name, for whatever reason, go elsewhere. tag.root_name for example. But the root's name should not be an item of the the tag, being root should not create that extra layer.

@MestreLion
Copy link
Contributor Author

MestreLion commented Oct 23, 2021

In order to avoid the extra layer and preserve the root name, we must parse it separately from other tags, as each tag's .parse() assume its own name (and tag type!) was already processed by its parent. So someone, File itself or someone else, must read the initial NBT bytes and:

  • Parse the root name, then save or discard it
  • Parse the root tag type, then either:
    • Impose it must be a Compound. That's the Amulet approach. And mine too, in an extent. But I'm not happy with it.
    • Discard it, and on write() assume it was a Compound. Ouch! But, if you did the above, that's no problem.
    • Let it be anything, without creating an extra layer. That would be amazing, but hard to do. Keep reading...
  • Invoke and return the type's .parse(). Seems easy, right?

Problem is... a tag itself not only does not know its own name, it also does not write its own tag type on .write(). So whatever tag was returned by File.load(), it is unable to write itself as a root tag! root.write() would be a broken NBT if root was an ordinary tag. Writing root's tag type is somebody else's job. Its parent's job. And for root, who is that?

We need a way to save a tag, any tag, as root. That is, to write its own type and its own name, unlike other tags. So a root, conceptually, is not an entity by itself, but rather a trait. If we allow non-compounds as root, then "rootness" should be inherited by any tag. Root(or File) in this case is not a Compound subclass, but rather a Mixin class.

@MestreLion
Copy link
Contributor Author

In this scenario, class File/Root inheriting from Compound becomes a problem. It means the root tag is always a Compound. And if we go that road, life is easy and removing the extra layer is simple:

  • File|Root.parse() reads the root tag type, aborts if not a Compound
  • Read and save the name.
  • write() the saved name and its own tag type
  • invoke super().parse(), which is actually Compound.parse(), which will loop and write its children

Life is then good and simple. But we're imposing a restriction on NBT. Spec implicitely says its fine, but technically the NBT data does allow more.

@MestreLion
Copy link
Contributor Author

We need a way to save a tag, any tag, as root. That is, to write its own type and its own name, unlike other tags. So a root, conceptually, is not an entity by itself, but rather a trait. If we allow non-compounds as root, then "rootness" should be inherited by any tag. Root(or File) in this case is not a Compound subclass, but rather a Mixin class.

Just had this crazy (or brilliant) idea: why not make Root behave like your List? I.e., indexing it would create a new class on-the-fly. Just like List[Int] means "A List of type (or "flavor") Int", so would Root[Int] would mean "A Root (tag) of type Int". And there you go: any tag can now be a root tag!

  • Root's metaclass, like List, would restrict the subclasses types to only concrete tags (i.e. no Base, Array, Numeric, or similar generic superclasses)
  • It could, like List, keep a dictionary of the already created subclasses, and reuse them. So all Root[Compound]s would be the same class.
  • Root, as a parent, would be responsible for:
    • Initial NBT parsing of tag type and name in .parse()
    • Create (or reuse) a Root[type] class and instantiate it. tag_type = X (or similar name) would be a class attribute
    • Store the root name in the a root_name instance attribute
    • On .write(), use root_name and tag_type to write the initial bytes, then delegate to super().write()
  • It would not have load() or save() or filename(). that's a task for File
  • File could inherit from Root (base, without any index/subtype)
  • Root, unlike List, would not allow direct instantiation without indexing a subclass
  • It could have an __init__(*args, root_name: str = "", **kwargs) that pass on all other arguments to super()

So, what do you think of this idea?

@MestreLion
Copy link
Contributor Author

I've just found this:

http://web.archive.org/web/20110723210920/http://www.minecraft.net/docs/NBT.txt

An NBT file consists of a single GZIPped Named Tag of type TAG_Compound.

Decoding example:
(Use http://www.minecraft.net/docs/test.nbt to test your implementation)
First we start by reading a Named Tag.
After unzipping the stream, the first byte is a 10. That means the tag is a TAG_Compound (as expected by the specification).

Which I guess settles the question. No need for Root to be a mixin (although I still believe it would be nice to), it can directly inherit Compound, and File can assume so when reading and writing.

That said, can I make 2 requests?

  • Can you resurrect the Root class? It's a very useful class to distinguish between some NBT data from an actual file.
  • Can you reopen this issue while we discuss this?

@vberlier vberlier reopened this Nov 2, 2021
@vberlier vberlier mentioned this issue Nov 2, 2021
9 tasks
@vberlier
Copy link
Owner

vberlier commented Nov 3, 2021

In my mind it didn't make much sense to keep Root around as you can just as well use File.parse() for extracting chunks in .mca files. I guess the awkward thing would be the useless save() and load() methods.

The idea of making a Root class that could be specialized for all tag types is really neat but it's probably over-engineered for something that will realistically only be used for compound tags. I have an idea that would probably be simpler to implement and just as flexible. I could make it so that some_tag.write(fileobj, root="hello") prefixes the actual value with the tag name. Or an additional pair of methods available for all tags like parse_root() and write_root() that would handle the root name and then delegate to the regular parse() and write() methods.

Of course the issue is that you could still call parse() or write() directly by mistake, so it's a weaker alternative to the Root class. In the end, I think the most straight-forward solution is probably the best here, I'll restore the Root class.

@MestreLion
Copy link
Contributor Author

MestreLion commented Nov 3, 2021

In my mind it didn't make much sense to keep Root around as you can just as well use File.parse() for extracting chunks in .mca files. I guess the awkward thing would be the useless save() and load() methods.

The useless (and potentially misleading) save and load in chunks are the exact reasons why I wanted a separate class for non-file root compounds.

The idea of making a Root class that could be specialized for all tag types is really neat but it's probably over-engineered for something that will realistically only be used for compound tags.

I agree. Specially now that we have "official" docs saying it's always Compounds. Not only realistically, but per the spec. But still... oh my... so tempting...

I have an idea that would probably be simpler to implement and just as flexible.

Sweet! I'm listing...

I could make it so that some_tag.write(fileobj, root="hello") prefixes the actual value with the tag name.

... tag name and tag type too, don't forget it. Hum, I like it. Gives me a shiver when I see a tag handles its own name (pymclevel-style of things which I mentioned in another issue), but this would be a honorable, worthwhile exception.

Or an additional pair of methods available for all tags like parse_root() and write_root() that would handle the root name and then delegate to the regular parse() and write() methods. [...]

Ewww, noooo, no, no. nbtlib is too elegant for this.

In the end, I think the most straight-forward solution is probably the best here, I'll restore the Root class.

Thanks!!!! :-D

(And if I fail to resist the temptation, I in the future extended Root with a meta like List does (I really like that pattern you did) to convert Root from a hardcoded Compound to a "dynamic mixin" class that could inherit from any class (and from itself, so Root[Compound] is stll a plain Root too))

But I will resist this temptation. There are none non-compound roots in the wild. It's in the damn spec. It's useless. I'm better than that.

@MestreLion
Copy link
Contributor Author

Full disclaimer here: in the meantime since I've opened this 12 days ago, I've played with some ideas about the chunk/region relationship that are completely unrelated to this File/Root discussion.

I'm trying to decouple Region from the NBT implementation, so Region acts more like a storage system delivering raw NBT data for someone else to instantiate as a Chunk/Compound, rather than the first class that Dimension holds, meaning API-wise Chunk would "talk" directly to World/Dimension: A Dimension would be a dictionary** of Chunks, not regions, using absolute (cx, cz) chunk coordinates, and the Anvil .mca files would a Chunk implementation detail.

*: well, not Chunks, but rather "[category][Chunk]", as of 1.17 we have region/, entities/ and poi/ subdirs.

In this new model, it does make sense for a Chunk to have a .save() and .load(), a specialized one that invokes Region's (cached) parsers for loading and saving.

This is all to say that, in the end, it is possible that my Chunks end up inheriting from File, so I won't use Root after all, lol.

But I still think they have a reason to exist, both conceptually and in my heart :)

@MestreLion
Copy link
Contributor Author

I've been silent as I've spent the last 10 weeks or so playing Minecraft instead of coding for it. Well, actually 10% playing and 90% creating mods and resourcepacks. Inevitably I stumbled on MCP and Minecraft's source code, and it gave many useful answers and insights:

  • Root/File is always a Compound. Strictly typed as such in arguments and return types of NbtIo.read/write() and *Compressed() counterparts, throwing IOException on read if not (tag instanceof CompoundTag)
  • Root/File tag is always unnamed . NbtIo.read() simply discards whatever name is in the file data with a data.readUTF() call without any assignments, and NbtIo.write() calls data.writeUTF("") as name.

@Netherwhal
Copy link

I am seeing this error:

TypeError: Non-Compound root tags is not supported: <class 'nbtlib.tag.End'>

on a couple of player.dat files that I know are not corrupt.

@MestreLion
Copy link
Contributor Author

@Netherwhal , care to post some example files? I'm assuming they're empty players, correct?

In any case, I believe this would be better handled as a separate issue

@Netherwhal
Copy link

@MestreLion / @vberlier - no those are proper nbt files that I can open without any issues with other tools.

I am happy to pay a commission for this to be fixed in this library.

@MestreLion
Copy link
Contributor Author

@Netherwhal , what version are you using? I've tested latest 2.0.4 from Pypi and it handles your test case just fine:

~/minecraft/test $ venv
Cache entry deserialization failed, entry ignored
Collecting pip
  Using cached https://files.pythonhosted.org/packages/08/e3/57d4c24a050aa0bcca46b2920bff40847db79535dc78141eb83581a52eb8/pip-23.1.2-py3-none-any.whl
Collecting setuptools
  Using cached https://files.pythonhosted.org/packages/c7/42/be1c7bbdd83e1bfb160c94b9cafd8e25efc7400346cf7ccdbdb452c467fa/setuptools-68.0.0-py3-none-any.whl
Collecting wheel
  Using cached https://files.pythonhosted.org/packages/61/86/cc8d1ff2ca31a312a25a708c891cf9facbad4eae493b3872638db6785eb5/wheel-0.40.0-py3-none-any.whl
Installing collected packages: pip, setuptools, wheel
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
  Found existing installation: setuptools 39.0.1
    Uninstalling setuptools-39.0.1:
      Successfully uninstalled setuptools-39.0.1
Successfully installed pip-23.1.2 setuptools-68.0.0 wheel-0.40.0

(venv) ~/minecraft/test $ pip install nbtlib
Collecting nbtlib
  Downloading nbtlib-2.0.4-py3-none-any.whl (28 kB)
Collecting numpy (from nbtlib)
  Using cached numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Installing collected packages: numpy, nbtlib
Successfully installed nbtlib-2.0.4 numpy-1.24.4

(venv) ~/minecraft/test $ python
Python 3.8.0 (default, Feb 28 2023, 16:22:29) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nbtlib
>>> data = nbtlib.load('../mcworldlib/data/6d26bff4.dat')
>>> type(data)
<class 'nbtlib.nbt.File'>
>>> len(data)
55

Maybe this was fixed already by eda4d32 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants