Should .children be a dictionary instead of a tuple? #3

Closed
TomNicholas opened this issue Aug 18, 2021 · 2 comments

@TomNicholas
Collaborator

TomNicholas commented Aug 18, 2021

A tree-like structure must have nodes, each of which can contain multiple children, and those children have to be selectable via some kind of name. However, those names can either be keys to access the child objects, or inherent properties of the child objects.

If names are inherent properties of the children, we would have node.children = (child1, child2), where child1.name = 'steve', child2.name = 'mary' etc. If names are keys, we would have node.children = {'steve': child1, 'mary': child2}, where each child need not have a name. It's not clear to me which of these approaches is better in our case.

It's easy to ensure that all nodes have names (and if we make nodes inherit from Dataset they will inherit a name), but storing children in tuples leads to annoying code like child_we_want = next(c for c in node.children if c.name == name_we_want), instead of just child_we_want = node[name_we_want]. A DataTree is also quite intuitively represented by a nested dictionary where keys are parts of a path and values are either datasets or child nodes, and in that description we would not say that the name key is an inherent property of the value.
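For illustration, here is a minimal sketch of the two lookup styles. The `NamedNode` class and the `get_child_from_tuple` helper are hypothetical stand-ins, not part of anytree or DataTree.

```python
from dataclasses import dataclass


@dataclass
class NamedNode:
    """Hypothetical node whose name is an inherent attribute."""
    name: str
    children: tuple = ()


def get_child_from_tuple(node: NamedNode, name_we_want: str) -> NamedNode:
    # Linear scan: the name lives on each child, so every child may be inspected.
    # Raises StopIteration if no child matches.
    return next(c for c in node.children if c.name == name_we_want)


steve = NamedNode("steve")
mary = NamedNode("mary")
tuple_parent = NamedNode("root", children=(steve, mary))

# Keyed storage: the name is the dict key, so lookup is a single indexing step.
dict_children = {"steve": steve, "mary": mary}

print(get_child_from_tuple(tuple_parent, "mary") is mary)  # True
print(dict_children["mary"] is mary)                       # True
```

One side effect of the difference: a missing name raises `StopIteration` from the tuple scan but `KeyError` from the dict lookup, and the scan is O(n) in the number of children.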

Using a dictionary also means that the path to an object is distinct from the name of that object.

This also means that a node doesn't need a name at all, and becomes defined only in terms of its parent and children. In effect, the name of the node would be the key for which self.parent.children[key] returns self. Parentless nodes would be nameless.
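A minimal sketch of a node defined only by its parent and children, where the name is derived from the parent's dict. `KeyedNode` and `add_child` are hypothetical names used for illustration; this is not how anytree or DataTree is implemented.

```python
from __future__ import annotations

from typing import Optional


class KeyedNode:
    """Hypothetical node that stores no name of its own.

    Children live in a dict keyed by name; a node's name is whichever key
    its parent uses for it.
    """

    def __init__(self) -> None:
        self.parent: Optional[KeyedNode] = None
        self.children: dict[str, KeyedNode] = {}

    def add_child(self, key: str, child: KeyedNode) -> None:
        # Record the relationship in both directions (no detaching from an
        # old parent, for brevity).
        child.parent = self
        self.children[key] = child

    @property
    def name(self) -> Optional[str]:
        # Parentless (root) nodes are nameless.
        if self.parent is None:
            return None
        # The name is the key for which self.parent.children[key] is self.
        for key, child in self.parent.children.items():
            if child is self:
                return key
        raise RuntimeError("node is not among its parent's children")


root = KeyedNode()
steve = KeyedNode()
root.add_child("steve", steve)

print(steve.name)  # steve
print(root.name)   # None
```

Because the name is just the key, re-keying the child in its parent, e.g. `root.children["sue"] = root.children.pop("steve")`, renames it without touching the child object itself.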

A disadvantage of this is that a stored Dataset object has no idea who its parent is.

None of the tree implementations I've seen work like this, and it appears to deviate from the way that a "tree" is defined mathematically.

The anytree library uses named nodes and stores children in tuples, so to use dictionaries we would need to reimplement its NodeMixin class.

@TomNicholas
Collaborator Author

A related question is whether the set of child nodes is ordered. In the mathematical definition of a tree the children are unordered, but I'm not sure whether node order matters for certain file formats. By using a tuple or list to store children we are implicitly ordering the tree, compared to using a set (or a dict before Python 3.7, which is when insertion order became a language guarantee; CPython 3.6 already preserved it as an implementation detail).

Even if we stick with an ordered type for storing the children we still have to decide if our trees are ordered or not, because it matters when checking equivalence between trees.

It might make sense to just choose the more general option (i.e. ordered), and then have flags to treat trees as unordered when it matters.
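A minimal sketch of what such a flag could look like, assuming trees are modelled as nested dicts whose leaves are plain values with ordinary `==`. `trees_equal` and its `ordered` parameter are hypothetical names, not an existing API.

```python
from typing import Any, Dict

# Hypothetical representation: a tree is a nested dict whose values are either
# leaves (assumed to be plain values with ordinary ==) or further dicts of children.
Tree = Dict[str, Any]


def trees_equal(a: Tree, b: Tree, ordered: bool = True) -> bool:
    """Compare two nested-dict trees.

    ordered=True: children must appear in the same order at every level.
    ordered=False: only the name -> subtree mapping has to match.
    """
    if ordered and list(a) != list(b):
        return False
    # dict_keys views compare like sets, i.e. ignoring order.
    if a.keys() != b.keys():
        return False
    for key in a:
        va, vb = a[key], b[key]
        va_is_tree, vb_is_tree = isinstance(va, dict), isinstance(vb, dict)
        if va_is_tree != vb_is_tree:
            # One side is a subtree, the other a leaf.
            return False
        if va_is_tree:
            if not trees_equal(va, vb, ordered=ordered):
                return False
        elif va != vb:
            return False
    return True


t1 = {"steve": 1, "mary": {"sue": 2}}
t2 = {"mary": {"sue": 2}, "steve": 1}

print(trees_equal(t1, t2))                 # False: same children, different order
print(trees_equal(t1, t2, ordered=False))  # True
```

Python's built-in `t1 == t2` also returns True here, because dict equality ignores insertion order; an ordered comparison therefore has to check the key sequence explicitly, which is what the `ordered` flag does. Defaulting to `ordered=True` mirrors the idea of picking the more general option and relaxing it with a flag.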

@TomNicholas
Collaborator Author

Closed by #76
