Allow keys in key-value pairs to be paths #499

Closed
pradyunsg opened this issue Nov 23, 2017 · 86 comments

Comments

@pradyunsg
Member

The only remaining idea from #292 that has not been decided upon and does not have a dedicated issue.

I mean, I don't know how much I like it myself but, hey, this needs discussion so, here's a dedicated issue for it.

[document]
title = "Hello!"
meta.charset = "utf-8"
@pradyunsg pradyunsg mentioned this issue Nov 23, 2017
@lmna

lmna commented Nov 23, 2017

Compare (this is a slightly modified example from the spec):

[[catalogue."Cash & Carry".fruit]]
  name = "apple"

  [catalogue."Cash & Carry".fruit.physical]
    color = "red"
    shape = "round"

  [[catalogue."Cash & Carry".fruit.variety]]
    name = "red delicious"

  [[catalogue."Cash & Carry".fruit.variety]]
    name = "granny smith"

[[catalogue."Cash & Carry".fruit]]
  name = "banana"

  [[catalogue."Cash & Carry".fruit.variety]]
    name = "plantain"

versus

[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical.color = "red"
physical.shape = "round"
variety = [
    { name = "red delicious" },
    { name = "granny smith" },
]

[[catalogue."Cash & Carry".fruit]]
name = "banana"
variety = [
    { name = "plantain" },
]

The first version is harder to read because it is cluttered with the repeated (and absolutely meaningless) catalogue."Cash & Carry".fruit prefix.

I believe that the proposed feature gives a huge boost in readability for complex, deeply nested configurations.

@lmna

lmna commented Nov 23, 2017

The proposed feature enables an intuitive syntax for some simple cases of the array-of-tables issue, #309.

@pradyunsg
Member Author

Thanks for a nice example @lmna. Also for @dstufft's example from #413:

[a]
value = 1

[a.b]
value = 2

[a.c]
value = 3

[a.c.d]
value = 4

[a.e]
value = 5

It becomes:

[a]
value = 1
b.value = 2
c.value = 3
c.d.value = 4
e.value = 5

Much nicer! ^>^

@mojombo
Member

mojombo commented Nov 23, 2017

This could be a very nice and powerful addition to TOML. Let's go through a few ramifications to see if there are any traps.

This would allow any TOML document to be expressed without any bracket-style tables at all. The last example above could also be expressed as:

a.value = 1
a.b.value = 2
a.c.value = 3
a.c.d.value = 4
a.e.value = 5
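
A minimal sketch of that equivalence, written in Python purely for illustration: it expands already-split dotted key paths into nested tables. The function name `expand_dotted` and the list-of-pairs input are assumptions made for this example, and quoted keys that contain dots are not handled; this is not a TOML parser.

```python
def expand_dotted(pairs):
    """Build nested dicts from (dotted_key, value) pairs.

    Hypothetical helper: assumes bare keys only, so splitting on "." is safe.
    """
    root = {}
    for dotted_key, value in pairs:
        *parents, leaf = dotted_key.split(".")
        table = root
        for name in parents:
            # Intermediate tables are created implicitly on first use.
            table = table.setdefault(name, {})
        table[leaf] = value
    return root


doc = expand_dotted([
    ("a.value", 1),
    ("a.b.value", 2),
    ("a.c.value", 3),
    ("a.c.d.value", 4),
    ("a.e.value", 5),
])
```

Here `doc` holds the same nested structure that the bracket-style version with `[a]`, `[a.b]`, `[a.c]`, `[a.c.d]` and `[a.e]` describes.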

More realistically, you'd be repeating longer key names. Perhaps something like this is better to see what that would feel like in reality:

3dprinter.extruder1.material = "PLA"
3dprinter.extruder1.temp.max = 242
3dprinter.extruder1.temp.min = 238
3dprinter.extruder1.temp.unit = "F"
3dprinter.extruder1.color = "red"
3dprinter.extruder1.feed_rate = "23"

The repetition becomes annoying in this case and it would be natural to switch to bracket tables to reduce that repetition, so I don't think that's a hit against the proposal.

To remain consistent with tables, we would need tables expressed this way to adhere to the same non-re-opening restriction. Thus, the following would be invalid:

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3 # INVALID - reopens table [a.b]

That's easy enough to say and enforce, no different than tables already behave.

@lmna is absolutely right in that this proposal could be used to work around the confusing quirks of array table syntax and clean those up, which would be very nice because that is indeed TOML's least elegant bit. I'm guessing most situations could be represented cleanly with thoughtful use of "path keys" and inline tables. A big win for TOML.

I can't think of any big downsides. TOML remains unambiguous, as this is simply an alternate table syntax along with regular tables and inline tables. It's quite obvious what's going on and, since "." is already forbidden in keys, it would be backwards compatible with 0.4.0.

Perhaps one could argue that this addition would make TOML less minimal (OMG 3 ways to define tables!!!!), but it would help clean up some TOML docs that would otherwise be more verbose and less obvious, a tradeoff worth serious consideration.

Let me draw up a PR to see what this might look like in the spec/ABNF.

@pradyunsg
Member Author

I can't think of any big downsides.

+1

Let me draw up a PR to see what this might look like in the spec/ABNF.

Maybe #446 would come into play here?

a.key = 1
unrelated-table.key = 1
a.b.key = 1

If the above is invalid, which it is IMO, so should its table equivalent.

@pradyunsg
Member Author

Aside, https://github.com/pradyunsg/toml/tree/dotted-keys. :)

@mojombo
Member

mojombo commented Nov 24, 2017

@pradyunsg Ah, excellent, please submit as a PR, I didn't start on one yet.

@pradyunsg pradyunsg mentioned this issue Nov 25, 2017
@alexcrichton
Contributor

This is a pretty neat idea! It may be helpful to take a look at where existing projects may use this to see what the impact could be perhaps? I'm personally most familiar with Cargo, so I'll stick with that :)

The first thing that comes to mind for Cargo is the [dependencies] section:

[dependencies]
libc = "0.2"
serde = { git = "https://github.com/serde-rs/serde" }
my-crate = { path = "path/to/my-crate", version = "0.2" }

Today I (and I think a number of others) like how dependencies tend to be easily scannable top to bottom, one line each. With this extension I could imagine some people may switch idioms to maybe do something (pessimistically) like:

[dependencies]
libc = "0.2"
serde.git = "https://github.com/serde-rs/serde"
my-crate.path = "path/to/my-crate" 
my-crate.version = "0.2"

Readability-wise I think that unfortunately a conversion like this is a net loss (subjectively at least). Scanning the dependency list, it's not clear whether "serde.git" is the name of a dependency or not; you'd need prior knowledge to mentally strip away everything after the . to know that the dependency name is "serde". Similarly for "my-crate" I think (personally) it looks a little worse as it's now spread over two lines.

Now that of course doesn't mean we shouldn't accept a change like this! This sounds very similar to the old inline tables discussion where some things can definitely get worse, yet many patterns get much better. I remember that way-back-when we basically designed the features of Cargo.toml around the syntax and features of TOML itself, and I'd suspect that most consumers of TOML would do similarly. I think that means for Cargo we wouldn't show examples and otherwise wouldn't recommend syntax like this in the [dependencies] section, and that would probably do us fine!

Now one place where I think Cargo could benefit greatly is the [profile] section:

[profile.dev]
opt-level = 1

[profile.release]
debug = true
lto = true

That I think actually looks better as:

[profile]
dev.opt-level = 1
release.debug = true
release.lto = true

So I do think there's possible areas for us to use this in Cargo!

Overall I'm 👍 on this feature, it seems like a natural extension of the [a.b.c] syntax in table headers and then, like before, the onus is on authors to leverage and recommend TOML patterns for "looking nice", which doesn't mean aggressively using or not using this, just where appropriate!

@ahmedcharles

ahmedcharles commented Feb 2, 2018

I think the biggest downside here is specifying when the table closes for modification. This proposal doesn't seem to make that clear.

For example, if we assume this is valid:

[profile]
dev.opt-level = 1
release.debug = true
release.lto = true

Is this also valid:

[profile]
release.debug = true
dev.opt-level = 1
release.lto = true

If it's valid, then why have tables close at all and if it's invalid, then how do you explain that to users effectively? The 2 current ways of specifying tables force locality when defining tables and do so in an obvious way. Exchanging key/value pairs within a table section never changes the validity of a file. In order to keep that invariant and add this functionality, you have to give up the locality of table definitions.

Note, the 'pro' side examples above could be written as:

[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical = { color = "red", shape = "round" }
variety = [
    { name = "red delicious" },
    { name = "granny smith" },
]

[[catalogue."Cash & Carry".fruit]]
name = "banana"
variety = [
    { name = "plantain" },
]

[profile]
dev = { opt-level = 1 }
release = { debug = true, lto = true }

The current specification seems to allow for reasonable readability while avoiding confusion and avoiding the risk of adding a feature without implementation experience.

@StefanKarpinski

There are two ways I can see addressing your concerns, @ahmedcharles:

  1. The profile.release table is closed when profile is closed. The general rule would be that tables written with the dotted key syntax are closed when their enclosing table that is not written with dotted key syntax closes.

  2. Require that all dotted key entries with the same prefix appear together, so the second example where dev.opt-level appears between release.debug and release.lto would be illegal. Then the profile.release table would be closed after seeing the last release. entry in the profile section.

The latter approach doesn't violate the principle that sorting a table should not affect its meaning or validity, since sorting would keep dotted keys with the same prefix together. It would, however, mean that randomizing the order of key-value pairs could make the table illegal if it separates dotted keys with the same prefix. I'm not sure that's a problem though – I can see why being allowed to sort the keys is useful, but I have a hard time seeing why randomizing the keys would be useful.
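
The second option can be sketched as a small check, shown here in Python only as an illustration. The function name `prefixes_grouped` is hypothetical, the input is just the list of keys in one section, and only the first component of each key is compared; nested prefixes and quoted keys are ignored for brevity.

```python
def prefixes_grouped(keys):
    """Return True if keys sharing a first component appear contiguously.

    Hypothetical check for the "same prefix must appear together" rule;
    it looks only at the first dotted component of each bare key.
    """
    seen = set()
    previous = None
    for key in keys:
        prefix = key.split(".", 1)[0]
        if prefix != previous:
            if prefix in seen:
                return False  # this group was left and then re-entered
            seen.add(prefix)
            previous = prefix
    return True


ordered = ["dev.opt-level", "release.debug", "release.lto"]
interleaved = ["release.debug", "dev.opt-level", "release.lto"]

ordered_ok = prefixes_grouped(ordered)
interleaved_ok = prefixes_grouped(interleaved)
sorted_ok = prefixes_grouped(sorted(interleaved))
```

Under this rule the interleaved order is rejected, while sorting the same keys makes them acceptable again, because strings that share a leading prefix always end up next to each other in lexicographic order.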

@StefanKarpinski

StefanKarpinski commented Feb 2, 2018

Note, the 'pro' side examples above could be written as:

[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical = { color = "red", shape = "round" }
variety = [
    { name = "red delicious" },
    { name = "granny smith" },
]

I think it's key to note that this only looks reasonable because the keys and values in the physical table are quite short. If it was this instead, the inline table is less acceptable:

physical = { color = "redredredredredredredredredredredredredredredredredredredredredred", shape = "roundroundroundroundroundroundroundroundroundroundroundroundround" }

Of course, another solution would be to allow multiline inline tables, e.g.:

physical = {
    color = "redredredredredredredredredredredredredredredredredredredredredred",
    shape = "roundroundroundroundroundroundroundroundroundroundroundroundround"
}

I'm not sure if that's preferable to what's being proposed here, however. For example, it means that you can't scan through a section looking for ^\s*\w+\s*= and be sure that you're finding a key in that table since the shape = line for example looks like that but is actually an entry in a subtable. The physical.shape = syntax doesn't have that problem.
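
To make the scanning argument concrete, here is a small Python sketch that applies that regular expression line by line. The multiline inline table is the hypothetical syntax from the example above, not something the spec under discussion allows, and the helper name `scan` is made up for this illustration.

```python
import re

# The pattern from the comment above: a bare word followed by "=".
KEY_LINE = re.compile(r"^\s*\w+\s*=")

# Hypothetical multiline inline table syntax.
multiline_inline = '''\
name = "apple"
physical = {
    color = "red",
    shape = "round"
}
'''

# The same data written with dotted keys.
dotted = '''\
name = "apple"
physical.color = "red"
physical.shape = "round"
'''


def scan(text):
    """Return the lines the regex would report as keys of the section."""
    return [line.strip() for line in text.splitlines() if KEY_LINE.match(line)]


inline_hits = scan(multiline_inline)
dotted_hits = scan(dotted)
```

With the multiline form, the `color` and `shape` lines are reported as if they were keys of the enclosing section; with dotted keys, only `name` matches.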

@ahmedcharles

'Sorting' was the wrong word, I meant 'exchanging'. I think the property that keys can be shuffled within a section while retaining meaning is important, not because one wants to do that but because explaining the errors caused by not doing that no longer fits the definition of being simple. Saying that you can't duplicate section headers or key names is really simple by comparison.

Additionally, the motivation for restricting inline tables to a single line is explicitly because their intended use is for small, simple tables. Larger tables benefit less from inline syntax just as they would from the proposed path syntax. I.e. you don't want related values being dispersed throughout a file, instead, they should exist in relative proximity.

The current spec has two properties:

  1. Table keys/values (which aren't tables themselves) have good locality.
  2. The table reopening restriction is easy to explain, because it simply disallows duplicated sections and keys.

This proposal forces a choice between those two properties, because you can't keep both.

@pradyunsg
Member Author

@mojombo this should be reopened then. =)

@mojombo
Member

mojombo commented Feb 5, 2018

Dotted keys have been merged, but we should still clarify when tables close.

@mojombo mojombo reopened this Feb 5, 2018
@lmna

lmna commented Feb 7, 2018

you don't want related values being dispersed throughout a file, instead, they should exist in relative proximity

Yep. Related values are to be put in the same table. And any forms of "table reopening" should be forbidden.

explaining the errors caused by not doing that no longer fits the definition of being simple

Is this far from simple? - "Error. Attempt to reopen table [Foo.Bar] at line X. Table [Foo.Bar] was closed at line Y."

@ahmedcharles

Is this far from simple? - "Error. Attempt to reopen table [Foo.Bar] at line X. Table [Foo.Bar] was closed at line Y."

I suppose it depends on your definition of simple. Given what TOML strives to be, yes, this is far from simple, in my opinion.

@eksortso
Contributor

The notion of "closing" a table applies to non-table assignments. Assigning sub- or super-tables is offered more latitude when standard table definitions are used. After all, it is in this context that this rule applies: "As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it."

But what if the tables are defined with key-path notation? Or with inline notation, which raises similar questions? In other words, are these valid?

Key-path assignments and subtables

[profile]
dev.opt-level = 1
release.debug = true
release.lto = true

[profile.release.misc]  # Is this section valid?
alpha = "A"
beta = "B"

Inline tables and subtables

[profile]
dev.opt-level = 1
release = {debug = true, lto = true}

[profile.release.misc]  # Valid? Even though `profile.release` was defined inline?
alpha = "A"
beta = "B"

Inline tables and key-path assignments

[profile]
dev.opt-level = 1
release = {debug = true, lto = true}
release.misc.alpha = "A"  # Can we define `profile.release.misc` this way?
release.misc.beta = "B"   # Is this valid?

I think all three examples ought to be considered invalid. The first one visually breaks up the set of profile.release assignments. The others gunk up one-liner definitions, which should be kept short and succinct if used at all.

In order to keep things obvious and minimal, we may insist that the definition of subtables be restricted for these two types of table definitions. Namely:

  • No additional keys or subtables may be assigned to an inline table.
  • Standard table notation may not be used to define subtables of tables defined by key-path assignments.

These two proposed rules, along with the non-reopening restriction, ought to settle the issue of when tables are "closed," and can be extended to address table arrays.

@falcon71

falcon71 commented Feb 26, 2018

I find the concept of "closing" a table quite difficult to grasp.
With the dotted key syntax, there are now so many different ways to navigate through tables, it makes it difficult to figure out when you are allowed to append to a table and when not.

If you want a concept of "closing", then why is this allowed?:

[a.b]
c = "a.b.c"
[a]
d = "a.d"

I feel that the concept of "a value can only be assigned once" is much easier to understand and should be sufficient. For primitives it's simple and arrays can be appended to anytime. You should be able to add new keys to a table anytime as well, as long as a key has not been defined before.
The [a.b] and the [a] in the previous example can be interpreted as merely specifying a path, creating referenced tables implicitly if needed. Once the key a is a table, it can't be assigned another value. However, it can be referenced and expanded again.

Another point that is not clear to me is how arrays are currently supposed to be handled. The first part of the following example appears to be currently valid. When thinking in terms of paths, any key, including a dotted key, should reference the last element of an array. All versions below would be equivalent:

[[a.b]]
[a]
x0 = "a.x0"
[a.b.c]
d = "a.b[0].c.d"

[[a.b]]
[a]
x1 = "a.x1"
b.c.d = "a.b[1].c.d"

[[a.b]]
[a]
x2 = "a.x2"
[a.b]
c.d = "a.b[2].c.d"

[[a.b]]
[a]
x3 = "a.x3"
b = { c.d = "a.b[3].c.d" }

The only surprise is that the [[]] syntax always creates a new element in an array and does not merely specify a path.
The conclusion from thinking in paths is that the following should be valid as well:

[a]
b = "a.b"
[a.c] 
c = "a.c.c"
[a] #currently not possible
c.d = "a.c.d"
[] #currently definitely not possible
a.d = "a.d"

The "assign a value only once" rule is easy to understand, the paths work consistently in all cases and should be equally simple to implement in parsers.
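
A rough Python sketch of the "assign a value only once" rule, for illustration only. The helper name `assign` is hypothetical, table headers are assumed to be already folded into full key paths, and arrays are left out.

```python
def assign(root, dotted_key, value):
    """Assign a value under the 'a value can only be assigned once' rule.

    Hypothetical helper: tables may be extended at any time, but a key
    that already holds a value cannot be assigned again.
    """
    *parents, leaf = dotted_key.split(".")
    table = root
    for name in parents:
        child = table.setdefault(name, {})
        if not isinstance(child, dict):
            raise ValueError(f"{name!r} already holds a non-table value")
        table = child
    if leaf in table:
        raise ValueError(f"{dotted_key!r} was already assigned")
    table[leaf] = value


root = {}
assign(root, "a.b", "a.b")      # [a]   b = ...
assign(root, "a.c.c", "a.c.c")  # [a.c] c = ...
assign(root, "a.c.d", "a.c.d")  # [a]   c.d = ...  (extends a.c again)
assign(root, "a.d", "a.d")      # []    a.d = ...

try:
    assign(root, "a.b", "again")
    second_assignment_rejected = False
except ValueError:
    second_assignment_rejected = True
```

In this model there is no notion of a closed table at all: the only error is assigning to a key that already has a value.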

@eksortso
Contributor

@falcon71 Let me address questions that you had in your examples. A second comment post will follow.

You asked why this was allowed.

[a.b]
c = "a.b.c"
[a]
d = "a.d"

The rules for opening and closing tables are more flexible for table and table-array values. The spec says "As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it." That's why you can write a and a.b in either order. This is valid TOML because nothing has been assigned to a yet, except for the table value a.b.

It is ugly. It needs to be sorted for legibility's sake. But it's legal.

And I ought to put in a PR to re-write the rule in the spec, because "specific" isn't specific enough.

Table arrays are confusing enough as they are. Let me comment the code in your example, because something doesn't seem right about it. Not sure if you realize that each instance of [[a.b]] defines the next element of the table array.

[[a.b]]  # Defines table array `a.b`, opens its FIRST element,...
         # ...and leaves it empty?
[a]      # Opens the table `a`, which already holds the array `a.b`
x0 = "a.x0"    # (that's right)
[a.b.c]  # Opens a new table `c` in the first element of `a.b`.
d = "a.b[0].c.d"    # (that's right)

[[a.b]]  # Opens SECOND element of table array `a.b`,...
         # ...and leaves it empty?
[a]      # INVALID AT THIS POINT. `a` was already defined above.
         # Like I said, I'll address your central point in another post.
#...

Does this example clear up how the table array a.b works?

[a]      # There's only one table `a`.
x0 = "a.x0"
x1 = "a.x1"
x2 = "a.x2"
x3 = "a.x3"

[[a.b]]  # FIRST element of table array `a.b` (index 0, from your POV)
y0 = "a.b[0].y0"
[a.b.c]  # This is `c` in FIRST element. `a.b.c` is implicitly `a.b[0].c`.
d = "a.b[0].c.d"

[[a.b]]  # SECOND element (index 1)
y1 = "a.b[1].y1"
c.d = "a.b[1].c.d"    # We're already in `a.b[1]`.

[[a.b]]  # THIRD element (index 2)
y2 = "a.b[2].y2"
c.d = "a.b[2].c.d"

[[a.b]]  # FOURTH element (index 3)
y3 = "a.b[3].y3"
c.d = "a.b[3].c.d"
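
The way headers resolve through a table array can be modelled with a short Python sketch. This is only an illustration of the behaviour described above: the function name `open_table` is hypothetical, and no validation (duplicate headers, redefinitions) is performed.

```python
def open_table(root, path, array=False):
    """Resolve a header path and return the table it opens.

    Hypothetical model: [[x.y]] appends a new element, and any later path
    that passes through a table array refers to its most recent element.
    """
    *parents, last = path.split(".")
    table = root
    for name in parents:
        node = table.setdefault(name, {})
        table = node[-1] if isinstance(node, list) else node
    if array:
        element = {}
        table.setdefault(last, []).append(element)
        return element
    # Plain [x.y] header; validation is deliberately omitted in this sketch.
    return table.setdefault(last, {})


root = {}
a = open_table(root, "a")                        # [a]
a["x0"] = "a.x0"

first = open_table(root, "a.b", array=True)      # [[a.b]]
first["y0"] = "a.b[0].y0"
open_table(root, "a.b.c")["d"] = "a.b[0].c.d"    # [a.b.c]

second = open_table(root, "a.b", array=True)     # [[a.b]]
second["y1"] = "a.b[1].y1"
second.setdefault("c", {})["d"] = "a.b[1].c.d"   # c.d = ...
```

After these steps `root["a"]["b"]` is a list of two tables, each with its own `c` subtable.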

@eksortso
Contributor

@falcon71 As much as I can appreciate a general "assign a value only once" rule, I think that it would not work in TOML.

A human-readable configuration format does require some restrictions on how flexible it can be, in order to preserve readability. Key paths were introduced for that purpose. Using them improperly could lead to unreadable files, though.

I would prefer that all non-table basic-type assignments in a table be kept in the same place. Note that we have precedent for this. Say we configure a nested table x.a like this:

[x.a]
b = 1

[x.a]  # INVALID: The table `x.a` was already defined.
c = 2

We didn't re-assign anything to x.a, but that doesn't matter. The second [x.a] is considered a re-definition of x.a. This has the nice effect of keeping all non-table values in x.a defined in one place, the standard section [x.a]. And it places no limitations on any later-defined subtables, or on the supertable x.

I previously recommended that all inline table assignments be closed to both new basic values and subtables, to keep inline tables entirely self-contained. I stand by that recommendation. Key paths and standard subtable definitions should not touch inline tables.

@mojombo's past statement implies that a table whose basic values are assigned using key-path notation must necessarily have all such assignments grouped together, even if subtables and supertables are defined elsewhere.

But I also recommended that standard table notation should not be used to add subtables to tables defined by key-path assignments. The existing rules close off new basic value additions to key-path-defined tables once they are no longer being referenced, and my recommendation closes off new subtables in the same context.

For the sake of error reporting, all of this put together implies that each table in the configuration is defined in one continuous set of lines. An error message can thus state that "Line N invalid; table x.y.z was defined in lines A-Z." The user can take this hint and transfer line N's contents in between lines A and Z inclusive. For subtable restrictions, a similar message can be provided. Parsers would need to keep track of which lines defined which tables, but each table would always be a continuous range.
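
As a sketch of how such line tracking might look, here is a small Python example. It is an assumption-laden illustration, not a parser: the function name `check_contiguous` is hypothetical, the input is a pre-digested list of `(line_number, table)` pairs for one section, and `None` stands for a basic (non-dotted) assignment that ends the current key-path group.

```python
def check_contiguous(lines):
    """Report lines that touch a key-path table after its range has ended.

    Hypothetical helper. `lines` is a list of (line_number, table_name)
    pairs in document order; table_name is None for basic assignments.
    """
    ranges = {}      # table name -> [first_line, last_line]
    errors = []
    previous = None  # table of the group currently being extended
    for number, table in lines:
        if table is None:
            previous = None  # a basic assignment ends the current group
            continue
        if table in ranges and table != previous:
            first, last = ranges[table]
            errors.append(
                f"Line {number} invalid; table {table} "
                f"was defined in lines {first}-{last}."
            )
            continue
        if table in ranges:
            ranges[table][1] = number
        else:
            ranges[table] = [number, number]
        previous = table
    return errors


# A section [a] on line 1 containing:
#   a1 = ...   (line 2), b.c = ... (line 3), a2 = ... (line 4), b.z = ... (line 5)
errors = check_contiguous([(2, None), (3, "a.b"), (4, None), (5, "a.b")])
```

Because each table occupies one continuous range, a single pair of line numbers per table is enough to produce the message.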

@falcon71

Thank you for your answers.
Yes, you are right, my proposal focused on implementation simplicity without providing any value for human users apart from obfuscation.
Based on my understanding of your rules, would the following be valid TOML?

[a.b] #closes empty, opens a.b
c = "a.b.c"

[a] #closes a.b, opens a
#b.d = "a.b.d" #invalid, a.b is already closed
c = "a.c"
b.d.e = "a.b.d.e" #closes a, opens a.b.d
b.d.f = "a.b.d.f"
#d = "a.d" #invalid, a already closed
d.e  "a.d.e" #closes a.b.d, opens a.d

#[[a.d]] #invalid closes a.d, opens it again
d.f = { g = "a.d.f.g"} #a.d.f never opens, a.d still open
#d.f.h = "a.d.f.h" #invalid, a.d.f was never open
d.e = "a.d.e"

[[b.a]] #closes a.d, opens b.a[0]
a = "b.a[0].a"
[b] # closes b.a[0], opens b
#a.c = "b.a[0].c" #invalid, b.a[0] is closed
a.c.d = "b.a[0].c.d" #closes b, opens b.a[0].c

[[b.a]] #closes b.a[0].c, opens b.a[1]

[a.x] #closes b.a[1], opens a.x

@eksortso
Contributor

Let me start by noting that you could have more than one table open at a time. Two tables can be open at one time if you are using dotted keys. With inline tables, you may have several tables open, if only briefly.

What I have in mind is a hierarchy of the definition styles. Sections contain bare keys, quoted keys, and groups of dotted-key-defined tables. They all can contain inline subtables for values, which may also contain dotted keys in inline subtables.

More explicitly:

  • The BOF opens the root table. The EOF closes it after its subtables are closed.
  • The root section of the document precedes all others, and all basic assignments (i.e. bare-key and quoted-key assignments) within it are applied to the root table.
  • The section headers, denoted by [] and [[]], close off basic assignments to the root table.
  • Section headers also open a subtable to basic assignments, beginning with the header line and ending with either the next header line or the EOF. All assignments within the section are applied to the section's table. Section header tables are closed after key-path subtables (i.e. tables defined and populated by dotted keys) are closed.
  • Within sections, dotted-key assignments are grouped together by their key path. A group of key path assignments defines a key-path subtable that opens with the first assignment and closes on either the next key assignment that does not refer to the same key path, the next section header line, or the EOF. Note that basic assignments break up key path assignment groups.
    [a]             # This opens the table `a` inside the root.
    a1 = "a.a1"
    b.c = "a.b.c"   # This is the only assignment, basic or otherwise, to `a.b`.
    a2 = "a.a2"     # This is valid, and closes the table `a.b`.
    #b.z = "a.b.z"  # INVALID
  • When an inline table value is assigned, the inline table that is created is immediately opened, populated, and closed, all on the same line, independent of other open tables. So inline tables don't break up key-path assignment groups. @falcon71, your code contains a good example of that.

This is getting very elaborate. But I think it's been an enlightening process so far, and I hope you think so too.

## Here's your original code.
## My comments are double-hashed and refer to prior lines.

[a.b] #closes empty, opens a.b
    ## Yes. The root table can only accept subtables and subtable arrays from
    ## this point forward. The section table `a.b` is opened.
c = "a.b.c"

[a] #closes a.b, opens a
    ## Yes, exactly. Subtables of `a.b` may later be defined.
#b.d = "a.b.d" #invalid, a.b is already closed
    ## That's right.
c = "a.c"
b.d.e = "a.b.d.e" #closes a, opens a.b.d
    ## No; section `[a]` keeps table `a` open.
    ## But Yes; the dotted keys open `a.b.d` here.
b.d.f = "a.b.d.f"
#d = "a.d" #invalid, a already closed
    ## No; section `[a]` keeps the table `a` open.
    ## The missing key path would have closed `a.b.d`.
    ## But since this is commented out, let's move on.
d.e  "a.d.e" #closes a.b.d, opens a.d
    ## INVALID, because you forgot the "=" sign!

#[[a.d]] #invalid closes a.d, opens it again
d.f = { g = "a.d.f.g"} #a.d.f never opens, a.d still open
    ## The dotted keys close the table `a.b.d` and open `a.d` here.
    ## The inline table value opens and closes `a.d.f` on a single line.
    ## `a` is still open for basic assignments.
#d.f.h = "a.d.f.h" #invalid, a.d.f was never open
    ## Not exactly; the table `a.d.f` is already closed.
d.e = "a.d.e"

## We're at a new section header.
## Open dotted-key tables (`a.d`) are closed.
## The old section table (`a`) is closed. `a` may have subtables defined later.
[[b.a]] #closes a.d, opens b.a[0]
    ## The section does open `b.a[0]`. But `a.d` was already closed.
    ## (TOML doesn't guarantee 0-indexing, but I get what you mean.)
a = "b.a[0].a"
[b] # closes b.a[0], opens b
    ## It is very strange to open a table after opening the first element of an
    ## array of tables within it. But it's valid.
#a.c = "b.a[0].c" #invalid, b.a[0] is closed
    ## Yes.
a.c.d = "b.a[0].c.d" #closes b, opens b.a[0].c
    ## The table `b` isn't closed until the next section header.
    ## But the key-path table `b.a[0].c` is opened

[[b.a]] #closes b.a[0].c, opens b.a[1]
    ## The table `b.a[0].c` is closed first, then `b` is closed.
    ## But Yes, the table `b.a[1]` is opened.

[a.x] #closes b.a[1], opens a.x
    ## Yes, that's right.

## At EOF, `a.x` is closed, and the root table is closed.

@falcon71

Thank you for taking your time to annotate the example. I indeed find this very enlightening.

The root table can only be accessed between BOF and the first table or arraytable declaration, so I think it can be treated like a normal table declaration (think []).

You would allow this:

a = "a"
b.c = "b.c"
d = "d" #valid, root is still open
        #my interpretation of only allowing a single open table would have forbidden this

If I understand you correctly, you would keep track of three open tables:

  1. Root or table declaration
  2. Dotted keys
  3. Inline tables

This would lead to the following being invalid, which might seem confusing:

a.a = "a.a" #opens a, root still open
a.b.c = "a.b.c" #closes a, opens a.b
#a.c = "a.c" #invalid, a already closed

If this was to be allowed, then an arbitrary number of tables would need to be kept open for dotted keys and inline tables with dotted keys (I assume the rules would be exactly the same for inline tables. The order would matter as well).
In any case, while these rules might work, I find them quite far from being "obvious" like the previous rules before dotted keys were introduced. They could simply be remembered as "don't assign [table] twice". Now users will be busy rearranging keys until the parser accepts the file, because sometimes keys need to be grouped together, except for when they don't.

@eksortso
Contributor

eksortso commented Nov 9, 2018

Getting back to the central topic, would the following be legal under the strictest interpretation? I'm inclined to think it's not, but perhaps it actually is. In the latter case, the openness of subtables introduced by dotted key/value pairs is still in play. And in either case, we may need to add language to the spec addressing the ordering of dotted key assignments.

a.ok.a = "Hello"
a.DD = "DISTRACTION"
a.ok.z = "Goodbye"

# And btw, we do need to update TOML syntax highlighting, in jneen/rouge I think.

@StefanKarpinski If that's true, then the above is perfectly valid, since no inline table values are involved.

@StefanKarpinski

It seems fine to me since tables are being built up incrementally in any case. What is the purpose of a more strict interpretation? This is a real question. Is the purpose to allow an implementation to "close" a table earlier? Is closing a table early actually a significant benefit in any implementations?

@ChristianSi
Contributor

@eksortso:

@ChristianSi The strictest interpretation is very good. But to clarify, would the following still be legal? That is to say, can headers still be written subtable-first (ugly as that may be)? ...

[a.b.c]  # An empty table
[a.b]    # Its parent, with no key/value pairs (not counting a.b.c)

Sure, that remains legal. Order of table blocks (introduced by [...]) doesn't matter in TOML, except where arrays of tables (introduced by [[...]]) are concerned.

Getting back to the central topic, would the following be legal under the strictest interpretation? ...

a.ok.a = "Hello"
a.DD = "DISTRACTION"
a.ok.z = "Goodbye"

Sure, that remains legal. Order of key/value pairs within a table block doesn't matter in TOML v0.5. (Some months ago there was a discussion about prohibiting such an ordering in future versions of TOML, but that would clearly be an additional restriction which is not yet part of the spec. The strict interpretation, on the other hand, is only about making explicit what's already implicit in the TOML v0.5 spec, not about introducing new restrictions.)

@ChristianSi
Contributor

ChristianSi commented Nov 10, 2018

To help clarify things, here is an attempt to explain the strict interpretation in an unambiguous manner and with examples. If this interpretation is accepted as the correct one, a suitable rewrite of this attempt could be incorporated into a future version of the spec (v0.5.1 or so).

Ways of defining tables

TOML has two ways of defining tables: table blocks and inline tables. TOML forbids defining the same table twice, therefore you can use either of these for any table, but you cannot use both for the same table. Moreover, you are not allowed to define the same table in two different table blocks or in two inline table literals.

Table blocks start with a table header line: [table.name] for stand-alone tables, or [[table.name]] for members of a table array. They continue with a (possibly empty) list of key-value pairs and end right in front of the next table header line (or, if there is none, at the end of the document). A special case is the root table block: it contains any key-value pairs between the start of the document and the first table header line; these key-value pairs belong to the (unnamed) root table.

A table block defines not only its main table (whose name is given in its table header line; if there is none, it defines the unnamed root table), but also any nested tables mentioned in dotted keys listed within the table block. To give an example:

# in root table
vals.nums.one = 'One'
vals.nums.two = 'Two'
vals.bools.t = true
vals.bools.f = false

This fragment defines four tables: the root table ('') and the nested tables 'vals', 'vals.nums', 'vals.bools'. (No values are inserted into the 'vals' table directly, but it is nevertheless defined because it appears within a dotted key.)

Tables must not be defined twice, therefore the following table header lines are now ILLEGAL:

[vals]        # ILLEGAL, defined in root table!
[vals.nums]   # ditto
[vals.bools]  # ditto

But tables defined within table blocks are only assumed to be semi-complete: nested tables and table arrays may be defined in other table blocks (obviously, since all tables are direct or indirect children of the root table). So, to return to the above example, all other syntactically correct table header lines which haven't yet been used as keys remain allowed, including

[misc]                # another child of the root table
[vals.literals]       # a new, not yet defined child of 'vals'
[vals.nums.specials]  # a new, not yet defined child of 'vals.nums'
# ... and anything else you can think of, except stuff like
[vals.nums.one]       # ILLEGAL, since that's already a key

Alternatively you can define tables as inline table literals. You could rewrite the above example as:

# in root table
vals = { nums = { one = 'One', two = 'Two' }, bools = { t = true, f = false } }

Inline tables, however, are values, and like other values (anything that appears on the right side of an equals sign) they are supposed to be immutable and complete. If you define 'vals' as an inline table, you are therefore NOT allowed to define any nested tables outside the inline table literal (neither as table block nor as another inline table literal).

# still in root table
vals.literals = { ... }              # ILLEGAL since 'vals' is an immutable inline table
vals.nums.specials = { ... }         # ditto
[vals.literals]                      # ditto, the chosen syntax doesn't matter
[vals.nums.specials]                 # ditto
[vals.nums.something.deeply.nested]  # ditto

The principle is simple: Anything you want to go inside an inline table must be written into the table literal.

# This is allowed, but the line will probably get too long to be really readable.
vals = { nums = { one = 'One', two = 'Two', specials = { ...} }, bools = { t = true, f = false }, literals = { ... } }
# Consider switching to table block or dotted syntax instead!

Anything said here likewise applies to inline table arrays (including arrays of inline table arrays and so on) which work in exactly the same way as inline tables.

@eksortso
Contributor

We have a good example that would help to clarify the standard regarding dotted keys and when implicitly defined tables are introduced. It's important to resolve this, because between three different Python TOML parsers in PyPI, one of them (uiri/toml) raises an error, and two others (sdispater/tomlkit and alethiophile/qtoml) raise no errors and define both c and d in a.b.

The example comes from python-poetry/tomlkit#37. I'm hoping that I am interpreting this right.

a.b.c = 12

[a.b]
d = 34

My take is, this is invalid under TOML v0.5.0, because the table a.b is defined in two different locations: implicitly in the root block with the dotted-key definition, and explicitly in the [a.b] block. The key/value pairs do not conflict with each other, but to be valid, they must be declared in the same block.

I imagine that @ChristianSi would agree with this interpretation and would call for explicit language clearing up all confusion in a future TOML version (and also that the table a is defined in the root block); but that @bitwalker, and maybe @StefanKarpinski, would say that the TOML in the example is valid in v0.5.0, maybe with varying interpretations to allow for "scope merging." But I'm just speculating.

So to anyone interested, what is your take? Is this example valid TOML v0.5.0? What, if anything, belongs in the next version of TOML to clarify what we see happening here?

@ChristianSi
Contributor

ChristianSi commented Jan 15, 2019

@eksortso I believe that all arguments in favor of either interpretation have been exchanged, so now would be the time to Make A Decision. Sadly, since TOML's founder is an absentee owner 999 days out of 1000, such a decision is unlikely to be made. Unless somebody else with sufficient decision-making power jumps in – @pradyunsg maybe? – I fear this issue will remain unresolved, leaving the TOML world sadly fragmented 😢

@eksortso
Contributor

This is administrative stuff at heart, but it must be addressed. Differing implementations are not good.

Would it speed things up if a decision pending tag were slapped onto every issue where the only thing necessary going forward is for someone with the rubber stamps like @mojombo or, as was suggested, @pradyunsg, to read the ticket, consider the arguments, and make a binding decision?

@pradyunsg
Member Author

I've been swamped by a lot of things in the past bit of time. I'll try to catch up on this over the coming weekend.

@eksortso which issues specifically?

@eksortso
Contributor

@pradyunsg, I was speaking generally, thinking that having a dedicated tag on issues or PRs might speed up response times on critical issues. Specifically I'm referring to this issue, because we're seeing divergent interpretations in the parsers. Though it could be applied to others like #553 which have been talked through thoroughly but aren't as immediately critical to the standard.

The idea behind this is that our top decision makers could focus on decision pending issues and respond to them first. But depending on what the TOML standard's actual governance model is, such tagging would be redundant.

@pradyunsg
Member Author

My OSS time situation isn't good. (pip 19.0 rollout hasn't been "smooth") :/

If someone could summarize the possible positions the specification could take wrt restrictions, as discussed above, it would be greatly appreciated. :)

@bitwalker

bitwalker commented Feb 3, 2019

@pradyunsg I'll summarize my position at least, and let others cover theirs:

In essence, there is ambiguity in the spec regarding reopening/extending tables to define new keys, namely via dotted keys vs bracketed keys, with inline tables in the mix as well.

My argument is that if the core data model is a hash table, then any combination of table syntax should be permitted to define tables, or extend previous definitions of tables, as long as the restriction against redefining keys with non-table values is not violated. This keeps implementation straightforward and the rules simple for those writing TOML to remember. As I see it, any other option results in conflicting rules which are arbitrarily resolved, which does not seem to vibe with TOML's stated goal of minimalism.

In my view, the following is valid:

# produces { a = { b = 1, c = { d = 2 } } }
a = {}
a.c = {}
a.c.d = 2 # extends a.c

[a] # only opens the table, reopens if it exists
b = 1
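
One way to picture this "scope merging" view is as a recursive merge of nested hash tables, where tables may be reopened freely but a key holding a non-table value may not be redefined. The sketch below is a hypothetical model written for illustration, not the implementation of any TOML parser; `merge_tables` and the fragment list are assumptions of the example.

```python
from typing import Any

def merge_tables(target: dict[str, Any], extra: dict[str, Any], path: str = "") -> None:
    """Merge `extra` into `target` in place; tables merge, other values may not be redefined."""
    for key, value in extra.items():
        full = f"{path}.{key}" if path else key
        if key not in target:
            target[key] = value
        elif isinstance(target[key], dict) and isinstance(value, dict):
            merge_tables(target[key], value, full)  # reopen an existing table
        else:
            raise ValueError(f"key {full!r} is already defined")

root: dict[str, Any] = {}
# Each statement of the example, written as the nested table it describes.
for fragment in (
    {"a": {}},               # a = {}
    {"a": {"c": {}}},        # a.c = {}
    {"a": {"c": {"d": 2}}},  # a.c.d = 2
    {"a": {"b": 1}},         # [a] followed by b = 1
):
    merge_tables(root, fragment)

print(root)  # {'a': {'c': {'d': 2}, 'b': 1}}
```

In this model a later `a.b = 2` would still fail, because `b` already holds a non-table value.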

The discussion in this thread is long, but I think it is worth the read, because we identify all the issues and possible solutions in detail.

See my comment below for some additional thoughts.

@AndrewSav

@pradyunsg My point was that while many people in this thread feel that the following:

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3

should be invalid, the spec explicitly allows this by saying:

As long as a key hasn't been directly defined, you may still write to it and to names within it.

It needs to be clarified if that's not the case.

I would also like to echo @bitwalker by saying that this thread is definitely worth reading in its entirety.

@bitwalker

The example given by @AndrewSav reminded me of a point I would like to clarify. If that example or any of the others in this thread are actually supposed to be invalid, then it is not only important to clarify the specification, but clarify why it is invalid in the first place, beyond just "we choose to resolve conflicting rules in this specific way".

Cognitive load is just as important a metric as syntactic complexity in my opinion, and having a framework from which to reason about the rules reduces that load, as long as there is some unifying framework.

Put another way, if TOML maps unambiguously to an arbitrarily nested hash table, what do the rules described in the specification do to support that mapping or support the goal of minimalism? If any are contradictory, why? If we want to place restrictions on how the syntax allows you to describe a hash table, users and implementors alike expect those restrictions to come as a trade-off, for a benefit that is worth more than the loss of flexibility. That trade-off should be explained to help both users and implementors of TOML to properly reason about its use. If there is no trade-off, then such restrictions probably should be lifted, or at least reconsidered.

I'll stop posting now to avoid cluttering this thread further, but I feel like the above condenses my thoughts best.

@eksortso
Contributor

eksortso commented Feb 4, 2019

@bitwalker Your example isn't getting you what your comment says. The introduction of [a] means, by your own standard, the code produces the following:

a.b = 1
a.a.c.d = 2

Or, {a = {b=1, a={c={d=2}}}}.

@eksortso
Contributor

eksortso commented Feb 4, 2019

I thoroughly back the position laid out by @ChristianSi in his November 10, 2018 comment. I couldn't express it any more clearly. #499 (comment)

@ChristianSi
Contributor

ChristianSi commented Feb 4, 2019

@eksortso Thanks, I still stand by that position and propose to add something like the text in that comment ("Ways of defining tables") to the next revision of the TOML spec. If further clarification is needed: it's an attempt to explain how dotted keys and inline tables interact with TOML's rule "You cannot define any table more than once".

I believe that such a clarification would not introduce any new restrictions but merely make explicit what's already implicit in the TOML v0.5 spec, as explained in an earlier comment.

@pradyunsg
Member Author

Just noting that this is still on my radar -- I've just not been able to make time for this.

@pradyunsg
Member Author

pradyunsg commented Jun 11, 2019

I finally managed to come around to reading this and spend some time thinking about this.

Geez y'all. This is a wonderful and dense conversation! Thanks a ton for providing your inputs here everyone! It's much appreciated. :)


Putting down my thoughts in a follow up post.

@pradyunsg
Member Author

pradyunsg commented Jun 11, 2019

I was in the "strict" camp before it got a name. ;)


@ChristianSi's well-written "Ways of defining tables" semantics are exactly what I had in mind when writing up the specification for dotted keys.

To reiterate poorly, inline tables are immutable and tables directly defined by a dotted key cannot be "redefined" by using the [table] syntax or the inline table syntax.

i.e. The following examples are invalid:

foo.bar = {}
foo.bar.baz = "true"  # INVALID
foo.bar.spam = {}  # INVALID

vals.nums.one = 'One'
vals.nums.two = 'Two'
vals.nums = { three = 'Three' }  # INVALID

[vals.nums]  # INVALID
three = 'Three'

The following examples are valid:

vals.nums.one = 'One'
vals.nums.two = 'Two'

[vals.letters]
one = 'A'
two = 'B'

[profile]
release.debug = true

[profile.release.misc]
alpha = "A"

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3

I never intended that this last example be valid (and neither did @mojombo), but it is as per the language used. Now that I re-read the spec, it is clear to me that the intent to disallow this is not as obvious as I thought it was when I wrote this.

We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.


@bitwalker It would help to add a clarification in the inline tables section -- inline-tables are basically a fancier "Value" and all values are immutable.

While I do think having some reference/guidance on why certain choices were made is helpful, I don't think adding that would be critical-path for getting to 1.0.


Action items here would be, at least:

  • Clarify inline tables are immutable (and dotted keys can't "inject" into them)
    (at the end of the "Inline Tables" section)
  • Clarify that a table defined by a dotted key can not be overridden via a regular table but addition of new sub-tables is allowed.
    (after the example of super-tables in "Tables" section)
  • Add advice to not define dotted keys out-of-order.
    (after the example of ASCII-float keys in "Keys" section)

If anyone can think of additional things we should do here, please do holler! :)

@pradyunsg
Member Author

pradyunsg commented Jun 12, 2019

We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.

I'm on the fence on this TBH -- I don't want to break compatibility but I also really want to just straight up disallow this -- I don't see too many use cases where doing this out-of-order makes much sense anyway so maybe the breakage is fine?

I guess we should look into this in a follow up, better scoped, issue.

@ChristianSi
Contributor

@pradyunsg

I was in the "strict" camp before it got a name. ;)

Happy to hear it 👍

If I understand you correctly, you definitely want to prohibit key injection into inline tables in TOML 1.0 (yeah!) but are unsure about whether or not to prohibit out-of-order definition of dotted keys like this?

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3

While I don't have any strong feelings on the second issue (as opposed to the first one!), my viewpoint is that such out-of-order definitions, though bad style, are harmless and should not be prohibited in TOML 1.x. For one thing, they are clearly allowed in 0.5 and hence covered by our compatibility promise, and moreover, the rule that "order of keys within a single table block" (introduced by [...] or [[...]]) "doesn't matter" is pretty clear-cut and easy to remember.

@pradyunsg
Member Author

are unsure about whether or not to prohibit out-of-order definition of dotted keys like this?

Yep and yep.

though bad style, are harmless

Yea, this is basically where I'm split tbh. Allowing them in TOML 1.0 isn't a PITA but it is a quirk that I (really) don't want to have.


We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.

Let's just stick with this.

@pradyunsg
Member Author

pradyunsg commented Jun 16, 2019

If anyone can think of additional things we should do here, please do holler! :)

No one did.


Opened #630, #631 and #632 as follow-ups. Going to go ahead and close this. Thanks again for the discussion here everyone! :)
