-
Notifications
You must be signed in to change notification settings - Fork 856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow keys in key-value pairs to be paths #499
Comments
Compare (this is a slightly modified example from the spec):
versus
First version is harder to read because it is cluttered with repeating (and absolutely meaningless) I believe that proposed feature gives a huge boost in readability for complex, deeply-nested configurations. |
Proposed feature enables intuitive syntax for some simple cases of array-of-tables issue #309 |
This could be a very nice and powerful addition to TOML. Let's go through a few ramifications to see if there are any traps. This would allow any TOML document to be expressed without any bracket-style tables at all. The last example above could also be expressed as:
More realistically, you'd be repeating longer key names. Perhaps something like this is better to see what that would feel like in reality:
The repetition becomes annoying in this case and it would be natural to switch to bracket tables to reduce that repetition, so I don't think that's a hit against the proposal. To remain consistent with tables, we would need tables expressed this way to adhere to the same non-re-opening restriction. Thus, the following would be invalid:
That's easy enough to say and enforce, no different than tables already behave. @lmna is absolutely right in that this proposal could be used to work around the confusing quirks of array table syntax and clean those up, which would be very nice because that is indeed TOMLs least elegant bit. I'm guessing most situations could be represented cleanly with thoughtful use of "path keys" and inline tables. A big win for TOML. I can't think of any big downsides. TOML remains unambiguous, as this is simply an alternate table syntax along with regular tables and inline tables. It's quite obvious what's going on and since "." is already forbidden in keys, would be backwards compatible with 0.4.0. Perhaps one could argue that this addition would make TOML less minimal (OMG 3 ways to define tables!!!!), but it would help clean up some TOML docs that would otherwise be more verbose and less obvious, a tradeoff worth serious consideration. Let me draw up a PR to see what this might look like in the spec/ABNF. |
+1
Maybe #446 would come into play here?
If the above is invalid, which it is IMO, so should it's table equivalent. |
@pradyunsg Ah, excellent, please submit as a PR, I didn't start on one yet. |
This is a pretty neat idea! It may be helpful to take a look at where existing projects may use this to see what the impact could be perhaps? I'm personally most familiar with Cargo, so I'll stick with that :) The first thing that comes to mind for Cargo is the [dependencies]
libc = "0.2"
serde = { git = "https://github.com/serde-rs/serde" }
my-crate = { path = "path/to/my-crate", version = "0.2" } Today I (and I think a number of others) like how dependencies tend to be easily scannable top to bottom, one line each. With this extension I could imagine some people may switch idioms to maybe do something (pessimistically) like:
Readability-wise I think that unfortunately a conversion like this is a net-loss (subjectively at least). Scanning the dependency list it's not clear if "serde.git" is the name of a dependency or not, you'd have to have prior knowledge to mentally strip away after the Now that of course doesn't mean we shouldn't accept a change like this! This sounds very similar to the old inline tables discussion where some things can definitely get worse, yet many patterns get much better. I remember that way-back-when we basically designed the features of Now one place where I think Cargo could benefit greatly is the [profile.dev]
opt-level = 1
[profile.release]
debug = true
lto = true That I think actually looks better as:
So I do think there's possible areas for us to use this in Cargo! Overall I'm 👍 on this feature, it seems like a natural extension of the |
I think the biggest downside here is specifying when the table closes for modification. This proposal doesn't seem to make that clear. For example, if we assume this is valid:
Is this also valid:
If it's valid, then why have tables close at all and if it's invalid, then how do you explain that to users effectively? The 2 current ways of specifying tables force locality when defining tables and do so in an obvious way. Exchanging key/value pairs within a table section never changes the validity of a file. In order to keep that invariant and add this functionality, you have to give up the locality of table definitions. Note, the 'pro' side examples above could be written as: [[catalogue."Cash & Carry".fruit]]
name = "apple"
physical = { color = "red", shape = "round" }
variety = [
{ name = "red delicious" },
{ name = "granny smith" },
]
[[catalogue."Cash & Carry".fruit]]
name = "banana"
variety = [
{ name = "plantain" },
] [profile]
dev = { opt-level = 1 }
release = { debug = true, lto = true } The current specification seems to allow for reasonable readability while avoiding confusion and risking adding a feature without implementation experience. |
There are two ways I can see addressing your concerns, @ahmedcharles:
The latter approach doesn't violate the principle that sorting a table should not affect its meaning or validity since sorting would keep dotted keys with the same prefix together. It would, however, mean that randomizing the order of key-value pairs could cause it to become illegal if it separates dotted keys with the same prefix. I'm not sure that's a problem though – I can see why being allowed to sort the keys is useful, I have a hard time seeing why randomizing the keys would be useful. |
I think it's key to note that this only looks reasonable because the keys and values in the physical = { color = "redredredredredredredredredredredredredredredredredredredredredred", shape = "roundroundroundroundroundroundroundroundroundroundroundroundround" } Of course, another solution would be to allow multiline inline tables, e.g.: physical = {
color = "redredredredredredredredredredredredredredredredredredredredredred",
shape = "roundroundroundroundroundroundroundroundroundroundroundroundround"
} I'm not sure if that's preferable to what's being proposed here, however. For example, it means that you can't scan through a section looking for |
'Sorting' was the wrong word, I meant 'exchanging'. I think the property that keys can be shuffled within a section while retaining meaning is important, not because one wants to do that but because explaining the errors caused by not doing that no longer fits the definition of being simple. Saying that you can't duplicate section headers or key names is really simple by comparison. Additionally, the motivation for restricting inline tables to a single line is explicitly because their intended use is for small, simple tables. Larger tables benefit less from inline syntax just as they would from the proposed path syntax. I.e. you don't want related values being dispersed throughout a file, instead, they should exist in relative proximity. The current spec has two properties:
This proposal forces a choice between those two properties, because you can't keep both. |
@mojombo this should be reopened then. =) |
Dotted keys have been merged, but we should still clarify when tables close. |
Yep. Related values are to be put in the same table. And any forms of "table reopening" should be forbidden.
Is this far from simple? - "Error. Attempt to reopen table [Foo.Bar] at line X. Table [Foo.Bar] was closed at line Y." |
I suppose it depends on your definition of simple. Given what TOML strives to be, yes, this is far from simple, in my opinion. |
The notion of "closing" a table applies to non-table assignments. Assigning sub- or super-tables is offered more latitude when standard table definitions are used. After all, it was in this context that this rule applies: "As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it." But what if the tables are defined with key-path notation? Or with inline notation, which raises similar questions? In other words, are these valid? Key-path assignments and subtables
Inline tables and subtables
Inline tables and key-path assignments
I think all three examples ought to be considered invalid. The first one visually breaks up the set of In order to keep things obvious and minimal, we may insist that the definitions of subtables be restricted on these two types of table definitions. Mainly:
These two proposed rules, along with the non-reopening restriction, ought to settle the issue of when tables are "closed," and can be extended to address table arrays. |
I find the concept of "closing" a table quite difficult to grasp. If you want a concept of "closing", then why is this allowed?:
I feel that the concept of "a value can only be assigned once" is much easier to understand and should be sufficient. For primitives it's simple and arrays can be appended to anytime. You should be able to add new keys to a table anytime as well, as long a key has not been defined before. Another point that is not clear to me is how arrays are currently supposed to be handled. The the first part in the following example appears to be currently valid. When thinking in terms of paths, any key, included a dotted key, should reference the last element of an array. All versions below would be equivalent:
The only surprise is, that the [[]] syntax always creates a new element in an array and does not merely specify a path.
The "assign a value only once" rule is easy to understand, the paths work consistently in all cases and should be equally simple to implement in parsers. |
@falcon71 Let me address questions that you had in your examples. A second comment post will follow. You asked why this was allowed.
The rules for opening and closing tables are more flexible for table and table-array values. The spec says "As long as a super-table hasn't been directly defined and hasn't defined a specific† key, you may still write to it." That's why you can write It is ugly. It needs to be sorted for legibility's sake. But it's legal. †And I ought to put in a PR to re-write the rule in the spec, because "specific" isn't specific enough. Table arrays are confusing enough as they are. Let me comment the code in your example, because something doesn't seem right about it. Not sure if you realize that each instance of
Does this example clear up how the table array
|
@falcon71 As much as I can appreciate a general "assign a value only once" rule, I think that it would not work in TOML. A human-readable configuration format does require some restrictions on how flexible it can be, in order to preserve readability. Key paths were introduced for that purpose. Using them improperly could lead to unreadable files, though. I would prefer that all non-table basic-type assignments in a table be kept in the same place. Note that we have precedent for this. Say we configure a nested table
We didn't re-assign anything to I previously recommended that all inline table assignments be closed to both new basic values and subtables, to keep inline tables entirely self-contained. I stand by that recommendation. Key paths and standard subtable definitions should not touch inline tables. @mojombo's past statement implies that a table whose basic values are assigned using key-path notation must necessarily have all such assignments grouped together, even if subtables and supertables are defined elsewhere. But I also recommended that standard table notation should not be used to add subtables to tables defined by key-path assignments. The existing rules close off new basic value additions to key-path-defined tables once they are no longer being referenced, and my recommendation closes off new subtables in the same context. For the sake of error reporting, all of this put together implies that each table in the configuration is defined in one continuous set of lines. An error message can thus state that |
Thank you for your answers.
|
Let me start by noting that you could have more than one table open at a time. Two tables can be open at one time if you are using dotted keys. With inline tables, you may have several tables open, if only briefly. What I have in mind is a hierarchy of the definition styles. Sections contain bare keys, quoted keys, and groups of dotted-key-defined tables. They all can contain inline subtables for values, which may also contain dotted keys in inline subtables. More explicitly:
This is getting very elaborate. But I think it's been an enlightening process so far, and I hope you think so too.
|
Thank you for taking your time to annotate the example. I indeed find this very enlightening. The root table can only be accessed between BOF and the first table or arraytable declaration, so I think it can be treated like a normal table declaration (think []). You would allow this:
If I understand you correctly, you would keep track of three open tables:
This would lead to the following being invalid, which might seem confusing:
If this was to be allowed, then an arbitrary number of tables would need to be kept open for dotted keys and inline tables with dotted keys (I assume the rules would be exactly the same for inline tables. The order would matter as well). |
Getting back to the central topic, would the following be legal under the strictest interpretation? I'm inclined to think it's not, but perhaps it actually is. In the latter case, the openness of subtables introduced by dotted key/value pairs is still in play. And in either case, we may need to add language to the spec addressing the ordering of dotted key assignments. a.ok.a = "Hello"
a.DD = "DISTRACTION"
a.ok.z = "Goodbye"
# And btw, we do need to update TOML syntax highlighting, in jneen/rouge I think. @StefanKarpinski If that's true, then the above is perfectly valid, since no inline table values are involved. |
It seems fine to me since tables are being built up incrementally in any case. What is the purpose of a more strict interpretation? This is a real question. Is the purpose to allow an implementation to "close" a table earlier? Is closing a table early actually a significant benefit in any implementations? |
Sure, that remains legal. Order of table blocks (introduced by
Sure, that remains legal. Order of key/value pairs within a table block doesn't matter in TOML v0.5. (Some months ago there was a discussion about prohibiting such an ordering in future versions of TOML, but that would clearly be an additional restriction which is not yet part of the spec. The strict interpretation, on the other hand, is only about making explicit what's already implicit in the TOML v0.5 spec, not about introducing new restrictions.) |
To help clarifying things, here is an attempt to explain the strict interpretation in an unambiguous manner and with examples. If this interpretation is accepted as the correct one, a suitable rewrite of this attempt could be incorporated into a future version of the spec (v0.5.1 or so). Ways of defining tables TOML has two ways of defining tables: table blocks and inline tables. TOML forbids defining the same table twice, therefore you can use either of these for any table, but you cannot use both for the same table. Moreover, you are not allowed to define the same table in two different table blocks or in two inline table literals. Table blocks start with a table header line: A table block does not only define its main table (whose name is given in its table header line – if there is none, it defines the unnamed root table), but also any nested tables mentioned in dotted keys listed within the table block. To give an example:
This fragment defines four tables: the root table ('') and the nested tables 'vals', 'vals.nums', 'vals.bools'. (No values are inserted into the 'vals' table directly, but it is nevertheless defined because it appears within a dotted key.) Tables must not be defined twice, therefore the following table header lines are now ILLEGAL:
But tables defined within table blocks are only assumed to be semi-complete: nested tables and table arrays may be defined in other table blocks (obviously, since all tables are direct or indirect children of the root table). So, to return to the above example, all other syntactically correct table header lines which haven't yet been used as keys remain allowed, including
Alternatively you can define tables as inline table literals. You could rewrite the above example as:
Inline tables, however, are values, and like other values (anything that appears on the right side of an equals sign) they are supposed to be immutable and complete. If you define 'vals' as an inline table, you are therefore NOT allowed to define any nested tables outside the inline table literal (neither as table block nor as another inline table literal).
The principle is simple: Anything you want to go inside an inline table must be written into the table literal.
Anything said here likewise applies to inline table arrays (including arrays of inline table arrays and so on) which work in exactly the same way as inline tables. |
We have a good example that would help to clarify the standard regarding dotted keys and when implicitly defined tables are introduced. It's important to resolve this, because between three different Python TOML parsers in PyPI, one of them (uiri/toml) raises an error, and two others (sdispater/tomlkit and alethiophile/qtoml) raise no errors and define both The example comes from python-poetry/tomlkit#37. I'm hoping that I am interpreting this right. a.b.c = 12
[a.b]
d = 34 My take is, this is invalid under TOML v0.5.0, because the table I imagine that @ChristianSi would agree with this interpretation and would call for explicit language clearing up all confusion in a future TOML version (and also that the table So to anyone interested, what is your take? Is this example valid TOML v0.5.0? What, if anything, belongs in the next version of TOML to clarify what we see happening here? |
@eksortso I believe that all arguments in favor of either interpretation have been exchanged, so now would be the time to Make A Decision. Sadly, since TOML's founder is an absentee owner 999 days out of 1000, such a decision is unlikely to be made. Unless somebody else with sufficient decision-making power jumps in – @pradyunsg maybe? – I fear this issue will remain unresolved, leaving the TOML world sadly fragmented 😢 |
This is administrative stuff at heart, but it must be addressed. Differing implementations is not good. Would it speed things up if a decision pending tag were slapped onto every issue where the only thing necessary going forward is for someone with the rubber stamps like @mojombo or, as was suggested, @pradyunsg, to read the ticket, consider the arguments, and make a binding decision? |
I've been swamped by a lot of things in the past bit of time. I'll try to catch up on this over the coming weekend. @eksortso which issues specifically? |
@pradyunsg, I was speaking generally, thinking that having a dedicated tag on issues or PRs might speed up response times on critical issues. Specifically I'm referring to this issue, because we're seeing divergent interpretations in the parsers. Though it could be applied to others like #553 which have been talked through thoroughly but aren't as immediately critical to the standard. The idea behind this is that our top decision makers could focus on decision pending issues and respond to them first. But depending on what the TOML standard's actual governance model is, such tagging would be redundant. |
My OSS time situation isn't good. (pip 19.0 rollout hasn't been "smooth") :/ If someone could summarize the possible positions the specification could take wrt restrictions, as discussed above, it would be greatly appreciated. :) |
@pradyunsg I'll summarize my position at least, and let others cover theirs: In essence, there is ambiguity in the spec regarding reopening/extending tables to define new keys, namely via dotted keys vs bracketed keys, with inline tables in the mix as well. My argument is that if the core data model is a hash table, then any combination of table syntax should be permitted to define tables, or extend previous definitions of tables as long as the restriction that redefining keys with non-table values is not violated. This keeps implementation straightforward and the rules simple for those writing TOML to remember. As I see it, any other option results in conflicting rules which are arbitrarily resolved, which does not seem to vibe with TOMLs stated goal of minimalism. In my view, the following is valid:
The discussion in this thread is long, but I think is worth the read, because we identify all the issues and possible solutions in detail. See my comment below for some additional thoughts. |
@pradyunsg My point was that while many people in this thread feel that the following:
should be invalid; the spec explicitly allow this by saying:
It needs to be clarified if that's not the case. I would also like to echo @bitwalker by saying that this thread is definitely worth reading in its entirety. |
The example given by @AndrewSav reminded me of a point I would like to clarify. If that example or any of the others in this thread are actually supposed to be invalid, then it is not only important to clarify the specification, but clarify why it is invalid in the first place, beyond just "we choose to resolve conflicting rules in this specific way". Cognitive load is just as important a metric as syntactic complexity in my opinion, and having a framework from which to reason about the rules reduces that load, as long as there is some unifying framework. Put another way, if TOML maps unambiguously to an arbitrarily nested hash table, what do the rules described in the specification do to support that mapping or support the goal of minimalism. If any are contradictory, why? If we want to place restrictions on how the syntax allows you to describe a hash table, users and implementors alike expect those restrictions to come as a trade off, for a benefit that is worth more than the loss of flexibility. That trade off should be explained to help both users and implementors of TOML to properly reason about its use. If there is no trade off, then such restrictions probably should be lifted, or at least reconsidered. I'll stop posting now to avoid cluttering this thread further, but I feel like the above condenses my thoughts best. |
@bitwalker Your example isn't getting you what your comment says. The introduction of a.b = 1
a.a.c.d = 2 Or, |
I thoroughly back the position laid out by @ChristianSi in his November 10, 2018 comment. I couldn't express it any more clearly. #499 (comment) |
@eksortso Thanks, I still stand by that position and propose to add something like the text in that comment ("Ways of defining tables") to the next revision of the TOML spec. If further clarification is needed: it's an attempt to explain how dotted keys and inline tables interact with TOML's rule "You cannot define any table more than once". I believe that such a clarification would not introduce any new restrictions but merely make explicit what's already implicit in the TOML v0.5 spec, as explained in an earlier comment. |
Just noting that this is still on my radar -- I've just not been able to make time for this. |
I finally managed to come around to reading this and spend some time thinking about this. Geez y'all. This is a wonderful and dense conversation! Thanks a ton for providing your inputs here everyone! It's much appreciated. :) Putting down my thoughts in a follow up post. |
I was in the "strict" camp before it got a name. ;) @ChristianSi's well written "Ways of defining tables" semantics, are exactly as what I had in mind, when writing up the specification for dotted keys. To reiterate poorly, inline tables are immutable and tables directly defined by a dotted key can not be "redefined" by using the i.e. The following examples are invalid: foo.bar = {}
foo.bar.baz = "true" # INVALID
foo.bar.spam = {} # INVALID vals.nums.one = 'One'
vals.nums.two = 'Two'
vals.nums = { three = 'Three' } # INVALID
[vals.nums] # INVALID
three = 'Three' The following examples are valid: vals.nums.one = 'One'
vals.nums.two = 'Two'
[vals.letters]
one = 'A'
two = 'B' [profile]
release.debug = true
[profile.release.misc]
alpha = "A" a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3 I never intended that this last example be valid (and neither did @mojombo), but it is as per the language used. Now that I re-read the spec, it is clear to me that the intent to disallow this is not as obvious, as I thought it was when I wrote this.
@bitwalker It would help to add a clarification in the inline tables section -- inline-tables are basically a fancier "Value" and all values are immutable. While I do think having some reference/guidance on why certain choices were made is helpful, I don't think adding that would be critical-path for getting to 1.0. Action items here would be, at least:
If anyone can think of additional things we should do here, please do holler! :) |
I'm on the fence on this TBH -- I don't want to break compatibility but I also really want to just straight up disallow this -- I don't see too many usecases where doing this out-of-order makes much sense anyway so maybe the breakage is fine? I guess we should look into this in a follow up, better scoped, issue. |
Happy to hear it 👍 If I understand you correctly, you definitively want to prohibit key injection into inline tables in TOML 1.0 (yeah!) but are unsure about whether or not to prohibit out-of-order definition of dotted keys like this? a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3 While I don't have any strong feelings on the second issue (as opposed to the first one!), my viewpoint is that such out-of-order definitions, though bad style, are harmless and should not be prohibited in TOML 1.x. For one thing, they are clearly allowed in 0.5 and hence covered by our compatibility promise, and moreover, the rule that "order of keys within a single table block" (introduced by |
Yep and yep.
Yea, this is basically where I'm split tbh. Allowing them in TOML 1.0 isn't a PITA but it is a quirk that I (really) don't want to have.
Let's just stick with this. |
The only remaining idea from #292 that has not been decided upon and does not have a dedicated issue.
I mean, I don't know how much I like it myself but, hey, this needs discussion so, here's a dedicated issue for it.
The text was updated successfully, but these errors were encountered: