New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBNF Monocle Party #34

Closed
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
10 participants
@steakknife

steakknife commented Feb 24, 2013

Top hats included at no extra charge.

@postmodern

This comment has been minimized.

Contributor

postmodern commented Feb 24, 2013

👍 helps stop parser bugs.

@steakknife

This comment has been minimized.

steakknife commented Feb 24, 2013

I looked at the ISO scanf format before writing the EBNF... the spec can become bloated quickly.

Update: if anyone has a copy of the C99 spec, check section 7.19.6.1

@steakknife

This comment has been minimized.

steakknife commented Feb 24, 2013

Morr tests (and specificity) needed:

.0        # float
-.0       # float
0.        # float
-0.       # float
0.0       # float
-0.0      # float
-.0e0     # float
-0.0e1    # float
-0.0e01   # float
-infinity # float
INF       # float
nan       # float
-qNaN     # float
snan      # float
NAN       # float

-0        # integer
0         # integer
10000     # integer
-100000   # integer

-01       # invalid
-.e0      # invalid
-0.0e     # invalid
-.e.      # invalid
-0e.      # invalid
-e.       # invalid
-e        # invalid

Example design decision dilemmas: Is + allowable if going down the strtod() / scanf() route? 1.#INF formats? (rabbit hole-edge case-itis)

@aaronblohowiak

This comment has been minimized.

Contributor

aaronblohowiak commented Feb 24, 2013

This is the (yertl-formatted) one I came up with:

https://github.com/aaronblohowiak/toml/blob/master/toml/toml.yrt

I don't catch a few of the character escapes in the grammer, those will be considered errors by the parser (\r and \0, namely).

I believe that +, INF, nan and scientific notation are explicitly disallowed.

Also, your EBNF does not catch spaces between KEY and '=', if I am reading it correctly.

@aaronblohowiak

This comment has been minimized.

Contributor

aaronblohowiak commented Feb 24, 2013

Your EBNF also would not allow empty arrays, if i am reading it correctly.

@C0mkid

This comment has been minimized.

Contributor

C0mkid commented Feb 24, 2013

There shouldn't be empty arrays as per #30

Barry Allard added some commits Feb 24, 2013

Barry Allard
Explicit whitespace distinction
Whitespace is not included in productions because it makes a mess.
Barry Allard
Even more explicit
// is a comment.
@steakknife

This comment has been minimized.

steakknife commented Feb 24, 2013

@aaronblohowiak Look at the pseudo terminals named WHITESPACE and ARRAY_WHITESPACE.

As a general observation: a sensibly-robust parser will have a 'pedantic parse for explicit conformity' mode and an 'try hard to parse liberally and ignore all garbage' (default) mode. During scanner/parser design, T/NT transitions should make sensible choices.

Post Script: Let's try not introduce obscure parser/generators semantics no one's familiar with or too far removed from realizability... they tend to create unnecessary hurdles to understanding and complicate implementations respectively. I've heard Bison with Flex, JavaCC may be good places to start if using codegen'ed scanner tables LL(k) / LALR(n<5)

@aaronblohowiak

This comment has been minimized.

Contributor

aaronblohowiak commented Feb 24, 2013

@C0mkid #30 only says no explicit values, it does not say that empty arrays are disallowed. For instance a = [] is different (in some languages) from a = nil.

@aaronblohowiak

This comment has been minimized.

Contributor

aaronblohowiak commented Feb 24, 2013

@steakknife ah, was not familiar with the pseudo-terminals (my brain just skipped the comment-looking part)

I dont think anyone was introducing obscure parser/generator semantics.. I did not mean to suggest the file I linked to should be taken as authoritative, just as an alternative.

@C0mkid

This comment has been minimized.

Contributor

C0mkid commented Feb 24, 2013

@aaronblohowiak As this is a config language, I guess the parser can just make the assumption that the key not existing is the same as empty array/null/nil.

@aaronblohowiak

This comment has been minimized.

Contributor

aaronblohowiak commented Feb 24, 2013

@C0mkid then how would you express a = [[1,2,3], [1,4], [], [4,5,6]

@C0mkid

This comment has been minimized.

Contributor

C0mkid commented Feb 24, 2013

@aaronblohowiak ah, I was more thinking on the lines of top level stuff like a = [] which wouldn't make that much of a difference, but now I see what you mean.

@steakknife

This comment has been minimized.

steakknife commented Feb 24, 2013

@aaronblohowiak Ah no worries. It's good to have a second clean-room implementation as a CRC.

@steakknife

This comment has been minimized.

steakknife commented Feb 24, 2013

@C0mkid, @aaronblohowiak I'm thinking per no nulls that empty sets would be also disallowed to avoid edge-cases similar to null.... Might need @mojombo to weigh in.

README.md Outdated
KEY = [^\.]+
KEYGROUPNAME = KEY ( '.' KEY )*
STRING = '"' ([^\"\\]|'\\'[0tnr"\\])* '"

This comment has been minimized.

@mrflip

mrflip Feb 24, 2013

Is there a ' quote mark missing at end of the STRING line?

README.md Outdated
WHITESPACE = [\ \t]
ARRAY_WHITESPACE = [\ \t\r\n]
KEY = [^\.]+

This comment has been minimized.

@mrflip

mrflip Feb 24, 2013

Keys shouldn't allow [ or ] either, as #51 points out

This comment has been minimized.

@pygy

pygy Feb 24, 2013

Contributor

What about forbidding =as well?

This comment has been minimized.

@88Alex

88Alex Apr 13, 2013

Yes, = should really be disallowed, as it would cause nightmares for parsers.

@kelvinst

This comment has been minimized.

kelvinst commented Aug 23, 2013

This seems to enter on stagnancy, but anyway, I'll express my opinions.

When I read the TOML specification I take attention first to one thing that @mojombo says: "Because we need a decent human-readable format that maps to a hash and the YAML spec is like 80 pages long and gives me rage". So, if the intention of @mojombo is keep the specification the more simple to read that we could, maybe this PR breaks this intention a little.

I understand that this can be veeery helpfull to stop parser errors between the many implementations of TOML, but put it in the general specification may not be the best way.

I suggest to think about in put this in other file and link it in a section of the README file (maybe a "Development" section)

@parkr

This comment has been minimized.

parkr commented Aug 23, 2013

I agree that this suggestion breaks the simplicity and human-readability of both the spec and of TOML files. I'm 👎 on this PR.

@postmodern

This comment has been minimized.

Contributor

postmodern commented Aug 23, 2013

I understand that this can be veeery helpfull to stop parser errors between the many implementations of TOML, but put it in the general specification may not be the best way.

TOML was created in the wake of the YAML vulnerabilities. It would seem that TOML should also ensure that implementations are correct and secure. English text is a horrible medium to describe subtle details, which leads to developers coming up with their own interpretations which leads to bugs, incompatibilities and sometimes vulnerabilities (eg: BlueTooth). If the EBNF specification has gotten to bit, then this might indicate that TOML has gotten to complex.

We gain nothing by rejecting a formal specification.

@BurntSushi

This comment has been minimized.

Member

BurntSushi commented Aug 23, 2013

@postmodern Absolutely. The only debate here should whether the EBNF and the English version of the spec are in agreement.

@steakknife

This comment has been minimized.

steakknife commented Aug 23, 2013

@BurntSushi @postmodern Yup.

TL;DR: Formal and informal are useful for different audiences.

Formal spec is necessary for implementers and neck-beard users. Informal is great for new users and to make sure the formal is in keeping with reality; divergence of the two as mentioned is bad news. Put another way, it's another "model" language to describe the same behavior in a different way.

For the formal part, one alternative to EBNF is to have a reference implementation in something like clear, simplistic C that accounts for all the corner cases that have been raised and decided upon. For any questions, whatever the reference implementation does may be viewed as the SSOT (single source of truth). Either way, it will change over time.

English is the language of business because of ambiguity. An English spec is helpful for new users and those unfamiliar with more formal specifications since not everyone can think in formal methods.

As an example of another codec that is easy, fast to understand (not necessarily to implement or unambiguity):

  1. json website - English and semi-formal together
  2. jsmn parser in 250 lines that does most things right https://bitbucket.org/zserge/jsmn/src/09ad3d4559ea3e5b1b93cd90350e1909534635dd/jsmn.c?at=default

Postscript: If you were not Musk and wanted to design Hyperloop's control systems, here's some tips from NASA:

@kelvinst

This comment has been minimized.

kelvinst commented Aug 25, 2013

@postmodern, as I say, I agree that english is not the best languages for the develop the implementarions of TOML, but one of the intentions of @mojombo with TOML is a simple and direct specification as I commented before

The question is: this is the best place to the EBNF? I thought in another file to this more specific specification and a link to it in the README

This only, I'm pretty sure that this EBNF is very important and need to be accessible by this specification, but not sure about this is the best place.

English is the language of business because of ambiguity.


Reply to this email directly or view it on GitHub.

@postmodern

This comment has been minimized.

Contributor

postmodern commented Aug 26, 2013

@kelvinst RFCs usually put the EBNF in a separate section. Perhaps this would be a good time to break down the README into separate files?

@steakknife

This comment has been minimized.

steakknife commented Aug 26, 2013

ZOMGCOPTER @ bikeshedding. Organization is orthogonal to whether a formal specification is useful.

If this PR were useful, it would've been merged. Since it has not to this point, all debates as such are moot anyhow.

Time to kick people out by stopping the music and cutting off the booze. Handing 'em trash bags usually gets 'em moving out pretty fast. : )

@steakknife steakknife closed this Aug 26, 2013

@postmodern

This comment has been minimized.

Contributor

postmodern commented Aug 26, 2013

Time to kick people out by stopping the music and cutting off the booze. Handing 'em trash bags usually gets 'em moving out pretty fast. : )

@steakknife Why didn't you merge the EBNF specification as it is? Having a EBNF specification would legitimize TOML. Without a formal specification that can be verified, TOML is a toy format.

@BurntSushi

This comment has been minimized.

Member

BurntSushi commented Aug 26, 2013

@postmodern - @steakknife is the one who opened the pull request, so he can close it. I have no idea why he closed it. The reasons he gave were pretty mysterious, particularly given that @mojombo has been allowing TOML to simmer for several months on its own anyway.

I'd recommend that you re-open a pull request of your own :-)

@kelvinst

This comment has been minimized.

kelvinst commented Aug 28, 2013

Heelp! A man fainted on his keyboard!

@parkr

This comment has been minimized.

parkr commented Sep 1, 2013

@kelvinst 😆 👍

@steakknife

This comment has been minimized.

steakknife commented Sep 1, 2013

TL;DR - Please stop.

The updated spec was committed but  this pull was clearly never in an acceptable state and there was zero feedback to rectify it.   So yes, it's closed because I'm done wasting my time.






  Like the Olympics certifying the rules after it were over, feel free to stay and argue, but everyone else has already gone home.








Fork and pull if you want, I'm unwatching and ignoring this now.


Sent from Mailbox for iPhone

On Sun, Sep 1, 2013 at 10:36 AM, Parker Moore <notifications@github.com="mailto:notifications@github.com">> wrote:

@kelvinst 


Reply to this email directly or view it on GitHub.

@BurntSushi

This comment has been minimized.

Member

BurntSushi commented Sep 2, 2013

@steakknife Yes yes. You've made your lack of interest clear. No need to repeat yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment