Ecmarkup shorthand required #1

Closed
bterlson opened this issue Apr 12, 2014 · 23 comments

Comments

@bterlson
Member

Terseness is an important goal of this proposal. Custom elements help a lot by removing a bunch of boilerplate HTML, but don't go far enough. Markdown, on the other hand, is very terse and readable but doesn't provide any semantic meaning (and may be too complex for our purposes anyway).

After a discussion on IRC with @domenic and @jorendorff, we think we need a shorthand for commonly used ECMAScript entities (see the readme for a list). At the very least, the following:

  • Non-terminals. Any non-terminal can be cross-referenced automatically with its definition.
  • Cross references
  • Code blocks

Additionally, es-algorithm elements need the following shorthands:

  • Local variables
  • Algorithm steps

Algorithms also use bold typeface to denote values like this, true, and false but I think this can be the same shorthand as used for code blocks.

es-algorithm will also want to auto-xref references to internal algorithms using a heuristic of some sort (@jorendorff can comment on this).

Proposal So Far

| Entity | Short-hand |
| --- | --- |
| Non-terminal | ?? |
| Cross-reference | Markdown link syntax |
| Code blocks / literals | Markdown code syntax (back-ticks) |
| Local variables | Markdown italic syntax (variable) |
| Algorithm steps | Markdown bulleted list syntax |

Algorithm steps could use markdown numbering syntax but putting numbers into the source text means adding/removing steps hoses up the diff so perhaps simply using bulleted list syntax would be good.

Open to additional suggestions!

@domenic
Member

domenic commented Apr 13, 2014

In domenic/promises-unwrapping I ended up using three things within the algorithm steps:

  • _variables_ (italic)
  • **undefined**/**true**/**false**/**this**/**TypeError** etc. (bold)
  • "strings" (monospace)

@domenic
Member

domenic commented Apr 13, 2014

In the actual ES spec, completions use a sans-serif font: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-thrower-functions

@bterlson
Member Author

Your proposal in #3 is reasonable.

Is it a goal to have spec text render "appropriately" if processed by a standard markdown processor? My feeling is that we shouldn't think in terms of formatting, ie. * doesn't mean bold, it means some particular ES construct. This suggests that two distinct constructs should not both use * even if both are currently displayed bold. With that in mind...

Since we only have one kind of list, I'd prefer to use *. My OCD dislikes seeing 1. repeated. Also using 1. will tempt people to number the list in plaintext, which will be bad practice due to its impact on diffs.

undefined and "string" are both values, and I would prefer to use backtick for these (code blocks/literals in the table above). I like to think of this as saying "the value you'd get if you eval'd this code in a fully conformant ECMAScript implementation".

I say we jettison the MD equivalence of * and _ and use _ for vars as you propose, and * for non-terminals. Thus, to complete my proposal:

| Entity | Element | Short-hand |
| --- | --- | --- |
| Non-terminal | es-nt | `*FunctionDeclaration*` |
| Cross-reference | es-xref / a | Markdown link syntax |
| Code blocks / literals | code | Markdown code syntax (back-ticks) |
| Local variables | var | Markdown italic syntax (`_variable_`) |
| Algorithm steps | li | Markdown bulleted list syntax |

Plus some yet-to-be-precisely-defined auto-linking semantics.
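
A minimal sketch of how the inline rows of the table above could be desugared, assuming back-ticks map to `code`, `*…*` to `es-nt`, and `_…_` to `var`. The function name `desugar` and the exact regular expressions are illustrative assumptions, not anything agreed in this thread; links, list items, and auto-linking are left out.

```python
import html
import re

# Hypothetical token grammar for the inline shorthand in the table above.
# Code spans are listed first so that `*` or `_` inside back-ticks is never
# treated as markup.
_TOKEN = re.compile(
    r"`(?P<code>[^`]+)`"                          # `literal`      -> <code>
    r"|\*(?P<nt>[A-Za-z]\w*)\*"                   # *NonTerminal*  -> <es-nt>
    r"|(?<!\w)_(?P<var>[A-Za-z][A-Za-z0-9]*)_(?!\w)"  # _variable_ -> <var>
)


def desugar(text: str) -> str:
    """Rewrite inline shorthand into elements, escaping everything else."""
    out = []
    pos = 0
    for m in _TOKEN.finditer(text):
        # Plain text between tokens is HTML-escaped as-is.
        out.append(html.escape(text[pos:m.start()], quote=False))
        if m.group("code") is not None:
            inner = html.escape(m.group("code"), quote=False)
            out.append(f"<code>{inner}</code>")
        elif m.group("nt") is not None:
            out.append(f"<es-nt>{m.group('nt')}</es-nt>")
        else:
            out.append(f"<var>{m.group('var')}</var>")
        pos = m.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


print(desugar("Let _x_ be the result of evaluating *Expression*; return `undefined`."))
```

Because each construct has its own delimiter, no contextual reasoning is needed: one left-to-right pass over the text is enough.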

Two concerns. First, this proposal doesn't have the syntactic distinction between undefined/true/false/etc. and string literals that exists in #3. @domenic do you think this is an important aspect?

Second, MD link syntax may just be untenable, period. I'll look into this more...

Thoughts? Not too different from #3, though I'm willing to defer to experience with actually writing these specs if anyone has concerns :)

@domenic
Member

domenic commented Apr 14, 2014

Is the idea to auto-detect a fixed set of keywords, and translate them into <code class="value"> which gets styled as bold, instead of <code class="code"> which gets styled as monospace?

I can't really find good names for the distinction... +0, -0, true, false, TypeError on one side; "string" and Promise.prototype.done on the other side.

@bterlson
Member Author

My goal is to arrive at the minimal set of entities and associated MD-like shorthand syntax. Don't want to auto detect unless the distinction is important. I don't see the difference between false and "string" so I can't make that call!

@domenic
Member

domenic commented Apr 14, 2014

I guess the question is: is our goal to be able to faithfully reproduce the existing spec, or are we looking to simplify it at the same time? It sounds like you are looking to simplify, whereas I was assuming we were planning to keep the same typography without simplification.

@bterlson
Member Author

Hmm, good question. I should get better at stating my assumptions up front!

I was assuming that it would be better to focus on the markup that makes most sense and then worry about styling later, assuming we could reproduce something like the current spec using CSS or something. I want to see how good we can make the plaintext format, so I aim to be as terse and simple as possible while still retaining all of the important semantic meaning.

So, in that light:

  • The readme is a minimal list of things I've found that have important semantic meaning in the ECMAScript spec.
  • The table above proposes a short-hand for the most commonly used of these in order to make the format more readable and hand-writable while preserving clear semantic intent.

You raise a good point though that it's possible that this proposal is not able to faithfully reproduce the formatting of our current document. If this would be a blocking adoption issue I would be fine going with your proposal which does a better job of sticking to the spec formatting. Although if the distinction between true (bold) and "string" (bold monospace) is not particularly useful I'd prefer to drop it just to make things easier to parse/author/etc.

@jorendorff

Simplifying: If something is syntactically obvious anyway, we do not need special ecmarkup to point it out; that would be redundant. The distinction between boolean and string values seems like that kind of case. Let's write **true** and **"string"**, which is semantically reasonable, and have the ecmarkup-to-html script render the strings in monospace as a presentation thing.
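
A rough sketch of that presentation rule, assuming the renderer sees `**…**` spans and decides on monospace purely from whether the content is a double-quoted string. The function name `render_values` and the choice of `<b>` plus `<code>` as output are assumptions for illustration only.

```python
import html
import re

# Non-greedy so that two bold values on one line are matched separately.
_VALUE = re.compile(r"\*\*(.+?)\*\*")


def render_values(step: str) -> str:
    """Render **value** spans; quoted strings additionally get monospace."""
    def repl(m: re.Match) -> str:
        value = m.group(1)
        escaped = html.escape(value, quote=False)
        is_string = len(value) >= 2 and value[0] == '"' and value[-1] == '"'
        if is_string:
            # Strings are syntactically obvious, so the author writes nothing
            # extra; the script adds monospace as a presentation detail.
            return f"<b><code>{escaped}</code></b>"
        return f"<b>{escaped}</b>"

    return _VALUE.sub(repl, step)


print(render_values('If the result is **true**, return **"string"**.'))
```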

More broadly I think we should aim to reproduce the current document faithfully. The goal here is to switch from Word to a text format. To keep focused on that, I propose taking the most boring possible stance on everything else.

Whitespace: People will have opinions about where and how to indent, where to put blank lines, whether to wrap at 80 columns, 100 columns, never wrap, etc. Let's nail it down right away. (I vote we wrap at 80, no blank lines between steps.)

Algorithm lists: Let's use Markdown numbered lists and insist that every step be numbered 0.
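
To illustrate why constant `0.` markers are harmless, here is a small sketch that parses such steps and assigns the visible numbers only at render time. It assumes nesting is expressed by leading spaces; `parse_steps` and `render` are hypothetical names, and the real spec's a./i. sub-step labels are not reproduced.

```python
import re

_STEP = re.compile(r"^(?P<indent> *)0\. (?P<text>.*)$")


def parse_steps(source: str):
    """Return nested [text, children] pairs for steps all numbered `0.`."""
    root = []
    stack = [(-1, root)]  # (indent width, list that receives steps)
    for line in source.splitlines():
        if not line.strip():
            continue
        m = _STEP.match(line)
        if m is None:
            raise ValueError(f"not an algorithm step: {line!r}")
        indent = len(m.group("indent"))
        # Pop back to the nearest enclosing step with a smaller indent.
        while indent <= stack[-1][0]:
            stack.pop()
        node = [m.group("text"), []]
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root


def render(steps, depth=0):
    """Number the steps on output, so the source never stores numbers."""
    lines = []
    for number, (text, children) in enumerate(steps, start=1):
        lines.append(f"{'  ' * depth}{number}. {text}")
        lines.extend(render(children, depth + 1))
    return lines


SOURCE = """\
0. Let x be 1.
0. If x is 1, then
  0. Return true.
0. Return false.
"""
print("\n".join(render(parse_steps(SOURCE))))
```

Inserting or deleting a step touches exactly one source line, which keeps diffs small.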

Grammar: The Word document has rather rich formatting for grammatical productions. Using real markup for all that will render the grammar unreadable. The es-spec-html script already strips down Word to a plain-text grammar format, then converts the plain text to pretty HTML; I think we should formally describe the plain-text format and adopt it. Examples in a minute.

@jorendorff

Examples of the plain-text grammar format I referred to:

WhileStatement :
    while ( Expression ) Statement

IterationStatement :
    for ( LexicalDeclaration ; Expression_opt ; Expression_opt ) Statement

Here's what it might look like in full markup:

*WhileStatement* **:**  
`while` `(` *Expression* `)` *Statement*

*IterationStatement* **:**  
`for` `(` *LexicalDeclaration* `;` *Expression*<sub>opt</sub> `;` *Expression*<sub>opt</sub> `)` *Statement*

and there would have to be some way to recover the indentation.

Other random things that have to be dealt with in grammar include: but not, one of, [lookahead ∉], [?Yield], [Lexical goal InputElementRegExp]. It'll be fairly hard to get human authors to get these right all the time; plain text is easier.
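
As a sketch of how little inference the plain-text form needs for the two examples above, the following parser classifies tokens with one heuristic: a token starting with an uppercase letter is a non-terminal, a trailing `_opt` marks it optional, and anything else is a terminal. That heuristic, the names `parse_grammar` and `classify`, and the tuple layout are assumptions; `but not`, `one of`, lookahead restrictions, and parameters are not handled.

```python
import re

_HEADER = re.compile(r"^(?P<name>[A-Za-z]\w*)\s*(?P<sep>:+)\s*$")
_NONTERMINAL = re.compile(r"^(?P<name>[A-Z][A-Za-z0-9]*)(?P<opt>_opt)?$")


def classify(token: str):
    """Return (kind, text, optional) for one right-hand-side token."""
    m = _NONTERMINAL.match(token)
    if m:
        return ("nt", m.group("name"), m.group("opt") is not None)
    return ("t", token, False)


def parse_grammar(source: str):
    """Parse 'Name :' headers followed by indented alternatives."""
    productions = []
    for line in source.splitlines():
        if not line.strip():
            continue
        header = _HEADER.match(line)
        if header:
            productions.append({"name": header.group("name"), "alternatives": []})
            continue
        if not productions or not line[0].isspace():
            raise ValueError(f"expected an indented alternative: {line!r}")
        tokens = [classify(tok) for tok in line.split()]
        productions[-1]["alternatives"].append(tokens)
    return productions


GRAMMAR = """\
WhileStatement :
    while ( Expression ) Statement

IterationStatement :
    for ( LexicalDeclaration ; Expression_opt ; Expression_opt ) Statement
"""
for production in parse_grammar(GRAMMAR):
    print(production["name"], production["alternatives"])
```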

@bterlson
Member Author

Is it really important to leverage existing markdown formatting semantics? I'd prefer to focus on picking syntax that is easy to write and has clear semantic intent. This can be rendered to HTML trivially using a custom renderer in something like Marked.

@jorendorff, two things about your proposal concern me. First, I think it's impossible to represent the grammar in plaintext due to all the possibilities (both now and in the future). Second, I think machine readability is important for clarity and tooling purposes. This means it needs to be completely unambiguous whether something is a non-terminal, terminal, annotation, prose, etc. I think the current format in bterlson/ecmascript is best here, as it supports every grammar convention in ECMAScript... for your consideration:

<es-production name="WhileStatement">
    <es-rhs>while ( <es-nt>Expression</es-nt> ) <es-nt>Statement</es-nt></es-rhs>
</es-production>

If we adopt my proposal above to use * as a non-terminal shorthand:

<es-production name="WhileStatement">
    <es-rhs>while ( *Expression* ) *Statement*</es-rhs>
</es-production>

See readme.md for a complete listing of these. Also here is the current IterationStatement grammar (note that if this were hand-authored I'd probably make different stylistic choices :)).

@bterlson
Member Author

Re: whitespace, is it still true that we can't depend on text editors to wrap for us?

Re: algorithm steps, if you guys are set on preserving MD format semantics and do not want algorithm steps to be displayed as bulleted lists when rendered by standard MD I will give up my fight to just use * for them.

@domenic
Member

domenic commented Apr 14, 2014

I would strongly prefer to have no wrapping. It works out well for all my specs and I go back and look at old Markdown readmes I wrote with wrapping and am sad. It's a pain to maintain.

I don't particularly care between *, -, +, ., 0., or 1. for numbering. I kind of like 0..

@bterlson
Member Author

Ok, so resolved 0. for algorithm steps!

I'll vote no wrapping unless @jorendorff feels strongly. I don't have a strong position here other than I hate it when I have to break an html tag in half to wrap sanely :-P

I think we have agreement on using custom elements, and since I think we need a full set of custom elements to define the specification anyway (the readme contains the full set of elements I think are necessary), it makes sense to define a mapping from MD syntax to these custom elements. Does that make sense to everyone else? If so we can start discussing my proposal above more concretely by proposing deltas to elements required, element names, and shorthand forms.

@domenic
Member

domenic commented Apr 15, 2014

> it makes sense to define a mapping from MD syntax to these custom elements. Does that make sense to everyone else?

No, this does not necessarily make sense to me. It needs to be spelled out further. For example, if we decide nonterminals are denoted with *nonterminal*, why do we ever need to consider writing <es-nt>nonterminal</es-nt>? Concrete examples of when you would do that, preferably drawn from the current spec, would be helpful.

@bterlson
Member Author

The source format we check in would only contain <es-nt> when you need an attribute. The only such attribute today for es-nt is optional, so you would type <es-nt optional>Identifier</es-nt> instead of *Identifier*. If optional needed a short-hand we could say **Identifier** desugars to <es-nt optional>Identifier</es-nt>.
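
A sketch of the hypothetical `**Identifier**` rule from the paragraph above, assuming it were adopted. The only subtlety is that the two-star form has to be tried before the one-star form; the function name `desugar_nonterminals` is made up for this example, and the rule would collide with any other use of `**…**` for bold values.

```python
import re

# `\*\*` is tried before `\*`, and the back-reference forces the closing
# delimiter to match the opening one.
_NT = re.compile(r"(?P<stars>\*\*|\*)(?P<name>[A-Za-z]\w*)(?P=stars)")


def desugar_nonterminals(text: str) -> str:
    """*X* -> <es-nt>X</es-nt>; **X** -> <es-nt optional>X</es-nt>."""
    def repl(m: re.Match) -> str:
        attr = " optional" if m.group("stars") == "**" else ""
        return f"<es-nt{attr}>{m.group('name')}</es-nt>"

    return _NT.sub(repl, text)


print(desugar_nonterminals("*BindingIdentifier* **Initializer**"))
```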

We would be defining two formats, effectively. A normalized form which is pure HTML/custom elements, and a short-hand form which desugars to it. Defining a mapping from MD to Custom Elements means it's easy to reason about the structure of the desugared document. The desugared document is what tools will consume (both in browser, and likely on top of a library we provide for command-line tools).

Where possible I think it's better to wrap spec text in semantically meaningful elements (ie. <es-t> rather than <b>) and my proposal is an easy way to do it with a short-hand. @jorendorff is correct that the rules for inferring meaning from formatted spans already exist and work, but it's complexity I'd avoid if we can. With a mapping, our parse code is a simple custom formatter for a markdown parser with no contextual reasoning required.

I'll work on an example this afternoon PST(:trollface:), but feel free to stop me before I get to that if this seems like the wrong path. I could be convinced to use MD off the shelf if you two (@jorendorff / @domenic) would prefer to do so.

@domenic
Member

domenic commented Apr 15, 2014

> The desugared document is what tools will consume (both in browser, and likely on top of a library we provide for command-line tools).

I don't understand this. If I write <es-algorithm> 0. blah blah *Identifier* blah blah</es-algorithm> in my .html file, this is what the browser will see. The browser will never see <es-nt>.

> I could be convinced to use MD off the shelf

This isn't really my preference, so no worries there.

> @jorendorff is correct that the rules for inferring meaning from formatted spans already exist and work, but it's complexity I'd avoid if we can.

This is more along the lines of my preference. It seems to be a better authoring experience, with no downsides in terms of the final output's machine-readability, which IMO is a net win.

@bterlson
Member Author

> I don't understand this. If I write <es-algorithm> 0. blah blah *Identifier* blah blah</es-algorithm> in my .html file, this is what the browser will see. The browser will never see <es-nt>.

The custom elements will replace * with something so it renders appropriately. My feeling is that replacing it with <es-nt> is better than <i> even just for styling reasons, but I also think we need to consider the desugared document as an actual thing and specify it accordingly so static build tools can produce such a document. Likewise, if someone wanted to write a custom visualization of the spec where non-terminals were treated specially they could do so without having to change how the markdown is processed.

> It seems to be a better authoring experience

There are many grammar conventions, and I found that promoting some of them to tags or attributes in html made things easier (though somewhat more verbose, granted). That said if we think we can get away with using MD completely for the grammar, let's do it. I'd still like a desugaring to es-* tags if possible (should be possible?) for above reasons. I think a strong example will help all of us come to the same page on this one, so I'll work on that unless someone beats me to it!

@domenic
Member

domenic commented Apr 15, 2014

> My feeling is that replacing it with <es-nt> is better than <i> even just for styling reasons,

OK, this makes sense.

> I also think we need to consider the desugared document as an actual thing and specify it accordingly so static build tools can produce such a document.

I am not sure about this but willing to wait and see.

> That said if we think we can get away with using MD completely for the grammar, let's do it.

You keep mischaracterizing this position as one involving MD, whereas it's actually one involving no markup or markdown at all, and simply inference of the type @jorendorff already does on a plaintext format. In other words, it's the first example in his earlier post, not the second one.

> I think a strong example will help all of us come to the same page on this one, so I'll work on that unless someone beats me to it!

Definitely agreed!!

@bterlson
Member Author

> You keep mischaracterizing this position as one involving MD, whereas it's actually one involving no markup or markdown at all, and simply inference of the type @jorendorff already does on a plaintext format. In other words, it's the first example in his earlier post, not the second one.

Sorry for being so obtuse! I read earlier posts so many times and still missed this key point. I was thinking the plaintext grammar format was mutually exclusive with an MD-like shorthand. I now see how @jorendorff's proposal is better than HTML so I'm on board (though I would like to see what the various conventions like params and such look like).

To understand completely, the rules could be something like:

  • Inside es-production, parse plaintext grammar --> ecmarkup html
  • Inside es-* (other than es-production), parse MD-alike --> ecmarkup html
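
The two rules above amount to a dispatch on the enclosing element name. A minimal sketch, with the two parsers passed in as parameters because neither is defined in this thread; `make_dispatcher`, `parse_grammar`, and `parse_shorthand` are hypothetical names.

```python
from typing import Callable

Parser = Callable[[str], str]


def make_dispatcher(parse_grammar: Parser, parse_shorthand: Parser) -> Callable[[str, str], str]:
    """Build a function that picks a parser from the enclosing tag name."""
    def desugar(tag: str, text: str) -> str:
        if not tag.startswith("es-"):
            # Ordinary HTML elements are passed through unchanged.
            return text
        if tag == "es-production":
            # Rule 1: plain-text grammar inside es-production.
            return parse_grammar(text)
        # Rule 2: MD-alike shorthand inside every other es-* element.
        return parse_shorthand(text)

    return desugar


# Stand-in parsers that only label their input, to show which rule fired.
desugar = make_dispatcher(lambda s: "grammar:" + s, lambda s: "shorthand:" + s)
print(desugar("es-production", "WhileStatement :"))
print(desugar("es-algorithm", "0. Return *Expression*."))
```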

Is this close to what your preference is?

@domenic
Member

domenic commented Apr 15, 2014

> Is this close to what your preference is?

Yeah, that sounds right, I think! My motivation being that grammars are different enough beasts from the rest of prose, that making ecmarkdown work for that use case seems to make it a much larger language. (And of course, using ecmarkup tags is too verbose to expect people to write or maintain intelligibly.)

I am still not sure exactly how this "parse --> ecmarkup html" step is going to work, but I think examples and implementations will show the way. I guess you use Mutation Observers to re-render whenever the contents of the element change, including on first parse?

@bterlson
Member Author

You could use mutation observers and that'd be pretty sweet. In bterlson/ecmascript it's an on-insertion-time-only thing. For example, when an es-rhs node is created, any text nodes inside of it are wrapped in es-t element nodes.
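
The wrapping step described above can be sketched outside the browser with Python's `xml.etree.ElementTree`, treating an element's `text` and each child's `tail` as its text nodes. This is an offline approximation, not the custom-element code in bterlson/ecmascript; `wrap_terminals` is a hypothetical name, and dropping whitespace-only text is an assumption.

```python
import xml.etree.ElementTree as ET


def _es_t(text: str) -> ET.Element:
    """Create an <es-t> element holding one run of text."""
    node = ET.Element("es-t")
    node.text = text
    return node


def wrap_terminals(rhs: ET.Element) -> None:
    """Wrap each non-blank text node directly inside `rhs` in <es-t>."""
    new_children = []
    if rhs.text and rhs.text.strip():
        new_children.append(_es_t(rhs.text))
    rhs.text = None
    for child in list(rhs):
        tail = child.tail
        child.tail = None  # the tail becomes its own <es-t> sibling
        new_children.append(child)
        if tail and tail.strip():
            new_children.append(_es_t(tail))
        rhs.remove(child)
    rhs.extend(new_children)


rhs = ET.fromstring(
    "<es-rhs>while ( <es-nt>Expression</es-nt> ) <es-nt>Statement</es-nt></es-rhs>"
)
wrap_terminals(rhs)
print(ET.tostring(rhs, encoding="unicode"))
```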

@bterlson
Member Author

Here's an example of a spec with the grammar for BindingElement (which doesn't currently display right but I think I interpreted it correctly): https://gist.github.com/bterlson/10785424

I'm not sure what the plaintext grammar representation is so I'll leave @jorendorff to add that part to the gist.

@domenic
Member

domenic commented Apr 18, 2014

Looks good to me, although yeah that grammar is not something you'd want to write by hand.

What does title do on es-clause?
