Ecmarkup shorthand required #1

bterlson · 2014-04-12T20:06:03Z

Terseness is an important goal of this proposal. Custom elements help a lot by removing a bunch of boilerplate HTML, but don't go far enough. Markdown, on the other hand, is very terse and readable but doesn't provide any semantic meaning (and may be too complex for our purposes anyway).

After a discussion on IRC with @domenic and @jorendorff, we think we need a shorthand for commonly used ECMAScript entities (see the readme for a list). At the very least, the following:

Non-terminals. Any non-terminals can automatically be cross-referenced with the definition of that non-terminal.
Cross references
Code blocks

Additionally, es-algorithm elements need the following shorthands:

Local variables
Algorithm steps

Algorithms also use bold typeface to denote values like **this**, **true**, and **false**, but I think these can share the shorthand used for code blocks.

es-algorithm will also want to auto-xref references to internal algorithms using a heuristic of some sort (@jorendorff can comment on this).

Proposal So Far

Entity	Short-hand
Non-terminal	??
Cross-reference	Markdown link syntax
Code blocks / literals	Markdown code syntax (back-ticks)
Local variables	Markdown italic syntax (variable)
Algorithm steps	Markdown bulleted list syntax

Algorithm steps could use markdown numbering syntax but putting numbers into the source text means adding/removing steps hoses up the diff so perhaps simply using bulleted list syntax would be good.

Open to additional suggestions!


domenic · 2014-04-13T04:09:02Z

In domenic/promises-unwrapping I ended up using three things within the algorithm steps:

_variables_ (italic)
**undefined**/**true**/**false**/**this**/**TypeError** etc. (bold)
"strings" (monospace)

domenic · 2014-04-13T04:11:12Z

In the actual ES spec, completions use a sans-serif font: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-thrower-functions

bterlson · 2014-04-13T20:14:43Z

Your proposal in #3 is reasonable.

Is it a goal to have spec text render "appropriately" if processed by a standard markdown processor? My feeling is that we shouldn't think in terms of formatting, ie. * doesn't mean bold, it means some particular ES construct. This suggests that two distinct constructs should not both use * even if both are currently displayed bold. With that in mind...

Since we only have one kind of list, I'd prefer to use *. My OCD dislikes seeing 1. repeated. Also using 1. will tempt people to number the list in plaintext, which will be bad practice due to its impact on diffs.

undefined and "string" are both values, and I would prefer to use backtick for these (code blocks/literals in the table above). I like to think of this as saying "the value you'd get if you eval'd this code in a fully conformant ECMAScript implementation".

I say we jettison the MD equivalence of * and _ and use _ for vars as you propose, and * for non-terminals. Thus, to complete my proposal:

Entity	Element	Short-hand
Non-terminal	es-nt	*FunctionDeclaration*
Cross-reference	es-xref / a	Markdown link syntax
Code blocks / literals	code	Markdown code syntax (back-ticks)
Local variables	var	Markdown italic syntax (_variable_)
Algorithm steps	li	Markdown bulleted list syntax

Plus some yet-to-be-precisely-defined auto-linking semantics.
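
To make the table above concrete, here is a minimal sketch of how the inline shorthands might desugar to elements. The function names and regular expressions are illustrative assumptions, not part of any agreed design, and cross-references are left out because the link syntax is still unsettled.

```typescript
// Escape characters that would otherwise be parsed as HTML.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical desugaring of the proposed inline shorthands:
//   `code`         -> <code>code</code>
//   *NonTerminal*  -> <es-nt>NonTerminal</es-nt>
//   _variable_     -> <var>variable</var>
function desugarInline(source: string): string {
  // Splitting on a capturing group keeps the back-tick spans in the result,
  // so * and _ inside code spans are never treated as shorthand.
  const pieces = source.split(/(`[^`]*`)/);
  return pieces
    .map((piece) => {
      const isCodeSpan =
        piece.length >= 2 &&
        piece.charAt(0) === "`" &&
        piece.charAt(piece.length - 1) === "`";
      if (isCodeSpan) {
        return "<code>" + escapeHtml(piece.slice(1, -1)) + "</code>";
      }
      return escapeHtml(piece)
        .replace(/\*([A-Za-z][A-Za-z0-9]*)\*/g, "<es-nt>$1</es-nt>")
        .replace(/\b_([A-Za-z][A-Za-z0-9]*)_\b/g, "<var>$1</var>");
    })
    .join("");
}
```

Under these assumptions, a step such as "Let _x_ be *Expression*" would come out with a `var` element and an `es-nt` element, with no contextual reasoning needed.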

Two concerns. First, this doesn't preserve the syntactic distinction between undefined/true/false/etc. and string literals that exists in #3. @domenic, do you think this is an important aspect?

Also, MD link syntax may just be untenable period. I'll look into this more...

Thoughts? Not too different from #3, though I'm willing to defer to experience with actually writing these specs if anyone has concerns :)

domenic · 2014-04-14T00:23:19Z

Is the idea to auto-detect a fixed set of keywords, and translate them into <code class="value"> which gets styled as bold, instead of <code class="code"> which gets styled as monospace?

I can't really find good names for the distinction... +0, -0, true, false, TypeError on one side; "string" and Promise.prototype.done on the other side.

bterlson · 2014-04-14T02:43:46Z

My goal is to arrive at the minimal set of entities and associated MD-like shorthand syntax. Don't want to auto detect unless the distinction is important. I don't see the difference between false and "string" so I can't make that call!

domenic · 2014-04-14T03:06:38Z

I guess the question is: is our goal to be able to faithfully reproduce the existing spec, or are we looking to simplify it at the same time? It sounds like you are looking to simplify, whereas I was assuming we were planning to keep the same typography without simplification.

bterlson · 2014-04-14T17:31:32Z

Hmm, good question. I should get better at stating my assumptions up front!

I was assuming that it would be better to focus on the markup that makes most sense and then worry about styling later, assuming we could reproduce something like the current spec using CSS or something. I want to see how good we can make the plaintext format, so I aim to be as terse and simple as possible while still retaining all of the important semantic meaning.

So, in that light:

The readme is a minimal list of things I've found that have important semantic meaning in the ECMAScript spec.
The table above proposes a short-hand for the most commonly used of these in order to make the format more readable and hand-writable while preserving clear semantic intent.

You raise a good point though that it's possible that this proposal is not able to faithfully reproduce the formatting of our current document. If this would be a blocking adoption issue I would be fine going with your proposal which does a better job of sticking to the spec formatting. Although if the distinction between true (bold) and "string" (bold monospace) is not particularly useful I'd prefer to drop it just to make things easier to parse/author/etc.

jorendorff · 2014-04-14T19:03:44Z

Simplifying: If something is syntactically obvious anyway, we do not need special ecmarkup to point it out; that would be redundant. The distinction between boolean and string values seems like that kind of case. Let's write **true** and **"string"**, which is semantically reasonable, and have the ecmarkup-to-html script render the strings in monospace as a presentation thing.
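
A rough sketch of that presentation step: every `**value**` span is one kind of entity, and the renderer alone decides whether it looks like a string. The class names `value` and `code` are borrowed from the question earlier in the thread; the function name and the regular expression are assumptions made for illustration.

```typescript
// Render **value** spans. Values that are syntactically string literals
// get a different class so CSS can show them in monospace.
// ASSUMPTION: a value is a "string" when it starts and ends with a double quote.
function renderValues(source: string): string {
  return source.replace(/\*\*([^*]+)\*\*/g, (_match: string, value: string) => {
    const isString =
      value.length >= 2 &&
      value.charAt(0) === '"' &&
      value.charAt(value.length - 1) === '"';
    const cssClass = isString ? "code" : "value";
    return '<code class="' + cssClass + '">' + value + "</code>";
  });
}
```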

More broadly I think we should aim to reproduce the current document faithfully. The goal here is to switch from Word to a text format. To keep focused on that, I propose taking the most boring possible stance on everything else.

Whitespace: People will have opinions about where and how to indent, where to put blank lines, whether to wrap at 80 columns, 100 columns, never wrap, etc. Let's nail it down right away. (I vote we wrap at 80, no blank lines between steps.)

Algorithm lists: Let's use Markdown numbered lists and insist that every step be numbered 0.

Grammar: The Word document has rather rich formatting for grammatical productions. Using real markup for all that will render the grammar unreadable. The es-spec-html script already strips down Word to a plain-text grammar format, then converts the plain text to pretty HTML; I think we should formally describe the plain-text format and adopt it. Examples in a minute.

jorendorff · 2014-04-14T19:25:56Z

Examples of the plain-text grammar format I referred to:

WhileStatement :
    while ( Expression ) Statement

IterationStatement :
    for ( LexicalDeclaration ; Expression_opt ; Expression_opt ) Statement

Here's what it might look like in full markup:

*WhileStatement* **:**  
`while` `(` *Expression* `)` *Statement*

*IterationStatement* **:**  
`for` `(` *LexicalDeclaration* `;` *Expression*<sub>opt</sub> `;` *Expression*<sub>opt</sub> `)` *Statement*

and there would have to be some way to recover the indentation.

Other random things that have to be dealt with in grammar include: but not, one of, [lookahead ∉], _[?Yield], [Lexical goal InputElementRegExp]. It'll be fairly hard to get human authors to get these right all the time; plain text is easier.
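
For illustration, a minimal sketch of inferring structure from the plain-text format shown above. It assumes a production header is an unindented `Name :` line, each indented line is one alternative, symbols are whitespace-separated, capitalized words are non-terminals, and an `_opt` suffix marks an optional non-terminal. These heuristics and all names in the code are assumptions; the sketch does not handle `but not`, `one of`, lookahead restrictions, or parameters.

```typescript
interface GrammarSymbol {
  kind: "nonterminal" | "terminal";
  name: string;
  optional: boolean;
}

interface Production {
  name: string;
  alternatives: GrammarSymbol[][];
}

// ASSUMPTION: capitalized tokens are non-terminals; everything else is a terminal.
function parseSymbol(token: string): GrammarSymbol {
  const match = /^([A-Z][A-Za-z0-9]*)(_opt)?$/.exec(token);
  if (match !== null) {
    return { kind: "nonterminal", name: match[1], optional: match[2] !== undefined };
  }
  return { kind: "terminal", name: token, optional: false };
}

function parseGrammar(source: string): Production[] {
  const productions: Production[] = [];
  for (const line of source.split("\n")) {
    const header = /^([A-Z][A-Za-z0-9]*)\s*:\s*$/.exec(line);
    if (header !== null) {
      // Unindented "Name :" line starts a new production.
      productions.push({ name: header[1], alternatives: [] });
    } else if (/^\s+\S/.test(line) && productions.length > 0) {
      // Indented line: one right-hand side of the most recent production.
      const symbols = line.trim().split(/\s+/).map(parseSymbol);
      productions[productions.length - 1].alternatives.push(symbols);
    }
  }
  return productions;
}

function renderSymbol(symbol: GrammarSymbol): string {
  if (symbol.kind === "terminal") {
    // Terminals stay as bare text; only HTML-significant characters are escaped.
    return symbol.name.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  const open = symbol.optional ? "<es-nt optional>" : "<es-nt>";
  return open + symbol.name + "</es-nt>";
}

function renderProduction(production: Production): string {
  const alternatives = production.alternatives.map(
    (alt) => "<es-rhs>" + alt.map(renderSymbol).join(" ") + "</es-rhs>"
  );
  return '<es-production name="' + production.name + '">' + alternatives.join("") + "</es-production>";
}
```

The point of the sketch is that authors write only the plain text, and the element form is produced mechanically.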

bterlson · 2014-04-14T20:11:03Z

Is it really important to leverage existing markdown formatting semantics? I'd prefer to focus on picking syntax that is easy to write and has clear semantic intent. This can be rendered to HTML trivially using a custom renderer in something like Marked.

@jorendorff, two things about your proposal concern me. First, I think it's impossible to represent the grammar in plaintext due to all the possibilities (both now and in the future). Second, I think machine readability is important for clarity and tooling purposes. This means it needs to be completely unambiguous whether something is a non-terminal, terminal, annotation, prose, etc. I think the current format in bterlson/ecmascript is best here, as it supports every grammar convention in ECMAScript... for your consideration:

<es-production name="WhileStatement">
    <es-rhs>while ( <es-nt>Expression</es-nt> ) <es-nt>Statement</es-nt></es-rhs>
</es-production>

If we adopt my proposal above to adopt * as a non-terminal shorthand:

<es-production name="WhileStatement">
    <es-rhs>while ( *Expression* ) *Statement*</es-rhs>
</es-production>

See readme.md for a complete listing of these. Also here is the current IterationStatement grammar (note that if this were hand-authored I'd probably make different stylistic choices :)).

bterlson · 2014-04-14T20:29:05Z

Re: whitespace, Is it still true that we can't depend on text editors to wrap for us?

Re: algorithm steps, if you guys are set on preserving MD format semantics and do not want algorithm steps to be displayed as bulleted lists when rendered by standard MD I will give up my fight to just use * for them.

domenic · 2014-04-14T20:32:24Z

I would strongly prefer to have no wrapping. It works out well for all my specs and I go back and look at old Markdown readmes I wrote with wrapping and am sad. It's a pain to maintain.

I don't particularly care between *, -, +, ., 0., or 1. for numbering. I kind of like 0..

bterlson · 2014-04-15T00:09:04Z

Ok, so resolved 0. for algorithm steps!
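
Numbering every step `0.` keeps diffs small, because adding or removing a step never changes its neighbors. A minimal sketch of how a renderer might turn such source into nested ordered lists follows. It assumes two spaces of indentation per nesting level, which the thread does not settle, and the type and function names are hypothetical. Step text is emitted as-is, without escaping or inline shorthand processing.

```typescript
interface Step {
  text: string;
  substeps: Step[];
}

// Parse lines like "0. Let x be y.", where leading spaces express nesting.
// ASSUMPTION: two spaces per nesting level; lines that are not steps are ignored.
function parseSteps(source: string): Step[] {
  const root: Step[] = [];
  // stack[i] is the list currently open at depth i.
  const stack: Step[][] = [root];
  for (const line of source.split("\n")) {
    const match = /^( *)0\. (.*)$/.exec(line);
    if (match === null) continue;
    const depth = Math.floor(match[1].length / 2);
    // Close lists that are deeper than this step.
    while (stack.length - 1 > depth) stack.pop();
    const current = stack[stack.length - 1];
    if (depth > stack.length - 1 && current.length > 0) {
      // Going one level deeper: nest under the most recent step.
      stack.push(current[current.length - 1].substeps);
    }
    stack[stack.length - 1].push({ text: match[2], substeps: [] });
  }
  return root;
}

// The browser numbers <ol> items, so the source never contains real numbers.
function renderSteps(steps: Step[]): string {
  if (steps.length === 0) return "";
  const items = steps.map(
    (step) => "<li>" + step.text + renderSteps(step.substeps) + "</li>"
  );
  return "<ol>" + items.join("") + "</ol>";
}
```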

I'll vote no wrapping unless @jorendorff feels strongly. I don't have a strong position here other than I hate it when I have to break an html tag in half to wrap sanely :-P

I think we have agreement on using custom elements, and since I think we need a full set of custom elements to define the specification anyway (the readme contains the full set of elements I think are necessary), it makes sense to define a mapping from MD syntax to these custom elements. Does that make sense to everyone else? If so we can start discussing my proposal above more concretely by proposing deltas to elements required, element names, and shorthand forms.

domenic · 2014-04-15T05:10:22Z

it makes sense to define a mapping from MD syntax to these custom elements. Does that make sense to everyone else?

No, this does not necessarily make sense to me. It needs to be spelled out further. For example, if we decide nonterminals are denoted with *nonterminal*, why do we ever need to consider writing <es-nt>nonterminal</es-nt>? Concrete examples of when you would do that, preferably drawn from the current spec, would be helpful.

bterlson · 2014-04-15T16:11:34Z

The source format we check in would only contain <es-nt> when you need an attribute. The only such attribute today for es-nt is optional, so you would type <es-nt optional>Identifier</es-nt> instead of *Identifier*. If optional needed a short-hand we could say **Identifier** desugars to <es-nt optional>Identifier</es-nt>.

We would be defining two formats, effectively: a normalized form, which is pure HTML/custom elements, and a short-hand form, which desugars to it. Defining a mapping from MD to Custom Elements means it's easy to reason about the structure of the desugared document. The desugared document is what tools will consume (both in browser, and likely on top of a library we provide for command-line tools).

Where possible I think it's better to wrap spec text in semantically meaningful elements (ie. <es-t> rather than <b>) and my proposal is an easy way to do it with a short-hand. @jorendorff is correct that the rules for inferring meaning from formatted spans already exist and work, but it's complexity I'd avoid if we can. With a mapping, our parse code is a simple custom formatter for a markdown parser with no contextual reasoning required.

I'll work on an example this afternoon PST(:trollface:), but feel free to stop me before I get to that if this seems like the wrong path. I could be convinced to use MD off the shelf if you two (@jorendorff / @domenic) would prefer to do so.

domenic · 2014-04-15T16:15:12Z

The desugared document is what tools will consume (both in browser, and likely on top of a library we provide for command-line tools).

I don't understand this. If I write <es-algorithm> 0. blah blah *Identifier* blah blah</es-algorithm> in my .html file, this is what the browser will see. The browser will never see <es-nt>.

I could be convinced to use MD off the shelf

This isn't really my preference, so no worries there.

@jorendorff is correct that the rules for inferring meaning from formatted spans already exist and work, but it's complexity I'd avoid if we can.

This is more along the lines of my preference. It seems to be a better authoring experience, with no downsides in terms of the final output's machine-readability, which IMO is a net win.

bterlson · 2014-04-15T16:27:56Z

I don't understand this. If I write <es-algorithm> 0. blah blah *Identifier* blah blah</es-algorithm> in my .html file, this is what the browser will see. The browser will never see <es-nt>.

The custom elements will replace * with something so it renders appropriately. My feeling is that replacing it with <es-nt> is better than <i> even just for styling reasons, but I also think we need to consider the desugared document as an actual thing and specify it accordingly so static build tools can produce such a document. Likewise, if someone wanted to write a custom visualization of the spec where non-terminals were treated specially they could do so without having to change how the markdown is processed.

It seems to be a better authoring experience

There are many grammar conventions, and I found that promoting some of them to tags or attributes in html made things easier (though somewhat more verbose, granted). That said if we think we can get away with using MD completely for the grammar, let's do it. I'd still like a desugaring to es-* tags if possible (should be possible?) for above reasons. I think a strong example will help all of us come to the same page on this one, so I'll work on that unless someone beats me to it!

domenic · 2014-04-15T16:36:14Z

My feeling is that replacing it with <es-nt> is better than <i> even just for styling reasons,

OK, this makes sense.

I also think we need to consider the desugared document as an actual thing and specify it accordingly so static build tools can produce such a document.

I am not sure about this but willing to wait and see.

That said if we think we can get away with using MD completely for the grammar, let's do it.

You keep mischaracterizing this position as one involving MD, whereas it's actually one involving no markup or markdown at all, and simply inference of the type @jorendorff already does on a plaintext format. In other words, it's the first example in his earlier post, not the second one.

I think a strong example will help all of us come to the same page on this one, so I'll work on that unless someone beats me to it!

Definitely agreed!!

bterlson · 2014-04-15T16:54:14Z

You keep mischaracterizing this position as one involving MD, whereas it's actually one involving no markup or markdown at all, and simply inference of the type @jorendorff already does on a plaintext format. In other words, it's the first example in his earlier post, not the second one.

Sorry for being so obtuse! I read earlier posts so many times and still missed this key point. I was thinking the plaintext grammar format was mutually exclusive with an MD-like shorthand. I now see how @jorendorff's proposal is better than HTML so I'm on board (though I would like to see what the various conventions like params and such look like).

To understand completely, the rules could be something like:

Inside es-production, parse plaintext grammar --> ecmarkup html
Inside es-* (other than es-production), parse MD-alike --> ecmarkup html

Is this close to what your preference is?

domenic · 2014-04-15T17:36:29Z

Is this close to what your preference is?

Yeah, that sounds right, I think! My motivation being that grammars are different enough beasts from the rest of prose, that making ecmarkdown work for that use case seems to make it a much larger language. (And of course, using ecmarkup tags is too verbose to expect people to write or maintain intelligibly.)

I am still not sure exactly how this "parse --> ecmarkup html" step is going to work, but I think examples and implementations will show the way. I guess you use Mutation Observers to re-render whenever the contents of the element changes, including on first parse?

bterlson · 2014-04-15T19:22:09Z

You could use mutation observers and that'd be pretty sweet. In bterlson/ecmascript it's an on-insertion-time-only thing. For example, when an es-rhs node is created, any text nodes inside of it are wrapped in es-t element nodes.
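
As a sketch of that wrapping step, the code below uses a tiny stand-in tree type instead of the real DOM so that it runs anywhere. `SpecNode`, `wrapTerminals`, and `serialize` are hypothetical names, and the choice to wrap only direct, non-blank text children of `es-rhs` is an assumption about what "any text nodes inside of it" means.

```typescript
// Minimal stand-in for DOM nodes (an assumption made to keep the sketch self-contained).
type SpecNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; children: SpecNode[] };

// Return a copy of the tree in which each non-blank text node that is a
// direct child of an es-rhs element is wrapped in an es-t element.
function wrapTerminals(node: SpecNode): SpecNode {
  if (node.kind === "text") return node;
  const tag = node.tag;
  const children = node.children.map((child) => {
    if (tag === "es-rhs" && child.kind === "text" && child.value.trim() !== "") {
      const wrapped: SpecNode = { kind: "element", tag: "es-t", children: [child] };
      return wrapped;
    }
    return wrapTerminals(child);
  });
  return { kind: "element", tag: tag, children: children };
}

// Serialize the tree so the result is easy to inspect (no escaping, no attributes).
function serialize(node: SpecNode): string {
  if (node.kind === "text") return node.value;
  return "<" + node.tag + ">" + node.children.map(serialize).join("") + "</" + node.tag + ">";
}
```

In a browser the same idea would run when the custom element is created, or from a Mutation Observer callback if re-rendering on change is wanted.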

bterlson · 2014-04-15T22:53:34Z

Here's an example of a spec with the grammar for BindingElement (which doesn't currently display right but I think I interpreted it correctly): https://gist.github.com/bterlson/10785424

I'm not sure what the plaintext grammar representation is so I'll leave @jorendorff to add that part to the gist.

domenic · 2014-04-18T04:10:20Z

Looks good to me, although yeah that grammar is not something you'd want to write by hand.

What does title do on es-clause?

bterlson mentioned this issue Apr 13, 2014

Sample algorithm steps "markdown" #3

Closed

bterlson closed this as completed Jan 14, 2015

bakkot mentioned this issue Dec 11, 2020

[DO NOT MERGE YET] Improve rendering performance by chunking layout #263

Open
