Folded Block Scalars

mofosyne edited this page May 5, 2015 · 17 revisions

Folded block scalars should be removed from the language. They offer almost nothing that the other scalar forms do not, they have many edge cases, and they are hardly ever implemented correctly.

For example, these two values are identical:

folded: >
  This content is
  folded and has trailing newline.
quoted:
  "This content is
  folded and has trailing newline.\n"

NOTE: We should write some tests to see which implementations get this right.

Current YAML Behaviour

This:

x: >
  foo
  bar
  baz

Produces:

{"x": "foo bar baz\n"}

This:

x: >
  foo
   bar
  baz

Produces:

{"x": "foo\n bar\nbaz\n"}

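That rule is probably too subtle: a more-indented line keeps the line breaks around it instead of having them folded into spaces. It is a wiki-style convention that doesn't belong in YAML. No emitter would produce it, and no human would remember how it works, so it is not useful.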
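As a rough illustration of the behaviour shown above, the following Python sketch folds the lines of a `>` scalar. It is a simplified model, not a conforming implementation: the name `fold_block` is made up for this page, the input is assumed to be already dedented to the block's indentation, and blank lines and chomping indicators are not handled.

```python
def fold_block(lines):
    """Fold the dedented content lines of a `>` block scalar.

    Simplified model (assumption): no blank lines, default "clip" chomping,
    so the result always ends with exactly one newline.
    """
    out = []
    for prev, cur in zip(lines, lines[1:]):
        out.append(prev)
        # A line break is folded into a space only when neither
        # neighbouring line starts with extra indentation.
        more_indented = prev[:1] in (" ", "\t") or cur[:1] in (" ", "\t")
        out.append("\n" if more_indented else " ")
    if lines:
        out.append(lines[-1])
        out.append("\n")
    return "".join(out)


print(repr(fold_block(["foo", "bar", "baz"])))
print(repr(fold_block(["foo", " bar", "baz"])))
```

The second call differs from the first by a single leading space, yet every line break next to that line survives, which is the edge case criticised above.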

Proposal to replace folded with new quoting rules

Proposal 1: Using "

The only thing that folded offers us over quoted folding is a trailing newline.

Currently:

x: "foo
  "
y: "foo "

produce the same value. The first form is probably never used, so we should change the rules so that these:

x: "foo
  "
y: "foo\n"
z: >
  foo

all produce the same value.

Further, these two are currently the same:

x: "
  foo"
y: " foo"

You would never see the first in real life, so we make it equivalent to the unpadded form:

x: "
  foo"
y: "foo"

Then we can make the following work:

x1: "
  foo"
y1: "foo"
z1: >-
  foo
x2: "
  foo
  "
y2: "foo\n"
z2: >
  foo

This means that when you have a folded paragraph, you don't need to put the first line on the same line as the key (to avoid the extra space).

folded paragraph: "
  This means that when you have a folded paragraph, you don't need to put
  the first line on the same line as the key (to avoid the extra space)."

Looks great! We have completely obviated any usefulness of the folded scalar form.
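To make the proposed rules concrete, here is a small Python sketch comparing current double-quoted folding with the behaviour proposed above. The helper `fold_quoted` and its `proposed` flag are hypothetical names. The input is the text between the quotes, split on line breaks. Escape sequences and interior blank lines are deliberately ignored, so this is only a model of the two rules under discussion.

```python
def fold_quoted(lines, proposed=False):
    """Fold the physical lines found between a pair of double quotes.

    proposed=False models today's rule: every line break becomes a space.
    proposed=True models Proposal 1 (assumption): an empty first line is
    dropped and an empty last line becomes a trailing newline.
    """
    if len(lines) == 1:
        return lines[0]
    # Whitespace next to a line break is not part of the content.
    parts = [lines[0].rstrip()]
    parts += [line.strip() for line in lines[1:-1]]
    parts.append(lines[-1].lstrip())
    if not proposed:
        return " ".join(parts)
    suffix = ""
    if parts[0] == "":
        parts = parts[1:]           # opening quote alone on its line
    if parts and parts[-1] == "":
        parts = parts[:-1]          # closing quote alone on its line
        suffix = "\n"
    return " ".join(parts) + suffix


print(repr(fold_quoted(["foo", "  "])))                 # current rule
print(repr(fold_quoted(["foo", "  "], proposed=True)))  # proposed rule
```

Under this model `x1` and `y1` above agree, as do `x2` and `y2`, without needing a separate folded block form.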

Getting rid of the folded form, which nobody really understands, will be a good move for YAML, whose detractors think it is too complicated.

Proposal 2: Using :: as block mode declaration, with modifier commands

Instead of using | to indicate a newline-preserved block scalar and > for a folded block scalar, let's use :: to indicate block mode, then modifiers to change its behaviour. (The default behaviour should be "newline preserved", as that is what most people would expect.)


PSEUDO-CODE::

    IF `::` THEN // work out which block mode

        //// Let's avoid too much complexity in parsing logic...
        //IF `'` THEN explicitly folded block mode (with automatic indent level detection)
        //IF `"` THEN explicitly newline preserved (with automatic indent level detection)

        IF ( `"` OR (`\n` then ASCII) )  THEN // detects newline-preserved1
            implied/explicitly newline preserved block mode ( indent level autodetected ) 

        IF ( `\n` then `"`) THEN // detects newline-preserved2
            explicitly newline preserved block mode ( indent level specified ) 

        IF ( `'` ) THEN // detects folded-block1 
            implied folded block mode ( indent level autodetected ) 

        IF ( `\n` then `'`) THEN // detects folded-block2
            explicitly folded block mode ( indent level specified ) 

        // More experimental proposal 
        IF ( Not (space or `\n` or number) right after `::` (This is your "boundary string") ) THEN // detects NEWLINEPRESERVED-experimental1
            read the word after `::` (e.g. `::frontier` would yield boundary=frontier ) into boundary variable
            // This functions similar to https://en.wikipedia.org/wiki/MIME#Multipart_messages boundary=frontier 
            explicitly newline preserved, but keep reading at any indent level 
            Ignore the first line if it matches '::<var boundary>' (It's optional for setting indent level)
            (even if it's below parent indent level e.g. indent level 0 )
            Keep reading in until a matching '::<var boundary>' number of characters (or more) on its own line is detected at the right indent level,
            Or end of document
            ( For practicality, it)

        IF ( NUMBER ) THEN // detects NEWLINEPRESERVED-experimental2
            read in a specific number of lines as specified by NUMBER
            good for immutable records. Has speed advantage over the more flexible option above.

    ELSEIF `:` THEN
        Might be something else! Keep parsing
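One possible reading of the pseudo-code above, written as a Python sketch. Everything here is an assumption made for illustration: the function name `classify_block_header`, the returned labels, and the choice to look only at the text after `::` on the key line plus the first following line. The `:::` form used in a later example is not covered.

```python
def classify_block_header(rest, next_line=""):
    """Guess the block mode from what follows `::`.

    rest      -- text after `::` on the key line
    next_line -- first line after the key line
    Returns (mode, variant, argument); the labels are hypothetical.
    """
    rest = rest.strip()
    first = next_line.strip()
    if rest == '"':
        return ("preserved", "auto-indent", None)
    if rest == "'":
        return ("folded", "auto-indent", None)
    if rest.isascii() and rest.isdigit():
        # experimental2: read a fixed number of lines
        return ("preserved", "line-count", int(rest))
    if rest:
        # experimental1: any other word is the boundary string
        return ("preserved", "boundary", rest)
    if first == '"':
        return ("preserved", "explicit-indent", None)
    if first == "'":
        return ("folded", "explicit-indent", None)
    # Nothing after `::` and ordinary text below: the default mode.
    return ("preserved", "auto-indent", None)


print(classify_block_header("FRONTIER"))
print(classify_block_header("", "'"))
```

Even in this reduced form the classifier needs seven branches, which is worth weighing against the goal of keeping the parsing logic simple.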

NEWLINEPRESERVED:

    newline-preserved1::
        This is the default behaviour
        where it will save all newlines

    newline-preserved1-alt::"
        Same behaviour
        as the above

    newline-preserved2::
        "
            This allows for beginning spaces
        to be preserved

FOLDEDBLOCK:

    folded-block1::'
        This is a folded block
        might as well treat it like this
        as it is easier to deal with

    folded-block2::
        '
        This is also a folded block,
        fortunately, since newline is ignored
        the same parsing logic will work for both 
        folded-block1 and folded-block2

NEWLINEPRESERVED-experimental1:
    data:text/html::______________________________________________________________
    <html>
    This is for preserving newlines, where the source is all the way at the bottom
    this makes it easier to copy-paste code.

    Also nicer for QR codes too.

    </html>
    ::____________________________________________________________________________

    data:text/html::FRONTIER
    ::FRONTIER START
    <html>
    This is for preserving newlines, where the source is all the way at the bottom
    this makes it easier to copy-paste code.

    Also nicer for QR codes too.

    </html>
    ::FRONTIER END

    data:text/html::FRONTIER
::FRONTIER START
<html>
This is for preserving newlines, where the source is all the way at the bottom
this makes it easier to copy-paste code.

Also nicer for QR codes too.

</html>
::FRONTIER END

    otherData: 42
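The boundary-delimited examples above could be read by something like the following Python sketch. It is only a guess at the intended behaviour: `read_boundary_block` is a hypothetical helper, indentation is neither checked nor removed, and any line whose stripped text starts with `::` plus the boundary is treated as a marker line.

```python
def read_boundary_block(lines, boundary):
    """Collect lines up to the closing `::<boundary>` marker.

    lines    -- the lines that follow the key line
    boundary -- the word read after `::` on the key line
    Returns (content_lines, index_of_first_unread_line).
    """
    marker = "::" + boundary
    start = 0
    # The first marker line is optional and is skipped if present.
    if lines and lines[0].strip().startswith(marker):
        start = 1
    body = []
    for i in range(start, len(lines)):
        if lines[i].strip().startswith(marker):
            return body, i + 1
        body.append(lines[i])
    # No closing marker: the block runs to the end of the document.
    return body, len(lines)


doc = ["::FRONTIER START", "<html>", "hi", "</html>", "::FRONTIER END", "otherData: 42"]
print(read_boundary_block(doc, "FRONTIER"))
```

One ambiguity this exposes: a block with no optional opening line and an empty body looks exactly like a block whose opening line is present, so the proposal would need to say which reading wins.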

NEWLINEPRESERVED-experimental2:
    data:text/html::4
    <html>
    This is for preserving newlines, where the source is all the way at the bottom
    this makes it easier to copy-paste code.
    </html>

    otherData: 42
    otherDat2: lol

NEWLINEPRESERVED-experimental3:
    data:text/html:::
    <html>
    This is for preserving newlines, where the source is all the way at the bottom
    this makes it easier to copy-paste code.
    </html>

    //END OF DOCUMENT SIGNAL HERE

Comments

At first read, your conclusion seems like a nice one. But then that trailing quote strikes me as out of place relative to the first, and I start to wonder whether we really gained anything in this trade. I also want to ask how this picture might change if YAML adopts the literal format as the default format.