Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SVG2 path data coordinates are currently spec'd as integers. #335

Closed
fuchsia opened this issue Jul 21, 2017 · 31 comments
Closed

SVG2 path data coordinates are currently spec'd as integers. #335

fuchsia opened this issue Jul 21, 2017 · 31 comments

Comments

@fuchsia
Copy link

fuchsia commented Jul 21, 2017

I'm sure you're aware, but just in case you're not: the current version of Path Data Grammar in SVG 2 specifies coordinates as signed integers. Support for decimals and exponents seems to have been lost in the haze of rewrites.

@fuchsia
Copy link
Author

fuchsia commented Jul 24, 2017

If it helps, I think the SVG 1.1 grammar can be written as:

sign       ::= "+" | "-"
digits     ::= [0-9]+
unsigned   ::= digits
integer    ::= sign? unsigned
exponent   ::= [Ee] integer
fraction   ::= ( digits ( "." digits? )? ) | ( "." digits )
float      ::= sign? fraction exponent?
coordinate ::= float

But there are many other ways to do it.

IMPORTANT, the first two arguments for eliptical_arc_argument/elliptical_arc_closing_argument should be unsigned and the third integer:

elliptical_arc_argument::= 
    unsigned comma_wsp? unsigned comma_wsp? integer comma_wsp 
    flag comma_wsp? flag comma_wsp? coordinate_pair

elliptical_arc_closing_argument::= 
    unsigned comma_wsp? unsigned comma_wsp? integer comma_wsp 
    flag comma_wsp? flag comma_wsp? closepath

(Again, I'm going by the 1.1 grammar.)

Also, you need to decide whether the argument to bearing_argument_sequence is unsigned or integer. (I've deliberately not used number as a symbol so we can spot similar errors.)

The presentation of fraction is slightly different to that in SVG 1.1, but matches the one I use for character-by-character parsers and as a regex (/[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[Ee][+-]?\d+)?/).

The above syntax makes floats like 4. legal. That matches javascript but differs from CSS, where I think 4. is an invalid number.

@AmeliaBR
Copy link
Contributor

Thanks for that @fuchsia .

For the bearing command and for the third argument in the Arc command, any number (signed, with decimal) should be valid. Negative angles refer to counter-clockwise rotation in SVG, and there is no logical reason to restrict it to integer degrees.

The relevant SVG 1.1 syntax was as follows:

elliptical-arc-argument:
    nonnegative-number comma-wsp? nonnegative-number comma-wsp? 
        number comma-wsp flag comma-wsp? flag comma-wsp? coordinate-pair

where 

nonnegative-number:
    integer-constant
    | floating-point-constant
number:
    sign? integer-constant
    | sign? floating-point-constant

See #331 for discussion about numbers with trailing decimals, like 4. (both in paths and elsewhere in SVG/CSS).

@fuchsia
Copy link
Author

fuchsia commented Jul 25, 2017

Okay, unless I've screwed something else up, the grammar should be:

elliptical_arc_argument::=
    unsigned_float comma_wsp? unsigned_float comma_wsp? float comma_wsp
    flag comma_wsp? flag comma_wsp? coordinate_pair

elliptical_arc_closing_argument::=
    unsigned-float comma_wsp? unsigned-float comma_wsp? float comma_wsp
    flag comma_wsp? flag comma_wsp? closepath

bearing_argument_sequence::=
    bearing_argument | (bearing_argument comma_wsp? bearing_argument_sequence)

bearing_argument ::= float

sign           ::= "+" | "-"
digits         ::= [0-9]+
exponent       ::= ( "E" | "e" ) sign? digits
fraction       ::= ( digits ( "." digits? )? ) | ( "." digits )
unsigned_float ::= fraction exponent?
float          ::= sign unsigned-float
coordinate     ::= signed-float

I've reverted to "E" | "e" in the exponent for consistency with the rest of the spec.

I've created bearing_argument for semantic reasons. (Separate issue: aren't multiple bearing arguments redundant? Whether it;s relative or absolute, the bearing will always take on the value of the last bearing_argument.)

You can rename float and unsigned_float as number and nonnegative_number if you like. I'm not going be precious about any of this; I just can't believe nobody has fixed this in all this time.

If SVG numbers are brought into line with CSS numbers, then fraction changes to:

fraction       ::= ( digits ( "." digits )? ) | ( "." digits )

i.e. drop the first ?.

I'm not really a grammar geek *deletes rant about the grammar using right-recursion over the Kleene star* so it's possible this will produce a horrible parse tree or has transgressed LALR(1) in some way. But it's better than what's there.

@AmeliaBR
Copy link
Contributor

Looks good at a glance, but it really needs more than a glance.

Pinging @tshinnic, who has also been a bit of a path-grammar geek who can't believe nobody has fixed this in all this time.

PS. We're still waiting to hear whether or not the SVG Working Group will be revived, with spec editors who actually have clear responsibility & maybe paid time to work on this. As it is, nothing's happened because no one has any authority to do anything.

@fuchsia
Copy link
Author

fuchsia commented Jul 26, 2017

A paid editor would be great! But if nothing keeps happening, maybe there's enough volunteers to fork it and document the de facto standard.

Thanks for your help, anyway.

@tshinnic
Copy link

I have been somewhat overambitious in wanting to "wrap up" the several grammar issues into one nice package with a bow on top, to present to all and sundry. But I kept finding little nits requiring sanity evaluation: mine vs. draft. e.g. number formats #331
.

The draft grammar has accumulated quite a number of changes over time, trying to respond to multiple ideas as they occurred.
.

The first major reason to update the draft grammar was to move away from the SVG 1.1 ABNF format in order to agreeably share the common EBNF format seen with CSS and others.

Then there were a few bugs to be worked out in smaller changes. As this issue and others have appeared since, we have seen there are more bugs to be fixed.

The last major change to the draft grammar was to address the work item called "segment-completing closepath", which fixes a lack not recognized since SVG 1.0 days.
.

Over this series of changes errors have crept in, sometimes merely because they were "thoughts in progress", like the wish to agree with the CSS number formats (which I only now understand is warranted #331) but which resulted in the broken number format mentioned in this issue. And the other issue mentioned above (see also #324) Unfortunately the aggregate result is a grammar that will not parse some currently legal constructs.
.

What I thought to do was introduce a PR in three parts or sets of interrelated changes. Organizing into three parts best fits the three areas mentioned above.

Re-deriving an EBNF grammar from the SVG 1.1 grammar would create a "clean slate" that would make the following issue-specific changes much clearer.

Then addressing the individual bugs in the SVG grammar with small changes would allow problem description, change discussion, and acceptance of these small, separate issues to proceed smoothly.

And then applying the rather large change that supports the addition of "segment-completing closepath" would be done.
.

I wonder how best to proceed? Should there be one PR with a large change (EBNF), then a number of small changes (bugs), then another large change (closepath)? Or should there be three PRs in the same order, or many?

Two things are clear to me. Amending the grammar from its current state would be very confusing, and would not help at all the "why this grammar rule?" questions of the future. Second, the 'closepath'
work item is recent and somewhat unclear, the change is also massive, and that grammar parsing is 'fiddly'.

Finally, Amelia's comment above shows that tracking all these grammar specific issues is 'dispersed'. Should there be a separate issue collecting all these together as work items or would a new Github label/tag be better?
.

.
More bits and pieces found while writing comment:

@AmeliaBR
Copy link
Contributor

I wonder how best to proceed? Should there be one PR with a large change (EBNF), then a number of small changes (bugs), then another large change (closepath)? Or should there be three PRs in the same order, or many?

Given the state of SVG standardization & software implementer commitments, it would probably be very helpful if we can develop two versions of the grammar:

  • One that accurately describes the SVG 1.1 grammar in the new syntax, fixing things like the unintentional removal of floating-point numbers and scientific notation
  • One that includes the new features (bearing command and segment-completing closepath). Ideally, have those two features as separate commits, so that one or the other could be cherry-picked if we get implementer support for one but not the other.

Similarly, for any other syntax fixes/changes relative to SVG 1.1, it would be nice to have a clear commit documenting changes relative to the SVG 1.1-equivalent grammar. But it would also be useful to have tests of what the current software support is like, to demonstrate whether this is really a "new feature" or a "make the spec reflect reality" situation.

But if nothing keeps happening, maybe there's enough volunteers to fork it and document the de facto standard.

I've already told Tom, but I'll repeat here: for things that are obvious errors (like floats vs integers, unintentionally changing the grammar relative to SVG 1.1), we can probably at least merge them in to the Editor's Draft, which is built from the master branch of this repo. However, we'd need to keep careful note of what changes are being made and why, so that eventually it can be merged back in to the official W3C spec process.

To eventually merge it back into the W3C process, we'd also need to have licensing agreements from anyone contributing content that would become part of the spec. See the Contributing guidelines for this repo, and the W3C FAQ.

@fuchsia
Copy link
Author

fuchsia commented Jul 31, 2017

Step 1: the SVG 1.1 grammer in EBNF

svg-path           ::= wsp* command*
command            ::= moveto-command wsp* (drawto-command wsp*)*

moveto-command     ::= moveto
drawto-command     ::= closepath
                       | lineto
                       | horizontal-lineto
                       | vertical-lineto
                       | curveto
                       | smooth-curveto
                       | quadratic-bezier-curveto
                       | smooth-quadratic-bezier-curveto
                       | elliptical-arc

closepath          ::= [Zz]

moveto             ::= [Mm] wsp* moveto-argument (comma-wsp? lineto-argument)*
moveto-argument    ::= coordinate-pair

lineto             ::= [Ll] wsp* lineto-argument (comma-wsp? lineto-argument)*
lineto-argument    ::= coordinate-pair

horizontal-lineto  ::= [Hh] wsp* horizontal-lineto-argument
                       (comma-wsp? horizontal-lineto-argument)*
horizontal-lineto-argument ::=
                       coordinate

vertical-lineto    ::= [Vv] wsp* vertical-lineto-argument
                       (comma-wsp? vertical-lineto-argument)*
vertical-lineto-argument ::=
                       coordinate

curveto            ::= [Cc] wsp* curveto-argument (comma-wsp? curveto-argument)*
curveto-argument   ::= coordinate-pair comma-wsp? coordinate-pair comma-wsp? coordinate-pair

smooth-curveto     ::= [Ss] wsp* smooth-curveto-argument
                       (comma-wsp? smooth-curveto-argument)*
smooth-curveto-argument ::=
                       coordinate-pair comma-wsp? coordinate-pair

quadratic-bezier-curveto ::=
                       [Qq] wsp* quadratic-bezier-curveto-argument
                       (comma-wsp? quadratic-bezier-curveto-argument)*
quadratic-bezier-curveto-argument ::=
                       coordinate-pair comma-wsp? coordinate-pair

smooth-quadratic-bezier-curveto ::=
                       [Tt] wsp* smooth-quadratic-bezier-curveto-argument
                       (comma-wsp? smooth-quadratic-bezier-curveto-argument)*
smooth-quadratic-bezier-curveto-argument ::=
                       coordinate-pair

elliptical-arc     ::= [Aa] wsp* elliptical-arc-argument
                       (comma-wsp? elliptical-arc-argument)*
elliptical-arc-argument ::=
                       nonnegative-number comma-wsp? nonnegative-number comma-wsp?
                       number comma-wsp flag comma-wsp? flag comma-wsp? coordinate-pair

coordinate-pair    ::= coordinate comma-wsp? coordinate
flag               ::= [01]
comma-wsp          ::= wsp+ delimiter? | delimiter
delimiter          ::= "," wsp*
coordinate         ::= number
number             ::= sign? nonnegative-number
nonnegative-number ::= fraction exponent?
fraction           ::= digits ( dot digits? )? | dot digits 
exponent           ::= [Ee] sign? digits
sign               ::= "+" | "-"
digits             ::= [0-9]+
dot                ::= "."
wsp                ::= [#x9#xA#xD#x20]

I've tried my hardest to be consistent. (And, trust me, that comes very unnaturally to me; if there are two ways of doing something, I'll manage it in three.)

Changes I've made:

  • Every command now has an xxx-argument which contains the argument for that single command. That wasn't the case.
  • Recursion has been eliminated. I remain conflicted about this because recursion is an easy idiom once you're used to it. But unless people are chucking this into Bison, I can't see its needed. And it's made the grammar simpler for non-experts, I think.
  • I've made use of ranges ([Mm]) for the commands, rather than a quoted pair, except for sign - because the EBNF spec doesn't make clear whether [+-] is legal. This is a stylistic choice that makes the commands shorter, but I'm not wed to it.
  • I've kept the three separate number types: number, nonnegative-number and flag. But the existence of flag causes all sorts of havoc in the grammar. All those * and ? quantifiers after wsp and comma-wsp are the result of flag. And, as I follow it, this is a legal description of an elliptical arc M1 1A1 1 1 111 1. (Space is required before the first flag, but the second flag and the next coordinate don't need any so 111 is three parameters each with a value 1.) I haven't tested any browsers and parsers to see if they get it right.

I've turned this EBNF into a recursive descent parser on jsfiddle. It's green for valid paths and red for invalid ones.

PS why is it wrong to define a comma as one of the space characters and allow them anywhere? (Something deep inside me objects. But I don't know why.)

@tshinnic
Copy link

tshinnic commented Aug 1, 2017

flag really does need to be separate and restricted to values 0 and 1. Please see #324 for notes about both flags and the elliptical arc nonnegative-number thing.

From the W3C tests (old) I got the following test strings. These should pass:

    //<path d="M120,120 h25 a25,25 0 10 -25,25z"     fill="lime"/>
    //<path d="M200,120 h-25 a25,25 0 1125,25 z"     fill="lime"/>
    //<path d="M120,200 h25 a25,25 0 1 1-25,-25 z"   fill="lime"/>

And these should fail:

    //<path d="M280,120 h25 a25,25 0 6 0 -25,25 z"   fill="red"/>
    //<path d="M280,200 h25 a25 25 0 1 7 -25 -25 z"  fill="red"/>
    //<path d="M360,120 h-25 a25,25 0 1 -1 25,25 z"  fill="red"/>

And they really do pass and fail in the major browsers.

I know the need for the nonnegative-number rule goes away as the browsers do understand to use the
absolute value of rx/ry.

As for the comma thing, it seems to me that the SVG 1.0 specifiers were fixated on using the comma only as a meaningful separator for numeric operands. Thus not a generalized separator. (This for clarity?) Similarly they were fixated on allowing for minimal path string sizes, and so we inherit the weirdness for flags, as those rules allow for spaces to be dropped.

I need to gather my notes... And look at your work above, compared to my unambitious straight translate to EBNF form.

And one of the subprojects I don't know I can get to, is to collect together the multiple sources/versions of tests for path strings. The W3C tests are old and miss cases. Chrome/FireFox have tests but they're held in those projects somewhere. Good and complete tests would have prevented some of the confusion in the 1.5 decades since. Oh, and give the dozens of implementations of path string parsers something real to test against. (sigh)

@fuchsia
Copy link
Author

fuchsia commented Aug 4, 2017

Well my validator accepted your three lime paths and rejected your red ones.

I then grabbed all 120 paths from the w3c list (thanks for pointing me at that) and all are accepted, except two occurrences of "M 20 100 H 40#90" in paths-data-18-f.svg and these five from paths-data-20-f.svg:

M280,120 h25 a25,25 0 6 0 -25,25 z
M360,120 h-25 a25,25 0 1 -1 25,25 z
M200,200 h-25 a25,2501 025,-25 z
M280,200 h25 a25 25 0 1 7 -25 -25 z
M360,200 h-25 a25,25 0 -1 0 25,-25 z

which are your three red paths, plus two more. (I also found your lime paths in paths-data-20-f.svg.)

I haven't looked to see if I'm passing something that should fail, but all those failures look legit.

Things that aren't tested include: decimals, any form of exponent, leading plus signs and negative arc radii. (I get that negative arc radii are permitted in practice. The lack of tests is probably why. And our aim, for the time being, is to recreate what the original authors intended.)

I would have wanted to see more tests like this:

M
M0
M0,0
M0,0,0
M0,0,0,0
L0,0
M0.0.0
M0.0.0.
M,0,0 

The particular problem with flag is it requires context sensitive tokenisation. Encounter a digit and you don't know whether it's the start of a number or the start of a flag unless you know where you are in the grammar. I would have wanted to fix that, but clearly it can't be altered.

It's not an issue of character budget, either. Or, at least, it could be fixed without making pathdata longer. But, in defence of the original authors, they weren't the ones who deleted the API that we use to parse paths.

The comment about commas was an aside about human psychology. I have exactly the same flaw: I cling to the need for commas even as I see they are pointless. Compare "M1,1,1,0" and "M,,0,0" -- something is "missing" in the latter.

@tshinnic
Copy link

tshinnic commented Aug 4, 2017

I definitely agree on the basic tests. There was one goof reported against a parser - as a security/spoofing problem(!) - where an example 10000 char path string was submitted with no explanation. One was supposed to just eyeball the glitch present somewhere in the string. Depending on the font the problem might not leap out at you.

...... M 133.00,127.0O C 133.00,127.00 149.53,127.00 149.53,127.00 ......

As a treat the reporter had the following path commands display something quite rude. 😄

The report was suggesting that silently truncating SVG path strings at the first error could allow nefarious data to be introduced unnoticed. That parser was following the rules in 9.5.1. Error handling in path data. Whether that program being too coy about a detected error was wrong I couldn't say.

Implicit in the rules is how something like 12.34.56 should be handled. It should come out as 12.34 then .56. So your example of M0.0.0. should fail but M0.0.0 would pass. Just playing with my own number format tests I had 60+ tests. Translating all those into visual form ala the W3C tests would be difficult. Making parsing failures spell rude words would be extra credit.

(Still working on notes - while the interest here was in moving from ABNF to EBNF syntax "because CSS" (I thought), it is interesting that the former CSS specs never used a 'BNF' form, but only ref'd Yacc and Flex syntaxes, and now CSS 2.2 uses only prose and railroad diagrams. Maybe the future is "steam punk SVG grammar"?)

@tshinnic
Copy link

@fuchsia: Thank you. I played with your JSFiddle and then copied the parser into an ad-hoc test program that runs over a (non-representative) set of 400000 SVG 1.x paths. Your parser produces the same true/false results as one I did. Impressive. Did you derive the parser code ad-hoc or via a parser generator? (Asking because wanting to play with the nonnegative-number should be just plain number thing)

I'm going to try to find test for paths from Firefox/Chrome groups. I really want to be able to say "yes, this is backward compatible."

@tshinnic
Copy link

I would have wanted to see more tests like this:
M,0,0

You might be amused by this bug report and the helpful explanation at the comment.

"M0,0, 100,100"

vs.

"M0,0, L100,100"

Of course, Mozilla ended up allowing the invalid path anyway by adding a 'fix'. (sigh)

(Going through the Mozilla test cases...)

@longsonr
Copy link

@tshinnic your analysis of what I did is incorrect. I did not make Mozilla support invalid paths. Firefox rejects M0,0, L100,100 and allows M0,0, 100,100 per the SVG 1.1 BNF. See https://bugzilla.mozilla.org/show_bug.cgi?id=938569#c8 for more details.

@tshinnic
Copy link

@longsonr As I look now, I indeed misread which comma you were enabling. Not

    "M0,0,L100,100"
         ^^

which everyone agrees isn't in the grammar, but rather you re-enabled

    "M0,0,100,100"
         ^

where numbers for the implicit lineto commands follow directly after the moveto command. Commas between numbers are legal, and only legal between numbers.

And you added a test, /layout/reftests/svg/path-08.svg for this, so that a similar regression would not be unnoticed in the future.

I have to think I misread the fix partly because the fault was so unlikely. But the most amazing part of all this is that this test did not exist originally.

With a little looking around I can see the test was copied into the Webkit project. That is certainly a good thing. But I'm not seeing the same or similar test under the Blink project or the web-platform-tests or Github mirror. I can't guess after IE/Edge. Is it really true those projects don't have a test for this after 16 years? Is this area really so disorganized?

The meagerness of tests, the lack of rigor, for testing basic SVG things is astonishing to me. When, how do we make up for the oversight back then?

@fsoder
Copy link

fsoder commented Aug 16, 2017

When, how do we make up for the oversight back then?

Starting now would be perfectly fine - tests can be submitted to the WPT repo (it's not actually a mirror you link above - it's the source of truth - PRs welcome.)

FWIW, SVG has had a bit of a rough upbringing (not to say that other similar era, and complexity, specs had it easier.) I wasn't around in 2001 when 1ed SVG 1.1 was released, but I'm assuming the testsuite wasn't really at the top of the list of priorities. Even then it was probably one of the larger ones for a spec in those days, clocking in at 300-400 tests or so I think... For spec development beyond 1ed it was probably as easy to get WG members to write tests as it was to get them to edit the actual spec. (2ed and Tiny 1.2 had a few more tests, but not orders of magnitude more.)

From an implementer's perspective it usually comes down to resources (people and their time) - you may not have the luxury of spending all the time you want analyzing every nook and cranny of a particular section of the specification, because you need to implement another even more obscure and less well-defined part than what you're currently working on. So occasionally an edge error-case is missed in test coverage (more often than not I'd say even.) So if this particular case stuck out like a sore thumb to you, that's great! That doesn't mean that everyone else would think of it right away - especially those tasked with implementing a 300+-page specification on the clock. I take it you'd be equally astounded to hear about the number of browser tests that come to be via bugs (like in the Gecko bug referenced above.) Also, for a long time this whole "sharing tests" thingamajib (WPT) was really unheard of, which probably ended up hurting interop immensely (for SVG and everything), but things were different back then in some regards.
For path parsing tests in Presto (which I used to work on) I think we used our unit-test framework, and Blink does that too although there's likely some stray path data test in LayoutTests/.

So to loop back to the question at hand: WPT is currently best way of fixing this. Submit tests (the more the merrier.)

@fuchsia
Copy link
Author

fuchsia commented Aug 18, 2017

@tshinnic Yeah, it took a while to see the 'O'.

Actually, railroad diagrams aren't normative for CSS. (See CSS Syntax S4.1) It's the step-by-step algorithms that are. I'd be happy to follow either path. (Railroad diagrams are isomorphic to an EBNF which has had character ranges and the + quantifier removed. And SVG 1.1 already has additional algorithmic constraints to handle the "0.0.0" case, and will need more for 2.0 ) But, for the moment, as always, we're trying to fix what's in the extant spec so we stick with what's there. *sigh*

@fsoder, Yeah, my production code is equally undertested. Same reasons. More so, because I've been spending time on this. ;)

I need to create a separate issue for this, but while you're here, how does Firefox handle large numbers? Chrome appears to use single precision with weirdness. It will draw "M0 0 1E37 0 0 300 Z" (fiddle), it doesn't draw or issue an error for "M0 0 1E38 0 0 300 Z" (fiddle) and it throws on "M0 0 1E39 0 0 300 Z" (fiddle) with the unhelpful error Expected number, "M0 0 1E39 0 0 300 Z". It will draw up to the error, as per error handling rules.

But I've just looked at those in Firefox and I don't see errors or drawing. It would be nice to get this standardised.

All One idea I had to simplify testing was to propose a read only property that returned the rendered path, possibly in a normal form. Then you could do path.setAttribute("d","M0,0,L0,0") and check for an exception and that the rendered path was "M 0 0". Blessing a normal form would start the decades long path to doing away with crazy syntax. It would also give webdevs something easy to parse.

Correction

Previously, the 1.1 grammar had:

comma-wsp ::= (wsp+ (comma wsp*)) | (comma? wsp*)
comma ::= ","

So comma-wsp was entirely optional, even though it's supposed to be mandatory. It's now:

comma-wsp    ::= wsp+ delimiter? | delimiter
delimiter    ::= "," wsp*

I can't actually see a way to exploit this, because, if it was omitted in the one place's its mandatory (before the first flag on the elliptical arc), your flag would be consumed as part of the preceding number. That's how crazy this grammar is.

I've also removed some excess brackets (having finally found the precedence rules for EBNF 😳). I have the 2.0 grammar working, but that will have to wait.

@fuchsia
Copy link
Author

fuchsia commented Aug 28, 2017

Step 2: new SVG 2 grammar

svg_path                        ::= wsp* moveto wsp* (command wsp*)*

command                         ::= closepath
                                    | moveto
                                    | lineto
                                    | horizontal_lineto
                                    | vertical_lineto
                                    | curveto
                                    | smooth_curveto
                                    | quadratic_bezier_curveto
                                    | smooth_quadratic_bezier_curveto
                                    | elliptical_arc
                                    | bearing

closepath                       ::= [Zz]
moveto                          ::= [Mm] wsp* moveto_argument (delimiter? lineto_argument)*
lineto                          ::= [Ll] wsp* coordpair_singlet_sequence
horizontal_lineto               ::= [Hh] wsp* number_sequence
vertical_lineto                 ::= [Vv] wsp* number_sequence
curveto                         ::= [Cc] wsp* coordpair_triplet_sequence
smooth_curveto                  ::= [Ss] wsp* coordpair_doublet_sequence
quadratic_bezier_curveto        ::= [Qq] wsp* coordpair_doublet_sequence
smooth_quadratic_bezier_curveto ::= [Tt] wsp* coordpair_singlet_sequence
elliptical_arc                  ::= [Aa] wsp* arc_argument_sequence
bearing                         ::= [Bb] wsp* number_sequence


number_sequence                 ::= number (delimiter? number)*
coordpair_singlet_sequence      ::= coordpair (delimiter? coordpair)*
                                    | closepath
coordpair_doublet_sequence      ::= coordpair_doublet (delimiter? coordpair_doublet)*
                                    (delimiter? incomplete_coordpair_doublet)?
                                    | incomplete_coordpair_doublet
                                    | closepath
coordpair_triplet_sequence      ::= coordpair_triplet (delimiter? coordpair_triplet)*
                                    (delimiter? incomplete_coordpair_triplet)?
                                    | incomplete_coordpair_triplet
                                    | closepath
arc_argument_sequence           ::= arc_argument (delimiter? arc_argument)*
                                    (delimiter? incomplete_arc_argument)?
                                    | incomplete_arc_argument


moveto_argument                 ::= coordpair
lineto_argument                 ::= coordpair
coordpair                       ::= number delimiter? number

coordpair_doublet               ::= coordpair delimiter? coordpair
incomplete_coordpair_doublet    ::= coordpair wsp* closepath

coordpair_triplet               ::= coordpair delimiter? coordpair delimiter? coordpair
incomplete_coordpair_triplet    ::= ( coordpair_doublet | coordpair ) wsp* closepath

arc_argument                    ::= number delimiter? number delimiter? number delimiter
                                    flag delimiter? flag delimiter? coordpair
incomplete_arc_argument         ::= number delimiter? number delimiter? number delimiter
                                    flag delimiter? flag wsp* closepath

delimiter                       ::= wsp+ comma_wsp? | comma_wsp
comma_wsp                       ::= "," wsp*
flag                            ::= [01]
number                          ::= sign? fraction exponent?
fraction                        ::= digits ( dot digits? )? | dot digits
exponent                        ::= [Ee] sign? digits
sign                            ::= "+" | "-"
digits                          ::= [0-9]+
dot                             ::= "."
wsp                             ::= [#x9#xA#xD#x20]
  • I've followed the 2.0 EBNF in using underscores, rather than hyphens, to divide identifiers.

  • I have zero idea what's going on with svg_path in the extant 2.0 grammar, where it's defined as:

    svg_path::= wsp* moveto? (moveto drawto_command*)?
    
    1. I can't understand why it begins with an optional moveto command and then has a mandatory one in the bracket. The optional one seems redundant, so I've eliminated it. But maybe I've missed something.
    2. There's no support for trailing whitespace, so "M0,0 " is invalid.
    3. And the spaces in "M0,0 L1,1" and "M0,0L1,1 T2,2" are illegal as there's no trailing or leading whitespace on the drawto_command,

    I've fixed all that and renamed drawto_command to command as it no longer excludes moveto, as was the case in 1.1.

  • While we're on the subject of whitespace, the existing 2.0 grammar doesn't allow whitespace between most coordinates and the final closepath; e.g. M0,0C0,0 Z is illegal.

    The exception is arcs, where a "closing" command can have spaces and commas before the "Z"; e.g. M0,0 A0,0,0,0,0,Z.

    I've settled on allowing whitespace only before the Z. That seems (to me) most consistent with established practice. That said, it makes the EBNF more complicated.

  • And on the subject of arcs and commas, the existing 2.0 grammar doesn't allow commas before the closing arc comamd. So the comma between the 17 and 21 in M0,0 A11 12 13 1 1 16 17,21 22 23 1 1 26 27 is currently illegal. Obviously, I've fixed that.

  • I've not given each command its own argument type. The existing 2.0 grammar doesn't do it and it was making the grammar convoluted. Moreover, as a dev, I just want to be able to read out of the grammar the number arguments for each command, and this approach facilitates that.

    Exceptions are made for elliptical arcs, because they have a bespoke argument type, and moveto, because I wanted to stress that the first coordinate is a moveto and the rest are lineto.

  • In the same spirit of simplification, coordinate and coordinate_sequence have been eliminated in favour of number and number_sequence.

  • When I started trying to unravel the grammar I found myself confusing coordinate_pair_sequence and coordinate_sequence. The preceding changes may have helped with that. But I've also fused coordinate_pair into coordpair -- making it a single "type" that's modified by its suffixes, and I've introduced the term "singlet" to regularise and further eliminate confusion. (I toyed with calling things coordpair_2-tuple_list So you got off lightly.)

  • I switched comma_wsp for delimiter. I think it's quicker to absorb a single word than a phrase. It also makes it less easy to confuse with wsp.

  • I've shortened elliptical_arc_xxx to arc_xxx, retaining only the top level elliptical_arc to keep us in sync with the text. I did this because these strings were getting tooooo loooonnnnggggg. According to wikipedia, all our curves (when not flat) are arcs, but I imagine people know what we mean.

    (quadratic_bezier_curveto is another name I would love to shorten. When it's a cubic bezier well call it curveto. So why do we stress the "bezierness" of quadratic beziers? Couldn't we just call them quadratic_curveto? But that's out of scope as it's a problem in the main text.)

  • I've included the closepath only where is has to be present to terminate the sequence. So for example, it's not on moveto because MZ is illegal and M0,0Z and M0,0,1,1Z are permitted without special magic. (The Z is just a regular closepath command.) Whereas M0,0LZ has to be specially coded into the grammar otherwise the user could write M0,0 L M1,1 (Aside: there's no reason we couldn't allow this -- defaulting the missing arguments and inferring a closed path when it happens.)

  • And a related insight: we could allow every command to have an optional closepath and do away with it at the top level . I think this would make the grammar simpler but, as it would bury the existance of closepath, I've stuck with the 1.1 precedent.

  • And, on the technical front, I think this grammar is non-ambiguous. (It's very easy to create a grammar where one string can match two different productions.) In particular, the |-separated clauses are genuine disjunctions: if one matches it's certain no other will. (There have been times when that wasn't the case and my parser had to test every branch and pick the longest.)

  • However, the grammar could be written more compactly. For example, coordpair_doublet_sequence could be phrased:

    coordpair_doublet_sequence   ::= ( coordpair_doublet delimiter?)*
                                     ( coordpair_doublet |  incomplete_coorpair_doublet ) 
                                     | closepath
    incomplete_coordpair_doublet ::= coordpair wsp* closepath
    

    I don't think the above is ambiguous. (The regex /(\d*)(\d)/.exec( "1" ) will put "1" in the second capture.) But my parser can't handle it (it assigns all the coordpair_doublets to the first bracket and leaves none for the second) and I'm unwilling to use this form until I've added it to my EBNF parser and had it check there are no ambiguities. However that does leave me guilty of letting implementation decisions shape the grammar, much as was happening with 1.1. So it goes.

@fuchsia
Copy link
Author

fuchsia commented Sep 8, 2017

@AmeliaBR Nobody is objecting, so can we have a go at merging this into the Editors draft? If so, what do I do - fork, hack a text file and make a pull request? (Bear in mind I'm noob at this and at git.)

@svgeesus
Copy link
Contributor

@fuchsia Yes, please do make a PR with the improved grammar, and link to this issue in your commit message. You would edit paths.html.

fuchsia added a commit to fuchsia/svgwg that referenced this issue Sep 16, 2017
@fuchsia
Copy link
Author

fuchsia commented Sep 16, 2017

@svgeesus It's done, pending IPR approval.

I see the the committee has broken out the path spec into a separate standard, and added Catmull Rom curves. (Both of which I approve of, especially the Catmull Rom.) Do you want me to add Catmull Rom to the grammar and do a pull request on that?

@AmeliaBR
Copy link
Contributor

@fuchsia The Catmull-Rom definition in SVG Paths is definitely still preliminary, and problematic (it creates a gap in the path on either end of the spline curve). So I wouldn't spend too much time on it.

That said, if you are able to add it in as currently defined and create a PR updating the rest of the grammar in that spec to match your update to SVG 2, it would make it easier to compare the differences and make corrections in the future.

@AmeliaBR
Copy link
Contributor

PS, you could add that as an additional commit to the same PR #346, instead of making a separate PR. That way, we can merge them both at the same time.

@fuchsia
Copy link
Author

fuchsia commented Sep 22, 2017

Yeah, that's an "interesting" way to define Catmull-Rom curves. Still, I've added them, as is, to the PR. (Not that I would know how to make a separate PR, even if I wanted to.)

The only differences with the above are the extra productions:

catmull_rom                     ::= [Rr] wsp* catmull_rom_argument 
catmull_rom_argument            ::= coordpair_triplet (delimiter? coordpair)* `

And the addition of a catmull_rom clause to command. Hopefully that's correct -- it passes some rudimentary tests.

State dump of my brain:

  • The existing grammar uses hyphens in the name. I've normalised them to underscores.
  • I've not used the _sequence suffix for catmull_rom_argument because it's a single, variable-length argument, rather than a repeated series of fixed-length arguments. (And, it's shorter.)
  • The existing grammar doesn't allow whitespace or commas between coordinate pairs, so the spaces in "M0,0R1,1 2,1 3,0" are illegal as are the commas in "M0 0 R1 1,2 1,3 0". (You can actually write a curve, but only with minus signs: "M0,0 R1 1-2 1-3 0") Obviously, the above fixes that.
  • I've used coordpair_triplet because, at heart, the argument is three points that can be extended. Also, it's shorter.
  • I've not provided a closepath option, although, as currently spec'd, two points and a Z would make sense and need explicit grammar. Once you have three points you can rely on the normal closepath and handle it semantically. (If we did add an optional closepath to every catmull_rom_argument, we'd be relying on the note about greedy parsing to resolve ambiguity. For example, without greedy parsing "M0,0 R1,2 2,4, 3,3Z" could be resolved as moveto catmull_rom closepath or as moveto catmull_rom. Although, even there, if both had the same semantics, it would only be an issue for purists )

@fuchsia
Copy link
Author

fuchsia commented Apr 21, 2018

I see #385 has removed the bearing command and the magic Z that fills in missing coordinates. Do you want me to remove them from the patch?

@AmeliaBR
Copy link
Contributor

@fuchsia If you're able to do that, it would be great. (Leave them in the edits to the SVG Path module, specs/paths/master/Overview.html). I'd promised folks that I would make those edits & finally review & merge your PR when I next had a chance to devote extra time to SVG work, but I'm not sure when that will be.

@fuchsia
Copy link
Author

fuchsia commented Apr 30, 2018

Okay, I've removed the bearing commands and the magic Z from master/paths.html and have run it through my thimbleful of tests. (*cough* There were none for the bearing commands. *cough*) I think the only diff between master/paths.html and SVG 1.1 is that the former allows negative arc radii.

As an aside, if I've followed the WG's bread crumb trail correctly, then your(plural) plan seems to be to allow CSS style path() functions inside the d attribute; i.e.

<svg width="100" height="100"> 
<!-- previously: -->
<path d="M100 0,L0 100"/> 

<!-- for now: --> 
<path d="path('M0 0, L100 100')"/> 

<!-- eventually: --> 
<path d="path(M50 0, L50 100)"/> 
</svg> 

If that's the case, then FWIW I think it would be a really good idea. It may seem ugly as sin but p is not a path command, so there's no whiff of ambiguity, and the quote-free form can use CSS's sophisticated specification "machinery" while ditching the legacy missteps. I kicked around other ways of doing it at the end #331 but my ideas were as crazy as bat faeces. Sticking path() into d, freezing the string pathdata in its 1.1 form, and adding new features exclusively to the CSS seems the least worst way of escaping history (the light bulb that hovers over my head lit up as soon as I saw the idea).

@boggydigital boggydigital added this to the SVG 2.1 Working Draft milestone Jun 11, 2018
@boggydigital
Copy link
Contributor

Not blocking updated 2.0 CR publication - assigning 2.1 WD milestone

@dirkschulze
Copy link
Contributor

@ericwilligers I think you are still working on optimising the path syntax description. Assigning the issue to you for review.

dirkschulze added a commit that referenced this issue Jun 20, 2019
Align path number parsing with CSS Numbers. Fixes #331 #335
@dirkschulze
Copy link
Contributor

Fixed by #697

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants