Tokens: Implement name rules #31

Hywan · 2016-07-13T22:08:53Z

Address #10.
Must be merged after #24.

Specification

https://github.com/php/php-langspec/blob/master/spec/19-grammar.md#names

Progression

It replaces the `Identifier` enumerator.

The `name` rule is mostly a renaming from `identifier`; it removes the `Identifier` computation and returns a simple `&[u8]`.

Hywan · 2016-07-13T22:13:59Z

#24 has been merged, so I have rebased the commits on master.

First, UTF-8 validation is not required. Second, the parser works with `&[u8]` so it eases the comparison with existing tokens.

These new variants (`Unqualified`, `Qualified`, `RelativeQualified`, and `FullyQualified`) better reflect the actual semantics of the code.
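As a rough illustration of how the four variants differ, here is a minimal sketch. Only the `Qualified` and `RelativeQualified` payloads appear in the diff of this PR; the payloads of `Unqualified` and `FullyQualified`, the `classify` helper, and the choice to drop the `namespace` prefix are assumptions made for the example.

```rust
/// Sketch of the `Name` enumerator. The payload types of `Unqualified`
/// and `FullyQualified` are assumptions, not taken from the PR.
#[derive(Debug, PartialEq)]
pub enum Name<'a> {
    /// `Foo`
    Unqualified(&'a [u8]),
    /// `Foo\Bar`
    Qualified(Vec<&'a [u8]>),
    /// `namespace\Foo\Bar`
    RelativeQualified(Vec<&'a [u8]>),
    /// `\Foo\Bar`
    FullyQualified(Vec<&'a [u8]>),
}

/// Hypothetical helper: classify a name that is already known to be
/// well-formed and that contains no skip tokens.
pub fn classify(input: &[u8]) -> Name<'_> {
    let mut parts: Vec<&[u8]> = input.split(|&b| b == b'\\').collect();

    if parts.len() == 1 {
        return Name::Unqualified(parts[0]);
    }

    if parts[0].is_empty() {
        // A leading backslash produces an empty first segment.
        parts.remove(0);
        return Name::FullyQualified(parts);
    }

    if parts[0].eq_ignore_ascii_case(b"namespace") {
        // Assumption: the `namespace` prefix itself is not stored.
        parts.remove(0);
        return Name::RelativeQualified(parts);
    }

    Name::Qualified(parts)
}
```

Working on `&[u8]` slices keeps the example in line with the rest of the parser, which does not require UTF-8 validation.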

This new rule replaces the `namespace` rule.

`and_not!(I -> IResult<I, O>, I -> IResult<I, P>) => I -> IResult<I, O>` returns the result of the first parser if the second fails. Both parsers run on the same input. This is handy when the first parser accepts general values and the second parser denies a particular subset of values.
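The idea behind `and_not!` can be sketched with plain functions instead of nom macros. Everything below is illustrative: `ParseResult` stands in for `nom::IResult`, and `and_not`, `name`, and `namespace_keyword` are hypothetical helpers, not the code of this PR.

```rust
/// Minimal stand-in for a parser result:
/// `Some((remaining input, output))` on success, `None` on failure.
type ParseResult<'a, O> = Option<(&'a [u8], O)>;

/// Run `deny` and `accept` on the same input. Return the result of
/// `accept` only if `deny` fails.
pub fn and_not<'a, O, P>(
    input: &'a [u8],
    accept: impl Fn(&'a [u8]) -> ParseResult<'a, O>,
    deny: impl Fn(&'a [u8]) -> ParseResult<'a, P>,
) -> ParseResult<'a, O> {
    if deny(input).is_some() {
        return None;
    }

    accept(input)
}

/// Simplified `name` parser: a non-empty run of ASCII alphanumerics
/// and underscores (the real rule is stricter).
pub fn name(input: &[u8]) -> ParseResult<'_, &[u8]> {
    let end = input
        .iter()
        .position(|b| !(b.is_ascii_alphanumeric() || *b == b'_'))
        .unwrap_or(input.len());

    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Succeeds only when the whole leading name is `namespace`.
pub fn namespace_keyword(input: &[u8]) -> ParseResult<'_, ()> {
    match name(input) {
        Some((rest, word)) if word.eq_ignore_ascii_case(b"namespace") => Some((rest, ())),
        _ => None,
    }
}
```

With these helpers, `and_not(b"Foo\\Bar", name, namespace_keyword)` yields the name `Foo`, while `and_not(b"namespace\\Bar", name, namespace_keyword)` fails because the denying parser matches.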

The following input is valid: `namespace\Foo\Bar`, while this one is not: `Foo\namespace\Bar`. `namespace` is a valid name but the `qualified_name` rule must be stricter.

Hywan · 2016-07-15T09:27:02Z

@jubianchi: Ready for a review!

Hywan · 2016-07-15T09:27:41Z

source/ast.rs

+    Qualified(Vec<&'a [u8]>),
+    /// A relative qualified name, i.e. a name in a relative namespace
+    /// restricted to the current namespace, like `namespace\Foo\Bar`.
+    RelativeQualified(Vec<&'a [u8]>),


I am not very confident about this name. I wonder whether `RestrictedQualified` would be better.
cc @nikic

The comment has been closed by GitHub. I am reopening it.

Applies the parser 0 or more times and skips the consumed data, nothing is returned. The embedded parser may return `nom::IResult::Incomplete`. This is heavily inspired by the original nom `many0` macro.
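The behaviour described above can be sketched as a plain function. This is not the macro from the PR: `ParseResult` stands in for `nom::IResult`, the `Incomplete` case is not modelled, and `skip` and `whitespace` are hypothetical names.

```rust
/// Minimal stand-in for a parser result:
/// `Some((remaining input, output))` on success, `None` on failure.
type ParseResult<'a, O> = Option<(&'a [u8], O)>;

/// Apply `parser` zero or more times and discard everything it
/// produces. Only the remaining input is returned.
pub fn skip<'a, O>(
    mut input: &'a [u8],
    parser: impl Fn(&'a [u8]) -> ParseResult<'a, O>,
) -> &'a [u8] {
    while let Some((rest, _)) = parser(input) {
        // Stop if the parser succeeded without consuming anything,
        // otherwise the loop would never end.
        if rest.len() == input.len() {
            break;
        }

        input = rest;
    }

    input
}

/// Example embedded parser: a single ASCII whitespace byte.
pub fn whitespace(input: &[u8]) -> ParseResult<'_, ()> {
    match input.first() {
        Some(b) if b.is_ascii_whitespace() => Some((&input[1..], ())),
        _ => None,
    }
}
```

Because zero repetitions are allowed, `skip` never fails: on input that does not start with whitespace it simply returns the input unchanged.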

Hywan · 2016-07-20T20:43:30Z

Correct about `token_get_all`. I have a plan for that, but it will come later. It is taken into account. Thanks for the reminder!

This macro applies the `skip` rule before its first argument, which allows skip tokens to be ignored.

The following qualified name is valid:

```php
Foo /* baz*/ \ Bar
```

It is equivalent to:

```php
Foo\Bar
```

By using the `first` macro, skip tokens can be supported.
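To make the effect of skip tokens concrete, here is a small sketch without nom. It only models whitespace and `/* … */` comments as skip tokens, does not exclude keywords, and uses hypothetical function names (`skip`, `name`, `qualified_name`) that merely mirror the rule names of this PR.

```rust
/// Drop leading whitespace and `/* … */` comments (the only skip
/// tokens modelled in this sketch).
pub fn skip(mut input: &[u8]) -> &[u8] {
    loop {
        if let Some(b) = input.first() {
            if b.is_ascii_whitespace() {
                input = &input[1..];
                continue;
            }
        }

        if input.starts_with(b"/*") {
            // Look for the closing `*/` after the opening `/*`.
            if let Some(end) = input[2..].windows(2).position(|w| w == b"*/") {
                input = &input[2 + end + 2..];
                continue;
            }
        }

        return input;
    }
}

/// Simplified `name`: a non-empty run of ASCII alphanumerics and underscores.
pub fn name(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let end = input
        .iter()
        .position(|b| !(b.is_ascii_alphanumeric() || *b == b'_'))
        .unwrap_or(input.len());

    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Parse `name (\ name)*`, skipping skip tokens before each element.
pub fn qualified_name(input: &[u8]) -> Option<Vec<&[u8]>> {
    let (mut rest, first) = name(skip(input))?;
    let mut parts = vec![first];

    loop {
        let after_skip = skip(rest);

        match after_skip.first() {
            Some(&b'\\') => {
                let (next_rest, part) = name(skip(&after_skip[1..]))?;
                parts.push(part);
                rest = next_rest;
            }
            _ => return Some(parts),
        }
    }
}
```

Both `Foo /* baz*/ \ Bar` and `Foo\Bar` produce the same two segments, `Foo` and `Bar`.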

The `first!(parser)` syntax was not tested. This patch extends existing test cases to test this particular syntax.

nikic · 2016-07-27T14:37:22Z

source/tokens.rs

    "The `YIELD` token.\n\nRepresent the generator operator, e.g. `yield …;`."
 );
 token!(
-    pub YIELD_FROM: "yield from";
+    pub YIELD_FROM: b"yield from";


Unrelated to this PR, but the whitespace between yield and from isn't necessarily a single character.

Ah. Thanks for this!

Does that include all skip tokens (whitespace, comments, etc.) or just regular spaces?

Spec says

> Note carefully that yield from is a single token that contains whitespace. However, comments are not permitted in that whitespace.

So any whitespace is fine, but no comments.
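A minimal sketch of that constraint, assuming PHP whitespace means space, tab, newline, and carriage return, and assuming the token is matched case-insensitively like other keywords. The helper names are hypothetical, and the sketch does not check that `from` ends at a token boundary.

```rust
/// Whitespace bytes accepted between `yield` and `from` (assumption:
/// space, horizontal tab, newline, carriage return).
fn is_whitespace(byte: u8) -> bool {
    matches!(byte, b' ' | b'\t' | b'\n' | b'\r')
}

/// Strip `prefix` from the start of `input`, ignoring ASCII case.
fn strip_prefix_ignore_case<'a>(input: &'a [u8], prefix: &[u8]) -> Option<&'a [u8]> {
    if input.len() >= prefix.len() && input[..prefix.len()].eq_ignore_ascii_case(prefix) {
        Some(&input[prefix.len()..])
    } else {
        None
    }
}

/// Return the remaining input if `input` starts with a `yield from`
/// token: `yield`, at least one whitespace byte, then `from`.
/// Comments between the two words are rejected.
pub fn yield_from(input: &[u8]) -> Option<&[u8]> {
    let rest = strip_prefix_ignore_case(input, b"yield")?;
    let spaces = rest.iter().take_while(|&&b| is_whitespace(b)).count();

    if spaces == 0 {
        return None;
    }

    strip_prefix_ignore_case(&rest[spaces..], b"from")
}
```

Here `yield \n\t from` is accepted, while `yield /* c */ from` is rejected because a comment is not whitespace.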

nikic · 2016-07-27T14:42:33Z

source/tokens.rs

+        )
+    }
+
+    test_keyword!(case_keyword_abstract:     (b"abstract", super::ABSTRACT));


Maybe also add some tests to make sure variations with different case work? ABSTRACT, AbStRaCt :)

A qualified name is composed of `name` rules. However, some values must be excluded from the `name` rule. Previously, only the `namespace` token was excluded, but this was wrong: all keywords must be excluded.

Hywan · 2016-07-27T14:55:47Z

Damn, the https://github.com/php/php-langspec/blob/master/spec/19-grammar.md#names Section does not cover all the constraints. Actually https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md provides much more details. I will review my PR with this Section in mind.

This macro declares a case-insensitive ASCII byte sequence to recognize. It is similar to the nom `tag!` macro, except that it is case-insensitive and only accepts ASCII characters so far.
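The matching logic of such a macro can be sketched as a plain function. This is not the macro from the PR: the `itag` function below, its signature, and the choice to return the matched input rather than the tag are assumptions.

```rust
/// Case-insensitive ASCII tag: succeed if `input` starts with `tag`,
/// ignoring ASCII case. Returns `(remaining input, matched input)`.
pub fn itag<'a>(input: &'a [u8], tag: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.len() < tag.len() {
        return None;
    }

    let (head, rest) = input.split_at(tag.len());

    if head.eq_ignore_ascii_case(tag) {
        Some((rest, head))
    } else {
        None
    }
}
```

With this sketch, `abstract`, `ABSTRACT`, and `AbStRaCt` all match the tag `b"abstract"`, which is the behaviour requested in the review above.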

This patch uses the new `itag` macro. The associated test suite is updated to test both the given value and its uppercased version.

It is an alias to the `itag` macro. The goal of this alias is twofold:

1. It avoids confusion and errors (a PHP keyword is always case-insensitive).
2. It improves the readability of parsers.

This patch ensures that tokens in qualified names are matched case-insensitively.

This is just a semantic change.

`yield from` is a keyword but the specification says:

> Note carefully that yield from is a single token that contains whitespace. However, comments are not permitted in that whitespace.

Hywan added enhancement in progress component-documentation component-test component-grammar labels Jul 13, 2016

Hywan added this to the 0.1.0 milestone Jul 13, 2016

Hywan assigned jubianchi Jul 13, 2016

Hywan added 4 commits July 14, 2016 00:13

Lexical: Move identifier to tokens.

bf06686

AST: Add the Name enumerator.

da6449f

It replaces the `Identifier` enumerator.

Tokens: Add the variable and name rules.

49bf6c4

The `name` rule is mostly a renaming from `identifier`; it removes the `Identifier` computation and returns a simple `&[u8]`.

Tokens: Add the namespace rule.

012b4b0

Hywan force-pushed the lexical_tokens_names branch from f79f472 to 012b4b0 Compare July 13, 2016 22:13

Hywan added 4 commits July 14, 2016 16:30

Tokens: Move &'static str to &'static [u8].

083dab4

First, UTF-8 validation is not required. Second, the parser works with `&[u8]` so it eases the comparison with existing tokens.

AST: New variants for Name.

3a9bb6a

These new variants (`Unqualified`, `Qualified`, `RelativeQualified`, and `FullyQualified`) better reflect the actual semantics of the code.

Tokens: The name rule is now public.

4208d9d

Tokens: Add the qualified_name rule.

c36470e

This new rule replaces the `namespace` rule.

Hywan force-pushed the lexical_tokens_names branch from efd5dbe to c36470e Compare July 14, 2016 14:39

Hywan added 4 commits July 14, 2016 19:29

Quality: Fix CS.

00d08b3

Tokens: qualified_name is stricter.

5809961

The following input is valid: `namespace\Foo\Bar`, while this one is not: `Foo\namespace\Bar`. `namespace` is a valid name but the `qualified_name` rule must be stricter.

Literal: Simplify the string rule.

9048b23

Hywan reviewed Jul 15, 2016

Hywan added 2 commits July 15, 2016 11:52

Macros: Unignore the and_not documentation test.

c23786d

Doctest: Add doctests for all Name variants.

cd55886

Hywan added 2 commits July 20, 2016 13:38

Macros: Add the skip macro.

69b3a17

Applies the parser 0 or more times and skips the consumed data, nothing is returned. The embedded parser may return `nom::IResult::Incomplete`. This is heavily inspired by the original nom `many0` macro.

Merge branch 'master' into lexical_tokens_names

9accd7c

Hywan added 6 commits July 20, 2016 22:55

Macros: Remove skip.

b289db4

Skip: Add the skip rule.

71d00bf

Macros: Add the first macro.

9eb868c

This macro applies the `skip` rule before its first argument, which allows skip tokens to be ignored.

Tokens: qualified_name support skip tokens.

77c0226

The following qualified name is valid:

```php
Foo /* baz*/ \ Bar
```

It is equivalent to:

```php
Foo\Bar
```

By using the `first` macro, skip tokens can be supported.

Test: Check all first syntaxes.

f19a80c

The `first!(parser)` syntax was not tested. This patch extends existing test cases to test this particular syntax.

Documentation: Add the Examples Section.

b38d2bb

Hywan force-pushed the lexical_tokens_names branch from 4b2cad4 to b38d2bb Compare July 20, 2016 20:55

Hywan mentioned this pull request Jul 20, 2016

isIdentifier has changed since PHP7.1 hoaproject/Consistency#12

Closed

nikic reviewed Jul 27, 2016

Tokens: Add the keywords parser.

50cd568

Hywan force-pushed the lexical_tokens_names branch from 6d0978b to 50cd568 Compare July 27, 2016 14:39

nikic reviewed Jul 27, 2016

Tokens: Exclude keywords from qualified_name.

cc8b496

A qualified name is composed of `name` rules. However, some values must be excluded from the `name` rule. Previously, only the `namespace` token was excluded, but this was wrong: all keywords must be excluded.

Hywan added 2 commits July 28, 2016 00:03

Macros: Add the itag macro.

dfdd620

This macro declares a case-insensitive ASCII byte sequence to recognize. It is similar to the nom `tag!` macro, except that it is case-insensitive and only accepts ASCII characters so far.

Tokens: Keywords are case-insensitives.

ab230d0

This patch uses the new `itag` macro. The associated test suite is updated to test both the given value and its uppercased version.

Hywan force-pushed the lexical_tokens_names branch from c5f3231 to ab230d0 Compare July 27, 2016 22:05

Hywan added 7 commits July 28, 2016 00:18

Test: Update documentation of the itag macro.

44c3f51

Macros: Add the keyword macro.

9e68cc2

It is an alias to the `itag` macro. The goal of this alias is twofold:

1. It avoids confusion and errors (a PHP keyword is always case-insensitive).
2. It improves the readability of parsers.

Test: Case-insensitivity for tokens in q. names.

7653b2d

This patch ensures that tokens in qualified names are matched case-insensitively.

Tokens: Use the keyword macro in qualified name.

a71c5b7

Tokens: Use the new keyword macro.

9b0f819

This is just a semantic change.

Tokens: Force to always inline some mappers.

e3dbecf

Tokens: yield from can contain whitespaces.

7fbcbfb

`yield from` is a keyword but the specification says:

> Note carefully that yield from is a single token that contains whitespace. However, comments are not permitted in that whitespace.

Hywan merged commit 7fbcbfb into tagua-vm:master Aug 2, 2016

Hywan removed the in progress label Aug 2, 2016
