Skip to content
This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

Tokens: Implement name rules #31

Merged
merged 40 commits into from
Aug 2, 2016
Merged

Conversation

Hywan
Copy link
Member

@Hywan Hywan commented Jul 13, 2016

Address #10.
Must be merged after #24.

Specification

https://github.com/php/php-langspec/blob/master/spec/19-grammar.md#names

Progression

  • Variable,
  • Namespace,
  • Namespace name as a prefix,
  • Qualified namespace,
  • Name,
  • Name Nondigit,
  • Nondigit,
  • More tests,
  • Namespace names cannot contain a keyword:
    • Create the keyword parser.
  • yield from can (only) have whitespaces between yield and from (https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md#keywords),
  • Tokens are case insensitive:
    • Create the itag parser and macro,
    • Create the keyword macro, alias to itag,
    • Update the keywords parser,
    • Update keywords test suite,
    • Update qualified_name,
    • Update qualified_name test suite.

It replaces the `Identifier` enumerator.
The `name` rule is mostly a renaming from `identifier`; it removes the
`Identifier` computation and returns a simple `&[u8]`.
@Hywan
Copy link
Member Author

Hywan commented Jul 13, 2016

#24 has been merged, so I have rebased the commits on master.

First, UTF-8 validation is not required.
Second, the parser works with `&[u8]` so it eases the comparison with
existing tokens.
These new variants (`Unqualified`, `Qualified`, `RelativeQualified`,
and `FullyQualified`) better reflects the actual semantics of the code.
This new rule replaces the `namespace` rule.
`and_not!(I -> IResult<I, O>, I -> IResult<I, P>) => I -> IResult<I, 0>`
returns the result of the first parser if the second fails. Both parsers
run on the same input.

This is handy when the first parser accepts general values and the
second parser denies a particular subset of values.
The following input is valid: `namespace\Foo\Bar`, while this one is
not: `Foo\namespace\Bar`. `namespace` is a valid name but the
`qualified_name` rule must be stricter.
@Hywan
Copy link
Member Author

Hywan commented Jul 15, 2016

@jubianchi: Ready for a review!

Qualified(Vec<&'a [u8]>),
/// A relative qualified name, i.e. a name in a relative namespace
/// restricted to the current namespace, like `namespace\Foo\Bar`.
RelativeQualified(Vec<&'a [u8]>),
Copy link
Member Author

@Hywan Hywan Jul 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very confident with this name. I wonder if RestrictedQualified would not be better.
cc @nikic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment has been closed by Github. I am reopening it.

Applies the parser 0 or more times and skips the consumed data, nothing
is returned.

The embedded parser may return `nom::IResult::Incomplete`.

This is heavily inspired by the original nom `many0` macro.
@Hywan
Copy link
Member Author

Hywan commented Jul 20, 2016

Exact about token_get_all. I have a plan for that, but this will come later. This is taken into account. Thanks for the reminder!

This macro is applying the `skip` rule before the first argument; it
allows to skip tokens.
The following qualified name is valid:

```php
Foo
/* baz*/ \ Bar
```

It is equivalent to:

```php
Foo\Bar
```

By using the `first` macro, skip tokens can be supported.
The `first!(parser)` syntax was not tested. This patch extends existing
test cases to test this particular syntax.
"The `YIELD` token.\n\nRepresent the generator operator, e.g. `yield …;`."
);
token!(
pub YIELD_FROM: "yield from";
pub YIELD_FROM: b"yield from";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but the whitespace between yield and from isn't necessarily a single character.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Thanks for this!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it all whitespaces, comments etc. (skip tokens) or just regular space?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec says

Note carefully that yield from is a single token that contains whitespace. However, comments are not permitted in that whitespace.

So any whitespace is fine, but no comments.

)
}

test_keyword!(case_keyword_abstract: (b"abstract", super::ABSTRACT));
Copy link

@nikic nikic Jul 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add some tests to make sure variations with different case work? ABSTRACT, AbStRaCt :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

A qualified name is composed of `name` rule. However, some values must
be excluded from the `name` rule. Previously, only the `namespace` token
was excluded but this was wrong: All the keywords must be excluded.
@Hywan
Copy link
Member Author

Hywan commented Jul 27, 2016

Damn, the https://github.com/php/php-langspec/blob/master/spec/19-grammar.md#names Section does not cover all the constraints. Actually https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md provides much more details. I will review my PR with this Section in mind.

This macro declares a case-insensitive ASCII array as a suite to
recognize.

It is pretty similar to the nom `tag!` macro except it is
case-insensitive and only accepts ASCII characters so far.
This patch uses the new `itag` macro. Associated test suite is updated
by testing the given result and its uppercased version.
It is an alias to the `itag` macro.

The goal of this alias is twofold:

  1. It avoids confusion and errors (a PHP keyword is always case-insensitive),
  2. It ensures a better readability of parsers.
This patch ensures that case is insensitive for tokens in qualified
names.
This is just a semantic change.
`yield from` is a keyword but the specification says:

> Note carefully that yield from is a single token that contains
> whitespace. However, comments are not permitted in that whitespace.
@Hywan Hywan merged commit 7fbcbfb into tagua-vm:master Aug 2, 2016
@Hywan Hywan removed the in progress label Aug 2, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants