
python.gram: reflect changes in cpython #41

Merged: 19 commits, May 13, 2022

Conversation

@MatthieuDartiailh (Collaborator)

See python/cpython@e5f13ce

I will try to add validation of the line and offset values in the tests, but maybe not before next week.
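
As an illustration of what that validation could look like, here is a hypothetical sketch (not code from this PR; the helper name and the use of pytest are assumptions): the reference line and offset values can be taken from the SyntaxError that CPython's own compile() raises, and a generated parser's error can then be compared against them.

```python
# Hypothetical sketch: collect the line/offset values CPython itself reports
# for a syntax error. Requires Python 3.10+ for end_lineno/end_offset.
import pytest  # assumed test runner, matching the project's test suite


def cpython_error_location(source: str) -> tuple:
    """Compile `source` with CPython and return the reported error location."""
    with pytest.raises(SyntaxError) as excinfo:
        compile(source, "<string>", "exec")
    err = excinfo.value
    return (err.lineno, err.offset, err.end_lineno, err.end_offset)


def test_error_location_is_reported() -> None:
    lineno, offset, end_lineno, end_offset = cpython_error_location("1 +\n")
    assert lineno == 1
    assert offset is not None and end_offset is not None
```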

@pablogsal (Contributor)

@MatthieuDartiailh You need to rebase this branch on top of main now that #60 has landed.

@MatthieuDartiailh (Collaborator, Author)

Will do, yes, and I need to address the question of the tests too.

@MatthieuDartiailh (Collaborator, Author)

All tests pass locally on 3.10.0. I will try to fix the three broken tests when I get a chance, but I would already appreciate a review.

@MatthieuDartiailh (Collaborator, Author)

The three failing tests are related to this rule:

| !(NAME STRING | SOFT_KEYWORD) a=disjunction b=expression_without_invalid {
        _PyPegen_check_legacy_stmt(p, a) ? NULL : p->tokens[p->mark-1]->level == 0 ? NULL :
        RAISE_SYNTAX_ERROR_KNOWN_RANGE(a, b, "invalid syntax. Perhaps you forgot a comma?") }

but tokenize.py does not track the level, so emulating this will be tricky.

@pablogsal (Contributor)

> The three failing tests are related to this rule [...] but tokenize.py does not track the level, so emulating this will be tricky.

Yeah, we can ignore it for the time being. We could try to reformulate it using existing information, or remove the restriction for now and let it be noisier.

The underlying issue is not fixed since we do not have access to the right information.
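
For reference, one way to approximate the missing information on the Python side would be to reconstruct the bracket nesting depth from the tokens that tokenize yields. This is only a hypothetical sketch (not part of this PR, and still different from the C tokenizer's tok->level bookkeeping):

```python
# Hypothetical sketch: approximate the C tokenizer's `level` (open-bracket
# nesting depth) from tokenize output, since tokenize.py does not expose it.
import io
import tokenize

_OPENERS = {"(", "[", "{"}
_CLOSERS = {")", "]", "}"}


def tokens_with_level(source: str):
    """Yield (token, level) pairs, where `level` is the bracket nesting depth
    in effect at that token."""
    level = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string in _CLOSERS:
            level = max(level - 1, 0)
        yield tok, level
        if tok.type == tokenize.OP and tok.string in _OPENERS:
            level += 1


# Example: in "f(a b)" the name `b` sits at level 1 (inside the parentheses),
# which is the case in which the rule above raises
# "invalid syntax. Perhaps you forgot a comma?"; at level 0 it stays silent.
for tok, level in tokens_with_level("f(a b)\n"):
    print(level, tokenize.tok_name[tok.type], tok.string)
```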

@MatthieuDartiailh (Collaborator, Author)

Once this goes in, I will work on adding the latest improvements to syntax errors:
bpo45716
bpo45764
bpo45727
bpo45450
bpo46836

@pablogsal (Contributor)

Is this now ready for review?

@MatthieuDartiailh (Collaborator, Author)

Yes

@MatthieuDartiailh (Collaborator, Author)

ping @pablogsal @lysnikolaou

Would it be possible to get a review for this?

@pablogsal (Contributor)

> ping @pablogsal @lysnikolaou
>
> Would it be possible to get a review for this?

Yeah, I will try to get to this this week. As we are close to 3.11b1, I'm getting a ton of extra work on CPython these weeks, so I am a bit overwhelmed.

Apologies for the delay :(

@@ -171,59 +189,86 @@ class Parser(Parser):
f"(line {node.lineno})."
)

def get_invalid_target(self, target: Target, node: Optional[ast.AST]) -> Optional[ast.AST]:
Contributor:

Note to self: this mirrors _PyPegen_get_invalid_target
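
For readers who do not know the C helper: _PyPegen_get_invalid_target walks a would-be assignment or deletion target and returns the first sub-expression that cannot be a target, so the grammar action can point the error at it. A rough, hypothetical sketch of that idea (not the PR's implementation; the method in the diff also takes a Target kind argument, omitted here):

```python
import ast
from typing import Optional


def get_invalid_target(node: Optional[ast.AST]) -> Optional[ast.AST]:
    """Return the first sub-expression of `node` that cannot be assigned to,
    or None if the whole expression is a valid target (illustrative only)."""
    if node is None:
        return None
    # Containers: look for an invalid element inside them.
    if isinstance(node, (ast.Tuple, ast.List)):
        for elt in node.elts:
            invalid = get_invalid_target(elt)
            if invalid is not None:
                return invalid
        return None
    if isinstance(node, ast.Starred):
        return get_invalid_target(node.value)
    # Names, attributes and subscripts are assignable; anything else is not.
    if isinstance(node, (ast.Name, ast.Attribute, ast.Subscript)):
        return None
    return node


# Example: in "(a, 1) = value" the offending sub-expression is the constant 1.
target = ast.parse("(a, 1)", mode="eval").body
print(ast.dump(get_invalid_target(target)))
```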

From the diff under review:

        raise self._build_syntax_error(message, start, end)

    def make_syntax_error(self, message: str) -> None:
        return self._build_syntax_error(message)

    def raise_syntax_error(self, message: str) -> None:
    def expect_forced(self, res: Any, expectation: str) -> Optional[tokenize.TokenInfo]:
@pablogsal (Contributor), Apr 18, 2022:

Where are we using this method? IIRC forced tokens already work

def test_forced() -> None:

Collaborator Author:

This is needed to get the right error location. CPython reports equal start and end positions for a forced token, which is not what we do in the default implementation. The default parser only has make_syntax_error, which queries the last token and uses its start and end; that is reasonable in general but not in this special case.
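
Concretely, "equal start and end" means the error pins the caret to a single column instead of a range. A hypothetical illustration (not the PR's expect_forced code) of building such a SyntaxError:

```python
# Illustrative only: a SyntaxError whose start and end offsets coincide,
# mirroring how CPython reports a missing forced token such as ':'.
def forced_token_error(filename: str, line: str, lineno: int, col: int,
                       expectation: str) -> SyntaxError:
    err = SyntaxError(f"expected {expectation!r}")
    err.filename = filename
    err.text = line
    err.lineno = err.end_lineno = lineno
    # Offsets are 1-based columns; start == end points at a single column.
    err.offset = err.end_offset = col
    return err


err = forced_token_error("<string>", "while True\n", 1, 11, ":")
print(err.lineno, err.offset, err.end_lineno, err.end_offset)  # 1 11 1 11
```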

Contributor:

Do we have tests covering this difference? Maybe we should add some to test_pegen.py to be explicit about it.

Collaborator Author:

We have 4 tests in test_syntax_error_handling.py that fail if I comment this out.

@MatthieuDartiailh (Collaborator, Author)

ping @pablogsal

@pablogsal (Contributor) left a review:

LGTM

Thanks a lot, @MatthieuDartiailh, for the patience and for the fantastic work. I know how much work this takes, and I wanted to highlight how awesome it is that you dedicated so much effort to getting parity with the latest changes.

I apologize for the time this has been lying around, but the release of 3.11 is proving to be challenging 😅

@pablogsal merged commit 995c737 into we-like-parsers:main on May 13, 2022.
@MatthieuDartiailh deleted the generator-call-error branch on May 13, 2022 at 17:41.
@MatthieuDartiailh (Collaborator, Author)

Thanks @pablogsal!

#64 should be quite easy to review and add next. My other two PRs require some more discussion.
