Various changes, works with current Pygments now. #40

birkenfeld · 2020-09-06T10:17:46Z

No description provided.

In re.UNICODE mode (which is the default on Py3), the builtin character classes \w, \W, \s, \S, \d and \D match more than just ASCII. The logic can't handle this currently, so disable the checker in that mode. Also, the way negated classes are handled (by building the full complement set) will have to change, to avoid storing millions of Unicode characters over and over.
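For illustration, the behaviour described above can be reproduced with the standard library alone. This is a minimal sketch, not code from the PR; the variable names are made up for the example.

```python
import re
import sys

# str patterns are compiled with re.UNICODE implicitly on Python 3.
unicode_by_default = bool(re.compile(r"\d").flags & re.UNICODE)

# Under Unicode matching, \d accepts any Unicode decimal digit, not only 0-9.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_hit = re.fullmatch(r"\d", arabic_three) is not None
ascii_hit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None

# \w likewise accepts non-ASCII letters unless re.ASCII is set.
word = "na\u00efve"
word_unicode = re.fullmatch(r"\w+", word) is not None
word_ascii = re.fullmatch(r"\w+", word, re.ASCII) is not None

# Rough size of a fully materialized complement set for an ASCII-only \d:
# every code point except the ten digits.
codepoints = sys.maxunicode + 1
complement_of_ascii_digits = codepoints - 10
```

The last two values show why building the complement set explicitly gets expensive: each negated class would hold over a million code points.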

thatch · 2020-09-07T17:51:02Z

regexlint/checkers.py

-def check_wide_unicode(reg, errs):
-    num = '121'
-    level = logging.WARNING
-    msg = 'Wide unicode causes problems in narrow builds'


This code checked that Pygments patterns that needed unirange actually used it. It looks like this isn't necessary anymore after PEP 393, so the unirange in pygments/util.py can also go away.

Yep, that's already gone.
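As a quick sanity check of the PEP 393 point, the following sketch (not part of the PR; the names are illustrative) shows that on Python 3.3+ an astral character is a single code point and can sit directly inside a character class, which is what the old narrow-build workaround was for.

```python
import re
import sys

astral = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# Since Python 3.3 every build reports the full code point range and
# treats an astral character as one code point, not a surrogate pair.
full_range = sys.maxunicode == 0x10FFFF
single_code_point = len(astral) == 1

# A character class can therefore span astral code points directly.
astral_class = re.compile("[\U0001D100-\U0001D1FF]")
class_matches = astral_class.fullmatch(astral) is not None
```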

thatch · 2020-09-07T17:54:01Z

regexlint/checkers.py

 def check_charclass_simplify(reg, errs):
    num = '123'
    level = logging.WARNING
    msg = 'Regex can be written more simply: %s -> %s'

-    if any(ord(c) > 255 for c in reg.raw):
+    if any(ord(c) > 255 for c in reg.raw) or reg.effective_flags & re.UNICODE:


This is ok; I'll follow up with a more permanent fix before cutting a release.

thatch · 2020-09-07T17:57:39Z

regexlint/checkers.py

+    # should be using default().
+    if not isinstance(raw_pat[1], Token.__class__):
+        return
+    if raw_pat[0] != '' and len(raw_pat) > 2:


Could you include a couple of lines of comment? I don't follow this change. Is this for the callback functions mentioned a few lines above?

It's for patterns that effect a state change with a zero-width match, something like `(r"(?=xyz)", Text, "#pop")`. I don't think there's a better way to write it, and I don't think it's harmful, since it will not lead to infinite looping in the same state.

I'll move the comments around a bit and clarify them.
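To make the argument concrete, here is a simplified stand-in for a state-based lexer loop. It is not Pygments' `RegexLexer` and not regexlint code; `RULES` and `tokenize` are hypothetical names for this sketch. It shows a zero-width lookahead rule that still makes progress because it pops the state.

```python
import re

# Hypothetical rule table: (regex, token name, state action or None).
RULES = {
    "root": [
        (r"\[", "Punctuation", "inner"),
        (r"\]", "Punctuation", None),
        (r"[^\[\]]+", "Text", None),
    ],
    "inner": [
        (r"[a-z]+", "Name", None),
        # Zero-width lookahead: leave the state without consuming "]".
        (r"(?=\])", "Text", "#pop"),
    ],
}


def tokenize(text):
    stack = ["root"]
    pos = 0
    out = []
    while pos < len(text):
        for pattern, token, action in RULES[stack[-1]]:
            m = re.compile(pattern).match(text, pos)
            if m is None:
                continue
            if m.group():  # skip empty matches in the output
                out.append((token, m.group()))
            pos = m.end()
            if action == "#pop":
                stack.pop()
            elif action is not None:
                stack.append(action)
            break
        else:
            raise ValueError(f"no rule matched at position {pos}")
    return out


tokens = tokenize("x[ab]y")
```

The empty match at `]` consumes nothing, but the next iteration runs in the `root` state, where `\]` matches and the position advances. Without the `#pop`, the same rule would match again at the same position forever, which is the case the checker should still flag.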

birkenfeld · 2020-09-07T18:25:14Z

Hi Tim, nice to hear from you! This PR is not very focused, as it contains all the things I fixed and changed while working on integrating regexlint in the Pygments CI workflow. But I hope most commit messages are clear enough :)

thatch · 2020-09-16T03:36:26Z

Just pushed a 2.0; I tried and abandoned a fix for unicode charclass in time.

birkenfeld added 16 commits September 6, 2020 11:34

Update to Python 3.3+ only

225cf5a

All builds are now equivalent to previous "wide unicode".

Remove unneeded future imports

3e6d60f

Port tests to py.test

f6b252b

Allow escaping "^" in char classes

6cda599

Add "verbose" option (off by default) for the "OK" messages

7fa581d

Improve the "matches empty regex" checker

4660796

Allow non-empty regexes that match the empty string if there is a state transition associated. These cases cannot use `default` since they usually contain an assertion.

The Python 3.3+ regex parser handles \u and \U

1053598

Do not check for (?P=) capture groups without bygroups()

b6a35b0

Ignore W123 on words() generated regexes

56d41ac

We don't need words() to generate optimal regexes; that would make it too complex.

Fix determination of a lexer's module file

6ae7a4a

Fix message for bygroups() gap check

600be7a

Return exit status 1 if errors found

c90f449

Remove six requirement

9b3b18f

Fix compatibility with Python 3.8

8c7a97a

Change output format to match compiler errors

8cb8db5

Start with file:line: and then the further details. This lets us automatically jump to these from IDEs.
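A location-first message of the kind described in the last commit might look like the sketch below. The exact fields and separators regexlint emits are not shown in this thread, so `format_finding`, its parameters, and the sample path are assumptions made purely for illustration.

```python
def format_finding(path, line, code, message):
    """Hypothetical formatter: location first, then the details."""
    return f"{path}:{line}: {code}: {message}"


# Made-up path and line number, only to show the shape of the output.
example = format_finding(
    "lexers/example.py", 42, "W123", "Regex can be written more simply"
)
```

Many editors and IDEs recognise a leading `path:line:` prefix, as produced by compilers, and turn it into a clickable jump target.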

thatch reviewed Sep 7, 2020


Clarify comments for manual_check_for_empty_string_match

3d7426a

thatch merged commit d948c7e into thatch:master Sep 7, 2020

birkenfeld mentioned this pull request Feb 14, 2023

Add X++ support pygments/pygments#2339

Merged
