Use re.fullmatch #750

Open
karlcz opened this issue Nov 23, 2022 · 5 comments

karlcz commented Nov 23, 2022

We have an app where we have been setting global flags in the URL pattern with a prefix like (?i) or (?s). It has worked well until recently with Python 3.11 where the re module throws an error that the global flags must be set at the beginning of the pattern. I tracked this down to the fact that webpy is blindly modifying the pattern with anchors like ^ pattern \Z and the re module no longer tolerates the anchor (or any other pattern text) coming before the flags.

This can be fixed in application.py by replacing this call:

result = utils.re_compile(r'^%s\Z' % (pat,)).match(value)

with the equivalent:

result = utils.re_compile(pat).fullmatch(value)

This allows legal patterns with global flags to be passed through successfully, without adding any complexity to parse or split the pattern and insert the anchors after the flags.
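To illustrate the point, here is a minimal sketch using only the standard re module. The helper name match_whole and the route pattern are made up for the example and are not part of web.py.

```python
import re

def match_whole(pat, value):
    """Hypothetical helper: match only if pat covers all of value."""
    # No anchors are concatenated, so leading flags like (?i) stay first.
    return re.compile(pat).fullmatch(value)

route = r"(?i)/users/(\d+)"  # illustrative pattern, not taken from web.py
hit = match_whole(route, "/USERS/42")
miss = match_whole(route, "/users/42/edit")
print(hit.group(1) if hit else None, miss)

# With a top-level alternation, concatenated anchors bind to the outer
# branches only, so the two approaches can differ.
loose = re.compile(r"^%s\Z" % "a|ab").match("axyz")  # matches just "a"
strict = re.compile("a|ab").fullmatch("axyz")        # no full match
print(bool(loose), strict)
```

One thing to be aware of: for patterns with an unparenthesized top-level alternation, fullmatch is stricter than the concatenated form, which is probably the behavior users expected anyway.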

I also see two other calls with the re_subm utility function. These look like:

what, results = utils.re_subm(r"^%s\Z" % (pat,), what, value)
...
what, results = utils.re_subm("^" + pat + "$", what, value)

Since sub has no variant comparable to fullmatch that requires a full match without adding anchors to the pattern, you'd need something uglier to allow flags here. For example, a helper can insert the anchors after any leading flags so that they don't break the pattern:

def re_set_anchors(pat, start='^', end='$'):
    """Return pattern modified with specified start and end anchors.

    :param pat: the original pattern to be augmented
    :param start: the desired starting anchor or None
    :param end: the desired ending anchor or None

    Returns the augmented pattern while preserving any prefixed global
    flags. Also detects and strips existing anchors.
    """
    start = start if start else ''
    end = end if end else ''
    # group 1: leading global flag groups such as (?i)(?s); group 3: pattern body
    metapat = r'^((\(\?[aiLmsux]+\))*)\^?(.*?)(\$|\\Z)?$'
    # A callable replacement keeps an anchor like \Z from being parsed as a
    # template escape, which re.sub would reject.
    return re_compile(metapat).sub(
        lambda m: m.group(1) + start + m.group(3) + end, pat)
...
what, results = utils.re_subm(re_set_anchors(pat), what, value)
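For anyone who wants to try the idea outside web.py, here is a self-contained variant of the helper. It is a sketch under two assumptions: it calls re.compile directly instead of the utils re_compile wrapper, and it uses a callable replacement so that an end anchor such as \Z is not interpreted as a replacement-template escape.

```python
import re

_META = re.compile(r'^((\(\?[aiLmsux]+\))*)\^?(.*?)(\$|\\Z)?$')

def re_set_anchors(pat, start='^', end='$'):
    """Re-anchor pat, keeping leading global flags in front."""
    start = start or ''
    end = end or ''
    # group 1 = leading flag groups, group 3 = pattern body without anchors
    return _META.sub(lambda m: m.group(1) + start + m.group(3) + end, pat)

plain = re_set_anchors('/hello')
flagged = re_set_anchors('(?i)/hello')
already = re_set_anchors('^/hello$')
z_end = re_set_anchors(r'(?s)/a\Z', end=r'\Z')
print(plain, flagged, already, z_end)

# The re-anchored pattern compiles and still ignores case.
print(bool(re.compile(flagged).match('/HELLO')))
```

This simple metapattern does not distinguish an escaped trailing \$ from a real end anchor, so it should be treated as a starting point rather than a complete parser.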
Contributor

cclauss commented Nov 23, 2022

@cdrini Your review, please.


cdrini commented Nov 24, 2022

Ah ok, if I understand the problem correctly, in Python < 3.11 this would work:

re.compile(r'^(?i)hello').match('HELLO')
# Equivalent to:
re.compile(r'^hello', re.IGNORECASE).match('HELLO')

Now, the first example throws an error in 3.11 since (?i) is only allowed at the start.

Because web.py manually modifies regex patterns (like re.compile('^' + user_pattern)), it causes an issue in 3.11 if the user_pattern contains an inline flag (e.g. (?i)).
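A quick way to check this on a given interpreter is sketched below. The compiles helper is hypothetical and only reports whether re accepts the pattern. Warnings are silenced because older versions emit a DeprecationWarning for this case instead of an error.

```python
import re
import sys
import warnings

def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # pre-3.11 only warns here
        try:
            re.compile(pattern)
            return True
        except re.error:
            return False

user_pattern = "(?i)hello"
prefixed = "^" + user_pattern          # anchor lands before the flag
flag_first = "(?i)^hello"              # flag stays at the very start

print(sys.version_info[:2], compiles(prefixed), compiles(flag_first))
```

The second pattern is accepted on every version, while the first is only accepted before 3.11.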


cdrini commented Nov 24, 2022

The proposed solution is to change from using things like '^' + user_pattern to using the appropriate re method (e.g. fullmatch) that has the same function. This seems pretty reasonable to me? The only thing to keep an eye on is that the new regexes aren't returned by a public function anywhere, because that could cause code that uses the library to fail.


cdrini commented Nov 24, 2022

Ah but the sub case is trickier since there is no equivalent fn like fullmatch...
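One possible alternative that avoids touching the pattern at all, sketched here as an assumption rather than anything web.py does today, is to combine fullmatch with Match.expand. The helper name full_subm is made up. It mimics the (result, match) return shape of the re_subm calls quoted above and returns the input unchanged when there is no full match.

```python
import re

def full_subm(pat, repl, value):
    """Hypothetical helper: substitute only when pat matches all of value."""
    match = re.compile(pat).fullmatch(value)
    if match is None:
        return value, None
    # expand() applies the same backreference syntax that sub() uses
    return match.expand(repl), match

what, result = full_subm(r"(?i)/page/(\d+)", r"show_\1", "/PAGE/7")
print(what, result is not None)

what2, result2 = full_subm(r"(?i)/page/(\d+)", r"show_\1", "/page/7/extra")
print(what2, result2)
```

Edge cases such as a trailing newline, which $ tolerates but fullmatch does not, would need to be checked before relying on this.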


cdrini commented Nov 24, 2022

I can't think of anything else! The proposed fix for subm seems reasonable. I love how it correctly handles the case where the regex already has ^ or $!
