Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Url parsing mangles urls with parenthesis in regex #373

Closed
janne-tervo opened this issue Apr 29, 2021 · 8 comments
Closed

Url parsing mangles urls with parenthesis in regex #373

janne-tervo opened this issue Apr 29, 2021 · 8 comments
Labels
bug Something isn't working fix confirmation pending issue has been fixed and confirmation from issue reporter is pending

Comments

@janne-tervo
Copy link

Describe the bug
Spectacular schema generation mangles urls with parenthesis in parameter patterns

To Reproduce
Add to url patterns:
url(r'^jsi18n/(?P[a-z]{2}(-[a-z]{2})?)/$', JavaScriptCatalog.as_view()),
Attempt to generate schema fails.

Expected behavior
Url regex patterns with parenthesis are parsed correctly.

@tfranzel
Copy link
Owner

hi @janne-tervo!

url(r'^jsi18n/(?P[a-z]{2}(-[a-z]{2})?)/$', JavaScriptCatalog.as_view()),

we do not account for that i suppose. pretty sure that this is an illegal regex pattern written like that

>>> import re
>>> re.compile(r'^jsi18n/(?P[a-z]{2}(-[a-z]{2})?)/$')
re.error: unknown extension ?P[ at position 9
>>> re.compile('^jsi18n/(?P<some_name>[a-z]{2}(-[a-z]{2})?)/$')
>>> # no error

i would guess that making requests there would also fail, maybe even starting the server.

@janne-tervo
Copy link
Author

Markdown bit me here.

This gets mangled by spectacular url parsing:
url(r'^jsi18n/(?P<language>[a-z]{2}(-[a-z]{2})?)/$',

This does not get mangled:
url(r'^jsi18n/(?P<language>[^/]+)/$',

@janne-tervo
Copy link
Author

This is caused by:
plumbing.py:767 regex=re.sub(r'(?P<(\w+)>.+?)', r'(?P<\1>[^/]+)', pattern._regex),
which stops at first parentheses in variable regex.

@tfranzel
Copy link
Owner

ahh now i get it. makes sense.

@tfranzel tfranzel added the bug Something isn't working label Apr 29, 2021
tfranzel added a commit that referenced this issue Apr 30, 2021
replace still incorrect regex with parsing state machine.
regex are not cut out for parenthesis counting.
@tfranzel
Copy link
Owner

turns out this was fixed once before (#168) but it was incomplete. i chose to write a parser for it and replace the old RE.

i believe it would have been possible to construct a extended RE, but it would have been (imho) way more complicated with low readability. i hope this solution now is at least somewhat readable and more importantly correct.

can you check if this works for you now?

@tfranzel tfranzel added the fix confirmation pending issue has been fixed and confirmation from issue reporter is pending label Apr 30, 2021
@janne-tervo
Copy link
Author

This commit fixes my issue.
Alternative fix with extra dependency could be created using e.g. recursive extension from https://pypi.org/project/regex/. This problem can't be solved using python re package because it can't recognize this language.

@tfranzel
Copy link
Owner

tfranzel commented May 3, 2021

awesome!

i love the regex package and use it myself. i briefly tried but was not able to get it right and as you said, adding deps is always problematic. if my solution turns out to be suboptimal, we can still switch to regex.

@tfranzel
Copy link
Owner

tfranzel commented May 3, 2021

i'll close this issue for now. feel free to circle back if you notice any problems.

@tfranzel tfranzel closed this as completed May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fix confirmation pending issue has been fixed and confirmation from issue reporter is pending
Projects
None yet
Development

No branches or pull requests

2 participants