Skip to content
This repository has been archived by the owner on May 8, 2021. It is now read-only.

Use recursive subpattern to match balanced parentheses. #596

Closed
wants to merge 1 commit into from

Conversation

MaskRay
Copy link

@MaskRay MaskRay commented May 8, 2018

This skips the trailing ) in [](http://example) while matches
https://en.wikipedia.org/wiki/Haskell_(programming_language)

Fix #594

@thestinger
Copy link
Owner

thestinger commented May 9, 2018

This doesn't look what I've requested in the past.

I think it makes sense to ignore the trailing URL in the following case, bu treating it as part of a wrapper around the URL:

(https://foobar.com/baz)

However, I don't want it to be ignored in the following

https://foobar.com/baz)

In a case like this, I don't want to make an attempt to treat the trailing parentheses as not part of the URL:

(a statement in parentheses with a URL is not something I want to try to handle: https://foobar.com/baz)

This regular expression also needs to be treated as somewhat performance sensitive.

@thestinger
Copy link
Owner

I'd also be fine with approaching cases like a URL wrapped in single quotes in the same way.

@MaskRay
Copy link
Author

MaskRay commented May 9, 2018 via email

@MaskRay
Copy link
Author

MaskRay commented May 9, 2018

atomic subgroup ?> is used for performance

@MaskRay
Copy link
Author

MaskRay commented May 26, 2018

Has this PR been rejected? If so I can close it and use my local patch.

@jelly
Copy link
Collaborator

jelly commented May 27, 2018

Are the cases brought up by @thestinger resolved in the pull request?

@MaskRay
Copy link
Author

MaskRay commented May 27, 2018

I don't understand the rationale of the cases brought up. With the PR,

(https://foobar.com/baz)

However, I don't want it to be ignored in the following

https://foobar.com/baz)

In a case like this, I don't want to make an attempt to treat the trailing parentheses as not part of the URL:

(a statement in parentheses with a URL is not something I want to try to handle: https://foobar.com/baz)

This is IMHO more intuitive and better than his proposed () being part of the URL):

unmatched ) will not be a part of the URL. If you think this is unintended, I can close the PR immediately.

@thestinger thestinger closed this May 28, 2018
@thestinger
Copy link
Owner

It's valid to have ( and ) characters in a URL and they don't need to be balanced. That's the rationale.

I'll wait for someone to submit a pull request implementing ignoring ) when ( is directly in front of the URL because that's a clear form of syntax.

@MaskRay
Copy link
Author

MaskRay commented Oct 26, 2020

I've carried this local patch for 2 years now as it is needed to mitigate #594 and #556 and many Wikipedia links.

@thestinger
Copy link
Owner

It wasn't the approach that I wanted and there isn't currently a maintainer/developer for the project.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't match unpaired ) in url_regex.hh URLPATH
3 participants