Better clickable URL regexes #3112

pragma- · 2023-02-14T18:54:24Z

The default URL regexes leave a lot to be desired. URLs ending with ) e.g. https://en.wikipedia.org/wiki/Class_(set_theory) will not include the trailing closing ) in the clicked result. Likewise for URLs ending with - as in https://www.onlinegdb.com/fork/Jd2x8Yvh-. There also were quirky issues with URLs wrapped with parentheses or brackets or quotes.

After a lot of tweaking and experimentation, I've settled on these two regexes for URLs:

    hyperlink_rules = {
        -- Handle URLs wrapped with punctuation
        -- e.g. 'http://foo' (http://foo) <http://foo> etc
        -- the punctuation will be underlined but excluded when clicked
        {
            regex = '[[:punct:]](\\w+://\\S+)[[:punct:]]',
            format = '$1',
        },

        -- Linkify things that look like URLs
        -- and include terminating ), / or - characters, if any
        {
            regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
            format = '$0',
        },
    },

I have used these two regexes for the last couple of days on busy IRC channels and they seem to work much, much better than the default regexes. I don't profess to be an expert with regexes and I appreciate any critical feedback/improvements/suggestions.

What does everyone think?

The text was updated successfully, but these errors were encountered:

pragma- · 2023-02-14T18:55:34Z

Oh, and some websites require the closing / on URLs, e.g. some cgit repositories. The default wezterm URL regexes do not include the closing /. The above regexes do.

pragma- · 2023-02-14T18:56:53Z

Is there a way to not underline the surrounding punctuation in the first regex? I use a capture-group to capture $1 within the [[:punct:]] classes.

pragma- · 2023-02-14T19:10:59Z

Unfortunately it doesn't handle messages like:

(test if your terminal supports sixels before anything: https://github.com/csdvrx/sixel-testsuite)

It ends up making https://github.com/csdvrx/sixel-testsuite) the clickable URL, with the ) attached to the end. :-( Hopefully someone has an idea to improve this situation.

wez · 2023-02-17T19:00:02Z

Essentially a duplicate of #928
I'm open to a PR that is accompanied by a set of regexes that works well. As you've discovered, it's not as easy as it might appear at first glance!
I'm going to close this issue in favor of the other one so that there are fewer things to keep track of. You are welcome to continue iterating and sharing your findings on the other issue.

pragma- · 2023-02-17T23:10:34Z

What about a way to underline just the part that is captured? My first regex is regex = '[[:punct:]](\\w+://\\S+)[[:punct:]] which is a capture group within punct classes. Is there a way to make the underlined part be just the captured $1 region? Can the hyperlink underliner be patched to underline just the captured region instead?

wez · 2023-02-18T01:59:49Z

If you don't need the punctuation to match or be underlined, can you omit the [[:punct:]] from either end? It doesn't seem like it serves a purpose.

pragma- · 2023-02-18T16:29:25Z

The purpose is to specifically match hyperlinks that are within punctuation. It is a simple (and therefore performant) way to match things like <http://foo/> and (http://foo/) and [http://foo/] without losing the trailing /. It will even correctly extract http://foo/bar_(baz) from (http://foo/bar_(baz)).

There are two regexes in my snippet in the first comment. The [[:punct:]] regex is first specifically to handle hyperlinks that are wrapped with punctuation. The regex is very general in order to be much simpler so it can be more performant. It's intended to match hyperlinks surrounded by [] or () or <> or whatever punctuation.

If a hyperlink doesn't match this first regex, the next regex gets tested, which has a \b word-boundary to handle the beginning protocol identifier and then terminates on [)/a-zA-Z0-9-]+ to include the terminating - or ) or / in the hyperlinks, for reasons described in my previous comments.

I've been using these two regexes instead of the default regexes for the last 3 or 4 days, and it is much, much, much better for matching hyperlinks. 99.9% of the links are correctly clickable now whereas before only about 60% to 70% of the links would be clickable without needing to manually add a trailing ) or / or to manually highlight the entire link because it'd incorrectly stop at an earlier word-boundary.

I was hoping to share these with the community so they could try them out and iterate on them to improve them. They are much better than the defaults because not only do they match more hyperlinks more accurately but they're also simpler, and therefore more performant, than the defaults. There is just this one edge-case, as mentioned, that if resolved would make these regexes 100% perfect for me.

wez · 2023-02-18T16:33:30Z

Have you considered using a non-capturing group for the punctuation?
https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-in-regular-expressions

pragma- · 2023-02-18T16:34:29Z

... but the capture group isn't wrapped around the punctuation classes!

pragma- · 2023-02-18T16:40:14Z

Okay, I tried setting the regex to (?:[[:punct:]])(\\w+://\\S+)(?:[[:punct:]]) with non-capturing groups around the punctuation classes. It had no effect.

pragma- · 2023-02-18T17:12:49Z

I'd like to move this conversation to #928 and to continue it there. I've condensed my multiple comments here into one heavily edited comment on that issue.

github-actions · 2023-03-21T03:18:13Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

pragma- added the enhancement New feature or request label Feb 14, 2023

pragma- mentioned this issue Feb 14, 2023

Updated URL regexes for wezterm goyalyashpal/yafc#1

Closed

wez closed this as completed Feb 17, 2023

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better clickable URL regexes #3112

Better clickable URL regexes #3112

pragma- commented Feb 14, 2023

pragma- commented Feb 14, 2023

pragma- commented Feb 14, 2023 •

edited

pragma- commented Feb 14, 2023

wez commented Feb 17, 2023

pragma- commented Feb 17, 2023

wez commented Feb 18, 2023

pragma- commented Feb 18, 2023 •

edited

wez commented Feb 18, 2023

pragma- commented Feb 18, 2023

pragma- commented Feb 18, 2023

pragma- commented Feb 18, 2023

github-actions bot commented Mar 21, 2023

Better clickable URL regexes #3112

Better clickable URL regexes #3112

Comments

pragma- commented Feb 14, 2023

pragma- commented Feb 14, 2023

pragma- commented Feb 14, 2023 • edited

pragma- commented Feb 14, 2023

wez commented Feb 17, 2023

pragma- commented Feb 17, 2023

wez commented Feb 18, 2023

pragma- commented Feb 18, 2023 • edited

wez commented Feb 18, 2023

pragma- commented Feb 18, 2023

pragma- commented Feb 18, 2023

pragma- commented Feb 18, 2023

github-actions bot commented Mar 21, 2023

pragma- commented Feb 14, 2023 •

edited

pragma- commented Feb 18, 2023 •

edited