URL with port specifier not recognized as URL #928

dev-zero · 2021-07-07T15:15:22Z

Describe the bug

Some URLs are not recognized as URLs (e.g. not clickable).

Environment (please complete the following information):

OS: Linux Wayland (Wezterm running under XWayland)
Version: 20210502-154244-3f7122cb
The active keyboard layout name: US

To Reproduce

From the output of jupyter lab --no-browser (over SSH):

$ jupyter lab --port=9000 --no-browser
[I 2021-07-07 16:44:06.893 ServerApp] jupyterlab | extension was successfully linked.
[I 2021-07-07 16:44:07.231 ServerApp] jupyter_nbextensions_configurator | extension was found and enabled by nbclassic. Consider moving the extension to Jupyter Server's extension paths.
[I 2021-07-07 16:44:07.231 ServerApp] jupyter_nbextensions_configurator | extension was successfully linked.
[W 2021-07-07 16:44:07.231 ServerApp] The module 'ipyparallel.nbextension' could not be found. Are you sure the extension is installed?
[I 2021-07-07 16:44:07.231 ServerApp] nbclassic | extension was successfully linked.
[W 2021-07-07 16:44:07.267 ServerApp] jupyter_nbextensions_configurator | extension failed loading with message: 'nbextensions_path'
[I 2021-07-07 16:44:07.268 LabApp] JupyterLab extension loaded from /users/tiziano/.pyenv/versions/3.9.5/lib/python3.9/site-packages/jupyterlab
[I 2021-07-07 16:44:07.268 LabApp] JupyterLab application directory is /users/tiziano/.pyenv/versions/3.9.5/share/jupyter/lab
[I 2021-07-07 16:44:07.271 ServerApp] jupyterlab | extension was successfully loaded.
[I 2021-07-07 16:44:07.279 ServerApp] nbclassic | extension was successfully loaded.
[I 2021-07-07 16:44:07.281 ServerApp] Serving notebooks from local directory: /users/tiziano
[I 2021-07-07 16:44:07.281 ServerApp] Jupyter Server 1.7.0 is running at:
[I 2021-07-07 16:44:07.281 ServerApp] http://localhost:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
[I 2021-07-07 16:44:07.281 ServerApp]     http://127.0.0.1:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
[I 2021-07-07 16:44:07.281 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2021-07-07 16:44:07.287 ServerApp]

    To access the server, open this file in a browser:
        file:///users/tiziano/.local/share/jupyter/runtime/jpserver-3401-open.html    Or copy and paste one of these URLs:
        http://localhost:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
        http://127.0.0.1:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
[I 2021-07-07 16:44:27.845 LabApp] Build is up to date

The file:/// URL is correctly recognized as such and clickable, the http:// URLs are not (none of them).

The text was updated successfully, but these errors were encountered:

wez · 2021-07-07T15:20:12Z

If the built-in rules aren't matching, you can define your own: https://wezfurlong.org/wezterm/hyperlinks.html#implicit-hyperlinks
If you can nail down what's not matching, please submit a pull request to fix it: this is the default in the code:
https://github.com/wez/wezterm/blob/main/config/src/lib.rs#L1742

dev-zero · 2021-07-07T15:58:51Z

Ok, I think I see why it is not matching. How general vs costly should the general regex be?
According to this SO answer the full universal RFC compatible regex is a bit "extensive".

wez · 2021-07-07T17:17:15Z

Cheaper is better! I don't think this regex needs to be perfect, just to match the most common/useful things out of the box

Fingel · 2022-06-10T22:36:38Z

I was surprised when output from my local development server with a local address wasn't clickable. It is indeed because the built in URL matcher doesn't work with ports. I added this to my wezterm config file:

 hyperlink_rules = {
        -- Linkify things that look like URLs
        -- This is actually the default if you don't specify any hyperlink_rules
        {
          regex = "\\b\\w+://(?:[\\w.-]+)(?:(:?:\\.[a-z]{2,15}\\S*)|(?::\\d{1,5}))\\b",
          format = "$0",
        },

        -- linkify email addresses
        {
          regex = "\\b\\w+@[\\w-]+(\\.[\\w-]+)+\\b",
          format = "mailto:$0",
        },

        -- file:// URI
        {
          regex = "\\bfile://\\S*\\b",
          format = "$0",
        },
  }

The regex was taken from @matyklug18 's patch above.

pragma- · 2023-02-18T23:53:22Z

In addition to the issues described by this issue and its referenced issues, URLs ending with ) e.g. https://en.wikipedia.org/wiki/Class_(set_theory) will not include the trailing closing ) in the clicked result. Likewise for URLs ending with - as in, e.g., https://www.onlinegdb.com/fork/Jd2x8Yvh-. Some websites require the closing / on URLs, e.g. some cgit repositories.

After a lot of tweaking and experimentation, I've settled on these two regexes for URLs. I specifically tried to make the regexes as simple as possible (using \w and \S) so they would be more performant. Note that \w+:// will match http://, https://, file://, ftp://, nntp://, gopher://, irc://, etc, all with one regex without additional file://-specific, etc, regexes.

    hyperlink_rules = {
        -- First handle URLs wrapped with punctuation (i.e. brackets)
        -- e.g. [http://foo] (http://foo) <http://foo> etc
        -- the punctuation will be underlined but excluded when clicked
        {
            regex = '[[:punct:]](\\w+://\\S+)[[:punct:]]',
            format = '$1',
        },

        -- Then handle URLs not wrapped in brackets
        -- and include terminating ), / or - characters, if any
        -- these seem to be the most common trailing characters that are part of URLs
        -- there may be additional common ones. . . 
        {
            regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
            format = '$0',
        },
    },

I have tested these two regexes extensively for the last week or so on busy IRC channels and they seem to work much, much better than the default regexes. With the defaults, around 20% to 40% of hyperlinks would need manual intervention to add a missing trailing /, ) or -, or to highlight and copy the hyperlink because the default regex stopped prematurely at an earlier word-boundary. With the above two regexes, 99% of hyperlinks are clickable. As for the rare 1% of cases, it unfortunately doesn't flawlessly handle messages like:

(test if your terminal supports sixels before anything: https://github.com/csdvrx/sixel-testsuite)

It ends up making https://github.com/csdvrx/sixel-testsuite) the clickable URL, with the ) attached to the end. :-( Hopefully someone has an idea to improve this situation. Fortunately, this case is relatively rare because most people know to put a space after URLs so trailing punctuation doesn't become clickable (or to wrap the URL with brackets). MOST people! :-)

Additionally, with first regex for handling wrapped punctuation/brackets, only the captured $1 region will be sent to the browser, but wezterm will still underline the outside punctuation/brackets anyway. I'm hoping that wezterm can be patched to underline only the captured region(s) from the capture group(s).

Funami580 · 2023-02-20T03:18:39Z

I think the reason why not only the captured region is captured is because of this line:

wezterm/termwiz/src/hyperlink.rs

Line 238 in 23211fc

let c0 = self.captures.get(0).unwrap();

Capture 0 gets the entire match regardless of capture regions.

wez · 2023-03-17T01:38:38Z

re: punct version, what do you think about splitting it up into explicit regexes for the various pairs of brackets?

        {
            regex = '\\((\\w+://\\S+)\\)',
            format = '$1',
        },
        {
            regex = '\\[(\\w+://\\S+)\\]',
            format = '$1',
        },
        {
            regex = '<(\\w+://\\S+)>',
            format = '$1',
        },

refs: #928

wez · 2023-03-17T04:13:28Z

I've made some updates to incorporate your suggested rules, and added a way to select which capture group is highlighted.
https://wezfurlong.org/wezterm/config/lua/config/hyperlink_rules/ has more information about this.

It typically takes about an hour before commits are available as nightly builds for all platforms. Linux builds are the fastest to build and are often available within about 20 minutes. Windows and macOS builds take a bit longer.

Please take a few moments to try out the fix and let me know how that works out. You can find the nightly downloads for your system in the wezterm installation docs.

If you prefer to use packages provided by your distribution or package manager of choice and don't want to replace that with a nightly download, keep in mind that you can download portable packages (eg: a .dmg file on macOS, a .zip file on Windows and an .AppImage file on Linux) that can be run without permanently installing or replacing an existing package, and can then simply be deleted once you no longer need them.

If you are eager and can build from source then you may be able to try this out more quickly.

pragma- · 2023-03-17T18:47:16Z

re: punct version, what do you think about splitting it up into explicit regexes for the various pairs of brackets?

Good idea. I'd actually been considering replacing [[:punct:]] with opening/closing character classes like [(\[<] and [)\]>] because [[:punct:]] could potentially be very large, especially if it includes all of Unicode punctuations.

What I'm wondering now is whether a single regex using character classes for the three kinds of brackets, like [(\[<](\w+://\S+)[)\]>], would be more performant than three separate regexes.

If using the character classes, I suppose there is the valid concern about the opening bracket not lining up with the closing bracket. For what it's worth, for the last month or so that I've been using the [[:punct:]] variant, I haven't encountered any annoyances with the brackets. The character class variant may be accurate enough while being much more performant than three distinctive regexes, perhaps.

I'm not sure what kind of optimizations the regex crate provides. I did notice the RegexSet feature in the documentation: https://docs.rs/regex/latest/regex/#example-match-multiple-regular-expressions-simultaneously -- I'm unsure if this just iterates over each regex or if it actually optimizes/merges the regexes into one. It does say "in a single scan" ...

wez · 2023-03-17T18:50:46Z

regex set can match multiple at once, however, when doing substitutions and extracting captures, we still need to run the individual regexes. Until regex shows up near the top of a profile, I'm not concerned about the cost of a couple of more regexes.

pragma- · 2023-03-19T03:25:06Z

Just got around to trying out the nightly. The highlight option seems to solve underlining the desired captured group. Thanks!

github-actions · 2023-04-21T03:05:55Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

dev-zero added the bug Something isn't working label Jul 7, 2021

wez added enhancement New feature or request and removed bug Something isn't working labels Jul 25, 2021

matyklug18 added a commit to matyklug18/wezterm that referenced this issue Aug 18, 2021

Added support for ports in hyperlinks (fixes wez#928)

ff64bb3

wez mentioned this issue Dec 29, 2022

hyperlinks broken for IP-based domains #2903

Closed

wez mentioned this issue Jan 11, 2023

Hyperlink on files ending with .js doesnt work #2917

Closed

wez mentioned this issue Feb 17, 2023

Better clickable URL regexes #3112

Closed

wez mentioned this issue Feb 24, 2023

default config , http://127.0.0.1:8888/demo/ can't click #3147

Closed

wez added a commit that referenced this issue Mar 17, 2023

improve default hyperlink_rules. add default_hyperlink_rules()

6a9dfc4

refs: #928

wez added the fixed-in-nightly This is (or is assumed to be) fixed in the nightly builds. label Mar 17, 2023

wez closed this as completed Mar 21, 2023

github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

URL with port specifier not recognized as URL #928

URL with port specifier not recognized as URL #928

dev-zero commented Jul 7, 2021

wez commented Jul 7, 2021

dev-zero commented Jul 7, 2021

wez commented Jul 7, 2021

Fingel commented Jun 10, 2022 •

edited

pragma- commented Feb 18, 2023 •

edited

Funami580 commented Feb 20, 2023

wez commented Mar 17, 2023

wez commented Mar 17, 2023

pragma- commented Mar 17, 2023 •

edited

wez commented Mar 17, 2023

pragma- commented Mar 19, 2023

github-actions bot commented Apr 21, 2023

URL with port specifier not recognized as URL #928

URL with port specifier not recognized as URL #928

Comments

dev-zero commented Jul 7, 2021

Describe the bug

Environment (please complete the following information):

To Reproduce

wez commented Jul 7, 2021

dev-zero commented Jul 7, 2021

wez commented Jul 7, 2021

Fingel commented Jun 10, 2022 • edited

pragma- commented Feb 18, 2023 • edited

Funami580 commented Feb 20, 2023

wez commented Mar 17, 2023

wez commented Mar 17, 2023

pragma- commented Mar 17, 2023 • edited

wez commented Mar 17, 2023

pragma- commented Mar 19, 2023

github-actions bot commented Apr 21, 2023

Fingel commented Jun 10, 2022 •

edited

pragma- commented Feb 18, 2023 •

edited

pragma- commented Mar 17, 2023 •

edited