Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL with port specifier not recognized as URL #928

Closed
dev-zero opened this issue Jul 7, 2021 · 12 comments
Closed

URL with port specifier not recognized as URL #928

dev-zero opened this issue Jul 7, 2021 · 12 comments
Labels
enhancement New feature or request fixed-in-nightly This is (or is assumed to be) fixed in the nightly builds.

Comments

@dev-zero
Copy link

dev-zero commented Jul 7, 2021

Describe the bug

Some URLs are not recognized as URLs (e.g. not clickable).

Environment (please complete the following information):

  • OS: Linux Wayland (Wezterm running under XWayland)
  • Version: 20210502-154244-3f7122cb
  • The active keyboard layout name: US

To Reproduce

From the output of jupyter lab --no-browser (over SSH):

$ jupyter lab --port=9000 --no-browser
[I 2021-07-07 16:44:06.893 ServerApp] jupyterlab | extension was successfully linked.
[I 2021-07-07 16:44:07.231 ServerApp] jupyter_nbextensions_configurator | extension was found and enabled by nbclassic. Consider moving the extension to Jupyter Server's extension paths.
[I 2021-07-07 16:44:07.231 ServerApp] jupyter_nbextensions_configurator | extension was successfully linked.
[W 2021-07-07 16:44:07.231 ServerApp] The module 'ipyparallel.nbextension' could not be found. Are you sure the extension is installed?
[I 2021-07-07 16:44:07.231 ServerApp] nbclassic | extension was successfully linked.
[W 2021-07-07 16:44:07.267 ServerApp] jupyter_nbextensions_configurator | extension failed loading with message: 'nbextensions_path'
[I 2021-07-07 16:44:07.268 LabApp] JupyterLab extension loaded from /users/tiziano/.pyenv/versions/3.9.5/lib/python3.9/site-packages/jupyterlab
[I 2021-07-07 16:44:07.268 LabApp] JupyterLab application directory is /users/tiziano/.pyenv/versions/3.9.5/share/jupyter/lab
[I 2021-07-07 16:44:07.271 ServerApp] jupyterlab | extension was successfully loaded.
[I 2021-07-07 16:44:07.279 ServerApp] nbclassic | extension was successfully loaded.
[I 2021-07-07 16:44:07.281 ServerApp] Serving notebooks from local directory: /users/tiziano
[I 2021-07-07 16:44:07.281 ServerApp] Jupyter Server 1.7.0 is running at:
[I 2021-07-07 16:44:07.281 ServerApp] http://localhost:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
[I 2021-07-07 16:44:07.281 ServerApp]     http://127.0.0.1:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
[I 2021-07-07 16:44:07.281 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2021-07-07 16:44:07.287 ServerApp]

    To access the server, open this file in a browser:
        file:///users/tiziano/.local/share/jupyter/runtime/jpserver-3401-open.html    Or copy and paste one of these URLs:
        http://localhost:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
        http://127.0.0.1:9000/lab?token=a7a5f779e0a4f5097e4dfe4b7d9e4a6a794a30ad8ba0b6ff
[I 2021-07-07 16:44:27.845 LabApp] Build is up to date

The file:/// URL is correctly recognized as such and clickable, the http:// URLs are not (none of them).

@dev-zero dev-zero added the bug Something isn't working label Jul 7, 2021
@wez
Copy link
Owner

wez commented Jul 7, 2021

If the built-in rules aren't matching, you can define your own: https://wezfurlong.org/wezterm/hyperlinks.html#implicit-hyperlinks
If you can nail down what's not matching, please submit a pull request to fix it: this is the default in the code:
https://github.com/wez/wezterm/blob/main/config/src/lib.rs#L1742

@dev-zero
Copy link
Author

dev-zero commented Jul 7, 2021

Ok, I think I see why it is not matching. How general vs costly should the general regex be?
According to this SO answer the full universal RFC compatible regex is a bit "extensive".

@wez
Copy link
Owner

wez commented Jul 7, 2021

Cheaper is better! I don't think this regex needs to be perfect, just to match the most common/useful things out of the box

@wez wez added enhancement New feature or request and removed bug Something isn't working labels Jul 25, 2021
matyklug18 added a commit to matyklug18/wezterm that referenced this issue Aug 18, 2021
@Fingel
Copy link

Fingel commented Jun 10, 2022

I was surprised when output from my local development server with a local address wasn't clickable. It is indeed because the built in URL matcher doesn't work with ports. I added this to my wezterm config file:

 hyperlink_rules = {
        -- Linkify things that look like URLs
        -- This is actually the default if you don't specify any hyperlink_rules
        {
          regex = "\\b\\w+://(?:[\\w.-]+)(?:(:?:\\.[a-z]{2,15}\\S*)|(?::\\d{1,5}))\\b",
          format = "$0",
        },

        -- linkify email addresses
        {
          regex = "\\b\\w+@[\\w-]+(\\.[\\w-]+)+\\b",
          format = "mailto:$0",
        },

        -- file:// URI
        {
          regex = "\\bfile://\\S*\\b",
          format = "$0",
        },
  }

The regex was taken from @matyklug18 's patch above.

@pragma-
Copy link

pragma- commented Feb 18, 2023

In addition to the issues described by this issue and its referenced issues, URLs ending with ) e.g. https://en.wikipedia.org/wiki/Class_(set_theory) will not include the trailing closing ) in the clicked result. Likewise for URLs ending with - as in, e.g., https://www.onlinegdb.com/fork/Jd2x8Yvh-. Some websites require the closing / on URLs, e.g. some cgit repositories.

After a lot of tweaking and experimentation, I've settled on these two regexes for URLs. I specifically tried to make the regexes as simple as possible (using \w and \S) so they would be more performant. Note that \w+:// will match http://, https://, file://, ftp://, nntp://, gopher://, irc://, etc, all with one regex without additional file://-specific, etc, regexes.

    hyperlink_rules = {
        -- First handle URLs wrapped with punctuation (i.e. brackets)
        -- e.g. [http://foo] (http://foo) <http://foo> etc
        -- the punctuation will be underlined but excluded when clicked
        {
            regex = '[[:punct:]](\\w+://\\S+)[[:punct:]]',
            format = '$1',
        },

        -- Then handle URLs not wrapped in brackets
        -- and include terminating ), / or - characters, if any
        -- these seem to be the most common trailing characters that are part of URLs
        -- there may be additional common ones. . . 
        {
            regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
            format = '$0',
        },
    },

I have tested these two regexes extensively for the last week or so on busy IRC channels and they seem to work much, much better than the default regexes. With the defaults, around 20% to 40% of hyperlinks would need manual intervention to add a missing trailing /, ) or -, or to highlight and copy the hyperlink because the default regex stopped prematurely at an earlier word-boundary. With the above two regexes, 99% of hyperlinks are clickable. As for the rare 1% of cases, it unfortunately doesn't flawlessly handle messages like:

(test if your terminal supports sixels before anything: https://github.com/csdvrx/sixel-testsuite)

It ends up making https://github.com/csdvrx/sixel-testsuite) the clickable URL, with the ) attached to the end. :-( Hopefully someone has an idea to improve this situation. Fortunately, this case is relatively rare because most people know to put a space after URLs so trailing punctuation doesn't become clickable (or to wrap the URL with brackets). MOST people! :-)

Additionally, with first regex for handling wrapped punctuation/brackets, only the captured $1 region will be sent to the browser, but wezterm will still underline the outside punctuation/brackets anyway. I'm hoping that wezterm can be patched to underline only the captured region(s) from the capture group(s).

@Funami580
Copy link
Contributor

I think the reason why not only the captured region is captured is because of this line:

let c0 = self.captures.get(0).unwrap();

Capture 0 gets the entire match regardless of capture regions.

@wez
Copy link
Owner

wez commented Mar 17, 2023

re: punct version, what do you think about splitting it up into explicit regexes for the various pairs of brackets?

        {
            regex = '\\((\\w+://\\S+)\\)',
            format = '$1',
        },
        {
            regex = '\\[(\\w+://\\S+)\\]',
            format = '$1',
        },
        {
            regex = '<(\\w+://\\S+)>',
            format = '$1',
        },

@wez
Copy link
Owner

wez commented Mar 17, 2023

I've made some updates to incorporate your suggested rules, and added a way to select which capture group is highlighted.
https://wezfurlong.org/wezterm/config/lua/config/hyperlink_rules/ has more information about this.

It typically takes about an hour before commits are available as nightly builds for all platforms. Linux builds are the fastest to build and are often available within about 20 minutes. Windows and macOS builds take a bit longer.

Please take a few moments to try out the fix and let me know how that works out. You can find the nightly downloads for your system in the wezterm installation docs.

If you prefer to use packages provided by your distribution or package manager of choice and don't want to replace that with a nightly download, keep in mind that you can download portable packages (eg: a .dmg file on macOS, a .zip file on Windows and an .AppImage file on Linux) that can be run without permanently installing or replacing an existing package, and can then simply be deleted once you no longer need them.

If you are eager and can build from source then you may be able to try this out more quickly.

@wez wez added the fixed-in-nightly This is (or is assumed to be) fixed in the nightly builds. label Mar 17, 2023
@pragma-
Copy link

pragma- commented Mar 17, 2023

re: punct version, what do you think about splitting it up into explicit regexes for the various pairs of brackets?

Good idea. I'd actually been considering replacing [[:punct:]] with opening/closing character classes like [(\[<] and [)\]>] because [[:punct:]] could potentially be very large, especially if it includes all of Unicode punctuations.

What I'm wondering now is whether a single regex using character classes for the three kinds of brackets, like [(\[<](\w+://\S+)[)\]>], would be more performant than three separate regexes.

If using the character classes, I suppose there is the valid concern about the opening bracket not lining up with the closing bracket. For what it's worth, for the last month or so that I've been using the [[:punct:]] variant, I haven't encountered any annoyances with the brackets. The character class variant may be accurate enough while being much more performant than three distinctive regexes, perhaps.

I'm not sure what kind of optimizations the regex crate provides. I did notice the RegexSet feature in the documentation: https://docs.rs/regex/latest/regex/#example-match-multiple-regular-expressions-simultaneously -- I'm unsure if this just iterates over each regex or if it actually optimizes/merges the regexes into one. It does say "in a single scan" ...

@wez
Copy link
Owner

wez commented Mar 17, 2023

regex set can match multiple at once, however, when doing substitutions and extracting captures, we still need to run the individual regexes. Until regex shows up near the top of a profile, I'm not concerned about the cost of a couple of more regexes.

@pragma-
Copy link

pragma- commented Mar 19, 2023

Just got around to trying out the nightly. The highlight option seems to solve underlining the desired captured group. Thanks!

@wez wez closed this as completed Mar 21, 2023
@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request fixed-in-nightly This is (or is assumed to be) fixed in the nightly builds.
Projects
None yet
Development

No branches or pull requests

5 participants