URL with port specifier not recognized as URL #928
Comments
If the built-in rules aren't matching, you can define your own: https://wezfurlong.org/wezterm/hyperlinks.html#implicit-hyperlinks |
Ok, I think I see why it is not matching. How general vs costly should the general regex be? |
Cheaper is better! I don't think this regex needs to be perfect, just to match the most common/useful things out of the box |
I was surprised when output from my local development server with a local address wasn't clickable. It is indeed because the built-in URL matcher doesn't work with ports. I added this to my wezterm config file:
hyperlink_rules = {
-- Linkify things that look like URLs
-- This is actually the default if you don't specify any hyperlink_rules
{
regex = "\\b\\w+://(?:[\\w.-]+)(?:(?:\\.[a-z]{2,15}\\S*)|(?::\\d{1,5}))\\b",
format = "$0",
},
-- linkify email addresses
{
regex = "\\b\\w+@[\\w-]+(\\.[\\w-]+)+\\b",
format = "mailto:$0",
},
-- file:// URI
{
regex = "\\bfile://\\S*\\b",
format = "$0",
},
}
The regex was taken from @matyklug18's patch above. |
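For anyone who would rather extend the built-in rules than restate them, here is a minimal sketch. It assumes a wezterm release that provides wezterm.default_hyperlink_rules(), which may be newer than the build in this report. The extra rule is only an illustration and is not the regex from the patch.

```lua
local wezterm = require 'wezterm'
local config = {}

-- Start from the built-in rules.
-- Assumption: wezterm.default_hyperlink_rules() exists in your build.
config.hyperlink_rules = wezterm.default_hyperlink_rules()

-- Illustrative extra rule (not from the patch): scheme://host:port,
-- optionally followed by a path or query string.
-- The trailing \b means a final non-word character such as "/" is dropped.
table.insert(config.hyperlink_rules, {
  regex = '\\b\\w+://[\\w.-]+:\\d{1,5}\\S*\\b',
  format = '$0',
})

return config
```

Appending keeps the stock email and file:// rules intact, so only the port case is new.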
In addition to the issues described by this issue and its referenced issues, URLs ending with
After a lot of tweaking and experimentation, I've settled on these two regexes for URLs. I specifically tried to make the regexes as simple as possible (using
hyperlink_rules = {
-- First handle URLs wrapped with punctuation (i.e. brackets)
-- e.g. [http://foo] (http://foo) <http://foo> etc
-- the punctuation will be underlined but excluded when clicked
{
regex = '[[:punct:]](\\w+://\\S+)[[:punct:]]',
format = '$1',
},
-- Then handle URLs not wrapped in brackets
-- and include terminating ), / or - characters, if any
-- these seem to be the most common trailing characters that are part of URLs
-- there may be additional common ones. . .
{
regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
format = '$0',
},
},
I have tested these two regexes extensively for the last week or so on busy IRC channels and they seem to work much, much better than the default regexes. With the defaults, around 20% to 40% of hyperlinks would need manual intervention to add a missing trailing
It ends up making
Additionally, with the first regex for handling wrapped punctuation/brackets, only the captured |
I think the reason the highlight is not limited to the captured region is this line: wezterm/termwiz/src/hyperlink.rs Line 238 in 23211fc
Capture 0 gets the entire match regardless of capture regions. |
Re: the punct version, what do you think about splitting it up into explicit regexes for the various pairs of brackets?
{
regex = '\\((\\w+://\\S+)\\)',
format = '$1',
},
{
regex = '\\[(\\w+://\\S+)\\]',
format = '$1',
},
{
regex = '<(\\w+://\\S+)>',
format = '$1',
}, |
I've made some updates to incorporate your suggested rules, and added a way to select which capture group is highlighted. It typically takes about an hour before commits are available as nightly builds for all platforms. Linux builds are the fastest to build and are often available within about 20 minutes. Windows and macOS builds take a bit longer. Please take a few moments to try out the fix and let me know how that works out. You can find the nightly downloads for your system in the wezterm installation docs. If you prefer to use packages provided by your distribution or package manager of choice and don't want to replace that with a nightly download, keep in mind that you can download portable packages (eg: a If you are eager and can build from source then you may be able to try this out more quickly. |
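A sketch of how selecting the highlighted capture group might look in a config. The field name highlight is my reading of the wezterm hyperlink documentation, not something stated in this thread, so treat it as an assumption and check it against your build.

```lua
return {
  hyperlink_rules = {
    -- URL wrapped in parentheses: link to capture group 1 and
    -- (assumption: via the highlight field) underline only that group.
    {
      regex = '\\((\\w+://\\S+)\\)',
      format = '$1',
      highlight = 1,
    },
    -- Bare URL: the whole match (capture 0) is both linked and highlighted.
    {
      regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
      format = '$0',
    },
  },
}
```

The wrapped rule is listed first, following the ordering used in the rules proposed earlier in the thread.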
Good idea. I'd actually been considering replacing What I'm wondering now is whether a single regex using character classes for the three kinds of brackets, like If using the character classes, I suppose there is the valid concern about the opening bracket not lining up with the closing bracket. For what it's worth, for the last month or so that I've been using the I'm not sure what kind of optimizations the regex crate provides. I did notice the RegexSet feature in the documentation: https://docs.rs/regex/latest/regex/#example-match-multiple-regular-expressions-simultaneously -- I'm unsure if this just iterates over each regex or if it actually optimizes/merges the regexes into one. It does say "in a single scan" ... |
A RegexSet can match multiple regexes at once; however, when doing substitutions and extracting captures, we still need to run the individual regexes. Until regex shows up near the top of a profile, I'm not concerned about the cost of a couple more regexes. |
Just got around to trying out the nightly. The |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Describe the bug
Some URLs are not recognized as URLs (e.g. not clickable).
Environment (please complete the following information):
20210502-154244-3f7122cb
To Reproduce
From the output of jupyter lab --no-browser (over SSH): The file:/// URL is correctly recognized as such and clickable, but the http:// URLs are not (none of them).