
URL detection is too aggressive #98

Closed
kuiperzone opened this issue Apr 8, 2024 · 6 comments · Fixed by #170
Labels
enhancement New feature or request

Comments

@kuiperzone

kuiperzone commented Apr 8, 2024

From what I can see, Folio auto-detects a link in the text if it merely contains a . with no whitespace. As a result, it also characterises simple file names and namespaces as links, and my notes are littered with false-positive "links".

See screenie.

May I suggest that, for purposes of auto-link detection, a text fragment must contain (but not start or end with) both:

://, .

where the first occurrence of . comes after the :// scheme separator.

Alternatively: a fragment may start with www. instead of containing ://, though few sites use www any more.

This obviously shouldn't apply to links explicitly designated with []().

[screenshot: false-positive links in notes]
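The rule proposed above is easy to express as a regular expression. Here is a minimal Python sketch (illustrative only; the names are hypothetical and this is not Folio's actual code):

```python
import re

# Hedged sketch of the proposed rule: a fragment counts as a URL only
# if it contains "://" with a "." somewhere after the scheme separator,
# or if it begins with "www.". Pattern and names are illustrative.
URL_RE = re.compile(
    r"\w[\w+.-]*://[^\s/?#]*\.\S+"  # scheme:// followed by a dotted host
    r"|www\.\S+"                    # ...or a bare "www." fragment
)

def detect_urls(text):
    """Return candidate URLs found in free-form text."""
    return URL_RE.findall(text)
```

Under this rule, `notes.txt`, `System.IO.File`, and version numbers no longer match, while `https://example.com/page` and `www.example.org` still do.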

@kuiperzone kuiperzone changed the title URL detection is too agressive URL detection is too aggressive Apr 8, 2024
@toolstack
Owner

I'm not surprised there are a few false positives; URL detection is challenging in unstructured text.

I had it checking for :// but that excluded too much.

I'll take another look and see if it can be tuned a little more. Otherwise, I'm thinking that erring on the side of too many matches is better than too few.

@kuiperzone
Author

Hey thanks for the reply!

While I don't want to suggest an option for every problem...

Perhaps there could be an option for auto-link detection with three states:

  1. Disabled
  2. Cautious/Normal (using ://)
  3. Aggressive/Verbose (as current)

@toolstack
Owner

Perhaps, but I really want to rewrite the regex processor at some point anyway, so I don't want to put a whole lot of effort into playing whack-a-mole with the existing one.

@toolstack toolstack added the enhancement New feature or request label Apr 9, 2024
@toolstack toolstack added this to the Future release milestone Apr 9, 2024
@kuiperzone
Author

If we cannot have a robust URL detector, can we at least have an option to disable auto URL detection? I don't actually see it as useful, but I understand others might.

I guess this is content specific, but if you are keeping IT-related notes then more than half of the detected "URLs" will be false positives.

It falsely detects the following as URLs:

  - ISO timestamps
  - filenames
  - version numbers
  - namespaces
  - anything with a period in it

@toolstack
Owner

I was thinking of adding an option to disable URL detection, so I don't see that being an issue.

Unfortunately, URL detection is a messy business in free-form text, so it's never going to be perfect.

I might do a three-way selector: aggressive, strict, disabled.

Strict would require a proper protocol part (e.g. https://) to exist before text is identified as a URL.
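A minimal sketch of such a three-way selector, in Python with illustrative patterns (hypothetical names; not Folio's actual regexes):

```python
import re
from enum import Enum

class UrlDetection(Enum):
    DISABLED = "disabled"
    STRICT = "strict"          # require an explicit scheme, e.g. https://
    AGGRESSIVE = "aggressive"  # any dotted token, as currently

# Illustrative patterns; the real implementation may differ.
STRICT_RE = re.compile(r"\b\w[\w+.-]*://\S+")
AGGRESSIVE_RE = re.compile(r"\b\S+\.\S+\b")

def find_links(text, mode):
    """Return detected link candidates according to the selected mode."""
    if mode is UrlDetection.DISABLED:
        return []
    pattern = STRICT_RE if mode is UrlDetection.STRICT else AGGRESSIVE_RE
    return pattern.findall(text)
```

In strict mode, `notes.txt` is ignored while `https://example.com` is detected; aggressive mode matches both; disabled returns nothing.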

@kuiperzone
Author

kuiperzone commented Jun 3, 2024

Good stuff!

Aggressive and strict? That doesn't sound fun. I'll be selecting disabled then. :)
