
URL detection is too aggressive #98

Closed
kuiperzone opened this issue Apr 8, 2024 · 6 comments · Fixed by #170
Labels
enhancement New feature or request

Comments

@kuiperzone

kuiperzone commented Apr 8, 2024

From what I can see, Folio auto-detects a link in the text if it merely contains a . with no whitespace. As a result, it also characterises simple file names and namespaces as links, and my notes are littered with false-positive "links".

See screenie.

May I suggest that, for purposes of auto-link detection, a text fragment must contain (but not start or end with) both:

://, .

where the first occurrence of . comes after the :// scheme separator.

Alternatively: a fragment may start with www. instead of containing ://, though few sites use www any more.

This obviously shouldn't apply to links explicitly designated with []().

[screenshot: false-positive links in notes]
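The rule proposed above is easy to express as a regular expression. Here is a minimal Python sketch (illustrative only; the names are hypothetical and this is not Folio's actual code):

```python
import re

# Hedged sketch of the proposed rule: a fragment counts as a URL only
# if it contains "://" with a "." somewhere after the scheme separator,
# or if it begins with "www.". Pattern and names are illustrative.
URL_RE = re.compile(
    r"\w[\w+.-]*://[^\s/?#]*\.\S+"  # scheme:// followed by a dotted host
    r"|www\.\S+"                    # ...or a bare "www." fragment
)

def detect_urls(text):
    """Return candidate URLs found in free-form text."""
    return URL_RE.findall(text)
```

Under this rule, `notes.txt`, `System.IO.File`, and version numbers no longer match, while `https://example.com/page` and `www.example.org` still do.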

@kuiperzone kuiperzone changed the title URL detection is too agressive URL detection is too aggressive Apr 8, 2024
@toolstack
Owner

I'm not surprised there are a few false positives; URL detection is challenging in unstructured text.

I had it checking for :// but that excluded too much.

I'll take another look and see if it can be tuned a little more. Otherwise, I'm thinking that erring on the side of too many matches is better than too few.

@kuiperzone
Author

Hey thanks for the reply!

While I don't want to suggest an option for every problem...

Perhaps there could be an option for auto-link detection with three states:

  1. Disabled
  2. Cautious/Normal (using ://)
  3. Aggressive/Verbose (as current)

@toolstack
Owner

Perhaps, but I really want to rewrite the regex processor at some point anyway, so I don't want to put a whole lot of effort into playing whack-a-mole with the existing one.

@toolstack toolstack added the enhancement New feature or request label Apr 9, 2024
@toolstack toolstack added this to the Future release milestone Apr 9, 2024
@kuiperzone
Author

If we cannot have a robust URL detector, can we at least have an option to disable auto URL detection? I don't actually see it as useful, but I understand others might.

I guess this is content specific, but if you are keeping IT-related notes then more than half of the detected "URLs" will be false positives.

It falsely detects the following as URLs:

  - ISO timestamps
  - filenames
  - version numbers
  - namespaces
  - anything with a period in it

@toolstack
Owner

I was thinking of adding an option to disable URL detection, so I don't see that being an issue.

Unfortunately, URL detection is a messy business in free-form text, so it's never going to be perfect.

I might do a three-way selector: aggressive, strict, disabled.

Strict would require a proper protocol part (e.g. https://) to exist before text is identified as a URL.
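A minimal sketch of such a three-way selector, in Python with illustrative patterns (hypothetical names; not Folio's actual regexes):

```python
import re
from enum import Enum

class UrlDetection(Enum):
    DISABLED = "disabled"
    STRICT = "strict"          # require an explicit scheme, e.g. https://
    AGGRESSIVE = "aggressive"  # any dotted token, as currently

# Illustrative patterns; the real implementation may differ.
STRICT_RE = re.compile(r"\b\w[\w+.-]*://\S+")
AGGRESSIVE_RE = re.compile(r"\b\S+\.\S+\b")

def find_links(text, mode):
    """Return detected link candidates according to the selected mode."""
    if mode is UrlDetection.DISABLED:
        return []
    pattern = STRICT_RE if mode is UrlDetection.STRICT else AGGRESSIVE_RE
    return pattern.findall(text)
```

In strict mode, `notes.txt` is ignored while `https://example.com` is detected; aggressive mode matches both; disabled returns nothing.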

@kuiperzone
Author

kuiperzone commented Jun 3, 2024

Good stuff!

Aggressive and strict? That doesn't sound fun. I'll be selecting disabled then. :)
