not <char> emits [^.] #84

Closed
Aloso opened this issue Jul 10, 2022 · 5 comments
Labels
bug Something isn't working

Comments


Aloso commented Jul 10, 2022

Describe the bug
not <char> produces incorrect output: it emits [^.], but inside a character set the dot matches only a literal dot rather than an arbitrary character.
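
To make the report concrete, here is a minimal sketch of what the emitted pattern does. It uses Python's re module as a stand-in, which is an assumption (Melody targets ECMAScript regexes), but the treatment of a dot inside a set is the same there. The names NEGATED_DOT and matches_one_char are hypothetical helpers for this illustration.

```python
import re

# The pattern reported in this issue: a negated set containing a literal dot.
NEGATED_DOT = re.compile(r"[^.]")


def matches_one_char(pattern, ch):
    """Return True if the pattern matches exactly the single character ch."""
    return pattern.fullmatch(ch) is not None


if __name__ == "__main__":
    # Inside a set the dot is literal, so only "." itself is rejected.
    for ch in ["a", "\n", "."]:
        print(repr(ch), matches_one_char(NEGATED_DOT, ch))
```

So [^.] means "any character except a literal dot", which is unrelated to negating the dot metacharacter.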

To Reproduce
Open this playground.

Expected behavior
An error message.

Aloso added the bug label Jul 10, 2022
Owner

yoav-lavi commented Jul 14, 2022

Hey @Aloso, thanks!

Will hopefully be able to look into this soon. I don't imagine this has any actual use (wouldn't "not char" essentially be an empty string?) but it makes sense to return an error.

How's it going with Pomsky?

Author

Aloso commented Jul 14, 2022

> wouldn't "not char" essentially be an empty string?

Not quite. A character set always matches exactly one code point (exception: JavaScript regexes without the u flag can also match a lone surrogate, which is one half of the surrogate pair that UTF-16 uses to encode a code point in 4 bytes).

That means that [] does not match any code point, and it does not match the empty string either: it expects one code point, and no code point ever satisfies it. So it actually matches nothing.

Regex engines other than JavaScript's generally do not allow []. In ERE and PCRE, []] matches a ] character, which does not need to be escaped with a backslash. A ] directly following the [ is just treated like \]. So if you want to match nothing in PCRE, you can't use [], you have to use [^\s\S].
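
The set behavior described above can be checked with a short sketch. It uses Python's re module rather than PCRE or ERE, which is an assumption made for convenience; Python happens to treat these three patterns the same way. The helper names compiles and matches_anywhere are hypothetical.

```python
import re


def compiles(pattern):
    """Return True if the engine accepts the pattern, False on a syntax error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A "]" right after "[" is taken literally, so this set contains just "]".
BRACKET_SET = re.compile(r"[]]")

# Every character is either whitespace or non-whitespace, so the negation
# of both can never be satisfied.
MATCH_NOTHING = re.compile(r"[^\s\S]")


def matches_anywhere(pattern, text):
    """Return True if the pattern matches at any position in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    print(compiles(r"[]"))                         # the empty set is rejected
    print(BRACKET_SET.fullmatch("]") is not None)
    print(matches_anywhere(MATCH_NOTHING, "abc \n"))
    print(matches_anywhere(MATCH_NOTHING, ""))
```

Note that the last check uses the empty string: a set still has to consume one character, so it fails there as well.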

The problem with translating not <char> to [] or [^\s\S] is that it ignores line breaks. The dot does not match every character unless you enable dotall mode (the s flag, also called single-line mode; multiline mode only changes how ^ and $ behave). Normally, it matches everything except line breaks. So normally, not <char> should be compiled to a set of line breaks (\n, plus \r, \u2028 and \u2029 in JavaScript), but in dotall mode, it should become [].
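
As a rough illustration of why the complement of the dot depends on the active flags, the sketch below lists which characters from a small sample the dot rejects. It assumes Python's re module, where the flag that lets the dot match line breaks is re.DOTALL and the dot excludes only \n by default; JavaScript's dot additionally excludes \r, \u2028 and \u2029. SAMPLE and rejected_by_dot are hypothetical names.

```python
import re

# A small sample of code points, plus the two Unicode line separators.
SAMPLE = [chr(cp) for cp in range(0x250)] + ["\u2028", "\u2029"]


def rejected_by_dot(flags=0):
    """Return the sample characters that "." does not match under the given flags."""
    dot = re.compile(r".", flags)
    return [ch for ch in SAMPLE if dot.fullmatch(ch) is None]


if __name__ == "__main__":
    print(rejected_by_dot())           # the complement of "." without flags
    print(rejected_by_dot(re.DOTALL))  # the complement once "." matches line breaks
```

Under these assumptions the first list contains only the newline and the second is empty, which mirrors the two translations discussed above.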

But I think it doesn't make sense to support that in Melody (or Pomsky), because nobody needs that. That's why I wrote that I'd expect an error message when I write not <char>.

Author

Aloso commented Jul 14, 2022

> How's it going with Pomsky?

Good, thank you.

I wrote you an email about that, which I guess you didn't receive. It's not super important, but I'll open an issue for it 🙂

Owner

yoav-lavi commented Jul 14, 2022

Thanks for the detailed explanation!

A lot of useful information there.

To clarify, when I referred to an empty string, I didn't specifically mean the case where a character class is used:

It's true that Melody uses character classes for some instances of not, but that's more of an implementation detail. For instance, not <digit> becomes \D, so not <char> wouldn't necessarily have to be a class in this case.

It could theoretically translate to something like (?:(?=)), an empty lookahead that matches the empty string (leaving aside the handling of line breaks).
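
The difference between the two readings in this thread (an empty match versus no match at all) can be sketched as follows. This again assumes Python's re module as a stand-in for the target engine, and the constant names are hypothetical.

```python
import re

# An empty lookahead: succeeds at every position without consuming anything.
EMPTY_LOOKAHEAD = re.compile(r"(?:(?=))")

# A set that no character satisfies: must consume one character and never can.
MATCH_NOTHING = re.compile(r"[^\s\S]")

# The shorthand mentioned for "not <digit>".
NOT_DIGIT = re.compile(r"\D")

if __name__ == "__main__":
    print(EMPTY_LOOKAHEAD.fullmatch("") is not None)   # matches the empty string
    print(EMPTY_LOOKAHEAD.fullmatch("a") is not None)  # but consumes no character
    print(MATCH_NOTHING.fullmatch("") is not None)     # fails even on the empty string
    print(NOT_DIGIT.fullmatch("a") is not None, NOT_DIGIT.fullmatch("5") is not None)
```

The lookahead version accepts the empty string, while the set version rejects every input, so the two are not interchangeable.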

That being said, it doesn't seem like something that'd make sense to allow.

@yoav-lavi
Owner

Fixed, will be released soon. Thanks!
