Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantifiers add unnecessary non-capturing group #70

Closed
trezy opened this issue Apr 4, 2022 · 3 comments
Closed

Quantifiers add unnecessary non-capturing group #70

trezy opened this issue Apr 4, 2022 · 3 comments
Labels
bug Something isn't working compiler Related to the Melody compiler

Comments

@trezy
Copy link

trezy commented Apr 4, 2022

Describe the bug
When using some of ... with symbols, a non-capturing group is added unconditionally. This is unnecessary for most individual symbols (e.g. <word>).

To Reproduce
Steps to reproduce the behavior:

  1. Open the Melody Playground
  2. Write a program which only uses some of ... with a single symbol, e.g.:
    some of <word>;
    
  3. Review the output

Expected behavior
RegEx output should only add wrap things with non-capturing groups if necessary or if implicitly required (via match {}).

Examples

any of <word>;
// Outputs /(?:\w)*/

any of "a";
// Outputs /a*/

some of <word>;
// Outputs /(?:\w)+/

some of "a";
// Outputs /a+/

option of <word>;
// Outputs /(?:\w)?/

option of "a";
// Outputs /a?/
@trezy trezy added the bug Something isn't working label Apr 4, 2022
@yoav-lavi yoav-lavi added the compiler Related to the Melody compiler label Apr 4, 2022
@yoav-lavi
Copy link
Owner

This is an issue of the current implementation which checks for more than a single character (https://github.com/yoav-lavi/melody/blob/main/crates/melody_compiler/src/regex/utils.rs#3) which is always correct but not ideal, it should be changed to something that can identify sequences that can directly be quantified without grouping.

Thanks for the report!

@yoav-lavi
Copy link
Owner

yoav-lavi commented Apr 5, 2022

@trezy I think this should solve it, are you aware of any other cases that are directly quantifiable?

Original

pub fn wrap_quantified(value: String) -> String {
    let is_grouped = value.starts_with('(') && value.ends_with(')');
    if !is_grouped && value.chars().count() > 1 {
        format!("(?:{value})")
    } else {
        value
    }
}

Updated

pub fn wrap_quantified(value: String) -> String {
    if !directly_quantifiable(&value) {
        format!("(?:{value})")
    } else {
        value
    }
}

fn directly_quantifiable(value: &str) -> bool {
    let value_char_count = value.chars().count();

    // missing (and currently unsupported):
    // \p{...}
    // \P{...}
    // \xYY
    // \ddd
    // \uYYYY
    match value_char_count {
        // single char values
        1 => true,
        // escaped single char values
        2 => value.starts_with('\\'),
        // groups and character classes
        _ => match value.chars().next() {
            Some('(') => value.ends_with(')'),
            Some('[') => value.ends_with(']'),
            _ => false,
        },
    }
}

Note that this will only ever receive a single expression (i.e. not [a-z][0-9]).

This produces the correct output:

#[test]
fn directly_quantifiable() {
    let output = compiler(
        r#"
      5 of <word>;
      "#,
    );
    assert_eq!(output.unwrap(), r"\w{5}");
}

@yoav-lavi
Copy link
Owner

Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compiler Related to the Melody compiler
Projects
None yet
Development

No branches or pull requests

2 participants