Skip to content

[xi-rope][find][question] Allocations during multiline regex search are hurting performance. #1192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ngortheone opened this issue May 11, 2019 · 4 comments

Comments

@ngortheone
Copy link
Contributor

ngortheone commented May 11, 2019

@cmyr @raphlinus
I'm writing a parser for Org markup and I use xi-rope as underlying data structure.

There are numerous cases when I need to do a multilne regex search/match. Sometimes repeatedly. (Examples: poperty drawers are recognized by multiline regex)

In case of multiline regex search whole text of the rope is allocated into a temporary string.
https://github.com/xi-editor/xi-editor/blob/master/rust/rope/src/find.rs#L290

    if is_multiline_regex(pat) {
        // consume all of the text if regex is multi line matching
        text = Cow::from(String::from_iter(lines));
    } else {
        match lines.next() {
            Some(line) => text = line,
            _ => return None,
        }
    }

With big files and repeated multiline searches this really becomes a huge tax on performance.
I understand that this is due to the nature of the data structure at hand -the text stored in a set of chunks that are not connected to each other.

There are 2 questions I have:

  • Is it possible to avoid/minimize allocations for multiline regex searches? I though about a buffer for storing allocated Cow::from(String::from_iter(lines)) to make it available for subsequent searches. But if I have to maintain a plain string alongside rope - what is the benefit of using a rope? That leads me to the next question
  • Is rope a good fit for a parsing algorithm? I suspect that having simple &str would avoid any allocations altogether. And maybe I made a big mistake by using rope for a parser. In the beginning it seemed that having fast updates on the rope as user types would be a good thing to have. But now it seems that this benefit is negligible next to the allocation problem...
    I'd really appreciate your thoughts and comments.
@ngortheone ngortheone changed the title [xi-rope][find][question] Allocations durin multiline regex search are hurting performance. [xi-rope][find][question] Allocations during multiline regex search are hurting performance. May 11, 2019
@ngortheone
Copy link
Contributor Author

ngortheone commented May 13, 2019

I will post the essence of my conversation with @cmyr and @raphlinus on zulip chat.

This is a known downside of the Rope data structure. By being a non-contiguous data storage it is not directly compatible with existing tools like regex engine. Regex engine operates on String, so allocation to a contiguous memory area is necessary.

Possible workarounds:

  • Do not use regexes
  • Teach Regex engine how to work with non-contiguous data storage (rope is only one of many) using Cursor API for example

@cmyr
Copy link
Member

cmyr commented May 14, 2019

@ngortheone Should we keep this open? It's not clear that it's really actionable for us at this point.

@ngortheone
Copy link
Contributor Author

@cmyr I don't think this is actionable at the moment.
This question may come up again, so it is worth adding it to FAQ of some sort

@ngortheone
Copy link
Contributor Author

I will leave this here for reference
rust-lang/regex#425

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants