Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find using regular expressions #658

Merged
merged 20 commits into from Jun 10, 2018

Conversation

@scholtzan
Copy link
Member

scholtzan commented May 16, 2018

Preliminary support of regular expressions when searching for text.

This depends on changes in #649

/// of the leave/start of the line.
/// If the regular expression can match multiple lines then all leaves are
/// consumed and matched against the regular expression. Otherwise only the
/// current leave is matched.

This comment has been minimized.

Copy link
@eyelash

eyelash May 16, 2018

Member

Did you mean leaf here? Also on line 228 three lines above.

This comment has been minimized.

Copy link
@scholtzan

scholtzan May 16, 2018

Author Member

Yes, right. Thanks!

scholtzan added 3 commits May 17, 2018
@scholtzan

This comment has been minimized.

Copy link
Member Author

scholtzan commented May 26, 2018

This is ready for review

@scholtzan scholtzan changed the title [WIP] Find using regular expressions Find using regular expressions May 29, 2018
Copy link
Member

raphlinus left a comment

A few comments - generally looks good, but I do have some concerns.

@@ -12,6 +12,7 @@ memchr = "2.0"
serde = "1.0"
serde_derive = "1.0"
unicode-segmentation = "1.2.1"
regex = "0.2"

This comment has been minimized.

Copy link
@raphlinus

raphlinus May 29, 2018

Member

This seems like a very low version number. Any reason not 1.0?

@@ -45,6 +51,8 @@ pub enum CaseMatching {
/// reasonably well otherwise (it is currently defined in terms of the
/// `to_lowercase` methods in the Rust standard library).
CaseInsensitive,
/// Matching using search string as regular expression.
RegularExpression,

This comment has been minimized.

Copy link
@raphlinus

raphlinus May 29, 2018

Member

Case insensitivity can be considered orthogonal to regexness, as regexes take a casei option. I'm wondering whether this accurately captures the intent.

CaseMatching::Exact
fn set_find(&mut self, search_string: &str, case_sensitive: bool, regex: bool) {
let case_matching = if regex {
CaseMatching::RegularExpression

This comment has been minimized.

Copy link
@raphlinus

raphlinus May 29, 2018

Member

This is related to another comment - I'm wondering why we're throwing away case_sensitive in the regex case, because it is a very reasonable option to pass along to regex.

Ok(regex) => {
let mut text = String::new();

match cursor.get_leaf() {

This comment has been minimized.

Copy link
@raphlinus

raphlinus May 29, 2018

Member

So this will work in the common case, but isn't fully correct - there's no guarantee that a line won't be fragmented between multiple leaves. It will definitely happen when lines are long (and if there's direct access to leaves, that case should probably be reflected in tests).

My sense of the best thing to do here is to use a lines iterator, which will abstract over the details of leaves, while being efficient and not allocating when line boundaries line up with leaf boundaries.

I think that means that you don't want to use the same Cursor for both string match and regex match; it's fine for string match because you can do a string match on a line fragment, so the efficiency gain in dealing with individual leaves is probably worth it.

This comment has been minimized.

Copy link
@scholtzan

scholtzan May 31, 2018

Author Member

Right, I didn't consider this. Rope already has methods to get a line iterator but I am not sure where the line iterator should come into play in the context of find. Should there be a Cursor that returns lines instead of leaves?

This comment has been minimized.

Copy link
@raphlinus

raphlinus May 31, 2018

Member

I was thinking we could use lines_raw, which is probably going to be more efficient than a Cursor-based approach if it's strictly doing a forward scan. Is there some reason that won't work?

This comment has been minimized.

Copy link
@raphlinus

raphlinus May 31, 2018

Member

Thinking about this, one difference between Cursor and LinesRaw is that in the latter case the client will have to keep track of the offset. But that can be done by adding up line.len() as we go along.

I can also see how to make a Cursor work, but it would at the very least require duplicating some of the logic we already have implemented in LinesRaw.

This comment has been minimized.

Copy link
@scholtzan

scholtzan Jun 1, 2018

Author Member

I updated this pull request so that for regex search a line iterator is used. I am not sure if that is the best way to implement it.

@cmyr

This comment has been minimized.

Copy link
Member

cmyr commented Jun 5, 2018

@scholtzan what's the status of this? There are some CI failures that should get cleared up, but is it otherwise ready to go? I think we should get some of this backlog merged, but I'm not sure if this should go in before #687 & #688. This also needs a xi-mac companion, but I'll do that.

@scholtzan

This comment has been minimized.

Copy link
Member Author

scholtzan commented Jun 5, 2018

I will fix the tests. This should go in after #687 and maybe also after #688 (though it should be independent from that one).

@scholtzan

This comment has been minimized.

Copy link
Member Author

scholtzan commented Jun 7, 2018

One thing I noticed: if the highlights get updated while the search query is a multiline regex, then for larger text there is a noticable delay.
The reason for this is that all the highlights after the edit position get recalculated since they might have changed. I am not sure if there is a good way to solve this problem.

@cmyr

This comment has been minimized.

Copy link
Member

cmyr commented Jun 7, 2018

@scholtzan maybe if we're in regex mode the frontend should only send the query on return?

@scholtzan

This comment has been minimized.

Copy link
Member Author

scholtzan commented Jun 7, 2018

That would be one option. But, what should happen to the highlights of the matches during an edit (text inserted/deleted resulting in more/less matches)? They could just be removed, though I don't really like that approach.

@cmyr

This comment has been minimized.

Copy link
Member

cmyr commented Jun 7, 2018

Good point, I wasn't thinking about updating the highlights during a document edit.

This will be easier when search is incremental, and we can just clear the edited region immediately and schedule the runloop to re-search the rest of the document.

I'm not sure what our best bet is for now; it might be that in the regex case we want to clear the highlights while typing. Alternatively, we could go ahead and implement incremental search. There's definitely a performance issue on a debug build though, enough that i wonder if there isn't something else going on.

Copy link
Member

raphlinus left a comment

Generally looks good. See comment inside regarding additional performance improvement that should be possible by using Cow to avoid copies.

let mut text = String::new();

if is_multiline_regex(pat) {
// consume all of the text if regex is muli line matching

This comment has been minimized.

Copy link
@raphlinus

raphlinus Jun 7, 2018

Member

typo: "multi"

text.extend(lines);
} else {
match lines.next() {
Some(line) => text.push_str(&line),

This comment has been minimized.

Copy link
@raphlinus

raphlinus Jun 7, 2018

Member

This seems ok for now, but is a copy of every byte of the source text. What I'd prefer is that text is a Cow<str>, and that in the single line case, it just copies the Cow returned by the lines iterator (meaning that if it's a borrowed reference, it just copies the reference, not the actual bytes).

Since you're reusing the buffer, at least you're not going to get an allocation per line, only a copy, which is not too bad. I wouldn't be surprised if the cost of the regex evaluation dominated in most cases.

What I think I'm most comfortable with is accepting this code as-is, and having a future work item to avoid the copy based on Cow.

This comment has been minimized.

Copy link
@scholtzan

scholtzan Jun 7, 2018

Author Member

Okay, that sounds reasonable. I will make a separate PR for this later.

…t the end of the text
@scholtzan scholtzan mentioned this pull request Jun 7, 2018
19 of 20 tasks complete
@cmyr cmyr mentioned this pull request Jun 8, 2018
@scholtzan

This comment has been minimized.

Copy link
Member Author

scholtzan commented Jun 8, 2018

The regex is now cached in Find and gets only rebuilt when the search query changes. This makes find with regex a lot faster.

@cmyr
cmyr approved these changes Jun 10, 2018
Copy link
Member

cmyr left a comment

Great, performance is much better now. Will get this in shortly. :)

Copy link
Member

raphlinus left a comment

Great, thanks for this!

@cmyr cmyr merged commit 998b0c2 into xi-editor:master Jun 10, 2018
2 checks passed
2 checks passed
cla/google All necessary CLAs are signed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@coin723

This comment has been minimized.

Copy link

coin723 commented Sep 17, 2018

It'd be a foolish question, but is it possible to search with regex on the editor which is built from the latest master branch? If so, how can I use in the find feature?
I'm using xi-mac.

@jansol

This comment has been minimized.

Copy link
Collaborator

jansol commented Sep 17, 2018

This is the wrong repo for that, but you can switch to regex search by clicking on the magnifying glass icon on the left side of the find bar.

@coin723

This comment has been minimized.

Copy link

coin723 commented Sep 17, 2018

Oh, thank you for saving me.
I apologize for bothering contributors here.

@cmyr

This comment has been minimized.

Copy link
Member

cmyr commented Sep 17, 2018

@coin723 no worries, the situation is a bit confusing, but xi-mac now lives in its own repo: https://github.com/google/xi-mac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.