copr: support more regexp functions #13480

gengliqi · 2022-09-15T19:44:31Z

Signed-off-by: gengliqi <gengliqiii@gmail.com>

What is changed and how it works?

Issue Number: Close #13483

What's Changed:

Support `REGEXP_INSTR()`, `REGEXP_LIKE()`, `REGEXP_REPLACE()`, and `REGEXP_SUBSTR()`.

Related changes

- PR to update `pingcap/docs`/`pingcap/docs-cn`:
- Need to cherry-pick to the release branch

Check List

Tests

Unit test

Side effects

- Performance regression
  - Consumes more CPU
  - Consumes more MEM
- Breaking backward compatibility

Release note

Support more regular expression functions.


ti-chi-bot · 2022-09-15T19:44:32Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

breezewish
sticnarf

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.


sticnarf · 2022-09-16T08:41:11Z

components/tidb_query_expr/src/impl_regexp.rs

+}
+
+fn get_match_type<C: Collator>(match_type: &[u8]) -> Result<String> {
+    let match_type = String::from_utf8(match_type.to_vec())?;


It would be better to iterate over the `&[u8]` directly and match each byte. That avoids this allocation.

I now use `str::from_utf8` instead.
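
For illustration, the byte-matching approach suggested above might look like the following sketch. It is not the code from this PR: the `MatchFlags` struct and `parse_match_type` function are hypothetical names, and the accepted letters (`c`, `i`, `m`, `n`) are an assumption modeled on MySQL's documented `match_type` values, which may differ from what the PR accepts.

```rust
/// Hypothetical flag set parsed from a MySQL-style `match_type` argument.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct MatchFlags {
    pub case_insensitive: bool,    // set by 'i', cleared by 'c'
    pub multi_line: bool,          // set by 'm'
    pub dot_matches_newline: bool, // set by 'n'
}

/// Parses `match_type` one byte at a time, so no intermediate `Vec<u8>` or
/// `String` is allocated. When flags conflict, the rightmost one wins.
pub fn parse_match_type(match_type: &[u8]) -> Result<MatchFlags, String> {
    let mut flags = MatchFlags::default();
    for &b in match_type {
        match b {
            b'i' => flags.case_insensitive = true,
            b'c' => flags.case_insensitive = false,
            b'm' => flags.multi_line = true,
            b'n' => flags.dot_matches_newline = true,
            // Assumed behavior: any other byte is rejected.
            other => return Err(format!("invalid match type byte: {:#04x}", other)),
        }
    }
    Ok(flags)
}
```

Because every accepted flag is a single ASCII byte, matching on bytes needs no UTF-8 validation. `str::from_utf8`, the option the author chose, also avoids the allocation since it borrows the input.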


sticnarf · 2022-09-16T10:07:03Z

components/tidb_query_expr/src/impl_regexp.rs

+        let count = expr.chars().count() as i64;
+        if (pos < 1 || pos > count) && !(count == 0 && pos == 1) {
+            return Err(box_err!("invalid regex pos: {}, count: {}", pos, count));
+        }
+        let mut new_expr = String::new();
+        for (i, c) in expr.chars().enumerate() {
+            if i as i64 >= pos - 1 {
+                new_expr += &c.to_string();
+            }
+        }
+        expr = new_expr;


We can use `str::char_indices` to get the byte index of the starting character, then use `str::get_unchecked` to get a sub-`str`.

That way, we neither iterate over the string twice nor create a new string.

Good point! Addressed.
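
A minimal sketch of that idea, assuming a hypothetical helper named `skip_chars` (not taken from the PR). It uses safe slicing rather than `str::get_unchecked`; the byte index comes from `char_indices`, so it always falls on a character boundary.

```rust
/// Returns `expr` with its first `n` characters (not bytes) skipped,
/// borrowing from `expr` instead of building a new `String`.
/// With `n = pos - 1` this keeps the same characters as the loop in the diff above.
pub fn skip_chars(expr: &str, n: usize) -> &str {
    match expr.char_indices().nth(n) {
        // `byte_idx` is the byte offset of the n-th character (0-based).
        Some((byte_idx, _)) => &expr[byte_idx..],
        // Fewer than `n + 1` characters: nothing is left after skipping.
        None => "",
    }
}
```

The iteration stops as soon as the n-th character is reached, and the result is a slice into the original buffer.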

sticnarf · 2022-09-16T10:08:04Z

components/tidb_query_expr/src/impl_regexp.rs

+    for (i, m) in regex.find_iter(&expr).enumerate() {
+        if i as i64 == occurrence - 1 {


What about just `.skip(occurrence - 1).next()`?

I changed it to use `nth`.
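
As a sketch of the `nth` pattern, the example below uses the standard library's `str::match_indices` as a stand-in for `regex.find_iter`, so that it stays self-contained. The function name `nth_occurrence` and the handling of a non-positive `occurrence` are assumptions for illustration, not the PR's behavior.

```rust
use std::convert::TryFrom;

/// Returns the byte offset and text of the `occurrence`-th (1-based) match of
/// `needle` in `haystack`, or `None` if there are fewer matches or if
/// `occurrence` is less than 1.
pub fn nth_occurrence<'a>(
    haystack: &'a str,
    needle: &str,
    occurrence: i64,
) -> Option<(usize, &'a str)> {
    // Convert the 1-based i64 into a 0-based usize without overflow or wraparound.
    let skip = usize::try_from(occurrence.checked_sub(1)?).ok()?;
    // `nth(k)` is equivalent to `.skip(k).next()` and replaces the manual
    // `enumerate()` loop that compares the index on every iteration.
    haystack.match_indices(needle).nth(skip)
}
```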

sticnarf · 2022-09-16T10:10:36Z

The suggestions above also apply to the other functions (instr/replace).


sticnarf · 2022-09-16T12:52:09Z

components/tidb_query_expr/src/impl_regexp.rs

+            None => return Ok(None),
+        };
+
+        let count = expr.chars().count() as i64;


`chars().count()` also iterates over the whole string. What I mean is that we can check `pos >= 1` first; then, if `expr.char_indices().nth((pos - 1) as usize)` returns `None`, we can return an error as well.
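
The check order described here could be sketched as follows. The function name `start_offset` and the error text are hypothetical. The special case for an empty `expr` with `pos == 1` mirrors the condition in the original diff (`count == 0 && pos == 1`); whether the merged code keeps it is not shown in this thread.

```rust
use std::convert::TryFrom;

/// Validates the 1-based character position `pos` and returns the byte offset
/// at which matching should start, without first counting every character.
pub fn start_offset(expr: &str, pos: i64) -> Result<usize, String> {
    // Cheap check first: positions are 1-based.
    if pos < 1 {
        return Err(format!("invalid regex pos: {}", pos));
    }
    // `pos - 1` is non-negative here; `try_from` still guards narrow `usize` targets.
    let skip = usize::try_from(pos - 1).map_err(|_| format!("invalid regex pos: {}", pos))?;
    match expr.char_indices().nth(skip) {
        Some((byte_idx, _)) => Ok(byte_idx),
        // Assumed from the original condition: an empty string accepts pos == 1.
        None if expr.is_empty() && pos == 1 => Ok(0),
        // `pos` points past the last character.
        None => Err(format!("invalid regex pos: {}", pos)),
    }
}
```

For a valid `pos`, the string is walked only up to the starting character; a full traversal happens only when `pos` is out of range.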


breezewish

Looks fine according to the test case.


gengliqi · 2022-09-16T15:28:03Z

/merge

ti-chi-bot · 2022-09-16T15:28:04Z

@gengliqi: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2022-09-16T15:28:06Z

This pull request has been accepted and is ready to merge.

Commit hash: a9d9062

support regexp

c8940c1

Signed-off-by: gengliqi <gengliqiii@gmail.com>

ti-chi-bot added do-not-merge/needs-linked-issue release-note size/XXL labels Sep 15, 2022

gengliqi added the do-not-merge/work-in-progress label Sep 15, 2022

gengliqi added 6 commits September 16, 2022 03:49

Merge branch 'master' into support-regexp

f0db6e5

update tests

6f8c56d

Signed-off-by: gengliqi <gengliqiii@gmail.com>

remove unnecessary use

7e9f53f

Signed-off-by: gengliqi <gengliqiii@gmail.com>

update tests

e6c53b3

Signed-off-by: gengliqi <gengliqiii@gmail.com>

make format

6828ea3

Signed-off-by: gengliqi <gengliqiii@gmail.com>

fix test_regexp_instr

e3d647d

Signed-off-by: gengliqi <gengliqiii@gmail.com>

gengliqi requested review from sticnarf and wshwsh12 September 16, 2022 08:25

ti-chi-bot added do-not-merge/needs-triage-completed and removed do-not-merge/needs-linked-issue do-not-merge/work-in-progress labels Sep 16, 2022

Merge branch 'master' into support-regexp

56ab844

ti-chi-bot removed the do-not-merge/needs-triage-completed label Sep 16, 2022

remove r

db57108

Signed-off-by: gengliqi <gengliqiii@gmail.com>

sticnarf reviewed Sep 16, 2022


address comments

d5ee75d

Signed-off-by: gengliqi <gengliqiii@gmail.com>

sticnarf reviewed Sep 16, 2022


gengliqi added 4 commits September 16, 2022 21:12

address comments

35d01fc

Signed-off-by: gengliqi <gengliqiii@gmail.com>

reduce allocation

a66bc79

Signed-off-by: gengliqi <gengliqiii@gmail.com>

fix lint

60f55b9

Signed-off-by: gengliqi <gengliqiii@gmail.com>

use cow

61177c1

Signed-off-by: gengliqi <gengliqiii@gmail.com>

sticnarf approved these changes Sep 16, 2022


ti-chi-bot added the status/LGT1 Status: PR - There is already 1 approval label Sep 16, 2022

gengliqi requested a review from breezewish September 16, 2022 14:52

breezewish approved these changes Sep 16, 2022


ti-chi-bot added status/LGT2 Status: PR - There are already 2 approvals and removed status/LGT1 Status: PR - There is already 1 approval labels Sep 16, 2022

gengliqi added 2 commits September 16, 2022 23:26

tiny refine

5beb3b0

Signed-off-by: gengliqi <gengliqiii@gmail.com>

Merge branch 'master' into support-regexp

a9d9062

ti-chi-bot added the status/can-merge Status: Can merge to base branch label Sep 16, 2022

ti-chi-bot merged commit bcfbd56 into tikv:master Sep 16, 2022

ti-chi-bot added this to the Pool milestone Sep 16, 2022

LittleFall mentioned this pull request Sep 17, 2022

expression: add pushdown flags of regexp functions to tikv pingcap/tidb#37893

Merged

12 tasks
