
Boost regex engine #725

Open
zufuliu opened this issue Oct 12, 2023 · 14 comments

Comments

@zufuliu
Owner

zufuliu commented Oct 12, 2023

See PR #722: @atauzki 👍 is working on integrating Boost regex. After the changes are merged, most if not all regex issues should be fixed.

In the end, our code will have three regex engine configurations:

| Defined preprocessor macro | Regex engine |
| --- | --- |
| BOOST_REGEX_STANDALONE | Scintilla's simple POSIX regex plus Boost regex |
| NO_CXX11_REGEX | Scintilla's simple POSIX regex (current build configuration) |
| none | Scintilla's simple POSIX regex plus C++ STL std::regex |
zufuliu added ✅enhancement 🌐i18n Localization/Internationalization ui labels on Oct 12, 2023
zufuliu pinned this issue on Oct 12, 2023
zufuliu added the regex label on Oct 13, 2023
zufuliu added this to the v4.24.01 milestone on Oct 13, 2023
@zufuliu
Owner Author

zufuliu commented Oct 13, 2023

Some TODOs:

@atauzki
Contributor

atauzki commented Oct 14, 2023

Another suggestion:
the hints for zero-width matches should be improved, as in Notepad3:

[screenshot]

@zufuliu
Owner Author

zufuliu commented Oct 15, 2023

Following are some performance test results (each cell: match count, time in milliseconds) for the attached JSON file (produced by expand.py in the zip from the Visual Studio 2022 installation catalog.json), tested with commit 38be0ce. Given these results, I'm going to remove the SCI_OWNREGEX build configuration (improving its speed still needs time).
re-test-1015.zip

| Regex | RESearch | std::wregex | std::regex | boost::wregex | boost::regex |
| --- | --- | --- | --- | --- | --- |
| `\w+` | 1434315, 315 | 1436523, 7636 | 1423835, 4372 | 1436523, 2035 | 1501396, 800 |
| `[a-zA-Z0-9_]+` | 1423835, 331 | 1423835, 7654 | 1423835, 4386 | 1423835, 2855 | 1423835, 777 |
| `\d+` | 1028016, 280 | 1028016, 6470 | 1028016, 6475 | 1028016, 2050 | 1028016, 739 |
| `[0-9]+` | 1028016, 286 | 1028016, 6475 | 1028016, 6218 | 1028016, 2044 | 1028016, 725 |
| `\s+` | 895401, 252 | 895945, 6151 | 895403, 5972 | 895917, 1883 | 911355, 662 |
| `[ \t]+` | 895401, 254 | 895401, 6375 | 895401, 6200 | 895401, 2935 | 895401, 678 |
| `^[ \t]+` | 440216, 92 | 440216, 846 | 440216, 724 | 440216, 465 | 440216, 234 |
| `[ \t]+$` | 0, 154 | 0, 6492 | 0, 6324 | 0, 575 | 0, 84 |
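For anyone wanting to reproduce numbers like these, here is a hypothetical counting harness that exercises std::regex only; it is not the script or code that produced the table. It counts non-overlapping matches with `std::sregex_iterator` and times the loop with `std::chrono::steady_clock`. `CountMatches` is an illustrative name.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <utility>

// Hypothetical harness: returns {match count, elapsed milliseconds} for
// all non-overlapping matches of `pattern` in `text`, using std::regex.
std::pair<std::size_t, long long> CountMatches(const std::string& text,
                                               const std::string& pattern) {
    const std::regex re(pattern, std::regex::ECMAScript);
    const auto start = std::chrono::steady_clock::now();
    const auto count = std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator());
    const auto stop = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    return {static_cast<std::size_t>(count), static_cast<long long>(ms)};
}
```

Note that in ECMAScript mode without the C++17 `std::regex::multiline` flag (where the standard library supports it), `^` and `$` anchor only at the start and end of the whole string, so the line-anchored rows would need per-line searching to be comparable with an editor's line semantics.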

@lenny20

lenny20 commented Jan 14, 2024

Does the version released today include Boost regex? I don't see any change in the Replace dialog.

@zufuliu
Owner Author

zufuliu commented Jan 14, 2024

> Does the version released today include Boost regex?

Just download the latest builds from the boost regex branch, e.g. https://github.com/zufuliu/notepad2/actions/runs/7517811166

@zufuliu
Owner Author

zufuliu commented Mar 29, 2024

The Win32 build with boost::regex (which depends on SleepConditionVariableSRW() and WakeAllConditionVariable()) or with std::regex (which depends on InitializeCriticalSectionEx()) doesn't run on Windows XP.

@vvyoko

vvyoko commented Apr 18, 2024

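Does boost::regex not support property matching such as \pP?
For more properties to test, see 正则表达式-匹配标点符号 (Regular Expressions: Matching Punctuation).

Also, matched groups (pattern) are currently referenced as \1, \2.
Will you consider using the more common $1, $2 instead in the future?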

@atauzki
Contributor

atauzki commented Apr 18, 2024

> Also, matched groups (pattern) are currently referenced as \1, \2.
> Will you consider using the more common $1, $2 instead in the future?

Boost itself supports it, but the current code doesn't use it yet; only a TODO comment has been added.
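Both replacement syntaxes exist in the standard library too, which may help when weighing $1 against \1. This sketch uses `std::regex_replace`: the default format (`format_default`) follows ECMAScript rules, so `$1`, `$2` refer to groups, while `std::regex_constants::format_sed` switches to sed-style `\1`, `\2`. Boost.Regex has comparable format flags (for example `format_sed` and `format_perl`); the function names below are hypothetical and not from the Notepad2 code.

```cpp
#include <regex>
#include <string>

// Hypothetical: swap two words using ECMAScript-style "$1, $2" references,
// which is std::regex_replace's default format syntax.
std::string SwapWithDollar(const std::string& text) {
    const std::regex re(R"((\w+) (\w+))");
    return std::regex_replace(text, re, "$2 $1");
}

// Hypothetical: the same replacement using POSIX sed-style "\1, \2"
// references, selected with std::regex_constants::format_sed.
std::string SwapWithBackslash(const std::string& text) {
    const std::regex re(R"((\w+) (\w+))");
    return std::regex_replace(text, re, R"(\2 \1)",
                              std::regex_constants::format_sed);
}
```

Supporting both could be as simple as choosing the format flag from a user option, though how that fits the existing TODO is an open question.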

@zufuliu
Owner Author

zufuliu commented May 9, 2024

> Win32 build with boost::regex (depends on SleepConditionVariableSRW() and WakeAllConditionVariable()) or std::regex (depends on InitializeCriticalSectionEx()) doesn't run on XP.

This can be "fixed" by disabling thread-safe local static initialization with /Zc:threadSafeInit-:
https://learn.microsoft.com/en-us/cpp/build/reference/zc-threadsafeinit-thread-safe-local-static-initialization?view=msvc-170

> The implementation of this feature relies on Windows operating system support functions in Windows Vista and later operating systems.
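A small illustration of what /Zc:threadSafeInit- changes, assuming the Vista-only imports come from the guard the compiler emits around function-local statics, as the quoted documentation suggests. `WordRegex`, `WarmUpRegexStatics`, and `ContainsWord` are hypothetical names. With the flag off, that guard is gone, so such statics should be initialized before any other thread can reach them.

```cpp
#include <regex>
#include <string>

// Function-local static: since C++11 its initialization is thread-safe
// ("magic statics"). MSVC's /Zc:threadSafeInit (on by default) implements
// that guard; /Zc:threadSafeInit- removes it, so concurrent first calls race.
const std::regex& WordRegex() {
    static const std::regex re(R"(\w+)");
    return re;
}

// Hypothetical startup hook: with the guard disabled, touch each such static
// once on the main thread before any worker thread can call WordRegex().
void WarmUpRegexStatics() {
    static_cast<void>(WordRegex());
}

// Example user of the cached regex.
bool ContainsWord(const std::string& text) {
    return std::regex_search(text, WordRegex());
}
```

The trade-off is that correctness then depends on initialization order rather than on the runtime guard, which is only safe if every such static is touched before threads start.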

@atauzki
Contributor

atauzki commented May 12, 2024

Another bug related to Boost regex search:
when searching next/previous repeatedly for a zero-width match (e.g. ^, $, \b), the search gets stuck at its original position from the second attempt on.
EmEditor also has this bug but Notepad3 doesn't; I don't have a good idea for how to fix it.
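One possible approach, borrowed from how `std::regex_iterator` steps past empty matches: after an empty match at position p, first look for a non-empty match anchored at p (`match_not_null | match_continuous`), and if there is none, resume the search at p + 1. This is a forward-only sketch using std::regex, not Notepad2's or Notepad3's code; `Match` and `FindNext` are hypothetical names. Boost.Regex provides the same match flags, and find-previous would need the mirrored rule.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: match start offset and length in `text`.
struct Match {
    std::size_t start;
    std::size_t length;
};

// match_prev_avail tells the engine the character before `pos` exists, so
// ^, $ and \b see the real context instead of treating `pos` as the start.
static std::regex_constants::match_flag_type ContextFlags(std::size_t pos) {
    return pos > 0 ? std::regex_constants::match_prev_avail
                   : std::regex_constants::match_default;
}

// Finds the next match at or after `from`. If the previous match was empty
// and ended at `from`, that same empty match is skipped so "find next" moves.
std::optional<Match> FindNext(const std::string& text, const std::regex& re,
                              std::size_t from, bool previousWasEmpty) {
    using namespace std::regex_constants;
    std::smatch m;
    auto at = [&](std::size_t pos) {
        return text.cbegin() + static_cast<std::ptrdiff_t>(pos);
    };
    if (previousWasEmpty) {
        // 1. A non-empty match anchored exactly at `from` is still acceptable.
        if (std::regex_search(at(from), text.cend(), m, re,
                              ContextFlags(from) | match_not_null | match_continuous)) {
            return Match{from, static_cast<std::size_t>(m.length(0))};
        }
        // 2. Otherwise step over one character before searching again.
        if (from >= text.size()) {
            return std::nullopt;
        }
        ++from;
    }
    if (std::regex_search(at(from), text.cend(), m, re, ContextFlags(from))) {
        return Match{from + static_cast<std::size_t>(m.position(0)),
                     static_cast<std::size_t>(m.length(0))};
    }
    return std::nullopt;
}
```

For `\b` in "ab cd", repeated calls that pass the previous match end and `previousWasEmpty = true` visit offsets 0, 2, 3, 5 and then stop, instead of returning offset 0 again.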

@atauzki
Contributor

atauzki commented May 13, 2024

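> Does boost::regex not support property matching such as \pP?
> For more properties to test, see 正则表达式-匹配标点符号 (Regular Expressions: Matching Punctuation).

A compiled libICU is at least 20-30 MB, which is too costly. To support this feature we could use PCRE2; it depends on whether that's planned.
[screenshot]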

@zufuliu
Owner Author

zufuliu commented May 13, 2024

> A compiled libICU is at least 20-30 MB, which is too costly.

Maybe load ICU dynamically (it is available on Windows 10 and later): https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-

@atauzki
Contributor

atauzki commented May 13, 2024

> > A compiled libICU is at least 20-30 MB, which is too costly.
>
> Maybe dynamic load ICU (which is available on Win10+), https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-

Its icu.h doesn't have the icu namespace, but Boost uses ICU's C++ API, and icu.dll doesn't export any C++ symbols.

[screenshot]

4 participants