Multi-pattern regular expressions #496

Open
krizhanovsky opened this issue May 22, 2016 · 3 comments · May be fixed by #2161

Comments

@krizhanovsky
Contributor

krizhanovsky commented May 22, 2016

Tempesta FW core must implement multi-pattern regular expressions to efficiently handle HTTP matching rules for filtering and configuration (see for example #471, #495, #530, and #1544, as well as the matching of many ignored headers in #1550 for caching). Intel Hyperscan can be used as a reference or as the foundation for the feature.

The implementation must take ReDoS into account. It seems that back and forward references should be limited or fully prohibited, and resource consumption should be limited in the sense of #488.

This should be done close to or together with #732, since simple multi-pattern string matching is a sub-task of multi-pattern regexps.

Since Tempesta FW deals with fields of parsed HTTP messages, in general we need (1) relatively simple regular expressions for (2) relatively short strings. E.g.

location ~ ^/(/category/foo/|dddd|ccccc|vvvv|aaaa)/
hdr "Referer" == "*.tempesta-tech.com/*"  -> base;

In most cases simple multipattern prefix/suffix is enough. Definitely no need for PCRE. However, there could be tens of location rules with simple regexps, so multi-pattern regexps still make sense.
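To illustrate why a simple multi-pattern prefix match needs no regex engine, here is a minimal user-space sketch that uses a byte trie to find the longest registered prefix of a URI in a single pass. It is not Tempesta FW code: `struct trie`, `trie_add()`, `trie_match_prefix()` and the fixed node pool are hypothetical names and choices made only for this example.

```c
#include <stddef.h>
#include <string.h>

#define TRIE_MAX_NODES 256   /* hypothetical fixed pool size for the sketch */
#define TRIE_NO_MATCH  (-1)

struct trie_node {
	int next[256];  /* child node index per input byte, 0 means "no edge" */
	int rule_id;    /* rule whose prefix ends here, or TRIE_NO_MATCH */
};

struct trie {
	struct trie_node nodes[TRIE_MAX_NODES];
	int used;       /* number of allocated nodes, node 0 is the root */
};

static void trie_init(struct trie *t)
{
	memset(t, 0, sizeof(*t));
	t->nodes[0].rule_id = TRIE_NO_MATCH;
	t->used = 1;
}

/* Register one prefix. Returns 0 on success, -1 if the node pool is full. */
static int trie_add(struct trie *t, const char *prefix, int rule_id)
{
	int cur = 0;

	for (const unsigned char *p = (const unsigned char *)prefix; *p; p++) {
		if (!t->nodes[cur].next[*p]) {
			if (t->used == TRIE_MAX_NODES)
				return -1;
			t->nodes[t->used].rule_id = TRIE_NO_MATCH;
			t->nodes[cur].next[*p] = t->used++;
		}
		cur = t->nodes[cur].next[*p];
	}
	t->nodes[cur].rule_id = rule_id;
	return 0;
}

/* Return the rule id of the longest registered prefix of str, if any. */
static int trie_match_prefix(const struct trie *t, const char *str, size_t len)
{
	int cur = 0;
	int best = t->nodes[0].rule_id;

	for (size_t i = 0; i < len; i++) {
		cur = t->nodes[cur].next[(unsigned char)str[i]];
		if (!cur)
			break; /* no registered prefix continues with this byte */
		if (t->nodes[cur].rule_id != TRIE_NO_MATCH)
			best = t->nodes[cur].rule_id;
	}
	return best;
}
```

The lookup cost depends only on the length of the input string, not on the number of registered prefixes, which is the property that matters when there are tens of location rules.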

The only functionality requiring relatively large input data (up to tens of kilobytes, and hundreds of bytes on average) and complex regexps is WAF filtering rules against User-Agent, URI, Cookie or other header values.

These two cases must be separated:

  1. a simple multi-pattern string search (e.g. Commentz-Walter or a SIMD algorithm) with begin/end bindings
  2. multipattern regexps, e.g. with runtime ported from Hyperscan (done in https://github.com/G-Core/linux-regex-module)
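As a rough illustration of the first case, the sketch below shows what begin/end bindings could mean for a set of literal patterns. It assumes, purely for the example, that a leading or trailing `*` in a rule such as `*.tempesta-tech.com/*` leaves that side unbound; the actual rule syntax may differ. `struct bound_pattern`, `pattern_parse()`, `pattern_match()` and `patterns_match_any()` are hypothetical names, and the per-pattern loop is a naive stand-in for Commentz-Walter or a SIMD search, used only to show the semantics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct bound_pattern {
	const char *lit;  /* literal part, len bytes, not NUL-terminated at len */
	size_t len;
	bool bind_begin;  /* literal must start at the beginning of the input */
	bool bind_end;    /* literal must end at the end of the input */
};

/* Assumed syntax: a leading or trailing '*' unbinds that side. */
static struct bound_pattern pattern_parse(const char *spec)
{
	struct bound_pattern p = { spec, strlen(spec), true, true };

	if (p.len && p.lit[0] == '*') {
		p.bind_begin = false;
		p.lit++;
		p.len--;
	}
	if (p.len && p.lit[p.len - 1] == '*') {
		p.bind_end = false;
		p.len--;
	}
	return p;
}

static bool pattern_match(const struct bound_pattern *p, const char *s, size_t n)
{
	if (p->len > n)
		return false;
	if (p->bind_begin && p->bind_end)            /* exact match */
		return p->len == n && !memcmp(s, p->lit, n);
	if (p->bind_begin)                           /* prefix match */
		return !memcmp(s, p->lit, p->len);
	if (p->bind_end)                             /* suffix match */
		return !memcmp(s + n - p->len, p->lit, p->len);
	for (size_t i = 0; i + p->len <= n; i++)     /* naive substring scan */
		if (!memcmp(s + i, p->lit, p->len))
			return true;
	return false;
}

/* Return the index of the first matching pattern, or -1 if none matches. */
static int patterns_match_any(const struct bound_pattern *set, size_t cnt,
			      const char *s, size_t n)
{
	for (size_t i = 0; i < cnt; i++)
		if (pattern_match(&set[i], s, n))
			return (int)i;
	return -1;
}
```

Bound patterns reduce to a single `memcmp()` at a known offset, so only the unbound ones actually need a multi-pattern search algorithm.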
@krizhanovsky krizhanovsky added this to the 0.6 OS milestone May 22, 2016
@krizhanovsky krizhanovsky modified the milestones: backlog, 1.0 Web Operating System Jan 15, 2018
@krizhanovsky krizhanovsky modified the milestones: 0.5 alpha, 1.0 Tempesta OS Feb 5, 2018
@krizhanovsky krizhanovsky modified the milestones: 1.6 muti-pattern strings search, 1.3 Web server Nov 22, 2018
@krizhanovsky krizhanovsky mentioned this issue Dec 27, 2021
2 tasks
@krizhanovsky krizhanovsky modified the milestones: 1.3 TBD( Web server & advanced strings), 1.2 TBD Jan 3, 2022
@krizhanovsky krizhanovsky modified the milestones: 1.1: TBD, 0.9 - LA Apr 18, 2024
@krizhanovsky
Contributor Author

krizhanovsky commented Apr 18, 2024

Let's just integrate with Hyperscan for now; it should be good for simple patterns.

Also need to extend the tests for HTTPtables to use the regular expressions.

@krizhanovsky
Contributor Author

I forked the repo (https://github.com/tempesta-tech/linux-regex-module). The TODO we discussed:

  1. make the repo work with the current 5.10 kernel or the next 6.8
  2. adjust tempesta.sh to load the module
  3. work with @RomanBelozerov on TODO (new) issues for packaging and CI
  4. adjust locations and httptables code to work with regexes

@biathlon3
Contributor

biathlon3 commented Jul 16, 2024

3 participants