Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No mention of catastrophic backtracking #111

Open
davisjam opened this issue Feb 9, 2018 · 4 comments
Open

No mention of catastrophic backtracking #111

davisjam opened this issue Feb 9, 2018 · 4 comments

Comments

@davisjam
Copy link

davisjam commented Feb 9, 2018

Regular expressions can be vulnerable to Regular Expression Denial of Service (ReDoS). Snyk.io has a good writeup, and the .NET docs have a thorough treatment as well (1, 2).

Catastrophic backtracking affects nearly every major language, including perl, ruby, java, javascript, python, C#, C++-11, and PHP. A guide to regular expressions is incomplete without a warning about ReDoS.

@hoangnam2261
Copy link

hoangnam2261 commented Jul 18, 2019

hi @davisjam,
I know this comment is too late from the time you created this issue. I am very happy when i know in this repository there is a person who aware of this vulnerability of Regular Expression Denial of Service (ReDoS).
I have a question that need your help. This is a simple regex: ^[01]([-]*[01]+)*$
And this is a simple sample can cause Catastrophic backtracking: 1010101010101001010&
I tried many ways to convert that regex to another form to gain the same meaning of the origin one but can avoid Catastrophic backtracking. Unfortunately, i can't reach.
Please help me with this.
I am looking to hearing response from you and everyone.
Thanks a lot,

@davisjam
Copy link
Author

@ziishaned I can prepare a PR with a small section about backtracking if you would be interested. Let me know.

@davisjam
Copy link
Author

@hoangnam2261

To avoid the backtracking, you need to give the regex engine clear boundary points that it won't backtrack past. For example, this regex is vulnerable: (a+)+$, but this one is safe: (ba+)+ (because every time the regex engine sees a "b", it knows it can't re-use the a's before it in another path).

In your example, the problematic piece is (-*[01]+)*$. Let's see if we can repair it. Observe that the backtracking occurs because the regex engine isn't sure whether it should treat a sequence of [01]'s as "A group of [01]s (not) preceded by a -" or as "Multiple clauses of such groups" (sorry, that probably didn't make sense).

Since what you want is a sequence of [01] delimited by dashes, possibly with 0 dashes, would this regex work instead?

^[01]+([-]+[01]+)*$

Note I added a + to the first group (since it is always required) and I made the [-] non-optional.

@hoangnam2261
Copy link

hoangnam2261 commented Jul 18, 2019

Thank you so much @davisjam,
After reading your comment carefully, my mind become clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants