Skip to content

Python: Add Regular Expression Injection query #5442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 61 commits into from
May 11, 2021

Conversation

jorgectf
Copy link
Contributor

@jorgectf jorgectf commented Mar 18, 2021

Any suggestions/tricks to improve the query or workarounds (to learn), no matter how minimal they are, are massively appreciated.

@erik-krogh
Copy link
Contributor

Just a small comment on naming.

There are currently ReDoS queries implemented for C# and JavaScript.
Both of these look for expensive regular expressions running on user input.

Your query looks for regular expressions being constructed from user input.
There is a similar query implemented for JavaScript, which is called Regular expression injection, and I think that name better describes what your query does.
And it would make the naming consistent across languages.

@jorgectf
Copy link
Contributor Author

Thanks for the detail @erik-krogh! I've updated everything according to Regular Expression Injection instead of Regular Expression Denial of Service.

@yoff
Copy link
Contributor

yoff commented Mar 18, 2021

Regarding the structure we now try to follow:

  • RegexInjectionSink might be added to Concepts.qll (and called something like RegexExecution).
  • EscapeSanitizer might be moved to Concepts.qll (and called something like RegexEscape)
  • The configuration should be moved to its own file (see e.g. StackTraceExposure.ql and StackTraceExposure.qll)

@yoff
Copy link
Contributor

yoff commented Mar 18, 2021

If the sanitiser is moved to concepts, it might have to be made a bit more precise, but I am not sure. Is there an awful lot of ways to escape?

@jorgectf
Copy link
Contributor Author

If the sanitiser is moved to concepts, it might have to be made a bit more precise, but I am not sure. Is there an awful lot of ways to escape?

As far as I know, re.escape is the only already-built function that escapes the regular expression, but the actual Sanitizer may also work for custom-made escape functions (I guess it'd very strange but could happen). Should it be reduced only to re.escape?

@jorgectf
Copy link
Contributor Author

jorgectf commented Mar 18, 2021

RegexInjectionSink might be added to Concepts.qll (and called something like RegexExecution).

Shall I include DirectRegex and CompiledRegex into Concepts.qll too?

@jorgectf jorgectf changed the title Python: Add ReDoS query Python: Add Regular Expression Injection query Mar 18, 2021
@yoff
Copy link
Contributor

yoff commented Mar 18, 2021

As far as I know, re.escape is the only already-built function that escapes the regular expression, but the actual Sanitizer may also work for custom-made escape functions (I guess it'd very strange but could happen). Should it be reduced only to re.escape?

In that case, we should probably limit it for now. We can then expand the concept if we see false positives due to custom escape mechanisms.

@yoff
Copy link
Contributor

yoff commented Mar 18, 2021

RegexInjectionSink might be added to Concepts.qll (and called something like RegexExecution).

Shall I include DirectRegex and CompiledRegex into Concepts.qll too?

We would actually have those predicates that populate the concepts scattered around in the models of the libraries where the concepts occur. So in this case, you identify some regex execution in the re module in the standard library. So it should go into a module named re inside Stdlib.qll. We can then later add other regex executions e.g. from the regex library in Regex.qll next to Stdlib.qll.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned we will have to put it all into experimental. I will add the required files and folders to main, then you can just pull and move things around.
We also still need qldocs on public methods, but that is a smaller thing.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies, I did not explain the idea behind getRegexNode clearly. I thought I had described how it would change the definition of sinks, but clearly those thoughts did not make it out of my mind. I have given the relevant modifications; this should also make the query compile again :-)

@jorgectf
Copy link
Contributor Author

jorgectf commented Mar 24, 2021

Absolutely no problem @yoff! I made some tests yesterday and had to unset the override in RegexExecution's Range since DataFlow's Node has no Range class, but didn't manage to understand what to do with regexNode. Thank you so much for your awesome explanations! :D

@yoff
Copy link
Contributor

yoff commented Mar 24, 2021

#5493 has now been merged, so you have somewhere safe to put all the files :-)

@jorgectf jorgectf force-pushed the jorgectf/python/redos branch from 6b24752 to f4fe354 Compare March 24, 2021 17:00
@jorgectf jorgectf force-pushed the jorgectf/python/redos branch from 9e7e05b to 21e01b8 Compare April 27, 2021 17:55
Signed-off-by: jorgectf <jorgectf@protonmail.com>
@yoff yoff self-requested a review April 29, 2021 13:23
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final consistency polish :-)

Co-authored-by: yoff <lerchedahl@gmail.com>
@jorgectf
Copy link
Contributor Author

Final consistency polish :-)

Thanks 👍

yoff
yoff previously approved these changes Apr 29, 2021
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yoff yoff dismissed stale reviews from themself via 78370cf May 10, 2021 12:53
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needed to make the auto format check happy..

@jorgectf
Copy link
Contributor Author

Just needed to make the auto format check happy..

😆

@yoff yoff merged commit 0e5a2c4 into github:main May 11, 2021
@jorgectf
Copy link
Contributor Author

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants