Java: Add ReDoS queries #7723

Merged
merged 58 commits into from
May 12, 2022
58 commits
a8f7a44
Port redos libraries from Python
joefarebrother Nov 16, 2021
37240f0
Copy Redos queries from python
joefarebrother Dec 6, 2021
59945cd
Add dataflow logic to PolynomialRedDoS
joefarebrother Dec 7, 2021
d04c99b
Support quote sequences
joefarebrother Jan 13, 2022
7530902
Add approximate support for nested character classes.
joefarebrother Jan 19, 2022
11e465f
Implement remaining syntax differences
joefarebrother Jan 19, 2022
f9f7a01
Add Java ReDoS libraries to identical-files.json
joefarebrother Feb 2, 2022
ca422a2
Use explicit `this`
joefarebrother Feb 2, 2022
8e19182
Add PrintAst support for regex terms
joefarebrother Feb 2, 2022
28649da
Add parser tests; fix some parser issues.
joefarebrother Feb 9, 2022
5b61de6
Implement style/doc suggestions from code review
joefarebrother Feb 9, 2022
e954db2
Convert snake case predicates to camel case
joefarebrother Feb 9, 2022
aa1337d
Apply style suggestions from code review
joefarebrother Feb 10, 2022
9e88c67
Add more test cases; make some fixes
joefarebrother Feb 10, 2022
bc10952
Simplify octal handling
joefarebrother Feb 10, 2022
e797d21
Topologically sort RegexString
joefarebrother Feb 10, 2022
dd200e2
Improve char set depth calculation
joefarebrother Feb 11, 2022
9f4da65
Improve calculation of locations of regex terms
joefarebrother Feb 11, 2022
4b845d5
Move test cases to their own directory to avoid conflict
joefarebrother Feb 11, 2022
457cf41
Support more escaped characters
joefarebrother Feb 14, 2022
5a4316d
Add test cases for exponential redos query
joefarebrother Feb 15, 2022
e23162d
Add test cases for PolynomialRedos dataflow logic; make fixes
joefarebrother Feb 15, 2022
91887ab
Sync shared files
joefarebrother Feb 16, 2022
5143585
Fix to PolynomialRedos not finding results and to test cases not find…
joefarebrother Feb 16, 2022
57ba8a4
Improve handling of hex escapes; and support some named character cla…
joefarebrother Feb 21, 2022
c312b4b
Add missing qldoc
joefarebrother Feb 22, 2022
5364001
Update docs to be about Java
joefarebrother Feb 22, 2022
3ce0c2c
Add more regex use functions in String
joefarebrother Mar 3, 2022
9bd3916
Add change note
joefarebrother Mar 3, 2022
2d96317
Fix change note
joefarebrother Mar 3, 2022
f5809a7
ReDoS performance fixes
smowton Mar 3, 2022
5ba6baf
Use occursInRegex more consistently throughout
joefarebrother Mar 4, 2022
49374b8
Fix parsing of alternations in character classes
joefarebrother Mar 8, 2022
bb56264
Support possessive quantifiers, which cannot backtrack.
joefarebrother Mar 8, 2022
0a5268a
Sync shared library changes across languages.
joefarebrother Mar 8, 2022
5555985
Distinguish between whether or not a regex is matched against a full s…
joefarebrother Mar 9, 2022
c1290d9
Sync shared redos library files.
joefarebrother Mar 9, 2022
6794268
Split PolynomialRedos definition into a library to avoid duplication …
joefarebrother Mar 16, 2022
04edc10
Exclude regexes from test code
joefarebrother Mar 16, 2022
1605d36
Refine polynomial redos sources to exclude length limited methods
joefarebrother Mar 16, 2022
375ded4
Move check to exclude test cases so that it also covers exponential redos
joefarebrother Mar 16, 2022
3d65a9c
Update shared files
joefarebrother Mar 16, 2022
522a8af
Fix filename case
joefarebrother Mar 17, 2022
0f606d9
Remove redundant `super` call.
joefarebrother Mar 23, 2022
0d13864
Restrict polynomial ReDoS' strings-parsed-as-regexes search to those …
smowton Mar 28, 2022
bc17d4b
Break the recursion between seqChild, RegExpTerm and TRegExpSequence
smowton Mar 28, 2022
e5ca924
Allow quantifiers involving {}; add comments
joefarebrother Mar 29, 2022
4ed2e8d
Update tests to account for only regexes with quantifiers being consi…
joefarebrother Mar 29, 2022
5e3ba13
Add a test for deeply nested sequences
joefarebrother Mar 29, 2022
2a80540
Sync shared files
joefarebrother Apr 4, 2022
eec57d4
Simplify dataflow logic by using only one configuration, and expessin…
joefarebrother Apr 5, 2022
66ab2bc
Update PrintAst test output
joefarebrother Apr 5, 2022
b08f22c
Remove unnecessary import
joefarebrother Apr 6, 2022
b854a21
Fix use of `sinkModel`
joefarebrother Apr 7, 2022
9078e13
Apply review suggestions
joefarebrother May 3, 2022
2d82dfb
Reorder backreference predicates
joefarebrother May 3, 2022
c7d3008
Fix issue with named backrefs; add needed import
joefarebrother May 4, 2022
64227c9
Fix codescanning alerts
joefarebrother May 4, 2022
15 changes: 9 additions & 6 deletions config/identical-files.json
@@ -475,20 +475,23 @@
  "python/ql/lib/semmle/python/security/internal/SensitiveDataHeuristics.qll",
  "ruby/ql/lib/codeql/ruby/security/internal/SensitiveDataHeuristics.qll"
  ],
- "ReDoS Util Python/JS/Ruby": [
+ "ReDoS Util Python/JS/Ruby/Java": [
  "javascript/ql/lib/semmle/javascript/security/performance/ReDoSUtil.qll",
  "python/ql/lib/semmle/python/security/performance/ReDoSUtil.qll",
- "ruby/ql/lib/codeql/ruby/security/performance/ReDoSUtil.qll"
+ "ruby/ql/lib/codeql/ruby/security/performance/ReDoSUtil.qll",
+ "java/ql/lib/semmle/code/java/security/performance/ReDoSUtil.qll"
  ],
- "ReDoS Exponential Python/JS/Ruby": [
+ "ReDoS Exponential Python/JS/Ruby/Java": [
  "javascript/ql/lib/semmle/javascript/security/performance/ExponentialBackTracking.qll",
  "python/ql/lib/semmle/python/security/performance/ExponentialBackTracking.qll",
- "ruby/ql/lib/codeql/ruby/security/performance/ExponentialBackTracking.qll"
+ "ruby/ql/lib/codeql/ruby/security/performance/ExponentialBackTracking.qll",
+ "java/ql/lib/semmle/code/java/security/performance/ExponentialBackTracking.qll"
  ],
- "ReDoS Polynomial Python/JS/Ruby": [
+ "ReDoS Polynomial Python/JS/Ruby/Java": [
  "javascript/ql/lib/semmle/javascript/security/performance/SuperlinearBackTracking.qll",
  "python/ql/lib/semmle/python/security/performance/SuperlinearBackTracking.qll",
- "ruby/ql/lib/codeql/ruby/security/performance/SuperlinearBackTracking.qll"
+ "ruby/ql/lib/codeql/ruby/security/performance/SuperlinearBackTracking.qll",
+ "java/ql/lib/semmle/code/java/security/performance/SuperlinearBackTracking.qll"
  ],
  "BadTagFilterQuery Python/JS/Ruby": [
  "javascript/ql/lib/semmle/javascript/security/BadTagFilterQuery.qll",
58 changes: 58 additions & 0 deletions java/ql/lib/semmle/code/java/PrintAst.qll
@@ -7,6 +7,7 @@
*/

import java
import semmle.code.java.regex.RegexTreeView

private newtype TPrintAstConfiguration = MkPrintAstConfiguration()

@@ -131,6 +132,9 @@ private newtype TPrintAstNode =
} or
TImportsNode(CompilationUnit cu) {
shouldPrint(cu, _) and exists(Import i | i.getCompilationUnit() = cu)
} or
TRegExpTermNode(RegExpTerm term) {
exists(StringLiteral str | term.getRootTerm() = getParsedRegExp(str) and shouldPrint(str, _))
}

/**
@@ -163,6 +167,19 @@ class PrintAstNode extends TPrintAstNode {
*/
Location getLocation() { none() }

/**
* Holds if this node is at the specified location.
* The location spans column `startcolumn` of line `startline` to
* column `endcolumn` of line `endline` in file `filepath`.
* For more information, see
* [Locations](https://codeql.github.com/docs/writing-codeql-queries/providing-locations-in-codeql-queries/).
*/
predicate hasLocationInfo(
string filepath, int startline, int startcolumn, int endline, int endcolumn
) {
this.getLocation().hasLocationInfo(filepath, startline, startcolumn, endline, endcolumn)
}

/**
* Gets the value of the property of this node, where the name of the property
* is `key`.
@@ -274,6 +291,47 @@ final class AnnotationPartNode extends ExprStmtNode {
}
}

/**
* A node representing a `StringLiteral`.
* If it is used as a regular expression, then it has a single child, the root of the parsed regular expression.
*/
final class StringLiteralNode extends ExprStmtNode {
StringLiteralNode() { element instanceof StringLiteral }

override PrintAstNode getChild(int childIndex) {
childIndex = 0 and
result.(RegExpTermNode).getTerm() = getParsedRegExp(element)
}
}

/**
* A node representing a regular expression term.
*/
class RegExpTermNode extends TRegExpTermNode, PrintAstNode {
RegExpTerm term;

RegExpTermNode() { this = TRegExpTermNode(term) }

/** Gets the `RegExpTerm` for this node. */
RegExpTerm getTerm() { result = term }

override PrintAstNode getChild(int childIndex) {
result.(RegExpTermNode).getTerm() = term.getChild(childIndex)
}

override string toString() {
result = "[" + strictconcat(term.getPrimaryQLClass(), " | ") + "] " + term.toString()
}

override Location getLocation() { result = term.getLocation() }

override predicate hasLocationInfo(
string filepath, int startline, int startcolumn, int endline, int endcolumn
) {
term.hasLocationInfo(filepath, startline, startcolumn, endline, endcolumn)
}
}

/**
* A node representing a `LocalVariableDeclExpr`.
*/
1 change: 1 addition & 0 deletions java/ql/lib/semmle/code/java/dataflow/ExternalFlow.qll
@@ -140,6 +140,7 @@ private module Frameworks {
private import semmle.code.java.frameworks.jOOQ
private import semmle.code.java.frameworks.JMS
private import semmle.code.java.frameworks.RabbitMQ
private import semmle.code.java.regex.RegexFlowModels
}

private predicate sourceModelCsv(string row) {
193 changes: 193 additions & 0 deletions java/ql/lib/semmle/code/java/regex/RegexFlowConfigs.qll
@@ -0,0 +1,193 @@
/**
* Defines configurations and steps for handling regexes
*/

import java
import semmle.code.java.dataflow.ExternalFlow
private import semmle.code.java.dataflow.DataFlow
private import semmle.code.java.dataflow.DataFlow2
private import RegexFlowModels
private import semmle.code.java.security.SecurityTests

private class ExploitableStringLiteral extends StringLiteral {
ExploitableStringLiteral() { this.getValue().matches(["%+%", "%*%", "%{%}%"]) }
}

/**
* Holds if `kind` is an external sink kind that is relevant for regex flow.
* `full` is true if sinks with this kind match against the full string of their input.
* `strArg` is the index of the argument to methods with this sink kind that contains the string to be matched against,
* where -1 is the qualifier; or -2 if no such argument exists.
*/
private predicate regexSinkKindInfo(string kind, boolean full, int strArg) {
sinkModel(_, _, _, _, _, _, _, kind, _) and
exists(string fullStr, string strArgStr |
(
full = true and fullStr = "f"
or
full = false and fullStr = ""
) and
(
strArgStr.toInt() = strArg
or
strArg = -2 and
strArgStr = ""
)
|
kind = "regex-use[" + fullStr + strArgStr + "]"
)
}

/** A sink that is relevant for regex flow. */
private class RegexFlowSink extends DataFlow::Node {
boolean full;
int strArg;

RegexFlowSink() {
exists(string kind |
regexSinkKindInfo(kind, full, strArg) and
sinkNode(this, kind)
)
}

/** Holds if a regex that flows here is matched against a full string (rather than a substring). */
predicate matchesFullString() { full = true }

/** Gets the string expression that a regex that flows here is matched against, if any. */
Expr getStringArgument() {
exists(MethodAccess ma |
this.asExpr() = argOf(ma, _) and
result = argOf(ma, strArg)
)
}
}

private Expr argOf(MethodAccess ma, int arg) {
arg = -1 and result = ma.getQualifier()
or
result = ma.getArgument(arg)
}

/**
* A unit class for adding additional regex flow steps.
*
* Extend this class to add additional flow steps that should apply to regex flow configurations.
*/
class RegexAdditionalFlowStep extends Unit {
/**
* Holds if the step from `node1` to `node2` should be considered a flow
* step for regex flow configurations.
*/
abstract predicate step(DataFlow::Node node1, DataFlow::Node node2);
}

// TODO: This may be able to be done with models-as-data if query-specific flow steps become supported.
private class JdkRegexFlowStep extends RegexAdditionalFlowStep {
override predicate step(DataFlow::Node node1, DataFlow::Node node2) {
exists(MethodAccess ma, Method m, string package, string type, string name, int arg |
ma.getMethod().getSourceDeclaration().overrides*(m) and
m.hasQualifiedName(package, type, name) and
node1.asExpr() = argOf(ma, arg) and
node2.asExpr() = ma
|
package = "java.util.regex" and
type = "Pattern" and
(
name = ["asMatchPredicate", "asPredicate", "matcher"] and
arg = -1
or
name = "compile" and
arg = 0
)
or
package = "java.util.function" and
type = "Predicate" and
name = ["and", "or", "not", "negate"] and
arg = [-1, 0]
)
}
}

private class GuavaRegexFlowStep extends RegexAdditionalFlowStep {
override predicate step(DataFlow::Node node1, DataFlow::Node node2) {
exists(MethodAccess ma, Method m, string package, string type, string name, int arg |
ma.getMethod().getSourceDeclaration().overrides*(m) and
m.hasQualifiedName(package, type, name) and
node1.asExpr() = argOf(ma, arg) and
node2.asExpr() = ma
|
package = "com.google.common.base" and
type = "Splitter" and
(
name = "on" and
m.getParameterType(0).(RefType).hasQualifiedName("java.util.regex", "Pattern") and
arg = 0
or
name = "withKeyValueSeparator" and
m.getParameterType(0).(RefType).hasQualifiedName("com.google.common.base", "Splitter") and
arg = 0
or
name = "onPattern" and
arg = 0
or
name = ["limit", "omitEmptyStrings", "trimResults", "withKeyValueSeparator"] and
arg = -1
)
)
}
}

private class RegexFlowConf extends DataFlow2::Configuration {
RegexFlowConf() { this = "RegexFlowConfig" }

override predicate isSource(DataFlow::Node node) {
node.asExpr() instanceof ExploitableStringLiteral
}

override predicate isSink(DataFlow::Node node) { node instanceof RegexFlowSink }

override predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
any(RegexAdditionalFlowStep s).step(node1, node2)
}

override predicate isBarrier(DataFlow::Node node) {
node.getEnclosingCallable().getDeclaringType() instanceof NonSecurityTestClass
}
}

/**
* Holds if `regex` is used as a regex, with the mode `mode` (if known).
* If regex mode is not known, `mode` will be `"None"`.
*
* As an optimisation, only regexes containing an infinite repetition quantifier (`+`, `*`, or `{x,}`),
* which may therefore be relevant for ReDoS queries, are considered.
*/
predicate usedAsRegex(StringLiteral regex, string mode, boolean match_full_string) {
any(RegexFlowConf c).hasFlow(DataFlow2::exprNode(regex), _) and
mode = "None" and // TODO: proper mode detection
(if matchesFullString(regex) then match_full_string = true else match_full_string = false)
}

/**
* Holds if `regex` is used as a regular expression that is matched against a full string,
* as though it was implicitly surrounded by ^ and $.
*/
private predicate matchesFullString(StringLiteral regex) {
exists(RegexFlowConf c, RegexFlowSink sink |
sink.matchesFullString() and
c.hasFlow(DataFlow2::exprNode(regex), sink)
)
}

/**
* Holds if the string literal `regex` is a regular expression that is matched against the expression `str`.
*
* As an optimisation, only regexes containing an infinite repetition quantifier (`+`, `*`, or `{x,}`),
* which may therefore be relevant for ReDoS queries, are considered.
*/
predicate regexMatchedAgainst(StringLiteral regex, Expr str) {
exists(RegexFlowConf c, RegexFlowSink sink |
str = sink.getStringArgument() and
c.hasFlow(DataFlow2::exprNode(regex), sink)
)
}
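
The `regexSinkKindInfo` predicate in the file above decodes sink kinds of the form `regex-use[` + optional `f` + optional argument index + `]`. As a rough illustration of that encoding only, here is a minimal Java sketch; the class `RegexUseKind` and its members are hypothetical names that do not appear in the PR, and the QL predicate additionally requires the kind to occur in a sink model, which this sketch does not check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the PR): decodes a "regex-use[...]" sink kind. */
final class RegexUseKind {
    // An optional "f" flag followed by an optional, possibly negative, argument index.
    private static final Pattern KIND = Pattern.compile("regex-use\\[(f?)(-?\\d+)?\\]");

    /** Value the QL predicate uses when no string argument exists. */
    static final int NO_STRING_ARG = -2;

    final boolean full; // true if the regex is matched against the whole input
    final int strArg;   // -1 = qualifier, -2 = no such argument, otherwise an argument index

    private RegexUseKind(boolean full, int strArg) {
        this.full = full;
        this.strArg = strArg;
    }

    static RegexUseKind parse(String kind) {
        Matcher m = KIND.matcher(kind);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a regex-use kind: " + kind);
        }
        boolean full = !m.group(1).isEmpty();
        String index = m.group(2); // null when the index is omitted
        int strArg = (index == null) ? NO_STRING_ARG : Integer.parseInt(index);
        return new RegexUseKind(full, strArg);
    }

    public static void main(String[] args) {
        for (String kind : new String[] {"regex-use[f-1]", "regex-use[0]", "regex-use[]"}) {
            RegexUseKind k = parse(kind);
            System.out.println(kind + " -> full=" + k.full + ", strArg=" + k.strArg);
        }
    }
}
```

Under this reading, `regex-use[f-1]` (used for `String.matches`) means a full-string match whose subject string is the qualifier, while `regex-use[]` (used for `Pattern.compile`) means the sink has no string argument at all.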
32 changes: 32 additions & 0 deletions java/ql/lib/semmle/code/java/regex/RegexFlowModels.qll
@@ -0,0 +1,32 @@
/** Definitions of data flow steps for determining flow of regular expressions. */

import java
import semmle.code.java.dataflow.ExternalFlow

private class RegexSinkCsv extends SinkModelCsv {
override predicate row(string row) {
row =
[
//"namespace;type;subtypes;name;signature;ext;input;kind"
"java.util.regex;Matcher;false;matches;();;Argument[-1];regex-use[f]",
"java.util.regex;Pattern;false;asMatchPredicate;();;Argument[-1];regex-use[f]",
"java.util.regex;Pattern;false;compile;(String);;Argument[0];regex-use[]",
"java.util.regex;Pattern;false;compile;(String,int);;Argument[0];regex-use[]",
"java.util.regex;Pattern;false;matcher;(CharSequence);;Argument[-1];regex-use[0]",
"java.util.regex;Pattern;false;matches;(String,CharSequence);;Argument[0];regex-use[f1]",
"java.util.regex;Pattern;false;split;(CharSequence);;Argument[-1];regex-use[0]",
"java.util.regex;Pattern;false;split;(CharSequence,int);;Argument[-1];regex-use[0]",
"java.util.regex;Pattern;false;splitAsStream;(CharSequence);;Argument[-1];regex-use[0]",
"java.util.function;Predicate;false;test;(Object);;Argument[-1];regex-use[0]",
"java.lang;String;false;matches;(String);;Argument[0];regex-use[f-1]",
"java.lang;String;false;split;(String);;Argument[0];regex-use[-1]",
"java.lang;String;false;split;(String,int);;Argument[0];regex-use[-1]",
"java.lang;String;false;replaceAll;(String,String);;Argument[0];regex-use[-1]",
"java.lang;String;false;replaceFirst;(String,String);;Argument[0];regex-use[-1]",
"com.google.common.base;Splitter;false;onPattern;(String);;Argument[0];regex-use[]",
"com.google.common.base;Splitter;false;split;(CharSequence);;Argument[-1];regex-use[0]",
"com.google.common.base;Splitter;false;splitToList;(CharSequence);;Argument[-1];regex-use[0]",
"com.google.common.base;Splitter$MapSplitter;false;split;(CharSequence);;Argument[-1];regex-use[0]",
]
}
}
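
Rows tagged with `f` in the model above mark APIs that match a regex against the whole input, as though it were surrounded by `^` and `$`; the other rows cover substring-style uses. The following Java sketch shows that behavioural difference using only standard JDK calls. The class name `FullVsPartialMatch` and its helper methods are illustrative assumptions, not code from the PR.

```java
import java.util.regex.Pattern;

/** Illustrative only: contrasts full-string and substring regex matching in the JDK. */
final class FullVsPartialMatch {
    /** String.matches (modelled as regex-use[f-1]) must consume the entire input. */
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    /** Matcher.find succeeds if any substring of the input matches. */
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** String.split (modelled as regex-use[-1]) searches for matches anywhere in the input. */
    static int splitCount(String regex, String input) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a+", "aaa!"));       // false: "!" is left over
        System.out.println(partialMatch("a+", "aaa!"));    // true: "aaa" matches
        System.out.println(splitCount("\\d+", "a1b22c"));  // 3: pieces "a", "b", "c"
    }
}
```

The distinction matters to the ReDoS analysis because an implicitly anchored regex can fail, and therefore backtrack, on inputs where an unanchored search would simply report a match.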