core.clj/parse: regexp bomb on invalid input #1

Closed
TomiTakussaari opened this issue Oct 5, 2015 · 0 comments


Hi!

We have an application that, among other things, uses this library to parse end-user input.

Because end users are what they are, this input sometimes contains invalid data, such as email addresses, for reasons unknown.

Trying to parse such an "address" causes a regexp bomb in our application: the parse takes a very long time and eventually leaves the server unresponsive once several such requests have come in.

As this library is meant for handling address parsing, it would be great if it could handle invalid input more gracefully, either by rejecting invalid input altogether or by using a more efficient regexp to parse it.
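As a rough illustration of the "reject invalid input" option, a cheap pre-check could run before the expensive regexp is ever tried. This is only a sketch under assumptions: plausible-address?, parse-if-plausible, and the 200-character limit are hypothetical and not part of this library, and it assumes that a real street address never contains an "@".

```clojure
(require '[clojure.string :as str])

;; Assumed upper bound on address length; not taken from the library.
(def max-address-length 200)

(defn plausible-address?
  "Cheap sanity check: a non-blank string of bounded length without '@'."
  [s]
  (and (string? s)
       (not (str/blank? s))
       (<= (count s) max-address-length)
       ;; Assumption: '@' never appears in a valid address.
       (neg? (.indexOf ^String s "@"))))

(defn parse-if-plausible
  "Calls parse-fn on s only when s passes the pre-check; otherwise returns nil.
  parse-fn stands in for the library's parse function."
  [parse-fn s]
  (when (plausible-address? s)
    (parse-fn s)))
```

A caller would pass the library's parse as parse-fn. A check like this only filters out one class of bad input, so it complements fixing the pattern rather than replacing it.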

I believe the problem is in the rather complex regexp in reg.clj/street, but I have not (at least not yet) looked into what it actually does.
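Independent of fixing the pattern itself, one way to bound the damage might be to make matching give up after a deadline. java.util.regex reads its input through CharSequence.charAt, so a wrapper that checks the clock there can abort a runaway match. The following is a sketch only: deadline-chars and re-matches-within are hypothetical names, not functions of this library, and the timeout values are arbitrary.

```clojure
(defn deadline-chars
  "Wraps s in a CharSequence whose charAt throws once deadline-nanos
  (a System/nanoTime value) has passed."
  [^CharSequence s ^long deadline-nanos]
  (reify CharSequence
    (charAt [_ i]
      (when (> (System/nanoTime) deadline-nanos)
        (throw (ex-info "regex match timed out" {:type :timeout})))
      (.charAt s i))
    (length [_] (.length s))
    (subSequence [_ start end]
      (deadline-chars (.subSequence s start end) deadline-nanos))
    (toString [_] (.toString s))))

(defn re-matches-within
  "Like re-matches, but returns :timeout when matching exceeds timeout-ms."
  [re s timeout-ms]
  (let [deadline (+ (System/nanoTime) (* (long timeout-ms) 1000000))]
    (try
      (re-matches re (deadline-chars s deadline))
      (catch clojure.lang.ExceptionInfo e
        ;; Only swallow our own timeout marker; rethrow anything else.
        (if (= :timeout (:type (ex-data e)))
          :timeout
          (throw e))))))
```

For example, (re-matches-within some-pattern input 1000) would behave like re-matches for well-behaved input and return :timeout instead of hanging on pathological input. The clock check on every charAt adds overhead, so this is a safety net rather than a substitute for a pattern that does not backtrack excessively.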

Test case to demonstrate the problem:

diff --git a/test/jhs_106/core_test.clj b/test/jhs_106/core_test.clj
index c6fbeb6..4b9211c 100644
--- a/test/jhs_106/core_test.clj
+++ b/test/jhs_106/core_test.clj
@@ -547,6 +547,9 @@
                    :stairway "\u00C4"
                    :apartment "13\u00F6"}} (simple-parse "Gregorius IX:n tie 12-14 rak. 2 \u00C4 13\u00F6"))))

+(deftest parse-should-handle-invalid-input-without-eternal-loop
+  (parse "This is email address with some text lappeenranta@virhe.fi"))
+
 (deftest should-unabbreviate-streetname
   (doseq [v abbreviations]
     (is (= (str "Rosvo" (name (key v))) (unabbreviate (str "Rosvo" (val v)))) (str (val v) " => " (name (key v))))

With this test added, lein test takes a very, very long time to execute on a 2.3 GHz Intel Core i7 (so long that I killed it after waiting 10 minutes).

Thread dump:

"main" #1 prio=5 os_prio=31 tid=0x00007fba05002000 nid=0x1303 runnable [0x000000010c581000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4602)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4794)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4801)
    at java.util.regex.Pattern$Prolog.match(Pattern.java:4741)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Matcher.match(Matcher.java:1270)
    at java.util.regex.Matcher.matches(Matcher.java:604)
    at clojure.core$re_matches.invoke(core.clj:4424)
    at jhs_106.core$street_line.invoke(core.clj:35)
    at jhs_106.core$parse.invoke(core.clj:92)
    at jhs_106.core_test$fn__671.invoke(core_test.clj:551)