Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Serious performance regression in v1.1.1 #34

Closed
KieranP opened this Issue Oct 4, 2013 · 11 comments

Comments

Projects
None yet
4 participants

KieranP commented Oct 4, 2013

I upgrade rails_autolink the other day to 1.1.3, and suddenly, pages using the autolink functionality went from loading in 2s to never loading, consuming 100% CPU power, and steadily increasing in memory. I didn't catch it before it went into production, and suddenly 15 passenger processes are using 100% CPU over 4 cores and memory going up to 3GB into swap.

I downgraded to version 1.1.0 and the issue disappeared. I upgraded to v1.1.1 and the problem reappeared.

Now there is only really one commit between the release of 1.1.0 and the release of 1.1.1: 8dd7191 , so I'm fairly certain this new REGEXP is the cause of the speed decrease...

I'm happy to stay at 1.1.0 for now, but I'd recommend reverting that change since it is pretty crippling.

KieranP commented Oct 15, 2013

@xuanxu @andresbravog Any progress on this? The new regexp is causing some serious performance issues.

Contributor

andresbravog commented Oct 15, 2013

@KieranP can you give me more feedback:

  • Ruby version
  • Type of pages are you working with

Are this problem appearing just in your production env?
Have your application unitary testing for this feature so you can measure regression time?

Collaborator

xuanxu commented Oct 15, 2013

I did some benchmarks on ruby 2.0 and couldn't find any significant difference

KieranP commented Oct 15, 2013

  • Ruby version ruby-2.0.0-p247
  • Rails version rails-3.2.14
  • rails_autolink version 1.1.1+
  • Environment - development and production
  • Project page with up to 100's of comments on the page, each that runs through rails_autolink

Testing in our app didn't pick up this problem, but probably because tests only had 1 or 2 comments on the project, not 100's. It seems the more comments and the more variety in those comments the longer autolink takes. Sometimes (not on every project, but consistently on the same projects) it just locks up, 100% CPU, and memory climbs steadily. Is it possible for a regexp to recurse?

Contributor

andresbravog commented Oct 16, 2013

If no regression comes up with singular test sounds like a ruby regexp malfunction with some texts like http://bugs.ruby-lang.org/issues/5149 , maybe you can find the exactly text failing and report it to ruby core.

Without knowing anything about your application, shouldn't you store those comments after running rails_autolinks so you don't have to process it in each page load?

cheers

Contributor

mitio commented Oct 21, 2013

I can provide more info on this, as I just got bit by it. Here's an example text which takes nearly 0.5s on my machine when passed through auto_link:

'x@..............................'

The key seems to be the dots in the domain part. Here are more test results:

x@..............................  0.470000   0.000000   0.470000 (  0.471416)
x@...............................  0.750000   0.010000   0.760000 (  0.761386)
x@................................  1.230000   0.000000   1.230000 (  1.237689)
x@.................................  1.980000   0.000000   1.980000 (  2.007114)
x@..................................  3.210000   0.000000   3.210000 (  3.238421)
x@...................................  5.190000   0.000000   5.190000 (  5.230534)
x@....................................  8.490000   0.030000   8.520000 (  9.219805)
x@..................................... 13.630000   0.040000  13.670000 ( 14.814795)
x@...................................... 22.020000   0.050000  22.070000 ( 23.397128)
x@....................................... 35.620000   0.060000  35.680000 ( 37.329822)
x@........................................ 57.660000   0.140000  57.800000 ( 61.572949)

I'm testing this in a Rails 4.0 console, where I've done include ActionView::Helpers::TextHelper. The test code is:

Benchmark.bm do |bm|
  30.upto(40).each do |dots|
    email = "x@#{'.' * dots}"
    bm.report(email) { auto_link email }
  end
end

I've tested v1.1.4 on Ruby 2.0.0-p247 and Ruby 2.1.0-preview1.

Contributor

andresbravog commented Oct 22, 2013

@mitio thx for your valuable information here

i'm now able to reproduce the error, it's something related on how ruby processes the email regex:

AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]\.?(?:#{AUTO_EMAIL_LOCAL_RE}+\.)*#{AUTO_EMAIL_LOCAL_RE}*@[\w-]+(?:\.[\w-]+)+/

as we can see with the script

AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]\.?(?:#{AUTO_EMAIL_LOCAL_RE}+\.)*#{AUTO_EMAIL_LOCAL_RE}*@[\w-]+(?:\.[\w-]+)+/

require 'benchmark'

Benchmark.bm do |bm|
  30.upto(40).each do |dots|
    email = "x@#{'.' * dots}"
    bm.report(email) { email.gsub(AUTO_EMAIL_RE, ' ') }
  end
end

giving the output:

x@..............................  0.420000   0.000000   0.420000 (  0.429706)
x@...............................  0.690000   0.000000   0.690000 (  0.686012)
x@................................  1.090000   0.000000   1.090000 (  1.095430)
x@.................................  1.780000   0.000000   1.780000 (  1.786115)
x@..................................  2.870000   0.010000   2.880000 (  2.881852)
x@...................................  4.690000   0.000000   4.690000 (  4.696742)
x@....................................  7.680000   0.020000   7.700000 (  7.719964)
x@..................................... 12.290000   0.010000  12.300000 ( 12.319188)
x@...................................... 19.880000   0.020000  19.900000 ( 19.928903)
x@....................................... 32.350000   0.050000  32.400000 ( 32.483979)
x@........................................ 51.900000   0.080000  51.980000 ( 52.099437)

i'm working on finding a better 'ruby friendly' regex

I've tested v1.1.4 on Ruby 2.0.0-p247 and Ruby 1.9.3

Contributor

andresbravog commented Oct 22, 2013

I think I fixed the issue, with the regex:

AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]#{AUTO_EMAIL_LOCAL_RE}{0,*}@[\w-]+(?:\.[\w-]+)+/

getting better results:

       user     system      total        real
x@..............................  0.000000   0.000000   0.000000 (  0.000171)
x@...............................  0.000000   0.000000   0.000000 (  0.000008)
x@................................  0.000000   0.000000   0.000000 (  0.000012)
x@.................................  0.000000   0.000000   0.000000 (  0.000007)
x@..................................  0.000000   0.000000   0.000000 (  0.000004)
x@...................................  0.000000   0.000000   0.000000 (  0.000004)
x@....................................  0.000000   0.000000   0.000000 (  0.000004)
x@.....................................  0.000000   0.000000   0.000000 (  0.000007)
x@......................................  0.000000   0.000000   0.000000 (  0.000007)
x@.......................................  0.000000   0.000000   0.000000 (  0.000006)
x@........................................  0.000000   0.000000   0.000000 (  0.000004)

I'll create a pull request

mitio added a commit to mitio/rails_autolink that referenced this issue Oct 22, 2013

Fix the email regexp CPU usage for odd inputs
Fixes #34

Some inputs could make the Regexp engine in Ruby to take up a lot of
CPU time in order to check for a match of the email string. This change
adds a test to guard against an example of such an input and changes
the email regexp so that all the tests pass and so that no excessive
CPU usage occurs.
Contributor

mitio commented Oct 22, 2013

@andresbravog I just sent a PR with the change plus a timeout test for these inputs.

I based the value of AUTO_EMAIL_RE on your suggestion but changed it a bit so that tests pass. Is there any particular reason you added {0,*} as a quantifier of AUTO_EMAIL_LOCAL_RE instead of just *?

Contributor

andresbravog commented Oct 22, 2013

haha you was faster than me, just working on testing it into different ruby versions. It's 👍 for me

@xuanxu xuanxu closed this in #36 Oct 22, 2013

Collaborator

xuanxu commented Oct 22, 2013

Thank you both @andresbravog & @mitio!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment