simplify commentRx expression, improve memory usage by the regexp engine #10

lidlanca · 2015-02-11T22:01:39Z

issue reported: thlorenz/mold-source-map#5

If this change works, you should be able to process a source string limited only by the process memory, not by the regexp stack.
I tested it with a string of about 1 GB.
Please test. cc: @greim


greim · 2015-02-11T22:09:01Z

index.js

@@ -2,7 +2,7 @@
 var fs = require('fs');
 var path = require('path');

-var commentRx = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,((?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?)(?:[ \t]*\*\/)?$/mg;
+var commentRx = (?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg;


regex needs opening slash :)

yes, it kinda does :)
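
For reference, here is a minimal sketch of the proposed pattern with the opening slash restored, run next to the original pattern. The `capture` helper and the two sample comments are hypothetical and not part of mold-source-map. It suggests one behavioural difference to keep in mind: because `(.+)` runs to the end of the line, a trailing ` */` on a block comment ends up inside the capture group.

```javascript
// Original pattern from index.js: validates the base64 payload and
// leaves an optional trailing "*/" outside the capture group.
var strictRx = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,((?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?)(?:[ \t]*\*\/)?$/mg;

// Proposed pattern from this PR, with the missing opening slash added.
var simpleRx = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg;

// Hypothetical helper: returns capture group 1 of the first match, or null.
function capture(rx, text) {
  rx.lastIndex = 0; // global regexes keep state between exec() calls
  var m = rx.exec(text);
  return m ? m[1] : null;
}

// Made-up sample comments; "e30=" is the base64 encoding of "{}".
var lineComment = '//# sourceMappingURL=data:application/json;base64,e30=';
var blockComment = '/*# sourceMappingURL=data:application/json;base64,e30= */';

console.log(capture(strictRx, lineComment));  // e30=
console.log(capture(simpleRx, lineComment));  // e30=
console.log(capture(strictRx, blockComment)); // e30=
console.log(capture(simpleRx, blockComment)); // e30= */
```

If the simplified pattern were adopted, a caller would presumably need to trim the comment terminator before decoding the payload.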

thlorenz · 2015-02-11T22:44:42Z

@greim does this still work properly and solve the problem for you?

greim · 2015-02-11T23:20:05Z

Strangely, no, this doesn't prevent the original error.

/Users/greim/project/node_modules/mold-source-map/index.js:9
  var m = source.match(/(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:applic
                 ^
RangeError: Maximum call stack size exceeded
    at String.match (native)
    at extractComment (/Users/greim/project/node_modules/mold-source-map/index.js:9:18)
    at new SourceMolder (/Users/greim/project/node_modules/mold-source-map/index.js:44:18)
    at exports.fromSource (/Users/greim/project/node_modules/mold-source-map/index.js:73:10)
    at Stream.end (/Users/greim/project/node_modules/mold-source-map/index.js:63:24)
    at _end (/Users/greim/project/node_modules/mold-source-map/node_modules/through/index.js:61:9)
    at Stream.stream.end (/Users/greim/project/node_modules/mold-source-map/node_modules/through/index.js:70:5)
    at Labeled.onend (/Users/greim/project/node_modules/browserify/node_modules/labeled-stream-splicer/node_modules/stream-splicer/node_modules/readable-stream/lib/_stream_readable.js:537:10)
    at Labeled.g (events.js:184:16)
    at Labeled.emit (events.js:119:20)

I tested by pasting the new regex directly into mold-source-map/index.js in my node_modules and re-running it.

lidlanca · 2015-02-13T02:55:03Z

@greim can you try the regexp with the isolated code you posted, and see if it fails for you too?

How many bytes and approximately how many comments are in the source file you are parsing?

greim · 2015-02-13T03:00:58Z

The isolation code ran the new regex without error. Will check and get back regarding source file.

greim · 2015-02-13T03:33:25Z

source.length => 11135198
Buffer.byteLength(source) => 11137624
approx number of /* comments => 2196
approx number of // comments => 6790

In an isolated environment using same version of iojs, same source string, and same regex, there's no error. It only happens during my gulp build which is creating browserify bundles.
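
Figures like the ones above can be gathered with a few lines of Node. This is only a sketch: the `describeSource` helper is hypothetical, and the comment counts are rough because the patterns also hit `/*` and `//` inside strings and URLs.

```javascript
// Hypothetical helper: collects rough size and comment statistics for a source string.
function describeSource(source) {
  return {
    length: source.length,                               // UTF-16 code units
    bytes: Buffer.byteLength(source),                    // UTF-8 bytes (default encoding)
    blockComments: (source.match(/\/\*/g) || []).length, // occurrences of "/*"
    lineComments: (source.match(/\/\//g) || []).length   // occurrences of "//"
  };
}

// Made-up sample input; a real run would read the bundle from disk instead.
var stats = describeSource('//a\n/*b*/\u00e9');
console.log(stats.length);        // 10
console.log(stats.bytes);         // 11
console.log(stats.blockComments); // 1
console.log(stats.lineComments);  // 1
```

The gap between `length` and `bytes` comes from non-ASCII characters, which take more than one byte in UTF-8.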

lidlanca · 2015-02-13T04:43:26Z

@greim, to explain why it works in an isolated setup: my wild guess is that when you add browserify to the pipe, your process uses more memory, possibly to the point that the regexp engine cannot allocate the memory it needs for its stack.

The regexp stack is dynamically allocated; it starts at 1 MB and grows exponentially, by a factor of 2.
If there is no more memory in the process heap for the stack, it raises that error.

I pushed the limit on my machine testing that regexp, and it should not have failed on 10 MB of string input, or even on 9000 matches.

greim · 2015-02-13T05:05:32Z

In my isolation code, I can use up a bunch of memory and still not get an error.

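var fs = require('fs')
// Fill the heap with 50 million numbers to put memory pressure on the process.
var arr = []
for (var i=0; i<50000000; i++){
  arr.push(Math.random())
}
// The simplified pattern from this PR, with the opening slash restored.
var patt = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg;
var source = fs.readFileSync('./source.js', 'utf8')
// Run the match repeatedly against the source file.
for (var i=0; i<100; i++){
  source.match(patt)
}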

If I push that number much higher than 50000000 it fails with process out of memory, so it's getting really close to the limit and not failing. It feels like something weird is going on with v8, especially since 0.12 works fine. But of course I have no proof for that.

greim · 2015-02-13T05:22:40Z

I finally found a variation of the regex that works.

var works = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(..*)$/mg
var fails = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg

That's weird. I don't know enough regex theory to know whether that's intended or surprising in any way, or indicative of a v8 bug or anything :/
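
As far as matching semantics go, `.+` and `..*` both mean "one or more characters other than a line terminator", so the two patterns should capture the same text whenever both run to completion. The sketch below checks that on a small made-up input; the `captures` helper and the sample lines are hypothetical. It says nothing about why one form overflowed the stack and the other did not.

```javascript
var works = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(..*)$/mg;
var fails = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg;

// Hypothetical helper: collects capture group 1 of every match in the source.
function captures(rx, source) {
  var out = [];
  var m;
  rx.lastIndex = 0; // start from the beginning on every call
  while ((m = rx.exec(source)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Made-up sample source with two matching comments and two non-matching lines.
var sample = [
  'var a = 1;',
  '//# sourceMappingURL=data:application/json;base64,e30=',
  '  /*@ sourceMappingURL=data:text/json;base64,AAAA */',
  '//# sourceMappingURL=data:application/json;base64,' // empty payload: no match
].join('\n');

console.log(captures(works, sample)); // [ 'e30=', 'AAAA */' ]
console.log(captures(fails, sample)); // [ 'e30=', 'AAAA */' ]
```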

lidlanca · 2015-02-13T06:55:45Z

You can read a bit about V8's irregexp engine here:
http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

The main point is that the engine optimizes the expression to the point that any slight change can produce a different execution plan, some more efficient, some less.

Glad you got something that works for you.

thlorenz · 2015-02-19T03:04:55Z

Not merging as is since it didn't solve the problem.
@greim if you want to submit your solution as a PR (assuming all tests pass) please do so.

greim · 2015-02-19T03:50:01Z

See: #11

On io.js 2.2.1 and 2.3.1 (the latest at the time of writing) I get this error:

```
RangeError: Maximum call stack size exceeded
    at String.match (native)
```

It happens only on the second pass while karma is running (auto-watching and rerunning the specs). For more details, see nodejs/node#759.

Similar issues: thlorenz/convert-source-map#10, thlorenz/convert-source-map#11

This looks related to optimizations in the V8 regex engine: http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

Simply changing `.+` to the equivalent `..*` fixes the issue with the match function.

greim reviewed Feb 11, 2015
Update index.js

5c3aabf

make it so it pass test and minor optimization

bd4d9f2

wavded mentioned this pull request Feb 18, 2015

RangeError: Maximum call stack size exceeded gulp-sourcemaps/gulp-sourcemaps#73

Closed

thlorenz closed this Feb 19, 2015

dmitry mentioned this pull request Jun 30, 2015

Prevent from RangeError exceptions on large data demerzel3/karma-sourcemap-loader#18

Closed
