Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify commentRx expression, improve memory usage by the regexp engine #10

Closed
wants to merge 3 commits into from

Conversation

lidlanca
Copy link

issue reported: thlorenz/mold-source-map#5

If this change work you should be able to process a source string with the limit of the process memory and not the regexp stack.
I tested it with ~1gb of string.
please test cc: @greim

issue reported: thlorenz/mold-source-map#5

If this change work you should be able to process a source string with the limit of the process memory and not the regexp stack.
I tested it with ~1gb of string.
please test cc: @greim
@@ -2,7 +2,7 @@
var fs = require('fs');
var path = require('path');

var commentRx = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,((?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?)(?:[ \t]*\*\/)?$/mg;
var commentRx = (?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex needs opening slash :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it kinda does :)

@thlorenz
Copy link
Owner

@greim does this still work properly and solve the problem for you?

@greim
Copy link
Contributor

greim commented Feb 11, 2015

Strangely, no this doesn't prevent the original error.

/Users/greim/project/node_modules/mold-source-map/index.js:9
  var m = source.match(/(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:applic
                 ^
RangeError: Maximum call stack size exceeded
    at String.match (native)
    at extractComment (/Users/greim/project/node_modules/mold-source-map/index.js:9:18)
    at new SourceMolder (/Users/greim/project/node_modules/mold-source-map/index.js:44:18)
    at exports.fromSource (/Users/greim/project/node_modules/mold-source-map/index.js:73:10)
    at Stream.end (/Users/greim/project/node_modules/mold-source-map/index.js:63:24)
    at _end (/Users/greim/project/node_modules/mold-source-map/node_modules/through/index.js:61:9)
    at Stream.stream.end (/Users/greim/project/node_modules/mold-source-map/node_modules/through/index.js:70:5)
    at Labeled.onend (/Users/greim/project/node_modules/browserify/node_modules/labeled-stream-splicer/node_modules/stream-splicer/node_modules/readable-stream/lib/_stream_readable.js:537:10)
    at Labeled.g (events.js:184:16)
    at Labeled.emit (events.js:119:20)

I tested by pasting the new regex directly into mold-source-map/index.js in my node_modules and re-running it.

@lidlanca
Copy link
Author

@greim can you try the regexp with the isolated code you posted. and see if it fail for you too?

How many bytes and approximately how many comments are in the source file you are parsing?

@greim
Copy link
Contributor

greim commented Feb 13, 2015

The isolation code ran the new regex without error. Will check and get back regarding source file.

@greim
Copy link
Contributor

greim commented Feb 13, 2015

source.length => 11135198
Buffer.byteLength(source) => 11137624
approx number of /* comments => 2196
approx number of // comments => 6790

In an isolated environment using same version of iojs, same source string, and same regex, there's no error. It only happens during my gulp build which is creating browserify bundles.

@lidlanca
Copy link
Author

@greim, to explain why it is working in an isolated setup, my wild guess is that when you are adding browserify to the pipe, your process uses more memory, possibly to the point that the regexp engine is not able to allocate the necessary memory for the stack.

The regexp stack is dynamically allocated, it's start as 1mb and grows exponentially by factor of 2.
if there is no more memory in the process heap for the stack, it will raise that error.

I pushed the limit on my machine testing that regexp and it should not have failed on 10mb of string input. or even 9000 matches.

@greim
Copy link
Contributor

greim commented Feb 13, 2015

In my isolation code, I can use up a bunch of memory and still not get an error.

var fs = require('fs')
var arr = []
for (var i=0; i<50000000; i++){
  arr.push(Math.random())
}
var patt = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg;
var source = fs.readFileSync('./source.js', 'utf8')
for (var i=0; i<100; i++){
  source.match(patt)
}

If I push that number much higher than 50000000 it fails with process out of memory, so it's getting really close to the limit and not failing. It feels like something weird is going on with v8, especially since 0.12 works fine. But of course I have no proof for that.

@greim
Copy link
Contributor

greim commented Feb 13, 2015

I finally found a variation of the regex that works.

var works = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(..*)$/mg
var fails = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg

That's weird. I don't know enough regex theory to know whether that's intended or surprising in any way, or indicative of a v8 bug or anything :/

@lidlanca
Copy link
Author

You can read about v8 irregexp engine a bit here.
http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

The main point is that the engine optimize the expression to a point the any slight change could produce a different execution plan, some more efficient some less.

glad you got something that works for you.

@thlorenz
Copy link
Owner

Not merging as is since it didn't solve the problem.
@greim if you want to submit your solution as a PR (assuming all tests pass) please do so.

@thlorenz thlorenz closed this Feb 19, 2015
@greim
Copy link
Contributor

greim commented Feb 19, 2015

See: #11

dmitry added a commit to dmitry/karma-sourcemap-loader that referenced this pull request Jun 30, 2015
On io.js 2.2.1 and 2.3.1 (by the time of writing it's the latest) I have this error:

```
RangeError: Maximum call stack size exceeded
  at String.match (native)
```

It happens only on the second pass while karma is working (auto watching and reruns the specs).

For more details read this issue:

nodejs/node#759

Similar issues:

thlorenz/convert-source-map#10
thlorenz/convert-source-map#11

Looks like this related to optimizations in V8 regex engine:
http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

Simply changing from `.+` to `..*` equivalent rule fixes an issue with match function.
dmitry added a commit to dmitry/karma-sourcemap-loader that referenced this pull request Jun 30, 2015
On io.js 2.2.1 and 2.3.1 (by the time of writing it's the latest) I have this error:

```
RangeError: Maximum call stack size exceeded
  at String.match (native)
```

It happens only on the second pass while karma is working (auto watching and reruns the specs).

For more details read this issue:

nodejs/node#759

Similar issues:

thlorenz/convert-source-map#10
thlorenz/convert-source-map#11

Looks like this related to optimizations in V8 regex engine:
http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

Simply changing from `.+` to `..*` equivalent rule fixes an issue with match function.
elliottsj pushed a commit to EventMobi/karma-sourcemap-loader that referenced this pull request Oct 8, 2015
On io.js 2.2.1 and 2.3.1 (by the time of writing it's the latest) I have this error:

```
RangeError: Maximum call stack size exceeded
  at String.match (native)
```

It happens only on the second pass while karma is working (auto watching and reruns the specs).

For more details read this issue:

nodejs/node#759

Similar issues:

thlorenz/convert-source-map#10
thlorenz/convert-source-map#11

Looks like this related to optimizations in V8 regex engine:
http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

Simply changing from `.+` to `..*` equivalent rule fixes an issue with match function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants