New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simplify commentRx expression, improve memory usage by the regexp engine #10
Conversation
issue reported: thlorenz/mold-source-map#5 If this change work you should be able to process a source string with the limit of the process memory and not the regexp stack. I tested it with ~1gb of string. please test cc: @greim
@@ -2,7 +2,7 @@ | |||
var fs = require('fs'); | |||
var path = require('path'); | |||
|
|||
var commentRx = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,((?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?)(?:[ \t]*\*\/)?$/mg; | |||
var commentRx = (?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regex needs opening slash :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it kinda does :)
@greim does this still work properly and solve the problem for you? |
Strangely, no this doesn't prevent the original error.
I tested by pasting the new regex directly into mold-source-map/index.js in my node_modules and re-running it. |
@greim can you try the regexp with the isolated code you posted. and see if it fail for you too? How many bytes and approximately how many comments are in the source file you are parsing? |
The isolation code ran the new regex without error. Will check and get back regarding source file. |
source.length => 11135198 In an isolated environment using same version of iojs, same source string, and same regex, there's no error. It only happens during my gulp build which is creating browserify bundles. |
@greim, to explain why it is working in an isolated setup, my wild guess is that when you are adding browserify to the pipe, your process uses more memory, possibly to the point that the regexp engine is not able to allocate the necessary memory for the stack. The regexp stack is dynamically allocated, it's start as 1mb and grows exponentially by factor of 2. I pushed the limit on my machine testing that regexp and it should not have failed on 10mb of string input. or even 9000 matches. |
In my isolation code, I can use up a bunch of memory and still not get an error. var fs = require('fs')
var arr = []
for (var i=0; i<50000000; i++){
arr.push(Math.random())
}
var patt = /(?:\/\/|\/\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg;
var source = fs.readFileSync('./source.js', 'utf8')
for (var i=0; i<100; i++){
source.match(patt)
} If I push that number much higher than 50000000 it fails with process out of memory, so it's getting really close to the limit and not failing. It feels like something weird is going on with v8, especially since 0.12 works fine. But of course I have no proof for that. |
I finally found a variation of the regex that works. var works = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(..*)$/mg
var fails = /^[ \t]*\/(?:\/|\*)[@#][ \t]+sourceMappingURL=data:(?:application|text)\/json;base64,(.+)$/mg That's weird. I don't know enough regex theory to know whether that's intended or surprising in any way, or indicative of a v8 bug or anything :/ |
You can read about v8 irregexp engine a bit here. The main point is that the engine optimize the expression to a point the any slight change could produce a different execution plan, some more efficient some less. glad you got something that works for you. |
Not merging as is since it didn't solve the problem. |
See: #11 |
On io.js 2.2.1 and 2.3.1 (by the time of writing it's the latest) I have this error: ``` RangeError: Maximum call stack size exceeded at String.match (native) ``` It happens only on the second pass while karma is working (auto watching and reruns the specs). For more details read this issue: nodejs/node#759 Similar issues: thlorenz/convert-source-map#10 thlorenz/convert-source-map#11 Looks like this related to optimizations in V8 regex engine: http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html Simply changing from `.+` to `..*` equivalent rule fixes an issue with match function.
On io.js 2.2.1 and 2.3.1 (by the time of writing it's the latest) I have this error: ``` RangeError: Maximum call stack size exceeded at String.match (native) ``` It happens only on the second pass while karma is working (auto watching and reruns the specs). For more details read this issue: nodejs/node#759 Similar issues: thlorenz/convert-source-map#10 thlorenz/convert-source-map#11 Looks like this related to optimizations in V8 regex engine: http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html Simply changing from `.+` to `..*` equivalent rule fixes an issue with match function.
On io.js 2.2.1 and 2.3.1 (by the time of writing it's the latest) I have this error: ``` RangeError: Maximum call stack size exceeded at String.match (native) ``` It happens only on the second pass while karma is working (auto watching and reruns the specs). For more details read this issue: nodejs/node#759 Similar issues: thlorenz/convert-source-map#10 thlorenz/convert-source-map#11 Looks like this related to optimizations in V8 regex engine: http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html Simply changing from `.+` to `..*` equivalent rule fixes an issue with match function.
issue reported: thlorenz/mold-source-map#5
If this change work you should be able to process a source string with the limit of the process memory and not the regexp stack.
I tested it with ~1gb of string.
please test cc: @greim