Out of memory error when reading from large files. #160

Closed
aaronbinns opened this issue Feb 5, 2016 · 9 comments

@aaronbinns
Contributor

It appears that the latest version of elasticdump (1.0.0) has a problem reading from large input files.

$  du -m input.json
166 input.json
$ wc -l input.json
   20000 input.json
$ ./node_modules/elasticdump/bin/elasticdump --input input.json --output output.json
Fri, 05 Feb 2016 17:22:43 GMT | starting dump
Fri, 05 Feb 2016 17:22:43 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:22:43 GMT | sent 100 objects to destination file, wrote 100
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6

I tested with smaller files, and it appears that elasticdump reads the first 100 lines from the input file, writes them to the output, then on the next read it tries to read the entire remainder of the input at once.

For example:

$ head -n 250 input.json | ./node_modules/elasticdump/bin/elasticdump --input $ --output output.json
Fri, 05 Feb 2016 17:27:19 GMT | starting dump
Fri, 05 Feb 2016 17:27:19 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:27:19 GMT | sent 100 objects to destination file, wrote 100
Fri, 05 Feb 2016 17:27:19 GMT | got 150 objects from source file (offset: 100)
Fri, 05 Feb 2016 17:27:19 GMT | sent 150 objects to destination file, wrote 150
Fri, 05 Feb 2016 17:27:19 GMT | got 0 objects from source file (offset: 250)
Fri, 05 Feb 2016 17:27:19 GMT | Total Writes: 250
Fri, 05 Feb 2016 17:27:19 GMT | dump complete

and

$ head -n 623 input.json | ./node_modules/elasticdump/bin/elasticdump --input $ --output output.json
Fri, 05 Feb 2016 17:27:42 GMT | starting dump
Fri, 05 Feb 2016 17:27:42 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:27:42 GMT | sent 100 objects to destination file, wrote 100
Fri, 05 Feb 2016 17:27:42 GMT | got 523 objects from source file (offset: 100)
Fri, 05 Feb 2016 17:27:43 GMT | sent 523 objects to destination file, wrote 523
Fri, 05 Feb 2016 17:27:43 GMT | got 0 objects from source file (offset: 623)
Fri, 05 Feb 2016 17:27:43 GMT | Total Writes: 623
Fri, 05 Feb 2016 17:27:43 GMT | dump complete

and

$ head -n 2623 input.json | ./node_modules/elasticdump/bin/elasticdump --input $ --output output.json
Fri, 05 Feb 2016 17:28:04 GMT | starting dump
Fri, 05 Feb 2016 17:28:05 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:28:05 GMT | sent 100 objects to destination file, wrote 100
Fri, 05 Feb 2016 17:28:08 GMT | got 2523 objects from source file (offset: 100)
Fri, 05 Feb 2016 17:28:09 GMT | sent 2523 objects to destination file, wrote 2523
Fri, 05 Feb 2016 17:28:09 GMT | got 0 objects from source file (offset: 2623)
Fri, 05 Feb 2016 17:28:09 GMT | Total Writes: 2623
Fri, 05 Feb 2016 17:28:09 GMT | dump complete

As I keep increasing the size of the input file, the pattern persists: the second read/write iteration attempts to read the entire remainder of the input. Once the input file gets large enough, elasticdump runs out of memory trying to read it all in.

@evantahler
Collaborator

A more formal stack trace from a newer node version:

> ./bin/elasticdump --input ~/Desktop/data.json --output http://localhost:9200/data
Fri, 05 Feb 2016 19:21:02 GMT | starting dump
Fri, 05 Feb 2016 19:21:02 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 19:21:02 GMT | sent 100 objects to destination elasticsearch, wrote 100

<--- Last few GCs --->

   11090 ms: Scavenge 1408.5 (1457.0) -> 1408.5 (1457.0) MB, 3.9 / 0 ms (+ 2.5 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
   11858 ms: Mark-sweep 1408.5 (1457.0) -> 1408.5 (1457.0) MB, 768.1 / 0 ms (+ 3.5 ms in 2 steps since start of marking, biggest step 2.5 ms) [last resort gc].
   12715 ms: Mark-sweep 1408.5 (1457.0) -> 1408.5 (1457.0) MB, 856.4 / 0 ms [last resort gc].


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0xb140b44a49 <JS Object>
    2: write [/Users/evantahler/Dropbox/Projects/taskrabbit/elasticsearch-dump/node_modules/jsonparse/jsonparse.js:~87] [pc=0x18a6a3b4303c] (this=0xae6cabc4839 <a Parser with map 0x3da79179d869>,buffer=0xb140bc3231 <an Uint8Array with map 0x3da7917203a9>)
    3: /* anonymous */ [/Users/evantahler/Dropbox/Projects/taskrabbit/elasticsearch-dump/node_modules/JSONStream/index.js:~18] [pc=0x18a6a389dc...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6

@evantahler
Collaborator

The problem seems isolated to the file reader stream not being paused properly...
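For reference, a minimal sketch of the pause/resume pattern a batched file reader would be expected to follow (this is not elasticdump's actual code; the input path and writeBatch are illustrative stand-ins):

var fs = require('fs');
var readline = require('readline');

var limit = 100;   // objects per batch, mirroring the "got 100 objects" log lines
var batch = [];

// stand-in for the real output writer; here it just logs and continues
function writeBatch(docs, done) {
  console.log('wrote ' + docs.length + ' objects');
  setImmediate(done);
}

// 'input.json' is a hypothetical path; one JSON document per line is assumed
var rl = readline.createInterface({ input: fs.createReadStream('input.json') });

rl.on('line', function (line) {
  batch.push(JSON.parse(line));
  if (batch.length >= limit) {
    rl.pause();                      // back-pressure: stop consuming the file
    writeBatch(batch, function () {
      batch = [];
      rl.resume();                   // only now read the next batch of lines
    });
  }
});

If the pause never happens (or happens too late), the parser keeps buffering the rest of the file, which matches the behavior described above.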

@evantahler
Collaborator

will be solved via #161

@craighawki

SFO1502579255M:~ 502579255$ head -n 92006 /Users/502579255/ELK/kibana-4.4.1-darwin-x64/Kibana_Essentials-master/tweet.json | /usr/local/bin/elasticdump --bulk=true --input $ --output=http://localhost:9200/
Fri, 11 Mar 2016 01:13:55 GMT | starting dump

<--- Last few GCs --->

15098 ms: Scavenge 1405.7 (1458.1) -> 1405.7 (1458.1) MB, 19.0 / 0 ms (+ 54.7 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
16596 ms: Mark-sweep 1405.7 (1458.1) -> 1405.2 (1457.1) MB, 1498.1 / 0 ms (+ 1147.8 ms in 2587 steps since start of marking, biggest step 54.7 ms) [last resort gc].
18066 ms: Mark-sweep 1405.2 (1457.1) -> 1405.1 (1458.1) MB, 1469.7 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x314dfe6e3ac1
2: write [/usr/local/lib/node_modules/elasticdump/node_modules/jsonparse/jsonparse.js:~87] [pc=0x1a8070b2eb36](this=0xf08f9dccf41 <a Parser with map 0x320cf4783be9>,buffer=0xcea3db43cc9 <an Uint8Array with map 0x320cf4705911)
3: /* anonymous */ [/usr/local/lib/node_modules/elasticdump/node_modules/JSONStream/index.js:~18] [pc=0x1a8070b26c0f] (this=0xf08f9dcd0b9 <a Stream with map 0x32...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
SFO1502579255M:~ 502579255$ head -n 90000 /Users/502579255/ELK/kibana-4.4.1-darwin-x64/Kibana_Essentials-master/tweet.json | /usr/local/bin/elasticdump --bulk=true --input $ --output=http://localhost:9200/
Fri, 11 Mar 2016 01:14:20 GMT | starting dump
Fri, 11 Mar 2016 01:14:34 GMT | got 0 objects from source file (offset: 0)
Fri, 11 Mar 2016 01:14:34 GMT | Total Writes: 0
Fri, 11 Mar 2016 01:14:34 GMT | dump complete
SFO1502579255M:~ 502579255$

@evantahler
Collaborator

@craighawki what version of node and elasticdump are you using?

@evantahler
Collaborator

@craighawki it might also be the case that you have a very large document (over 1GB?)
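If that's a possibility, a rough way to check (assuming the dump is newline-delimited JSON, one document per line, as in the tweet.json file from the command above) is to measure the longest line in the file:

awk '{ if (length($0) > max) max = length($0) } END { print max " characters in the longest line" }' tweet.json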

@showmeall

I get the same issue. May I know how you solved it?

<--- Last few GCs --->

1145104 ms: Mark-sweep 1389.2 (1434.0) -> 1389.1 (1434.0) MB, 975.4 / 0.0 ms [allocation failure] [scavenge might not succeed].
1146280 ms: Mark-sweep 1389.2 (1434.0) -> 1389.1 (1434.0) MB, 1175.0 / 0.0 ms (+ 0.4 ms in 1 steps since start of marking, biggest step 0.4 ms) [allocation failure] [scavenge might not succeed].
1147268 ms: Mark-sweep 1389.2 (1434.0) -> 1389.1 (1434.0) MB, 987.4 / 0.0 ms [allocation failure] [scavenge might not succeed].

<--- JS stacktrace --->
Cannot get stack trace in GC.
FATAL ERROR: MarkCompactCollector: semi-space copy, fallback in old gen Allocation failed - JavaScript heap out of memory
1: node::Abort() [node]
2: 0xdecf4c [node]
3: v8::Utils::ReportApiFailure(char const*, char const*) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: v8::internal::MarkCompactCollector::EvacuateNewSpaceVisitor::Visit(v8::internal::HeapObject*) [node]

@evantahler
Collaborator

Your Node process has run out of RAM... perhaps you are trying to import many large documents (all of which have to be parsed). Reduce how many documents you import per batch via --limit.
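For example, reusing the command from earlier in this thread (paths are illustrative), a smaller batch size would look like:

./bin/elasticdump --input ~/Desktop/data.json --output http://localhost:9200/data --limit=50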

@Nandabiradar

Nandabiradar commented Jun 13, 2017

I got the same issue while testing. I ran the Protractor tests and got the following error:

<--- Last few GCs --->

91873 ms: Mark-sweep 1391.5 (1437.1) -> 1391.5 (1437.1) MB, 1263.2 / 0.0 ms [allocation failure] [scavenge might not succeed].
93098 ms: Mark-sweep 1391.5 (1437.1) -> 1391.5 (1437.1) MB, 1224.4 / 0.0 ms [allocation failure] [scavenge might not succeed].
94300 ms: Mark-sweep 1391.5 (1437.1) -> 1391.5 (1437.1) MB, 1202.0 / 0.0 ms [allocation failure] [scavenge might not succeed].
2: /* anonymous */(aka /* anonymous */) [C:\Users\nbiradar\AppData\Roaming\npm\node_modules\protractor\node_modules\selenium-webdriver\lib\webdriver.js:188] [pc=000003FA2F0C603B] (this=00000368019
04381 ,value=000000D6ADD95629 <a Promise with map 000003ED7BE169F9>,key=0000024A87472C31 <String[9]: _idleNext>)
<--- JS stacktrace --->
orEachKey) [C:\Users\nbiradar\AppData\Roaming\np...
Cannot get stack trace in GC.
FATAL ERROR: MarkCompactCollector: semi-space copy, fallback in old gen Allocation failed - JavaScript heap out of memory
