Out of memory error when reading from large files. #160

Closed
aaronbinns opened this issue Feb 5, 2016 · 9 comments

@aaronbinns
Contributor

It appears that the latest version of elasticdump (1.0.0) has a problem reading from large input files.

$  du -m input.json
166 input.json
$ wc -l input.json
   20000 input.json
$ ./node_modules/elasticdump/bin/elasticdump --input input.json --output output.json
Fri, 05 Feb 2016 17:22:43 GMT | starting dump
Fri, 05 Feb 2016 17:22:43 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:22:43 GMT | sent 100 objects to destination file, wrote 100
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6

I tested with smaller files, and it appears that elasticdump reads the first 100 lines from the input file, writes them to the output, then on the next read it tries to read the entire remainder of the input at once.

For example:

$ head -n 250 input.json | ./node_modules/elasticdump/bin/elasticdump --input $ --output output.json
Fri, 05 Feb 2016 17:27:19 GMT | starting dump
Fri, 05 Feb 2016 17:27:19 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:27:19 GMT | sent 100 objects to destination file, wrote 100
Fri, 05 Feb 2016 17:27:19 GMT | got 150 objects from source file (offset: 100)
Fri, 05 Feb 2016 17:27:19 GMT | sent 150 objects to destination file, wrote 150
Fri, 05 Feb 2016 17:27:19 GMT | got 0 objects from source file (offset: 250)
Fri, 05 Feb 2016 17:27:19 GMT | Total Writes: 250
Fri, 05 Feb 2016 17:27:19 GMT | dump complete

and

$ head -n 623 input.json | ./node_modules/elasticdump/bin/elasticdump --input $ --output output.json
Fri, 05 Feb 2016 17:27:42 GMT | starting dump
Fri, 05 Feb 2016 17:27:42 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:27:42 GMT | sent 100 objects to destination file, wrote 100
Fri, 05 Feb 2016 17:27:42 GMT | got 523 objects from source file (offset: 100)
Fri, 05 Feb 2016 17:27:43 GMT | sent 523 objects to destination file, wrote 523
Fri, 05 Feb 2016 17:27:43 GMT | got 0 objects from source file (offset: 623)
Fri, 05 Feb 2016 17:27:43 GMT | Total Writes: 623
Fri, 05 Feb 2016 17:27:43 GMT | dump complete

and

$ head -n 2623 input.json | ./node_modules/elasticdump/bin/elasticdump --input $ --output output.json
Fri, 05 Feb 2016 17:28:04 GMT | starting dump
Fri, 05 Feb 2016 17:28:05 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 17:28:05 GMT | sent 100 objects to destination file, wrote 100
Fri, 05 Feb 2016 17:28:08 GMT | got 2523 objects from source file (offset: 100)
Fri, 05 Feb 2016 17:28:09 GMT | sent 2523 objects to destination file, wrote 2523
Fri, 05 Feb 2016 17:28:09 GMT | got 0 objects from source file (offset: 2623)
Fri, 05 Feb 2016 17:28:09 GMT | Total Writes: 2623
Fri, 05 Feb 2016 17:28:09 GMT | dump complete

As I keep increasing the size of the input file, the pattern persists: the second read/write iteration attempts to read the entire remainder of the input. Once the input file gets large enough, elasticdump runs out of memory trying to read it all in.

@evantahler
Collaborator

A more formal stack trace from a newer node version:

> ./bin/elasticdump --input ~/Desktop/data.json --output http://localhost:9200/data
Fri, 05 Feb 2016 19:21:02 GMT | starting dump
Fri, 05 Feb 2016 19:21:02 GMT | got 100 objects from source file (offset: 0)
Fri, 05 Feb 2016 19:21:02 GMT | sent 100 objects to destination elasticsearch, wrote 100

<--- Last few GCs --->

   11090 ms: Scavenge 1408.5 (1457.0) -> 1408.5 (1457.0) MB, 3.9 / 0 ms (+ 2.5 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
   11858 ms: Mark-sweep 1408.5 (1457.0) -> 1408.5 (1457.0) MB, 768.1 / 0 ms (+ 3.5 ms in 2 steps since start of marking, biggest step 2.5 ms) [last resort gc].
   12715 ms: Mark-sweep 1408.5 (1457.0) -> 1408.5 (1457.0) MB, 856.4 / 0 ms [last resort gc].


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0xb140b44a49 <JS Object>
    2: write [/Users/evantahler/Dropbox/Projects/taskrabbit/elasticsearch-dump/node_modules/jsonparse/jsonparse.js:~87] [pc=0x18a6a3b4303c] (this=0xae6cabc4839 <a Parser with map 0x3da79179d869>,buffer=0xb140bc3231 <an Uint8Array with map 0x3da7917203a9>)
    3: /* anonymous */ [/Users/evantahler/Dropbox/Projects/taskrabbit/elasticsearch-dump/node_modules/JSONStream/index.js:~18] [pc=0x18a6a389dc...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6

@evantahler
Collaborator

The problem seems isolated to the file reader stream not being paused properly...
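For reference, a minimal sketch of the pause/resume pattern a batched file reader would be expected to follow (this is not elasticdump's actual code; the input path and writeBatch are illustrative stand-ins):

var fs = require('fs');
var readline = require('readline');

var limit = 100;   // objects per batch, mirroring the "got 100 objects" log lines
var batch = [];

// stand-in for the real output writer; here it just logs and continues
function writeBatch(docs, done) {
  console.log('wrote ' + docs.length + ' objects');
  setImmediate(done);
}

// 'input.json' is a hypothetical path; one JSON document per line is assumed
var rl = readline.createInterface({ input: fs.createReadStream('input.json') });

rl.on('line', function (line) {
  batch.push(JSON.parse(line));
  if (batch.length >= limit) {
    rl.pause();                      // back-pressure: stop consuming the file
    writeBatch(batch, function () {
      batch = [];
      rl.resume();                   // only now read the next batch of lines
    });
  }
});

If the pause never happens (or happens too late), the parser keeps buffering the rest of the file, which matches the behavior described above.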

@evantahler
Collaborator

will be solved via #161

@craighawki

SFO1502579255M:~ 502579255$ head -n 92006 /Users/502579255/ELK/kibana-4.4.1-darwin-x64/Kibana_Essentials-master/tweet.json | /usr/local/bin/elasticdump --bulk=true --input $ --output=http://localhost:9200/
Fri, 11 Mar 2016 01:13:55 GMT | starting dump

<--- Last few GCs --->

15098 ms: Scavenge 1405.7 (1458.1) -> 1405.7 (1458.1) MB, 19.0 / 0 ms (+ 54.7 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
16596 ms: Mark-sweep 1405.7 (1458.1) -> 1405.2 (1457.1) MB, 1498.1 / 0 ms (+ 1147.8 ms in 2587 steps since start of marking, biggest step 54.7 ms) [last resort gc].
18066 ms: Mark-sweep 1405.2 (1457.1) -> 1405.1 (1458.1) MB, 1469.7 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x314dfe6e3ac1
2: write [/usr/local/lib/node_modules/elasticdump/node_modules/jsonparse/jsonparse.js:~87] [pc=0x1a8070b2eb36](this=0xf08f9dccf41 <a Parser with map 0x320cf4783be9>,buffer=0xcea3db43cc9 <an Uint8Array with map 0x320cf4705911)
3: /* anonymous */ [/usr/local/lib/node_modules/elasticdump/node_modules/JSONStream/index.js:~18] [pc=0x1a8070b26c0f] (this=0xf08f9dcd0b9 <a Stream with map 0x32...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
SFO1502579255M:~ 502579255$ head -n 90000 /Users/502579255/ELK/kibana-4.4.1-darwin-x64/Kibana_Essentials-master/tweet.json | /usr/local/bin/elasticdump --bulk=true --input $ --output=http://localhost:9200/
Fri, 11 Mar 2016 01:14:20 GMT | starting dump
Fri, 11 Mar 2016 01:14:34 GMT | got 0 objects from source file (offset: 0)
Fri, 11 Mar 2016 01:14:34 GMT | Total Writes: 0
Fri, 11 Mar 2016 01:14:34 GMT | dump complete
SFO1502579255M:~ 502579255$

@evantahler
Collaborator

@craighawki what version of node and elasticdump are you using?

@evantahler
Collaborator

@craighawki it might also be the case that you have a very large document (over 1GB?)
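If that's a possibility, a rough way to check (assuming the dump is newline-delimited JSON, one document per line, as in the tweet.json file from the command above) is to measure the longest line in the file:

awk '{ if (length($0) > max) max = length($0) } END { print max " characters in the longest line" }' tweet.json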

@showmeall

I get the same issue. May I know how you solved it?

<--- Last few GCs --->

1145104 ms: Mark-sweep 1389.2 (1434.0) -> 1389.1 (1434.0) MB, 975.4 / 0.0 ms [allocation failure] [scavenge might not succeed].
1146280 ms: Mark-sweep 1389.2 (1434.0) -> 1389.1 (1434.0) MB, 1175.0 / 0.0 ms (+ 0.4 ms in 1 steps since start of marking, biggest step 0.4 ms) [allocation failure] [scavenge might not succeed].
1147268 ms: Mark-sweep 1389.2 (1434.0) -> 1389.1 (1434.0) MB, 987.4 / 0.0 ms [allocation failure] [scavenge might not succeed].

<--- JS stacktrace --->
Cannot get stack trace in GC.
FATAL ERROR: MarkCompactCollector: semi-space copy, fallback in old gen Allocation failed - JavaScript heap out of memory
1: node::Abort() [node]
2: 0xdecf4c [node]
3: v8::Utils::ReportApiFailure(char const*, char const*) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: v8::internal::MarkCompactCollector::EvacuateNewSpaceVisitor::Visit(v8::internal::HeapObject*) [node]

@evantahler
Collaborator

Your Node process has run out of RAM... perhaps you are trying to import many large documents (all of which have to be parsed). Reduce how many documents you import per batch via --limit.
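For example, reusing the command from earlier in this thread (paths are illustrative), a smaller batch size would look like:

./bin/elasticdump --input ~/Desktop/data.json --output http://localhost:9200/data --limit=50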

@Nandabiradar

Nandabiradar commented Jun 13, 2017

I got the same issue while testing. I ran the Protractor tests and got the following error:

<--- Last few GCs --->

91873 ms: Mark-sweep 1391.5 (1437.1) -> 1391.5 (1437.1) MB, 1263.2 / 0.0 ms [allocation failure] [scavenge might not succeed].
93098 ms: Mark-sweep 1391.5 (1437.1) -> 1391.5 (1437.1) MB, 1224.4 / 0.0 ms [allocation failure] [scavenge might not succeed].
94300 ms: Mark-sweep 1391.5 (1437.1) -> 1391.5 (1437.1) MB, 1202.0 / 0.0 ms [allocation failure] [scavenge might not succeed].
2: /* anonymous */(aka /* anonymous */) [C:\Users\nbiradar\AppData\Roaming\npm\node_modules\protractor\node_modules\selenium-webdriver\lib\webdriver.js:188] [pc=000003FA2F0C603B] (this=00000368019
04381 ,value=000000D6ADD95629 <a Promise with map 000003ED7BE169F9>,key=0000024A87472C31 <String[9]: _idleNext>)
<--- JS stacktrace --->
orEachKey) [C:\Users\nbiradar\AppData\Roaming\np...
Cannot get stack trace in GC.
FATAL ERROR: MarkCompactCollector: semi-space copy, fallback in old gen Allocation failed - JavaScript heap out of memory
