I was trying to do a simple benchmark of the JS parquet library in modules/parquet. With this example Parquet file (1 million rows, 1 row group, no compression) I got a Maximum call stack size exceeded error (traceback below).
I figured this might have something to do with having 1 million rows in a single row group, so I tried the same file with 20 row groups (i.e. 50,000 rows per row group). This file worked, but took 29.949s; for comparison, a benchmark with the same file using the wasm loader took around 62ms (both in Node v16.14.0).
Given these results, I'd like to get the wasm parquet loader in #2103 cleaned up sometime soon.
I couldn't get the ParquetLoader to work in a standalone NPM project; even after installing polyfills I kept getting "Blob is not defined" errors. The only way I could get the ParquetLoader to work was in the existing test cases, so I just modified one of the existing tests to load these new files:
not ok 1 RangeError: Maximum call stack size exceeded
---
operator: error
expected: |-
undefined
actual: |-
[RangeError: Maximum call stack size exceeded]
at: bound (/Users/kyle/github/mapping/loaders.gl/node_modules/onetime/index.js:30:12)
stack: |-
RangeError: Maximum call stack size exceeded
at Object.decodeValues (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/codecs/rle.ts:95:14)
at decodeValues (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/parser/decoders.ts:216:35)
at decodeDataPage (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/parser/decoders.ts:276:15)
at decodePage (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/parser/decoders.ts:105:20)
at decodeDataPages (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/parser/decoders.ts:58:24)
at ParquetEnvelopeReader.readColumnChunk (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/parser/parquet-envelope-reader.ts:140:18)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at ParquetEnvelopeReader.readRowGroup (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/parser/parquet-envelope-reader.ts:81:43)
at ParquetCursor.next (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/parquetjs/parser/parquet-cursor.ts:48:25)
at parseParquetFileInBatches (/Users/kyle/github/mapping/loaders.gl/modules/parquet/src/lib/parse-parquet.ts:20:22)
...
The JS / TypeScript version of the loader has not yet been optimized. The batches are read out row by row by a "row iterator" and then concatenated.
This can easily be made much faster. A good WASM loader can probably beat JS, but given Parquet's block memory loading model, I doubt the performance differences between the two implementations would be significant.
For me, the selling point of the WASM loader would mainly be that Parquet is a big spec, and the Rust version is perhaps a better-maintained project.
The advantage of the TypeScript loader is that it is significantly easier for the typical loaders.gl user to maintain and modify that code.
Overall, JS may also have a smaller bundle size, though that is less of an issue if the code is loaded dynamically.
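As an aside on the RangeError itself: one classic JS pattern that throws exactly "Maximum call stack size exceeded" on million-element inputs is spreading a large batch into push() when concatenating, since every element becomes a call argument and V8 caps the argument count well below 1 million. This is purely an illustrative sketch of that failure mode (the function names are hypothetical, and this is not taken from the rle.ts code, which may fail for a different reason):

```typescript
// Concatenation via spread: each batch element becomes a function-call
// argument, so a large enough batch exceeds V8's argument limit and
// throws RangeError: Maximum call stack size exceeded.
function concatUnsafe(target: number[], batch: number[]): void {
  target.push(...batch);
}

// Concatenation via a plain loop keeps the argument count constant,
// so it handles batches of any size.
function concatSafe(target: number[], batch: number[]): void {
  for (const value of batch) target.push(value);
}

const million = new Array<number>(1_000_000).fill(0);
const out: number[] = [];

let failed = false;
try {
  concatUnsafe(out, million);
} catch (err) {
  failed = err instanceof RangeError; // spread over 1M elements blows the stack
}

concatSafe(out, million); // the loop version succeeds

console.log(failed, out.length); // true 1000000
```

If the row-iterator concatenation path does something equivalent, switching to chunked pushes or preallocated typed arrays would remove both the crash and much of the per-row overhead.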