Merge fea793a into eddfcc4
ibgreen committed Jun 12, 2020
2 parents eddfcc4 + fea793a commit 5f2904e
Showing 14 changed files with 235 additions and 47 deletions.
1 change: 1 addition & 0 deletions docs/whats-new.md
@@ -38,6 +38,7 @@ Worker support for the `WKTLoader`, designed to support future binary data impro

**@loaders.gl/json**

- `parseInBatches` now accepts `options.json.jsonpaths` to specify which array should be streamed using limited JSONPath syntax (e.g. `'$.features'` for GeoJSON).
- `parseInBatches` now returns a `batch.bytesUsed` field to enable progress bars.
- `.geojson` is now parsed by a new experimental `GeoJSONLoader` (exported with an underscore as `_GeoJSONLoader`), designed to support future binary data improvements.
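
As a rough sketch of how `batch.bytesUsed` could drive a progress bar: the helper below is hypothetical (`progressFraction` is not part of loaders.gl). It assumes that `bytesUsed` is a running total and that the total payload size is known from elsewhere, for example a `Content-Length` header.

```js
// Hypothetical helper, not part of loaders.gl.
// Assumes `bytesUsed` is cumulative and `totalBytes` is known separately.
function progressFraction(bytesUsed, totalBytes) {
  if (!Number.isFinite(totalBytes) || totalBytes <= 0) {
    return null; // total size unknown: show an indeterminate progress bar
  }
  // Clamp to [0, 1] in case the reported size is slightly off
  return Math.min(1, Math.max(0, bytesUsed / totalBytes));
}

// Sketch of use inside a batch loop (batch shape assumed from the note above):
// for await (const batch of batches) {
//   const fraction = progressFraction(batch.bytesUsed, totalBytes);
// }
```

Returning `null` for an unknown total keeps the caller from rendering a misleading percentage.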

2 changes: 1 addition & 1 deletion modules/draco/package.json
@@ -40,6 +40,6 @@
"dependencies": {
"@babel/runtime": "^7.3.1",
"@loaders.gl/loader-utils": "^2.1.3",
"draco3d": "1.3.4"
"draco3d": "^1.3.4"
}
}
24 changes: 18 additions & 6 deletions modules/json/docs/api-reference/json-loader.md
@@ -19,13 +19,13 @@ import {load} from '@loaders.gl/core';
const data = await load(url, JSONLoader, {json: options});
```

The JSONLoader supports streaming JSON parsing, in which case it will yield "batches" of rows from the first array it encounters in the JSON. To e.g. parse a stream of GeoJSON:
The JSONLoader supports streaming JSON parsing, in which case it will yield "batches" of rows from one array. For example, to parse a stream of GeoJSON, specify `options.json.jsonpaths` so that the `features` array is streamed.

```js
import {JSONLoader} from '@loaders.gl/json';
import {loadInBatches} from '@loaders.gl/core';

const batches = await loadInBatches('geojson.json', JSONLoader);
const batches = await loadInBatches('geojson.json', JSONLoader, {json: {jsonpaths: ['$.features']}});

for await (const batch of batches) {
// batch.data will contain a number of rows
@@ -38,6 +38,8 @@ for await (const batch of batches) {
}
```

If no JSONPath is specified the loader will stream the first array it encounters in the JSON payload.

When batch parsing an embedded JSON array as a table, it is possible to get access to the containing object using the `{json: {_rootObjectBatches: true}}` option.

The loader will yield an initial and a final batch with `batch.container` providing the container object and `batch.batchType` set to `root-object-batch-partial` and `root-object-batch-complete` respectively.
@@ -70,10 +72,20 @@ for await (const batch of batches) {

Supports table category options such as `batchType` and `batchSize`.

| Option | From | Type | Default | Description |
| ------------------------- | ---- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `json.table` | v2.0 | Boolean | `false` | Parses non-streaming JSON as table, i.e. return the first embedded array in the JSON. Always `true` during batched/streaming parsing. |
| `json._rootObjectBatches` | v2.1 | Boolean | `false` | Yield an initial and final batch containing the partial and complete root object (excluding the array being streamed). |
| Option | From | Type | Default | Description |
| ------------------------- | ------------------------------------------------------------------------------------- | ---------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `json.table` | [![Website shields.io](https://img.shields.io/badge/v2.0-blue.svg?style=flat-square)] | Boolean | `false` | Parses non-streaming JSON as table, i.e. return the first embedded array in the JSON. Always `true` during batched/streaming parsing. |
| `json.jsonpaths` | [![Website shields.io](https://img.shields.io/badge/v2.2-blue.svg?style=flat-square)] | `string[]` | `[]` | A list of JSON paths (see below) indicating the array that can be streamed. |
| `json._rootObjectBatches` | [![Website shields.io](https://img.shields.io/badge/v2.1-blue.svg?style=flat-square)] | Boolean | `false` | Yield an initial and final batch containing the partial and complete root object (excluding the array being streamed). |

## JSONPaths

A minimal subset of the JSONPath syntax is supported, to specify which array in a JSON object should be streamed as batches.

`$.component1.component2.component3`

- No support for wildcards, brackets etc. Only paths starting with `$` (JSON root) are supported.
- Regardless of the paths provided, only arrays will be streamed.
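
To make the rules above concrete, here is a minimal sketch that resolves a `$.a.b.c` style path against an object that has already been parsed. The function name `resolveStreamableArray` is hypothetical and is not part of loaders.gl; the loader itself matches paths while streaming rather than after parsing, so this only illustrates which array a given path would select.

```js
// Illustrative only: resolves a `$.a.b.c` style path against an already-parsed
// object. The loader itself matches paths while streaming, not after parsing.
function resolveStreamableArray(root, jsonpath) {
  const components = jsonpath.split('.');
  if (components[0] !== '$') {
    throw new Error('JSONPaths must start with $');
  }

  let value = root;
  for (const key of components.slice(1)) {
    if (value === null || typeof value !== 'object') {
      return null; // path does not exist in this document
    }
    value = value[key];
  }

  // Only arrays can be streamed, whatever the path points at
  return Array.isArray(value) ? value : null;
}

const geojson = {type: 'FeatureCollection', features: [{type: 'Feature'}]};
resolveStreamableArray(geojson, '$.features'); // the `features` array
resolveStreamableArray(geojson, '$.type'); // null, not an array
```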

## Attribution

1 change: 1 addition & 0 deletions modules/json/src/index.js
@@ -5,4 +5,5 @@ export {
} from './geojson-loader';

// EXPERIMENTAL EXPORTS - WARNING: MAY BE REMOVED WITHOUT NOTICE IN FUTURE RELEASES
export {default as _JSONPath} from './lib/jsonpath/jsonpath';
export {default as _ClarinetParser} from './lib/clarinet/clarinet';
65 changes: 65 additions & 0 deletions modules/json/src/lib/jsonpath/jsonpath.js
@@ -0,0 +1,65 @@
/**
* A parser for a minimal subset of the jsonpath standard
* Full JSON path parsers for JS exist but are quite large (bundle size)
*
* Supports
*
* `$.component.component.component`
*/
export default class JSONPath {
constructor(path = null) {
this.path = ['$'];

if (path instanceof JSONPath) {
this.path = [...path.path];
return;
}

if (Array.isArray(path)) {
this.path.push(...path);
return;
}

// Parse a string as a JSONPath
if (typeof path === 'string') {
this.path = path.split('.');
if (this.path[0] !== '$') {
throw new Error('JSONPaths must start with $');
}
}
}

clone() {
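// Pass the instance so the copy branch of the constructor is used;
// passing this.path (an array) would prepend a second '$'
return new JSONPath(this);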
}

toString() {
return this.path.join('.');
}

push(name) {
this.path.push(name);
}

pop() {
this.path.pop();
}

set(name) {
this.path[this.path.length - 1] = name;
}

equals(other) {
if (!this || !other || this.path.length !== other.path.length) {
return false;
}

for (let i = 0; i < this.path.length; ++i) {
if (this.path[i] !== other.path[i]) {
return false;
}
}

return true;
}
}
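
The `push`, `set` and `pop` methods suggest how a path like this can follow a streaming parser's position: push a placeholder when a container opens, overwrite it when a key arrives, and pop when the container closes. The sketch below shows that idea with a plain array instead of the class above. The function `pathsOfOpenedArrays` and the event list format are hypothetical and hand-written for illustration; they are not Clarinet's actual API.

```js
// Minimal sketch: tracks the current path with a plain array.
// The event list is hand-written for illustration; it is not Clarinet's API.
function pathsOfOpenedArrays(events) {
  const path = ['$'];
  const found = [];

  for (const event of events) {
    switch (event.type) {
      case 'openarray':
        found.push(path.join('.')); // path at the moment the array opens
        path.push(null); // placeholder component for the new container
        break;
      case 'openobject':
        path.push(null);
        break;
      case 'key':
        path[path.length - 1] = event.name; // same idea as JSONPath.set()
        break;
      case 'closearray':
      case 'closeobject':
        path.pop();
        break;
      default:
        break;
    }
  }
  return found;
}

// Events for: {"type": "FeatureCollection", "features": []}
const events = [
  {type: 'openobject'},
  {type: 'key', name: 'type'},
  {type: 'key', name: 'features'},
  {type: 'openarray'},
  {type: 'closearray'},
  {type: 'closeobject'}
];
pathsOfOpenedArrays(events); // ['$.features']
```

Recording the path before pushing the placeholder means an array is identified by the key it sits under, which is the form a path such as `$.features` is written in.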
12 changes: 7 additions & 5 deletions modules/json/src/lib/parse-json-in-batches.js
@@ -7,19 +7,20 @@ import StreamingJSONParser from './parser/streaming-json-parser';
export default async function* parseJSONInBatches(asyncIterator, options) {
asyncIterator = makeTextDecoderIterator(asyncIterator);

const {batchSize, _rootObjectBatches} = options.json;
const {batchSize, _rootObjectBatches, jsonpaths} = options.json;
const TableBatchType = options.json.TableBatch;

let isFirstChunk = true;
let tableBatchBuilder = null;
let schema = null;

const parser = new StreamingJSONParser();
const parser = new StreamingJSONParser({jsonpaths});
tableBatchBuilder =
tableBatchBuilder || new TableBatchBuilder(TableBatchType, schema, {batchSize});

for await (const chunk of asyncIterator) {
const rows = parser.write(chunk);
const jsonPath = parser.getJsonPath().toString();

if (isFirstChunk) {
if (_rootObjectBatches) {
@@ -40,18 +41,19 @@ export default async function* parseJSONInBatches(asyncIterator, options) {
tableBatchBuilder.addRow(row);
// If a batch has been completed, emit it
if (tableBatchBuilder.isFull()) {
yield tableBatchBuilder.getBatch();
yield tableBatchBuilder.getBatch({jsonPath});
}
}

tableBatchBuilder.chunkComplete(chunk);
if (tableBatchBuilder.isFull()) {
yield tableBatchBuilder.getBatch();
yield tableBatchBuilder.getBatch({jsonPath});
}
}

// yield final batch
const batch = tableBatchBuilder.getBatch();
const jsonPath = parser.getJsonPath().toString();
const batch = tableBatchBuilder.getBatch({jsonPath});
if (batch) {
yield batch;
}
32 changes: 25 additions & 7 deletions modules/json/src/lib/parser/json-parser.js
@@ -1,4 +1,5 @@
import ClarinetParser from '../clarinet/clarinet';
import JSONPath from '../jsonpath/jsonpath';

// JSONParser builds a JSON object using the events emitted by the Clarinet parser
export default class JSONParser {
@@ -11,6 +12,7 @@ export default class JSONParser {
this.result = undefined;
this.previousStates = [];
this.currentState = Object.freeze({container: [], key: null});
this.jsonpath = new JSONPath();
}

write(chunk) {
@@ -33,44 +35,60 @@ export default class JSONParser {
}
}

_openContainer(newContainer) {
_openArray(newContainer = []) {
this.jsonpath.push(null);
this._pushOrSet(newContainer);
this.previousStates.push(this.currentState);
this.currentState = {container: newContainer, key: null};
this.currentState = {container: newContainer, isArray: true, key: null};
}

_closeContainer() {
_closeArray() {
this.jsonpath.pop();
this.currentState = this.previousStates.pop();
}

_openObject(newContainer = {}) {
this.jsonpath.push(null);
this._pushOrSet(newContainer);
this.previousStates.push(this.currentState);
this.currentState = {container: newContainer, isArray: false, key: null};
}

_closeObject() {
this.jsonpath.pop();
this.currentState = this.previousStates.pop();
}

_initializeParser() {
this.parser = new ClarinetParser({
onready: () => {
this.jsonpath = new JSONPath();
this.previousStates.length = 0;
this.currentState.container.length = 0;
},

onopenobject: name => {
this._openContainer({});
this._openObject({});
if (typeof name !== 'undefined') {
this.parser.onkey(name);
}
},

onkey: name => {
this.jsonpath.set(name);
this.currentState.key = name;
},

oncloseobject: () => {
this._closeContainer();
this._closeObject();
},

onopenarray: () => {
this._openContainer([]);
this._openArray();
},

onclosearray: () => {
this._closeContainer();
this._closeArray();
},

onvalue: value => {
81 changes: 60 additions & 21 deletions modules/json/src/lib/parser/streaming-json-parser.js
@@ -1,58 +1,97 @@
import {default as JSONParser} from './json-parser';
import JSONPath from '../jsonpath/jsonpath';

/**
* The `StreamingJSONParser` looks for an array matching one of the provided JSONPaths
* (or, if none are provided, the first array in the JSON structure) and emits its rows in chunks
*/
export default class StreamingJSONParser extends JSONParser {
constructor() {
constructor(options = {}) {
super();
this.topLevelArray = null;
const jsonpaths = options.jsonpaths || [];
this.jsonPaths = jsonpaths.map(jsonpath => new JSONPath(jsonpath));
this.streamingJsonPath = null;
this.streamingArray = null;
this.topLevelObject = null;
this._extendParser();
}

// write REDEFINITION
// - super.write() chunk to parser
// - get the contents (so far) of "topmost-level" array as batch of rows
// - clear top-level array
// - return the batch of rows
/**
* write REDEFINITION
* - super.write() chunk to parser
* - get the contents (so far) of "topmost-level" array as batch of rows
* - clear top-level array
* - return the batch of rows
*/
write(chunk) {
super.write(chunk);
let array = [];
if (this.topLevelArray) {
array = [...this.topLevelArray];
this.topLevelArray.length = 0;
if (this.streamingArray) {
array = [...this.streamingArray];
this.streamingArray.length = 0;
}
return array;
}

// Returns a partially formed result object
// Useful for returning the "wrapper" object when array is not top level
// e.g. GeoJSON
/**
* Returns a partially formed result object
* Useful for returning the "wrapper" object when array is not top level
* e.g. GeoJSON
*/
getPartialResult() {
return this.topLevelObject;
}

getJsonPath() {
return this.jsonpath;
}

// PRIVATE METHODS

/**
* Checks if this.getJsonPath() matches one of the jsonpaths provided in options
*/
_matchJSONPath() {
const currentPath = this.getJsonPath();
// console.debug(`Testing JSONPath`, currentPath);

// Backwards compatibility, match any array
// TODO implement using wildcard once that is supported
if (this.jsonPaths.length === 0) {
return true;
}

for (const jsonPath of this.jsonPaths) {
if (jsonPath.equals(currentPath)) {
return true;
}
}

return false;
}

_extendParser() {
// Redefine onopenarray to locate top-level array
// Redefine onopenarray to locate and inject value for top-level array
this.parser.onopenarray = () => {
if (!this.topLevelArray) {
this.topLevelArray = [];
this._openContainer(this.topLevelArray);
} else {
this._openContainer([]);
if (!this.streamingArray) {
if (this._matchJSONPath()) {
this.streamingJsonPath = this.getJsonPath().clone();
this.streamingArray = [];
this._openArray(this.streamingArray);
return;
}
}

this._openArray();
};

// Redefine onopenobject to inject value for top-level object
this.parser.onopenobject = name => {
if (!this.topLevelObject) {
this.topLevelObject = {};
this._openContainer(this.topLevelObject);
this._openObject(this.topLevelObject);
} else {
this._openContainer({});
this._openObject({});
}
if (typeof name !== 'undefined') {
this.parser.onkey(name);
1 change: 1 addition & 0 deletions modules/json/test/index.js
@@ -1,3 +1,4 @@
import './lib/jsonpath/jsonpath.spec';
import './lib/clarinet';
import './lib/parser/json-parser.spec';
import './lib/parser/streaming-json-parser.spec';