From 203290015076318f2b202bd52402c1ac3a704b78 Mon Sep 17 00:00:00 2001 From: Xiaoji Chen Date: Wed, 13 Nov 2019 13:54:52 -0800 Subject: [PATCH 1/2] add rfc for 8.0 binary support API --- dev-docs/RFCs/README.md | 3 +- .../RFCs/{v8.x => v7.x}/binary-data-rfc.md | 2 +- .../v8.0/binary-support-improvements-rfc.md | 186 ++++++++++++++++++ 3 files changed, 189 insertions(+), 2 deletions(-) rename dev-docs/RFCs/{v8.x => v7.x}/binary-data-rfc.md (99%) create mode 100644 dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md diff --git a/dev-docs/RFCs/README.md b/dev-docs/RFCs/README.md index e47ac5cdd9f..b2432c55390 100644 --- a/dev-docs/RFCs/README.md +++ b/dev-docs/RFCs/README.md @@ -57,11 +57,11 @@ Group of related RFCs loosely targeted for 7.x releases. | RFC | Author | Status | Description | | --- | --- | --- | --- | -| [**Binary Data RFC**](v7.x-binary/binary-data-rfc.md) | @ibgreen @pessimistress | **Draft** | Support binary data as input to deck.gl layers. | | [**GLSL Accessor RFC**](/dev-docs/v7.x-binary/glsl-accessor-rfc.md) | @ibgreen | **Draft** | Allow apps to define GLSL accessors to directly access injected columnar data or implement advanved effects | | [**Texture Attribute RFC**](/dev-docs/v7.x-binary/texture-attribute-rfc.md) | @ibgreen | **Draft** | Enable variable-primitive layers_ to read _descriptive attributes_ from a binary column. | | [**GPU Data Frame Support**](/dev-docs/v7.x-binary/gpu-data-frame-rfc.md) | @ibgreen | **Draft** | Proposal for enabling deck.gl to apply data frame filters on GPU. | | [**Chunked Data Support**](/dev-docs/v7.x-binary/chunked-data-rfc.md) | @ibgreen | **Draft** | Support Chunked Columnar data on the GPU. | +| [**Binary Support Improvements**](v8.0/binary-support-improvements-rfc.md) | @pessimistress | **Draft** | API change for more intuitive binary data usage. | ## v7.x RFCs @@ -70,6 +70,7 @@ RFCs loosely targeted for 7.x releases. 
Also see [luma.gl RFCs](https://github.c | RFC | Author | Status | Description | | --- | --- | --- | --- | +| [**Binary Data RFC**](v7.x-binary/binary-data-rfc.md) | @ibgreen @pessimistress | **Draft** | Support binary data as input to deck.gl layers. | | [**Imperative API Improvements**](v7.x/imperative-api-rfc.md) | @ibgreen | **draft** | Potential API improvements for imperative programming. | | [**Partial Updates**](vNext/partial-updates-rfc.md) | @ibgreen @georgios-uber | **draft** | Allow partial updates of vertex attributes to support high-performance editing. | | [**Project / Unproject Improvements**](v7.x/project-unproject-rfc.md) | @ibgreen | **Draft** | Consolidating, generalizing and simplifying JS `project`/`unproject` API, covering uses from new features such as MultiView. | diff --git a/dev-docs/RFCs/v8.x/binary-data-rfc.md b/dev-docs/RFCs/v7.x/binary-data-rfc.md similarity index 99% rename from dev-docs/RFCs/v8.x/binary-data-rfc.md rename to dev-docs/RFCs/v7.x/binary-data-rfc.md index 297116788f8..0120475b615 100644 --- a/dev-docs/RFCs/v8.x/binary-data-rfc.md +++ b/dev-docs/RFCs/v7.x/binary-data-rfc.md @@ -2,7 +2,7 @@ * Authors: Ib Green, Xiaoji Chen * Date: December 19, 2018 -* Status: **Draft** +* Status: **Implemented** > This RFC was written as an article for the deck.gl Developer's Guide, rather than in the standard format. The intention was to copy some of this text to the developer's guide once it matures. diff --git a/dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md b/dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md new file mode 100644 index 00000000000..0980112e7f7 --- /dev/null +++ b/dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md @@ -0,0 +1,186 @@ +# RFC: Binary Support Improvements + +* **Authors**: Xiaoji Chen +* **Date**: November 2019 +* **Status**: **Draft** + + +## Summary + +This proposal is intended to be an intermediate step towards full Arrow (or generic binary data frame) support in v8.x. 
+ +The purpose is to introduce the necessary user-facing API changes in the 8.0 release that will be compatible with the eventual Arrow support. + +## Background + +During a layer update, there are three main bottlenecks in terms of performance: + +1. Downloading and parsing the source data +2. Packing the attributes on CPU +3. Uploading the attributes to GPU + +For performance-sensitive applications that deal with a large amount of data or high-frequency updates, it is typical to transfer data in a binary format between the server and the client. + +Before v7, once the data is downloaded, the application is still required to construct a JavaScript array from the binary, adding memory and CPU time costs on the client side. In v7.x ([Binary Data RFC](v7.x-binary/binary-data-rfc.md)), we added two ways to use binary data that completely circumvent this step, hence eliminating the performance hit in step 1. + +* index-based accessor: + + ```js + // binaryData is packed on the server, in the format of [x, y, r, g, b, a] + const DATA = {src: binaryData, length: binaryData.length / 6}; + + new ScatterplotLayer({ + data: DATA, + getPosition: (object, {index, data, target}) => { + target[0] = data.src[index * 6]; + target[1] = data.src[index * 6 + 1]; + target[2] = 0; + return target; + }, + getColor: (object, {index, data, target}) => { + target[0] = data.src[index * 6 + 2] * 255; + target[1] = data.src[index * 6 + 3] * 255; + target[2] = data.src[index * 6 + 4] * 255; + target[3] = data.src[index * 6 + 5] * 255; + return target; + }, + ... + }) + ``` + + This method involves writing an accessor that reads relevant information out of the binary blob at each object index. During an attribute update, an internal buffer is allocated, and the accessor is called `numInstance` times to populate it. Hence the performance is comparable to using plain JavaScript array data in steps 2 and 3. 
+ +* supplying external buffers directly: + + ```js + // binaryData is packed on the server, in the format of [x, y, r, g, b, a] + const DATA = { + length: binaryData.length / 6, + attributes: { + instancePositions: {value: binaryData, size: 2, stride: 4 * 6, offset: 0}, + instanceColors: {value: binaryData, size: 4, stride: 4 * 6, offset: 4 * 2, normalized: true} + } + }; + + new PointCloudLayer({ + data: DATA, + getNormal: [0, 0, 0] + }); + ``` + + This method skips step 2 entirely, and is possibly the most performant we'll ever get. However, it requires the application (both back end and front end) to have knowledge of the internal implementation of a layer, including attribute names, array types and layouts, which are not documented and are prone to breakage between minor releases. + + +## Goals + +Moving into v8.x, we want to make binary data a first-class citizen, in the following ways: + +* All core layers accept a variety of binary data inputs, without exposing the internal layer implementation +* Binary data are directly uploaded to the GPU if possible, and basic packing operations (e.g. position interleaving, transform matrix construction) are performed on the GPU instead of CPU + + +## Proposal + +Add an overload to the `accessor` prop type that accepts a string. `props.data[accessor]` should yield a loaders.gl-compatible "attribute descriptor" object. +We shall explain this in the documentation as follows: + +A layer prop of type `accessor` can optionally be a string. If a string is provided, it is used as the key to query from `data` for a JavaScript object containing the following fields: + +- `value` ([TypedArray](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/TypedArray)) - a flat buffer that stacks the value for each object that would otherwise be returned by a function accessor. For example, `getPosition: d => [d.x, d.y, d.z]` is equivalent to `getPosition: {value: new Float64Array([x0, y0, z0, x1, y1, z1, ...])}`. 
This buffer can be constructed on the server side by something like `data.flatMap(d => [d.x, d.y, d.z])`. +- `size` (Number) - the number of elements per value in the typed array. For example, an RGB color has size 3, and an RGBA color has size 4. +- `elementStride` (Number, optional) - the number of elements per vertex in the typed array. This is needed if the buffer contains interleaved information that should not be used for this accessor. Defaults to `size`. +- `elementOffset` (Number, optional) - the number of elements into each vertex where the value starts. Defaults to `0`. +- `bufferLayout` (Array, optional) - the number of vertices per object. This is needed by variable-width data sources such as `PathLayer` and `PolygonLayer`, referring to the number of points per path/polygon respectively. For most layers, it is assumed to be `[1, 1, 1, ...]`. + +```js +// EXAMPLE 1 - PointCloudLayer +/* + binaryData is packed on the server: + original data: [ + {x, y, r, g, b, a}, // d0 + {x, y, r, g, b, a}, // d1 + ... + ] + binary data: { + positionsAndColors: [d0x, d0y, d0r, d0g, d0b, d0a, d1x, d1y, ...] + } +*/ +new PointCloudLayer({ + data: { + length: binaryData.length / 6, + positions: {value: binaryData, size: 2, elementStride: 6, elementOffset: 0}, + colors: {value: binaryData, size: 4, elementStride: 6, elementOffset: 2, normalized: true} + }, + getPosition: 'positions', + getColor: 'colors', + getNormal: [0, 0, 0] +}); +``` + +```js +// EXAMPLE 2 - PathLayer +/* + binaryData is packed on the server: + original data: [ + {path: [[x, y], [x, y], [x, y]]}, // p0 + {path: [[x, y], [x, y]]}, // p1 + ... + ] + binary data: { + positions: [p00x, p00y, p01x, p01y, p02x, p02y, p10x, p10y, ...], + colors: [p00r, p00g, p00b, p01r, p01g, p01b, p02r, p02g, p02b, p10r, p10g, p10b, ...], + pointCountPerPath: [3, 2, ...] 
+ } +*/ +new PathLayer({ + data: { + length: binaryData.pointCountPerPath.length, + paths: {value: binaryData.positions, size: 2, bufferLayout: binaryData.pointCountPerPath}, + colors: {value: binaryData.colors, size: 3, bufferLayout: binaryData.pointCountPerPath} + }, + getPath: 'paths', + getColor: 'colors' +}); +``` + +Implementation notes: + +* If an accessor is a string, the `AttributeManager` will attempt to retrieve the information from `data` and pass it to the `Attribute` instance. +* For auto-updated attributes (most common), the attribute will directly upload the buffer to GPU, skipping the local packing step. +* For attributes that require CPU-based processing (e.g. polygon normalization, path tessellation, icon mapping), the buffer will be treated as a source from which the auto updater draws values, similar to calling an accessor function for each object. + +The advantages of this new API include: + +* There are no conflicts with existing use cases. +* No dependency on undocumented information. To switch to using binary, the user can simply "flatten" their data with `data.flatMap(accessor)`, where `accessor` is a function that already works in the traditional use case. If tessellation/transform is required, the `Attribute` class will do it under the hood, hiding the implementation detail from applications. +* loaders.gl friendly. When loading from a URL with e.g. a CSV or Arrow loader, the binaries will be accessible by specifying a "path" into the parsed data. +* Future extensibility. This new API will allow users to prepare data and structure applications in a way that may naturally transition into custom GPU-based data processing, e.g. + +```js +new PointCloudLayer({ + data: { + ... 
+ longitude: {value: , size: 1}, + latitude: {value: , size: 1} + }, + getPosition: new GPUData({sources: ['longitude', 'latitude'], operation: 'join'}) +}); +``` + +Relevant RFCs for full data frame support: + +- [**GLSL Accessor RFC**](/dev-docs/v7.x-binary/glsl-accessor-rfc.md) +- [**Texture Attribute RFC**](/dev-docs/v7.x-binary/texture-attribute-rfc.md) +- [**GPU Data Frame Support**](/dev-docs/v7.x-binary/gpu-data-frame-rfc.md) + +## Open Questions + +### How do composite layers support binary data? + +Currently, composite layers support index-based accessors, but not external attribute buffers. They will not support this new feature initially. + +### Buffer sharing between attributes? + +If the user opts to use an interleaved buffer, it should not be uploaded to the GPU once for each attribute. This may require a global data manager (similar to `TypedArrayManager`) that creates, tracks and destroys GL resources. + +This can be implemented as a performance improvement after the initial release. From 19f4f8d89ea14f0aefb4ff882cd61ea2d3ed2b74 Mon Sep 17 00:00:00 2001 From: Xiaoji Chen Date: Thu, 14 Nov 2019 17:44:06 -0800 Subject: [PATCH 2/2] incorporate feedback --- .../v8.0/binary-support-improvements-rfc.md | 70 ++++++++++--------- 1 file changed, 38 insertions(+), 32 deletions(-) diff --git a/dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md b/dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md index 0980112e7f7..01e4f02c9e3 100644 --- a/dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md +++ b/dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md @@ -70,27 +70,43 @@ Before v7, once the data is downloaded, the application is still required to con This method skips step 2 entirely, and is possibly the most performant we'll ever get. 
However, it requires the application (both back end and front end) to have knowledge of the internal implementation of a layer, including attribute names, array types and layouts, which are not documented and are prone to breakage between minor releases. +## Terms + +- **Logical attribute** - the attribute provided by the user, in the format from the layer documentation. +- **Deck attribute** - the attribute created by the `AttributeManager`, by applying a pre-defined transform to the logical attribute(s). The Deck attribute is mapped 1:1 with a WebGL buffer. +- **Shader attribute** - the attribute seen by the vertex shader. A Deck attribute may map to multiple shader attributes using the same WebGL buffer and different accessors. + +Example 1: for scatterplot positions, the logical attribute is a `Float64Array` in the format of `x0, y0, z0, x1, y1, z1, ...`. The Deck attribute is an interleaved `Float32Array` in the format of `x0, y0, z0, x0Low, y0Low, z0Low, ...`. Two shader attributes are created from the Deck attribute: `instancePositions` and `instancePositions64Low`. + +Example 2: for polygon positions, the logical attribute is an array that flattens all polygon vertices. The Deck attributes include a normalized `positions` array (vertices may be added to close loops) and an `indices` array from triangulation. ## Goals Moving into v8.x, we want to make binary data a first-class citizen, in the following ways: -* All core layers accept a variety of binary data inputs, without exposing the internal layer implementation +* All core layers accept binary data inputs as logical attributes. * Binary data are directly uploaded to the GPU if possible, and basic packing operations (e.g. position interleaving, transform matrix construction) are performed on the GPU instead of CPU ## Proposal -Add an overload to the `accessor` prop type that accepts a string. `props.data[accessor]` should yield a loaders.gl-compatible "attribute descriptor" object. 
-We shall explain this in the documentation as follows: +Allow `data.attributes` to use an accessor name as a key that maps to a loaders.gl-compatible "attribute descriptor" object. +We shall explain this in the [binary data developer guide](/docs/developer-guide/performance.md#on-using-binary-data) as follows: -A layer prop of type `accessor` can optionally be a string. If a string is provided, it is used as the key to query from `data` for a JavaScript object containing the following fields: +Each key-value pair in `data.attributes` maps from an accessor prop name (e.g. `getPosition`) to one of the following formats: -- `value` ([TypedArray](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/TypedArray)) - a flat buffer that stacks the value for each object that would otherwise be returned by a function accessor. For example, `getPosition: d => [d.x, d.y, d.z]` is equivalent to `getPosition: {value: new Float64Array([x0, y0, z0, x1, y1, z1, ...])}`. This buffer can be constructed on the server side by something like `data.flatMap(d => [d.x, d.y, d.z])`. -- `size` (Number) - the number of elements per value in the typed array. For example, an RGB color has size 3, and an RGBA color has size 4. -- `elementStride` (Number, optional) - the number of elements per vertex in the typed array. This is needed if the buffer contains interleaved information that should not be used for this accessor. Defaults to `size`. -- `elementOffset` (Number, optional) - the number of elements into each vertex where the value starts. Defaults to `0`. -- `bufferLayout` (Array, optional) - the number of vertices per object. This is needed by variable-width data sources such as `PathLayer` and `PolygonLayer`, referring to the number of points per path/polygon respectively. For most layers, it is assumed to be `[1, 1, 1, ...]`. + - A luma.gl `Buffer` instance + - A typed array + - An object containing the following optional fields. 
For more information, see [WebGL vertex attribute API](https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/vertexAttribPointer). + + `buffer` (Buffer) + + `value` ([TypedArray](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/TypedArray)) + + `size` (Number) - the number of elements per vertex attribute. + + `offset` (Number) - the offset of the first vertex attribute into the buffer, in bytes. + + `stride` (Number) - the offset between the beginning of consecutive vertex attributes, in bytes. + +The `value` array represents a flat buffer that stacks the value for each object that would otherwise be returned by a function accessor. For example, `getPosition: d => [d.x, d.y, d.z]` is equivalent to `getPosition: {value: new Float64Array([x0, y0, z0, x1, y1, z1, ...])}`. This buffer can be constructed on the server side by something like `data.flatMap(d => [d.x, d.y, d.z])`. + +Additionally, `data.vertexCounts` must be specified if the attributes contain variable-width data, for example paths and polygons. `vertexCounts` must be an array that contains the number of vertices at each object index. For most layers, it is assumed to be `new Array(data.length).fill(1)`. ```js // EXAMPLE 1 - PointCloudLayer @@ -108,12 +124,12 @@ A layer prop of type `accessor` can optionally be a string. 
If a string is provi new PointCloudLayer({ data: { length: binaryData.length / 6, - positions: {value: binaryData, size: 2, elementStride: 6, elementOffset: 0}, - colors: {value: binaryData, size: 4, elementStride: 6, elementOffset: 2, normalized: true} - }, - getPosition: 'positions', - getColor: 'colors', - getNormal: [0, 0, 0] + attributes: { + getPosition: {value: binaryData, size: 2, stride: 6 * binaryData.BYTES_PER_ELEMENT, offset: 0}, + getColor: {value: binaryData, size: 4, stride: 6 * binaryData.BYTES_PER_ELEMENT, offset: 2 * binaryData.BYTES_PER_ELEMENT, normalized: true}, + getNormal: {value: [0, 0, 0], constant: true} + } + } }); ``` @@ -135,17 +151,18 @@ new PointCloudLayer({ new PathLayer({ data: { length: binaryData.pointCountPerPath.length, - paths: {value: binaryData.positions, size: 2, bufferLayout: binaryData.pointCountPerPath}, - colors: {value: binaryData.colors, size: 3, bufferLayout: binaryData.pointCountPerPath} - }, - getPath: 'paths', - getColor: 'colors' + vertexCounts: binaryData.pointCountPerPath, + attributes: { + getPath: {value: binaryData.positions, size: 2}, + getColor: {value: binaryData.colors, size: 3} + } + } }); ``` Implementation notes: -* If an accessor is a string, the `AttributeManager` will attempt to retrieve the information from `data` and pass it to the `Attribute` instance. +* If an accessor name appears in `data.attributes`, the `AttributeManager` will pass it to the `Attribute` instance as a "logical attribute". * For auto-updated attributes (most common), the attribute will directly upload the buffer to GPU, skipping the local packing step. * For attributes that require CPU-based processing (e.g. polygon normalization, path tessellation, icon mapping), the buffer will be treated as a source from which the auto updater draws values, similar to calling an accessor function for each object. @@ -156,17 +173,6 @@ The advantages of this new API include: * loaders.gl friendly. When loading from a URL with e.g. 
a CSV or Arrow loader, the binaries will be accessible by specifying a "path" into the parsed data. * Future extensibility. This new API will allow users to prepare data and structure applications in a way that may naturally transition into custom GPU-based data processing, e.g. -```js -new PointCloudLayer({ - data: { - ... - longitude: {value: , size: 1}, - latitude: {value: , size: 1} - }, - getPosition: new GPUData({sources: ['longitude', 'latitude'], operation: 'join'}) -}); -``` - Relevant RFCs for full data frame support: - [**GLSL Accessor RFC**](/dev-docs/v7.x-binary/glsl-accessor-rfc.md)
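
The "flattening" step described in the proposal (stacking per-object values into a typed array, expressing `stride`/`offset` in bytes, and recording `vertexCounts` for variable-width data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not deck.gl code: `packPoints` and `packPaths` are hypothetical helper names, the interleaved buffer is assumed to be a `Float32Array`, and the `data.attributes` / `data.vertexCounts` shape simply follows the draft API in this RFC, which may differ from what eventually ships.

```js
// Sketch only: `packPoints` and `packPaths` are hypothetical helpers, not deck.gl APIs.
// The returned object shape follows the draft `data.attributes` proposal in this RFC.

// Interleave [x, y, r, g, b, a] per point into one Float32Array (an assumed element
// type) and describe the two logical attributes with byte-based stride and offset.
function packPoints(points) {
  const value = new Float32Array(points.flatMap(d => [d.x, d.y, d.r, d.g, d.b, d.a]));
  const bytes = Float32Array.BYTES_PER_ELEMENT;
  return {
    length: points.length,
    attributes: {
      getPosition: {value, size: 2, stride: 6 * bytes, offset: 0},
      getColor: {value, size: 4, stride: 6 * bytes, offset: 2 * bytes}
    }
  };
}

// Flatten variable-length paths into one buffer and record how many
// vertices belong to each path.
function packPaths(paths) {
  return {
    length: paths.length,
    vertexCounts: paths.map(p => p.path.length),
    attributes: {
      getPath: {value: new Float64Array(paths.flatMap(p => p.path.flat())), size: 2}
    }
  };
}

const pointData = packPoints([
  {x: 1, y: 2, r: 1, g: 0, b: 0, a: 1},
  {x: 3, y: 4, r: 0, g: 1, b: 0, a: 1}
]);

const pathData = packPaths([
  {path: [[0, 0], [1, 1], [2, 0]]},
  {path: [[5, 5], [6, 6]]}
]);
```

Under the draft API, objects like `pointData` and `pathData` would be passed as the `data` prop of a `PointCloudLayer` or `PathLayer` respectively. Both attributes in `pointData` share one typed array, which is the interleaved case raised under "Buffer sharing between attributes".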