Skip to content

Commit

Permalink
webgpu: update dataToGPU doc
Browse files Browse the repository at this point in the history
  • Loading branch information
xhcao committed Sep 1, 2022
1 parent 6d2b46b commit 990f5b1
Showing 1 changed file with 100 additions and 53 deletions.
153 changes: 100 additions & 53 deletions docs/OPTIMIZATION_PURE_GPU_PIPELINE.md
Original file line number Diff line number Diff line change
@@ -1,53 +1,100 @@
# Optimization Topic - How to write a high performance GPU pipeline
This document shows how to write a high performance GPU pipeline with TF.js. A typical ML pipeline is composed of some ML steps (e.g. taking some inputs, running a model and getting outputs) and non-ML steps (e.g. image processing, rendering, etc.). General rule of thumb is that if the whole pipeline can run on GPU without downloading any data to CPU, it is usually much faster than a fragmented pipeline that requires data transfer between CPU and GPU. This additional time for GPU to CPU and CPU to GPU sync adds to the pipeline latency.

In the latest TF.js release (version 3.13.0), we provide a new tensor
API that allows data to stay in GPU. Our users are pretty faimilar with
`tensor.dataSync()` and `tensor.data()`, however both will download data from GPU to CPU. In the latest release, we have added `tensor.dataToGPU()`,
which returns a `GPUData` type.

The example below shows how to get the tensor texture by calling
`datatoGPU`. Also see this [example code](https://github.com/tensorflow/tfjs-examples/tree/master/gpu-pipeline) for details.

```javascript
// Getting the texture that holds the data on GPU.
// data has below fields:
// {texture, texShape, tensorRef}
// texture: A WebGLTexture.
// texShape: the [height, width] of the texture.
// tensorRef: the tensor associated with the texture.
//
// Here the tensor has a shape of [1, videoHeight, videoWidth, 4],
// which represents an image. In this case, we can specify the texture
// to use the image's shape as its shape.
const data =
tensor.dataToGPU({customTexShape: [videoHeight, videoWidth]});

// Once we have the texture, we can bind it and pass it to the downstream
// webgl processing steps to use the texture.
gl.bindTexture(gl.TEXTURE_2D, data.texture);

// Some webgl processing steps.

// Once all the processing is done, we can render the result on the canvas.
gl.blitFramebuffer(0, 0, videoWidth, videoHeight, 0, videoHeight, videoWidth, 0, gl.COLOR_BUFFER_BIT, gl.LINEAR);

// Remember to dispose the texture after use, otherwise, there will be
// memory leak.
data.tensorRef.dispose();
```

The texture from `dataToGPU()` has a specific format. The data is densely
packed, meaning all four channels (r, g, b, a) in a texel is used to store
data. This is how the texture with tensor shape = [2, 3, 4] looks like:

```
000|001|002|003 010|011|012|013 020|021|022|023
100|101|102|103 110|111|112|113 120|121|122|123
```

So if a tensor's shape is `[1, height, width, 4]` or `[height, width, 4]`, the way we store the data aligns with how image is stored, therefore if the
downstream processing assumes the texture is an image, it doesn't need to
do any additional coordination conversion. Each texel is mapped to a [CSS pixel](https://developer.mozilla.org/en-US/docs/Glossary/CSS_pixel) in an image. However, if the tensor has other shapes, downstream
processing steps may need to reformat the data onto another texture. The `texShape` field has the `[height, width]` information of the texture, which can be used for transformation.
# Optimization Topic - How to write a high performance GPU pipeline
This document shows how to write a high performance GPU pipeline with TF.js. A typical ML pipeline is composed of some ML steps (e.g. taking some inputs, running a model and getting outputs) and non-ML steps (e.g. image processing, rendering, etc.). General rule of thumb is that if the whole pipeline can run on GPU without downloading any data to CPU, it is usually much faster than a fragmented pipeline that requires data transfer between CPU and GPU. This additional time for GPU to CPU and CPU to GPU sync adds to the pipeline latency.

In the latest TF.js release (version 3.13.0), we provide a new tensor
API that allows data to stay in GPU. Our users are pretty faimilar with
`tensor.dataSync()` and `tensor.data()`, however both will download data from GPU to CPU. In the latest release, we have added `tensor.dataToGPU()`,
which returns a `GPUData` type.

# For webgl backend
The example below shows how to get the tensor texture by calling
`datatoGPU`. Also see this [example code](https://github.com/tensorflow/tfjs-examples/tree/master/gpu-pipeline/webgl) for details.

```javascript
// Getting the texture that holds the data on GPU.
// data has below fields:
// {texture, texShape, tensorRef}
// texture: A WebGLTexture.
// texShape: the [height, width] of the texture.
// tensorRef: the tensor associated with the texture.
//
// Here the tensor has a shape of [1, videoHeight, videoWidth, 4],
// which represents an image. In this case, we can specify the texture
// to use the image's shape as its shape.
const data =
tensor.dataToGPU({customTexShape: [videoHeight, videoWidth]});

// Once we have the texture, we can bind it and pass it to the downstream
// webgl processing steps to use the texture.
gl.bindTexture(gl.TEXTURE_2D, data.texture);

// Some webgl processing steps.

// Once all the processing is done, we can render the result on the canvas.
gl.blitFramebuffer(0, 0, videoWidth, videoHeight, 0, videoHeight, videoWidth, 0, gl.COLOR_BUFFER_BIT, gl.LINEAR);

// Remember to dispose the texture after use, otherwise, there will be
// memory leak.
data.tensorRef.dispose();
```

The texture from `dataToGPU()` has a specific format. The data is densely
packed, meaning all four channels (r, g, b, a) in a texel is used to store
data. This is how the texture with tensor shape = [2, 3, 4] looks like:

```
000|001|002|003 010|011|012|013 020|021|022|023
100|101|102|103 110|111|112|113 120|121|122|123
```

So if a tensor's shape is `[1, height, width, 4]` or `[height, width, 4]`, the way we store the data aligns with how image is stored, therefore if the
downstream processing assumes the texture is an image, it doesn't need to
do any additional coordination conversion. Each texel is mapped to a [CSS pixel](https://developer.mozilla.org/en-US/docs/Glossary/CSS_pixel) in an image. However, if the tensor has other shapes, downstream
processing steps may need to reformat the data onto another texture. The `texShape` field has the `[height, width]` information of the texture, which can be used for transformation.

# For webgpu backend
The example below shows how to get the tensor buffer by calling
`datatoGPU`. Also see this [example code](https://github.com/tensorflow/tfjs-examples/tree/master/gpu-pipeline/webgpu) for details.

```javascript
// Getting the buffer that holds the data on GPU.
// data has below fields:
// {buffer, bufSize, tensorRef}
// buffer: A GPUBuffer.
// bufSize: the size of the buffer.
// tensorRef: the tensor associated with the buffer.
//
// Unlike webgl backend, There is no parameter for dataGPU api on the webgpu
// backend, and the size of returned buffer is the same as the tensor size.
const data = tensor.dataToGPU();

// Once we have the buffer, we can bind it and pass it to the downstream
// webgpu processing steps to use the buffer.
const uniformBindGroup = device.createBindGroup({
layout: pipeline.getBindGroupLayout(0),
entries: [
{
binding: 1,
resource: {
buffer: data.buffer,
},
},
],
});

// Some webgpu processing steps.

// Defines the mapping between resources of all GPUBindGroup objects
passEncoder.setBindGroup(0, uniformBindGroup);

// Some webgpu processing steps.

// Remember to dispose the buffer after use, otherwise, there will be
// memory leak.
data.tensorRef.dispose();
```

The dataToGPU exports a GPUBuffer object, whose content and size is the same as
the source tensor on the webgpu backend, it is user's responsibility to parse
the content and size on the downsream webgpu processing steps.

0 comments on commit 990f5b1

Please sign in to comment.