Avoid allocating a large arraybuffer when loading weights #7598

mattsoulanille · 2023-04-18T01:48:18Z

The loadWeights function loads weights in 4MB chunks and then concatenates them into a single large ArrayBuffer. That ArrayBuffer is used for splitting the weights data back up into tensors. Allocating large ArrayBuffers (3.5GB) can be unstable on Chrome, so this PR avoids this allocation, instead slicing the weights out of the chunks manually.

The implementation wraps the array of weights (stored as ArrayBuffer[]) in a new CompositeArrayBuffer class. This class implements slice by copying the desired range out of the buffer(s) that it overlaps with.
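A minimal sketch of the idea described above, not the code from this PR: the class name `CompositeArrayBufferSketch`, the `Shard` type, and the linear scan are assumptions made for illustration. It records where each buffer starts and ends in the virtual concatenation, then copies only the overlapping bytes into a freshly allocated output buffer.

```typescript
// Hypothetical illustration; only the general approach comes from the PR text.
type Shard = {buffer: ArrayBuffer; start: number; end: number};

class CompositeArrayBufferSketch {
  private readonly shards: Shard[] = [];
  readonly byteLength: number;

  constructor(buffers: ArrayBuffer[]) {
    // Record each buffer's [start, end) range in the virtual concatenation.
    let start = 0;
    for (const buffer of buffers) {
      const end = start + buffer.byteLength;
      this.shards.push({buffer, start, end});
      start = end;
    }
    this.byteLength = start;
  }

  slice(start = 0, end = this.byteLength): ArrayBuffer {
    // Clamp the requested range to the valid bounds.
    start = Math.max(0, Math.min(start, this.byteLength));
    end = Math.max(start, Math.min(end, this.byteLength));

    const outputBuffer = new ArrayBuffer(end - start);
    const output = new Uint8Array(outputBuffer);

    for (const shard of this.shards) {
      if (shard.end <= start) continue;  // Shard lies before the range.
      if (shard.start >= end) break;     // Shard lies after the range.

      // Copy the part of this shard that overlaps [start, end).
      const from = Math.max(start, shard.start);
      const to = Math.min(end, shard.end);
      const piece =
          new Uint8Array(shard.buffer, from - shard.start, to - from);
      output.set(piece, from - start);
    }
    return outputBuffer;
  }
}
```

Only the requested slice is allocated, so no buffer the size of the whole model is ever created. The linear scan here is for brevity; faster lookups are discussed in the comments that follow.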

mattsoulanille · 2023-04-18T01:49:31Z

@pyu10055 This isn't a full review request yet, but I'm interested in knowing which approach you think is better. Thanks!

chunnienc · 2023-04-18T02:11:11Z

If the size of each chunk is fixed (4MB) except for the last one, you can get the start and end offsets in O(1) with division and modulo. There's no need for a binary search or a two-pointer approach on sorted data.

In terms of API, I prefer option 2, which gives better separation of concerns: in the weight loader I just need to specify where to slice and don't need to worry about how to slice. You can implement a lazy/offline slicer if performance is a concern.
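As a rough sketch of the O(1) lookup suggested here, and assuming every chunk except possibly the last has exactly `chunkSize` bytes (the function name `locateInFixedChunks` is hypothetical):

```typescript
// Hypothetical helper: find which chunk holds a byte offset, and where.
// Only valid when all chunks (except possibly the last) share one size.
function locateInFixedChunks(byteIndex: number, chunkSize: number):
    {chunkIndex: number; offsetInChunk: number} {
  return {
    chunkIndex: Math.floor(byteIndex / chunkSize),
    offsetInChunk: byteIndex % chunkSize,
  };
}

// With 4MB chunks, byte 10,000,000 falls in the third chunk (index 2).
const FOUR_MB = 4 * 1024 * 1024;
const location = locateInFixedChunks(10_000_000, FOUR_MB);
console.log(location.chunkIndex, location.offsetInChunk);
```

The lookup is only correct if the uniform-size assumption actually holds for the model being loaded.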

mattsoulanille · 2023-04-18T16:23:28Z

> If the size of each chunk is fixed (4MB) except for the last one, you can get the start and end offsets in O(1) with division and modulo. There's no need for a binary search or a two-pointer approach on sorted data.
>
> In terms of API, I prefer option 2, which gives better separation of concerns: in the weight loader I just need to specify where to slice and don't need to worry about how to slice. You can implement a lazy/offline slicer if performance is a concern.

Unfortunately, there's no guarantee on the size of the chunks. We let people configure it when converting the model. They should all be the same size, but I'd even hesitate to assume that, since it seems a bit flaky.

I agree with you and also prefer option 2. I'll implement it as a binsearch for now, and if we need better perf, I can try to automatically detect the chunk size or make it check chunks near the last one read before doing a full binsearch.
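For illustration, the plan above could look roughly like the following. This is a sketch under assumptions, not the PR's code: `ShardIndex`, `find`, and the `Shard` shape are hypothetical names. It checks the most recently matched shard first, then falls back to an iterative binary search over shards sorted by start offset, which works for shards of any size.

```typescript
// Hypothetical sketch of "check the last shard, then binary search".
type Shard = {start: number; end: number};  // Half-open byte range [start, end).

class ShardIndex {
  private readonly shards: Shard[];
  private previousShardIndex = 0;

  constructor(shards: Shard[]) {
    this.shards = shards;  // Assumed sorted and contiguous.
  }

  /** @returns the index of the shard containing byteIndex, or -1 if none. */
  find(byteIndex: number): number {
    if (this.shards.length === 0) return -1;

    // -1: byteIndex is before shard i, 1: after it, 0: inside it.
    const compare = (i: number): number => {
      const shard = this.shards[i];
      if (byteIndex < shard.start) return -1;
      if (byteIndex >= shard.end) return 1;
      return 0;
    };

    // Fast path: sequential reads usually hit the same shard again.
    if (compare(this.previousShardIndex) === 0) {
      return this.previousShardIndex;
    }

    // Iterative binary search over the sorted shards.
    let low = 0;
    let high = this.shards.length - 1;
    while (low <= high) {
      const mid = low + Math.floor((high - low) / 2);
      const result = compare(mid);
      if (result === 0) {
        this.previousShardIndex = mid;
        return mid;
      }
      if (result < 0) {
        high = mid - 1;
      } else {
        low = mid + 1;
      }
    }
    return -1;
  }
}
```

A further refinement would be to also probe the shard right after the previous one before searching, since weights tend to be read in order.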

mattsoulanille · 2023-04-18T22:15:08Z

Looking at this again, it doesn't actually prevent us from storing the weights in a single ArrayBuffer. ModelArtifacts contains the weightData key, which stores the model weights as a single ArrayBuffer. This is constructed by IOHandlers like http, which load the weights and then concatenate them into a single ArrayBuffer.

I'll leave this PR as-is and create a new one to fix this issue.

mattsoulanille · 2023-04-20T00:13:25Z

tfjs-core/src/io/weights_loader.ts

+  buffer: ArrayBuffer,
+};
+
+export class CompositeArrayBuffer {


This will be used in another PR that enables large model weights to be stored in a list of ArrayBuffers. That's why it's exported here.

mattsoulanille · 2023-04-20T00:19:32Z

> Looking at this again, it doesn't actually prevent us from storing the weights in a single ArrayBuffer. ModelArtifacts contains the weightData key, which stores the model weights as a single ArrayBuffer. This is constructed by IOHandlers like http, which load the weights and then concatenate them into a single ArrayBuffer.
>
> I'll leave this PR as-is and create a new one to fix this issue.

I'm sending this out for review since it's easier to review separately from the other part of the large weights fix. I'll submit the other part, which integrates this code with the rest of the codebase, in a separate PR.

pyu10055

Reviewable status: 0 of 1 approvals obtained (waiting on @chunnienc and @mattsoulanille)

tfjs-core/src/io/weights_loader.ts line 290 at r2 (raw file):

      }
    }
    return outputBuffer

missing ;

tfjs-core/src/io/weights_loader.ts line 292 at r2 (raw file):

    return outputBuffer
  }
  private search(byteIndex: number) {

This could be improved if the searches happen in order, which I believe is how our weights are set up.

tfjs-core/src/io/weights_loader.ts line 245 at r6 (raw file):

Previously, mattsoulanille (Matthew Soulanille) wrote…

> This will be used in another PR that enables large model weights to be stored in a list of ArrayBuffers. That's why it's exported here.

It might be good to move this into a separate file.

tfjs-core/src/io/weights_loader.ts line 272 at r9 (raw file):

    let start = 0;

    for (let i = 0; i < buffers.length; i++) {

start from 1?

chunnienc · 2023-04-20T17:22:07Z

tfjs-core/src/io/weights_loader.ts

+
+      // Create the ranges, including their start and end points.
+      const end = start + buffer.byteLength;
+      this.ranges.push({buffer, start, end,});


nit: remove ',' or format to multiple lines

chunnienc · 2023-04-20T17:26:22Z

tfjs-core/src/io/weights_loader.ts

+  }
+
+  slice(start = 0, end = this.byteLength): ArrayBuffer {
+    // NaN is treated as zero for slicing. This matches ArrayBuffer's behavior.


convert start and end to Number with Number(...) before checking NaN?

Update:
I assume you added these NaN checks because there may be calls from JS, now or in the future, that bypass the TypeScript type check. In that case I'd suggest doing start = Number(start), since isNaN('123') returns false and ArrayBuffer.prototype.slice accepts numbers given as strings.

I added these NaN checks because some of the tests were failing. They intentionally gave it no datatype, which I think eventually resulted in a NaN being passed to slice, since tfjs didn't know the byte length of the datatype. So I think the tests themselves are correct. I'd like this to match ArrayBuffer.slice as closely as possible, so I implemented your suggestion.
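To make the coercion discussed here concrete, below is a sketch of index normalization that approximates how `ArrayBuffer.prototype.slice` treats its arguments: values are converted with `Number(...)`, `NaN` becomes 0, negative values count from the end, and everything is clamped to the buffer length. The helper names `normalizeSliceIndex` and `normalizeSliceRange` are hypothetical and are not taken from the PR.

```typescript
// Hypothetical helpers approximating ArrayBuffer.prototype.slice's
// argument handling; not the implementation from this PR.
function normalizeSliceIndex(
    value: unknown, byteLength: number, fallback: number): number {
  if (value === undefined) return fallback;
  const asNumber = Number(value);  // '123' -> 123, 'abc' -> NaN
  const integer = Number.isNaN(asNumber) ? 0 : Math.trunc(asNumber);
  if (integer < 0) {
    // Negative indices are relative to the end of the buffer.
    return Math.max(byteLength + integer, 0);
  }
  return Math.min(integer, byteLength);
}

function normalizeSliceRange(
    start: unknown, end: unknown, byteLength: number): [number, number] {
  const first = normalizeSliceIndex(start, byteLength, 0);
  const final = normalizeSliceIndex(end, byteLength, byteLength);
  // A range that ends before it starts yields an empty slice.
  return [first, Math.max(first, final)];
}
```

Converting with `Number(...)` before the `NaN` check means a numeric string such as `'2'` is handled the same way as the number `2`, which is the behavior the reviewer asked for.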

chunnienc · 2023-04-20T17:28:02Z

tfjs-core/src/io/weights_loader.ts

@@ -245,3 +235,180 @@ export function weightsLoaderFactory(
    return weightsTensorMap;
  };
 }
+
+type BufferRange = {


Naming: range -> chunk/shard/partition
And all related variable and function names

Good point. That's a much better name. Fixed.

pyu10055

Reviewed 3 of 4 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @chunnienc and @mattsoulanille)

* webgpu: Fix a bug in softmax (#7607)
* Avoid allocating a large arraybuffer when loading weights (#7598)

  The loadWeights function loads weights in 4MB chunks and then concatenates them into a single large ArrayBuffer. That ArrayBuffer is used for splitting the weights data back up into tensors. Allocating large ArrayBuffers (3.5GB) can be unstable on Chrome, so this PR avoids this allocation, instead slicing the weights out of the chunks manually.

  The implementation wraps the array of weights (stored as ArrayBuffer[]) in a new CompositeArrayBuffer class. This class implements slice by copying the desired range out of the buffer(s) that it overlaps with.
* Support using a list of ArrayBuffers as model weight data
* Avoid 'Array.flat()'
* Simplify some of the tests
* Do not export 'CompositeArrayBuffer' from tfjs-core
* Update doc for weightData
* Fix tfjs-node
* Remove unused import

---------

Co-authored-by: Jiajia Qin <jiajia.qin@intel.com>

Chrome ArrayBuffers throw allocation errors above 2GB in size. This makes it impossible to load TFJS models above this size in Chrome (even with weight sharding) because model loading involves concatenating all the weights into a single ArrayBuffer. This PR avoids this concatenation. Instead of slicing the weight tensors out of a single concatenated ArrayBuffer, it keeps the weight buffers in their original shards and slices them using the CompositeArrayBuffer class created in #7598.

Implement two methods for avoiding large arraybuffer

f7be3c6

mattsoulanille added 5 commits April 18, 2023 09:36

Use the CompositeArrayBuffer method

076c8a2

Implement binsearch

bca64fd

Check the last used range first for efficiency

0d26418

Optimize for when buffers have the same size

0051fd7

Replace recursive binsearch with iterative

c5b4f8e

mattsoulanille marked this pull request as ready for review April 18, 2023 21:33

mattsoulanille added 4 commits April 18, 2023 14:33

Merge branch 'master' into avoid_large_arraybuffer

49ffd05

Comments

05a9831

Remove commented code. Fix typo

0c4e517

Add @returns annotation to search function

a530b92

mattsoulanille changed the title ~~Avoid allocating a large arraybuffer to store the model weights~~ Avoid allocating a large arraybuffer when loading weights Apr 18, 2023

mattsoulanille added 3 commits April 19, 2023 13:18

Export and test CompositeArrayBuffer

0294b53

Support NaN as a start or end to CompositeArray slice

f81638f

CompositeArrayBuffer support TypedArrays in constructor

6a07547

mattsoulanille force-pushed the avoid_large_arraybuffer branch from 159fcd1 to 6a07547 Compare April 20, 2023 00:12

mattsoulanille added 3 commits April 19, 2023 17:14

Formatting

61caf8c

Merge remote-tracking branch 'upstream/master' into avoid_large_array…

029ec8f

…buffer

Lint

b1fe5eb

mattsoulanille requested review from pyu10055 and chunnienc April 20, 2023 00:19

Fix slicing out of order

b86ee6e

mattsoulanille force-pushed the avoid_large_arraybuffer branch from ee87834 to b86ee6e Compare April 20, 2023 00:31

fix lint

7e88a89

Merge branch 'master' into avoid_large_arraybuffer

8287312

pyu10055 requested changes Apr 20, 2023

Document CompositeArrayBuffer

f2fa6f8

chunnienc approved these changes Apr 20, 2023

mattsoulanille added 2 commits April 20, 2023 10:38

Rename range -> shard

91db31a

Move CompositeArrayBuffer to a new file

26c51df

mattsoulanille requested a review from pyu10055 April 20, 2023 17:47

Add license

ec9a382

pyu10055 approved these changes Apr 20, 2023

mattsoulanille mentioned this pull request Apr 20, 2023

Support loading models with weights above 2GB on Chrome #7609

Merged

mattsoulanille merged commit 3ceace9 into tensorflow:master Apr 20, 2023
2 checks passed
