Batch matmul #1259

annxingyuan · 2018-08-30T13:08:24Z

This PR changes matmul so that it accepts any two tensors of rank >= 2. This mimics the behavior of Tensorflow in other languages.

The matmul op now passes its inputs to the kernels as 3D tensors, collapsing the batch dimensions together and creating a batch dimension of 1 if necessary. The kernels perform straightforward matmul on each batch element and return a 3D tensor, which is reshaped to its original dimensions before being returned.

Benchmark results

I ran matmul and downloaded the result 200 times for matrices of sizes 1, 100, 400, and 1000 across this branch and master. As we would expect, the results were very similar:

master: 7, 7, 12, 96 (miliseconds per operation)
batch_matmul: 7, 7, 10, 95

For repository owners only:

Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY

For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md

This change is

nsthorat · 2018-08-30T15:55:37Z

About to review, but just wanted to show you the travis build. If you look at the bottom of the PR you'll see this (build is failing):

If you click on "details" then click on whichever node version is failing (in this case both), you'll see the problem.

You can run the linter locally with yarn lint.

nsthorat · 2018-08-30T15:58:05Z

Also I think you can set up Sublime with clang-format so that some of these things like line length are done automatically.

The clang-format style we use is "Google", which you can see here: https://github.com/tensorflow/tfjs-core/blob/master/.vscode/settings.json#L24

nsthorat

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)

src/kernels/backend_cpu.ts, line 428 at r2 (raw file):

          }
        }
      }

remove these trailing spaces

can you set up clang-format with sublime?

src/ops/matmul.ts, line 25 at r2 (raw file):

import {op} from './operation';

function computeBatchDimension_(shape: number[]) {

return type here

src/ops/matmul.ts, line 45 at r2 (raw file):

/** @doc {heading: 'Operations', subheading: 'Matrices'} */
function matMul_<T extends Tensor>(
    a: T|TensorLike, b: T|TensorLike, transposeA = false,

what happens if we pass a vector? we should add a unit test for that

src/ops/matmul_batch_test.ts, line 1 at r2 (raw file):

/**

let's just put these in matmul_test

src/ops/matmul_batch_test.ts, line 3 at r2 (raw file):

/**
 * @license
 * Copyright 2017 Google Inc. All Rights Reserved.

2018 Google LLC.

src/ops/matmul_batch_test.ts, line 24 at r2 (raw file):

describeWithFlags('matmulBatch', ALL_ENVS, () => {
  it('A x B', () => {
    const a = tf.tensor3d([-5, -5, -6, 8, -2, -8, 4, -7, -6, -9, -1, 3, 7, -2, 5, -6, 3, 8, 7, -8, 1, 4, -4, 6, 4, -4, -9, -5, 2, -2], [5, 2, 3]);

can you add a test for a 4D tensor?

dsmilkov

This is awesome progress. Left a few comments, but this is already looking great.

Reviewed 1 of 7 files at r1, 4 of 4 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)

src/kernels/backend.ts, line 53 at r2 (raw file):

  floatPrecision(): number;

  matMul(a: Tensor3D, b: Tensor3D, transposeA: boolean, transposeB: boolean):

rename this method to batchMatMul (this is internal) so it's explicit that backends should do batched matmul and nodejs should bind to BatchMatMul Kernel in TF C++.

src/kernels/backend_cpu.ts, line 413 at r2 (raw file):

            const kBlock = Math.min(k0 + blockSize, sharedDim);

            for(let b = b0; b < bBlock; b++) {

does the batch need to be blocked? I was expecting a single outermost for loop over batch, and the blocked matmul should just treat batch as a constant.

src/kernels/backend_cpu.ts, line 419 at r2 (raw file):

                  for (let k = k0; k < kBlock; k++) {
                    sum += aValues[b * aBatchStrides + i * aOuterStep + k * aInnerStep] *

wrap to 80 width. See https://github.com/tensorflow/tfjs-core/blob/master/DEVELOPMENT.md#code-editor for pointers to clang-format.

src/kernels/backend_cpu.ts, line 428 at r2 (raw file):

          }
        }
      }

remove trailing whitespace

src/kernels/webgl/mulmat_gpu.ts, line 57 at r2 (raw file):

          getMatrixB(${bSnippetFromOffset(2, 'i')}),
          getMatrixB(${bSnippetFromOffset(3, 'i')})
        );

remove trailing whitespace

src/ops/matmul.ts, line 25 at r2 (raw file):

import {op} from './operation';

function computeBatchDimension_(shape: number[]) {

no need for trailing underscore for private methods - we do this for the ops only because we export as matmul = op({matmul_}) and we want to avoid name conflict.

src/ops/matmul.ts, line 26 at r2 (raw file):

function computeBatchDimension_(shape: number[]) {
  return shape.slice(0, -2).reduce((acc, curr) => acc * curr, 1);

reuse util.sizeFromShape(shape) instead of doing reduce.

src/ops/matmul.ts, line 62 at r2 (raw file):

  util.assert(
      $a.rank >= 2 && $b.rank >= 2 && $a.rank === $b.rank,
      `Error in matMul: inputs must have the same rank of at least 2, got ranks ${$a.rank}` +

wrap this to 80 width, otherwise the error message will have a newline character in it.

src/ops/matmul.ts, line 66 at r2 (raw file):

  util.assert(
      batchDimA === batchDimB,

according to the doc in tf.matmul in Python, "all outer dimensions must match", so checking the product is not sufficient. You can use util.arraysEqual() to assert that the outer-most shapes match.

src/ops/matmul.ts, line 67 at r2 (raw file):

  util.assert(
      batchDimA === batchDimB,
      `Error in matMul: batch dimensions (${batchDimA}) and (` +

This error message is slightly misinformative to the user, because batchDimA is implicitly computed by multiplying the outer dimensions, but the user doesn't know that. Pass the outer shapes in the error message.

dsmilkov

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)

src/ops/matmul_batch_test.ts, line 54 at r2 (raw file):

  });

  it('annyuanA^t x B^t', () => {

left over debugging snippet. FYI, you can just rename it() to fit() and karma will run only that test (no need to do --grep )

annxingyuan

Thanks guys. I think this is ready for another look.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)

src/kernels/backend.ts, line 53 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

rename this method to batchMatMul (this is internal) so it's explicit that backends should do batched matmul and nodejs should bind to BatchMatMul Kernel in TF C++.

Done

src/kernels/backend_cpu.ts, line 413 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

does the batch need to be blocked? I was expecting a single outermost for loop over batch, and the blocked matmul should just treat batch as a constant.

Done

src/kernels/backend_cpu.ts, line 419 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

wrap to 80 width. See https://github.com/tensorflow/tfjs-core/blob/master/DEVELOPMENT.md#code-editor for pointers to clang-format.

Done

src/kernels/backend_cpu.ts, line 428 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

remove these trailing spaces

can you set up clang-format with sublime?

Done

src/kernels/backend_cpu.ts, line 428 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

remove trailing whitespace

Done

src/kernels/webgl/mulmat_gpu.ts, line 57 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

remove trailing whitespace

Done

src/ops/matmul.ts, line 25 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

return type here

Done

src/ops/matmul.ts, line 25 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

no need for trailing underscore for private methods - we do this for the ops only because we export as matmul = op({matmul_}) and we want to avoid name conflict.

Done

src/ops/matmul.ts, line 26 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

reuse util.sizeFromShape(shape) instead of doing reduce.

Done

src/ops/matmul.ts, line 45 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

what happens if we pass a vector? we should add a unit test for that

We should want matmul to throw an error in that case right? That seems to be what tensorflow python does.

src/ops/matmul.ts, line 62 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

wrap this to 80 width, otherwise the error message will have a newline character in it.

Done

src/ops/matmul.ts, line 66 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

according to the doc in tf.matmul in Python, "all outer dimensions must match", so checking the product is not sufficient. You can use util.arraysEqual() to assert that the outer-most shapes match.

Done

src/ops/matmul.ts, line 67 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

This error message is slightly misinformative to the user, because batchDimA is implicitly computed by multiplying the outer dimensions, but the user doesn't know that. Pass the outer shapes in the error message.

Done

src/ops/matmul_batch_test.ts, line 1 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

let's just put these in matmul_test

Done

src/ops/matmul_batch_test.ts, line 3 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

2018 Google LLC.

Done

src/ops/matmul_batch_test.ts, line 24 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you add a test for a 4D tensor?

Done

src/ops/matmul_batch_test.ts, line 54 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

left over debugging snippet. FYI, you can just rename it() to fit() and karma will run only that test (no need to do --grep )

Done

dsmilkov

Nice work! LGTM (one tiny comment in cpu matmul)

Reviewed 8 of 8 files at r3.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)

src/kernels/backend_cpu.ts, line 423 at r3 (raw file):

                      bValues[k * bInnerStep + j * bOuterStep + b * bBatch];
                }
                result[b * (leftDim * rightDim) + (i * rightDim + j)] += sum;

precompute leftDim*rightDim outside all the for loops since it's constant. Usually I wouldn't push for this, but in this case, this is the hottest loop and we want to minimize cycles.

annxingyuan

Reviewable status: complete! 1 of 1 approvals obtained

src/kernels/backend_cpu.ts, line 423 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

precompute leftDim*rightDim outside all the for loops since it's constant. Usually I wouldn't push for this, but in this case, this is the hottest loop and we want to minimize cycles.

Done

annxingyuan added 24 commits August 28, 2018 11:33

matmul expects 3d tensors

cf7f1f4

revive assertions

f3ef501

fix computebatchdimension

6f38f73

update interfaces

f07f204

update more interfaces

bcf5399

typo

654c601

update modified matmul to more closely resemble original

1a6e4e2

type errors

65c6a63

convert matmul args to tensor

828179b

account for transposition in matmul

685a179

account for transpose

c2a5be6

cpu implementation of batch matmul

13f1328

reshape gradients

dd31bc6

when transposing the batch dimension is still always first

6db8ff5

fix indexing

479cfa6

explicit batch dimension to account for 0 row or column dim

6665d6b

adding tests

d680b76

finish gradient tests

f6b7f13

rename

e0a7805

change syntax

3adec0b

collapse line

197e914

Merge branch 'master' into batch_matmul

6f419b1

merge conflict

a929b4f

revive type check

9661de5

annxingyuan requested a review from dsmilkov August 30, 2018 13:39

nsthorat reviewed Aug 30, 2018

View reviewed changes

dsmilkov suggested changes Aug 30, 2018

View reviewed changes

dsmilkov reviewed Aug 30, 2018

View reviewed changes

annxingyuan added 8 commits August 30, 2018 14:02

line lengths

7a19d7f

remove log and fix test

2d298bd

fix error messages, linting

aee8e82

move matmul batch tests into matmul and add 4d cases

5ce8dee

add 4d gradient test

2846de4

add another 4d test case

72c6c1b

remove fit statement

0d56278

fix lint error

3b693ad

annxingyuan commented Aug 30, 2018

View reviewed changes

Merge branch 'master' into batch_matmul

1c4b77d

dsmilkov approved these changes Aug 30, 2018

View reviewed changes

precompute matrix size in matmul cpu

b101d2d

annxingyuan commented Aug 30, 2018

View reviewed changes

annxingyuan merged commit 4915101 into master Aug 30, 2018

annxingyuan deleted the batch_matmul branch August 30, 2018 23:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Batch matmul #1259

Batch matmul #1259

annxingyuan commented Aug 30, 2018 •

edited

nsthorat commented Aug 30, 2018

nsthorat commented Aug 30, 2018

nsthorat left a comment

dsmilkov left a comment

dsmilkov left a comment

annxingyuan left a comment

dsmilkov left a comment

annxingyuan left a comment

Batch matmul #1259

Batch matmul #1259

Conversation

annxingyuan commented Aug 30, 2018 • edited

Benchmark results

For repository owners only:

nsthorat commented Aug 30, 2018

nsthorat commented Aug 30, 2018

nsthorat left a comment

Choose a reason for hiding this comment

dsmilkov left a comment

Choose a reason for hiding this comment

dsmilkov left a comment

Choose a reason for hiding this comment

annxingyuan left a comment

Choose a reason for hiding this comment

dsmilkov left a comment

Choose a reason for hiding this comment

annxingyuan left a comment

Choose a reason for hiding this comment

annxingyuan commented Aug 30, 2018 •

edited