This repository has been archived by the owner on Aug 15, 2019. It is now read-only.
Conversation
This is the benchmarking code (removed from the PR):

    it('benchmark matmul sq matrix', async done => {
      const backend = tf.ENV.backend as MathBackendCPU;
      const bs = [32, 48, 64, (64 / 2) + 64, 128, (128 / 2) + 128];
      const ns = [64, 128, 192, 256, 239, 398, 512];
      const RUNS = 20;
      for (const n of ns) {
        const a = tf.randomUniform([n, n]) as tf.Tensor2D;
        const b = tf.randomUniform([n, n]) as tf.Tensor2D;
        // Warmup.
        backend.matMulNaive(a, b, false, false).dataSync();
        let res: tf.Tensor = null;
        const start = now();
        for (let i = 0; i < RUNS; i++) {
          res = backend.matMulNaive(a, b, false, false);
        }
        res.dataSync();
        const naiveTime = (now() - start) / RUNS;
        console.log(`N: ${n}\t ${naiveTime.toFixed(2)}ms`);
        for (const blockSize of bs) {
          backend.blockSize = blockSize;
          const a = tf.randomUniform([n, n]) as tf.Tensor2D;
          const b = tf.randomUniform([n, n]) as tf.Tensor2D;
          // Warmup.
          backend.matMul(a, b, false, false).dataSync();
          let res: tf.Tensor = null;
          const start = now();
          for (let i = 0; i < RUNS; i++) {
            res = backend.matMul(a, b, false, false);
          }
          res.dataSync();
          const elapsed = (now() - start) / RUNS;
          const speedup = (naiveTime / elapsed).toFixed(2);
          console.log(
              `mul BS: ${blockSize}\t ${elapsed.toFixed(2)} ms\t speedup: ${
                  speedup}x\t diff: ${(elapsed - naiveTime).toFixed(2)}ms`);
          await tf.nextFrame();
        }
      }
      done();
    });
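For context, the cache-blocked `matMul` being benchmarked can be sketched roughly as below. This is a hypothetical standalone version over flat `Float32Array`s, not the actual `MathBackendCPU` implementation: `matMulNaive` is the textbook triple loop, while the blocked variant tiles all three loops by `blockSize` so that each tile of `a`, `b` and the output is reused while it is still cache-resident.

```typescript
// Hypothetical sketch of naive vs. cache-blocked square matmul.
// Matrices are n x n, stored row-major in flat Float32Arrays.

function matMulNaive(a: Float32Array, b: Float32Array, n: number): Float32Array {
  const out = new Float32Array(n * n);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let k = 0; k < n; k++) {
        sum += a[i * n + k] * b[k * n + j];
      }
      out[i * n + j] = sum;
    }
  }
  return out;
}

function matMulBlocked(
    a: Float32Array, b: Float32Array, n: number, blockSize: number): Float32Array {
  const out = new Float32Array(n * n);
  // Iterate over (blockSize x blockSize) tiles; the inner loops touch only
  // one tile of a, one of b and one of out at a time.
  for (let i0 = 0; i0 < n; i0 += blockSize) {
    for (let j0 = 0; j0 < n; j0 += blockSize) {
      for (let k0 = 0; k0 < n; k0 += blockSize) {
        const iMax = Math.min(i0 + blockSize, n);
        const jMax = Math.min(j0 + blockSize, n);
        const kMax = Math.min(k0 + blockSize, n);
        for (let i = i0; i < iMax; i++) {
          for (let k = k0; k < kMax; k++) {
            const aik = a[i * n + k];  // hoisted: reused across the j loop
            for (let j = j0; j < jMax; j++) {
              out[i * n + j] += aik * b[k * n + j];
            }
          }
        }
      }
    }
  }
  return out;
}
```

Both versions compute the same product; the blocked one just visits memory in a more cache-friendly order, which is where the speedups in the log above come from.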
dsmilkov
approved these changes
Aug 9, 2018
Reviewed 1 of 3 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4.
Reviewable status: 0 of 1 approvals obtained
By benchmarking across matrix sizes, we found that the best block size for the cache-blocked matrix multiply is 48. We pay a 0.5-1 ms penalty on small matrices but gain hundreds of milliseconds on large ones.
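A plausible back-of-envelope reading of why 48 wins (my assumption, not stated in the PR): with float32 data, a 48x48 tile is about 9 KB, so the three tiles the inner loops touch at once total about 27 KB, which fits comfortably in a typical 32 KB L1 data cache.

```typescript
// Working-set estimate for blockSize = 48 (assumes float32 tensors and a
// typical 32 KB L1 data cache; these are illustrative numbers, not measured).
const blockSize = 48;
const bytesPerFloat = 4;
const tileBytes = blockSize * blockSize * bytesPerFloat;  // 9216 bytes per tile
const workingSet = 3 * tileBytes;  // one tile each of a, b and the output: 27648 bytes
const l1Bytes = 32 * 1024;
console.log(tileBytes, workingSet, workingSet < l1Bytes);
```

By contrast, the next candidate from `bs`, 64, gives 3 * 64 * 64 * 4 = 49152 bytes, which already overflows a 32 KB L1.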
dsmilkov
approved these changes
Aug 9, 2018
Reviewed 1 of 1 files at r5.
Reviewable status: complete! 1 of 1 approvals obtained
Description
Currently: 1.4x speedup on a 512x512 matrix-matrix multiply (matMul).
In response to tensorflow/tfjs#582
For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md