[WASM] Transpose the filter of the convolution before calling xnnpack. #2344
Conversation
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only "@googlebot I consent." Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s) and set the CLA label. ℹ️ Googlers: Go here for more info.
@googlebot I consent.
Maratyszcza
left a comment
The code is more complicated than I expected. It would be sufficient to always do just a 2D transpose [M, N] -> [N, M], where M = kernel height * kernel width * input channels and N = output channels.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @Maratyszcza)
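In concrete terms, that suggestion amounts to viewing the HWIO filter as a row-major [M, N] matrix and emitting its transpose. A minimal sketch under that assumption (illustrative names, not the PR's actual code):

```cpp
#include <cstddef>
#include <vector>

// Transpose a row-major [m, n] matrix into a row-major [n, m] matrix.
// For a conv filter, m = kernel_height * kernel_width * input_channels
// and n = output_channels, so this moves output channels to the
// outermost dimension.
std::vector<float> Transpose2D(const float* in, std::size_t m, std::size_t n) {
  std::vector<float> out(m * n);
  for (std::size_t row = 0; row < m; ++row) {
    for (std::size_t col = 0; col < n; ++col) {
      out[col * m + row] = in[row * n + col];
    }
  }
  return out;
}
```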
Most of the complexity is actually just moving transpose to a shared implementation. However, you are right that a direct 2D transpose is possible here (we do this internally anyway), so I made it 2D.
Maratyszcza
left a comment
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @nsthorat)
tfjs-backend-wasm/src/cc/kernels/Conv2D.cc, line 112 at r4 (raw file):
// This can be transposed with a 2d transpose to move output_channels to the
// outer most dimension.
float* transposed_filter = new float[filter_info.size]();
It is better to avoid directly allocating memory via new as it can lead to memory leaks. A safer way would be to create an std::vector<float> transposed_filter(filter_info.size()). The memory for the vector will be automatically released when it goes out of scope.
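For illustration, a brief sketch of the two allocation styles contrasted above (placeholder names, not the PR's code):

```cpp
#include <cstddef>
#include <vector>

void AllocationSketch(std::size_t filter_size) {
  // Raw allocation: must be paired with delete[] on every exit path,
  // otherwise the buffer leaks (e.g. if an early return is added later).
  float* raw = new float[filter_size]();
  // ... use raw ...
  delete[] raw;

  // RAII alternative: storage is zero-initialized and released
  // automatically when the vector goes out of scope.
  std::vector<float> transposed_filter(filter_size);
  // ... use transposed_filter.data() ...
}
```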
Maratyszcza
left a comment
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @nsthorat)
nsthorat
left a comment
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @Maratyszcza)
tfjs-backend-wasm/src/cc/kernels/Conv2D.cc, line 112 at r4 (raw file):
Previously, Maratyszcza (Marat Dukhan) wrote…
It is better to avoid directly allocating memory via new as it can lead to memory leaks. A safer way would be to create an std::vector<float> transposed_filter(filter_info.size()). The memory for the vector will be automatically released when it goes out of scope.
Thanks! Done!
dsmilkov
left a comment
Great! That transpose should be pretty fast and it's only a one-time cost.
Reviewed 4 of 11 files at r1, 1 of 6 files at r3, 2 of 3 files at r4, 1 of 1 files at r5.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @Maratyszcza)
A Googler has manually verified that the CLAs look good. (Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.) ℹ️ Googlers: Go here for more info.
XNNPack expects kernels in the following format:
[output channels, kernel height, kernel width, input channels]
TensorFlow and TensorFlow.js use the following format:
[kernel height, kernel width, input channels, output channels]
This PR transposes the filter when an XNNPack kernel is created. Since XNNPack keeps a copy of the filter, we transpose, call XNNPack, and immediately throw out the transposed kernel.
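For reference, a hedged sketch of that layout conversion (the dimension names and the helper are illustrative assumptions, not the PR's actual code):

```cpp
#include <cstddef>
#include <vector>

// Convert a filter stored as [H, W, I, O] (the TensorFlow.js layout) into
// [O, H, W, I] (the layout XNNPack expects). XNNPack copies the weights
// when the convolution operator is created, so the returned vector can be
// discarded right after the call.
std::vector<float> FilterHWIOToOHWI(const float* hwio, std::size_t h,
                                    std::size_t w, std::size_t i,
                                    std::size_t o) {
  std::vector<float> ohwi(h * w * i * o);
  for (std::size_t y = 0; y < h; ++y) {
    for (std::size_t x = 0; x < w; ++x) {
      for (std::size_t c = 0; c < i; ++c) {
        for (std::size_t k = 0; k < o; ++k) {
          ohwi[((k * h + y) * w + x) * i + c] =
              hwio[((y * w + x) * i + c) * o + k];
        }
      }
    }
  }
  return ohwi;
}
```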
To share the transpose implementation, I moved the body of Transpose.cc into a separate transpose_impl.cc/h (they must be named differently beyond capitalization or Bazel gets confused).