Skip to content

Conversation

@xhcao
Copy link
Contributor

@xhcao xhcao commented Dec 22, 2022

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

Copy link

@gyagp gyagp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit

}
else if (k >= depthBegin && k < depthEnd){
var dyi = -2.0 * uniforms.alpha * uniforms.beta
* getInputImage(b ,r ,c, k) * getOutputImage(b, r, c, d) / norm;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
* getInputImage(b ,r ,c, k) * getOutputImage(b, r, c, d) / norm;
* getInputImage(b, r, c, k) * getOutputImage(b, r, c, d) / norm;

You may also fix the WebGL code.

dispatch: [number, number, number];
variableNames = ['inputImage', 'outputImage', 'dy'];
uniforms =
'depth : i32, depthRadius : i32, bias : f32, alpha : f32, beta : f32,';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that depth is not necessary as a uniform. You can use let MAX_DEPTH_END = uniforms.outShape[3]; in shader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

let depthEnd = min(uniforms.depth, d + uniforms.depthRadius + 1);

let MIN_DEPTH_BEGIN = 0;
let MAX_DEPTH_END = uniforms.depth;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move L54-L55 out of for (var d = 0; d < uniforms.depth; d++). It seems that you can also use MAX_DEPTH_END for the outermost for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@qjia7 qjia7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more comments. Sorry missed them in last review.

continue;
}
else if (k >= depthBegin && k < depthEnd) {
norm += getInputImage(b, r, c, k) * getInputImage(b, r, c, k);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems getInputImage(b, r, c, k) is called twice. Is it better to cache it like below:

let inputValue = getInputImage(b, r, c, k);
norm += inputValue  * inputValue ;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may cost one more register here, let compiler optimizes the code, Is it OK?

}
else if (k >= depthBegin && k < depthEnd){
var dyi = -2.0 * uniforms.alpha * uniforms.beta
* getInputImage(b, r, c, k) * getOutputImage(b, r, c, d) / norm;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOutputImage(b, r, c, d) should be put out of for(var k = MIN_DEPTH_BEGIN; k < MAX_DEPTH_END; k++) since the arguments are never changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because getOutputImage(b, r, c, d) is in if-else branch, other branches may does not need to access memory, it is not necessary to put the IO access out of for-loop.

Copy link
Contributor

@qjia7 qjia7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@xhcao xhcao merged commit 8fa597b into tensorflow:master Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants