We should somehow propagate the changes forward if any parameter is of an lvalue reference type. I think we could do this by just forwarding the array_ref for the parameter passed by reference instead of creating a temporary _grad*.
I feel like what I proposed should work. Again, I suppose this requires some debate and some input from some more experienced folks but here is my argument:
Any function call can essentially be swapped into the original function code (or inlined to be more precise), so different combinations and their possible inlines may look like the following:
This is the target function we will use; it contains either a call to a void function or a call-assign to a non-void function:
float func(float x, float y) {
    // some ops
    y = helper_*(x, y);
    // OR
    helper_*(x, y);
    // some ops
}
Only one reference variable + no return.
void helper_void_ref(float& x) {
    x = x * x;
}
results in the following conversion:
float func(float x, float y) {
    // some ops
    helper_void_ref(x);
    // some ops
}
TO:
float func(float x, float y) {
    // some ops
    x = x * x;
    // some ops
}
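To make the "forward the same adjoint" idea concrete for this case, here is a hand-written sketch (not clad output). The names helper_void_ref_pullback, func_ref and func_ref_grad are hypothetical, plain float* stands in for clad::array_ref<float>, and func_ref is assumed to return x * y after the call.

```cpp
// Hand-written sketch, not generated by clad.
// Primal helper from the example above.
void helper_void_ref(float& x) { x = x * x; }

// Hypothetical pullback: d_x is the caller's adjoint of x, forwarded as-is.
// x_in is the value x had before the call.
void helper_void_ref_pullback(float x_in, float* d_x) {
    // x_out = x_in * x_in, so d(x_in) = 2 * x_in * d(x_out).
    *d_x = 2.0f * x_in * (*d_x);
}

// Assumed target: func_ref(x, y) = x^2 * y.
float func_ref(float x, float y) {
    helper_void_ref(x);
    return x * y;
}

// Hand-written gradient of func_ref.
void func_ref_grad(float x, float y, float* d_x, float* d_y) {
    float x_in = x;      // store the value the helper overwrites
    helper_void_ref(x);  // forward pass
    // Reverse of `return x * y` with seed 1.
    *d_x += y;
    *d_y += x;
    // Continue the reverse pass inside the helper on the same adjoint,
    // with no temporary _grad variable in between.
    helper_void_ref_pullback(x_in, d_x);
}
```

With x = 3 and y = 2 this should give d_x = 2xy and d_y = x^2, which is what the inlined version would produce as well.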
One/pure reference variable(s) with non-void return.
float helper_ret_ref(float& x) {
    return (x = x * x);
}
results in the following conversion:
float func(float x, float y) {
    // some ops
    y = helper_ret_ref(x);
    // some ops
}
TO:
float func(float x, float y) {
    // some ops
    x = x * x;
    y = x;
    // some ops
}
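When the helper also returns a value, the reverse pass needs a seed for that return value in addition to the forwarded adjoint. A rough hand-written sketch follows (not clad output). The d_ret parameter and the names helper_ret_ref_pullback, func_ret_ref and func_ret_ref_grad are assumptions made for illustration, and func_ret_ref is assumed to return x * y.

```cpp
// Hand-written sketch, not generated by clad.
float helper_ret_ref(float& x) { return (x = x * x); }

// Hypothetical pullback: d_ret is the adjoint of the returned value,
// d_x is the caller's adjoint of x, forwarded as-is.
void helper_ret_ref_pullback(float x_in, float d_ret, float* d_x) {
    *d_x += d_ret;                // the return value is the updated x
    *d_x = 2.0f * x_in * (*d_x);  // reverse of x = x * x
}

// Assumed target: func_ret_ref(x, y) = x^2 * x^2 = x^4.
float func_ret_ref(float x, float y) {
    y = helper_ret_ref(x);
    return x * y;
}

void func_ret_ref_grad(float x, float y, float* d_x, float* d_y) {
    float x_in = x;
    y = helper_ret_ref(x);  // forward pass
    // Reverse of `return x * y` with seed 1.
    *d_x += y;
    *d_y += x;
    // Reverse of `y = helper_ret_ref(x)`: y was overwritten, so its
    // adjoint moves to the call's return value and the old y gets none.
    float r = *d_y;
    *d_y -= r;
    helper_ret_ref_pullback(x_in, r, d_x);
}
```

For x = 2 the expected derivative is 4x^3, and the original y receives no contribution because it is overwritten before use.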
Pass by value with non-void return (usual case).
float helper_ret_val(float x) {
    return (x = x * x);
}
results in the following conversion:
float func(float x, float y) {
    // some ops
    y = helper_ret_val(x);
    // some ops
}
TO:
float func(float x, float y) {
    // some ops
    // pass by val so make a copy for x, use the temp value from this point on.
    float _temp_x = x;
    _temp_x = _temp_x * _temp_x;
    y = _temp_x;
    // some ops; use x as usual beyond this point, destruct temp.
}
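For the pass-by-value case the temporary adjoint is still the right tool, since the callee only ever sees a copy. Here is a small hand-written sketch (not clad output). The names helper_ret_val_pullback, func_val, func_val_grad and grad_x are hypothetical, and func_val is assumed to return x * y.

```cpp
// Hand-written sketch, not generated by clad.
float helper_ret_val(float x) { return (x = x * x); }

// Hypothetical pullback: d_x points at a temporary owned by the caller,
// because x was copied into the helper.
void helper_ret_val_pullback(float x, float d_ret, float* d_x) {
    // ret = x * x, so the parameter's adjoint gains 2 * x * d_ret.
    *d_x += 2.0f * x * d_ret;
}

// Assumed target: func_val(x, y) = x * x^2 = x^3.
float func_val(float x, float y) {
    y = helper_ret_val(x);
    return x * y;
}

void func_val_grad(float x, float y, float* d_x, float* d_y) {
    y = helper_ret_val(x);  // forward pass; x itself is unchanged
    // Reverse of `return x * y` with seed 1.
    *d_x += y;
    *d_y += x;
    // Reverse of `y = helper_ret_val(x)`.
    float r = *d_y;
    *d_y -= r;
    float grad_x = 0.0f;  // temporary adjoint for the by-value copy
    helper_ret_val_pullback(x, r, &grad_x);
    *d_x += grad_x;       // fold the temporary back into x's adjoint
}
```

The only difference from the reference case is the grad_x temporary: the caller's x keeps its own adjoint and only receives the copy's contribution at the end.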
Mixed call with void return.
void helper_void_mixed(float& x, float y) {
    x = x * y;
}
results in the following conversion:
float func(float x, float y) {
    // some ops
    helper_void_mixed(x, y);
    // some ops
}
TO:
float func(float x, float y) {
    // some ops
    // pass by val so make a copy for y, use the temp value from this point on for y.
    // pass by ref for x so use x as is.
    float _temp_y = y;
    x = x * _temp_y;
    // some ops; use y as usual beyond this point, destruct temp_y.
}
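The mixed case combines both treatments: the reference parameter's adjoint is forwarded, and the by-value parameter gets a temporary. A hand-written sketch of how that might look (not clad output) is below. The names helper_void_mixed_pullback, func_mixed, func_mixed_grad and grad_y are hypothetical, and func_mixed is assumed to return x + y.

```cpp
// Hand-written sketch, not generated by clad.
void helper_void_mixed(float& x, float y) { x = x * y; }

// Hypothetical pullback: d_x is forwarded from the caller (reference
// parameter), d_y points at a caller-owned temporary (value parameter).
void helper_void_mixed_pullback(float x_in, float y, float* d_x, float* d_y) {
    float d_out = *d_x;    // adjoint of x after the call
    *d_y += x_in * d_out;  // contribution to the by-value y
    *d_x = y * d_out;      // adjoint of x before the call
}

// Assumed target: func_mixed(x, y) = x * y + y.
float func_mixed(float x, float y) {
    helper_void_mixed(x, y);
    return x + y;
}

void func_mixed_grad(float x, float y, float* d_x, float* d_y) {
    float x_in = x;
    helper_void_mixed(x, y);  // forward pass
    // Reverse of `return x + y` with seed 1.
    *d_x += 1.0f;
    *d_y += 1.0f;
    float grad_y = 0.0f;      // temporary adjoint for the copied y
    helper_void_mixed_pullback(x_in, y, d_x, &grad_y);
    *d_y += grad_y;
}
```

For x = 3 and y = 2 the expected gradient is d_x = y and d_y = x + 1.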
Mixed call with non-void return.
float helper_ret_mixed(float& x, float y) {
    x = x * x;
    y = x;
    return y;
}
results in the following conversion:
float func(float x, float y) {
    // some ops
    y = helper_ret_mixed(x, y);
    // some ops
}
TO:
float func(float x, float y) {
    // some ops
    // pass by val so make a copy for y, use the temp value from this point on for y.
    // pass by ref for x so use x as is.
    float _temp_y = y;
    x = x * x;
    _temp_y = x;
    // return assign here:
    y = _temp_y;
    // some ops; use y as usual beyond this point, destruct temp_y.
}
These are good cases to test my logic against. I feel like they show that passing the same array_ref for reference parameters (and a new temporary one for pass-by-value parameters) gives correct results.
Another "peculiar" case is functions with multiple references and a void return. One option is to actually inline these functions into the code and then continue the differentiation as is. One way to achieve the same effect is to retain the tape types and then differentiate the function normally. To do this, we just have to pretend that the void function actually has a non-void return type (the return type of the target function) and returns an arbitrary constant of that type. For example, consider the following:
void helperR(float& x, float& y) {
    x = x * x;
    y += x;
}
float randomR(float x, float y) {
    helperR(x, y);
    return x + y;
}
Now, let's take two cases:
If we inline the function directly:
Transformed functions:
float randomR(float x, float y) {
    x = x * x;
    y += x;
    return x + y;
}
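As a sanity check for the inlined form, here is a hand-written reverse pass over it (not clad output). The names randomR_call, randomR_inlined and randomR_inlined_grad are hypothetical and only exist so that both versions can sit in one file.

```cpp
// Hand-written sketch, not generated by clad.
void helperR(float& x, float& y) {
    x = x * x;
    y += x;
}

// Original form: calls the helper.
float randomR_call(float x, float y) {
    helperR(x, y);
    return x + y;
}

// Inlined form: the helper body is substituted at the call site.
float randomR_inlined(float x, float y) {
    x = x * x;
    y += x;
    return x + y;
}

// Reverse pass over the inlined body. Both forms compute 2 * x^2 + y,
// so the expected gradient is (4 * x, 1).
void randomR_inlined_grad(float x, float y, float* d_x, float* d_y) {
    (void)y;          // the reverse pass never needs the value of y
    float x_in = x;   // the only forward value the reverse pass needs
    // Reverse of `return x + y` with seed 1.
    *d_x += 1.0f;
    *d_y += 1.0f;
    // Reverse of `y += x`: x gains y's adjoint, y keeps its own.
    *d_x += *d_y;
    // Reverse of `x = x * x`.
    *d_x = 2.0f * x_in * (*d_x);
}
```

Any gradient produced for the call form should match this one, which is the property the comparison between the two cases relies on.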
If we give the helper a dummy non-void return type instead:
Transformed functions:
// Here we make sure the "dummy" return type is the same as the target function return type.
float helperR(float& x, float& y) {
    x = x * x;
    y += x;
    // Here we return 0, or zero equivalent of the return type.
    // Or just return nothing! that works too.
    return 0;
}
float randomR(float x, float y) {
    // dummy temp value to make sure clad differentiates this right now...
    float temp = helperR(x, y);
    return x + y;
}
Derivative produced:
void helperR_grad(float &x, float &y, clad::array_ref<float> _d_x, clad::array_ref<float> _d_y) {
    float _t0;
    float _t1;
    _t1 = x;
    _t0 = x;
    x = _t1 * _t0;
    y += x;
    int helperR_return = 0;
    goto _label0;
  _label0:
    ;
    {
        // See how this block is completely identical to the inlined one we saw before?
        float _r_d1 = * _d_y;
        * _d_y += _r_d1;
        * _d_x += _r_d1;
        * _d_y -= _r_d1;
        * _d_y;
    }
    {
        float _r_d0 = * _d_x;
        float _r0 = _r_d0 * _t0;
        * _d_x += _r0;
        float _r1 = _t1 * _r_d0;
        * _d_x += _r1;
        * _d_x -= _r_d0;
        * _d_x;
    }
}
void randomR_grad(float x, float y, clad::array_ref<float> _d_x, clad::array_ref<float> _d_y) {
    float _t0;
    float _t1;
    float _d_temp = 0;
    _t0 = x;
    _t1 = y;
    float temp = helperR(_t0, _t1);
    float randomR_return = x + y;
    goto _label0;
  _label0:
    {
        * _d_x += 1;
        * _d_y += 1;
    }
    {
        // float _grad0 = 0.F; -- No need
        // float _grad1 = 0.F; -- No need
        // here we need to make copies x and y again because these temps have been changed before.
        helperR_grad(_t0, _t1, _d_x, _d_y);
        float _r0 = _d_temp * _grad0; //d_temp unused so = 0; noop
        * _d_x += _r0; // Addition with 0; noop
        float _r1 = _d_temp * _grad1; // also 0; noop
        * _d_y += _r1; // also addition with 0; noop
    }
}
Now, you can probably see that the above two cases are absolutely identical. I suppose now it is just a choice of how we want to implement this.
Again, I have not exhaustively thought this through or tested it out, so we probably need some more debate here... but that is my two cents!
The pullback function approach makes it possible to "continue" reverse-mode automatic differentiation when required.
This allows derivatives to be computed correctly when arguments are passed by reference or by pointer.
This commit also changes custom gradient functions into custom pullback functions.
Closes vgvassilev#281, Closes vgvassilev#386, Closes vgvassilev#387
The thread of interest
Problem:
No, for the following example:
the following gradient is generated:
The gradient is obviously wrong.
Proposed solution:
We should somehow propagate the changes forward if any parameter is of an lvalue reference type. I think we could do this by just forwarding the array_ref for the parameter passed by reference instead of creating a temporary _grad*. Then the gradient would look something like this:
A generalized poc:
P.S. Excuse any typos!
Originally posted by @grimmmyshini in #247 (comment)