Add adjoint differentiation of tfq.math.inner_product() #477
Conversation
Hi Jae, this is amazing work!! Nice job adapting the adjoint method code to work on inner product.
I think there are some tweaks that need to be made to the code before we can merge:
- The forward pass output shape is [batch, n_others], much like our expectation op, which has forward pass output [batch, n_ops]. The gradient for our expectation op is the total gradient (i.e. $\sum_{op_i} [\mathrm{batch}, \mathrm{n\_ops}, \mathrm{n\_symbols}] \Rightarrow [\mathrm{batch}, \mathrm{n\_symbols}]$). In this case you aren't giving the total gradient summed over $\mathrm{other}_i$, which would allow you to simplify the computation in the C++ op (you could have something like `AccumulateOperators` that lets you accumulate all other_programs together), so you can compute the total gradient for all states in one pass instead of one by one (see the first sketch after this list).
- I know the way TensorFlow handles complex gradients can be a little complicated, so I was wondering if you think it might make sense to set up a few tests with a small TF compute graph, using raw TF operations (no TFQ), to implement a small circuit inner product calculation, and then compare the gradients that come out of that compute graph with the ones we produce from our op (a sketch of what I mean is below).
- I think we may be able to remove a lot of the boilerplate function code for the gradient by just using the `@RegisterGradient` decorator (see the last sketch below).
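For the first point, roughly what I have in mind at the Python level is a reduction like the one sketched here (the names `dy`, `per_other_grads`, and `reduce_to_total_gradient` are just illustrative, not anything the op currently exposes):

```python
import tensorflow as tf


def reduce_to_total_gradient(dy, per_other_grads):
    """Contract the upstream gradient with per-other_program gradients.

    Args:
        dy: upstream gradient, shape [batch, n_others], matching the
            forward-pass output of the inner product op.
        per_other_grads: gradient of each inner product with respect to
            each symbol, shape [batch, n_others, n_symbols].

    Returns:
        The total gradient w.r.t. symbol_values, shape [batch, n_symbols].
    """
    # Weight each per-other_program gradient by its upstream gradient and
    # sum over the n_others axis.
    return tf.einsum('bo,bos->bs', dy, per_other_grads)
```

Doing the same accumulation inside the C++ op (the `AccumulateOperators` idea) would just move this sum into the kernel, so the per-state gradients never need to be materialized one by one.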
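For the second point, here's a rough sketch of the kind of raw-TF reference computation I mean: a single-qubit Ry circuit built from explicit matrices, with the inner product and its gradient computed entirely by TensorFlow (all names and values here are made up for illustration):

```python
import numpy as np
import tensorflow as tf


def ry(theta):
    """Single-qubit Ry rotation as an explicit 2x2 complex matrix."""
    c = tf.cast(tf.cos(theta / 2.0), tf.complex64)
    s = tf.cast(tf.sin(theta / 2.0), tf.complex64)
    return tf.stack([tf.stack([c, -s]), tf.stack([s, c])])


theta = tf.Variable(0.123)
zero_state = tf.constant([1, 0], dtype=tf.complex64)   # |0>
other_state = tf.constant([1, 0], dtype=tf.complex64)  # the "other program" state

with tf.GradientTape() as tape:
    state = tf.linalg.matvec(ry(theta), zero_state)
    # <other|state>; take the real part so the gradient is unambiguous.
    ip = tf.math.real(tf.reduce_sum(tf.math.conj(other_state) * state))

raw_tf_grad = tape.gradient(ip, theta)
# For this circuit <0|Ry(theta)|0> = cos(theta/2), so d/dtheta = -sin(theta/2)/2.
np.testing.assert_allclose(raw_tf_grad.numpy(), -np.sin(0.123 / 2.0) / 2.0, atol=1e-5)
```

Comparing this tape gradient against both the analytic value and the output of our op for the same circuit would make any sign or conjugation mix-ups in the complex gradient handling obvious.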
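And for the third point, something along these lines is what I'm picturing; "TfqInnerProduct" and the commented-out adjoint-gradient call are placeholders for whatever names the loaded op module actually exposes:

```python
import tensorflow as tf


@tf.RegisterGradient("TfqInnerProduct")  # placeholder op name
def _tfq_inner_product_grad(op, grad):
    """Route the upstream gradient into the adjoint gradient op."""
    programs, symbol_names, symbol_values, other_programs = op.inputs
    # The C++ adjoint gradient op would be invoked here, e.g. something like:
    #   inner_grads = MATH_OP_MODULE.tfq_inner_product_adj_grad(
    #       programs, symbol_names, symbol_values, other_programs)
    #   total_grad = tf.einsum('bo,bos->bs', grad, inner_grads)
    total_grad = tf.zeros_like(symbol_values)  # placeholder for the real computation
    # One gradient per op input; only symbol_values is differentiable.
    return [None, None, total_grad, None]
```

That would let TensorFlow pick up the gradient automatically wherever the op appears in a graph, replacing most of the hand-written plumbing.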
MichaelBroughton left a comment
Things are looking pretty good now! A few more small tweaks and we should be ready to go. It will be exciting to start putting circuit inner product calculations inside of compute graphs and Keras models!
PTAL.
MichaelBroughton left a comment
LGTM
MichaelBroughton left a comment
LGTM
This PR adds adjoint differentiation of tfq.math.inner_product().
Since the gradient of inner_product() has the specific output tensor shape [batch_size, inner_size, n_symbols], this PR implements a new adjoint gradient op for it. (The original adjoint gradient op has only a [batch_size, n_symbols] output, so one more internal nested for-loop is added in this new adjoint gradient op.)
This PR deals with edge cases as follows:
[1] empty symbols
The sketch below shows the default behavior of TensorFlow when taking gradients with respect to empty symbols. So, this PR reproduces that behavior in the gradient function (def grad(dy)) returned inside the Python tf.custom_gradient() wrapper. For the C++ inner_product_adj_grad_op itself, this PR raises an error if the symbols are empty.
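(The original snippet isn't reproduced here; the following is only an illustrative sketch of the default behavior being described, with made-up shapes and names.)

```python
import tensorflow as tf

# A "symbol_values" tensor with zero symbols: shape [batch_size, 0].
symbol_values = tf.Variable(tf.zeros([2, 0]))

with tf.GradientTape() as tape:
    # reduce_sum of an empty tensor is 0.0, so the graph stays connected.
    out = tf.reduce_sum(symbol_values) + tf.constant(1.0)

grad = tape.gradient(out, symbol_values)
print(grad.shape)  # (2, 0): an empty gradient tensor with the matching shape
```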
[2] empty circuits
I suspect that our current definition is controversial. For example, we can say that a given empty circuit is just |0>. Then, we can say that the output inner product is 1.0 for both empty circuits, <0|0>, and its gradient is 0.0. However, if only the other_programs is empty, we may have <psi(x)|0> and <dpsi(x)/dx|0>. That's why this PR doesn't just return a default value when a circuit is empty.