Feature Request: Gradient for SVD op #6503
The algorithm is in section 3.2 of "An extended collection of matrix derivative results".
We are currently working on this internally and I've heard it may be close. @rmlarsen knows more.
As mentioned, this is underway internally. I believe both the person working on it and I are just back from vacation. I assume this will be available within the coming month.
Yes, this would help experiment with spectral methods. @rmlarsen @aselle @yaroslavvb Is this still in active development?
Is there any update on this? It would be super useful for our work!
@rmlarsen what is the update on this?
There is a current implementation already, but it is not yet complete. I was busy with other projects, but I will try to come back to this next week.
Hello, it would be great to have this functionality.
```python
def svd(A, full_matrices=False, compute_uv=True, name=None):
    # Since dA = dU S Vt + U dS Vt + U S dVt, we can simply recompute each
    # matrix using A = U S Vt while blocking gradients to the original op.
    _, M, N = A.get_shape().as_list()
    P = min(M, N)
    S0, U0, V0 = map(tf.stop_gradient,
                     tf.svd(A, full_matrices=True, name=name))
    Ui, Vti = map(tf.matrix_inverse, [U0, tf.transpose(V0, (0, 2, 1))])
    # A = U S Vt, hence S = Ui A Vti
    S = tf.matmul(Ui, tf.matmul(A, Vti))
    S = tf.matrix_diag_part(S)
    if not compute_uv:
        return S
    Si = tf.pad(tf.matrix_diag(1 / S0), [[0, 0], [0, N - P], [0, M - P]])
    # U = A Vti Si
    U = tf.matmul(A, tf.matmul(Vti, Si))
    U = U if full_matrices else U[:, :M, :P]
    # Vt = Si Ui A
    V = tf.transpose(tf.matmul(Si, tf.matmul(Ui, A)), (0, 2, 1))
    V = V if full_matrices else V[:, :N, :P]
    return S, U, V
```
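As a sanity check on the identity this wrapper relies on (recomputing S as `Ui @ A @ Vti` once gradients are blocked on the decomposition), here is a minimal NumPy sketch of my own; it is not part of the posted code:

```python
import numpy as np

# Hypothetical illustration: for a square A, the inverse of U0 (which equals
# U0.T, since U0 is orthogonal) and the inverse of V0^T let us recover the
# singular values as the diagonal of Ui @ A @ Vti.
rng = np.random.default_rng(42)
A = rng.standard_normal((4, 4))

U0, s0, V0t = np.linalg.svd(A, full_matrices=True)
Ui = np.linalg.inv(U0)    # inverse of U0
Vti = np.linalg.inv(V0t)  # inverse of V0^T

S = Ui @ A @ Vti          # should be diag(s0)
print(np.allclose(np.diag(S), s0))  # True
```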
Hi, I have composed one gradient function based on the Matrix Backpropagation paper. Hope it helps.

```python
def matrix_symmetric(x):
    return (x + tf.transpose(x, [0, 2, 1])) / 2


def get_eigen_K(x, square=False):
    """
    Get K = 1 / (sigma_i - sigma_j) for i != j, and 0 otherwise.

    Parameters
    ----------
    x : tf.Tensor with shape [..., dim]
    """
    if square:
        x = tf.square(x)
    res = tf.expand_dims(x, 1) - tf.expand_dims(x, 2)
    res += tf.eye(tf.shape(res)[1])
    res = 1 / res
    res -= tf.eye(tf.shape(res)[1])
    # Keep the results clean
    res = tf.where(tf.is_nan(res), tf.zeros_like(res), res)
    res = tf.where(tf.is_inf(res), tf.zeros_like(res), res)
    return res


@tf.RegisterGradient('Svd')
def gradient_svd(op, grad_s, grad_u, grad_v):
    """
    Define the gradient for SVD.

    References
    ----------
    Ionescu, C., et al., Matrix Backpropagation for Deep Networks with
    Structured Layers
    """
    s, u, v = op.outputs
    v_t = tf.transpose(v, [0, 2, 1])

    with tf.name_scope('K'):
        K = get_eigen_K(s, True)
    inner = matrix_symmetric(K * tf.matmul(v_t, grad_v))

    # Create the shape accordingly.
    u_shape = u.get_shape()[1].value
    v_shape = v.get_shape()[1].value

    # Recover the complete S matrices and their gradient.
    eye_mat = tf.eye(v_shape, u_shape)
    realS = tf.matmul(tf.reshape(tf.matrix_diag(s), [-1, v_shape]), eye_mat)
    realS = tf.transpose(tf.reshape(realS, [-1, v_shape, u_shape]), [0, 2, 1])

    real_grad_S = tf.matmul(tf.reshape(tf.matrix_diag(grad_s), [-1, v_shape]),
                            eye_mat)
    real_grad_S = tf.transpose(tf.reshape(real_grad_S, [-1, v_shape, u_shape]),
                               [0, 2, 1])

    dxdz = tf.matmul(u, tf.matmul(2 * tf.matmul(realS, inner) + real_grad_S, v_t))
    return dxdz
```
@kcyu2014 Why don't you make a PR?
@kcyu2014 Thanks for the code, but it is missing the
@albertpumarola Sorry, I forgot it and now it's updated :)
+1, would be very useful :) @rmlarsen
Yes, please add this feature; it would be super helpful for the matrix nuclear norm. @rmlarsen
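Since the nuclear norm keeps coming up in this thread: for a full-rank matrix with distinct singular values, the gradient of ||A||_* = sum(s) is U @ Vt from the thin SVD, so it only needs the singular-value part of the SVD gradient. A minimal NumPy finite-difference check of that fact (my own sketch, not TensorFlow code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Analytic gradient of the nuclear norm ||A||_* = sum(s): it is U @ Vt from
# the thin SVD (full-rank A, distinct singular values assumed).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
grad = U @ Vt

# Central finite-difference check on each entry of A.
eps = 1e-6
fd = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        plus = np.linalg.svd(A + E, compute_uv=False).sum()
        minus = np.linalg.svd(A - E, compute_uv=False).sum()
        fd[i, j] = (plus - minus) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-5))  # True
```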
Have you tested the code @kcyu2014 contributed? Does it work? @albertpumarola
I tried it and it didn't work for me :/ I can find logs later if that's helpful.
The implementation by @kcyu2014 does not have gradients for U, only for S and V (those seem to agree with numerical gradients, though).
I need this feature badly. Could someone get it done fast?
Hi, any update on this? I tried the code by @kcyu2014, but unfortunately it didn't work properly.
@rmlarsen Any update?
Sorry for the lack of progress on this. I will try to set aside a few days to get this in now. Especially now that we have GPU support for all the linear algebra ops (minus complex SVD), this is a gaping hole.
Here is an implementation that should work for square matrices.
@psycharo It seems to work, but sometimes the loss goes to NaN when using SVD in the loss (nuclear norm); that might be a problem with my architecture rather than the SVD backprop code.
@hicham-eyeem TensorFlow's SVD has some bugs that cause NaNs sometimes (#9234); you could double-check whether this is fixed by using the numpy version.
@yaroslavvb Ah ok, thank you for pointing that out; actually, even adding some regularisation doesn't help. Do you know why it would give NaNs sometimes?
FYI: I have an initial version of this out for review internally.
@rmlarsen great, thank you, can't wait to try it out :)
The code was submitted and should appear on GitHub within a day or so. There are certain restrictions for the gradient computation that I welcome contributions to lift: "This initial version has the following restrictions: Only supports statically known inner matrix dimensions m and n. Backpropagating through U and V (i.e. backpropagating through SVD nodes with compute_uv=True) has further restrictions: a) Only supports real tensors. b) Only supports square and "almost square" matrices where the number of rows and columns differ by at most 1. c) full_matrices must be true."
Let me close this and open a new issue for extending support for more general matrices.
…(cdi@google.com), using the algorithm outlined in Mike Giles' paper: http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf.

This initial version has the following restrictions:
Only supports statically known inner matrix dimensions m and n.

Backpropagating through U and V (i.e. backpropagating through SVD nodes with compute_uv=True) has further restrictions:
a) Only supports real tensors.
b) Only supports square and "almost square" matrices where the number of rows and columns differ by at most 1.
c) full_matrices must be true also. This does not currently have severe implications, given the restriction in b).

Feature request on Github: tensorflow#6503

This CL also adds support for calling tf.real, tf.imag, and tf.angle with real arguments.

PiperOrigin-RevId: 171836140
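For anyone wondering what the square, real, full_matrices=True case of Giles' formula looks like concretely, here is a NumPy sketch of my own, checked against central finite differences. The function names (`svd_grad_square`, `fixed_svd`) and the sign-fixing convention are mine, not from the TensorFlow implementation; it assumes distinct singular values:

```python
import numpy as np

def svd_grad_square(U, s, V, gU, gs, gV):
    """Reverse-mode SVD gradient for square A = U @ diag(s) @ V.T,
    distinct singular values (my own sketch of the Giles-style formula)."""
    s2 = s ** 2
    F = s2[None, :] - s2[:, None]  # F[i, j] = s_j^2 - s_i^2
    np.fill_diagonal(F, np.inf)    # so 1/F is zero on the diagonal
    F = 1.0 / F
    S = np.diag(s)
    GU = U.T @ gU
    GV = V.T @ gV
    inner = (F * (GU - GU.T)) @ S + np.diag(gs) + S @ (F * (GV - GV.T))
    return U @ inner @ V.T

def fixed_svd(M):
    """SVD with a sign convention, so finite differences are stable."""
    U, s, Vt = np.linalg.svd(M)
    signs = np.sign(U[np.argmax(np.abs(U), axis=0), np.arange(U.shape[1])])
    return U * signs, s, (Vt.T * signs).T

# Well-conditioned test matrix with separated singular values.
rng = np.random.default_rng(1)
n = 3
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag([3.0, 2.0, 1.0]) @ Q2.T

# Loss L = <Cu, U> + <cs, s> + <Cv, V> with fixed random cotangents.
Cu = rng.standard_normal((n, n))
cs = rng.standard_normal(n)
Cv = rng.standard_normal((n, n))

def loss(M):
    U, s, Vt = fixed_svd(M)
    return np.sum(Cu * U) + np.sum(cs * s) + np.sum(Cv * Vt.T)

U, s, Vt = fixed_svd(A)
analytic = svd_grad_square(U, s, Vt.T, Cu, cs, Cv)

# Central finite-difference check on each entry of A.
eps = 1e-6
fd = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps
        fd[i, j] = (loss(A + E) - loss(A - E)) / (2 * eps)

print(np.allclose(analytic, fd, atol=1e-4))
```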
@caisq thanks for the quick push!
Follow-up issue is #13641.
@rmlarsen was the formula from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf used, or was a different formula used?
This is very useful. It assumes we don't use the U matrix once the original matrix A has been decomposed into U S V, since the derivative with respect to U is not calculated anywhere.
The gradient for the SVD op would be very useful so that it could be used in networks and cost functions. Currently, when trying to use SVD I get the following:

```
LookupError: No gradient defined for operation 'Svd' (op type: Svd)
```

So my request is for the gradient for the SVD op.