
Feature Request: Gradient for SVD op #6503

Closed
kstant0725 opened this issue Dec 26, 2016 · 39 comments
Labels: stat:contribution welcome (Status - Contributions welcome), type:feature (Feature requests)

Comments

@kstant0725

The gradient for the SVD op would be very useful so that it could be used in networks and cost functions. Currently, when trying to use SVD, I get the following:

LookupError: No gradient defined for operation 'Svd' (op type: Svd)

So my request is for a gradient implementation for the SVD op.
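For reference, a minimal snippet that reproduces the error (TF 1.x; the shapes are just for illustration):

import tensorflow as tf

A = tf.placeholder(tf.float32, [4, 3])
s, u, v = tf.svd(A)              # 'Svd' op, no gradient registered
loss = tf.reduce_sum(s)          # e.g. the nuclear norm of A
grads = tf.gradients(loss, [A])  # raises LookupError: No gradient defined for operation 'Svd'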

@kstant0725 kstant0725 changed the title Gradient for SVD op Feature Request: Gradient for SVD op Dec 26, 2016
@yaroslavvb
Contributor

The algorithm is in section 3.2 of "An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation".
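For reference, here is my transcription of the reverse-mode rule for the square case with distinct singular values (please check it against section 3.2 before relying on it). With A = U S Vᵀ and incoming adjoints for U, S, V:

\bar{A} = U\left[\operatorname{diag}(\bar{S})
  + \left(F \circ (U^{\top}\bar{U} - \bar{U}^{\top}U)\right) S
  + S \left(F \circ (V^{\top}\bar{V} - \bar{V}^{\top}V)\right)\right] V^{\top},
\qquad
F_{ij} = \begin{cases} 1/(s_j^2 - s_i^2) & i \ne j \\ 0 & i = j \end{cases}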

@aselle
Contributor

aselle commented Dec 28, 2016

We are currently working on this internally and I've heard it may be close. @rmlarsen knows more.

@aselle aselle added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:feature Feature requests labels Dec 28, 2016
@ddetone

ddetone commented Jan 7, 2017

@aselle @rmlarsen Any update on this?

@rmlarsen
Member

rmlarsen commented Jan 10, 2017

As mentioned, this is underway internally. I believe both the person working on it and I are just back from vacation. I assume this will be available within the coming month.

@rmlarsen rmlarsen removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 11, 2017
@ddetone

ddetone commented Jan 31, 2017

Hi @rmlarsen, I am looking through the rc-v1.0 and I don't see the SvdGrad op registered here. Is there somewhere else I should look for it?

@satyam-cyc

satyam-cyc commented Feb 3, 2017

Yes, this would help with experimenting on spectral methods. @rmlarsen @aselle @yaroslavvb Is this still in active development?

@shariharan99

Is there any update on this? It would be super useful for our work!

@shariharan99

@rmlarsen what is the update on this?

@aselle aselle added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 3, 2017
@cdiwork

cdiwork commented Mar 9, 2017

There is already a partial implementation, but it is not yet complete. I was busy with other projects, but I will try to come back to this next week.

@mlhengin

mlhengin commented Mar 22, 2017

Hello,
Any news on this feature? It would be very helpful to me too.

@kstant0725
Author

Hello,
I was also wondering if there was any progress on this?

@aselle aselle assigned aselle and rmlarsen and unassigned rmlarsen and aselle Apr 19, 2017
@schmiflo

Would be great to have this functionality.

@kofd

kofd commented Apr 28, 2017

def svd(A, full_matrices=False, compute_uv=True, name=None):
  # since dA = dUSVt + UdSVt + USdVt
  # we can simply recompute each matrix using A = USVt
  # while blocking gradients to the original op.
  _, M, N = A.get_shape().as_list()
  P = min(M, N)
  S0, U0, V0 = map(tf.stop_gradient, tf.svd(A, full_matrices=True, name=name))
  Ui, Vti = map(tf.matrix_inverse, [U0, tf.transpose(V0, (0, 2, 1))])
  # A = USVt
  # S = UiAVti
  S = tf.matmul(Ui, tf.matmul(A, Vti))
  S = tf.matrix_diag_part(S)
  if not compute_uv:
    return S
  Si = tf.pad(tf.matrix_diag(1/S0), [[0,0], [0,N-P], [0,M-P]])
  # U = AVtiSi
  U = tf.matmul(A, tf.matmul(Vti, Si))
  U = U if full_matrices else U[:, :M, :P]
  # Vt = SiUiA
  V = tf.transpose(tf.matmul(Si, tf.matmul(Ui, A)), (0, 2, 1))
  V = V if full_matrices else V[:, :N, :P]
  return S, U, V
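If I read this right, the trick is to stop gradients at the original Svd op and recover S, U, V from A with matmuls and inverses, so autodiff flows through those ops instead. A rough usage sketch (TF 1.x; shapes and names are just illustrative, and it relies on A being full-rank so that U0 and V0 are invertible):

A = tf.placeholder(tf.float32, [None, 5, 5])  # batch of square matrices
S, U, V = svd(A)                              # the re-parameterized svd above
loss = tf.reduce_sum(S)                       # e.g. nuclear norm
grad_A, = tf.gradients(loss, [A])             # gradients flow via matmul / matrix_inverse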

@albertpumarola

albertpumarola commented Jun 9, 2017

Hi,
@aselle @rmlarsen Any news?

@kcyu2014

kcyu2014 commented Jun 9, 2017

Hi, I have composed a gradient function based on the Matrix Backpropagation paper. Hope it helps.

def matrix_symmetric(x):
    return (x + tf.transpose(x, [0,2,1])) / 2

def get_eigen_K(x, square=False):
    """
    Get K = 1 / (sigma_i - sigma_j) for i != j, 0 otherwise

    Parameters
    ----------
    x : tf.Tensor with shape as [..., dim,]

    Returns
    -------

    """
    if square:
        x = tf.square(x)
    res = tf.expand_dims(x, 1) - tf.expand_dims(x, 2)
    res += tf.eye(tf.shape(res)[1])
    res = 1 / res
    res -= tf.eye(tf.shape(res)[1])

    # Keep the results clean
    res = tf.where(tf.is_nan(res), tf.zeros_like(res), res)
    res = tf.where(tf.is_inf(res), tf.zeros_like(res), res)
    return res

@tf.RegisterGradient('Svd')
def gradient_svd(op, grad_s, grad_u, grad_v):
    """
    Define the gradient for SVD
    References
        Ionescu, C., et al, Matrix Backpropagation for Deep Networks with Structured Layers
        
    Parameters
    ----------
    op
    grad_s
    grad_u
    grad_v

    Returns
    -------
    """
    s, u, v = op.outputs
    v_t = tf.transpose(v, [0,2,1])

    with tf.name_scope('K'):
        K = get_eigen_K(s, True)
    inner = matrix_symmetric(K * tf.matmul(v_t, grad_v))

    # Create the shape accordingly.
    u_shape = u.get_shape()[1].value
    v_shape = v.get_shape()[1].value

    # Recover the complete S matrices and its gradient
    eye_mat = tf.eye(v_shape, u_shape)
    realS = tf.matmul(tf.reshape(tf.matrix_diag(s), [-1, v_shape]), eye_mat)
    realS = tf.transpose(tf.reshape(realS, [-1, v_shape, u_shape]), [0, 2, 1])

    real_grad_S = tf.matmul(tf.reshape(tf.matrix_diag(grad_s), [-1, v_shape]), eye_mat)
    real_grad_S = tf.transpose(tf.reshape(real_grad_S, [-1, v_shape, u_shape]), [0, 2, 1])

    dxdz = tf.matmul(u, tf.matmul(2 * tf.matmul(realS, inner) + real_grad_S, v_t))
    return dxdz
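In case it helps anyone evaluating this, a rough numerical check I would run in TF 1.x (only S and V enter the loss here, since the snippet above ignores grad_u; names and shapes are just illustrative):

import numpy as np
import tensorflow as tf

a_val = np.random.randn(1, 4, 4).astype(np.float32)
a = tf.placeholder(tf.float32, [1, 4, 4])
s, u, v = tf.svd(a)                           # uses the 'Svd' gradient registered above
loss = tf.reduce_sum(s) + tf.reduce_sum(tf.square(v))
with tf.Session() as sess:
    err = tf.test.compute_gradient_error(a, [1, 4, 4], loss, [], x_init_value=a_val)
    print('max gradient error:', err)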

@kmyi

kmyi commented Jun 9, 2017

@kcyu2014 Why don't you make a PR?

@albertpumarola

albertpumarola commented Jun 9, 2017

@kcyu2014 Thx for the code, but it is missing the get_eigen_K and matrix_symmetric implementations. Could you post them?

@kcyu2014

kcyu2014 commented Jun 9, 2017

@albertpumarola Sorry, I forgot them; it's updated now :)

@smilli

smilli commented Jul 7, 2017

+1 would be very useful :) @rmlarsen

@JasZhanAva

Yes, please add this feature; it would be super helpful for the matrix nuclear norm. @rmlarsen

@JasZhanAva

Have you tested the code @kcyu2014 contributed? Does it work? @albertpumarola

@smilli

smilli commented Jul 11, 2017

I tried it and it didn't work for me :/ I can find logs later if that would be helpful for people.

@psycharo-zz

The implementation by @kcyu2014 does not have gradients for U, only for S and V (those seem to agree with numerical gradients though).

@LionSR

LionSR commented Jul 23, 2017

I need this feature badly. Could someone get it done fast?

@hicham-eyeem

Hi, any update on this? I tried the code by @kcyu2014 but unfortunately it didn't work properly.

@aselle
Contributor

aselle commented Sep 26, 2017

@rmlarsen. Any update?

@rmlarsen
Member

Sorry for the lack of progress on this. I will try to set aside a few days to get this in now. Especially now that we have GPU support for all the linear algebra ops (minus complex SVD), this is a gaping hole.

@psycharo-zz

Here is an implementation that should work for square matrices.

@hicham-eyeem

hicham-eyeem commented Sep 27, 2017

@psycharo It seems to work, but sometimes the loss goes to NaN when using SVD in the loss (nuclear norm); that might be a problem with my architecture rather than the SVD backprop code.

@yaroslavvb
Contributor

@hicham-eyeem -- TensorFlow's SVD has some bugs that cause NaNs sometimes (#9234); you could double-check whether this is fixed by using the numpy version.
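One way to check whether the NaNs come from the TF kernel itself is to run the same matrices through numpy's SVD, e.g. via a forward-only py_func wrapper (just a sketch, names are mine; this assumes a batch of matrices and defines no gradient for this path):

import numpy as np
import tensorflow as tf

def _np_svd(a):
    # numpy returns (u, s, vt); reorder to match tf.svd's (s, u, v) convention
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (s.astype(np.float32), u.astype(np.float32),
            np.transpose(vt, (0, 2, 1)).astype(np.float32))

def svd_via_numpy(a):
    return tf.py_func(_np_svd, [a], [tf.float32, tf.float32, tf.float32])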

@hicham-eyeem

@yaroslavvb Ah ok, thank you for pointing that out; actually, even adding some regularisation doesn't help. Do you know why it would give NaNs sometimes?
I guess we can also avoid SVD altogether by using a matrix factorization formulation when it only appears in the loss function, since that formulation requires only matmul and transpose ops (plus some constraints that can be linearized with a proximal form).
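For what it's worth, a sketch of that factorization idea for the nuclear-norm case, using the variational form ||A||_* = min over A = L Rᵀ of ½(||L||²_F + ||R||²_F); here m, n, r, lam and the target matrix A are hypothetical placeholders:

L = tf.get_variable('L', [m, r])
R = tf.get_variable('R', [n, r])
A_hat = tf.matmul(L, R, transpose_b=True)                 # low-rank surrogate for A
nuclear_surrogate = 0.5 * (tf.reduce_sum(tf.square(L)) +
                           tf.reduce_sum(tf.square(R)))   # upper-bounds ||A_hat||_*
loss = tf.reduce_sum(tf.square(A - A_hat)) + lam * nuclear_surrogate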

@rmlarsen
Member

rmlarsen commented Oct 5, 2017

FYI: I have an initial version of this out for review internally.

@hicham-eyeem

@rmlarsen great, thank you, can't wait to try it out :)

@rmlarsen
Member

rmlarsen commented Oct 11, 2017

The code was submitted and should appear on github within a day or so. There are certain restrictions for the gradient computation that I welcome contributions to lift:

"This initial version has the following restrictions:
Only supports statically known inner matrix dimensions m and n.

Backpropagating through U and V (i.e. backpropagating through SVD nodes with compute_uv=True) has further restrictions:
a) Only supports real tensors.
b) Only supports square and "almost square" matrices where the number of rows and columns differ by at most 1.
c) full_matrices must be true also. This does not currently have severe implications, given the restriction in b)."
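For anyone wanting to try it once it lands, a minimal example that (as far as I can tell) respects those restrictions: real dtype, statically known square shape, and full_matrices=True when backpropagating through U and V (names are illustrative):

A = tf.placeholder(tf.float32, [8, 8])            # statically known, square
s, u, v = tf.svd(A, full_matrices=True)           # compute_uv=True path
loss = tf.reduce_sum(s) + tf.reduce_sum(tf.square(u)) + tf.reduce_sum(tf.square(v))
grad_A, = tf.gradients(loss, [A])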

@rmlarsen rmlarsen added stat:contribution welcome Status - Contributions welcome and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Oct 11, 2017
@rmlarsen
Member

Let me close this and open a new issue for extending support for more general matrices.

caisq pushed a commit to caisq/tensorflow that referenced this issue Oct 11, 2017
…(cdi@google.com), using the algorithm outlined in Mike Giles' paper: http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf.

This initial version has the restrictions quoted above.
Feature request on Github:
tensorflow#6503

This CL also adds support for calling tf.real, tf.imag, and tf.angle with real arguments.

PiperOrigin-RevId: 171836140
@rmlarsen
Member

@caisq thanks for the quick push!

@rmlarsen
Member

rmlarsen commented Oct 11, 2017

Followup issue is #13641

@JaeDukSeo

@rmlarsen was the formula from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf used, or was a different formula used?

@JaeDukSeo

(quoting @kcyu2014's gradient implementation from above)

This is very useful. Note that it assumes we never use the U matrix after decomposing the original matrix A into U S Vᵀ, since the derivative with respect to U is not computed anywhere.
