
Feature Request: Gradient for SVD op #6503

Closed
kstant0725 opened this issue Dec 26, 2016 · 39 comments
Labels: stat:contribution welcome (Status - Contributions welcome), type:feature (Feature requests)

Comments

@kstant0725

The gradient for the SVD op would be very useful so that it could be used in networks and cost functions. Currently, when trying to use SVD, I get the following:

LookupError: No gradient defined for operation 'Svd' (op type: Svd)

So my request is for a gradient implementation for the SVD op.
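For reference, a minimal snippet that reproduces the error (TF 1.x; the shapes are just for illustration):

import tensorflow as tf

A = tf.placeholder(tf.float32, [4, 3])
s, u, v = tf.svd(A)              # 'Svd' op, no gradient registered
loss = tf.reduce_sum(s)          # e.g. the nuclear norm of A
grads = tf.gradients(loss, [A])  # raises LookupError: No gradient defined for operation 'Svd'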

@kstant0725 kstant0725 changed the title Gradient for SVD op Feature Request: Gradient for SVD op Dec 26, 2016
@yaroslavvb
Contributor

The algorithm is in section 3.2 of "An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation".
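For reference, here is my transcription of the reverse-mode rule for the square case with distinct singular values (please check it against section 3.2 before relying on it). With A = U S Vᵀ and incoming adjoints for U, S, V:

\bar{A} = U\left[\operatorname{diag}(\bar{S})
  + \left(F \circ (U^{\top}\bar{U} - \bar{U}^{\top}U)\right) S
  + S \left(F \circ (V^{\top}\bar{V} - \bar{V}^{\top}V)\right)\right] V^{\top},
\qquad
F_{ij} = \begin{cases} 1/(s_j^2 - s_i^2) & i \ne j \\ 0 & i = j \end{cases}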

@aselle
Contributor

aselle commented Dec 28, 2016

We are currently working on this internally and I've heard it may be close. @rmlarsen knows more.

@aselle aselle added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:feature Feature requests labels Dec 28, 2016
@ddetone

ddetone commented Jan 7, 2017

@aselle @rmlarsen Any update on this?

@rmlarsen
Member

rmlarsen commented Jan 10, 2017

As mentioned, this is underway internally. I believe both the person working on it and I are just back from vacation. I assume this will be available within the coming month.

@rmlarsen rmlarsen removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 11, 2017
@ddetone

ddetone commented Jan 31, 2017

Hi @rmlarsen, I am looking through the rc-v1.0 and I don't see the SvdGrad op registered here. Is there somewhere else I should look for it?

@satyam-cyc

satyam-cyc commented Feb 3, 2017

Yes, this would help with experimenting on spectral methods. @rmlarsen @aselle @yaroslavvb Is this still in active development?

@shariharan99

Is there any update on this? It would be super useful for our work!

@shariharan99

@rmlarsen what is the update on this?

@aselle aselle added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 3, 2017
@cdiwork

cdiwork commented Mar 9, 2017

There is already a partial implementation, but it is not yet complete. I was busy with other projects, but I will try to come back to this next week.

@mlhengin

mlhengin commented Mar 22, 2017

Hello,
Any news on this feature? It would be very helpful to me too.

@kstant0725
Author

Hello,
I was also wondering if there was any progress on this?

@aselle aselle assigned aselle and rmlarsen and unassigned rmlarsen and aselle Apr 19, 2017
@schmiflo

Would be great to have this functionality.

@kofd

kofd commented Apr 28, 2017

def svd(A, full_matrices=False, compute_uv=True, name=None):
  # since dA = dUSVt + UdSVt + USdVt
  # we can simply recompute each matrix using A = USVt
  # while blocking gradients to the original op.
  _, M, N = A.get_shape().as_list()
  P = min(M, N)
  S0, U0, V0 = map(tf.stop_gradient, tf.svd(A, full_matrices=True, name=name))
  Ui, Vti = map(tf.matrix_inverse, [U0, tf.transpose(V0, (0, 2, 1))])
  # A = USVt
  # S = UiAVti
  S = tf.matmul(Ui, tf.matmul(A, Vti))
  S = tf.matrix_diag_part(S)
  if not compute_uv:
    return S
  Si = tf.pad(tf.matrix_diag(1/S0), [[0,0], [0,N-P], [0,M-P]])
  # U = AVtiSi
  U = tf.matmul(A, tf.matmul(Vti, Si))
  U = U if full_matrices else U[:, :M, :P]
  # Vt = SiUiA
  V = tf.transpose(tf.matmul(Si, tf.matmul(Ui, A)), (0, 2, 1))
  V = V if full_matrices else V[:, :N, :P]
  return S, U, V
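If I read this right, the trick is to stop gradients at the original Svd op and recover S, U, V from A with matmuls and inverses, so autodiff flows through those ops instead. A rough usage sketch (TF 1.x; shapes and names are just illustrative, and it relies on A being full-rank so that U0 and V0 are invertible):

A = tf.placeholder(tf.float32, [None, 5, 5])  # batch of square matrices
S, U, V = svd(A)                              # the re-parameterized svd above
loss = tf.reduce_sum(S)                       # e.g. nuclear norm
grad_A, = tf.gradients(loss, [A])             # gradients flow via matmul / matrix_inverse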

@albertpumarola

albertpumarola commented Jun 9, 2017

Hi,
@aselle @rmlarsen Any news?

@kcyu2014

kcyu2014 commented Jun 9, 2017

Hi, I have composed a gradient function based on the Matrix Backpropagation paper. Hope it helps.

def matrix_symmetric(x):
    return (x + tf.transpose(x, [0,2,1])) / 2

def get_eigen_K(x, square=False):
    """
    Get K = 1 / (sigma_i - sigma_j) for i != j, 0 otherwise

    Parameters
    ----------
    x : tf.Tensor with shape as [..., dim,]

    Returns
    -------

    """
    if square:
        x = tf.square(x)
    res = tf.expand_dims(x, 1) - tf.expand_dims(x, 2)
    res += tf.eye(tf.shape(res)[1])
    res = 1 / res
    res -= tf.eye(tf.shape(res)[1])

    # Keep the results clean
    res = tf.where(tf.is_nan(res), tf.zeros_like(res), res)
    res = tf.where(tf.is_inf(res), tf.zeros_like(res), res)
    return res

@tf.RegisterGradient('Svd')
def gradient_svd(op, grad_s, grad_u, grad_v):
    """
    Define the gradient for SVD
    References
        Ionescu, C., et al, Matrix Backpropagation for Deep Networks with Structured Layers
        
    Parameters
    ----------
    op
    grad_s
    grad_u
    grad_v

    Returns
    -------
    """
    s, u, v = op.outputs
    v_t = tf.transpose(v, [0,2,1])

    with tf.name_scope('K'):
        K = get_eigen_K(s, True)
    inner = matrix_symmetric(K * tf.matmul(v_t, grad_v))

    # Create the shape accordingly.
    u_shape = u.get_shape()[1].value
    v_shape = v.get_shape()[1].value

    # Recover the complete S matrices and its gradient
    eye_mat = tf.eye(v_shape, u_shape)
    realS = tf.matmul(tf.reshape(tf.matrix_diag(s), [-1, v_shape]), eye_mat)
    realS = tf.transpose(tf.reshape(realS, [-1, v_shape, u_shape]), [0, 2, 1])

    real_grad_S = tf.matmul(tf.reshape(tf.matrix_diag(grad_s), [-1, v_shape]), eye_mat)
    real_grad_S = tf.transpose(tf.reshape(real_grad_S, [-1, v_shape, u_shape]), [0, 2, 1])

    dxdz = tf.matmul(u, tf.matmul(2 * tf.matmul(realS, inner) + real_grad_S, v_t))
    return dxdz
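In case it helps anyone evaluating this, a rough numerical check I would run in TF 1.x (only S and V enter the loss here, since the snippet above ignores grad_u; names and shapes are just illustrative):

import numpy as np
import tensorflow as tf

a_val = np.random.randn(1, 4, 4).astype(np.float32)
a = tf.placeholder(tf.float32, [1, 4, 4])
s, u, v = tf.svd(a)                           # uses the 'Svd' gradient registered above
loss = tf.reduce_sum(s) + tf.reduce_sum(tf.square(v))
with tf.Session() as sess:
    err = tf.test.compute_gradient_error(a, [1, 4, 4], loss, [], x_init_value=a_val)
    print('max gradient error:', err)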

@kmyi

kmyi commented Jun 9, 2017

@kcyu2014 Why don't you make a PR?

@albertpumarola

albertpumarola commented Jun 9, 2017

@kcyu2014 Thx for the code, but it is missing the get_eigen_K and matrix_symmetric implementations. Could you post them?

@kcyu2014

kcyu2014 commented Jun 9, 2017

@albertpumarola Sorry, I forgot them; it's updated now :)

@smilli

smilli commented Jul 7, 2017

+1 would be very useful :) @rmlarsen

@JasZhanAva

Yes, please add this feature; it would be super helpful for the matrix nuclear norm. @rmlarsen

@JasZhanAva

Have you tested the code @kcyu2014 contributed? Does it work? @albertpumarola

@smilli

smilli commented Jul 11, 2017

I tried it and it didn't work for me :/ I can find logs later if that would be helpful for people.

@psycharo-zz

The implementation by @kcyu2014 does not have gradients for U, only for S and V (those seem to agree with numerical gradients though).

@LionSR

LionSR commented Jul 23, 2017

I need this feature badly. Could someone get it done fast?

@hicham-eyeem

Hi, any update on this? I tried the code by @kcyu2014 but unfortunately it didn't work properly.

@aselle
Contributor

aselle commented Sep 26, 2017

@rmlarsen. Any update?

@rmlarsen
Member

Sorry for the lack of progress on this. I will try to set aside a few days to get this in now. Especially now that we have GPU support for all the linear algebra ops (minus complex SVD), this is a gaping hole.

@psycharo-zz

Here is an implementation that should work for square matrices.

@hicham-eyeem

hicham-eyeem commented Sep 27, 2017

@psycharo It seems to work, but sometimes the loss goes to NaN when using SVD in the loss (nuclear norm); that might be a problem with my architecture rather than the SVD backprop code.

@yaroslavvb
Contributor

@hicham-eyeem -- TensorFlow's SVD has some bugs that cause NaNs sometimes (#9234); you could double-check whether this is fixed by using the numpy version.
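One way to check whether the NaNs come from the TF kernel itself is to run the same matrices through numpy's SVD, e.g. via a forward-only py_func wrapper (just a sketch, names are mine; this assumes a batch of matrices and defines no gradient for this path):

import numpy as np
import tensorflow as tf

def _np_svd(a):
    # numpy returns (u, s, vt); reorder to match tf.svd's (s, u, v) convention
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (s.astype(np.float32), u.astype(np.float32),
            np.transpose(vt, (0, 2, 1)).astype(np.float32))

def svd_via_numpy(a):
    return tf.py_func(_np_svd, [a], [tf.float32, tf.float32, tf.float32])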

@hicham-eyeem

@yaroslavvb Ah ok, thank you for pointing that out; actually, even adding some regularisation doesn't help. Do you know why it would give NaNs sometimes?
I guess we can also avoid SVD altogether by using a matrix factorization formulation when it only appears in the loss function, since that formulation requires only matmul and transpose ops (plus some constraints that can be linearized with a proximal form).
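For what it's worth, a sketch of that factorization idea for the nuclear-norm case, using the variational form ||A||_* = min over A = L Rᵀ of ½(||L||²_F + ||R||²_F); here m, n, r, lam and the target matrix A are hypothetical placeholders:

L = tf.get_variable('L', [m, r])
R = tf.get_variable('R', [n, r])
A_hat = tf.matmul(L, R, transpose_b=True)                 # low-rank surrogate for A
nuclear_surrogate = 0.5 * (tf.reduce_sum(tf.square(L)) +
                           tf.reduce_sum(tf.square(R)))   # upper-bounds ||A_hat||_*
loss = tf.reduce_sum(tf.square(A - A_hat)) + lam * nuclear_surrogate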

@rmlarsen
Member

rmlarsen commented Oct 5, 2017

FYI: I have an initial version of this out for review internally.

@hicham-eyeem

@rmlarsen great, thank you, can't wait to try it out :)

@rmlarsen
Member

rmlarsen commented Oct 11, 2017

The code was submitted and should appear on github within a day or so. There are certain restrictions for the gradient computation that I welcome contributions to lift:

"This initial version has the following restrictions:
Only supports statically known inner matrix dimensions m and n.

Backpropagating through U and V (i.e. backpropagating through SVD nodes with compute_uv=True) has further restrictions:
a) Only supports real tensors.
b) Only supports square and "almost square" matrices where the number of rows and columns differ by at most 1.
c) full_matrices must be true also. This does not currently have severe implications, given the restriction in b)."
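For anyone wanting to try it once it lands, a minimal example that (as far as I can tell) respects those restrictions: real dtype, statically known square shape, and full_matrices=True when backpropagating through U and V (names are illustrative):

A = tf.placeholder(tf.float32, [8, 8])            # statically known, square
s, u, v = tf.svd(A, full_matrices=True)           # compute_uv=True path
loss = tf.reduce_sum(s) + tf.reduce_sum(tf.square(u)) + tf.reduce_sum(tf.square(v))
grad_A, = tf.gradients(loss, [A])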

@rmlarsen rmlarsen added stat:contribution welcome Status - Contributions welcome and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Oct 11, 2017
@rmlarsen
Member

Let me close this and open a new issue for extending support for more general matrices.

caisq pushed a commit to caisq/tensorflow that referenced this issue Oct 11, 2017
…(cdi@google.com), using the algorithm outlined in Mike Giles' paper: http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf.

This initial version has the restrictions quoted above.
Feature request on Github:
tensorflow#6503

This CL also adds support for calling tf.real, tf.imag, and tf.angle with real arguments.

PiperOrigin-RevId: 171836140
@rmlarsen
Member

@caisq thanks for the quick push!

@rmlarsen
Member

rmlarsen commented Oct 11, 2017

Followup issue is #13641

@JaeDukSeo

@rmlarsen was the formula from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf used, or was a different formula used?

@JaeDukSeo

(quoting @kcyu2014's gradient implementation from above)

This is very useful. Note that it assumes we never use the U matrix after decomposing the original matrix A into U S Vᵀ, since the derivative with respect to U is not computed anywhere.
