Generalize slicing and slice assignment ops (including gather and scatter) #206

Closed
girving opened this Issue Nov 13, 2015 · 76 comments

@girving
Member
girving commented Nov 13, 2015

We should make our slicing and assignment ops more general to capture more of the functionality of numpy slicing, and add __getitem__ sugar for all of it. Specifically,

  1. We should have a 2.5-dimensional set of ops, with dimensions (1) get vs. set, (2) slice type, and, for the assignment ops, (3) the update op. Currently we have slice, assign_update, assign_add, assign_sub, gather, scatter_update, scatter_add, scatter_sub. We should also have assign_slice_update, assign_slice_add, assign_slice_sub.
  2. Both slicing and slice assignment should support strides, with no performance cost if strides aren't used.
  3. Ideally, the slice ops should support negative indexing a la Python. Since the slice parameters are already on the CPU, this is implementable with near-zero cost. The unfortunate bit is that since we picked the wrong format for specifying ranges (start + length instead of start : end), negative indexing might be awkward. Thus, it might be best left to a separate bug.
  4. Support numpy-like boolean indexing.
  5. Generalize gather and scatter_* to take an array of input index tensors, efficiently broadcast them, and do multidimensional indexing similar to numpy.
  6. Make __getitem__ provide sugar for all of the above (see the sketch below). Ideally we'd also have something idiomatically similar for __setitem__, but this is problematic since the returned assignment op is important to have, __setitem__ does not return a value, and the nice range sugar is available only inside indexing / assignment calls.
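For reference, here is roughly the NumPy behavior the items above aim to mirror (a sketch in NumPy itself, since most of this does not exist in TensorFlow yet):

import numpy as np

x = np.arange(24).reshape(2, 3, 4)

x[0, 1:3, ::2]           # basic slicing with strides (items 1-2)
x[-1, ..., np.newaxis]   # negative indices, ellipsis, new axis (item 3)
x[x > 10]                # boolean mask indexing (item 4)
x[[0, 1], [1, 2]]        # advanced integer-array indexing (item 5)
x[0, :, [0, 1]] = 0      # slice assignment, i.e. __setitem__-style sugar (item 6)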

@ebrevdo: I'm assigning this to you for now since you might get to it first, but feel free to grab only the piece of it that you need for now.

@girving girving added the enhancement label Nov 13, 2015
@ebrevdo ebrevdo was assigned by girving Nov 13, 2015
@girving
Member
girving commented Nov 20, 2015

Lasse requests the equivalent of numpy mixed indexing:

x[:, tensor]

which combines slicing with indexing-by-tensor.

@lespeholt

...where tensor can be either a scalar (which selects all the values in that column) or a vector that selects an individual column per row, so:

foo = tf.constant([[1,2,3], [4,5,6]])
foo[:, 1] # [2, 5]
indexes = tf.constant([1, 2])
foo[:, indexes] # [2, 6]
@avostryakov

If I understand correctly, the following code is exactly what is needed for the cross-entropy loss function:
indexes = tf.constant([1, 2])
foo[:, indexes] # [2, 6]

If we have this kind of indexing, we can write:
cost = -tf.reduce_sum(tf.log(network_output[:, targets]))

where targets is a vector of class indices, instead of

cost = -tf.reduce_sum(targets*tf.log(network_output))

where targets is a sparse matrix.

Am I correct?

@yaroslavvb
Contributor

NumPy also has newaxis and Ellipsis objects, e.g. to prepend an axis using NumPy notation:
a[np.newaxis,...]

http://docs.scipy.org/doc/numpy-1.10.0/reference/arrays.indexing.html

@girving
Member
girving commented Dec 3, 2015

Yes, newaxis is essential.

@girving
Member
girving commented Dec 8, 2015

As part of this, we should make #418 work.

@olange-google

It would also be helpful if gather supported an axis parameter.

Current behavior:
gather(params, indices) --> output[i, ...] = params[indices[i], ...]

Wanted behavior:
gather(params, indices, axis=1) --> output[:, i, :] = params[:, indices[i], :]

(Apologies if this is already covered by the list of requirements posted above.)
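In the meantime, one common way to emulate an axis parameter (a sketch, assuming params is rank 3 and indices is a 1-D int32 tensor; gather_axis1 is just an illustrative helper, not a TensorFlow op) is to transpose the target axis to the front, gather, and transpose back:

import tensorflow as tf

def gather_axis1(params, indices):
    # Move axis 1 to the front, gather along axis 0, then move it back.
    transposed = tf.transpose(params, [1, 0, 2])
    gathered = tf.gather(transposed, indices)
    return tf.transpose(gathered, [1, 0, 2])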

@girving
Member
girving commented Feb 17, 2016

@olange-google: I think we're unlikely to implement an axis parameter since it's beyond numpy features, but what you want is covered by the combination of slice indexing and advanced indexing.

@ebrevdo
Contributor
ebrevdo commented Feb 17, 2016

Are you sure? If in numpy I use:

array[:, :, [1, 2, 3], :]

then that's equivalent to gather(array, [1, 2, 3], axis=2), is it not?


@girving
Member
girving commented Feb 17, 2016

We're saying the same thing. I'm not objecting to that functionality, just to exposing it via that sort of axis parameter rather than as a special case of the combination of slice indexing and advanced indexing.

@mhejrati

@ebrevdo What is the status of this issue?
I am interested in implementing this if no one is working on it.

@ebrevdo
Contributor
ebrevdo commented Feb 28, 2016

I plan to work on this, but not until after next week. If you want to work on this, I suggest supporting both GPU and CPU in your kernels. If you can't implement that, you may want to wait for us to implement it.

@MInner
MInner commented Apr 2, 2016

Hi :) Could you please give an update on the status of this feature?

The _SliceHelper docstring says that the "stop" of the slice must not be omitted; however, this case seems to be handled just fine in the function implementation:

if s.stop is None or s.stop == sys.maxsize:
    sizes.append(-1)

Or am I getting something wrong?
@ebrevdo
Contributor
ebrevdo commented Apr 2, 2016

OK, we recently added the gather_nd op, which performs a special subset of the required functionality: given a tensor of indices, gather the requested values.

Advanced slicing is on the radar.
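For reference, a small usage sketch of gather_nd (the values are illustrative); each row of indices names one element of params:

params = tf.constant([[1, 2, 3],
                      [4, 5, 6]])
indices = tf.constant([[0, 1],   # picks params[0, 1] == 2
                       [1, 2]])  # picks params[1, 2] == 6
values = tf.gather_nd(params, indices)  # => [2, 6]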

@waleedka
Contributor
waleedka commented Apr 9, 2016

@ebrevdo I tried using the gather_nd op to get the last relevant output from a variable length LSTM network. I'm passing sequence_length to the RNN, which means that the last few outputs of most examples are zeros, so I'm trying to read the last non-zero output. I'm getting this error, though, in the training phase:

NotImplementedError: Gradient for gather_nd is not implemented.

  outputs, state = rnn.rnn(multi_rnn_cell, inputs, dtype=tf.float32, sequence_length=lengths_ph)

  indices = tf.concat(1, [
      tf.expand_dims(lengths_ph - 1, 1),
      tf.expand_dims(tf.range(tf.shape(vectors_ph)[0]), 1),
      tf.expand_dims(tf.zeros_like(lengths_ph), 1),
      ])
  output_tensor = tf.pack(outputs)
  relevant_output = tf.gather_nd(output_tensor, indices)
@ebrevdo
Contributor
ebrevdo commented Apr 9, 2016

Yeah, we haven't written the gradient implementation for gather_nd yet. It's essentially a reshape followed by a call to sparse_to_dense, but sparse_to_dense doesn't have a GPU implementation (on my TODO), so I'm not using it yet.


@hycis
hycis commented Apr 17, 2016

Hi @ebrevdo, do you have any timeline for implementing the gradient for that? We are currently using that function, and it's quite useful.

@nova77
nova77 commented Apr 20, 2016 edited

While we wait for gather_nd to support gradients, here is a temporary solution:

x = tf.constant([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
idx = tf.constant([1, 0, 2])
idx_flattened = tf.range(0, x.shape[0]) * x.shape[1] + idx
y = tf.gather(tf.reshape(x, [-1]),  # flatten input
              idx_flattened)  # use flattened indices

with tf.Session(''):
  print y.eval()  # [2 4 9]
@ebrevdo
Contributor
ebrevdo commented Apr 20, 2016

I will implement a gradient in the next week. Keep in mind that it will be
CPU-only for now.


@danijar
Contributor
danijar commented Apr 25, 2016 edited

@waleedka I adapted @ebrevdo's example to work with an additional dimension for the output neurons of an RNN. This should yield the last relevant output activations while preserving the shape information.

def extract_last_relevant(outputs, length):
    """
    Args:
        outputs: [Tensor(batch_size, output_neurons)]: A list containing the output
            activations of each example in the batch for each time step, as returned
            by tensorflow.models.rnn.rnn.
        length: Tensor(batch_size): The used sequence length of each example in the
            batch with all later time steps being zeros. Should be of type tf.int32.

    Returns:
        Tensor(batch_size, output_neurons): The last relevant output activation for
            each example in the batch.
    """
    output = tf.transpose(tf.pack(outputs), perm=[1, 0, 2])
    # Query shape.
    batch_size = tf.shape(output)[0]
    max_length = int(output.get_shape()[1])
    num_neurons = int(output.get_shape()[2])
    # Index into flattened array as a workaround.
    index = tf.range(0, batch_size) * max_length + (length - 1)
    flat = tf.reshape(output, [-1, num_neurons])
    relevant = tf.gather(flat, index)
    return relevant
@erickrf
erickrf commented May 3, 2016

@danijar this is a working solution, but when I tried it, I got the following warning from tensorflow:

UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

Apparently, this is caused by the IndexedSlices returned by the gradient of tf.gather. Does anyone know if there's a way to avoid this problem?

@danijar
Contributor
danijar commented May 3, 2016

@erickrf I'd be glad to hear of a better solution as well. Of course, you can suppress the warning like any other Python warning.

@NickShahML

I will implement a gradient in the next week. Keep in mind that it will be
CPU-only for now.

Is this implemented yet by any chance @ebrevdo?

From what I've seen, tf.gather at least has a GPU-capable gradient, correct? Thanks @nova77 for the temporary solution!

@hycis
hycis commented May 6, 2016 edited

Here is another implementation for the 2-D case:

import numpy as np

def gather_2d(params, indices):
    # only for two dims for now
    shape = params.get_shape().as_list()
    assert len(shape) == 2, 'only support 2d matrix'
    flat = tf.reshape(params, [np.prod(shape)])
    flat_idx = tf.slice(indices, [0,0], [shape[0],1]) * shape[1] + tf.slice(indices, [0,1], [shape[0],1])
    flat_idx = tf.reshape(flat_idx, [flat_idx.get_shape().as_list()[0]])
    return tf.gather(flat, flat_idx)
@keithshep

@ebrevdo I could try to help out with __getitem__ / __setitem__ if it would be useful. I think these ops are going to be the most intuitive way for people to interact with tensors, especially since most users are already comfortable with how NumPy works. Would you like me to try, or would it be better to just leave it to you to implement?

@ebrevdo
Contributor
ebrevdo commented May 6, 2016

@keithshep PRs are most welcome. I would start with adding negative indexing to the current getitem/setitem (the one that currently works on the outermost dimension). This will require changes to both the kernel and the python wrappers, and updates to the shape inference code.

@girving
Member
girving commented May 12, 2016

Cc @aselle.

@girving
Member
girving commented May 12, 2016

@keithshep: Do you mind if we give this to @aselle as a starter project? He just joined Brain. :)

@girving girving assigned aselle and unassigned ebrevdo May 12, 2016
@keithshep

@girving Not at all. I was only going to get a chance to dig in this weekend so no problem. Oh, and congrats to @aselle

@girving
Member
girving commented May 12, 2016

For __setitem__: we can't use it directly because it doesn't return a value, and usually you need the return value of assignments (to make sure they execute). However, we could make x[i].assign(y) work with a bit of class trickery.
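A minimal sketch of what that trickery could look like (the _SliceWrapper class here is hypothetical, and only the leading-dimension index case is shown, using the existing scatter_update):

import tensorflow as tf

class _SliceWrapper(object):
    """Hypothetical helper that Variable.__getitem__ could return."""

    def __init__(self, var, indices):
        self._var = var
        self._indices = indices

    def assign(self, value):
        # Return the assignment op so that x[i].assign(y) still hands the
        # caller something to run or add a dependency on.
        return tf.scatter_update(self._var, self._indices, value)

With such a wrapper, usage would look something like op = my_var[[0, 2]].assign(new_rows), once __getitem__ returns the wrapper instead of a plain slice.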

@mrry mrry referenced this issue May 12, 2016
Closed

Batched gather #2313

@keithshep

@girving I don't know if this is a reasonable path or not (I may be missing the point), but I wonder whether borrowing ideas from SSA (https://en.wikipedia.org/wiki/Static_single_assignment_form) for manipulating the computation graph, in cases where there is a side effect and no return value, would be a way to achieve this?

@girving
Member
girving commented May 12, 2016

@keithshep: Unfortunately I don't know what you mean. We can't manipulate a computation graph until we have one to manipulate, and __setitem__ discards the output of the assign. Normally that return value needs to be used later on, typically as part of a training op which is then run.

@keithshep

@girving my thought was to add some tensor version metadata, similar to variable versioning used in SSA. This would mean that some internal tensor version bookkeeping would have to be performed whenever __setitem__ is invoked, and that a tensor version would have to be recorded whenever a tensor is passed to an op, since a single tensor Python reference could refer to different versions depending on when it's used. Not sure if that makes sense, but as I think about it more it's probably more complicated than is worthwhile anyway.

@girving
Member
girving commented May 12, 2016

@keithshep: That approach would be reasonable, but unfortunately I think it's too big a jump from what we have currently.

@shoyer
Member
shoyer commented May 24, 2016

Just a note -- we do not want to blindly copy NumPy vectorized indexing in __getitem__. Instead, we want so-called "outer indexing" (like MATLAB), which is much more intuitive.

This means that indexing with mixed slices and arrays should differ in TensorFlow compared to NumPy.

See this NumPy PR for examples and a discussion of the relevant issues: https://github.com/numpy/numpy/pull/6256/files

I'm happy to elaborate if that doesn't make sense.

@girving
Member
girving commented May 24, 2016

@shoyer: I don't think you need to elaborate. Outer indexing makes sense, but it isn't what numpy does. Numpy is also more flexible than outer indexing. Finally, I like numpy indexing. :) Thus, unless I see a bunch of support from other people for deviating from numpy, we'll stick with numpy.

@yaroslavvb
Contributor

@shoyer What do the odds look like for that PR being accepted on the Numpy side?

@shoyer
Member
shoyer commented May 25, 2016

@girving I agree that NumPy's vectorized indexing is great. The problem is that NumPy's implementation includes hacks to make it "more intuitive" when slices and arrays are mixed, which work a lot of the time but then result in some very bizarre edge cases.

The canonical example is an indexing operation like x[0, :, [0, 1, 2, 3]], which converts an array of shape (2, 3, 4) to (4, 3) rather than the "obvious" (3, 4). This is point (2) under the motivation in the proposal PR (numpy/numpy#6256).

One way to avoid this issue entirely is to prohibit mixed slice/array indexing. But if you do allow mixed slice/array indexing (which is very tempting to handle cases like x[:, [0, 1, 2]]), then this edge case will inevitably arise. Hence the appeal of the alternative indexing operation vindex for vectorized indexing, which lets indexing like x[:, [0, 1, 2]] and x[[0, 1, 2], :] do the obvious "outer indexing" thing without any kludges.
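Concretely, in NumPy:

import numpy as np

x = np.zeros((2, 3, 4))
x[0, :, [0, 1, 2, 3]].shape   # (4, 3) -- the advanced-index axis jumps to the front
x[0, :, 0:4].shape            # (3, 4) -- plain basic slicing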

@yaroslavvb There are a few niggling details to figure out in the proposal, but I think it's extremely likely that it (and the implementation, numpy/numpy#6075) will be merged at some point. There was a long discussion last year about just how unintuitive NumPy's current indexing behavior is, and I think all the core developers agreed that this is the obvious path forward.

@girving
Member
girving commented May 25, 2016

@shoyer: Please get it merged on the numpy side and then file a separate issue. I predict a 0% chance that numpy will remove the existing indexing support or make it not the default: that would break millions of lines of code and alienate huge numbers of numpy users. If they add it as an optional feature, TensorFlow could easily do the same thing with the same syntax.

@danijar
Contributor
danijar commented Jun 5, 2016 edited

Will the version @aselle is working on support slicing by start/end tensors? This would be very useful for handling batches of variable-length sequences.

data = [[2, 2, 1, 3], [3, 1, 0, 0], [2, 1, 3, 0]]
length = [4, 2, 3]
data[:, 1: length - 1] == [[2, 1], [0, 0], [1, 0]]
@ypxie
ypxie commented Jun 5, 2016

When will negative indices be supported? E.g., a[:, -1]

@girving
Member
girving commented Jun 5, 2016

@danijar, @shampool: Yes on both counts.

@danijar
Contributor
danijar commented Jun 5, 2016

@girving Okay, cool. I hate to spam the thread but is there a way to get the behavior from my example without the new syntax?

@girving
Member
girving commented Jun 6, 2016

@danijar: Let's keep this thread focused on the new feature. Please ask side questions on StackOverflow.

@girving girving added the triaged label Jun 6, 2016
@MInner
MInner commented Jun 8, 2016

hi, I attempted to write a temporary workaround (that someone might possibly find useful)
https://gist.github.com/MInner/8b0c0a0e528303b132bf02e277199996

it's more or less clear (at least in the simplest cases) how to do
A[:, b, :] # => (10, 50, 30) - transpose, gather, transpose back
or
A[a, b, c] # => (50,) - pack, gather_nd
but it's not clear to me how one could do A[a, b, :] # => (50, 30) without creating a super-large index matrix for gather_nd, though it's clear how to do (50, 50, 30)

Here A is a float tensor of shape [10, 20, 30] with values in (0..1); a, b, c are int vectors of shape [50,] with values in [0..10]
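One way to get A[a, b, :] without a large index tensor (a sketch, assuming the static shapes above) is to merge the first two dimensions and use a plain gather:

A2 = tf.reshape(A, [10 * 20, 30])   # merge the first two dimensions
rows = tf.gather(A2, a * 20 + b)    # shape (50, 30)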

@ebrevdo
Contributor
ebrevdo commented Jun 9, 2016

I'm working on a gather_nd extension to support inner slicing. Note that gather_nd still has no gradient.

@benoitsteiner benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Jun 16, 2016
aselle + tensorflower-gardener: Extended slicing in TensorFlow (similar to NumPy)
This extends slicing to allow a more complete subset of NumPy's basic slicing. For example, you can now use negative strides, while zero strides are still disallowed.

NOTE: This change does not yet hook up the new slicing behavior to the
python getitem operator, because we do not yet support gradients
(unlike the simpler slice we are extending).

NOTE: This change does not introduce slicing using tensors as arguments
(advanced indexing in NumPy).

foo[5:1:-2]

You can also use an ellipsis:

foo[5:1:-2,...,3:4]

In addition, you can add new dimensions:

foo[5:, tf.newaxis, ...]

Fixes part of #206.
Change: 124998811
1f35660
@gpapan
Collaborator
gpapan commented Jun 22, 2016

A workaround for the special case of indexing across some dimension with a list of integers, which might be useful to some folks:

 ind = [3, 5, 0]
 # y = x[:,ind,:]  # this doesn't work right now
 y = tf.concat(1, [tf.expand_dims(x[:, i, :], 1) for i in ind])

I emphasize that this only works if ind is a Python list of integers; it does not cover the case when ind is a Tensor of integers.
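If ind is a Tensor of integers instead, a similar effect can be had (a sketch, assuming x is rank 3 and ind indexes dimension 1) by transposing, gathering along the leading axis, and transposing back:

 y = tf.transpose(tf.gather(tf.transpose(x, [1, 0, 2]), ind), [1, 0, 2])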

@maciekcc maciekcc pushed a commit to maciekcc/tensorflow that referenced this issue Jun 24, 2016
aselle + tensorflower-gardener: Implement gradient for StridedSlice op.
StridedSliceGrad op implements the gradient of StridedSlice.
Also implement a Python benchmark for StridedSlice and simple Slice.
Fix bugs in special-case optimizations in StridedSlice.

(Toward resolving bug #206)
Change: 125789921
9c4c4cb
@cruvadom
cruvadom commented Jun 30, 2016 edited

Inspired by @nova77, here is a workaround (until gather_nd has gradients) that lets you use gather_nd normally:

def gather_nd(params, indices, name=None):
  shape = params.get_shape().as_list()
  rank = len(shape)
  flat_params = tf.reshape(params, [-1])
  multipliers = [reduce(lambda x, y: x*y, shape[i+1:], 1) for i in range(0, rank)]
  indices_unpacked = tf.unpack(tf.transpose(indices, [rank - 1] + range(0, rank - 1), name))
  flat_indices = sum([a*b for a,b in zip(multipliers, indices_unpacked)])
  return tf.gather(flat_params, flat_indices, name=name)
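A quick usage sketch for the workaround above (the values are illustrative), picking one element per row of a 2-D tensor:

params = tf.constant([[1, 2, 3],
                      [4, 5, 6]])
indices = tf.constant([[0, 2],
                       [1, 0]])
values = gather_nd(params, indices)  # => [3, 4]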
@aselle
Member
aselle commented Jul 12, 2016 edited
  • Implement rvalue basic indexing
  • Implement lvalue basic indexing
  • Implement advanced indexing
  • Implement bool mask indexing
@LiamConnell

Here's another workaround for the case where your params variable is of unknown length. Hope it helps, and if there is a better way currently known, please let me know! I'm not sure about differentiability, which matters because I'm using it for a policy gradient.

# we want output[i] = params[i,indices[i]]
index_mask = tf.reshape(tf.one_hot(indices, 3), [-1,3])
output = tf.reduce_sum(params * index_mask,1)
@aselle aselle removed the triaged label Jul 28, 2016
@Fenugreek

I am looking to do a max_pool that doesn't reduce the input tensor but replaces the non-max elements with zeros. I'm currently using a clumsy and slow workaround involving space_to_depth, argmax, one_hot, select, and several transposes/reshapes.

Implementing numpy-like indexing for assignment would help simplify this, so I'd like to add my vote for this feature. If anyone has other ideas on how I can go about this, let me know. Thanks.

@girving
Member
girving commented Aug 2, 2016

@Fenugreek: Please keep this thread focused on the new feature. Please ask side questions on StackOverflow. Note that votes do not count as discussions of the feature.

@hammer hammer referenced this issue in hammerlab/mhcflurry Aug 3, 2016
Closed

Work with TensorFlow backend #38

@tejaskhot

Will this also support multi-indexing such as:
matrix[row_indices, col_indices], where matrix is an m×n tensor and row_indices and col_indices are int32 vectors of size k, with k less than m and n respectively? The expression above returns a length-k vector, i.e. a vectorized matrix[i, j] for all (i, j) pairs taken from row_indices and col_indices.

@aselle
Member
aselle commented Aug 4, 2016

Eventually we may add support for that (it's what NumPy calls advanced indexing), but this first version will just focus on fully supporting so-called NumPy basic slicing.

@tejaskhot

@aselle With what we have now, what's the optimal way of achieving this sort of indexing? I was hoping this would be solved in this issue.

@girving
Member
girving commented Aug 4, 2016

@tejaskhot: Please keep this thread focused on the new features. Please ask side questions on StackOverflow.

@cdjkim
cdjkim commented Aug 11, 2016

Any ideas as to when gather_nd will support gradient?

@HokyungLee
HokyungLee commented Aug 13, 2016 edited

We have a rank-3 tensor of shape [9, 7, 11], which we reduce to a rank-2 tensor of shape [9, 11] with tf.gather_nd(...).
The problem is that the TensorFlow optimizers can't handle tf.gather_nd, since it has no gradient.
We already applied some of the ideas above to our program.
Any suggestions would be appreciated.

@aselle
Member
aselle commented Sep 28, 2016

Closing this issue. As of 0.10, improved basic indexing and sliced assignment are available. Advanced indexing, mixed advanced and basic indexing, and boolean slicing still need implementation. See #4638 and #4639.

@aselle aselle closed this Sep 28, 2016
@sonalgupta

@aselle This issue was closed, but the question of when the gradient for gather_nd will be added is still not answered.

@cruvadom

@sonalgupta You can use my workaround from a previous comment.

@danijar
Contributor
danijar commented Sep 29, 2016

@sonalgupta If I understand correctly, you don't need tf.gather_nd() anymore but can use slicing, which has gradients implemented.

@aselle
Member
aselle commented Sep 29, 2016

Internally we are working on tf.scatter, which will be used to implement tf.gather_nd(). Currently the Pythonic syntactic sugar cannot generate tf.gather ops, but we will add that once tf.scatter is done. We will also need some broadcasting code to generate the indices. Advanced slicing is now a separate issue.

@xksteven
xksteven commented Oct 2, 2016

@danijar is it documented that they have gradients? I was trying to find where it's stated before I started to use them.

@danijar
Contributor
danijar commented Oct 2, 2016

This example works for me:

import numpy as np
import tensorflow as tf

variable = tf.Variable(np.ones((5, 5)))
cost = tf.reduce_sum(variable[0, :])  # Do more fancy slicing if needed.
optimize = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

sess = tf.Session()
sess.run(tf.initialize_all_variables())

print('Cost (should be decreasing):')
for _ in range(5):
    print(sess.run([cost, optimize])[0])
@aselle
Member
aselle commented Oct 3, 2016

tf.gather_nd does not have gradients yet, but it will soon. tf.gather_nd is not used by the Pythonic __getitem__ slicing, e.g. you cannot yet do x=[1,2]; y=[1,1]; foo[x, y]. @danijar's example is only "basic indexing". Everything that Tensor.__getitem__ supports does have gradients.

@robsync
robsync commented Oct 4, 2016

Does 0.10 now support negative indexing, as in X[:, -1] for the last column of a tensor? Thanks

@aselle
Member
aselle commented Oct 4, 2016

0.11RC does (and also master).

@robsync
robsync commented Oct 4, 2016

thanks!

@guillaumeBellec
guillaumeBellec commented Nov 8, 2016 edited

A little inconvenience I came across:
when indexing with a tensor produced by tf.argmax, I get a type error.
I can cast 'a' to int32 in the following code, but that feels like a bit of a hack.

import tensorflow as tf

A = tf.constant([0,2,3,1])
B = tf.constant([0,1,2,3])

a = tf.argmax(A, 0)  # argmax returns an int64 tensor
B[a]

TypeError: Input 'strides' of 'StridedSlice' Op has type int32 that does not match type int64 of argument 'begin'.

@ravigarg27

Hey all, I want to update a tensor at certain locations (along one dimension only). I understand I can do it via scatter_update; however, it appears it doesn't have a registered gradient. Is there any workaround for this?

@MInner
MInner commented Nov 18, 2016 edited

@ravigarg27 A simple workaround (if your new values are not Variables you're going to compute gradients over) is to do something conceptually like A - bool(update != 0)*A + update, where update has the same shape as A and all non-updated entries are equal to 0; however, that is quite out of scope for this thread :)
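In TensorFlow terms that might look roughly like this (a sketch, assuming update has the same shape and dtype as A):

mask = tf.cast(tf.not_equal(update, 0), A.dtype)
result = A * (1 - mask) + update   # keeps A where update == 0, takes update elsewhere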

@ravigarg27

@MInner What I want is something along the lines of A[indices, :] = B, where A and B are Variable matrices.

@MInner
MInner commented Nov 26, 2016 edited

@danijar, I wonder if your extract_last_relevant() snippet posted above for extracting the last non-zero outputs of tensorflow.models.rnn.dynamic_rnn is still required (downside: the sparse-to-dense warning indicating that it might lead to a huge matrix allocation), or whether one could use the new (0.11rc) partial implementation of smart indexing to address this. I'm still a little confused about which parts of smart indexing work now; there was an announcement in the 0.11rc changelog about improvements in indexing, but it does not seem to address this specific "indexing via another tensor" case, does it?

@danijar
Contributor
danijar commented Dec 3, 2016

@MInner I played around with it. Unfortunately, it doesn't seem like the new indexing can simplify that.

@warmspringwinds

Hey guys.

Sorry for bugging you.
Any update on the gradient implementation for tf.gather_nd()?
