
TensorFlow equivalent to numpy.repeat #8246

Closed
shoyer opened this issue Mar 9, 2017 · 23 comments
Labels: stat:contribution welcome (Status - Contributions welcome), type:feature (Feature requests)

@shoyer
Contributor

shoyer commented Mar 9, 2017

This is a popular question on StackOverflow:
http://stackoverflow.com/questions/35361467/tensorflow-numpy-repeat-alternative

But note that the answer so far only works for some use cases (the one presented in the question).

The best I could come up with for a general solution uses tf.while_loop, which is pretty verbose (and maybe slower than necessary). I'll add a link to the implementation I wrote for tf.contrib.training.resample_at_rate after the next internal/github sync.
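
In the meantime, a rough sketch of the while_loop approach for a 1-D tensor (illustrative only, not the actual resample_at_rate implementation) looks something like this:

import tensorflow as tf

def while_loop_repeat_1d(values, repeats):
    # Write each run of repeated values into a TensorArray, then concatenate the runs.
    n = tf.size(repeats)
    ta = tf.TensorArray(dtype=values.dtype, size=n, infer_shape=False)

    def body(i, ta):
        run = tf.fill(tf.expand_dims(repeats[i], 0), values[i])
        return i + 1, ta.write(i, run)

    _, ta = tf.while_loop(lambda i, _: i < n, body, [0, ta])
    return ta.concat()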

@prb12 prb12 added type:feature Feature requests stat:contribution welcome Status - Contributions welcome labels Mar 9, 2017
@shoyer
Contributor Author

shoyer commented Mar 15, 2017

See def _repeat_range(counts, name=None) in tf.contrib.training.resample_at_rate for my example using tf.while_loop.

@lattas

lattas commented Apr 3, 2017

Dear @shoyer, I am currently working on this with a new op in array_ops and a Python wrapper function. I will update you soon.

@shoyer
Contributor Author

shoyer commented Apr 3, 2017

@a-lattas Great! Please replace _repeat_range in resample_at_rate (linked above) with the new op as part of your PR.

@shoyer
Contributor Author

shoyer commented Apr 4, 2017

@waleedka
Contributor

What's the status of this update?

@qianyizhang

qianyizhang commented Oct 17, 2017

Since the thread is still open, I will paste my solution here:

import numpy as np
import tensorflow as tf


def np_repeat(tensor, repeats):
    """Repeats a numpy array along every axis.

    Args:
        tensor: A numpy array, 1-D or higher.
        repeats: A list with one repeat count per dimension; its length must
            match the number of dimensions in tensor.

    Returns:
        An array with the same dtype as tensor and shape tensor.shape * repeats.
    """
    assert len(repeats) == tensor.ndim, "dimensions must match"
    repeated = tensor
    for axis, repeat in enumerate(repeats):
        repeated = np.repeat(repeated, repeat, axis=axis)
    return repeated


def tf_repeat(tensor, repeats):
    """TensorFlow equivalent of np_repeat (one repeat count per axis).

    Args:
        tensor: A Tensor, 1-D or higher.
        repeats: A list with one repeat count per dimension; its length must
            match the number of dimensions in tensor.

    Returns:
        A Tensor with the same type as tensor and shape tensor.shape * repeats.
    """
    with tf.variable_scope("repeat"):
        # Add a trailing unit dimension, tile it, then reshape so that each
        # element ends up repeated along every axis.
        expanded_tensor = tf.expand_dims(tensor, -1)
        multiples = [1] + repeats
        tiled_tensor = tf.tile(expanded_tensor, multiples=multiples)
        repeated_tensor = tf.reshape(tiled_tensor, tf.shape(tensor) * repeats)
    return repeated_tensor


def repeat_test():
    shape = [1, 3, 3, 3, 2]
    repeat = [1, 2, 2, 3, 1]
    tensor = np.random.randn(*shape)
    np_repeated_tensor = np_repeat(tensor, repeat)
    tf_repeated = tf_repeat(tf.constant(tensor), repeat)
    with tf.Session() as sess:
        tf_repeated_tensor = sess.run(tf_repeated)
    if np.allclose(np_repeated_tensor, tf_repeated_tensor):
        print("tf_repeat is the same as np_repeat")
    else:
        print("something wrong")

@PeterMitrano

👍 for what seems like a basic feature

@Tutufa

Tutufa commented Nov 20, 2017

need one!

@yongtang
Member

yongtang commented Dec 8, 2017

Added a PR #15224 for the fix.

@liuhu-bigeye

@qianyizhang In numpy.repeat we can assign the number of repetitions for each element, e.g.:

x = np.array([[1, 2], [3, 4]])
np.repeat(x, [1, 2], axis=0)
# array([[1, 2], [3, 4], [3, 4]])

@itsmeolivia
Contributor

This looks like it was fixed by #15224. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@shoyer
Contributor Author

shoyer commented Feb 8, 2018

#15224 was indeed a PR to add tf.repeat, but it was closed without merging. Reopening.

@RylanSchaeffer

@shoyer, any updates on this?

@shoyer
Contributor Author

shoyer commented Jun 25, 2018

I haven't been looking at this, but check the referenced PRs above.

@sachinruk

Unfortunately @qianyizhang's answer doesn't help with the following use of np.repeat:

x = np.array([0,1,2])
np.repeat(x,[3,4,5])
>>> array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2])

Also, has there been a merged PR?

@jhultman

jhultman commented Sep 8, 2018

@sachinruk have you found a good solution for that use case? I would love to hear about it. In the meantime, if your scenario permits tf.py_func you can try my simple solution below.

import tensorflow as tf
import numpy as np

def tf_repeat(arr, repeats):
    return tf.py_func(np.repeat, [arr, repeats], tf.float32)

arr = np.float32([4, 2, 3, 5])
repeats = np.int32([2, 3, 0, 2])

with tf.Session() as sess:
    out = sess.run(tf_repeat(arr, repeats))
    print(out)

>>> [4. 4. 2. 2. 2. 5. 5.]
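
If py_func is not an option, a graph-only sketch along the same lines (assuming 1-D inputs and non-negative integer repeats; the names here are just illustrative) is to tile each value up to the maximum repeat count and mask out the excess:

def tf_repeat_1d(values, repeats):
    # Tile every value max(repeats) times, then keep only the first repeats[i] copies of each.
    max_repeat = tf.reduce_max(repeats)
    tiled = tf.tile(tf.expand_dims(values, 1), [1, max_repeat])
    mask = tf.sequence_mask(repeats, maxlen=max_repeat)
    return tf.boolean_mask(tiled, mask)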

@mspinaci

mspinaci commented Oct 5, 2018

At least for 1D tensors, a combination of tf.cumsum and tf.sparse_to_dense should do the trick:

import tensorflow as tf


def tf_repeat_1D(x,repeats):
    x = tf.constant(x, dtype=tf.float64)
    repeats = tf.constant(repeats, dtype=tf.int32)

    shape = tf.reduce_sum(repeats)
    idx = tf.concat([tf.constant([0], dtype=tf.int32), tf.cumsum(repeats[:-1])], axis=0)
    y = tf.sparse_to_dense(
        sparse_indices = idx,
        output_shape=(shape,),
        sparse_values=x - tf.concat([tf.constant([0], dtype=tf.float64), x[:-1]], axis=0)
    )

    return tf.cumsum(y)

z1 = tf_repeat_1D([0,1,2], [3,4,5])
z2 = tf_repeat_1D([4,2,5], [1, 3, 2])

with tf.Session() as sess:
    print(z1.eval())
    print(z2.eval())

This prints:

[0. 0. 0. 1. 1. 1. 1. 2. 2. 2. 2. 2.]
[4. 2. 2. 2. 5. 5.]

The trick is to scatter the first differences, np.concatenate(([x[0]], x[1:] - x[:-1])), at the start index of each run, so that a final cumsum reconstructs the repeated values.
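
A small plain-numpy illustration of that reconstruction, using x = [4, 2, 5] and repeats = [1, 3, 2]:

import numpy as np

x = np.array([4., 2., 5.])
repeats = np.array([1, 3, 2])
diffs = np.concatenate(([x[0]], x[1:] - x[:-1]))          # [ 4., -2.,  3.]
starts = np.concatenate(([0], np.cumsum(repeats)[:-1]))   # run start indices: [0, 1, 4]
y = np.zeros(repeats.sum())
y[starts] = diffs                                          # [ 4., -2.,  0.,  0.,  3.,  0.]
print(np.cumsum(y))                                        # [4. 2. 2. 2. 5. 5.]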

Generalizing to higher dimensions is complicated by the fact that sparse_to_dense only accepts 1-D sparse_values input... so another function is probably a better fit there.

@jackd
Contributor

jackd commented Mar 6, 2019

For anyone else tired of waiting for a PR to get reviewed/accepted, I've ported this pull request to a separate repo with some minor tweaks. Only the CPU version is implemented, without gradients.

Just found this hidden implementation. It's not part of the exported API, so you'll have to be a little dirty to access it.

from tensorflow.python.ops.ragged.ragged_util import repeat

@yongtang
Member

yongtang commented Mar 9, 2019

cc @jackd Added a PR #26517 to expose repeat.

@jackd
Contributor

jackd commented Sep 2, 2019

@yongtang Thanks for your work on this. I've been playing with this implementation (and the hidden version) for a while now, but just recently benchmarked performance and... well, it leaves a lot to be desired. For modestly sized arrays it can be significantly improved in terms of both speed and memory usage by gathering a repeated range.

def gather_repeat(values, repeats, axis=0):
    indices = tf.repeat(tf.range(tf.shape(values)[axis]), repeats)
    return tf.gather(values, indices, axis=axis)

(Note: I haven't tested different axes or confirmed it covers all edge cases.)

For the very modestly sized arrays in the example below you can get a ~30% speedup and a 75% memory reduction. Larger tensors result in converging computation time, but the memory factor remains roughly constant. From what I understand, gathers aren't particularly economical, so I imagine there's plenty of performance optimization to go from there (though possibly not in Python). I'm happy to put the following together into a PR (or, if anyone else is more comfortable with the tf PR process, feel free to take it and run with it), but if this motivates the tf team to accept an optimized kernel solution I'd prefer to skip straight to that.

Using pip-installed tf 1.15.0-dev20190821, a GTX-1070, and Python 3.6:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

max_repeats = 100
nrows = 1000
ndims = 100


def gather_repeat(values, repeats, axis=0):
    indices = tf.repeat(tf.range(tf.shape(values)[axis]), repeats)
    return tf.gather(values, indices, axis=axis)


def get_args():
    r = np.random.RandomState(123)  # pylint: disable=no-member
    repeats = (r.uniform(size=(nrows,),) * max_repeats).astype(np.int64)
    values = r.uniform(size=(nrows, ndims)).astype(np.float32)
    return tf.constant(values), tf.constant(repeats)


def benchmark_base():
    with tf.Graph().as_default():
        values, repeats = get_args()
        op = tf.repeat(values, repeats, axis=0)
        with tf.Session() as sess:
            print('***********************')
            print('base')
            tf.test.Benchmark().run_op_benchmark(sess, op)


def benchmark_gather():
    with tf.Graph().as_default():
        values, repeats = get_args()
        op = gather_repeat(values, repeats, axis=0)
        with tf.Session() as sess:
            print('***********************')
            print('gather')
            tf.test.Benchmark().run_op_benchmark(sess, op)


def ensure_same():
    with tf.Graph().as_default():
        values, repeats = get_args()
        base = tf.repeat(values, repeats, axis=0)
        gathered = gather_repeat(values, repeats, axis=0)
        max_err = tf.reduce_max(tf.abs(gathered - base))
        with tf.Session() as sess:
            print('Max error: {}'.format(sess.run(max_err)))


if __name__ == '__main__':
    benchmark_base()
    benchmark_gather()
    ensure_same()

Output:

***********************
base
entry {
  name: "TensorFlowBenchmark.run_op_benchmark"
  iters: 10
  wall_time: 0.002911686897277832
  extras {
    key: "allocator_maximum_num_bytes_GPU_0_bfc"
    value {
      double_value: 79595528.0
    }
  }
  extras {
    key: "allocator_maximum_num_bytes_gpu_host_bfc"
    value {
      double_value: 24.0
    }
  }
}

***********************
gather
entry {
  name: "TensorFlowBenchmark.run_op_benchmark"
  iters: 10
  wall_time: 0.0021082162857055664
  extras {
    key: "allocator_maximum_num_bytes_GPU_0_bfc"
    value {
      double_value: 20176400.0
    }
  }
  extras {
    key: "allocator_maximum_num_bytes_gpu_host_bfc"
    value {
      double_value: 197768.0
    }
  }
}
Max error: 0.0

@edloper
Contributor

edloper commented Oct 2, 2019

@jackd If you have the cycles to write up an optimized kernel solution for this (incl. a gradient function), I'll help make sure it gets accepted. (I'm the original author of the pure-python version, and a faster version would help speed up several RaggedTensor operations.)

@mohantym mohantym self-assigned this Dec 3, 2021
@mohantym
Contributor

mohantym commented Dec 3, 2021

Hi @shoyer! Have you checked the latest documentation on tf.repeat yet? Thanks!
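
For reference, tf.repeat now covers the per-element use case from the original question directly:

import tensorflow as tf

tf.repeat([0, 1, 2], repeats=[3, 4, 5])
# [0 0 0 1 1 1 1 2 2 2 2 2]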

@shoyer shoyer closed this as completed Dec 3, 2021
@mohantym
Contributor

Are you satisfied with the resolution of your issue? Yes / No
