Add MovingMNIST dataset #28
Conversation
Thanks for this contribution. Can you explain what the issue was with tfds.features.Video?
# sequence = tfds.features.Image(shape=shape)

# as video - doesn't work with 1 as final dim?
# sequence = tfds.features.Video(shape=shape)
Video should work:
tfds.features.Video(shape=(seq_length, height, width, 1))
If not, this is a bug on our end. Which error are you seeing?
Thanks for adding this @jackd! And yes, let's try to get Video to work.
Well, that was a fun rabbit hole to dive down. Found/fixed a bug in numpy related to … That being said, I feel the fixed implementation of …
@@ -61,6 +62,8 @@ def __init__(self, shape):
      raise ValueError('Video shape should be of rank 4')
    if shape.count(None) > 1:
      raise ValueError('Video shape cannot have more than 1 unknown dim')
+   if shape[-1] not in (1, 3):
Please remove. The test is already in Image(), called below: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/features/image_feature.py#L108
Done, but does this mean videos with 0 or 2 channels are also accepted? Asking in the interest of keeping documentation up to date (it had me confused initially when it said channels had to be 3).
@@ -260,7 +260,7 @@ def np_to_list(elem):
      return elem
    elif isinstance(elem, np.ndarray):
      elem = np.split(elem, elem.shape[0])
-     elem = np.squeeze(elem, axis=0)
+     elem = [np.squeeze(e, axis=0) for e in elem]
Nice catch. Thanks for fixing this.
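For readers following along, a minimal numpy sketch (outside TFDS, with illustrative variable names) of why the per-element squeeze matters: `np.split` leaves a leading length-1 axis on each piece, and `np.squeeze` has to be applied to each piece individually rather than to the Python list as a whole.

```python
import numpy as np

batch = np.arange(6).reshape(3, 2)       # e.g. a batch of 3 feature rows
parts = np.split(batch, batch.shape[0])  # list of 3 arrays, each shape (1, 2)

# Old code: np.squeeze(parts, axis=0) first stacks the list into a
# (3, 1, 2) array, so axis 0 has length 3 and cannot be squeezed.
# Fixed code: squeeze the length-1 leading axis of each piece instead.
rows = [np.squeeze(p, axis=0) for p in parts]
```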
Had to go away and come back before realizing there was a much better way of doing this, unless I'm failing to appreciate some corner cases (elements won't automatically be converted to np arrays... but given the name, I'm guessing that shouldn't be relevant?).
The dataset looks good, but I'm not sure about accepting the `moving_sequence` module into TFDS. The videos are usable in an ML model as-is; what is `moving_sequence` useful for?
The … In the interest of reproducibility, I think it's appropriate to package them together. Anyone wanting to test on this dataset will, presumably, want a similar dataset to train with, and accessing it from the same source as the test data makes sense to me. How one packages it is another question, and I'm open to suggestions. My original implementation of …
Someone could just make those decisions (or …). A lot of the questions above are related to dynamically generated datasets in general (or those involving non-trivial mapping operations, at least). Does …
Thank you for explaining! I think you made the right call to only include the test data and to include this function here so that users can create the moving sequences themselves from the MNIST dataset. My main request is that we limit the surface area of the new module to just the one key method, and that we add a test for …
No disagreement here. I've added a few more optional kwargs to the main function in place of other publicly visible functions and adjusted for dynamic sizing, but the implementation remains very much the same.
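For readers without the diff handy, here is a rough, hypothetical numpy sketch of the single-image bouncing-sequence idea being discussed. The function name, canvas size, and parameters are illustrative only, not the PR's actual API:

```python
import numpy as np

def moving_sequence(image, seq_length=20, speed=0.1, seed=0):
    """Place one image on a larger canvas along a bouncing linear path."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    out_size = 64
    # Direction sampled uniformly on the circle, as in the original dataset.
    theta = rng.uniform(0, 2 * np.pi)
    velocity = speed * np.array([np.cos(theta), np.sin(theta)])
    start = rng.uniform(0, 1, size=2)
    frames = np.zeros((seq_length, out_size, out_size), image.dtype)
    for t in range(seq_length):
        # Triangle-wave "bounce" keeps the normalized offset in [0, 1].
        pos = (start + velocity * t) % 2.0
        pos = np.minimum(2.0 - pos, pos)
        y, x = (pos * (out_size - np.array([h, w]))).astype(int)
        frames[t, y:y + h, x:x + w] = image
    return frames
```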
... lunch-time brought the revelation that I'm overcomplicating this trying to do this for multiple images in one shot - and that removing the foldl entirely is likely much easier. Will make changes and fix test errors now...
Thanks! Simpler is better, especially when considering maintenance cost. Doing things for one image at a time sounds good. Update here when it's ready for another look.
…On Sun, Feb 3, 2019 at 8:25 PM Dominic Jack ***@***.***> wrote:
... lunch-time brought the revelation that I'm overcomplicating this
trying to do this for multiple images in one shot - and that removing the
foldl entirely is likely much easier. Will make changes and fix test
errors now...
Finally, an excuse to play with tf 2.0 :). Merged in master changes and updated to single-image implementation.
Ok, just a few little things in the test and then we're good to go!
Ack, just found a tf 2.0 bug with … Update: I have no idea how …
LGTM, thanks! Will merge after I've verified on my end that the dataset generates ok.
PiperOrigin-RevId: 232784325
return tf.math.minimum(2 - points, points)


def _get_random_unit_vector(ndims=2, dtype=tf.float32):
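For context, the `tf.math.minimum(2 - points, points)` line implements a triangle wave, which is what makes the digits bounce off the walls. A numpy sketch of the same reflection logic (assuming, as the surrounding code appears to, that positions are first reduced mod 2):

```python
import numpy as np

def bounce(t):
    # Map unbounded linear motion onto [0, 1] with reflections at the
    # boundaries: 0 -> 0, 0.5 -> 0.5, 1 -> 1, 1.5 -> 0.5, 2 -> 0, ...
    t = np.asarray(t, dtype=float) % 2.0
    return np.minimum(2.0 - t, t)
```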
The current code generates the direction from a normal distribution. I believe this is not what the original code does, which generates the direction uniformly. Please see line 245 in data_handlers.py in http://www.cs.toronto.edu/~nitish/unsupervised_video/unsup_video_lstm.tar.gz, where the direction is sampled uniformly from 0 to 2*pi. Is that correct, or am I missing something?
Glad to see someone's actually reading the code! The original takes a unit vector with angle sampled uniformly. Here we take random normal coordinates and normalize them. The resulting distributions are equivalent. See e.g. alternative method 1 or convince yourself with the below code.
```python
import numpy as np
import matplotlib.pyplot as plt

n = 100000
x, y = np.random.normal(size=(2, n))
angle = np.arctan2(y, x)
plt.hist(angle)
plt.show()
```
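The same equivalence can be used constructively: normalizing Gaussian samples yields directions uniform on the unit circle (or sphere, in any dimension). A hypothetical numpy analogue of `_get_random_unit_vector`:

```python
import numpy as np

def random_unit_vectors(n, ndims=2, seed=0):
    # Gaussian samples are rotationally symmetric, so normalizing them
    # gives directions uniform on the unit (ndims-1)-sphere.
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n, ndims))
    return v / np.linalg.norm(v, axis=-1, keepdims=True)
```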
I see, cool thanks!
`moving_sequence` doesn't actually provide a dataset per se, but it's obviously strongly linked to `moving_mnist`. Not sure if that's precisely what this repo is intended for, but I had fun writing it (and watching bouncing shoes/coats from `fashion_mnist` was interesting).