
reproducing the video prediction model #553

Closed
falcondai opened this issue Oct 18, 2016 · 13 comments

@falcondai

models/video_prediction @cbfinn

Thank you for generously sharing the code! I have three questions about the released code:

  • Are the hyperparameters used in the paper the same as the default options in prediction_train.py? In particular, the number of training steps.
  • Can you share some figures on the expected performance of the trained model over the val/train sets? I observed a strange val_loss trend line, so I wonder if I made a mistake.
    [screenshot: val_loss curve, 2016-10-17]
  • Is there a plan to also release the evaluation/visualization script for the model? If not, I would love to contribute (and I am sure many other users would as well).
@cbfinn
Contributor

cbfinn commented Oct 18, 2016

are the hyperparameters used in the paper the same as the default options in prediction_train.py?

For the most part, yes. There are a few differences:

  • For the paper, I downsampled with PIL's antialiasing method, outside of tensorflow. In this code, the images are downsampled in tensorflow, using bicubic interpolation. This isn't ideal, as it causes the images to be a bit pixelated; a convolution-based downsampling would be a better option.
  • I use layer norm after every layer, which I didn't do in the paper. I think this only makes things more stable.
  • The train/val split is different from the one I used.
  • The PSNR calculation that is saved in a scalar summary is not quite correct. It is computed over an entire batch of images, but should be computed for each image independently and then averaged. This is pretty easy to fix (and I probably should have fixed it earlier; I've just been really busy); a per-image version is sketched below.
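Here is what a per-image version might look like (a minimal numpy sketch, not the actual fix in the released code; per_image_psnr and the [0, 1] pixel range are assumptions):

```python
import numpy as np

def per_image_psnr(true_batch, pred_batch, max_val=1.0):
    # Hypothetical helper: compute the MSE for each image separately,
    # convert each one to PSNR, then average over the batch.
    # true_batch, pred_batch: float arrays of shape [batch, height, width, channels].
    mse = np.mean((true_batch - pred_batch) ** 2, axis=(1, 2, 3))  # one MSE per image
    psnr = 10.0 * np.log10(max_val ** 2 / mse)                     # one PSNR per image
    return float(np.mean(psnr))                                    # batch average
```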

I observed a strange val_loss trend line, so I wonder if I made a mistake.

That curve is about what I would expect. It looks strange because of scheduled sampling, a curriculum which stochastically passes in ground-truth frames at some time steps during the beginning of training. The curriculum ends around 12k steps. (See citation [2] in the paper for details.) To turn off scheduled sampling, you can set --schedsamp_k=-1.
Alternatively, you could make a change to the code to set schedsamp_k=-1 for the validation model, regardless of what's used for the training model. This might be nice.
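For intuition, the inverse-sigmoid schedule from citation [2] can be sketched as follows (a rough Python sketch; ground_truth_fraction and the value k=900 are illustrative assumptions, not necessarily the exact formula in prediction_train.py):

```python
import numpy as np

def ground_truth_fraction(iter_num, k=900.0):
    # Illustrative inverse-sigmoid schedule: the fraction of ground-truth
    # frames fed to the model decays from ~1 toward 0 as training progresses.
    if k < 0:  # schedsamp_k = -1 disables the curriculum entirely
        return 0.0
    return k / (k + np.exp(iter_num / k))

# With k around 900, the fraction is ~0.999 at step 0 and ~0.001 at step 12000,
# which is roughly when the val_loss curve changes character.
```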

can you share some figures on the expected performance of the trained model over the val/train sets?

I did this work when I was an intern at Google Brain, and I no longer have access to data/code/training curves that I used for the paper.

is there a plan to also release the evaluation/visualization script for the model?

I'm not planning on doing this in the immediate future, but I would love to have something like this added to the released code. I'd be happy to help review code for this, and potentially add to it. For example, I think that tiling animated gifs is a great way to visualize the model's predictions, as seen here: https://sites.google.com/site/robotprediction/ (scroll down about halfway). I have the code for tiling predictions together and saving them as a gif, which I'd be happy to share.

It would also be really useful to visualize the gifs during training, e.g., in tensorboard (tensorflow/tensorflow#3936)

@asimshankar
Contributor

Thanks for the response @cbfinn.
@falcondai: it seems you got the answers you were seeking.

Closing this out. If you have more concerns, please do file a new issue/check with @cbfinn

@falcondai
Author

falcondai commented Oct 18, 2016

@cbfinn Thanks for the clarifications and pointers! I will follow up with more specific issues should they arise.

@tegg89

tegg89 commented Apr 18, 2017

@cbfinn
Regarding your earlier comment, how can I generate the tiled animated GIFs that visualize the model's predictions? I have tried to analyze and modify the input and training files, but I couldn't get it to work. Could I get some help with that?

@cbfinn
Contributor

cbfinn commented Apr 18, 2017

Here's an example script that loads images from the pushing dataset and exports them to gifs using the moviepy package (though it does not tile them).
grab_train_images.py.zip

It is straightforward to use moviepy to stack gifs side-by-side, to form a tiling.
http://zulko.github.io/moviepy/getting_started/compositing.html#stacking-and-concatenating-clips
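A minimal sketch of that approach, assuming two already-exported gifs (the file names are placeholders):

```python
from moviepy.editor import VideoFileClip, clips_array

# Load two gifs (placeholder names) and stack them into one row of two tiles.
gt = VideoFileClip("ground_truth.gif")
pred = VideoFileClip("prediction.gif")
tiled = clips_array([[gt, pred]])  # a 1x2 grid; add more rows/columns as needed
tiled.write_gif("tiled.gif", fps=10)
```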

@tegg89

tegg89 commented Apr 18, 2017

@cbfinn @falcondai
Thanks for your generous reply! I will work through the rest of the code starting from the included script :)

@falcondai
Author

falcondai commented Apr 18, 2017

@tegg89 I ended up using imageio to create GIFs. Its API is pretty straightforward. For an example (IPython notebook): https://gist.github.com/falcondai/1e22919e6ce8d6a8e3dd3da5a6a0ad94
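The core of it is just imageio.mimsave; a minimal sketch (the random frames are a stand-in for one denormalized predicted sequence):

```python
import imageio
import numpy as np

# Stand-in for one sequence of denormalized predictions: HxWx3 uint8 frames.
frames = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(10)]

# Write the frames out as an animated GIF, 0.2 seconds per frame.
imageio.mimsave("prediction.gif", frames, duration=0.2)
```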

@tegg89

tegg89 commented Apr 20, 2017

@cbfinn @falcondai
When I feed data to the network for evaluation, the resulting GIF file is not sequential.
I have already disabled the shuffle option in prediction_input.py.

@cbfinn
Contributor

cbfinn commented Apr 20, 2017

@tegg89 Make sure you are only calling session.run() once for the entire sequence, rather than once for each frame. The script grab_train_images.py shows how to extract a sequence of images in order, with a single sess.run() per sequence.
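In other words, something like the following (a sketch only; it assumes sess holds a restored checkpoint and model is a Model instance from prediction_train.py whose gen_images attribute is a list of per-step prediction tensors):

```python
# Problematic: each run() call dequeues a fresh batch from the input queue,
# so frames fetched one at a time do not belong to the same sequence.
# frames = [sess.run(model.gen_images[t], feed_dict={model.iter_num: -1})
#           for t in range(len(model.gen_images))]

# Better: a single run() fetches every time step from one dequeued batch.
frames = sess.run(model.gen_images, feed_dict={model.iter_num: -1})
```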

@tegg89

tegg89 commented Apr 21, 2017

@cbfinn @falcondai
Sorry to keep bothering you with questions, but I am still having trouble visualizing test data.

Referring to grab_train_images.py, I changed the input file so that it returns sequential video frames. However, when I fed this input through the network model, gen_images did not come out in sequential order. The modified code is here. The steps I ran through are as follows:

  1. Get images, states, and actions from prediction_input.py (I already checked that the images are in sequential order).
  2. Feed the three inputs into the network via the Model class in prediction_train.py.
  3. Create a session and load the trained model. (Up to this step, I could load the model with some minimal code changes.)
  4. gen_images = sess.run([model.gen_images], feed_dict={model.iter_num: -1})
    (the learning_rate term is removed from feed_dict because it is not used)
  5. Denormalize gen_images (sketched below).
  6. Transform gen_images into a GIF.

The result still did not come out in sequential order. It seems to me that the network model is making the input non-sequential. How did you do the visualization for evaluation?
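A minimal sketch of what I mean by the denormalization in step 5, assuming gen_images is a list of float arrays in [0, 1] (the random placeholder stands in for the real sess.run output):

```python
import numpy as np

# Placeholder for the output of sess.run(model.gen_images, ...): one
# [batch, height, width, 3] float array in [0, 1] per predicted time step.
gen_images = [np.random.rand(1, 64, 64, 3) for _ in range(10)]

# Step 5: denormalize the first sequence in the batch to uint8 pixels.
frames = [np.clip(img[0] * 255.0, 0.0, 255.0).astype(np.uint8) for img in gen_images]

# Step 6: write `frames` out with imageio.mimsave, as in the earlier comment.
```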

@carsonDB

carsonDB commented Jun 9, 2017

@cbfinn Thanks for your paper and code. And sorry to bother you about a small detail.

  1. As you said above, "the train/val split is different from the one I used".
    In the paper, train_val_split is 0.95, and I see that this TF version also uses the same value as the default.
  2. In a complete training run (10K iterations), I found that the validation PSNR (which is what I use for evaluation) does not always agree with the test PSNR. I pick the best model by selecting the best validation PSNR during training, but sometimes some of the periodic checkpoints score a higher PSNR than the selected best model (by a gap of up to 0.5).

Is train_val_split == 0.95 not enough in practice?

@cbfinn
Contributor

cbfinn commented Jun 9, 2017

@carsonDB The percentage is the same, but the actual videos used for training and validation are different (as they are randomized).

@carsonDB

@cbfinn Thanks for your quick reply!
