I haven't understood your network. #15

Open
ztrobertyang opened this issue Dec 8, 2021 · 1 comment

Comments

@ztrobertyang

Hi,
Thanks for providing the code. Looking at your code, I see that you train on one video and then use the same video for inference. I think that is tricky. The CNN should be trained on multiple videos and then run inference on a different video. Training and inferring on the same video will of course produce a good result.

I am trying to understand your model. It seems the input video has a foreground object and its mask, and then you supply another mask for augmentation. Then, after training, the output video has no foreground and the background is re-drawn. Am I right?

Can I train your model on multiple videos and then run inference on a different video? For example, I want to train on 10 different videos and then run inference on another video. What would happen at inference? How can I train the model on multiple videos?

@mgwillia

Hi, I am not one of the authors of this paper, but it seems you misunderstand the ideas that motivate implicit representation, and in particular, implicit representation for video inpainting.

Long story short, this is a zero-shot task. There is no "training" and "testing" in the traditional ML sense. The model receives as input the frames of a single video and some masks (as few as one mask!) indicating what to inpaint. It then iterates on those frames, learning how to inpaint that specific video, until convergence, at which point it delivers a satisfactory inpainting. It doesn't need general visual priors from expensive pretraining; in fact, that would not be desirable. As an implicit method, the visual priors the network learns from the video of interest are sufficient.
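For concreteness, here is a minimal sketch of what that per-video optimization loop could look like in PyTorch. It is illustrative only: `InpaintNet`, the tensor shapes, and the loss are hypothetical stand-ins, not this repository's actual code. A small network is fit to the known pixels of one video's frames, and after convergence its prediction fills the masked regions.

```python
# Hypothetical sketch of "internal" (per-video) inpainting optimization.
# InpaintNet and all names below are illustrative, not this repo's API.
import torch
import torch.nn as nn

class InpaintNet(nn.Module):
    """Tiny conv net standing in for the real architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def inpaint_video(frames, masks, steps=2000, lr=1e-3):
    """frames: (T, 3, H, W) tensor for ONE video;
    masks: (T, 1, H, W), 1 = known pixel, 0 = region to fill."""
    model = InpaintNet()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):  # "training" and "testing" use the same video
        pred = model(frames * masks)  # the net only ever sees known pixels
        # Reconstruction loss is computed on known pixels only.
        loss = ((pred - frames) * masks).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = model(frames * masks)
    # Keep known pixels as-is; fill the holes with the prediction.
    return frames * masks + pred * (1 - masks)
```

Note there is nothing here to transfer to another video: the converged weights encode priors for this one clip, which is why "train on 10 videos, test on an 11th" is not how this kind of method is meant to be used.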
