Questions about the training from scratch #23

Open

gonglixue opened this issue Sep 15, 2019 · 9 comments
@gonglixue

Hi. I used the provided code to train TimeCycle on some other video datasets. Finetuning the network from the provided checkpoint_14.pth.tar works fine, but when I train the network from scratch, neither the inlier loss nor the theta loss decreases. Are there any tips for training TimeCycle from scratch?

@JBKnights

@gonglixue - when you visualised your training results, did you ever get a blocky output? We're running into similar problems, and I'm wondering if this is an issue for you too.

@roeez

roeez commented Sep 23, 2019

same problem... :(

@gonglixue
Author

I have trained it from scratch successfully : )
First, set detach_network=True in model_simple.py, which freezes the feature extractor. Then set detach_network=False to train the whole network end-to-end.
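
A minimal sketch of what the detach_network flag achieves (toy modules for illustration; the real encoder and transformation head live in models/videos/model_simple.py):

import torch
import torch.nn as nn

# Toy encoder and head standing in for the repo's real modules.
encoder = nn.Conv2d(3, 8, 3, padding=1)
head = nn.Conv2d(8, 1, 1)

x = torch.rand(1, 3, 16, 16)
feats = encoder(x)

detach_network = True  # phase 1: freeze the feature extractor
if detach_network:
    feats = feats.detach()  # gradients stop here; the encoder does not update

loss = head(feats).mean()
loss.backward()
print(encoder.weight.grad is None)   # True while detached
print(head.weight.grad is not None)  # True: the head still trains

Once this phase converges, flipping detach_network to False lets gradients reach the encoder, so the whole network trains end-to-end.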

@roeez

roeez commented Sep 23, 2019

Thanks! I will try.

@gonglixue
Author

> @gonglixue - when you visualised your training results, did you ever get a blocky output? We're running into similar problems, and I'm wondering if this is an issue for you too.

I didn't come across the blocky output problem. Using the code in transformation.py to transform an image with a given affine matrix works correctly.
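
For reference, that kind of sanity check can be reproduced standalone with PyTorch's affine_grid/grid_sample (this is a minimal example of the idea, not the exact code in transformation.py):

import torch
import torch.nn.functional as F

# Warp a dummy image with a given 2x3 affine matrix; a blocky result
# here would point to a grid or sampling bug.
img = torch.rand(1, 3, 240, 240)            # (N, C, H, W)
theta = torch.tensor([[[0.8, 0.0, 0.1],
                       [0.0, 0.8, -0.1]]])  # (N, 2, 3) affine matrix

grid = F.affine_grid(theta, img.size(), align_corners=False)
warped = F.grid_sample(img, grid, align_corners=False)
print(warped.shape)  # torch.Size([1, 3, 240, 240])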

@roeez

roeez commented Sep 23, 2019

Can you please provide more details?
Did you also set can_detach=True in the forward_base method?
Did you first detach the encoder and train the transformation network for a few epochs, and then set detach_network=False and train for more epochs?
Are the optimizer and its settings as stated in the paper?

My loss_targ_theta_skip is very noisy and the back_inliers loss vanishes very early...

Thanks :)

@gonglixue
Author

gonglixue commented Sep 23, 2019

> Can you please provide more details?
> Did you also set can_detach=True in the forward_base method?
> Did you first detach the encoder and train the transformation network for a few epochs, and then set detach_network=False and train for more epochs?
> Are the optimizer and its settings as stated in the paper?
>
> My loss_targ_theta_skip is very noisy and the back_inliers loss vanishes very early...
>
> Thanks :)

My full training process is as follows:

1. Completely detach the feature extractor. That means
https://github.com/xiaolonw/TimeCycle/blob/16d33ac0fb0a08105a9ca781c7b1b36898e3b601/models/videos/model_simple.py#L166 is always True:

# detach_network=True in __init__()
# if self.detach_network and can_detach:
if self.detach_network:
    x_pre = x_pre.detach()

In this step, I set lamda=0.3 and lr=2e-4; the inlier loss decreases only a little.

2. After step 1 converges, set detach_network=False to train the whole network; everything else stays the same as the original code:

if self.detach_network and can_detach:
    x_pre = x_pre.detach()

In this step, I find that the theta loss almost converges while the inlier loss decreases slowly, so I decrease the weight of the theta loss to lamda=0.1 and use a larger learning rate (lr=3e-4).

3. Use a smaller learning rate (lr=2e-4, lamda=0.1) to finetune.

My training process seems a little complicated. For some video data, I had to adjust the hyperparameters back and forth...
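
Putting the three stages above into one sketch (detach_network, lamda, and lr are the repo's names; the stage loop, the train_stage helper, and the exact loss combination are my own reading of these steps, so treat them as assumptions):

# Hypothetical driver for the three-stage schedule described above.
stages = [
    # (detach_network, lamda, lr)
    (True,  0.3, 2e-4),  # 1) frozen encoder: train the transformation branch
    (False, 0.1, 3e-4),  # 2) end-to-end: lower theta weight, larger lr
    (False, 0.1, 2e-4),  # 3) finetune with a smaller lr
]

for detach, lamda, lr in stages:
    model.detach_network = detach          # model as built from model_simple.py
    for group in optimizer.param_groups:
        group['lr'] = lr
    # Train until this stage converges; per iteration the losses combine
    # roughly as: loss = inlier_loss + lamda * theta_loss
    train_stage(model, optimizer, lamda)   # hypothetical helper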

@roeez

roeez commented Sep 23, 2019

Thank you very much for the detailed answer, you are great!

@JBKnights

Thanks so much for the help! Out of curiosity, how many epochs did each of the steps take? I.e., how much training did you do before you unfroze the feature extractor?
