freeze() doesn't set requires_grad to False #51

Closed · kamisoel opened this issue Jan 22, 2021 · 8 comments
Labels: good first issue

kamisoel commented Jan 22, 2021

While playing around with the TSBert notebook, I noticed that the model's parameters don't seem to get frozen when loading pretrained weights. Even if you call freeze() on the Learner, the parameters do NOT get frozen! I confirmed this with count_parameters(), which (as I checked) just counts the parameters with requires_grad==True.

Code sample:

from tsai.all import *  # provides get_UCR_data, get_ts_dls, ts_learner, count_parameters, ...

dsid = 'LSST'
X, y, splits = get_UCR_data(dsid, split_data=False)

tfms = [None, TSClassification()]
batch_tfms = [TSStandardize(by_sample=True)]
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms)

learn = ts_learner(dls, InceptionTimePlus, fc_dropout=.1, metrics=accuracy)

learn.freeze()
print("Trainable parameters frozen:\t", count_parameters(learn.model))    # outputs 457614
learn.unfreeze()
print("Trainable parameters unfrozen:\t", count_parameters(learn.model))  # outputs 457614
for p in learn.model.parameters():
    p.requires_grad = False
print("All parameters frozen manually:\t", count_parameters(learn.model)) # outputs 0
kamisoel (Author) commented Jan 22, 2021

The problem seems to be in the way the parameters are structured. For fastai.vision models, learn.opt.param_lists returns 3 lists, and freeze() deactivates gradients for the first two. In tsai models all parameters end up in a single list, so freeze() effectively works on an empty slice.

## Pseudocode for the fastai freeze implementation
def freeze():
    freeze_to(-1)

def freeze_to(n):
    # freeze everything except the last -n parameter groups
    frozen_idx = n if n >= 0 else len(learn.opt.param_lists) + n
    for p in learn.opt.all_params(slice(None, frozen_idx)):
        p.requires_grad = False

EDIT:
I think I found the source of the problem. fastai's vision learner builds the model as a Sequential of groups (init, body, head) and registers a matching splitter, so the optimizer knows how to freeze init and body but not the head. tsai models don't provide such a split, so everything ends up in one parameter group. At least that seems to be the case from fastai.vision.learner.
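
A possible fix along those lines would be to pass a custom splitter that exposes separate backbone/head parameter groups, so freeze() has something to act on. This is only a sketch: it assumes the model exposes .backbone and .head attributes (as the Plus archs seem to) and that ts_learner forwards splitter to the underlying fastai Learner:

# Hypothetical splitter: one parameter list per group (backbone, head),
# so learn.freeze() can deactivate gradients for everything but the head.
def backbone_head_splitter(model):
    return [list(model.backbone.parameters()), list(model.head.parameters())]

learn = ts_learner(dls, InceptionTimePlus, splitter=backbone_head_splitter, metrics=accuracy)
learn.freeze()  # now only the head's parameters keep requires_grad=True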

kamisoel (Author) commented

My workaround for now, in case anyone else wants to use TSBERT / fine-tuning:

def freeze(learn):
    "Freeze all parameters except the head"
    assert hasattr(learn.model, "head"), "you can only use this with models that have a .head attribute"
    for p in learn.model.parameters():
        p.requires_grad = False
    for p in learn.model.head.parameters():
        p.requires_grad = True

def unfreeze(learn):
    "Make all parameters trainable again"
    for p in learn.model.parameters():
        p.requires_grad = True

def fine_tune(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine-tune with `freeze` for `freeze_epochs`, then with `unfreeze` for `epochs`, using discriminative LRs"
    freeze(learn)
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    unfreeze(learn)
    learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
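
Usage with the learner from the code sample above (after loading the pretrained weights):

# 1 epoch with only the head trainable, then 10 epochs with everything unfrozen
fine_tune(learn, 10, base_lr=2e-3, freeze_epochs=1)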

oguiza (Contributor) commented Jan 23, 2021

Hi @kamisoel,
Thanks for raising this issue. It's important to fix it now that we have a way to pretrain models. I'll look into what needs to be updated in the tsai archs to support fine-tuning.
In the meantime, have you seen any difference in performance when using your workaround?

oguiza added the good first issue label on Jan 23, 2021
kamisoel (Author) commented

Hi @oguiza,
Perfect, I hope my debugging work is of use for this :)
My workaround seems to work quite well and should have more or less the same performance, because it's pretty close to the fastai implementation. It's just less flexible in how it splits the model's head and body, which shouldn't be a huge problem for TSBert, since it has the same restriction (it only works for models with a head attribute).

oguiza (Contributor) commented Jan 27, 2021

Hi @kamisoel,
It's taken me a while, but I've now fixed this issue.
From now on, all models that have 'Plus' in their name will be able to use pre-trained weights and be fine-tuned.
Unlike vision models, where the parameters are split into 3 groups, time series models have only 2 parameter groups (backbone and head). Vision models are split into 3 groups because the initial layers often need to be trained as well (especially if a number of filters different from 3 is passed).
Based on this, I've re-run the TSBERT tutorial, and the results are practically identical. So there was no benefit in fine-tuning the model in this particular case.
It'd be good if you could test the change to make sure everything is working as expected.
Thanks again for raising this issue!
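
In case it helps with testing, here's a quick sketch reusing the code sample from the issue description (the exact counts will depend on the architecture):

learn = ts_learner(dls, InceptionTimePlus, fc_dropout=.1, metrics=accuracy)
learn.freeze()
print(count_parameters(learn.model))  # should now report only the head's trainable parameters
learn.unfreeze()
print(count_parameters(learn.model))  # should report all parameters again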

oguiza (Contributor) commented Feb 3, 2021

I will close this issue due to lack of response. If the issue persists, please feel free to re-open it.

oguiza closed this as completed on Feb 3, 2021
kamisoel (Author) commented Feb 3, 2021

Hi @oguiza
Thanks for the fast fix - and sorry for my lack of response ^^' The change seems to work just fine! 👍

Just another small request: would it be possible to allow the use of the XCM model for pre-training as well? It already has a separate head and can be used with TSBert.

oguiza (Contributor) commented Feb 4, 2021

Hi @kamisoel, I'm glad to hear the issue is now fixed.
As to your second request, I've already uploaded a new XCMPlus model that you can pre-train. I haven't fully tested it, but it has the same structure as the rest, so I think it should work. It's already on GitHub, and I will create a new pip release shortly (probably later today or tomorrow).
If you try it, please let me know if it works well.
