freeze() doesn't set requires_grad to False #51

Closed · kamisoel opened this issue Jan 22, 2021 · 8 comments
Labels: good first issue

kamisoel commented Jan 22, 2021

While playing around with the TSBert notebook, I noticed that the model's parameters don't seem to get frozen when loading pretrained weights. Even if you call freeze() on the Learner, the parameters do NOT get frozen! I confirmed this with count_parameters(), which (as I checked) just counts the parameters with requires_grad==True.

Code sample:

from tsai.all import *  # provides get_UCR_data, get_ts_dls, ts_learner, count_parameters, ...

dsid = 'LSST'
X, y, splits = get_UCR_data(dsid, split_data=False)

tfms = [None, TSClassification()]
batch_tfms = [TSStandardize(by_sample=True)]
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms)

learn = ts_learner(dls, InceptionTimePlus, fc_dropout=.1, metrics=accuracy)

learn.freeze()
print("Trainable parameters frozen:\t", count_parameters(learn.model))    # outputs 457614
learn.unfreeze()
print("Trainable parameters unfrozen:\t", count_parameters(learn.model))  # outputs 457614
for p in learn.model.parameters():
    p.requires_grad = False
print("All parameters frozen manually:\t", count_parameters(learn.model)) # outputs 0
kamisoel (Author) commented Jan 22, 2021

The problem seems to be in the way the parameters are structured. For fastai.vision models, learn.opt.param_lists returns 3 lists, and freeze() deactivates gradients for the first two. In tsai models all parameters end up in a single list, so freeze() effectively works on an empty slice.

## Pseudocode for the fastai freeze implementation
def freeze():
    freeze_to(-1)

def freeze_to(n):
    # freeze everything except the last -n parameter groups
    frozen_idx = n if n >= 0 else len(learn.opt.param_lists) + n
    for p in learn.opt.all_params(slice(None, frozen_idx)):
        p.requires_grad = False

EDIT:
I think I found the source of the problem. fastai's vision learner builds the model as a Sequential of groups (init, body, head) and registers a matching splitter, so the optimizer knows how to freeze init and body but not the head. tsai models don't provide such a split, so everything ends up in one parameter group. At least that seems to be the case from fastai.vision.learner.
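
A possible fix along those lines would be to pass a custom splitter that exposes separate backbone/head parameter groups, so freeze() has something to act on. This is only a sketch: it assumes the model exposes .backbone and .head attributes (as the Plus archs seem to) and that ts_learner forwards splitter to the underlying fastai Learner:

# Hypothetical splitter: one parameter list per group (backbone, head),
# so learn.freeze() can deactivate gradients for everything but the head.
def backbone_head_splitter(model):
    return [list(model.backbone.parameters()), list(model.head.parameters())]

learn = ts_learner(dls, InceptionTimePlus, splitter=backbone_head_splitter, metrics=accuracy)
learn.freeze()  # now only the head's parameters keep requires_grad=True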

kamisoel (Author) commented

My workaround for now, in case anyone else wants to use TSBERT / fine-tuning:

def freeze(learn):
    "Freeze all parameters except the head"
    assert hasattr(learn.model, "head"), "you can only use this with models that have a .head attribute"
    for p in learn.model.parameters():
        p.requires_grad = False
    for p in learn.model.head.parameters():
        p.requires_grad = True

def unfreeze(learn):
    "Make all parameters trainable again"
    for p in learn.model.parameters():
        p.requires_grad = True

def fine_tune(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine-tune with `freeze` for `freeze_epochs`, then with `unfreeze` for `epochs`, using discriminative LRs"
    freeze(learn)
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    unfreeze(learn)
    learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
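
Usage with the learner from the code sample above (after loading the pretrained weights):

# 1 epoch with only the head trainable, then 10 epochs with everything unfrozen
fine_tune(learn, 10, base_lr=2e-3, freeze_epochs=1)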

oguiza (Contributor) commented Jan 23, 2021

Hi @kamisoel,
Thanks for raising this issue. It's important to fix it now that we have a way to pretrain models. I'll look into what needs to be updated in the tsai archs to support fine-tuning.
In the meantime, have you seen any difference in performance when using your workaround?

oguiza added the good first issue label on Jan 23, 2021
kamisoel (Author) commented

Hi @oguiza,
Perfect, I hope my debugging work is of use for this :)
My workaround seems to work quite well and should have more or less the same performance, because it's pretty close to the fastai implementation. It's just less flexible in how it splits the model's head and body, which shouldn't be a huge problem for TSBert, since it has the same restriction (it only works for models with a head attribute).

oguiza (Contributor) commented Jan 27, 2021

Hi @kamisoel,
It's taken me a while, but I've now fixed this issue.
From now on, all models that have 'Plus' in their name will be able to use pre-trained weights and be fine-tuned.
Unlike vision models, where the parameters are split into 3 groups, time series models have only 2 parameter groups (backbone and head). Vision models are split into 3 groups because the initial layers often need to be trained as well (especially if a number of filters different from 3 is passed).
Based on this, I've re-run the TSBERT tutorial, and the results are practically identical. So there was no benefit in fine-tuning the model in this particular case.
It'd be good if you could test the change to make sure everything is working as expected.
Thanks again for raising this issue!
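
In case it helps with testing, here's a quick sketch reusing the code sample from the issue description (the exact counts will depend on the architecture):

learn = ts_learner(dls, InceptionTimePlus, fc_dropout=.1, metrics=accuracy)
learn.freeze()
print(count_parameters(learn.model))  # should now report only the head's trainable parameters
learn.unfreeze()
print(count_parameters(learn.model))  # should report all parameters again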

oguiza (Contributor) commented Feb 3, 2021

I will close this issue due to lack of response. If the issue persists, please feel free to re-open it.

oguiza closed this as completed on Feb 3, 2021
kamisoel (Author) commented Feb 3, 2021

Hi @oguiza
Thanks for the fast fix - and sorry for my lack of response ^^' The change seems to work just fine! 👍

Just another small request: would it be possible to allow the use of the XCM model for pre-training as well? It already has a separate head and can be used with TSBert.

oguiza (Contributor) commented Feb 4, 2021

Hi @kamisoel, I'm glad to hear the issue is now fixed.
As to your second request, I've already uploaded a new XCMPlus model that you can pre-train. I haven't fully tested it, but it has the same structure as the rest, so I think it should work. It's already on GitHub, and I will create a new pip release shortly (probably later today or tomorrow).
If you try it, please let me know if it works well.
