Transfer Learning with Frozen Layers #1314

Open
Tracked by #22
glenn-jocher opened this issue Nov 6, 2020 · 54 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@glenn-jocher
Member

glenn-jocher commented Nov 6, 2020

📚 This guide explains how to freeze YOLOv5 🚀 layers when transfer learning. Transfer learning is a useful way to quickly retrain a model on new data without having to retrain the entire network. Instead, part of the initial weights are frozen in place, and the rest of the weights are used to compute loss and are updated by the optimizer. This requires fewer resources than normal training and allows for faster training times, though it may also reduce final trained accuracy. UPDATED 28 March 2023.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Freeze Backbone

All layers that match the freeze list in train.py are frozen before training starts by setting requires_grad = False on their parameters, so no gradients are computed for them and the optimizer never updates them.

yolov5/train.py

Lines 119 to 126 in 771ac6c

# Freeze
freeze = [f'model.{x}.' for x in range(freeze)]  # layers to freeze
for k, v in model.named_parameters():
    v.requires_grad = True  # train all layers
    if any(x in k for x in freeze):
        print(f'freezing {k}')
        v.requires_grad = False
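
The matching in the snippet above is a plain substring test on parameter names. The following is a minimal sketch of that test on a hand-written list of names; the helper frozen_names and the sample names are illustrative assumptions, not part of YOLOv5. It shows why each prefix ends in a dot: 'model.1.' does not accidentally match 'model.10.'.

```python
def frozen_names(param_names, n_freeze):
    """Return the names that the substring test would freeze for a given freeze count."""
    prefixes = [f"model.{x}." for x in range(n_freeze)]  # 'model.0.', 'model.1.', ...
    return [k for k in param_names if any(p in k for p in prefixes)]


# Hand-written sample in the style of model.named_parameters() keys (illustrative only)
names = [
    "model.0.conv.conv.weight",
    "model.1.conv.weight",
    "model.9.cv1.conv.weight",
    "model.10.conv.weight",
    "model.24.m.0.weight",
]

frozen = frozen_names(names, 2)
print(frozen)  # only the model.0. and model.1. entries are selected
```

Without the trailing dot, the prefix 'model.1' would also match layers 10 to 19, so the dot is what keeps the layer index exact.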

To see a list of module names:

for k, v in model.named_parameters():
    print(k)

# Output
model.0.conv.conv.weight
model.0.conv.bn.weight
model.0.conv.bn.bias
model.1.conv.weight
model.1.bn.weight
model.1.bn.bias
model.2.cv1.conv.weight
model.2.cv1.bn.weight
...
model.23.m.0.cv2.bn.weight
model.23.m.0.cv2.bn.bias
model.24.m.0.weight
model.24.m.0.bias
model.24.m.1.weight
model.24.m.1.bias
model.24.m.2.weight
model.24.m.2.bias

Looking at the model architecture we can see that the model backbone is layers 0-9:

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

so we can define the freeze list to contain all modules with 'model.0.' - 'model.9.' in their names:

python train.py --freeze 10

Freeze All Layers

To freeze the full model except for the final output convolution layers in Detect(), we set the freeze list to contain all modules with 'model.0.' - 'model.23.' in their names:

python train.py --freeze 24

Results

We train YOLOv5m on VOC in both of the above scenarios, along with a default model (no freezing), starting from the official COCO pretrained --weights yolov5m.pt:

python train.py --batch 48 --weights yolov5m.pt --data voc.yaml --epochs 50 --cache --img 512 --hyp hyp.finetune.yaml

Accuracy Comparison

The results show that freezing speeds up training, but reduces final accuracy slightly.

(Screenshot of the training results, 2020-11-06)

GPU Utilization Comparison

Interestingly, the more modules are frozen, the less GPU memory is required to train and the lower the GPU utilization. This suggests that larger models, or models trained at larger --img sizes, may benefit from freezing in order to train faster.
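
One plausible contributor to the lower memory use is that frozen parameters need neither gradient buffers nor optimizer state. The sketch below only counts trainable parameters on a toy stack of linear layers; the toy model and the helper count_params are assumptions for illustration and say nothing about actual YOLOv5 memory figures.

```python
import torch.nn as nn

# Toy stand-in for a detector: three "backbone" layers followed by one "head" layer
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.Linear(16, 16),
    nn.Linear(16, 16),
    nn.Linear(16, 4),
)


def count_params(m):
    """Return (trainable, total) parameter counts for a module."""
    trainable = sum(p.numel() for p in m.parameters() if p.requires_grad)
    total = sum(p.numel() for p in m.parameters())
    return trainable, total


before = count_params(model)

# Freeze the first three layers; their parameter names start with '0.', '1.' or '2.'
for name, p in model.named_parameters():
    if name.split(".")[0] in {"0", "1", "2"}:
        p.requires_grad = False

after = count_params(model)
print(before, after)
```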

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher glenn-jocher added enhancement New feature or request documentation Improvements or additions to documentation labels Nov 6, 2020
@glenn-jocher glenn-jocher self-assigned this Nov 6, 2020
@omobayode1

I noticed that there's an argument in yolov3 train.py code "--freeze-layer"?

Please, what does it do?

It states that it freezes all non-output layer?

Please can you provide more clarification about this?

Thank you.

Omobayode

@mphillips-valleyit

@glenn-jocher Another dimension to this is generalization. I assume your results are shown for a test dataset. But for generalization to new datasets, freezing might also help prevent overfitting to the training data (and therefore improve robustness/generalization).

@glenn-jocher
Member Author

@mphillips-valleyit interesting point, though hard to quantify beyond existing val/test metrics.

@mphillips-valleyit

It could be done with separate datasets--models pretrained on COCO, measure their generalization (freezing vs. non-freezing fine-tuning) to OpenImages for common categories. If I'm able to post results on this at some point, I will.

@nanhui69

nanhui69 commented Dec 7, 2020

@mphillips-valleyit how are your custom-data training results with transfer learning? Could you attach your training log here?

@ramonhollands
Contributor

@glenn-jocher It might be interesting to add a final step that unfreezes and trains the complete network again with differential learning rates. So the complete training process would be (the default method in fast.ai):

  1. Freeze the backbone
  2. (optional reset the head weights)
  3. Train the head for a while
  4. Unfreeze the complete network
  5. Train the complete network with lower learning rate for backbone

@glenn-jocher
Member Author

glenn-jocher commented Dec 13, 2020

@ramonhollands that's an interesting idea, though I'm sure the devil is in the details, such as the epochs you take these actions at, the LRs used, dataset and model etc. I don't have time to investigate further, but you should be able to reproduce the above tutorial and apply the extra steps you propose to quantify differences. If you do please share your results with us.

One point to mention is that classification and detection may not share a common set of optimal training steps, so what works for fast.ai may not correlate perfectly to detection architectures like YOLO. Would be very interested to see experimental results.

@ramonhollands
Contributor

I'll take up that challenge in the coming weeks. I'm trying to wrap your amazing work in the fast.ai framework to get the best of both worlds, including the fastai learning rate finder, discriminative learning rates, etc. The method should work for detection architectures as well (https://www.youtube.com/watch?v=0frKXR-2PBY). I'll keep you updated.

@aritzLizoain

I trained a model with an online dataset containing 5 categories, and now I'm trying to fine-tune it with my own images, which contain the same 5 categories plus an additional one. My images are similar to the ones from the online dataset, so I thought that transfer learning would work. However, this is what I obtain while fine-tuning:

Class = all, Images = 3, Targets = 0, P = 0, R = 0, mAP@.5 = 0, mAP@.5:.95 = 0

When I visualize the labels everything looks correct, so I don't understand why Targets is 0. I also modified the dataset configuration yaml file to add the new category.

The fine-tuning works when I remove my additional category and fine-tune with the same 5 categories.

Does anybody know what I am doing wrong here?

Thanks in advance!

@yushuinanrong

@aritzLizoain
I have done something similar. In my case, I created a dataset that borrows certain classes from COCO and OpenImages. Then I fine tuned a pretrained yolov5 (trained on COCO) model on my custom dataset. The performance of the fine-tuned model isn't good.

@glenn-jocher
Member Author

glenn-jocher commented Dec 15, 2020

@ramonhollands LR finder sounds very cool, but be careful because sometimes LRs that work well for training can cause instabilities without a warmup to ramp the LR from 0 to its initial value.
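
As a rough illustration of such a warmup, the sketch below ramps the learning rate linearly from 0 to its initial value and then holds it. The function warmup_lr and its parameters are hypothetical and simplified; this is not the schedule used in train.py.

```python
def warmup_lr(step, warmup_steps, lr0):
    """Linearly ramp the learning rate from 0 to lr0 over warmup_steps, then hold lr0."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return lr0
    return lr0 * step / warmup_steps


# Learning rate at the first six steps with a four-step warmup
schedule = [warmup_lr(s, warmup_steps=4, lr0=0.01) for s in range(6)]
print(schedule)
```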

@aritzLizoain no targets found during testing means no labels are found for your images. Follow the Custom training tutorial to create a custom dataset:
https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

@Misterion777

Misterion777 commented Dec 15, 2020

While using transfer learning (both with and without frozen layers), the model "forgets" the data it was originally trained on (metrics on the original data get worse). I think the problem might be a learning rate that is too large. Can you please give a bit more detail on which hyperparameters should be changed when fine-tuning the model? (Maybe set lr0 to the final LR of the original training and remove the warmup epochs, or is that the wrong approach?)

@tusharnitharwal

@glenn-jocher It might be interesting to add a final step that unfreezes and trains the complete network again with differential learning rates. So the complete training process would be (the default method in fast.ai):

  1. Freeze the backbone
  2. (optional reset the head weights)
  3. Train the head for a while
  4. Unfreeze the complete network
  5. Train the complete network with lower learning rate for backbone

How do you set a different learning rate for the backbone?

@ramonhollands
Contributor

You have to split the backbone and head parameters and add a separate param group for each, with a different 'lr' argument (https://pytorch.org/docs/stable/optim.html). I wrote some initial code which I'll post later today.
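
A minimal sketch of that split, assuming parameter names of the form 'model.<index>...' as in the listing earlier in this thread. The toy class Wrapper, the cutoff n_backbone, and the divide-by-10 factor are illustrative assumptions, not code from train.py.

```python
import torch
import torch.nn as nn


class Wrapper(nn.Module):
    """Toy model whose parameter names look like 'model.<index>.weight'."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 8), nn.Linear(8, 2))

    def forward(self, x):
        return self.model(x)


net = Wrapper()
n_backbone = 2  # hypothetical cutoff: layers model.0 and model.1 count as backbone

backbone_params, head_params = [], []
for name, p in net.named_parameters():
    layer_idx = int(name.split(".")[1])  # 'model.<idx>.weight' -> idx
    (backbone_params if layer_idx < n_backbone else head_params).append(p)

base_lr = 0.01
optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "lr": base_lr / 10},  # lower LR for the backbone
        {"params": head_params},  # falls back to the default lr below
    ],
    lr=base_lr,
    momentum=0.9,
)
lrs = [g["lr"] for g in optimizer.param_groups]
print(lrs)
```

Any scheduler attached to this optimizer scales each group's learning rate separately, so the ratio between the two groups is preserved during training.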

@ramonhollands
Contributor

See https://github.com/ramonhollands/different_learning_rates/blob/master/train.py

I have added two parameters to experiment with:

  • freeze-backone (which freezes the backbone at the start and unfreezes it after 4 epochs)
  • diff-backbone (which lowers the learning rate for the backbone, dividing it by 10)

I started some experiments, which were encouraging, but have not had enough time to finish them yet.

@laisimiao

@glenn-jocher I am curious about the result plots in "Accuracy Comparison": why does the mAP of exp9_freeze_all increase as training progresses? If all params are frozen, they won't be optimized, so shouldn't the performance be a flat line?

@glenn-jocher
Member Author

@laisimiao exp9_freeze_all freezes all layers except the output layer, which still has an active gradient.
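
To illustrate, here is a small sketch on a toy network (the class Net, the data, and the learning rate are assumptions for this example, not the VOC experiment). Everything except the last layer is frozen with the same prefix test as in train.py, yet the loss still drops because the output layer keeps training.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class Net(nn.Module):
    """Toy model: 'model.0' and 'model.1' are frozen, 'model.2' plays the output layer."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

    def forward(self, x):
        return self.model(x)


net = Net()
freeze = [f"model.{x}." for x in range(2)]  # same prefix test as in train.py
for k, v in net.named_parameters():
    v.requires_grad = not any(p in k for p in freeze)

x, y = torch.randn(32, 4), torch.randn(32, 1)
frozen_before = net.model[0].weight.detach().clone()
opt = torch.optim.SGD([p for p in net.parameters() if p.requires_grad], lr=0.05)

losses = []
for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```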

@glenn-jocher glenn-jocher mentioned this issue Nov 6, 2022
1 task
@feimadada

Hi, which hardware and PyTorch version did you use for this experiment?

@simplecoderx

@glenn-jocher sir, while I was running transfer learning in YOLOv5, the power cut out and the training stopped. Can I still continue the training? Can I use --resume? How do I continue interrupted transfer learning?

@Mr-ind1fferent

I'll take up that challenge in the coming weeks. I'm trying to wrap your amazing work in the fast.ai framework to get the best of both worlds, including the fastai learning rate finder, discriminative learning rates, etc. The method should work for detection architectures as well (https://www.youtube.com/watch?v=0frKXR-2PBY). I'll keep you updated.

So have you figured out how to unfreeze the backbone after a given epoch?

@KhaKimThuy

I just want to detect person and motorcycle classes, how can I reduce the number of parameters?

@bryanbocao

bryanbocao commented Feb 1, 2023

I just want to detect person and motorcycle classes, how can I reduce the number of parameters?

The most straightforward way would be to use a smaller existing model: nano instead of the small model (YOLOv5s) shown in your terminal output:
--cfg yolov5n.yaml

You can further set the numbers to be smaller if you want a smaller model than the nano one:

depth_multiple: 0.33  # model depth multiple
width_multiple: 0.25  # layer channel multiple

https://github.com/ultralytics/yolov5/blob/master/models/yolov5n.yaml#L5-L6
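
As a rough sketch of how these two multiples shrink a model, the helpers below apply the scaling rule as I understand it from the YOLOv5 model parser: repeat counts are multiplied and rounded with a floor of 1, and channel counts are multiplied and rounded up to a multiple of 8. Treat the exact rounding as an assumption and check models/yolo.py before relying on it; scale_depth and scale_width are hypothetical names.

```python
import math


def scale_depth(n, depth_multiple):
    """Scale a module's repeat count; a count of 1 is left unchanged."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scale_width(channels, width_multiple, divisor=8):
    """Scale a channel count and round it up to a multiple of divisor."""
    return math.ceil(channels * width_multiple / divisor) * divisor


# Repeat counts and channel widths taken from the backbone config earlier in this thread
depths = [scale_depth(n, 0.33) for n in (1, 3, 9)]
widths = [scale_width(c, 0.25) for c in (64, 128, 256, 512, 1024)]
print(depths, widths)
```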

@glenn-jocher
Member Author

@bryanbocao hi there! It's great that you're looking to optimize the model for your specific use case. To reduce the number of parameters, you can consider using a smaller existing model such as nano instead of small in the YOLOv5 model configuration. Additionally, you can adjust the depth_multiple and width_multiple parameters in the configuration file to further reduce the size of the model. You can find these settings in the yolov5n.yaml file at lines 5-6. I hope this helps!

@bryanbocao

bryanbocao commented Nov 15, 2023

@glenn-jocher Thanks for your reply!

Additionally, you can adjust the depth_multiple and width_multiple parameters in the configuration file to further reduce the size of the model.

That's what I did eventually :)

@glenn-jocher
Member Author

@bryanbocao you're welcome! Great to hear that you found a solution by adjusting the depth_multiple and width_multiple parameters. If you have any more questions or need further assistance, feel free to ask. Good luck with your YOLOv5 project!
