Train models from scratch #60

Open
8 of 9 tasks
zhiqwang opened this issue Feb 12, 2021 · 11 comments · May be fixed by #408
Labels
API Library use interface enhancement New feature or request help wanted Extra attention is needed

Comments

@zhiqwang
Owner

zhiqwang commented Feb 12, 2021

@zhiqwang zhiqwang added enhancement New feature or request help wanted Extra attention is needed API Library use interface labels Feb 12, 2021
@kartik4949

@zhiqwang Hi, the library looks good, so I was thinking of contributing to make it as flexible as possible by adding support for more backbones, losses and FPN variants,
and perhaps even adding a custom architecture tuned for performance.
Let me know if you have a Slack channel or another platform to discuss this.
Thanks

@zhiqwang
Owner Author

zhiqwang commented Feb 19, 2021

Hi @kartik4949

Some of the modular design does require more careful consideration. We are eager for your help, so please join us on Slack here.

@stereomatchingkiss
Contributor

Can I train the model with yolov5-rt on a custom dataset? Or do I need to train the model with yolov5 v4.0 and then convert the weights with

import torch
from yolort.utils import update_module_state_from_ultralytics

# Update module state from ultralytics
model = update_module_state_from_ultralytics(arch='yolov5s', version='v4.0', custom_path_or_model=torch.load('path/to/model.pt'), num_classes=1)
# Save updated module
torch.save(model.state_dict(), 'yolov5s_updated.pt')

Thanks

@zhiqwang
Owner Author

zhiqwang commented Apr 18, 2021

Hi @stereomatchingkiss, both of these are feasible, but I recommend the second approach for now.

Training with yolort is still in the experimental phase. You can check the following for more details.

https://github.com/zhiqwang/yolov5-rt-stack/blob/2125d06f8cf8726401211a152890e46e3b3416e6/test/test_engine.py#L101-L110

@zhiqwang
Owner Author

FYI, I aim to release a version that supports training before 7th May. I guess it will not train as well as ultralytics, but it will be more user-friendly 😄

@Tomakko
Contributor

Tomakko commented Jun 4, 2021

Hi @zhiqwang, thanks for your awesome repo! Do you have any news on the training release? I started from your codebase to implement training myself. It is working now, i.e. I can run training steps, but I am running into one issue.

When I apply default_train_transforms from your data modules, it sometimes happens that no targets are left after the transform, probably because they lie outside the crop.

Can you give me some hints on how best to deal with empty targets in box_head.py? Particularly in these calls:

targets_cls, targets_box, indices, anchors = self.select_training_samples(head_outputs, targets)
losses = self.compute_loss(head_outputs, targets_cls, targets_box, indices, anchors)

Thanks a lot in advance!

@zhiqwang
Owner Author

zhiqwang commented Jun 5, 2021

Hi @Tomakko

Thanks for the careful debugging information. I guess it is due to the poor implementation of the data augmentation. As you mentioned, the default_train_transforms in

https://github.com/zhiqwang/yolov5-rt-stack/blob/b0af4a1b17805543f415df705deb66f398b10170/yolort/data/data_module.py#L81-L92

will filter most targets.

I think we should fix this augmentation to make sure there is at least one target left when the losses are computed in

https://github.com/zhiqwang/yolov5-rt-stack/blob/b0af4a1b17805543f415df705deb66f398b10170/yolort/models/box_head.py#L129-L130
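
One way to keep at least one target after a random crop is to retry the crop until a box survives, and to fall back to the uncropped image otherwise. The following is only a minimal sketch of that idea in plain Python, not the yolort or torchvision implementation. The helper names `boxes_inside` and `sample_crop`, the center-in-crop rule, and `max_tries` are all assumptions made for illustration, and the crop is assumed to be no larger than the image.

```python
import random


def boxes_inside(boxes, crop):
    """Return the boxes whose center lies inside the crop window.

    Boxes and the crop are (x_min, y_min, x_max, y_max) tuples in pixels.
    """
    cx0, cy0, cx1, cy1 = crop
    kept = []
    for x0, y0, x1, y1 in boxes:
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        if cx0 <= mx <= cx1 and cy0 <= my <= cy1:
            kept.append((x0, y0, x1, y1))
    return kept


def sample_crop(boxes, image_size, crop_size, rng, max_tries=50):
    """Sample a random crop that keeps at least one box.

    Falls back to the full image (and all boxes) when no such crop is
    found within max_tries attempts.
    """
    width, height = image_size
    crop_w, crop_h = crop_size
    for _ in range(max_tries):
        x0 = rng.randint(0, width - crop_w)
        y0 = rng.randint(0, height - crop_h)
        crop = (x0, y0, x0 + crop_w, y0 + crop_h)
        kept = boxes_inside(boxes, crop)
        if kept:
            return crop, kept
    return (0, 0, width, height), list(boxes)


# Hypothetical example: one small box in a 640x640 image.
rng = random.Random(0)
boxes = [(10, 10, 30, 30)]
crop, kept = sample_crop(boxes, image_size=(640, 640), crop_size=(320, 320), rng=rng)
print(crop, kept)
```

A real transform would also crop the image itself and clip and shift the surviving boxes into the crop's coordinate frame; that part is left out here to keep the retry logic visible.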

> Do you have any news on the training release?

My next plan is to learn from the data augmentation implementation in torchvision. They recently uploaded the augmentation methods they use to train the SSD models, and we can borrow some of that code here to make the augmentation acceptable.

Your feedback is very important to me. Feel free to file new issues about the trainer here, and let's train a good model together. 🚀

@Tomakko
Contributor

Tomakko commented Jun 11, 2021

Thank you @zhiqwang! I currently need to build an embedded YOLO model on short notice, so I am training with ultralytics for now, but afterwards I would be willing to contribute here. The training pipeline in ultralytics is just super cumbersome ;)

@denguir

denguir commented Jun 16, 2022

Hi @zhiqwang, thanks for the awesome work!
I was wondering how to load a pretrained model when the number of classes differs from the default, something like this:

from yolort.models import yolov5s
model = yolov5s(pretrained=True, score_thresh=0.45, num_classes=5)

This piece of code throws the following error due to dimension mismatch:

RuntimeError: Error(s) in loading state_dict for YOLO:
	size mismatch for head.head.0.weight: copying a param with shape torch.Size([255, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 128, 1, 1]).
	size mismatch for head.head.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).
	size mismatch for head.head.1.weight: copying a param with shape torch.Size([255, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 256, 1, 1]).
	size mismatch for head.head.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).
	size mismatch for head.head.2.weight: copying a param with shape torch.Size([255, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 512, 1, 1]).
	size mismatch for head.head.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).

As we can see, only the weights and biases of head.head mismatch, and I think the formula for that first dimension is (num_classes + 5) * 3.
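
The formula can be checked against the shapes in the error message. This is a small worked example, assuming 3 anchors per detection level and that each anchor predicts 4 box values, 1 objectness score, and one score per class; the function name `head_out_channels` is made up for illustration.

```python
def head_out_channels(num_classes: int, num_anchors: int = 3) -> int:
    """Output channels of one detection-head conv layer.

    Each anchor predicts 4 box values, 1 objectness score, and one
    score per class, i.e. (num_classes + 5) values per anchor.
    """
    return (num_classes + 5) * num_anchors


coco = head_out_channels(80)   # checkpoint trained on the 80 COCO classes
custom = head_out_channels(5)  # model built with num_classes=5
print(coco, custom)  # 255 30
```

These are the 255 and 30 that appear in the size-mismatch messages above.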

Is there any function or method that I'm not aware of that would allow us to match these dimensions? For example, a method that works like this (if integrated into the YOLO class):

def load_state_dict(self, state_dict, num_classes):
    weights_to_skip = [f"head.head.{i}.weight" for i in range(3)]
    bias_to_skip = [f"head.head.{i}.bias" for i in range(3)]
    for weight in weights_to_skip + bias_to_skip:
        state_dict[weight] = state_dict[weight][:(num_classes + 5) * 3, ...]
    super().load_state_dict(state_dict)

Currently, the only way I have found to load a YOLO model with a different number of classes is the load_from_yolov5 method, which requires us to already have a checkpoint model.

@zhiqwang
Owner Author

zhiqwang commented Jun 16, 2022

Hi @denguir, thanks for asking this question.

> Is there any function/method that I'm not aware of that would allow us to match these dimensions.

We don't currently offer a solution for this problem, but I guess you can load only the backbone weights to partially solve it. (I adapted the snippet from https://discuss.pytorch.org/t/how-to-load-part-of-pre-trained-model/1113/3)

from yolort.models import yolov5s
from yolort.utils import load_state_dict_from_url

model = yolov5s(pretrained=False, score_thresh=0.45, num_classes=5)

checkpoint_path = "/home/user/.cache/torch/hub/checkpoints/yolov5_darknet_pan_s_r60_coco-9f44bf3f.pt"
pretrained_dict = load_state_dict_from_url(checkpoint_path)

# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if "backbone" in k}
# 2. load the filtered state dict
model.model.load_state_dict(pretrained_dict, strict=False)
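
A variation on the snippet above is to filter by shape instead of by key name, so that every pretrained tensor that still fits is reused and only the mismatching head layers are skipped. This is a hedged sketch rather than a yolort API: `filter_compatible` is a hypothetical helper, it only assumes that each value has a `.shape` attribute (as PyTorch tensors do), and the demo uses stand-in objects with made-up key names instead of a real checkpoint.

```python
from types import SimpleNamespace


def filter_compatible(pretrained, current):
    """Keep pretrained entries whose key and shape match the current model.

    Both arguments map parameter names to objects with a ``.shape``
    attribute, such as the tensors in a PyTorch state dict.
    Returns the filtered dict and the list of skipped keys.
    """
    kept, skipped = {}, []
    for name, value in pretrained.items():
        target = current.get(name)
        if target is not None and tuple(value.shape) == tuple(target.shape):
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped


# Stand-ins for tensors; "backbone.conv.weight" is a made-up key.
pretrained = {
    "backbone.conv.weight": SimpleNamespace(shape=(32, 3, 3, 3)),
    "head.head.0.weight": SimpleNamespace(shape=(255, 128, 1, 1)),
    "head.head.0.bias": SimpleNamespace(shape=(255,)),
}
current = {
    "backbone.conv.weight": SimpleNamespace(shape=(32, 3, 3, 3)),
    "head.head.0.weight": SimpleNamespace(shape=(30, 128, 1, 1)),
    "head.head.0.bias": SimpleNamespace(shape=(30,)),
}
kept, skipped = filter_compatible(pretrained, current)
print(sorted(kept))  # ['backbone.conv.weight']
print(skipped)       # ['head.head.0.weight', 'head.head.0.bias']
```

With the model from the snippet above, the call would presumably look like `kept, skipped = filter_compatible(pretrained_dict, model.model.state_dict())` followed by `model.model.load_state_dict(kept, strict=False)`, assuming the checkpoint keys line up with those of `model.model`. The skipped head layers keep their random initialization and still need to be trained.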

BTW, the training mechanism of yolort is still not well developed, and any kind of contribution is welcome here.

@denguir

denguir commented Jun 16, 2022

Thanks @zhiqwang, I will definitely explore the training process of yolort further and try to help there.

5 participants