Fix overrides training cfg bug (#10002)
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main   #10002      +/-   ##
==========================================
- Coverage   75.99%   75.19%    -0.81%
==========================================
  Files         121      121
  Lines       15332    15332
==========================================
- Hits        11652    11529      -123
- Misses       3680     3803      +123
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@DseidLi thanks for the PR! Yes I understand your concern here. Will take a look. @Laughing-q what do you think here?
Thank you for reviewing my PR! Perhaps my previous explanation was too detailed. To help simplify understanding, I will give a specific example below to reproduce this bug:

```python
model = YOLO('models/pretrained/yolov8s-cls.pt')
config = {
    "task": args.task,
    "mode": args.mode,
    "amp": args.amp,
    "hsv_h": args.hsv_h,
    "hsv_s": args.hsv_s,
    "hsv_v": args.hsv_v,
    "degrees": args.degrees,
    "translate": args.translate,
    "scale": args.scale,
    "shear": args.shear,
    "perspective": args.perspective,
    "flipud": args.flipud,
    "fliplr": args.fliplr,
    "mosaic": args.mosaic,
    "mixup": args.mixup,
    "copy_paste": args.copy_paste,
}
new_config_path = create_new_config_file(args.config_dir, config)
model.train(
    data=args.dataset_dir,
    epochs=1000,
    batch=256,
    device=[0, 1, 2, 3],  # multi-GPU training is necessary to reproduce this bug
    imgsz=224,
    cache='disk',
    val=False,
    workers=5,
    cfg=new_config_path,
    name=args.timestamp,
)
```

As you can see, this uses multi-GPU training with the default DDP (Distributed Data Parallel) strategy. Since I instantiated the model object with `model = YOLO('models/pretrained/yolov8s-cls.pt')`, I am certain of my model configuration when passing the `cfg` parameter.
By the way, after my testing, this bug seems to only appear in multi-GPU training, not single-GPU. The results of single-GPU training are normal and do not require my PR to fix.
Hey there! 👋 First off, huge thanks for making your PR explanation even clearer. You're definitely on point with the scenario you've described. It sounds like the crux of the issue is how the `cfg` argument interacts with the existing `overrides` during training.

Your example does a great job highlighting the subtle yet impactful bug. 🐛 It's clear how this situation could trip up the model configuration under multi-GPU training scenarios. We'll take a closer look into this, considering the multi-GPU context you've pointed out. Your effort to bring this to attention and provide a detailed explanation is much appreciated! Let's work together to streamline this part of the training process. 🚀
@glenn-jocher @DseidLi Yeah I think the model created from:

> ultralytics/ultralytics/engine/model.py, line 645 in d608565

Meanwhile I found another issue when putting `data` into `cfg` (which could also happen in user cases), and it turns out that the `data` would be overwritten by `custom` and the training would start from `coco8.yaml` by default:

> ultralytics/ultralytics/engine/model.py, lines 644 to 645 in d608565
Reproduce the data overwritten issue:

```python
from ultralytics.utils import DEFAULT_CFG_PATH, yaml_save, yaml_load
from ultralytics import YOLO

cfg = yaml_load(DEFAULT_CFG_PATH)
cfg["data"] = "coco128.yaml"
yaml_save("test_cfg.yaml", cfg)

model = YOLO("yolov8n.pt")
model.train(cfg="./test_cfg.yaml")
```

I expected the training to start from `coco128.yaml`.
@glenn-jocher @DseidLi I made an update to handle the priority issue of
Hi @Laughing-q I've tested the updates and both issues are resolved effectively. Everything looks good to me. Great work on addressing these bugs!
@DseidLi @Laughing-q thanks for contributions guys, but this is not that simple. We can NOT change cfg handling for a single method: there are many model methods (predict, val, train, export, track, etc.) that all currently handle the various levels of arguments in the same identical way, and this PR is breaking that consistency:

```python
args = {**overrides, **custom, **kwargs, "mode": "train"}  # highest priority args on the right
```
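As a plain-Python illustration of the priority chain quoted above (the values here are made up; only the merge pattern matches the engine code), the rightmost dict wins for duplicate keys in a `{**a, **b, **c}` merge:

```python
# Illustration of the argument-priority pattern: later dicts override
# earlier ones for duplicate keys in a dict-unpacking merge.
overrides = {"imgsz": 640, "data": "my_data.yaml"}  # e.g. loaded from a user cfg file
custom = {"data": "coco8.yaml"}                     # method defaults
kwargs = {"epochs": 10, "data": "coco128.yaml"}     # explicit call arguments

args = {**overrides, **custom, **kwargs, "mode": "train"}
print(args["data"])   # kwargs wins: coco128.yaml
print(args["imgsz"])  # only set in overrides, so it survives: 640
```

Note how `custom` sits between `overrides` and `kwargs`: a method default beats a cfg-file value but loses to an explicit keyword argument, which is exactly the precedence question this thread is about.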
@glenn-jocher yep I saw that, but figured that

> ultralytics/ultralytics/engine/model.py, line 643 in d608565

EDIT: We can revert the
@Laughing-q @DseidLi we have an existing argument called `custom` for exactly this purpose:

```python
custom = {"data": DEFAULT_CFG_DICT["data"] or TASK2DATA[self.task], ... anything else here}  # method defaults
```

The current merge order is:

```python
overrides = yaml_load(checks.check_yaml(kwargs["cfg"])) if kwargs.get("cfg") else self.overrides
custom = {"data": DEFAULT_CFG_DICT["data"] or TASK2DATA[self.task]}  # method defaults
args = {**overrides, **custom, **kwargs, "mode": "train"}  # highest priority args on the right
```

You can add any keys you want to the `custom` dict:

```diff
- custom = {"data": DEFAULT_CFG_DICT["data"] or TASK2DATA[self.task]}  # method defaults
+ custom = {"data": DEFAULT_CFG_DICT["data"] or TASK2DATA[self.task], "model": self.overrides["model"], "task": self.task}
```
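To sketch the effect of this suggestion in isolation (hypothetical values stand in for `self.overrides` and `self.task`; this is not the exact engine code):

```python
# Sketch: when a cfg file is passed, `overrides` is rebuilt from that file
# and no longer contains the model path or task recorded at load time.
cfg_file_overrides = {"imgsz": 224, "hsv_h": 0.015}  # parsed from the user's cfg
load_time = {"model": "yolov8s-cls.pt", "task": "classify"}  # set during _load

# Before: model/task are simply absent from the merged args.
custom_before = {"data": "imagenet"}
args_before = {**cfg_file_overrides, **custom_before, "mode": "train"}

# After: carrying model/task through `custom` preserves them.
custom_after = {"data": "imagenet",
                "model": load_time["model"],
                "task": load_time["task"]}
args_after = {**cfg_file_overrides, **custom_after, "mode": "train"}

print("model" in args_before)  # False
print(args_after["model"])     # yolov8s-cls.pt
```

Because `custom` still sits left of `kwargs` in the merge, an explicit `model=`/`task=` keyword argument would continue to take precedence, keeping the method-wide consistency intact.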
@glenn-jocher yeah you are right! We should move

```python
overrides["data"] = overrides.get("data") or DEFAULT_CFG_DICT["data"] or TASK2DATA[self.task]
```
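The `or`-chained default in that line can be illustrated standalone (the dict contents below are hypothetical; only the fall-through pattern matches):

```python
# Falls through to the first truthy value: the user's setting, then the
# global default, then the task-specific dataset.
DEFAULT_CFG_DICT = {"data": None}     # assume no global default is set
TASK2DATA = {"classify": "imagenet"}  # hypothetical task -> dataset map
task = "classify"

overrides = {}  # user did not set "data"
overrides["data"] = overrides.get("data") or DEFAULT_CFG_DICT["data"] or TASK2DATA[task]
print(overrides["data"])  # imagenet
```

One caveat of `or`-chaining: a falsy-but-valid user value (e.g. an empty string) would also fall through to the defaults.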
@glenn-jocher @DseidLi This commit 0637bfb should be better. :) EDIT: I just forgot about the
@Laughing-q hey that is a lot better!!
@Laughing-q @DseidLi PR merged, nice work guys!! @DseidLi let us know if you spot any other problems :)
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: Ultralytics AI Assistant <135830346+UltralyticsAssistant@users.noreply.github.com>
Co-authored-by: Laughing-q <1185102784@qq.com>
Pull Request Overview

This PR addresses a bug related to configuration overrides in the training code of the YOLO model within the `ultralytics` framework. The issue was identified during routine training operations and stems from the improper handling of the `overrides` dictionary, which could potentially lead to missing critical model parameters during training sessions.

Background

While working with the YOLO model, instantiated from the `ultralytics.engine.model` class, I observed that model parameters could be lost during the configuration merging process. This realization came about during a deep dive into how the `ultralytics.engine.model` module initializes and loads models.

Issue Description
Here is a succinct breakdown of the problematic behavior observed:

Model Initialization:
The model is created with `model = YOLO('models/pretrained/yolov8s-cls.pt')`, which relies on the underlying mechanism provided by `ultralytics.engine.model.py`. The constructor either builds a new model from a `.yaml` or `.yml` config file or proceeds to load an existing model with methods like `_load`.

Model Configuration Handling:
During loading (`_load`), `overrides` entries that carry important model parameters like the model path and task are set. When a new configuration is later passed via the `cfg` parameter, the existing `overrides` dictionary is completely replaced by the new configuration without merging in these existing parameters.

Consequences:
The replacement drops the model path (`model`) and the task type (`task`), which are vital for the model's training operations.

Solution

The proposed fix involves modifying the configuration merging process to ensure that any new configurations are merged with existing ones rather than replacing them outright. This preserves all necessary parameters and ensures stable and reliable model training operations.
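A minimal sketch of the merge-instead-of-replace idea (the function name and values are hypothetical, not the exact PR diff):

```python
# Hypothetical sketch: merge a newly loaded cfg into the existing overrides
# instead of replacing them, so load-time keys like "model"/"task" survive.
def merge_cfg(existing_overrides: dict, new_cfg: dict) -> dict:
    # Keys set by the new cfg win; keys it does not touch are preserved.
    return {**existing_overrides, **new_cfg}

existing = {"model": "models/pretrained/yolov8s-cls.pt", "task": "classify"}
new_cfg = {"imgsz": 224, "hsv_h": 0.015}

merged = merge_cfg(existing, new_cfg)
print(merged["model"])  # still present: models/pretrained/yolov8s-cls.pt
print(merged["imgsz"])  # taken from the new cfg: 224
```

With a plain replacement (`overrides = new_cfg`), `merged["model"]` would not exist at all, which is the failure mode described above.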
Technical Details

In this code, I first load the model using `model = YOLO('models/pretrained/yolov8s-cls.pt')`. At this point, the `model` object is an instance of the class `YOLO(Model)`, which inherits from `Model` via `from ultralytics.engine.model import Model`.

Then I explored `ultralytics.engine.model`, which certainly requires initialization. I found that it initializes in `ultralytics/engine/model.py`, specifically on this line: `self._load(model, task=task)`.

The initialization process of the `_load` method is detailed in `ultralytics/engine/model.py`; `weights` there is actually something like `models/pretrained/yolov8s-cls.pt`.

As you can see, this process updates the `overrides` dictionary and includes the `model` parameter, which is the path `models/pretrained/yolov8s-cls.pt` from `model = YOLO('models/pretrained/yolov8s-cls.pt')`. However, the original code in `ultralytics/engine/model.py` replaces the dictionary directly, which leads to the loss of parameters, specifically the recently set `self.overrides["model"]` and `self.overrides["task"]`. This causes issues when the trainer is called later on (see `ultralytics/models/yolo/classify/train.py`): the model's name is lost, so `str(self.model)` unexpectedly becomes `None`. Therefore, I have fixed this bug.
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions

🌟 Summary
Enhanced configuration handling in the training method.

📊 Key Changes
- Adjusted handling of the `data`, `model`, and `task` configuration parameters, ensuring that user-defined settings (`cfg`) take precedence if provided.

🎯 Purpose & Impact