
Hyperparameter Tuning error #7995

Closed · 1 task done
xaiopi opened this issue Feb 3, 2024 · 10 comments
Labels
question Further information is requested Stale

Comments


xaiopi commented Feb 3, 2024

Search before asking

Question

After consulting the documentation, I tried hyperparameter tuning, but when using ray.tune() each iteration reported a training failure, causing the hyperparameter search to fail.

# code:
from ultralytics import YOLO

model = YOLO('C://Users//13229//Desktop//ultralytics-main//yolov8n.pt')

model.tune(data='coco8.yaml', epochs=30, iterations=300, optimizer='AdamW', plots=False, save=False, val=False)

# error output:
Tuner: Initialized Tuner instance with 'tune_dir=runs\detect\tune3'
Tuner: 💡 Learn about tuning at https://docs.ultralytics.com/guides/hyperparameter-tuning
Tuner: Starting iteration 1/300 with hyperparameters: {'lr0': 0.01, 'lrf': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'box': 7.5, 'cls': 0.5, 'dfl': 1.5, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0, 'copy_paste': 0.0}
WARNING ❌️ training failure for hyperparameter tuning iteration 1
Command '['yolo', 'train', 'task=detect', 'mode=train', 'model=C://Users//13229//Desktop//ultralytics-main//yolov8n.pt', 'data=coco8.yaml', 'epochs=30', 'time=None', 'patience=50', 'batch=16', 'imgsz=640', 'save=False', 'save_period=-1', 'cache=False', 'device=None', 'workers=8', 'project=None', 'name=None', 'exist_ok=False', 'pretrained=True', 'optimizer=AdamW', 'verbose=True', 'seed=0', 'deterministic=True', 'single_cls=False', 'rect=False', 'cos_lr=False', 'close_mosaic=10', 'resume=False', 'amp=True', 'fraction=1.0', 'profile=False', 'freeze=None', 'multi_scale=False', 'overlap_mask=True', 'mask_ratio=4', 'dropout=0.0', 'val=False', 'split=val', 'save_json=False', 'save_hybrid=False', 'conf=None', 'iou=0.7', 'max_det=300', 'half=False', 'dnn=False', 'plots=False', 'source=None', 'vid_stride=1', 'stream_buffer=False', 'visualize=False', 'augment=False', 'agnostic_nms=False', 'classes=None', 'retina_masks=False', 'embed=None', 'show=False', 'save_frames=False', 'save_txt=False', 'save_conf=False', 'save_crop=False', 'show_labels=True', 'show_conf=True', 'show_boxes=True', 'line_width=None', 'format=torchscript', 'keras=False', 'optimize=False', 'int8=False', 'dynamic=False', 'simplify=False', 'opset=None', 'workspace=4', 'nms=False', 'lr0=0.01', 'lrf=0.01', 'momentum=0.937', 'weight_decay=0.0005', 'warmup_epochs=3.0', 'warmup_momentum=0.8', 'warmup_bias_lr=0.1', 'box=7.5', 'cls=0.5', 'dfl=1.5', 'pose=12.0', 'kobj=1.0', 'label_smoothing=0.0', 'nbs=64', 'hsv_h=0.015', 'hsv_s=0.7', 'hsv_v=0.4', 'degrees=0.0', 'translate=0.1', 'scale=0.5', 'shear=0.0', 'perspective=0.0', 'flipud=0.0', 'fliplr=0.5', 'mosaic=1.0', 'mixup=0.0', 'copy_paste=0.0', 'auto_augment=randaugment', 'erasing=0.4', 'crop_fraction=1.0', 'cfg=None', 'tracker=botsort.yaml']' returned non-zero exit status 1.
Saved runs\detect\tune3\tune_scatter_plots.png
Saved runs\detect\tune3\tune_fitness.png

Tuner: 1/300 iterations complete ✅ (121.81s)
Tuner: Results saved to runs\detect\tune3
Tuner: Best fitness=0.0 observed at iteration 1
Tuner: Best fitness metrics are {}
Tuner: Best fitness model is runs\detect\train13
Tuner: Best fitness hyperparameters are printed below.

Printing 'runs\detect\tune3\best_hyperparameters.yaml'

As you can see, nothing changed and the hyperparameters are still the defaults.

Additional

No response

@xaiopi xaiopi added the question Further information is requested label Feb 3, 2024

github-actions bot commented Feb 3, 2024

👋 Hello @xaiopi, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics


glenn-jocher (Member) commented

@xaiopi hello! It seems like you're encountering an issue with hyperparameter tuning where the training fails during the first iteration. This could be due to a variety of reasons, such as incorrect file paths, insufficient resources, or incompatible hyperparameter values.

To troubleshoot this, please ensure that:

  • Your dataset path and format are correct.
  • You have the necessary computational resources available.
  • The initial hyperparameters are within a reasonable range.

If the problem persists, consider running a single training session with the default hyperparameters to verify that your setup works outside of the tuning context. Also, review the error logs for any specific messages that could point to the cause of the failure.

For further guidance, you can refer to our documentation on hyperparameter tuning. If you need more personalized assistance, feel free to open an issue in the Ultralytics YOLOv8 repo with detailed error logs and system information. Our community is here to help! 🚀
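The first two checks in the list above can be scripted before committing to a 300-iteration run. This is a minimal standard-library sketch rather than an Ultralytics API, and the "reasonable range" bounds are my own assumptions:

```python
from pathlib import Path

def preflight(model_path, hyp):
    """Collect obvious problems before launching a long tuning run."""
    problems = []
    if not Path(model_path).is_file():
        problems.append(f"model weights not found: {model_path}")
    lr0 = hyp.get('lr0', 0.01)
    if not 0.0 < lr0 <= 1.0:
        problems.append(f"lr0={lr0} outside the sane range (0, 1]")
    if not 0.0 <= hyp.get('momentum', 0.937) < 1.0:
        problems.append("momentum outside [0, 1)")
    return problems

# Example: a missing weights file and an absurd learning rate are both flagged
issues = preflight('definitely_missing_weights.pt', {'lr0': 5.0})
```

Running such a check first makes an iteration-1 failure like the one above much easier to attribute.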


github-actions bot commented Mar 5, 2024

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label Mar 5, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 15, 2024
xaiopi (Author) commented Apr 7, 2024

@xaiopi hello! It seems like you're encountering an issue with hyperparameter tuning where the training fails during the first iteration. This could be due to a variety of reasons, such as incorrect file paths, insufficient resources, or incompatible hyperparameter values.

To troubleshoot this, please ensure that:

  • Your dataset path and format are correct.
  • You have the necessary computational resources available.
  • The initial hyperparameters are within a reasonable range.

If the problem persists, consider running a single training session with the default hyperparameters to verify that your setup works outside of the tuning context. Also, review the error logs for any specific messages that could point to the cause of the failure.

For further guidance, you can refer to our documentation on hyperparameter tuning. If you need more personalized assistance, feel free to open an issue in the Ultralytics YOLOv8 repo with detailed error logs and system information. Our community is here to help! 🚀


I have rechecked these.

But a new issue has occurred. When I use the example code from the official documentation, everything runs smoothly with no errors, but the fitness obtained for each iteration is 0, as shown below:
{'metrics/precision(B)': 0.0, 'metrics/recall(B)': 0.0, 'metrics/mAP50(B)': 0.0, 'metrics/mAP50-95(B)': 0.0, 'val/box_loss': 3.10227, 'val/cls_loss': 68.1803, 'val/dfl_loss': 4.03045, 'fitness': 0.0}

My code is here:

from ultralytics import YOLO

# Initialize the YOLO model
model = YOLO('yolov8n.pt')

# Tune hyperparameters on COCO8 for 30 epochs
model.tune(data='coco8.yaml', epochs=30, iterations=30, workers=4, optimizer='AdamW', plots=False, save=False, val=False)

The result is here:

Tuner: 1/30 iterations complete ✅ (42.44s)
Tuner: Results saved to runs/detect/tune9
Tuner: Best fitness=0.0 observed at iteration 1
Tuner: Best fitness metrics are {'metrics/precision(B)': 0.0, 'metrics/recall(B)': 0.0, 'metrics/mAP50(B)': 0.0, 'metrics/mAP50-95(B)': 0.0, 'val/box_loss': 3.10227, 'val/cls_loss': 68.1803, 'val/dfl_loss': 4.03045, 'fitness': 0.0}
Tuner: Best fitness model is runs/detect/train74
Tuner: Best fitness hyperparameters are printed below.

Printing 'runs/detect/tune9/best_hyperparameters.yaml'

Please help me; I have no idea what is going wrong.

Best wishes

glenn-jocher (Member) commented

Hey @xaiopi! 🌟 It sounds like your tuning process runs smoothly but you're seeing 0's across your fitness metrics. This is unusual and can sometimes result from the model not learning effectively from the training data. A few things you might want to check or try include:

  1. Validation Set: Ensure coco8.yaml points to the correct dataset and that your validation set is properly set up and not empty.
  2. Learning Rate: Starting with a very small or very large learning rate might lead to poor training. Experiment with adjusting lr0 in your tune() function.
  3. Data Augmentation: Overly aggressive augmentation can make the task too difficult, especially on a small dataset like COCO8. Consider scaling back.
  4. Run a Baseline: Try running a standard training session without tuning to ensure the model learns something with default hyperparameters.
model.train(data='coco8.yaml', epochs=30)

If you're still getting zeros for fitness, it might help to look more closely at the training logs for any warnings or errors that might clue you in on what's happening. Also, check that your COCO8 dataset is correctly formatted and accessible to the model during training.

Feel free to share more details or logs if you need further assistance. The Ultralytics community is here to support you! 🚀

xaiopi (Author) commented Apr 8, 2024


I've tried training the model directly with model.train(data='coco8.yaml', epochs=30); the result is normal and the metrics display correctly, but model.tune() still doesn't report the metrics properly. So I'm going to implement a GA hyperparameter search myself, but I'm currently stuck because I don't know how to load hyperparameters when training a model from Python; I only know how to load them from the terminal with yolo cfg=exp.yaml.

glenn-jocher (Member) commented

Hey @xaiopi! Glad to hear that direct training works well for you. 🎉 If you're looking into using GA for hyperparameter tuning and prefer to set up everything in Python rather than the terminal, you can load hyperparameters directly into your model initialization or training function like this:

from ultralytics import YOLO

# Load model with custom hyperparameters
model = YOLO('yolov8n.pt', cfg='path/to/your/exp.yaml')

# Or, directly pass hyperparameters during training
model.train(data='coco8.yaml', epochs=30, **{'lr0': 0.01, 'momentum': 0.937})

This way, you can adjust hyperparameters programmatically. Remember, the ** notation is used to unpack the dictionary directly into the function's arguments.

Keep experimenting and feel free to reach back if you need more help! 🚀
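The ** unpacking itself is plain Python and can be tried without the library; train_stub below is a hypothetical stand-in for model.train(), included only to show the mechanics:

```python
def train_stub(data, epochs, **hyperparams):
    # Stand-in for model.train(): merge everything it received into one config dict
    return {'data': data, 'epochs': epochs, **hyperparams}

# A dictionary of hyperparameter overrides, unpacked into keyword arguments
overrides = {'lr0': 0.01, 'momentum': 0.937}
cfg = train_stub(data='coco8.yaml', epochs=30, **overrides)
```

Each key in the dictionary arrives in the function as if it had been typed out as a keyword argument, so a GA loop can build the override dict programmatically between iterations.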

xaiopi (Author) commented Apr 16, 2024

# Or, directly pass hyperparameters during training
model.train(data='coco8.yaml', epochs=30, **{'lr0': 0.01, 'momentum': 0.937})

Thank you for your patient reply.

# Load model with custom hyperparameters
model = YOLO('yolov8n.pt', cfg='path/to/your/exp.yaml')

This command does not work properly, so I am currently using the other form:

# Or, directly pass hyperparameters during training
model.train(data='coco8.yaml', epochs=30, **{'lr0': 0.01, 'momentum': 0.937})

It works normally. In addition, my initial issue has been resolved, and the official GA hyperparameter search now runs; I just replaced the optimizer with 'auto', as in the following line:

model.tune(data='coco8.yaml', epochs=30, iterations=30, workers=4, optimizer='auto', plots=False, save=False, val=False)

I would like to know how to define the search range of hyperparameters when using model.tune(), how to tune some values while fixing others, and which hyperparameters are commonly tuned.
Thanks a lot.
Best Wishes!

xaiopi (Author) commented Apr 16, 2024


At the same time, I am still confused about the definition of fitness. The ultralytics/utils/metrics.py file does not explicitly define fitness for the masks and boxes used in segmentation training, so how is fitness evaluated for segmentation tasks? Thank you very much.

glenn-jocher (Member) commented

Hey there! 🌟 I'm thrilled that direct training is now functioning smoothly for you! Regarding the model.tune() and specifying hyperparameter search ranges or fixing certain parameters, currently in YOLOv8, model.tune() automatically selects hyperparameters based on predefined strategies when you set optimizer='auto'. Customizing specific hyperparameters to be tuned or fixed isn't directly supported through a simple API call but can be achieved through manual adjustment or implementation.

For commonly tuned hyperparameters, focusing on learning rate (lr0), momentum, and weight decay can significantly impact model performance. The optimizer type can also play a key role.
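For the manual-implementation route, the core GA step is simple to sketch in plain Python: mutate only the hyperparameters you list in a search space and copy the rest unchanged. The bounds, mutation probability, and sigma below are illustrative assumptions, not the Tuner's exact defaults:

```python
import random

def mutate(parent, space, mutation_prob=0.5, sigma=0.2, seed=None):
    """Mutate only the hyperparameters listed in `space`; others stay fixed.

    `space` maps a hyperparameter name to its (min, max) search range.
    Each listed gene is perturbed with probability `mutation_prob` by a
    Gaussian factor, then clipped back into its range.
    """
    rng = random.Random(seed)
    child = dict(parent)  # fixed hyperparameters are copied unchanged
    for key, (lo, hi) in space.items():
        if rng.random() < mutation_prob:
            value = parent[key] * (1.0 + rng.gauss(0.0, sigma))
            child[key] = min(max(value, lo), hi)  # clip to the search range
    return child

parent = {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005}
space = {'lr0': (1e-5, 1e-1), 'momentum': (0.6, 0.98)}  # weight_decay stays fixed
child = mutate(parent, space, seed=0)
```

Repeating this each iteration, training with the mutated child (e.g. via the ** unpacking shown earlier), and keeping the best-fitness parent is the essence of the GA search.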

Concerning your inquiry about fitness in segmentation tasks, the concept of fitness generally combines several metrics to form a composite score reflecting the model's overall performance. While metrics.py may not explicitly define fitness for segmentations like it does for detection, the evaluation often revolves around metrics such as Intersection over Union (IoU), binary cross-entropy for the masks, and possibly additional task-specific metrics.

In YOLOv8, similarly to detection, evaluations for segmentation tasks would rely on the model's ability to accurately predict the segment masks compared to the ground truth, with fitness potentially being an aggregate measure reflecting these accuracies.
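As a rough illustration of what such a composite score can look like: for detection, fitness is conventionally a weighted sum of the four main metrics with mAP50-95 dominating, and a segmentation fitness can plausibly combine the box and mask scores. The segmentation aggregation below is an assumption for illustration, not the exact metrics.py implementation:

```python
def detection_fitness(precision, recall, map50, map50_95):
    # Weighted combination; mAP50-95 dominates, per the detection convention
    weights = (0.0, 0.0, 0.1, 0.9)
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(weights, metrics))

def segmentation_fitness(box_metrics, mask_metrics):
    # One plausible aggregate: sum the box and mask fitness scores
    return detection_fitness(*box_metrics) + detection_fitness(*mask_metrics)

# Example: perfect boxes (P, R, mAP50, mAP50-95), imperfect masks
f = segmentation_fitness((1.0, 1.0, 1.0, 1.0), (0.8, 0.7, 0.6, 0.5))
```

This also explains the all-zero fitness seen earlier in the thread: with val=False, no mAP values are produced, so any weighted sum of them is 0.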

Keep exploring and refining your models! If more specific customization or clarification is needed, our team at Ultralytics is always here to dive deeper! 🚀
