Multiprocessing problem when training YOLOV8 on dataset #10475

Open
Notenlish opened this issue Apr 30, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@Notenlish

Question

I'm trying to train a YOLOv8 model using the ultralytics module.

My code looks like this:

from ultralytics import YOLO

model = YOLO('yolov8n.pt')

metrics = model.train(data="datasets/deneme-5k/data.yaml", device="cuda", batch=2)

I'm using this dataset: https://universe.roboflow.com/rizwan-babar/ms_thesis/dataset/10, which has around 5k images.

Everything seems to run fine, but for some reason it throws a multiprocessing error at the start. In a similar issue people said there must be enough RAM, but I don't think that's the case here: at peak memory usage I still had around 4 GB of free RAM and 3.3 GB of free VRAM.

Any help would be appreciated, thanks.

Additional

Here's the log:

Ultralytics YOLOv8.2.5 🚀 Python-3.11.9 torch-2.3.0+cu121 CUDA:0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=datasets/deneme-5k/data.yaml, epochs=100, time=None, patience=100, batch=2, imgsz=640, save=True, save_period=-1, cache=False, device=cuda, workers=8, project=None, name=train12, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train12
Overriding model.yaml nc=80 with nc=7

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    752677  ultralytics.nn.modules.head.Detect           [7, [64, 128, 256]]
Model summary: 225 layers, 3012213 parameters, 3012197 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs\detect\train12', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning C:\Users\ihsan\npmbugsolve\UAV_teknofest\datasets\deneme-5k\train\labels.cache... 3408 images, 7 b
Ultralytics YOLOv8.2.5 🚀 Python-3.11.9 torch-2.3.0+cu121 CUDA:0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=datasets/deneme-5k/data.yaml, epochs=100, time=None, patience=100, batch=2, imgsz=640, save=True, save_period=-1, cache=False, device=cuda, workers=8, project=None, name=train13, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train13
Overriding model.yaml nc=80 with nc=7

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    752677  ultralytics.nn.modules.head.Detect           [7, [64, 128, 256]]
Model summary: 225 layers, 3012213 parameters, 3012197 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs\detect\train13', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning C:\Users\ihsan\npmbugsolve\UAV_teknofest\datasets\deneme-5k\train\labels.cache... 3408 images, 7 b
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 131, in _main   
    prepare(preparation_data)
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 246, in prepare 
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "c:\Users\ihsan\npmbugsolve\UAV_teknofest\ai\a.py", line 5, in <module>
    metrics = model.train(data="datasets/deneme-5k/data.yaml", device="cuda", batch=2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\model.py", line 673, in train
    self.trainer.train()
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\trainer.py", line 199, in train
    self._do_train(world_size)
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\trainer.py", line 313, in _do_train
    self._setup_train(world_size)
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\trainer.py", line 277, in _setup_train
    self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=RANK, mode="train")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\models\yolo\detect\train.py", line 55, in get_dataloader
    return build_dataloader(dataset, batch_size, workers, shuffle, rank)  # return dataloader
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\data\build.py", line 137, in build_dataloader
    return InfiniteDataLoader(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\data\build.py", line 41, in __init__
    self.iterator = super().__iter__()
                    ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
    w.start()
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 121, in start 
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html

@Notenlish Notenlish added the question label Apr 30, 2024
@Notenlish Notenlish changed the title from "Multiprocessing error when training YOLOV8 on dataset" to "Multiprocessing problem when training YOLOV8 on dataset" Apr 30, 2024
@glenn-jocher
Member

Hey there! It looks like the issue you're encountering is related to how the multiprocessing module starts new processes on Windows. Because Windows uses the spawn start method, scripts that use multiprocessing (which includes PyTorch dataloader workers) need an if __name__ == '__main__': guard; otherwise each child process re-imports and re-runs the main module, which is why your log shows the trainer setup printed twice.

To correct this issue in your code, you'll want to wrap your training call in an if __name__ == '__main__': block like this:

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolov8n.pt')
    metrics = model.train(data="datasets/deneme-5k/data.yaml", device="cuda", batch=2)

Essentially, this ensures that your training only begins if your script is run as the main program. It prevents the recursive importing that's happening when using multiprocessing on Windows. Give this a try, and it should resolve the multiprocessing error you're seeing. Let me know how it goes! 🚀
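
If you want to rule out worker spawning entirely while debugging, here is a minimal alternative sketch, assuming the same script and the workers training argument that already appears in your log:

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolov8n.pt')
    # workers=0 keeps data loading in the main process, so Windows never has
    # to spawn dataloader worker processes. The __main__ guard above is still
    # the recommended fix; this is only a fallback for debugging.
    metrics = model.train(
        data="datasets/deneme-5k/data.yaml",
        device="cuda",
        batch=2,
        workers=0,
    )

Keep in mind that single-process data loading is slower, so this is mainly useful for confirming that the error comes from worker spawning.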

@RyanSunn

Thank you, that is working.

@glenn-jocher
Member

Great to hear that the solution worked for you! If you encounter any more issues or have further questions, feel free to reach out. Happy training with YOLOv8! 🚀
