Multiprocessing problem when training YOLOV8 on dataset #10475

Open
Notenlish opened this issue Apr 30, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@Notenlish

Question

I'm trying to train a YOLOv8 model using the ultralytics module.

My code looks like this:

from ultralytics import YOLO

model = YOLO('yolov8n.pt')

metrics = model.train(data="datasets/deneme-5k/data.yaml", device="cuda", batch=2)

I'm using this dataset: https://universe.roboflow.com/rizwan-babar/ms_thesis/dataset/10, which has around 5k images.

Everything seems to run fine, but for some reason it throws a multiprocessing error at the start. In a similar issue people said there must be enough RAM, but I don't think that's the case here: at peak memory usage I still had around 4 GB of free RAM and 3.3 GB of free VRAM.

Any help would be appreciated, thanks.

Additional

Here's the log:

Ultralytics YOLOv8.2.5 🚀 Python-3.11.9 torch-2.3.0+cu121 CUDA:0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=datasets/deneme-5k/data.yaml, epochs=100, time=None, patience=100, batch=2, imgsz=640, save=True, save_period=-1, cache=False, device=cuda, workers=8, project=None, name=train12, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train12
Overriding model.yaml nc=80 with nc=7

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    752677  ultralytics.nn.modules.head.Detect           [7, [64, 128, 256]]
Model summary: 225 layers, 3012213 parameters, 3012197 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs\detect\train12', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning C:\Users\ihsan\npmbugsolve\UAV_teknofest\datasets\deneme-5k\train\labels.cache... 3408 images, 7 b
Ultralytics YOLOv8.2.5 🚀 Python-3.11.9 torch-2.3.0+cu121 CUDA:0 (NVIDIA GeForce RTX 3050 Ti Laptop GPU, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=datasets/deneme-5k/data.yaml, epochs=100, time=None, patience=100, batch=2, imgsz=640, save=True, save_period=-1, cache=False, device=cuda, workers=8, project=None, name=train13, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train13
Overriding model.yaml nc=80 with nc=7

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    752677  ultralytics.nn.modules.head.Detect           [7, [64, 128, 256]]
Model summary: 225 layers, 3012213 parameters, 3012197 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs\detect\train13', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning C:\Users\ihsan\npmbugsolve\UAV_teknofest\datasets\deneme-5k\train\labels.cache... 3408 images, 7 b
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 131, in _main   
    prepare(preparation_data)
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 246, in prepare 
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "c:\Users\ihsan\npmbugsolve\UAV_teknofest\ai\a.py", line 5, in <module>
    metrics = model.train(data="datasets/deneme-5k/data.yaml", device="cuda", batch=2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\model.py", line 673, in train
    self.trainer.train()
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\trainer.py", line 199, in train
    self._do_train(world_size)
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\trainer.py", line 313, in _do_train
    self._setup_train(world_size)
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\engine\trainer.py", line 277, in _setup_train
    self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=RANK, mode="train")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\models\yolo\detect\train.py", line 55, in get_dataloader
    return build_dataloader(dataset, batch_size, workers, shuffle, rank)  # return dataloader
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\data\build.py", line 137, in build_dataloader
    return InfiniteDataLoader(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\ultralytics\data\build.py", line 41, in __init__
    self.iterator = super().__iter__()
                    ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ihsan\npmbugsolve\UAV_teknofest\venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
    w.start()
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 121, in start 
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\İhsan\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html

@Notenlish Notenlish added the question label Apr 30, 2024
@Notenlish Notenlish changed the title from "Multiprocessing error when training YOLOV8 on dataset" to "Multiprocessing problem when training YOLOV8 on dataset" Apr 30, 2024
@glenn-jocher
Member

Hey there! It looks like the issue you're encountering is related to how the multiprocessing module starts new processes on Windows. Because Windows uses the spawn start method, scripts that use multiprocessing (which includes PyTorch dataloader workers) need an if __name__ == '__main__': guard; otherwise each child process re-imports and re-runs the main module, which is why your log shows the trainer setup printed twice.

To correct this issue in your code, you'll want to wrap your training call in an if __name__ == '__main__': block like this:

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolov8n.pt')
    metrics = model.train(data="datasets/deneme-5k/data.yaml", device="cuda", batch=2)

Essentially, this ensures that your training only begins if your script is run as the main program. It prevents the recursive importing that's happening when using multiprocessing on Windows. Give this a try, and it should resolve the multiprocessing error you're seeing. Let me know how it goes! 🚀
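
If you want to rule out worker spawning entirely while debugging, here is a minimal alternative sketch, assuming the same script and the workers training argument that already appears in your log:

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolov8n.pt')
    # workers=0 keeps data loading in the main process, so Windows never has
    # to spawn dataloader worker processes. The __main__ guard above is still
    # the recommended fix; this is only a fallback for debugging.
    metrics = model.train(
        data="datasets/deneme-5k/data.yaml",
        device="cuda",
        batch=2,
        workers=0,
    )

Keep in mind that single-process data loading is slower, so this is mainly useful for confirming that the error comes from worker spawning.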

@RyanSunn

Thank you, that is working.

@glenn-jocher
Member

Great to hear that the solution worked for you! If you encounter any more issues or have further questions, feel free to reach out. Happy training with YOLOv8! 🚀
