Train different tasks at same time #20

acai66 · 2020-06-06T08:56:10Z

🚀 Feature

Train different tasks at the same time.

Motivation

A machine often has multiple GPUs, so we should be able to train different models at the same time. However, outputs and results are currently stored in the same directory, so concurrent runs may conflict.

Pitch

Split outputs and results, including weights, into separate directories.

Alternatives

Additional context

I made a temporary change to train.py so I can train different tasks, but I really hope this function will be officially supported.
Thanks.

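    wdir = 'weights' + os.sep + opt.name + os.sep  # weights dir, one per run name
    os.makedirs(wdir, exist_ok=True)  # also creates 'weights' if it is missing
    last = wdir + 'last.pt'
    best = wdir + 'best.pt'
    results_dir = 'logs' + os.sep + opt.name + os.sep  # results dir, one per run name
    os.makedirs(results_dir, exist_ok=True)  # must exist before results.txt is written
    results_file = results_dir + 'results.txt'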

github-actions · 2020-06-06T08:56:52Z

Hello @acai66, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI surveillance systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

glenn-jocher · 2020-06-07T05:34:56Z

@acai66 yes, you make a good point. We use multiple Docker containers on a single machine to run several single-GPU trainings simultaneously.

Without Docker containers you might simply copy the directory, one per GPU.
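
The copy-per-GPU workaround can be sketched roughly as below. This is only an illustration, not something from the repository: the helper names `clone_per_gpu` and `gpu_env` are hypothetical, and it assumes each training process is restricted to one device through the standard `CUDA_VISIBLE_DEVICES` environment variable.

```python
import os
import shutil
from pathlib import Path


def clone_per_gpu(src, dst_root, gpu_ids):
    """Copy the working directory once per GPU and return {gpu_id: copy_path}.

    Hypothetical helper: each copy gets its own weights and results files,
    so concurrent runs cannot overwrite each other.
    """
    copies = {}
    for gpu in gpu_ids:
        dst = Path(dst_root) / f"{Path(src).name}_gpu{gpu}"
        shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
        copies[gpu] = dst
    return copies


def gpu_env(gpu, base_env=None):
    """Return an environment in which only the given GPU is visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return env
```

Each run could then be started from its own copy, for example with `subprocess.Popen([sys.executable, "train.py"], cwd=copies[gpu], env=gpu_env(gpu))`, adding whatever training arguments the run needs.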

For a more comprehensive solution, we might be better off depositing all run-related items (jpgs, results.txt, checkpoints etc.) into the unique ./runs directory that TensorBoard already creates automatically when a training run starts. What do you think?

acai66 · 2020-06-07T09:14:08Z

good idea, thanks

glenn-jocher · 2020-06-07T16:13:20Z

The unique directory is defined in yolov5/train.py, lines 394 to 399 in b810b21:

    # Train
    if not opt.evolve:
        tb_writer = SummaryWriter(comment=opt.name)
        print('Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/')
        train(hyp)

The resulting directory can be read back from the writer:

    tb_writer.log_dir
    Out[3]: 'runs/Jun07_09-10-55_Glenns-MBP.attlocal.net'
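
As a rough sketch of the proposal, the per-run file locations could be derived from that log directory instead of from fixed top-level folders. The function name `run_paths` and the `weights` subfolder layout are assumptions made for illustration; the thread does not specify them.

```python
from pathlib import Path


def run_paths(log_dir):
    """Derive per-run output paths from a unique run directory.

    Hypothetical layout: checkpoints go in <log_dir>/weights and the
    results file sits directly in <log_dir>.
    """
    run = Path(log_dir)
    wdir = run / "weights"
    wdir.mkdir(parents=True, exist_ok=True)  # creates the run directory too
    return {
        "last": wdir / "last.pt",
        "best": wdir / "best.pt",
        "results": run / "results.txt",
    }
```

Calling `run_paths(tb_writer.log_dir)` at the start of training would give every run its own `last.pt`, `best.pt`, and `results.txt`, so two runs on the same machine never write to the same file.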

glenn-jocher · 2020-06-17T19:22:41Z

@acai66 see #104, this PR seems to address many of your concerns. Perhaps you could look it over and give feedback to the PR author.

github-actions · 2020-08-01T05:27:34Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

acai66 added the enhancement New feature or request label Jun 6, 2020

glenn-jocher mentioned this issue Jun 17, 2020

Log command line options, hyperparameters, and weights per run in runs/ #104

Merged

matinhosseiny mentioned this issue Jun 23, 2020

RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM #185

Closed

DLLXW mentioned this issue Jul 3, 2020

RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /tmp/pip-req-build-9d9zypi6/torchvision/csrc/cuda/nms_cuda.cu:127) #281

Closed

github-actions bot added the Stale label Aug 1, 2020

Polary-L mentioned this issue Aug 7, 2020

I changed backbone and head of yolov5s,train process done,error happened in detect process. #658

Closed

github-actions bot closed this as completed Aug 12, 2020

wuzuiyuzui mentioned this issue Nov 27, 2020

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR #1546

Closed

jerryWTMH mentioned this issue Dec 22, 2020

RuntimeError: CUDA error: unspecified launch failure #1752

Closed

113HQ mentioned this issue Feb 21, 2021

Dear author, can you provide a visualization scheme for YOLOV5 feature graphs during detect.py? Thank you! #2259

Closed

alicera mentioned this issue Jun 23, 2021

different gpus to train #3736

Closed

coallar mentioned this issue Sep 18, 2021

CUDA error: the launch timed out and was terminated #4851

Closed

zldrobit pushed a commit to zldrobit/yolov5 that referenced this issue Sep 3, 2022

Merge pull request ultralytics#20 from Laughing-q/instance_seg

2eb1a71

speed up evaluation

manole-alexandru added a commit to manole-alexandru/yolov5-uolo that referenced this issue Apr 25, 2023

ultralytics#20 Added extra traditional CV input

a0a1074

manole-alexandru added a commit to manole-alexandru/yolov5-uolo that referenced this issue Apr 25, 2023

Extra preprocessed input (Fixed issues) ultralytics#20

37a5080

cool112624 mentioned this issue May 16, 2023

DDP training with multiple gpu using wsl #11519

Closed

1 task

jcluo1994 mentioned this issue Oct 10, 2023

Using multi-GPU training reports errors #12213

Closed

1 task
