dataset yaml not found error #1201

Closed
1 of 2 tasks
luhgit opened this issue Feb 28, 2023 · 78 comments
Labels
bug Something isn't working Stale

Comments

@luhgit

luhgit commented Feb 28, 2023

Search before asking

  • I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Training

Bug

After I created the dataset and its train, test and val splits and stored them in the dataset/ directory, I tried to train the model using the Python API but ran into this "Dataset config yaml file not found" error. The path it complains about is an old path that no longer exists, because I renamed the project directory. In the screenshot, the test_notebook/ directory does not exist, but YOLO assumes the config still exists at this old path.

[screenshot]

[screenshot]

Then I tried the CLI and got the same problem. Since I was running all this in JupyterLab, I thought the notebook might have cached the old path somewhere, so I tried the terminal instead, but the problem persisted. Could this be something to do with YOLOv8? Has anyone faced such issues?

Environment

  • YOLO: Ultralytics YOLOv8== 8.0.43
  • OS: MacOS Ventura
  • Python: 3.9.13

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@luhgit luhgit added the bug Something isn't working label Feb 28, 2023
@glenn-jocher
Member

glenn-jocher commented Feb 28, 2023

👋 Hello! Thanks for asking about YOLOv8 🚀 dataset formatting. To train correctly your data must be in YOLO format. Please see our Train Custom Data tutorial for full documentation on dataset setup and all steps required to start training your first model. This applies to both YOLOv5 and YOLOv8. A few excerpts from the tutorial:

1.1 Create dataset.yaml

COCO128 is an example small tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths) and 2) a class names dictionary:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  ...
  77: teddy bear
  78: hair drier
  79: toothbrush
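A quick sanity check can catch the most common config mistakes before training starts. The sketch below is a hypothetical helper, not part of Ultralytics; it assumes the YAML has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`) and checks only the conventions shown above: `train` and `val` are present, and `names` maps zero-based class indices with no gaps.

```python
def check_data_cfg(cfg: dict) -> list:
    """Return human-readable problems found in a parsed dataset config.

    Hypothetical helper, not part of Ultralytics: it only checks the
    conventions shown in the coco128.yaml example above.
    """
    problems = []
    # 'train' and 'val' are required; 'test' is optional.
    for key in ("train", "val"):
        if not cfg.get(key):
            problems.append(f"missing required key '{key}'")
    names = cfg.get("names")
    if isinstance(names, list):  # a plain list means indices 0..n-1
        names = dict(enumerate(names))
    if not isinstance(names, dict) or not names:
        problems.append("'names' must be a non-empty dict or list")
    elif set(names) != set(range(len(names))):
        problems.append("class indices in 'names' must be 0..n-1 with no gaps")
    return problems


# Usage, with the YAML already loaded into a dict:
cfg = {"path": "../datasets/coco128", "train": "images/train2017",
       "val": "images/train2017", "names": {0: "person", 1: "bicycle"}}
print(check_data_cfg(cfg))  # an empty list means no problems were found
```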

1.2 Create Labels

After using a tool like Roboflow Annotate to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects in image, no *.txt file is required). The *.txt file specifications are:

  • One row per object
  • Each row is in class x_center y_center width height format.
  • Box coordinates must be in normalized xywh format (from 0 - 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
  • Class numbers are zero-indexed (start from 0).
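The normalization rule in the last two bullets can be sketched as follows, assuming your annotation tool exports pixel-space center coordinates plus width and height. `to_yolo_row` is a hypothetical name, and writing six decimal places is just a formatting choice.

```python
def to_yolo_row(cls_id: int, x_center: float, y_center: float,
                width: float, height: float, img_w: int, img_h: int) -> str:
    """Convert one pixel-space box (center x, center y, width, height) to a YOLO label row."""
    # Normalize: x values by image width, y values by image height.
    xc, w = x_center / img_w, width / img_w
    yc, h = y_center / img_h, height / img_h
    for value in (xc, yc, w, h):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"normalized value {value:.4f} is outside [0, 1]; check the box or image size")
    # Class index first, then the four normalized numbers, space-separated.
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A 64x48 px box centred in a 640x480 image, labelled as class 0 (person):
print(to_yolo_row(0, 320, 240, 64, 48, 640, 480))  # 0 0.500000 0.500000 0.100000 0.100000
```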

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):

1.3 Organize Directories

Organize your train and val images and labels according to the example below. YOLO assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLO locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
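The images-to-labels rule can be sketched like this. `img_to_label_path` is hypothetical and only mirrors the rule as stated; it assumes forward-slash paths, and Ultralytics' own implementation may differ in details (for example, it works with the OS path separator).

```python
import posixpath


def img_to_label_path(img_path: str) -> str:
    """Map an image path to its label path as described above.

    Replaces the LAST '/images/' segment with '/labels/' and swaps the file
    extension for '.txt'. Forward-slash paths are assumed for simplicity.
    """
    head, sep, tail = img_path.rpartition("/images/")
    if not sep:
        raise ValueError(f"no '/images/' directory in {img_path!r}")
    stem, _ext = posixpath.splitext(tail)  # drop .jpg/.png, keep any sub-folders
    return f"{head}/labels/{stem}.txt"


print(img_to_label_path("../datasets/coco128/images/im0.jpg"))  # ../datasets/coco128/labels/im0.txt
```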

Good luck 🍀 and let us know if you have any other questions!

@pax7

pax7 commented Feb 28, 2023

@glenn-jocher that tutorial link may be pointing at something other than what you intended.

Do you have a tutorial link for training yolov8*.yaml with one or two classes,

so that I can run this code and train my custom model?


from ultralytics import YOLO

modelDetector = YOLO("yolov8mCUSTOM.yaml")  # build a new model from scratch

results = modelDetector.train(data='CUSTOM.yaml', epochs=1)  # train the model
results = modelDetector.val()  # evaluate model performance on the validation set
success = modelDetector.export(format='onnx')  # export the model to ONNX format


does CUSTOM.yaml look like this:


path: ../datasets/custom2
train: images/train
val: images/val
test: images/test

# Classes
names:
  0: class1
  1: class2

and does yolov8mCUSTOM.yaml look like this for 2 classes or are there any other changes needed?


# Parameters
nc: 2  # number of classes
depth_multiple: 0.67  # scales module repeats
width_multiple: 0.75  # scales convolution channels

# YOLOv8.0m backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [768, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [768, True]]
  - [-1, 1, SPPF, [768, 5]]  # 9

# YOLOv8.0m head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [768]]  # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)


EDIT: the answer to the above seems to be yes

@luhgit
Author

luhgit commented Mar 1, 2023

@glenn-jocher: I have followed the steps you mentioned, but the problem still persists. This is what my dataset.yaml looks like:
[screenshot]

I have now moved my project directory to a completely new location but still, it looks for the old path where I created the project initially. This old path does not exist anymore hence it throws an error.

I even updated the path in the yaml file as you can see in the screenshot but it somehow is still looking for the old path.

Update!: It works though when I recreate the old path and move the project files to this old path location. Seems like this path is cached somewhere.

@ShiKeQuan

ShiKeQuan commented Mar 2, 2023

I ran into the same problem yesterday:
[screenshot]
It seems that a fixed path was generated when the code ran for the first time.

@ShiKeQuan

My new path is "D:/ultralytics-main/".
After testing, I found that this path is linked to "./ultralytics/yolo/utils/__init__.py":
[screenshot]
The path can't be changed.

@ShiKeQuan

I solved this problem by switching to another computer.
[screenshot]

@ShiKeQuan

In fact, you can change the path in your *.yaml; the code ran successfully after I changed the path.

@TimbusCalin

TimbusCalin commented Mar 4, 2023

So the solution to this is one of the following:

  1. Train on Ubuntu (a UNIX system).
  2. Train on Windows, check where the network "expects" the input dataset to be, and put the dataset in that specific "default/hardcoded" folder.
  3. Train on Windows and provide the full path in data.yaml, e.g. D:\folder_1\folder_2\datasets\my_dataset\train\images as the full path for the train images.

@glenn-jocher
Member

glenn-jocher commented Mar 4, 2023

@TimbusCalin @ShiKeQuan I think you guys might be referring to your dataset path in your yolo settings. Just type yolo settings to see this info and directly modify your settings YAML if you'd like to update the path.
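To make the "cached path" symptom concrete: as far as we can tell from the YOLOv8 source of this era, a relative `path` in the dataset YAML is joined onto the `datasets_dir` entry of the settings file, and in these versions `datasets_dir` appears to default to a `datasets` folder under the directory where YOLO first ran. The sketch below is a simplified, hypothetical model of that lookup (`resolve_splits` is not a library function); it shows why renaming the project folder leaves YOLO looking in the old location, while an absolute `path` is unaffected.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_splits(cfg: dict, datasets_dir: str) -> dict:
    """Simplified, hypothetical model of how split folders get resolved.

    A relative `path` is joined onto `datasets_dir` (from the settings file),
    then each split is joined onto that root. An absolute `path` ignores
    `datasets_dir` entirely.
    """
    root = PurePosixPath(cfg["path"])
    if not root.is_absolute():
        root = PurePosixPath(datasets_dir) / root
    return {
        split: posixpath.normpath(str(root / cfg[split]))  # collapse any '..'
        for split in ("train", "val", "test")
        if cfg.get(split)
    }


cfg = {"path": "my_dataset", "train": "images/train", "val": "images/val"}
# datasets_dir still points at the project folder used on the first run:
print(resolve_splits(cfg, "/Users/me/old_project/datasets"))
# An absolute path is unaffected by datasets_dir:
cfg_abs = dict(cfg, path="/Users/me/new_project/datasets/my_dataset")
print(resolve_splits(cfg_abs, "/Users/me/old_project/datasets"))
```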

@luhgit
Author

luhgit commented Mar 5, 2023

Update!

Replacing the relative path with the absolute path to your dataset directory in yaml file solves the issue on Apple M2 Mac.
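One way to apply this fix without hand-editing is to generate data.yaml with an absolute `path`. The sketch below assumes a simple flat config like the examples in this thread; `absolute_data_yaml` is a hypothetical helper, and it writes YAML by hand to avoid a PyYAML dependency, so class names should not contain `:` or `#`.

```python
from pathlib import Path


def absolute_data_yaml(dataset_root: str, train: str, val: str, names: list) -> str:
    """Build data.yaml text with an absolute `path` (hypothetical helper).

    `train`/`val` stay relative to that root, as in the coco128.yaml example.
    Class names are written unquoted, so keep them free of ':' and '#'.
    """
    root = Path(dataset_root).expanduser().resolve()  # absolute, symlinks resolved
    lines = [f"path: {root.as_posix()}", f"train: {train}", f"val: {val}", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


print(absolute_data_yaml(".", "images/train", "images/val", ["class1", "class2"]))
```

For example, `Path("data.yaml").write_text(absolute_data_yaml("datasets/my_dataset", "images/train", "images/val", ["class1", "class2"]))` writes the file (paths and names here are placeholders); rerun it whenever the project moves.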

@github-actions

github-actions bot commented Apr 5, 2023

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label Apr 5, 2023
@luhgit
Author

luhgit commented Apr 14, 2023

I have encountered this issue again. This time I am running a classification model.

Here is my config.yaml file:
[screenshot]

The dataset directory structure under datasets/:

[screenshot]

Each of these folders (train/val/test) contains class folders (A/B/C/D) which contain images.

I do not understand why it always looks for an incorrect path, i.e. /Users/aamit/datasets/dataset_config.yaml/train. Why is it appending the yaml file to the path?

This is the error message that I get:

Ultralytics YOLOv8.0.77 🚀 Python-3.9.16 torch-2.0.0 CPU
yolo/engine/trainer: task=classify, mode=train, model=yolov8s-cls.pt, data=dataset_config.yaml, epochs=15, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=0, resume=False, amp=True, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=runs/classify/train3

Dataset not found ⚠️, missing path /Users/aamit/datasets/dataset_config.yaml, attempting download...
Downloading https://github.com/ultralytics/yolov5/releases/download/v1.0/dataset_config.yaml.zip to /Users/aamit/datasets/dataset_config.yaml.zip...
⚠️ Download failure, retrying 1/3 https://github.com/ultralytics/yolov5/releases/download/v1.0/dataset_config.yaml.zip...
######################################################################## 100.0%
Unzipping /Users/aamit/datasets/dataset_config.yaml.zip to /Users/aamit/datasets...
Dataset download success ✅ (0.6s), saved to /Users/aamit/datasets/dataset_config.yaml

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
File /usr/local/Caskroom/miniconda/base/envs/common_x86/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py:120, in BaseTrainer.__init__(self, cfg, overrides, _callbacks)
    119 if self.args.task == 'classify':
--> 120     self.data = check_cls_dataset(self.args.data)
    121 elif self.args.data.endswith('.yaml') or self.args.task in ('detect', 'segment'):

File /usr/local/Caskroom/miniconda/base/envs/common_x86/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py:303, in check_cls_dataset(dataset)
    302 nc = len([x for x in (data_dir / 'train').glob('*') if x.is_dir()])  # number of classes
--> 303 names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()]  # class names list
    304 names = dict(enumerate(sorted(names)))

File /usr/local/Caskroom/miniconda/base/envs/common_x86/lib/python3.9/site-packages/ultralytics/yolo/data/utils.py:303, in <listcomp>(.0)
    302 nc = len([x for x in (data_dir / 'train').glob('*') if x.is_dir()])  # number of classes
--> 303 names = [x.name for x in (data_dir / 'train').iterdir() if x.is_dir()]  # class names list
    304 names = dict(enumerate(sorted(names)))

File /usr/local/Caskroom/miniconda/base/envs/common_x86/lib/python3.9/pathlib.py:1160, in Path.iterdir(self)
   1157 """Iterate over the files in this directory.  Does not yield any
   1158 result for the special paths '.' and '..'.
   1159 """
-> 1160 for name in self._accessor.listdir(self):
   1161     if name in {'.', '..'}:
   1162         # Yielding a path object for these makes little sense

FileNotFoundError: [Errno 2] No such file or directory: '/Users/aamit/datasets/dataset_config.yaml/train'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[10], line 2
      1 model = YOLO('yolov8s-cls.pt')
----> 2 model.train(data='dataset_config.yaml', epochs=15, imgsz=640)

File /usr/local/Caskroom/miniconda/base/envs/common_x86/lib/python3.9/site-packages/ultralytics/yolo/engine/model.py:362, in YOLO.train(self, **kwargs)
    360     overrides['resume'] = self.ckpt_path
    361 self.task = overrides.get('task') or self.task
--> 362 self.trainer = TASK_MAP[self.task][1](overrides=overrides, _callbacks=self.callbacks)
    363 if not overrides.get('resume'):  # manually set model only if not resuming
    364     self.trainer.model = self.trainer.get_model(weights=self.model if self.ckpt else None, cfg=self.model.yaml)

File /usr/local/Caskroom/miniconda/base/envs/common_x86/lib/python3.9/site-packages/ultralytics/yolo/v8/classify/train.py:20, in ClassificationTrainer.__init__(self, cfg, overrides, _callbacks)
     18     overrides = {}
     19 overrides['task'] = 'classify'
---> 20 super().__init__(cfg, overrides, _callbacks)

File /usr/local/Caskroom/miniconda/base/envs/common_x86/lib/python3.9/site-packages/ultralytics/yolo/engine/trainer.py:126, in BaseTrainer.__init__(self, cfg, overrides, _callbacks)
    124             self.args.data = self.data['yaml_file']  # for validating 'yolo train data=url.zip' usage
    125 except Exception as e:
--> 126     raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
    128 self.trainset, self.testset = self.get_dataset(self.data)
    129 self.ema = None

RuntimeError: Dataset 'dataset_config.yaml' error ❌ [Errno 2] No such file or directory: '/Users/aamit/datasets/dataset_config.yaml/train'

@github-actions github-actions bot removed the Stale label Apr 15, 2023
@treoa

treoa commented Apr 17, 2023

I got the same error; it kept looking in a different directory.
I resolved the issue by following this message from the log: dataset download directory is '/mnt/nvme1_1T/senior/ultralytics/examples/datasets'. You can update this in '/root/.config/Ultralytics/settings.yaml'

@glenn-jocher
Member

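@treoa it seems that the dataset path for the classification model above is not being correctly recognized by the YOLOv8 engine. The error message shows that, because the path could not be found locally, YOLO treated it as a dataset name, tried to download it, and then looked for it in the wrong directory.

One likely reason is the data argument itself. In the traceback, check_cls_dataset() treats data as a dataset directory and looks for a train/ folder inside it, which is why passing dataset_config.yaml produces the path .../dataset_config.yaml/train. For classification, pass the dataset root folder (the one containing train/, val/ and test/) instead of a YAML file. For detection and segmentation YAMLs, double-check the path, train, val and test entries.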

Another solution is to update your dataset directory in ~/.config/Ultralytics/settings.yaml. Change datasets_dir to match your actual dataset directory, which may stop datasets from being looked up in, or downloaded to, the wrong directory.
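Where settings.yaml lives depends on the operating system, which is why `~/.config/Ultralytics` is not found on every machine. Running `yolo settings` is the reliable way to see the active settings. The sketch below only encodes our understanding of the default locations (Linux `~/.config`, macOS `~/Library/Application Support`, Windows `AppData\Roaming`) and of a `YOLO_CONFIG_DIR` environment-variable override; treat both as assumptions and confirm against your installed version.

```python
import os
import platform
from pathlib import Path
from typing import Optional


def guess_settings_path(system: Optional[str] = None, home: Optional[Path] = None) -> Path:
    """Best guess at where settings.yaml lives (an assumption, not Ultralytics' code)."""
    override = os.getenv("YOLO_CONFIG_DIR")  # assumed override variable; see lead-in
    if override:
        return Path(override) / "settings.yaml"
    system = system or platform.system()
    home = home or Path.home()
    if system == "Windows":
        base = home / "AppData" / "Roaming"
    elif system == "Darwin":  # macOS
        base = home / "Library" / "Application Support"
    else:  # Linux, matching the /root/.config/... path in the log above
        base = home / ".config"
    return base / "Ultralytics" / "settings.yaml"


path = guess_settings_path()
print(path, "(exists)" if path.exists() else "(not found; run `yolo settings` to check)")
```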

Once you have updated the dataset path or the download directory, you should be able to run your classification model without any issues.
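For classification, the traceback above shows class names being read from the sub-folders of `train/` inside the `data` directory. The hypothetical helper below repeats that logic so you can check a dataset folder before training, and it fails with a clearer message when it is handed a YAML file instead.

```python
from pathlib import Path


def cls_class_names(data_dir: str) -> dict:
    """Derive class names the way the traceback above does: sorted sub-folders of train/.

    Hypothetical helper that fails early, with a clearer message, when it is
    handed a YAML file instead of a classification dataset folder.
    """
    root = Path(data_dir)
    if root.is_file():
        raise ValueError(f"{root} is a file; pass the dataset folder that contains train/ and val/")
    train = root / "train"
    if not train.is_dir():
        raise FileNotFoundError(f"expected a train/ folder inside {root}")
    names = sorted(p.name for p in train.iterdir() if p.is_dir())
    return dict(enumerate(names))
```

If it returns the expected classes, pass the same folder to training, e.g. `model.train(data="/abs/path/to/my_cls_dataset", epochs=15, imgsz=640)` (placeholder path).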

@realgump

So how can we conveniently train more than one YOLOv8 project? The YOLO engine always uses the same settings in ~/.config/Ultralytics/settings.yaml, while different projects need different settings.

@Votun

Votun commented Apr 24, 2023

I also ran into this error.
I managed to solve it by replacing "data.yml" with "data.yaml". The file extension wasn't an issue in YOLOv5.
I believe the root cause is in validator.py, line 124, of the ultralytics package.

@glenn-jocher
Member

It seems that the error experienced by @Votun is due to a file naming mismatch. It can be resolved by ensuring that the filename passed in the code matches the actual filename on disk.

When working with external files, the filename and extension you pass must match the file exactly. If the code references data.yaml but the file is saved as data.yml (or vice versa), the file cannot be located and an error is raised.

It is worth double-checking filenames and extensions during development so the program can access the files it needs. In general, clear and meaningful error messages also help other developers and end users understand and fix such issues at runtime.
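A small defensive check can turn the .yml/.yaml mismatch into an explicit message. `find_data_yaml` is a hypothetical helper; it only looks for `<stem>.yaml` and, failing that, reports a `<stem>.yml` twin rather than renaming anything.

```python
from pathlib import Path


def find_data_yaml(folder: str, stem: str = "data") -> Path:
    """Return folder/<stem>.yaml, or explain clearly if only a .yml twin exists.

    Hypothetical helper; it does not rename anything on disk.
    """
    base = Path(folder)
    yaml_file = base / f"{stem}.yaml"
    if yaml_file.is_file():
        return yaml_file
    yml_file = base / f"{stem}.yml"
    if yml_file.is_file():
        raise FileNotFoundError(
            f"found {yml_file.name} but not {yaml_file.name}; rename it or pass the exact filename"
        )
    raise FileNotFoundError(f"{yaml_file} does not exist")
```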

@HasanBeratSoke

[screenshot]
[screenshot]

I had the same issue with data.yaml in a Colab notebook.
I added a path starting with /content/ to data.yaml,
and it works for me, guys :)

@glenn-jocher
Member

@HasanBeratSoke hello,

We appreciate your feedback and we are glad to hear that adding the path /content/ into the data.yaml file has resolved the issue for you.

For others who may encounter a similar issue, the data.yaml file may need small adjustments for the environment it runs in. data.yaml lists the dataset root and the paths to the images used to train the YOLOv8 model.

To make sure the dataset path is specified correctly, set the root directory in data.yaml to a full absolute path, such as /content/dataset/ in Colab (assuming the dataset is in the /content folder). Note that a leading forward slash alone makes a path absolute from the filesystem root, so /dataset/ only works if the dataset actually lives there.

We hope that this helps and please let us know if you encounter any other issues.

@jaideep11061982

.config/Ultralytics/settings.yaml

@glenn-jocher I am unable to find this ghost file.
Could you please explain how and where we can find it? This is adding to our debug time for a simple thing.

@ghost

ghost commented Jul 27, 2023

Screenshot 2023-07-27 183026

How to fix?

@glenn-jocher
Member

@jaideep11061982 hello,

The error message "no module named 'utils'" indicates that Python is unable to locate the module 'utils'. The 'utils' module is likely specific to the YOLOv8 project and should be located within the project's directory.

Now, there could be two reasons why you're facing this issue:

  1. The 'utils' module is missing from the project's directory. Ensure that the 'utils' module file is present in the project directory.

  2. If the 'utils' module file is present, the issue could be related to Python's module search path. The Python interpreter might be looking in the wrong place for the 'utils' module. The search path, sys.path (which can be extended with the PYTHONPATH environment variable), must include the directory that contains 'utils.py'.

To verify this, import the 'sys' module in your script and add print(sys.path). This prints the list of directories Python currently searches for modules. Make sure the directory containing 'utils.py' is in that list; if it is not, add it.
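A slightly more direct check than printing `sys.path` alone is to ask the import system where a module would be loaded from. The sketch below uses the standard-library `importlib.util.find_spec`; `where_is` is just an illustrative name.

```python
import importlib.util
import sys
from typing import Optional


def where_is(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from.

    Returns None if Python cannot find it (or if it has no single file,
    e.g. a namespace package).
    """
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# Directories Python searches, in order:
for entry in sys.path:
    print(entry)
# A path here that is not your utils.py means another 'utils' module is shadowing it.
print("utils ->", where_is("utils"))
```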

If these steps don't resolve your issue, it could be helpful to provide more information so that one can further diagnose the problem. It would be helpful to know more about your development environment, e.g. operating system, Python version, and the project's directory structure.

Hope this helps! Let us know if you need any further assistance.

@ftmhabibii

ftmhabibii commented Jul 29, 2023

How do I add the 'utils.py' module to the Python path?
I also encounter this error:
[screenshot]

@glenn-jocher
Member

@ftmhabibii in order to import the 'utils.py' file, Python must be able to locate it. The Python interpreter searches for modules in the directories listed in sys.path. That list starts with the directory of the script you are running, followed by any directories in the PYTHONPATH environment variable, and then installation-dependent defaults such as the standard library and site-packages.

To add the 'utils.py' module to the Python directory path, you'll need to ensure that the directory containing 'utils.py' is included in sys.path. This can be done by adding a line of code at the top of your script that appends the 'utils.py' module's directory path to sys.path.

Here is the general syntax for what that would look like:

import sys
sys.path.append('/path/to/your/utils')

Replace '/path/to/your/utils' with the actual directory path where your 'utils.py' file is located.

Remember to include this line of code at the start of your script to ensure Python can find 'utils.py' whenever it needs to import it.

If your 'utils.py' file is in the same directory as your main script, you shouldn't have to modify sys.path at all, Python should be able to find it without any issues.

Lastly, ensure that your 'utils.py' is error-free and is named correctly (including the .py file extension).

I hope this helps! Let me know if you have any further questions.

@github-actions github-actions bot added the Stale label Aug 30, 2023
@Akucoolzz

Hi, I am currently facing some issues with this. Would anyone be able to help me? Thank you

[screenshot]
[screenshot]

@github-actions github-actions bot added the Stale label Dec 13, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 25, 2023
@sjussjs

sjussjs commented Jan 11, 2024

Hello @glenn-jocher, please help me.
Why can't it find the images that I have already put in the folders?
[screenshot]

@glenn-jocher
Member

@sjussjs hello,

It appears there may be an issue with the path configuration in your dataset. Please ensure that your data.yaml file correctly points to the directories where your images are stored. The train and val entries are resolved relative to the dataset root given by path, so using an absolute path there is the safest option. Also verify that your images are in a supported format such as .jpg or .png. If the paths and formats are correct, the images should be found during training. If you continue to face issues, please provide more details about your directory structure and data.yaml contents.

@sjussjs

sjussjs commented Jan 13, 2024

Thank you, sir @glenn-jocher.
After training on the pictures, why is it unable to detect objects in a picture that was included in the training?
This is my first time using Linux and learning to code.
Screenshot from 2024-01-14 02-03-18

@glenn-jocher
Copy link
Member

Hello @sjussjs,

If your model is trained but not detecting objects in images it was trained on, consider the following:

  1. Check Confidence Threshold: The default confidence threshold might be too high. Try lowering it when running predictions.
  2. Review Training Process: Ensure the model has trained adequately with a suitable learning rate and for enough epochs.
  3. Inspect Annotations: Confirm that the training data was correctly labeled and that the model was exposed to a variety of examples for each class.
  4. Evaluate Model: Use the Val mode to evaluate the model on a validation set and review the metrics provided.

For further assistance, please share details about your training process, including the number of epochs, batch size, and any output logs that indicate the model's performance during training.

@sjussjs

sjussjs commented Jan 15, 2024

I switched to training with 338 images of tomatoes in two classes, ripe and unripe, downloaded from Kaggle: https://www.kaggle.com/datasets/techkhid/riped-and-unriped-tomato-dataset/discussion
Should I do this in Colab instead?

Screenshot from 2024-01-15 19-27-31

Screenshot from 2024-01-15 19-27-51

@glenn-jocher
Copy link
Member

@sjussjs hello,

Training with a specific dataset like the one you've downloaded for ripe and unripe tomatoes should work fine. Whether you use Colab or another environment depends on your preference and resource availability. Colab can provide free access to GPUs, which can speed up training. Just ensure your dataset is correctly formatted for YOLOv8, with a proper data.yaml file, and that you've set the paths correctly. If you encounter any issues, please provide details about the error messages or problems you're facing.

@Johnny-zbb

@glenn-jocher Hi, I am having the same problem with the tune() function:
[screenshot]
[screenshot]
It doesn't recognize my data.yaml, but training with the train() function works. I checked that my directories are all correct:
[screenshot]
[screenshot]
Can you help me with this, please?

@glenn-jocher
Copy link
Member

@Johnny-zbb hello,

It seems there might be an issue with how the tune() function is accessing your data.yaml. Please ensure that the file path is correctly specified and accessible from your current working directory. If train() works with the same data.yaml, it suggests the file and paths are correct, but there might be a context issue within the tune() function. Double-check the working directory and the path passed to tune(). If the problem persists, please provide the exact error message and the command you're using to call tune().

@Johnny-zbb

@glenn-jocher Hi, I was using Jupyter before; this time I ran it as a Python script, and it still says that my dataset does not exist:

2024-02-03 05:37:10,115 ERROR tune_controller.py:1374 -- Trial task failed for trial _tune_42e52_00003
Traceback (most recent call last):
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ImplicitFunc.train() (pid=247922, ip=172.19.0.9, actor_id=3cafa13616b52c49613a1a8001000000, repr=_tune)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ultralytics/data/utils.py", line 253, in check_det_dataset
    file = check_file(dataset)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ultralytics/utils/checks.py", line 460, in check_file
    raise FileNotFoundError(f"'{file}' does not exist")
FileNotFoundError: 'dataset/data.yaml' does not exist

The above exception was the direct cause of the following exception:

ray::ImplicitFunc.train() (pid=247922, ip=172.19.0.9, actor_id=3cafa13616b52c49613a1a8001000000, repr=_tune)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 342, in train
    raise skipped from exception_cause(skipped)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/air/_internal/util.py", line 88, in run
    self._ret = self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 115, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 332, in _trainable_func
    output = fn()
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ultralytics/utils/tuner.py", line 104, in _tune
    results = model_to_train.train(**config)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ultralytics/engine/model.py", line 351, in train
    self.trainer = (trainer or self._smart_load('trainer'))(overrides=args, _callbacks=self.callbacks)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ultralytics/models/yolo/segment/train.py", line 30, in __init__
    super().__init__(cfg, overrides, _callbacks)
  File "/root/miniconda3/envs/pytorch21/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 120, in __init__
    raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset 'dataset/data.yaml' error ❌ 'dataset/data.yaml' does not exist

But my file does exist. Here is my code and environment:

import os
from ultralytics import YOLO

print(os.getcwd())

file_path = "/root/coding/yolo-seg/tune/dataset/data.yaml"
if os.path.exists(file_path):
    print(f"The file {file_path} exists.")
else:
    print(f"The file {file_path} does not exist.")

model = YOLO('yolov8n-seg.pt')
results = model.train(data='dataset/data.yaml', epochs=1, imgsz=640)

# Define a YOLO model
model = YOLO("yolov8n-seg.pt")

# Run Ray Tune on the model
result_grid = model.tune(data="dataset/data.yaml",
                         epochs=100,
                         use_ray=True)

Command line run directory:/coding/yolo-seg/tune

@glenn-jocher
Member

Hello @Johnny-zbb,

The error indicates that the tune() function cannot find the data.yaml file at the specified location. Since you've confirmed the file exists and the path is correct, the working directory is likely different when tune() runs. With use_ray=True, each trial runs in a Ray worker, and Ray Tune may set that worker's working directory to the trial's own directory, so a relative path such as dataset/data.yaml no longer resolves.

Please use an absolute path for the data.yaml file when calling the tune() function to avoid such path-related issues.

If the problem persists, verify that the environment where you're running the script has the necessary permissions to access the file and that there are no typos in the file path.
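A plausible explanation for why train() finds the file but tune(..., use_ray=True) does not: in Ray 2.x, Ray Tune by default changes each trial worker's working directory to that trial's own log directory, so a relative path such as 'dataset/data.yaml' no longer resolves from inside the trial. One way around this, sketched below under that assumption, is to turn the path into an absolute one in the launching script, before tune() starts any trials. resolve_data_yaml is a hypothetical helper, not part of the Ultralytics API.

```python
import os
from pathlib import Path


def resolve_data_yaml(path, base=None):
    """Hypothetical helper: turn a possibly relative data.yaml path into an absolute one.

    base defaults to the working directory at call time, i.e. the directory the
    script was launched from, before Ray Tune starts any trial workers.
    """
    p = Path(path).expanduser()
    if not p.is_absolute():
        p = Path(base or os.getcwd()) / p
    p = p.resolve()
    if not p.is_file():
        raise FileNotFoundError(f"data.yaml not found at {p}")
    return str(p)
```

Usage, assuming the layout from the question: model.tune(data=resolve_data_yaml("dataset/data.yaml"), epochs=100, use_ray=True). Resolving against the launch-time working directory keeps the path identical to what a plain train() call sees.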

@Johnny-zbb

@glenn-jocher Hello, I am running my code from the /root/coding/yolo-seg/tune directory, and data.yaml is located at /root/coding/yolo-seg/tune/dataset/data.yaml. The os.path.exists(file_path) check confirms that the file exists.

@glenn-jocher
Member

Hello @Johnny-zbb,

Given that you've verified the existence of data.yaml using os.path.exists(file_path), it seems there might be a discrepancy in how the path is being resolved within the tune() function. Please try using the absolute path to data.yaml when calling tune() to ensure the correct file is referenced. If the issue persists, it may be related to the environment or context in which tune() is executed.

@kyatha

kyatha commented Feb 6, 2024

@glenn-jocher Hey, I'm facing the same issue, and using the absolute path doesn't solve it: RuntimeError: Dataset '/work/bbu/xxx/PeopleCounting/gender_data/data.yaml' error ❌ [Errno 20] Not a directory: '/work/bbu/xxx/PeopleCounting/gender_data/data.yaml/train'

My data.yaml is in /work/bbu/xxx/PeopleCounting/gender_data/, while the data is in /work/bbu/xxx/PeopleCounting/PeopleCounting/gender_data/dataset.

data.yaml has the following info:

```yaml
train: gender_data/dataset/train/images
val: gender_data/dataset/val/images

nc: 2 # Number of classes (male and female)
names: ['male', 'female'] # Class names
```

```python
# Load the model.
model_gender = YOLO('models/yolov8m-cls.pt')

result_grid = model_gender.tune(data="/work/bbu/xxx/PeopleCounting/gender_data/data.yaml", epochs=50, gpu_per_trial=1, use_ray=True)
```

@glenn-jocher
Member

@kyatha hello,

The error suggests that the tune() function is appending /train to the data.yaml path, which should not happen if the file structure is correct. Ensure that the paths in your data.yaml are relative to the location of data.yaml itself and not the working directory. Also, double-check that there are no typos or formatting issues in data.yaml. If the problem continues, consider validating the structure and content of your data.yaml against the expected format in the documentation.

@kyatha

kyatha commented Feb 14, 2024


It works with a custom model but not with the classification model. Is the data supposed to be organized differently? In a different post, you mentioned having separate folders for the images (i.e., male and female) within the train and val folders.
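The Not a directory: '.../data.yaml/train' error is consistent with how Ultralytics handles classification datasets: for *-cls models, data is expected to be a dataset directory containing train/ and val/ subfolders, each with one subfolder per class (here male/ and female/), rather than a data.yaml file, so the trainer ends up looking for a train folder inside the YAML path. A minimal sketch, assuming that folder-based layout, for checking a directory before passing it as data; check_cls_layout is a hypothetical helper.

```python
from pathlib import Path


def check_cls_layout(root):
    """Hypothetical helper: list the class folders of each split in a folder-based classification dataset."""
    root = Path(root)
    if root.is_file():
        # e.g. a data.yaml path: classification expects the dataset directory instead
        raise ValueError(f"{root} is a file; expected a dataset directory")
    layout = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is one subdirectory, e.g. train/male and train/female
        layout[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    return layout
```

Once the images are reorganized into train/male, train/female, val/male and val/female, the dataset directory itself, not data.yaml, would be passed as data= to tune().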

@glenn-jocher
Member

@kyatha it seems like the issue you're encountering is related to the dataset configuration YAML file not being found at the expected location. This could be due to a few reasons:

  1. The dataset YAML file path is hardcoded or cached somewhere in your environment or code.
  2. The model you're trying to train has retained the old path as an attribute from a previous training session.
  3. There might be an issue with the working directory in Jupyter Lab or the terminal.

Here are a few steps you can take to resolve the issue:

  1. Clear Jupyter Cache: If you're using Jupyter Lab, try to clear any cached outputs and restart the kernel to ensure it's not holding onto the old path.

  2. Check Model Attributes: If you're loading a pre-trained model or a checkpoint, it might have the old dataset path saved as an attribute. You can check and update this by accessing the model's data attribute and setting it to the correct path.

  3. Verify Working Directory: Ensure that your working directory is set to the correct location where your dataset YAML file exists. You can check the current working directory with os.getcwd() and change it with os.chdir(path) if necessary.

  4. Update Dataset YAML Path: When initializing the YOLO object or calling the train method, explicitly pass the correct path to your dataset YAML file using the data argument.

  5. Check YAML File: Ensure that the dataset YAML file is correctly formatted and contains the right paths to your train, val, and test splits.

  6. Use Absolute Paths: To avoid any confusion with relative paths, consider using absolute paths when specifying the location of your dataset YAML file.

  7. Recreate YAML File: If the issue persists, try creating a new dataset YAML file from scratch and ensure it's saved in the correct location.
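Steps 5 and 6 can be checked from Python before training. A minimal sketch, assuming data.yaml has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and that each split entry is a single path: it joins relative split entries to a dataset root you pass in and reports any that do not exist. Which root Ultralytics actually uses depends on the path key in data.yaml; if that key is relative, recent versions appear to resolve it against the datasets_dir setting, so an absolute path key is the least ambiguous choice. missing_splits is a hypothetical helper.

```python
from pathlib import Path


def missing_splits(cfg, root):
    """Hypothetical helper: return the split entries of a parsed data.yaml that do not exist.

    cfg  -- dict parsed from data.yaml
    root -- dataset root that relative split entries are joined to
    Assumes each split entry is a single path string (not a list).
    """
    root = Path(root)
    missing = {}
    for split in ("train", "val", "test"):
        entry = cfg.get(split)
        if not entry:
            continue  # "test" is usually optional
        p = Path(entry)
        full = p if p.is_absolute() else root / p
        if not full.exists():
            missing[split] = str(full)
    return missing
```

An empty result means every listed split resolves under the chosen root.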

If you continue to face issues, please provide the exact command or code snippet you're using to initialize the YOLO object and start the training process, as well as the contents of your dataset YAML file. This will help in diagnosing the problem more accurately.

@RoadToML

RoadToML commented Mar 10, 2024

Hey @glenn-jocher I am running into the same issue with model.tune() and haven't been able to solve it based on the suggestions provided.

A minimal example would be:

```python
from ultralytics import YOLO

model = YOLO('path/to/yolov8s.yaml')

# This works - data.yaml is found
model.train(data='path/to/data.yaml', cfg='path/to/cfg.yaml', **other_params)

# This does not work - data.yaml is not found
model.tune(data='same/path/to/data.yaml', **other_params)
```

What could be the issue?

@crapthings

@RoadToML

#8823

What is inside cfg? Is there an example?

@RoadToML

RoadToML commented Mar 11, 2024


In my cfg file, the data param was empty (as I provided it in train()).

The solution in my case was to provide the data.yaml path in both the config file and the tune() call.

I think this is a bug, since we shouldn't need to provide it twice (considering that leaving it out of the cfg file works for train()).

@glenn-jocher
Member

Hey @RoadToML 👋,

Thanks for reaching out! The cfg parameter points to a configuration file that overrides the default training settings, such as hyperparameters and other arguments. It's used to customize the training process.

For tune(), a workaround for now is indeed to specify the data.yaml path both in your config file and directly as a parameter to tune(). Here's a quick example of how you might set it up:

```yaml
# Inside your cfg.yaml
data: path/to/data.yaml
# Other configurations...
```

And then in your Python script:

```python
model.tune(data='path/to/data.yaml', cfg='path/to/cfg.yaml', **other_params)
```

We'll look into why this discrepancy exists between train() and tune()—it does sound like something we need to address. Thanks for bringing this to our attention! 🛠️

@bozekry

bozekry commented Mar 18, 2024

```
TypeError: 'str' object does not support item assignment

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/ultralytics/engine/trainer.py in __init__(self, cfg, overrides, _callbacks)
140 self.args.data = self.data["yaml_file"] # for validating 'yolo train data=url.zip' usage
141 except Exception as e:
--> 142 raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
143
144 self.trainset, self.testset = self.get_dataset(self.data)

RuntimeError: Dataset '/content/drive/MyDrive/data/data.yaml' error ❌ 'str' object does not support item assignment
```

Can anyone help me, please?

@glenn-jocher
Member

@bozekry hey there! 🌟 It seems like the issue you're encountering comes from trying to assign a value to a string, which is not allowed in Python. Given the context, it looks like there's an attempt to modify the self.args.data string directly in trainer.py.

Here’s a quick suggestion: ensure that any modifications to paths or configurations are done using correct data structures or through methods designed for such updates. If you’re trying to update paths or configurations, use a dictionary or similar data structure rather than a string.

Without knowing the exact intention behind assigning values to self.args.data, it's a bit challenging to provide an exact solution. However, if you're looking to dynamically set the path to your dataset's YAML file, consider adjusting how you're handling the data attribute or argument.

If the issue persists or if there's more to the context, feel free to share more details! Happy coding! 😊
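The message itself is Python's standard error for assigning to a string by key. One plausible cause, an assumption not confirmed in this thread: if data.yaml does not contain key: value pairs at its top level (for example, it holds only a single path, or the YAML is malformed), the parser returns a str instead of a dict, and the loader's attempt to record the file name in it fails with exactly this error. The sketch below reproduces the error and shows a type check that turns it into a clearer message; record_yaml_file is a hypothetical helper, not Ultralytics code.

```python
def record_yaml_file(data, yaml_file):
    """Hypothetical helper: store the source file name in a parsed YAML config."""
    if not isinstance(data, dict):
        # A YAML document whose top level is a plain scalar parses to str, not dict
        raise ValueError(
            f"{yaml_file} must contain key: value pairs at the top level, got {type(data).__name__}"
        )
    data["yaml_file"] = yaml_file
    return data


# Reproduce the original error: item assignment on a str
parsed = "/content/drive/MyDrive/data"  # what a path-only YAML file would parse to
try:
    parsed["yaml_file"] = "data.yaml"
except TypeError as e:
    print(e)  # 'str' object does not support item assignment
```

If this is the cause, opening data.yaml and confirming it contains mappings such as train:, val: and names: should resolve it.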

@srikarnikhil

@glenn-jocher I tried every solution here, but none of them worked. I connected my Google Drive to Google Colab and am accessing the data from there.

Here is my setup.
Error message:
image

Google Drive file hierarchy:
image

coco.yaml file:
image

I'm not sure what the issue is; I've been trying this for a week now, but I keep getting the same error.

The error says "You can update this in '/root/.config/Ultralytics/settings.yaml'".

I updated the file with my actual location (I also tried without quotes in datasets_dir:, but that didn't work either).

image

Could someone please help me?

@glenn-jocher
Member

Hey there! 😊 It looks like you've encountered quite a stubborn issue with setting up your dataset from Google Drive in Colab. Let's try to address this. First, ensure that your Google Drive is correctly mounted in Colab:

```python
from google.colab import drive
drive.mount('/content/drive')
```

After mounting, verify that the paths in your data.yaml point to the dataset's location within Google Drive, similar to this:

```yaml
train: /content/drive/MyDrive/your_dataset_path/train/images
val: /content/drive/MyDrive/your_dataset_path/val/images
```

If you've already done that and updated settings.yaml but still face issues, try directly providing the path to the data.yaml file when initializing your model or running training/prediction, bypassing the need to modify settings.yaml:

```python
from ultralytics import YOLO
model = YOLO(data='/content/drive/MyDrive/your_dataset_path/data.yaml')
```

Ensure there are no typos in paths and that you're using the full, correct path to your dataset.

Hope this helps! If you're still stuck, could you share the exact error message you're seeing? That could give us more clues. 🕵️‍♂️
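To avoid depending on the working directory or the datasets_dir setting, data.yaml can name the dataset root as an absolute path key, with train and val given relative to it; this path/train/val/names layout is the one Ultralytics' own dataset YAMLs (such as coco128.yaml) use. A minimal sketch, assuming a root/train/images and root/val/images layout and class names that need no YAML quoting; write_data_yaml is a hypothetical helper.

```python
from pathlib import Path


def write_data_yaml(root, names, out=None):
    """Hypothetical helper: write a minimal detection data.yaml with an absolute dataset root.

    Assumes the layout root/train/images and root/val/images.
    """
    root = Path(root).resolve()
    out = Path(out) if out else root / "data.yaml"
    lines = [
        f"path: {root}",        # absolute root, independent of cwd and datasets_dir
        "train: train/images",  # relative to path
        "val: val/images",      # relative to path
        "names:",
    ]
    # Class indices start at 0, in the order given
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    out.write_text("\n".join(lines) + "\n")
    return out
```

For example, out = write_data_yaml('/content/drive/MyDrive/your_dataset_path', ['class_a', 'class_b']) followed by model.train(data=str(out)), with the real dataset folder and class names substituted for the placeholders.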

@srikarnikhil


@glenn-jocher it seems there is no data argument for YOLO() (I'm using YOLOv8).

image

This line used to work before:
image

So I tried using both at the same time, but it's still the same issue.
image

@glenn-jocher
Member

Hey there! 😊 It seems there's been a slight confusion around using the data argument with the YOLO class in YOLOv8. The dataset configuration isn't passed to the YOLO class initializer; instead, specify it when calling the train() method.

Here's how you can do it:

```python
from ultralytics import YOLO

# Initialize the model
model = YOLO('path/to/model.yaml')

# Now, specify your dataset YAML when calling train
model.train(data='/content/drive/MyDrive/your_dataset_path/data.yaml')
```

This should help you get past the issue you're facing. The YOLO constructor doesn't accept a data argument, so the dataset path belongs in the train() call instead. Let me know if you need more help!

@srikarnikhil

Hi @glenn-jocher, thanks for your reply. Unfortunately, the above solution didn't work; it's the same thing I tried at the start. I want to use the pre-trained model, so I was using model = YOLO('yolov8n.pt') instead of your line:

```python
# Initialize the model
model = YOLO('path/to/model.yaml')
```

I still get the same error.

image

Here is my YAML file: https://drive.google.com/file/d/1YTXGlEhrvQaRnA33qbE8HoOOcek1tkHv/view?usp=sharing

@glenn-jocher
Member

@srikarnikhil hi there! Thanks for reaching out again 🌼. I'm sorry to hear that the solution didn't resolve the issue. Since you're using a pre-trained model with model = YOLO('yolov8n.pt'), and if you are still seeing the same error, it looks like we might need a slight adjustment in the approach, particularly when it comes to specifying the dataset for training.

For using a pre-trained model and specifying your dataset, ensure you're invoking the train() method correctly with the path to your data.yaml. Here's an example:

```python
from ultralytics import YOLO

# Load the pre-trained model
model = YOLO('yolov8n.pt')

# Specify your dataset YAML when calling train
model.train(data='/content/drive/MyDrive/your_dataset_path/data.yaml')
```

Make sure /content/drive/MyDrive/your_dataset_path/data.yaml is the correct path, with your_dataset_path replaced by the actual location of your dataset in Google Drive.

Also, could you check that your Google Drive is properly mounted in your notebook? Path issues sometimes stem from the Drive not being accessible.

If the issue persists, please double-check the YAML file's format and paths to ensure they're all correct and accessible. Unfortunately, I can't open the YAML file you've shared due to Google Drive's permissions, so please make sure it follows the expected format.

Hope this helps, and looking forward to getting you past this hiccup! 🛠
