
Blogger, could you tell me about the dataset? #1

Closed
hhamm opened this issue Sep 11, 2021 · 15 comments

Comments

@hhamm

hhamm commented Sep 11, 2021

No description provided.

@yatengLG
Owner

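The data augmentation code places no requirements on the dataset format, as long as the dataset outputs image, boxes, labels, image_name.
You can write your own dataset modeled on this one.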

return image, boxes, labels, image_name
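
As a rough illustration of that contract, here is a minimal sketch of a dataset whose `__getitem__` returns the four items. `SimpleDetectionDataset`, `records`, and `load_image` are hypothetical names, not part of the library; `load_image` stands in for whatever image reader you use and is assumed to return a `numpy.ndarray`.

```python
class SimpleDetectionDataset:
    """Hypothetical dataset following the (image, boxes, labels, image_name) contract."""

    def __init__(self, records, load_image):
        # records: list of dicts with keys 'image_name', 'boxes', 'labels' (assumed layout)
        # load_image: callable mapping an image name to image data (assumed to be an ndarray)
        self.records = records
        self.load_image = load_image

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        record = self.records[index]
        image_name = record['image_name']
        image = self.load_image(image_name)
        boxes = record['boxes']    # e.g. [[xmin, ymin, xmax, ymax], ...]
        labels = record['labels']  # one class index per box
        if len(boxes) != len(labels):
            raise ValueError(f'{image_name}: {len(boxes)} boxes but {len(labels)} labels')
        return image, boxes, labels, image_name
```

Any storage layout works as long as the returned tuple keeps this order.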

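If you annotated your data with labelImg, you can use this function to generate a VOC-format dataset. (A function for converting a YOLO-format dataset to VOC format is also provided.)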

def generate_voc(root:str, img_dir:str, xml_dir:str, train_rate:float=0.8):
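
The `train_rate` argument suggests a train/validation split. How `generate_voc` implements it is not shown in this thread, so the following is only a sketch of one common approach; `split_names` and `seed` are hypothetical names introduced for illustration.

```python
import random


def split_names(names, train_rate=0.8, seed=0):
    """Shuffle image names reproducibly and split them into train and validation lists."""
    if not 0.0 <= train_rate <= 1.0:
        raise ValueError('train_rate must be between 0 and 1')
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_rate)
    return shuffled[:n_train], shuffled[n_train:]
```

With ten names and the default `train_rate`, this yields eight training names and two validation names.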

You can also download the VOC dataset directly: http://host.robots.ox.ac.uk/pascal/VOC/

@liu-ai-z

Hello, while using this I found that with num_workers = 4 the following error is raised:
File "E:/yolox-trans/train.py", line 234, in
fit_one_epoch(model_train, model, yolo_loss, loss_history, optimizer, epoch,
File "E:\yolox-trans\utils\utils_fit.py", line 14, in fit_one_epoch
for iteration, batch in enumerate(gen):
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 435, in next
data = self._next_data()
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1111, in _process_data
data.reraise()
File "D:\miniconda\envs\py38\lib\site-packages\torch_utils.py", line 428, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data_utils\worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data_utils\fetch.py", line 47, in fetch
return self.collate_fn(data)
File "D:\miniconda\envs\py38\lib\site-packages\changeable\dataloader.py", line 41, in call
images = default_collate(images)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data_utils\collate.py", line 63, in default_collate
return default_collate([torch.as_tensor(b) for b in batch])
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data_utils\collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: result type Float can't be cast to the desired output type Byte

@yatengLG
Owner

@liu-ai-z

I looked at your error. Most likely the image returned by your dataset's __getitem__ is not a numpy.ndarray.

The dataset's __getitem__ returns this:

return image, boxes, labels, image_name

where image is:

return image

@yatengLG
Owner

@liu-ai-z
You can instantiate your dataset,
then call dataset.__getitem__(0) to pull out one sample and see exactly what it returns.
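
A small helper along these lines can make that check quicker. This is only a sketch; `describe_sample` is a hypothetical name, and it assumes the image is the first item of the returned tuple.

```python
def describe_sample(dataset, index=0):
    """Return a short description of what dataset[index] yields."""
    sample = dataset[index]
    image = sample[0]  # assumes the image comes first, as in the tuple above
    return {
        'num_items': len(sample),
        'image_type': type(image).__name__,
        'image_dtype': str(getattr(image, 'dtype', None)),
        'image_shape': getattr(image, 'shape', None),
    }
```

For a correctly built sample, `image_type` should read `ndarray` and `num_items` should be 4.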

@hhamm
Author

hhamm commented Nov 12, 2021 via email

@liu-ai-z

liu-ai-z commented Nov 22, 2021

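@yatengLG Hello, my image output is in ndarray format. When I use the small demo you provided, the same problem appears after a few images have been processed, and it also happens with num_workers=1.
[Screenshot: 捕获]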

@liu-ai-z

My dataset is my own rather than the VOC dataset, but it is annotated the same way.

@yatengLG
Owner

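@liu-ai-z Did it fail partway through a run?

If so, the data itself has a problem. Change num_workers in the dataloader to 1,
then run it again and see which sample is at fault.

From the output you posted at the start:

Hello, while using this I found that with num_workers = 4 the following error is raised: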
File "E:/yolox-trans/train.py", line 234, in
fit_one_epoch(model_train, model, yolo_loss, loss_history, optimizer, epoch,
File "E:\yolox-trans\utils\utils_fit.py", line 14, in fit_one_epoch
for iteration, batch in enumerate(gen):
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 435, in next
data = self._next_data()
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1111, in _process_data
data.reraise()
File "D:\miniconda\envs\py38\lib\site-packages\torch_utils.py", line 428, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 3.

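The message says the problem occurred in worker process 3, so I infer that it happened partway through the run and was caused by a problem with a particular sample.

Your output also contains these lines: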

File "D:\miniconda\envs\py38\lib\site-packages\changeable\dataloader.py", line 41, in call
images = default_collate(images)

So most likely one image is bad, and that is what triggers the error.
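
One thing worth checking, given the final line of the traceback (`result type Float can't be cast to the desired output type Byte`): that message is consistent with a batch that mixes images of different dtypes, such as `uint8` and `float32`. This is an assumption about the cause, not something confirmed in this thread. A minimal sketch for spotting such a mix, where `group_by_image_dtype` is a hypothetical helper:

```python
from collections import defaultdict


def group_by_image_dtype(samples):
    """Map each image dtype (as a string) to the names of the samples that have it."""
    groups = defaultdict(list)
    for image, _boxes, _labels, image_name in samples:
        # Fall back to the type name for objects that have no dtype attribute.
        key = str(getattr(image, 'dtype', type(image).__name__))
        groups[key].append(image_name)
    return dict(groups)
```

It can be called as `group_by_image_dtype(dataset[i] for i in range(len(dataset)))`; more than one key in the result means the images do not share a dtype.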

@liu-ai-z

from changeable.utils.display import draw_boxes, plot_image
from changeable.dataset import VOCDataset
from changeable.dataloader import dataloader
from changeable.transforms import *
from changeable.anchor import AnchorsAssignerWH, AnchorsGenerator

if __name__ == '__main__':

    with open('pcb_classes.txt', 'r') as f:  # class-name file, one class name per line
        lines = f.readlines()
        classes_name = tuple([line.rstrip('\n') for line in lines])

    dataset = VOCDataset(root='PCB',  # root directory of the VOC dataset
                         classes_name=classes_name,
                         is_train=True,
                         transforms=Compose([  # every augmentation is listed here, for demonstration only
                             Resize((300, 300)),
                             AdaptiveResize((300, 300)),
                             Scaled(1.1),
                             CropIou(0.5),
                             CropSize((300, 300)),
                             DivideStds((1, 1, 1)),
                             SubtractMeans((0, 0, 0)),
                             GaussNoise(),
                             SalePepperNoise(),
                             GaussBlur(),
                             MotionBlue(),
                             Cutout(),
                             RandomFlipLR(),
                             RandomFlipUD(),
                             ShuffleChannels(),
                             ChangeContrast(),
                             ChangeHue(),
                             ChangeBrightness(),
                             ChangeSaturation(),
                             ConvertBoxesToPercentage(),
                             ConvertBoxesToValue(),
                             ConvertBoxesForm('xyxy', 'cxcywh'),
                             ConvertBoxesForm('cxcywh', 'xyxy'),
                         ])
                         )

    anchors = AnchorsGenerator(image_size=(600, 600),
                               feature_maps_size=((76, 76), (38, 38), (19, 19)),
                               anchors_size=(((10, 13), (16, 30), (33, 23)),
                                             ((30, 61), (62, 45), (59, 119)),
                                             ((116, 90), (156, 198), (373, 326))),
                               form='xyxy',
                               clip=True
                               )

    anchors_assigner = AnchorsAssignerWH(anchors, 3)

    loader = dataloader(dataset, batch_size=4, resize=(600, 600), use_mosaic=True,
                        anchors_assigner=anchors_assigner, shuffle=True, num_workers=4)

    for i, (img, box, lab, ids) in enumerate(loader):
        print(i)
        print(img.size())
        print(box.size())
        print(lab.size())
        # Take the first sample of the batch and draw its boxes.
        img, box, lab, id = img[0], box[0], lab[0], ids[0]
        img = img.permute((1, 2, 0)).numpy()
        box, lab = box.numpy(), lab.numpy()
        box, lab = box[lab > 0], lab[lab > 0]
        img = draw_boxes(img, box, lab, label_name=classes_name)
        plot_image(img)

@liu-ai-z

liu-ai-z commented Nov 22, 2021 via email

@yatengLG
Owner

To quickly find which image is at fault, you can write:

# Set shuffle=False so the data order is not shuffled, and num_workers=1 so a single process handles the data.
loader = dataloader(dataset, batch_size=4, resize=(600, 600),  use_mosaic=True, anchors_assigner=anchors_assigner, shuffle=False, num_workers=1)

for i, (img, box, lab, ids) in enumerate(loader):
    print(ids)

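For samples with no problem, the image names are printed.
This continues until the error is raised; then look in your dataset's training .txt file for the first entry after the last one printed. That image is probably the bad one.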
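
The same idea can be wrapped in a helper that stops at the first failure instead of crashing. This is a sketch under assumptions: `find_failing_batch` is a hypothetical name, and it assumes the image names are the last item of each batch, as in the loop above.

```python
def find_failing_batch(loader):
    """Iterate over loader and report where the first exception occurs.

    Returns None if every batch loads, otherwise a tuple
    (batch_index, ids_of_last_good_batch, exception).
    """
    last_good_ids = None
    batch_index = 0
    iterator = iter(loader)
    while True:
        try:
            batch = next(iterator)
        except StopIteration:
            return None  # every batch loaded without error
        except Exception as exc:  # any loading or collate error
            return batch_index, last_good_ids, exc
        last_good_ids = batch[-1]  # assumes ids are the last item of each batch
        batch_index += 1
```

With `shuffle=False`, the failing batch starts right after the returned ids in the training .txt file.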

@liu-ai-z

OK, I'll give it a try.

@liu-ai-z

I tried it and found that the problem occurs when use_mosaic is enabled; without use_mosaic there is no problem at all.

@yatengLG
Owner

@liu-ai-z You can send your contact details to yatenglg. If you need it, I can take a look remotely online.

@liu-ai-z

I have sent my contact details to your email.
