
understand model output #5304

Closed
Kieran31 opened this issue Oct 22, 2021 · 47 comments
Labels
question Further information is requested Stale

Comments

@Kieran31

Kieran31 commented Oct 22, 2021

❔Question

Hi team,
I trained the model on 512x512 images. Now I want to run detection on a huge image, for example 5000x5000. So I chopped the huge image into 512x512 tiles with a tiler and created a dataloader with batch size = 8.

Say my input_batch is of shape [8, 3, 512, 512]

model = torch.hub.load('', 'custom', path=weights_path, source='local')
model.eval()
output = model(input_batch)

Now I have difficulty understanding the model output. Can someone help me interpret these?

  • output is a tuple of length 2.
    • output[0] is a tensor of size [8, 16128, 6].
    • output[1] is a list of length 3.
      • output[1][0] is a tensor of size [8, 3, 64, 64, 6]
      • output[1][1] is a tensor of size [8, 3, 32, 32, 6]
      • output[1][2] is a tensor of size [8, 3, 16, 16, 6]

Additional context

I didn't find a tool for merging detection results across multiple tiles/images. If this repo has one, please point me to it.
Thanks very much.

@Kieran31 added the question (Further information is requested) label on Oct 22, 2021
@github-actions
Contributor

github-actions bot commented Oct 22, 2021

👋 Hello @Kieran31, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

glenn-jocher commented Oct 22, 2021

@Kieran31 see PyTorch Hub tutorial for full inference examples on trained custom models.

Simple Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the lightest and fastest YOLOv5 model. For details on all available models please see the README.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
img = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(img)

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

YOLOv5 Tutorials

@Kieran31
Author

@glenn-jocher thanks.

What does size do here? Does it chop a height=640, width=1280 RGB image into two 640x640 images?

yolov5/models/common.py

Lines 243 to 252 in 30e4c4f

def forward(self, imgs, size=640, augment=False, profile=False):
    # Inference from various sources. For height=640, width=1280, RGB images example inputs are:
    #   filename:   imgs = 'data/images/zidane.jpg'
    #   URI:             = 'https://github.com/ultralytics/yolov5/releases/download/v1.0/zidane.jpg'
    #   OpenCV:          = cv2.imread('image.jpg')[:,:,::-1]  # HWC BGR to RGB x(640,1280,3)
    #   PIL:             = Image.open('image.jpg')  # HWC x(640,1280,3)
    #   numpy:           = np.zeros((640,1280,3))  # HWC
    #   torch:           = torch.zeros(16,3,320,640)  # BCHW (scaled to size=640, 0-1 values)
    #   multiple:        = [Image.open('image1.jpg'), Image.open('image2.jpg'), ...]  # list of images

@jbattab

jbattab commented Oct 23, 2021

I have a question with the same concept.
I'm trying to convert to TorchScript, using:
python export.py --weights yolov5x.pt --img 640 --batch 1 --include torchscript

and I printed the output at line 298 of export.py:

for _ in range(2):
    y = model(im)
print(y[0].shape)

I'm getting [1, 25556, 85]. Why is it 85? It's supposed to be 6, isn't it?
When using:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
I'm getting 6 in dim=-1
Thanks!!

@glenn-jocher
Member

@Kieran31 size defines the inference size (long side). Resizing and padding are handled by the letterbox() function.

@jbattab COCO models output 80 class + 4 box + 1 objectness values at each anchor, and there are about 25k anchors per image in your example.
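
For reference, here is a minimal sketch of where that ~25k number comes from, assuming the default YOLOv5 strides of 8/16/32, 3 anchors per grid cell, and a square 640x640 input (other input sizes give slightly different totals):

# Rough sanity check of the number of predictions per image (assumed defaults).
img_size = 640
strides = (8, 16, 32)        # P3, P4, P5 output strides
anchors_per_cell = 3

cells = [(img_size // s) ** 2 for s in strides]   # 80*80, 40*40, 20*20 grid cells
total = anchors_per_cell * sum(cells)             # 3 * (6400 + 1600 + 400)
print(total)                                      # 25200 candidate boxes
# Each candidate carries 4 box values + 1 objectness + num_classes scores (85 for COCO).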

@glenn-jocher
Member

@jbattab you might want to start at the beginning and read the YOLO papers, which explain everything well:
https://pjreddie.com/publications/

@Kieran31
Author

@glenn-jocher thank you. I figured out that the letterbox() function scales the image down by a ratio so it can be fed into the model. But won't this make the original objects too small to be detected?

Also, line 250 indicates the input image can be a tensor.

yolov5/models/common.py

Lines 243 to 252 in 30e4c4f

def forward(self, imgs, size=640, augment=False, profile=False):
    # Inference from various sources. For height=640, width=1280, RGB images example inputs are:
    #   filename:   imgs = 'data/images/zidane.jpg'
    #   URI:             = 'https://github.com/ultralytics/yolov5/releases/download/v1.0/zidane.jpg'
    #   OpenCV:          = cv2.imread('image.jpg')[:,:,::-1]  # HWC BGR to RGB x(640,1280,3)
    #   PIL:             = Image.open('image.jpg')  # HWC x(640,1280,3)
    #   numpy:           = np.zeros((640,1280,3))  # HWC
    #   torch:           = torch.zeros(16,3,320,640)  # BCHW (scaled to size=640, 0-1 values)
    #   multiple:        = [Image.open('image1.jpg'), Image.open('image2.jpg'), ...]  # list of images

However, the output of output = model(torch.zeros(8,3,512,512)) is not a models.common.Detections but a tuple. This is what I raised in my first question. Could you please explain it?

  • output is a tuple of length 2.
    • output[0] is a tensor of size [8, 16128, 6].
    • output[1] is a list of length 3.
      • output[1][0] is a tensor of size [8, 3, 64, 64, 6]
      • output[1][1] is a tensor of size [8, 3, 32, 32, 6]
      • output[1][2] is a tensor of size [8, 3, 16, 16, 6]

@glenn-jocher
Member

@Kieran31 all PyTorch models take torch tensors as input and return torch tensors as output. YOLOv5 PyTorch Hub models are AutoShape() instances that wrap a PyTorch model and handle inputs and outputs.

It's up to you to determine an appropriate --img-size suitable for your deployment requirements.

@jbattab

jbattab commented Oct 24, 2021

Let me ask it in a different way. The output at line 298 of export.py:

for _ in range(2):
    y = model(im)
print(y[0].shape)

is just a tuple; how can I make it a models.common.Detections type?

@Kieran31
Author

Kieran31 commented Oct 24, 2021

@glenn-jocher sorry, maybe I wasn't clear enough.
My question is: when the input is an image file, like data/images/zidane.jpg, the output is models.common.Detections. But when the input is a tensor, like torch.zeros(8,3,512,512), the output is a tuple instead of models.common.Detections.

I tried all the input types: filename, URI, OpenCV, PIL, numpy, and multiple all give a models.common.Detections, except for tensor, which returns a tuple. Is this a bug?

yolov5/models/common.py

Lines 243 to 252 in 30e4c4f

def forward(self, imgs, size=640, augment=False, profile=False):
    # Inference from various sources. For height=640, width=1280, RGB images example inputs are:
    #   filename:   imgs = 'data/images/zidane.jpg'
    #   URI:             = 'https://github.com/ultralytics/yolov5/releases/download/v1.0/zidane.jpg'
    #   OpenCV:          = cv2.imread('image.jpg')[:,:,::-1]  # HWC BGR to RGB x(640,1280,3)
    #   PIL:             = Image.open('image.jpg')  # HWC x(640,1280,3)
    #   numpy:           = np.zeros((640,1280,3))  # HWC
    #   torch:           = torch.zeros(16,3,320,640)  # BCHW (scaled to size=640, 0-1 values)
    #   multiple:        = [Image.open('image1.jpg'), Image.open('image2.jpg'), ...]  # list of images

@glenn-jocher
Member

glenn-jocher commented Oct 24, 2021

@Kieran31 yes, this is the default behavior. It allows AutoShape models to be used in val.py- and detect.py-style workflows, where more traditional PyTorch dataloaders are used that have already preprocessed the inputs (letterboxing, resizing, etc.).
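
To make the two code paths concrete, here is a minimal sketch (hub model and example image URL as in the earlier example; the printed types reflect what the AutoShape wrapper returns according to this thread):

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# File/URL/PIL/numpy inputs go through AutoShape preprocessing (letterbox, resize)
# and come back as a models.common.Detections object.
results = model('https://ultralytics.com/images/zidane.jpg')
print(type(results))           # <class 'models.common.Detections'>
print(results.xyxy[0].shape)   # (n, 6) boxes for the first image

# A torch tensor is assumed to be already preprocessed, so AutoShape forwards it
# straight to the underlying model and returns the raw output tuple instead.
raw = model(torch.zeros(1, 3, 640, 640))
print(type(raw))               # <class 'tuple'>: (predictions, [P3, P4, P5] outputs)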

@jbattab see PyTorch Hub tutorial:

YOLOv5 Tutorials

@Kieran31
Author

Kieran31 commented Oct 24, 2021

@glenn-jocher Thanks for your explanation.
Say my input is a tensor, but I still want to get a models.common.Detections so that I can do results.xyxy[0]. Without converting the tensor to numpy, and while keeping the BCHW shape, what can I do with the returned tuple on the GPU?
It looks like just bypassing these 3 lines doesn't work.

yolov5/models/common.py

Lines 255 to 257 in 30e4c4f

if isinstance(imgs, torch.Tensor):  # torch
    with amp.autocast(enabled=p.device.type != 'cpu'):
        return self.model(imgs.to(p.device).type_as(p), augment, profile)  # inference

I can't find the solution in the PyTorch Hub tutorial. If there is one, I'd appreciate you pointing it out to me.

@glenn-jocher
Member

@Kieran31 torch inputs create torch outputs because in a traditional torch workflow the dataloader has already padded and collated all images into a batch, and the batch itself does not supply sufficient information to invert these letterboxing operations.

Basically you would be attempting to run postprocessing without running preprocessing, which is impossible because postprocessing depends on info generated by preprocessing.

@Kieran31
Author

Kieran31 commented Oct 25, 2021

@glenn-jocher
I don't quite understand what you mean. Do you mean that if I use create_dataloader from utils.datasets rather than a traditional dataloader, then I can invert?
Because I found

yolov5/val.py

Lines 173 to 185 in a4fece8

# Run model
out, train_out = model(img, augment=augment)  # inference and training outputs
dt[1] += time_sync() - t2

# Compute loss
if compute_loss:
    loss += compute_loss([x.float() for x in train_out], targets)[1]  # box, obj, cls

# Run NMS
targets[:, 2:] *= torch.Tensor([width, height, width, height]).to(device)  # to pixels
lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling
t3 = time_sync()
out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)

yolov5/detect.py

Lines 149 to 151 in a4fece8

if pt:
    visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
    pred = model(img, augment=augment, visualize=visualize)[0]

yolov5/detect.py

Lines 182 to 183 in a4fece8

# NMS
pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

As shown at line 174 of val.py and line 151 of detect.py, the model output for a torch tensor input is a tuple: output[0] is for NMS and output[1] is for loss calculation. So if I want to recover the predicted boxes, do I just need to pass the whole output[0] to non_max_suppression?

Also, output[0] is a tensor of shape [8, 16128, 6]. The first dimension, 8, is the batch size. The third dimension I've figured out is [x, y, x, y, confidence, class]. What is the second dimension? Is it the number of candidate detections before NMS?

@minhtcai

Any update on this?
I was trying to compile YOLOv5 with Neuron (aws-neuron/aws-neuron-sdk#253), but the compiled model returns output similar to yours. I tried to turn the model into an AutoShape object (

yolov5/models/common.py

Lines 243 to 252 in 30e4c4f

def forward(self, imgs, size=640, augment=False, profile=False):
    # Inference from various sources. For height=640, width=1280, RGB images example inputs are:
    #   filename:   imgs = 'data/images/zidane.jpg'
    #   URI:             = 'https://github.com/ultralytics/yolov5/releases/download/v1.0/zidane.jpg'
    #   OpenCV:          = cv2.imread('image.jpg')[:,:,::-1]  # HWC BGR to RGB x(640,1280,3)
    #   PIL:             = Image.open('image.jpg')  # HWC x(640,1280,3)
    #   numpy:           = np.zeros((640,1280,3))  # HWC
    #   torch:           = torch.zeros(16,3,320,640)  # BCHW (scaled to size=640, 0-1 values)
    #   multiple:        = [Image.open('image1.jpg'), Image.open('image2.jpg'), ...]  # list of images
) but still get the same output. Is there any way to process this output, or any way to use AutoShape with the Neuron-compiled model?

output is a tuple of length 2.
output[0] is a tensor of size [8, 16128, 6].
output[1] is a list of length 3.
output[1][0] is a tensor of size [8, 3, 64, 64, 6]
output[1][1] is a tensor of size [8, 3, 32, 32, 6]
output[1][2] is a tensor of size [8, 3, 16, 16, 6]

@glenn-jocher
Member

@minhtcai in general we don't apply AutoShape to any export format. We worked with the AWS Inferentia team to ensure YOLOv5 compatibility in #2953 but I haven't actually used it myself so I can't provide much info here.

@Kieran31
Author

@minhtcai
I don't know about Neuron but what I did for the output is

  1. Take output[0].
  2. Pass it to utils.general.non_max_suppression. Then I get a list of tensors (list length = batch size; in my example it's 8). Each tensor holds the detections for one image, with shape (n, 6), where n is the number of detected objects.

output[1] is for loss calculation (see the sketch after the quote below).

yolov5/val.py

Lines 177 to 183 in 8df64a9

# Inference
out, train_out = model(im) if training else model(im, augment=augment, val=True)  # inference, loss outputs
dt[1] += time_sync() - t2

# Loss
if compute_loss:
    loss += compute_loss([x.float() for x in train_out], targets)[1]  # box, obj, cls
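
Putting the above together, a minimal sketch of the tensor-input workflow (thresholds are arbitrary, utils.general must be importable from a yolov5 checkout, and the hub model here stands in for the custom weights loaded in the original question):

import torch
from utils.general import non_max_suppression  # run from a yolov5 checkout

# Any YOLOv5 model works here; the custom weights from the original question
# would be loaded the same way with source='local'.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
model.eval()

input_batch = torch.zeros(8, 3, 512, 512)   # already tiled/normalized BCHW batch
with torch.no_grad():
    pred = model(input_batch)[0]            # output[0]: [8, num_candidates, 5 + nc]

# One (n, 6) tensor per image after NMS: [x1, y1, x2, y2, confidence, class]
detections = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45)
for i, det in enumerate(detections):
    print(f'tile {i}: {det.shape[0]} objects')

Note that the boxes come back in each tile's own 512x512 coordinate frame, so for the original 5000x5000 use case they still need to be offset by each tile's position before merging.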

@github-actions
Contributor

github-actions bot commented Dec 19, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@hamedmh

hamedmh commented Dec 29, 2021

Hi,
I hope that it is not too late to ask a further question about this topic.
Which function in the repository converts the output tensors torch.Size([1, 3, 48, 80, 85]), torch.Size([1, 3, 24, 40, 85]), torch.Size([1, 3, 12, 20, 85]) to torch.Size([1, 15120, 85])?

@Kieran31
Author

@hamedmh
Not sure I follow. As far as I know, these are two parallel things.

yolov5/val.py

Lines 177 to 183 in 8df64a9

# Inference
out, train_out = model(im) if training else model(im, augment=augment, val=True)  # inference, loss outputs
dt[1] += time_sync() - t2

# Loss
if compute_loss:
    loss += compute_loss([x.float() for x in train_out], targets)[1]  # box, obj, cls

out, train_out = model(im)

out is torch.Size([1, 15120, 85])
train_out is a list of 3 tensors of size torch.Size([1, 3, 48, 80, 85]), torch.Size([1, 3, 24, 40, 85]), and torch.Size([1, 3, 12, 20, 85])
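
For the first question, as far as I can tell the reshape-and-concatenate happens inside the Detect head (models/yolo.py) during inference; a quick sketch of the arithmetic using the shapes quoted above:

# train_out shapes: [1, 3, 48, 80, 85], [1, 3, 24, 40, 85], [1, 3, 12, 20, 85]
grids = [(48, 80), (24, 40), (12, 20)]
anchors_per_cell = 3

per_level = [anchors_per_cell * h * w for h, w in grids]   # 11520, 2880, 720
print(sum(per_level))                                      # 15120 -> second dim of out

# Conceptually, each level is decoded (sigmoid, grid offsets, anchor scaling),
# flattened to [1, 3*h*w, 85], and the three levels are concatenated along dim 1.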

@hamedmh

hamedmh commented Dec 29, 2021

@Kieran31 Thank you for the answer.
I noticed that the total number of elements in the three train_out tensors is the same as in out.
Do they represent the same information about the same bounding boxes?
How, or for what purpose, do we use the three train_out tensors?

@Kieran31
Author

@hamedmh
For your second question, all I know is that train_out is used for computing the loss; see the quote below.

yolov5/val.py

Lines 177 to 183 in 8df64a9

# Inference
out, train_out = model(im) if training else model(im, augment=augment, val=True)  # inference, loss outputs
dt[1] += time_sync() - t2

# Loss
if compute_loss:
    loss += compute_loss([x.float() for x in train_out], targets)[1]  # box, obj, cls

For the first one, I don't know. I'm not an Ultralytics member, and this issue has been closed, so I'm not sure whether the Ultralytics team will receive your questions here. My suggestion would be to open a new issue and/or read the YOLO papers.

@hamedmh

hamedmh commented Dec 30, 2021

@Kieran31 Thank you for the explanation.

@mladen-korunoski

What about an exported model?

I fine-tuned yolov5s and exported it for mobile (TorchScript). How can I use the model on an iOS device if I don't have access to all the utility methods for image preprocessing?

@zhiqwang
Contributor

zhiqwang commented Jan 24, 2022

Hi @mladen-korunoski ,

Actually the main ops used in the pre-processing are interpolation and padding, and torch provides both, so I guess you can just use TorchScript to implement the pre-processing; check the following as an example.

https://github.com/zhiqwang/yolov5-rt-stack/blob/d2db932/yolort/models/transform.py#L255-L307
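
For reference, a minimal sketch of that idea using only torch ops (scale so the long side matches the target, then pad to a square). This is an approximation of the repo's letterbox(), not a drop-in replacement, and details such as the 114/255 pad value are assumptions:

import torch
import torch.nn.functional as F

def letterbox_tensor(img: torch.Tensor, size: int = 640, pad_value: float = 114 / 255):
    # img: float CHW tensor in [0, 1]; returns a (size, size) letterboxed CHW tensor
    # plus the scale ratio and padding needed to map boxes back afterwards.
    c, h, w = img.shape
    r = size / max(h, w)                                   # scale long side to size
    new_h, new_w = int(round(h * r)), int(round(w * r))
    img = F.interpolate(img.unsqueeze(0), size=(new_h, new_w),
                        mode='bilinear', align_corners=False).squeeze(0)
    pad_h, pad_w = size - new_h, size - new_w
    # F.pad order is (left, right, top, bottom); split padding to center the image.
    img = F.pad(img, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
                value=pad_value)
    return img, r, (pad_w // 2, pad_h // 2)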

@josebenitezg

Hi!

I was able to convert the model from YOLOv5 to Neuron with the following code:

import torch
import torch_neuron
from torchvision import models

model = torch.hub.load('yolo5',
        'custom',
        path='yolov5.pt',
        source='local',
        force_reload=True)  # local repo

fake_image = torch.zeros([1, 3, 640, 640], dtype=torch.float32)
#fake_image = (torch.rand(3), torch.rand(3))
try:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])
except Exception:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])

model_neuron = torch.neuron.trace(model, 
                                example_inputs=[fake_image])

## Export to saved model
model_neuron.save("model_converted.pt")

Now that I am trying to test and compare, I get output tensors different from YOLOv5, as follows:

Neuron Yolov5 Model:

[tensor([[-0.0356,  0.1790,  0.7456,  0.6292,  0.9359, 13.0000],
        [ 0.5830,  0.1404,  1.1279,  0.6628,  0.9359, 13.0000],
        [ 0.0823,  0.6350,  0.6272,  1.1599,  0.9315, 13.0000],
        [-0.1443,  0.1416,  0.2542,  0.5107,  0.9224, 13.0000],
        [ 0.3516,  0.6426,  0.7500,  1.0137,  0.9188, 13.0000],
        [ 0.3555,  0.1436,  0.7539,  0.5127,  0.9147, 13.0000]])]

Yolov5 (this one):

[tensor([[334.57495, 176.98302, 407.46155, 213.81169,   0.93721,  13.00000]])]
Inference script:
im = cv2.imread('test_img.jpg')
img0 = im.copy()
im = cv2.resize(im, (640, 640), interpolation = cv2.INTER_AREA)
# Convert
im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
im = np.ascontiguousarray(im)
# Convert into torch
im = torch.from_numpy(im)
im = im.float()  # uint8 to fp16/32
im /= 255  # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
    im = im[None]  # expand for batch dim

# Load the compiled model
model = torch.jit.load('model_converted.pt')

# Inference
pred = model(im)
pred = non_max_suppression(pred) #nms function used same as yolov5 detect.py

#Process predictions
for i, det in enumerate(pred):  # per image
    im0 = img0.copy()
    color=(30, 30, 30)
    txt_color=(255, 255, 255)
    h_size, w_size = im.shape[-2:]
    print(h_size, w_size)
    lw = max(round(sum(im.shape) / 2 * 0.003), 2) 

    if len(det):
        # Write results
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            label = f'{CLASSES[c]} {conf:.2f}'
            print(label)
            box = xyxy 
            p1, p2 = (int(box[0]* w_size), int(box[1]* h_size)), (int(box[2]* w_size), int(box[3]* h_size))
            cv2.rectangle(im0, p1, p2, color, thickness=lw, lineType=cv2.LINE_AA)
            tf = max(lw - 1, 1)  # font thickness
            w, h = cv2.getTextSize(label, 0, fontScale=lw / 3, thickness=tf)[0]  # text width, height
            outside = p1[1] - h - 3 >= 0  # label fits outside box
            p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
            cv2.rectangle(im0, p1, p2, color, -1, cv2.LINE_AA)  # filled
            cv2.putText(im0,
                        label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                        0,
                        lw / 3,
                        txt_color,
                        thickness=tf,
                        lineType=cv2.LINE_AA)
    # Save results (image with detections)
    status = cv2.imwrite('out.jpg', im0)

Is there something wrong with how I'm converting the model or running inference? The label and the confidence seem to be as expected, but the tensors are not.

I followed @jluntamazon's pull request (#2953) but I'm not able to see the difference.

@srn-source

@hamedmh For your second question, all I know is that train_out is used for computing the loss; see the quote below.

yolov5/val.py

Lines 177 to 183 in 8df64a9

# Inference
out, train_out = model(im) if training else model(im, augment=augment, val=True)  # inference, loss outputs
dt[1] += time_sync() - t2

# Loss
if compute_loss:
    loss += compute_loss([x.float() for x in train_out], targets)[1]  # box, obj, cls

For the first one, I don't know. I'm not an Ultralytics member, and this issue has been closed, so I'm not sure whether the Ultralytics team will receive your questions here. My suggestion would be to open a new issue and/or read the YOLO papers.

@Kieran31 Hi, it might be a late question, but I have something to ask you. I am new to object detection, and I'm still confused about computing the loss from train_out.
I understand that the model can predict many boxes before they are passed to NMS, and that train_out has not been passed through NMS yet. How does the model know which predicted boxes are compared with the target boxes?

@oms12

oms12 commented Oct 20, 2022

@Kieran31 size defines the inference size (long side). Resizing and padding are handled by the letterbox() function.

@jbattab COCO models output 80 class + 4 box + 1 objectness values at each anchor, and there are about 25k anchors per image in your example.

Can you please explain how I can get the class of the objects present in each anchor?

@doppelvincent

Hey team, I have a question about the output shape of my model. After training for expiry date detection with YOLOv5, I got an output like this:

index | xmin | ymin | xmax | ymax | confidence | class | name | path
0 | 351.337006 | 470.231140 | 435.794891 | 484.624939 | 0.527743 | 1.0 | exp-date |
0 | 138.336823 | 383.291962 | 233.642303 | 407.610565 | 0.511508 | 1.0 | exp-date |  

And I converted the .pt file for use in CoreML. It says that YOLOv5 gives an output shape of (1, 25200, 8). The input image size is 640x640. How should I interpret the output? If I print the first 8 elements of the output array, it shows me this:
Float32 1 × 25200 × 8 array (prediction)
8.929688 (prediction[0])
7.6875 (prediction[1])
16.625 (prediction[2])
18.79688 (prediction[3])
0 (prediction[4])
0.01025391 (prediction[5])
0.9780273 (prediction[6])
0.01074219 (prediction[7])

Could someone give an explanation? Thanks

@glenn-jocher
Member

@doppelvincent hi there! The model's output tensor shape is [1, 25200, 8], representing the predicted bounding boxes and their attributes. It contains 25200 entries that correspond to bounding box predictions. Each prediction is composed of 8 values: [x_center, y_center, width, height, objectness, class_0_confidence, class_1_confidence, class_2_confidence]. In your output, the values seem to be in the correct order and format.

These values represent the predicted bounding box attributes, such as its center coordinates, width, height, objectness score, and class confidences. You can extract and interpret these values for each bounding box to understand the model's predictions.

If you need further assistance in interpreting the output or in integrating it into CoreML, feel free to ask. Good luck with your expiry date detection project!
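
As a minimal sketch of reading one row of that (1, 25200, 8) array (the random array is a stand-in for the CoreML output, and the 3-class split is an assumption based on 8 = 4 box + 1 objectness + 3 classes):

import numpy as np

# Stand-in for the (1, 25200, 8) CoreML output array described above.
prediction = np.random.rand(1, 25200, 8).astype(np.float32)

row = prediction[0, 0]                         # one candidate box
x_center, y_center, width, height = row[:4]    # in letterboxed 640x640 input pixels
objectness = row[4]
class_scores = row[5:]                         # one score per class (3 here)
best_class = int(class_scores.argmax())
confidence = float(objectness * class_scores[best_class])   # score typically used before NMS
print(best_class, confidence)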

@dengxiongshi

@glenn-jocher hello! I'm using yolov5-7.0. The first export to ONNX, with python export.py --train --simplify, gives three outputs (screenshot not shown); the second export, with python export.py --simplify, gives one output (screenshot not shown).

And now I use the ONNX files to test the accuracy with val.py. The first ONNX gets an error:
(NN) D:\python_work\yolov5>python val.py --device 0 --name train_mode --dnn

val: data=E:\downloads\compress\datasets\train_data\train_data.yaml, weights=runs\train\WI_PRW_SSW_SSM_20231127\weights\best_train.onnx, batch_size=16, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=0, workers=0, single_cls=False, augment=False, verbose=False, save_txt=False, save
_hybrid=False, save_conf=False, save_json=False, project=runs\val, name=train_mode, exist_ok=False, half=False, dnn=True
YOLOv5  v7.0-240-g84ec8b5 Python-3.8.18 torch-1.9.1+cu111 CUDA:0 (GeForce RTX 2060, 6144MiB)

Loading runs\train\WI_PRW_SSW_SSM_20231127\weights\best_train.onnx for ONNX OpenCV DNN inference...
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning E:\downloads\compress\datasets\train_data\labels\val.cache... 2575 images, 0 backgrounds, 0 corrupt: 100%|██████████| 2575/2575 [00:00<?, ?it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95:   0%|          | 1/2575 [00:00<21:52,  1.96it/s]Exception in thread Thread-3:
Traceback (most recent call last):
    self._target(*self._args, **self._kwargs)
  File "D:\python_work\yolov5\utils\plots.py", line 175, in plot_images
    annotator.box_label(box, label, color=color)
  File "D:\Anaconda3\envs\NN\lib\site-packages\ultralytics\utils\plotting.py", line 108, in box_label
    self.draw.rectangle(box, width=self.lw, outline=color)  # box
  File "D:\Anaconda3\envs\NN\lib\site-packages\PIL\ImageDraw.py", line 294, in rectangle
    self.draw.draw_rectangle(xy, ink, 0, width)
ValueError: x1 must be greater than or equal to x0
                 Class     Images  Instances          P          R      mAP50   mAP50-95:   0%|          | 3/2575 [00:01<18:57,  2.26it/s]Exception in thread Thread-7:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\NN\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "D:\Anaconda3\envs\NN\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "D:\python_work\yolov5\utils\plots.py", line 175, in plot_images
    annotator.box_label(box, label, color=color)
  File "D:\Anaconda3\envs\NN\lib\site-packages\ultralytics\utils\plotting.py", line 108, in box_label
    self.draw.rectangle(box, width=self.lw, outline=color)  # box
  File "D:\Anaconda3\envs\NN\lib\site-packages\PIL\ImageDraw.py", line 294, in rectangle
    self.draw.draw_rectangle(xy, ink, 0, width)
ValueError: x1 must be greater than or equal to x0
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 2575/2575 [12:33<00:00,  3.42it/s]
                   all       2575      30443          0          0          0          0
Speed: 0.4ms pre-process, 272.5ms inference, 0.9ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\val\train_mode

The second ONNX succeeds:
val: data=E:\downloads\compress\datasets\train_data\train_data.yaml, weights=runs\train\WI_PRW_SSW_SSM_20231127\weights\best.onnx, batch_size=16, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=0, workers=0, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybri
d=False, save_conf=False, save_json=False, project=runs\val, name=train_mode, exist_ok=False, half=False, dnn=False
YOLOv5 v7.0-240-g84ec8b5 Python-3.8.18 torch-1.9.1+cu111 CUDA:0 (GeForce RTX 2060, 6144MiB)

Loading runs\train\WI_PRW_SSW_SSM_20231127\weights\best.onnx for ONNX Runtime inference...
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning E:\downloads\compress\datasets\train_data\labels\val.cache... 2575 images, 0 backgrounds, 0 corrupt: 100%|██████████| 2575/2575 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 2575/2575 [01:30<00:00, 28.57it/s]
all 2575 30443 0.807 0.719 0.771 0.51
face 2575 6954 0.835 0.687 0.743 0.352
person 2575 19192 0.814 0.769 0.795 0.471
car 2575 4012 0.868 0.833 0.888 0.671
bus 2575 187 0.799 0.791 0.835 0.616
truck 2575 98 0.717 0.517 0.597 0.439
Speed: 0.4ms pre-process, 12.3ms inference, 0.9ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\val\train_mode2
How to solve it?

@dengxiongshi

dengxiongshi commented Dec 5, 2023

@glenn-jocher I also get the same issue when using yolov5-6.2.
And how can I get one output, by reshaping and concatenating the three outputs of the first exported ONNX? Here is the .pt file:
best.zip

@glenn-jocher
Member

@dengxiongshi Thanks for reaching out. It looks like you're encountering issues with using the ONNX model for validation after the export process. To best troubleshoot this, I recommend the following steps:

  1. Verify the correct export procedure: Ensure that the ONNX export process is performed correctly with the necessary flags, including --train when applicable, and that the environment and dependencies are set up properly.

  2. Model consistency: Check that the versions and configurations of the YOLOv5 code, export script, and third-party libraries used for conversion are consistent and compatible.

  3. Inference environment: Confirm that the ONNX runtime and related dependencies, used during validation, are properly set up and have compatibility with the exported model.

Regarding your query about reshaping and concatenating the three outputs from the first export into a single output, the process may involve reshaping the outputs to ensure compatibility and then concatenating them along the appropriate dimension.

If the issue persists, I recommend posting your detailed question on the YOLOv5 GitHub repository: https://github.com/ultralytics/yolov5. The community and the Ultralytics team will be better equipped to assist with debugging and resolving the issues you're facing.

Let me know if I can help you further with any of these steps!

@dengxiongshi

Hi @glenn-jocher. For the three steps, I have checked the corresponding environment and dependencies and there is no problem. I have submitted an issue here.

@glenn-jocher
Member

@dengxiongshi great to hear that you've checked the environment and dependencies thoroughly. I see that you've also raised an issue on the YOLOv5 GitHub repository. Our team will assist you there to address the ONNX export and validation concerns effectively. Feel free to reach out if you have any further queries or need additional assistance. Good luck with resolving the issue!

@mandal4

mandal4 commented Jan 15, 2024

I have a question with the same concept. I'm trying to convert to torchscript , using: python export.py --weights yolov5x.pt --img 640 --batch 1 --include torchscript

and I printed the output of (line 298 export.pt): for _ in range(2): y = model(im) print(y[0].shape) I'm getting [1,25556,85]-> Now why is it 85?! it suppose to be 6 is it? When using: model = torch.hub.load('ultralytics/yolov5', 'yolov5s') I'm getting 6 in dim=-1 Thanks!!

I found out that 85 means self.reg_max * 4 + self.nc! It is not from MS COCO.
https://github.com/ultralytics/ultralytics/blob/2f11ab5e6f26885640e9ff6b9ebec165c3bf82b3/ultralytics/utils/loss.py#L197
In my case, I set 21 classes for my custom dataset.

I wonder if I understood correctly. @glenn-jocher

@glenn-jocher
Member

@mandal4 hello! It looks like you've figured out the output dimensions correctly. The 85 in the output tensor [1, 25556, 85] corresponds to the number of classes plus the bounding box coordinates and the objectness score for each prediction. In YOLOv5, the output tensor typically has the shape [batch_size, number_of_anchors, 4 + 1 + number_of_classes], where:

  • 4 represents the bounding box coordinates (x, y, width, height),
  • 1 represents the objectness score, and
  • number_of_classes is the number of classes the model is trained to detect.

In your case, with 21 classes, the output would be 4 (bbox) + 1 (objectness) + 21 (classes) = 26. However, you're seeing 85 because self.reg_max * 4 + self.nc indicates that there might be additional logic applied to the bounding box coordinates, possibly related to anchor scaling or other model-specific details.

If you have any further questions or need clarification, feel free to ask. Good job on diving into the code to understand the model's output! 👍
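
For what it's worth, a quick arithmetic check of @mandal4's reading, assuming the Ultralytics (YOLOv8-style) head referenced in the linked loss.py, where reg_max defaults to 16 so the box branch outputs 4 * reg_max distribution bins instead of 4 raw coordinates:

reg_max = 16            # assumed default of the Ultralytics DFL box head
nc = 21                 # custom classes reported above

channels = 4 * reg_max + nc
print(channels)         # 85 -- matches the last dimension seen in the export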

@Kegard

Kegard commented Jan 16, 2024

@glenn-jocher hello! I agree with your answer about the output tensor [1, 25556, 85], but I still have a question. As you said, the last 80 values are the class probabilities. But I find that sum(pred[5:]) != 1. I wonder why this happens? How do you get these scores, with Softmax() or something else?

@glenn-jocher
Member

@Kegard hello again! The values you're seeing in the output tensor are raw logits, not probabilities. They do not sum to 1 because they have not been passed through a softmax function. In YOLOv5, during inference, these logits are typically passed through a sigmoid function to convert them to objectness scores and class confidences, which are separate from each other.

The objectness score indicates the likelihood that the bounding box contains any object, while the class confidences represent the likelihood of each class being present in the bounding box. These confidences are not mutually exclusive and are not meant to sum to 1 across all classes. Instead, each class confidence is independent and represents the model's confidence that a particular class is detected within the bounding box.

If you want to convert the raw logits to probabilities that sum to 1 for the class predictions, you would apply a softmax function to the class logits. However, this is not the standard practice for YOLO models, as they treat object detection as a multi-label classification problem, where each bounding box can potentially belong to multiple classes with independent probabilities.

I hope this clarifies your question! If you need further assistance, feel free to ask.
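
A minimal sketch of turning such values into usable per-class scores (pred here is a hypothetical single prediction row; if your output is already sigmoid-activated, skip the two sigmoid calls):

import torch

# Hypothetical single prediction row: [x, y, w, h, obj, 80 class values].
pred = torch.randn(85)

obj = torch.sigmoid(pred[4])      # objectness: probability that any object is present
cls = torch.sigmoid(pred[5:])     # independent per-class scores; not a softmax,
                                  # so they do not (and need not) sum to 1
scores = obj * cls                # combined per-class confidence, as used before NMS
best = int(scores.argmax())
print(best, float(scores[best]))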

@AlejandroDiazD

Hi @glenn-jocher and everyone,

I'm trying to deal with an exported yolov5n.tflite and inference servers. The output I receive from processing an image has a shape of [1, 25200, 85]. This is a sample of the output:
[[[ 2 3 7 ... 0 0 0]
[ 2 3 7 ... 0 0 0]
[ 2 3 7 ... 0 0 0]
...
[230 232 19 ... 2 0 0]
[228 233 26 ... 2 0 0]
[228 232 47 ... 2 0 1]]]

About the dimensions, I already understood that:

  • 1: is due to that only 1 image has been processed
  • 25200: is the number of anchors
  • 85: is [x, y, w, h, objectness, 80 coco classes]

I am reading all the posts related to this topic, but I'm still not able to turn that output into something I can use and understand, similar to the output obtained when just using YOLOv8 through the ultralytics module (this may be due to my lack of knowledge, since I am just beginning with these topics).

As I say, I'm still reading through all the previous information, but any help on what steps I should follow would be appreciated.

Thank you in advance!

@glenn-jocher
Member

Hi there! 👋

It sounds like you're on the right track with understanding the output of your YOLOv5n.tflite model. To make sense of these outputs and convert them into a more usable form (bounding boxes, class IDs, and scores), you'll typically need to apply some post-processing steps. Here’s a brief overview:

  1. Apply a sigmoid function to the objectness scores and class predictions to convert logits to probabilities.

  2. Filter out predictions with objectness scores below a certain threshold to reduce the number of detections, as many will be low confidence.

  3. Apply Non-Max Suppression (NMS): Since your model may predict multiple overlapping boxes for a single object, NMS helps in selecting the most probable bounding box while discarding the rest.

In pseudo-code, your process might look something like this:

# Assuming outputs is your model output with shape [1, 25200, 85]

# Sigmoid the objectness score and class predictions
outputs[..., :4] = torch.sigmoid(outputs[..., :4])  # Adjust bounding boxes
outputs[..., 4:] = torch.sigmoid(outputs[..., 4:])  # Objectness and class preds

# Apply a threshold to filter out low-confidence predictions
conf_threshold = 0.25
mask = outputs[..., 4] > conf_threshold
outputs = outputs[mask]

# Apply NMS
nms_threshold = 0.45
boxes, scores, classes = nms(outputs, nms_threshold)

# boxes, scores, and classes are your final, usable outputs

Keep in mind, the nms function and thresholds used here are just placeholders. You'll need to adapt this pseudo-code to fit your exact needs and also implement or use an existing NMS function suitable for your framework (TensorFlow, PyTorch, etc.).

This process should help you glean more actionable insights from your model's predictions. Keep experimenting and studying; you're doing great so far!

Feel free to ask if you have more questions. Happy coding!

@AlejandroDiazD

Hi @glenn-jocher ,

Thank you so much for your response. I've been trying to apply the steps you proposed, but I'm still obtaining output that makes no sense: after applying the sigmoid, most of the objectness values are almost 1.0. I'm thinking maybe it is related to the fact that the output of the net is quantized, so I should probably dequantize it first (I'm working with an inference server running on an embedded arm64 system).

Does that make sense? Do you know how I could dequantize the output of the net before applying the sigmoid?

Thank you!

@glenn-jocher
Member

Hi there!

Yes, it absolutely makes sense that if you're working with a quantized model, the outputs could be in a quantized format. Before applying sigmoid functions or any further processing, you would indeed need to dequantize these outputs to floating-point values, which can significantly affect your post-processing steps.

The approach to dequantize depends on the framework you're using. Generally, if the model was quantized using TensorFlow Lite, the .tflite model file often contains the scale and zero-point for each tensor, which you can use to convert the quantized values back to floating-point numbers.

Here's a simplified example in Python for dequantization:

def dequantize(quantized_value, scale, zero_point):
    # Convert quantized value to a floating point
    return scale * (quantized_value - zero_point)

You'd need to apply this function to your model outputs using the appropriate scale and zero_point values for each output tensor before proceeding with sigmoid or other post-processing steps.

Keep in mind, the details may vary depending on your precise setup and framework. If you're using a different environment or library, they might provide built-in methods to handle dequantization more seamlessly.

Hope this helps you move forward! Let me know if you have any more questions. Happy coding! 😊

@madasuvenky

When I convert the YOLO v8 model weights to int16 and validate, I'm getting 0 accuracy, but with float32 model weights, I'm getting 0.87 accuracy.


@glenn-jocher
Member

Hello @madasuvenky,

Thank you for reaching out and providing details about your issue. It sounds like you're experiencing a significant drop in accuracy when converting your YOLOv8 model weights to int16. This is indeed unusual and suggests there might be an issue with the quantization process.

To help us investigate further, could you please provide a minimum reproducible code example? This will allow us to better understand the steps you're taking and identify any potential issues. You can find guidelines on creating a minimum reproducible example here. Ensuring we can reproduce the bug is crucial for us to provide an effective solution.

Additionally, please make sure you are using the latest versions of torch and the YOLOv5 repository. Sometimes, updates can resolve unexpected issues.

Quantization can be tricky, especially when dealing with different data types. If you haven't already, you might want to check the scale and zero-point values used during the quantization process, as incorrect values can lead to significant accuracy drops.

Here's a brief example of how you might dequantize your model outputs if you're using TensorFlow Lite:

def dequantize(quantized_value, scale, zero_point):
    return scale * (quantized_value - zero_point)

# Example usage
quantized_output = ...  # Your quantized model output
scale = ...  # Scale factor from your model
zero_point = ...  # Zero point from your model

dequantized_output = dequantize(quantized_output, scale, zero_point)

Feel free to share more details or any specific error messages you're encountering. We're here to help!
