
About the tensor with yolov8.onnx #751

Closed · 1 task done
jayer95 opened this issue Feb 1, 2023 · 43 comments

Labels
question  Further information is requested

Comments

@jayer95

jayer95 commented Feb 1, 2023

Search before asking

Question

Hi @glenn-jocher,

I would like to ask you a question, why does yolov8's export.py specify onnx as [1x84x8400]?


The difference from yolov5.onnx is that yolov8.onnx drops the objectness score (85 - 1 = 84). The output tensor of yolov8.onnx is [1, 84, 8400]: there are 8400 detection results, and each detection result has 84 values. The 84 values are: 4 box coordinates (x, y, w, h) + 80 per-class scores.

But the tensor of yolov5.onnx is [1x25200x85], with the 85 values placed last (one detection per row).
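A minimal sketch (an editor's illustration, not code from this issue) of how the two layouts are indexed, assuming a raw float pointer to the network output; the helper name is hypothetical:

#include <cstddef>

// Read one value from the raw [1 x 84 x 8400] yolov8 output: channels 0..3
// are the box xywh, channels 4..83 are the 80 per-class scores. The layout
// is channel-major, so each channel is a contiguous run of 8400 floats.
inline float v8At(const float* out, std::size_t channel, std::size_t anchor)
{
    const std::size_t numAnchors = 8400;
    return out[channel * numAnchors + anchor];
}

// In the yolov5-style [1 x 25200 x 85] layout, the value for detection d and
// channel c sits at out[d * 85 + c] instead (one detection per row).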


Can the tensor of yolov8.onnx be changed to [1x8400x84] the same as yolov5.onnx?

Additional

No response

jayer95 added the question (Further information is requested) label on Feb 1, 2023
@triple-Mu
Contributor

84 means bbox xywh + 80 class scores.
YOLOv8 does not need an objectness conf.

@jayer95
Author

jayer95 commented Feb 1, 2023

@triple-Mu

I know; what I mean is to change the output tensor order to be the same as yolov5's.

I can modify export.py myself to transpose the output with [0, 2, 1], but I hope the source code gets corrected, or at least to learn why yolov8 changed the order.
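For reference, here is the consumer-side equivalent of that transpose (a minimal sketch assuming the blob comes from OpenCV's cv::dnn; the function name is illustrative):

#include <opencv2/core.hpp>

// Reshape the (1 x 84 x 8400) blob to a 2-D (84 x 8400) Mat, then transpose
// it to (8400 x 84), i.e. the yolov5-style one-detection-per-row layout.
cv::Mat toRowPerDetection(const cv::Mat& blob)
{
    const int channels = blob.size[1];         // 84
    cv::Mat flat = blob.reshape(1, channels);  // 84 x 8400
    cv::Mat rows;
    cv::transpose(flat, rows);                 // 8400 x 84
    return rows;
}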

@JustasBart
Contributor

@jayer95 I totally agree that this would make it more compatible. That said, looking at the CPP code I've posted here, one advantage of having to transpose is that you can check which dimension is larger (84 vs 8400) to decide between the data+4 and data+5 offsets (needed for the class ID: yolov5 is +5, yolov8 is +4). But it also likely makes the whole thing less efficient overall.

So I can see it both ways personally.

Good luck! 🚀

@jayer95
Author

jayer95 commented Feb 2, 2023

@JustasBart
Yes, yolov5 is so nice to use. We also refer to yolov5.onnx and develop C++ for parsing the SNPE .dlc model.
When yolov8.pt is exported to yolov8.onnx, the order of the output tensor changes, which means the C++ code has to be modified. Beyond +5 changing to +4, other parts had to be modified too.
Thanks for sharing, I'm looking into it.

The following is my transpose "yolov8n_transposed.onnx"

https://drive.google.com/file/d/1c6rbBni0H-yofMdjsDK_uNIF10vb07Ax/view?usp=sharing

@JustasBart
Contributor

@jayer95 Thinking about it, I've just realised that this line: 'if (dimensions > rows)' could become 'if (dimensions % 2 == 0)', meaning that we could export as transposed and still handle the +4/+5 very easily... It would still mean extra C++ code, but at least it would save us the transpose, which would be good...

I'm all in for using the yolov5 export indexing for yolov8 🚀

@JustasBart
Contributor

@jayer95 Running with your transposed model the following code becomes redundant:

// yolov8 rows have an even number of values (84 = xywh + 80 scores), while
// yolov5 rows have an odd number (85 = xywh + objectness + 80 scores),
// so the parity of `dimensions` tells the two models apart.
bool yolov8 = false;
if (dimensions % 2 == 0)
{
    yolov8 = true;

    // No longer needed with the pre-transposed model:
    // rows = outputs[0].size[2];
    // dimensions = outputs[0].size[1];
    // outputs[0] = outputs[0].reshape(1, dimensions);
    // cv::transpose(outputs[0], outputs[0]);
}

And then it can be optimised to simply:

float *classes_scores = dimensions % 2 == 0 ? data+4 : data+5;

Which would actually be really nice! It would only require that single line of code to work for yolov8/yolov5 models 🚀
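Put together, the unified per-row parsing could look roughly like this (an illustrative sketch under the same assumptions, not the code from the linked example; bestClassForRow is a made-up name):

#include <opencv2/core.hpp>

// yolov8 rows carry 84 values (xywh + 80 scores); yolov5 rows carry 85
// (xywh + objectness + 80 scores), hence the parity check and the +4/+5.
void bestClassForRow(const cv::Mat& output, int row, int dimensions,
                     float& bestScore, int& bestClassId)
{
    const float* data = output.ptr<float>(row);
    const bool isYolov8 = (dimensions % 2 == 0);
    const float* classes_scores = isYolov8 ? data + 4 : data + 5;
    const int numClasses = isYolov8 ? dimensions - 4 : dimensions - 5;

    // Wrap the score span without copying and take the arg-max.
    cv::Mat scores(1, numClasses, CV_32FC1, const_cast<float*>(classes_scores));
    cv::Point classId;
    double maxScore = 0.0;
    cv::minMaxLoc(scores, nullptr, &maxScore, nullptr, &classId);

    bestScore = static_cast<float>(maxScore);
    bestClassId = classId.x;
}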

@JustasBart
Contributor

Just to add to this as well: perhaps we could simply have a flag for this when exporting? That way no direct change would have to be made to the existing code, but it would give users like me the ability to easily opt in and then use it via C++. That would be quite good.

@jayer95
Author

jayer95 commented Feb 2, 2023

@JustasBart

That's really good, let me give it a try. I have developed the C++ code for yolov5 and yolov8 separately, as two plugins on the Qualcomm platform, because yolov8 is anchor-free.

Since yolov8 is anchor-free, I referred to yolox.onnx during development.
https://github.com/Megvii-BaseDetection/YOLOX

Perhaps we could simply have a flag for this when exporting?

I think your proposal is good, but I would like to ask the author why he changed the order; maybe the order in export.py simply got reversed? I hope he changes it to match yolov5.onnx, since they are from the same framework. At present, yolov8 does not yet have an ONNX demo.

What do you think?

@jayer95
Author

jayer95 commented Feb 2, 2023

@JustasBart

I don't know how to verify whether my "yolov8n_transposed.onnx" is exported correctly, because currently yolov8 does not have an ONNX runtime demo, haha. I'm about to try your C++.

@JustasBart
Contributor

@jayer95 Naturally, the earlier the decision is made, the better. I'm all in for yolov8 having the same export format as yolov5 (with the exception of the +4/+5 confidence change).

@JustasBart
Contributor

JustasBart commented Feb 2, 2023

@jayer95 As mentioned earlier I'm able to run your provided model just fine:

[screenshots: detection results with the transposed model, including one run with letterbox mode disabled]

Where in order to do this I only had to modify a single line of code:

float *classes_scores = dimensions % 2 == 0 ? data+4 : data+5;

@jayer95
Author

jayer95 commented Feb 2, 2023

@JustasBart
We can look forward to the author's response. I wonder if the author changed the output order just to distinguish yolov8 from yolov5?

Excellent, I think you have developed every detail very well. The pre-processing letterbox is very important.

Have you verified with a custom model you trained? I am curious about yolov8's ability to detect small objects.

@jayer95
Author

jayer95 commented Feb 2, 2023

@JustasBart
May I ask why you exported yolov8 as rectangular (640x480)? The COCO dataset contains pictures of various sizes, and I think rect training should not be used.

@JustasBart
Contributor

JustasBart commented Feb 2, 2023

@jayer95 I just think that in the real world (I'm a vision engineer) you really mostly (~95%+ of the time) use 1080p images, that is 1920x1080.

And so if you have a 1920x1080 image and you train your model at, say, 1920x1920, you'd end up with 840 rows of padding that you'll be discarding. In practice what I do is train my models at around 1280x704; that way I scale down my data but keep most of the aspect ratio of the original data.

The only squishing occurs from a source height of 1080 down to an effective 1056 (a mere 24 rows of height data); the rest can be used as-is without any padding.

Otherwise, with a 1280x1280 network, I'd either have to squish my data from 1920x1080 to 1280x1280, which would ruin the horizontal resolution (especially a problem for thin vertical objects), or resize it to 1280x720 and pad the remaining 560 rows of height data.

Either way there would be a significant cost to either padding or squishing. And worst of all, there would be a training/inference penalty for the padded data, not to mention the padding itself!

So I always use the rect=True flag whenever I'm working with 1080p data, and if the data is close enough to square, I weigh my options: go fully square, or still make it somewhat rectangular.

You see, the COCO dataset is based on objects "in the wild" and is meant to work well on as many off-the-shelf cameras/setups as possible, whereas someone like me works with a fixed resolution on a fixed camera with a fixed setup (including the lights/background etc.). At that point there is no need to force my data into a square or even to pad it; I can just use a rectangular network instead, which is optimal!

Therefore, for the C++ example I decided to do the somewhat trickier rectangular input by default, and then made sure that square examples would work just fine 🚀

@jayer95
Author

jayer95 commented Feb 3, 2023

@JustasBart
I generally understand what you mean. If the input video is 1920x1080 (16:9), I would choose the padding method: set the model to 640x384, padding the minimum number of pixels while keeping a multiple of 32.

If you choose the squishing method instead, you set the model to 640x352, and the pre-processing that feeds 1920x1080 (16:9) into the model will squeeze the image. Although the degree of squeezing is small, it destroys the original ratio, which differs from the letterbox algorithm used by yolov5/yolov8 during training; the purpose of the letterbox is to preserve the aspect ratio of the image while maintaining a multiple of 32. Therefore, I think padding should be used and the model set to 640x384 (assuming users input 16:9 images or videos by default). Of course, even this is not universal, because phone screens are portrait and phones are now mainstream. For a general-purpose yolov8 model you should choose 640x640, since it is compatible with both landscape (16:9) and portrait (9:16).


By the way, the initial setting of yolov5/yolov8 on the mobile app uses 320x192 (h x w, portrait) or 640x384 (h x w, portrait).

Regarding the example you mentioned: if the input video is 1920x1080 (16:9) and the model size is set to 1920x1920, 840 rows get filled with gray, so we can drop most of them; but 1080 is not divisible by 32 (1080/32 = 33.75, 32x34 = 1088), so by the yolov5/yolov8 letterbox algorithm the minimum height should be 1088. If you instead choose the squishing method and set the model to 1920x1056 (1088-32), the pre-processing will squeeze the 1920x1080 (16:9) input, which conflicts with the letterbox algorithm.
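To make the multiple-of-32 arithmetic concrete, a small standalone sketch (an editor's illustration, not from the thread):

#include <cstdio>

int main()
{
    const int stride = 32;
    const int srcW = 1920, srcH = 1080;

    // Round each side up to the next multiple of the network stride.
    const int padW = ((srcW + stride - 1) / stride) * stride;  // 1920
    const int padH = ((srcH + stride - 1) / stride) * stride;  // 1088

    std::printf("letterbox target: %dx%d (%d rows of padding)\n",
                padW, padH, padH - srcH);
    return 0;
}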

Regarding my original question of why your C++ default input size is 640x480: the only reason I can think of is compatibility with both 4:3 and 16:9, with your default users inputting only landscape videos or images.

Welcome to discuss.

@JustasBart
Contributor

@jayer95 The 640x480 is completely incidental to the example itself; it's just a rectangular input, not a specific ratio or anything like that, just for testing/messing around in general.

@jayer95
Author

jayer95 commented Feb 3, 2023

@JustasBart Got it, you mentioned:

I just think that in the real world (I'm a vision engineer) you really mostly (~95%+ of the time) use 1080p images, that is 1920x1080.

I thought by your specific size you meant 1080p.
If it is just for testing/messing around, wouldn't 640x640 be more general?

https://github.com/ultralytics/yolov5/releases/tag/v7.0


@JustasBart
Contributor

@jayer95 So for a typical real project I would train on 1080p (1920x1080) images with a network of around 1280x704, 960x544 or 768x416; it just depends on how much resolution I feel I need to expose to the model at the expense of training/inference speed, etc.

In all cases there are trade-offs; the data gets squished ever so slightly, but I wouldn't normally use padding for this setup, just squish the small remainder.

@jayer95
Author

jayer95 commented Feb 3, 2023

@JustasBart Got it, very good idea: compared to padding (letterbox), slight squishing can make the model a little faster.

I noticed that your model sizes are relatively large: 1280x704, 960x544 or 768x416. Does the model show a speed difference between padding (letterbox) and squishing?

With a powerful GPU or cloud computing, I think the padding (letterbox) pre-processing algorithm is preferable, because its goal is to preserve the original image ratio, which is also the method used by the author of yolov5.

The product I am developing is an edge-computing product, and I can follow your proposed method to reduce 320x192 to 320x160, because edge computing places great importance on speed.

@jayer95
Author

jayer95 commented Feb 3, 2023

@JustasBart I also succeeded!!!

@JustasBart
Contributor

@jayer95 Looks good man! Well done 🚀

@jayer95
Author

jayer95 commented Feb 3, 2023

@JustasBart In order to achieve the same capability as yolov5, I still have a way to go.

@JustasBart
Contributor

@jayer95 We need a functional CPP implementation for reading yolov8.onnx models; I'll be looking into that over the weekend. I'm sure that in the end it'll simplify the whole thing by a lot etc... But it's just one of those things that's easy once you know how, but confusing if you don't...

@jayer95
Author

jayer95 commented Feb 3, 2023

@AyushExel @glenn-jocher

Can the tensor of yolov8.onnx be changed to [1x8400x84] the same as yolov5.onnx?

@jayer95
Author

jayer95 commented Feb 3, 2023

@JustasBart
This is very important. I think the author will soon provide a Python yolov8.onnx runtime demo too!

@JustasBart
Contributor

@jayer95 I don't think that's the issue. The issue is that the 8400 outputs of yolov8 need to be read in and treated very differently in order to actually extract the class ids/confidences, compared to yolov5, which has 18900 outputs at the 640x480 input. This is a separate issue from the transpose from (1, 84, 8400) to (1, 8400, 84).

Ultimately this is a good thing, because the post-processing code ends up shorter and more efficient, but someone needs to figure out what it looks like in CPP... As it stands, if you re-use the yolov5 code it seems to only give confidences for detections with class index 0, which is no good!

@JustasBart
Contributor

@jayer95 Fixed my C++ code to now properly parse yolov8 data for all classes!

@dragan07-07

@jayer95 @JustasBart how do I export this "yolov8n_transposed.onnx" with opset 12?

@JustasBart
Contributor

Hi @dragan07-07, it was @jayer95 who figured that bit out, but you can refer to my CPP code for how to transpose the output from (1, 84, 8400) to (1, 8400, 84) and then run inference (as if it were a yolov5 model, with a slight change); otherwise you'll have to change the Python export code to produce that layout.

Good luck! 🚀

@dragan07-07

@JustasBart Thanks, I was able to run v8 using your code. I needed an idea of how to start with C#, and I succeeded.

@JustasBart
Contributor

Well done @dragan07-07! Enjoy using yolov8 exported models! 🚀

@jayer95
Author

jayer95 commented Feb 9, 2023

@JustasBart Thanks for your work; it's currently running well for me.
I no longer use the [1, 8400, 84] "yolov8n_transposed.onnx"; I choose to handle it in the code instead.


640/8=80
640/16=40
640/32=20

80^2+40^2+20^2=8400

yolov5 has anchors, so 8400*3=25200
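The same arithmetic works for any input size; a quick standalone check (an editor's sketch, assuming the usual strides of 8/16/32):

#include <cstdio>
#include <initializer_list>

// The anchor-free head predicts one candidate per grid cell at each stride,
// so the number of output rows is the sum of the three grid areas.
int numPredictions(int w, int h)
{
    int n = 0;
    for (int stride : {8, 16, 32})
        n += (w / stride) * (h / stride);
    return n;
}

int main()
{
    std::printf("%d\n", numPredictions(640, 640));  // 8400
    std::printf("%d\n", numPredictions(640, 480));  // 6300
    return 0;
}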

@dragan07-07 If you want to convert "yolov8n_transposed.onnx" with opset 12, please refer to the link below:

https://drive.google.com/file/d/1c6rbBni0H-yofMdjsDK_uNIF10vb07Ax/view?usp=sharing

Change "opset: 11" in "default.yaml" to "opset: 12".

@JustasBart
Contributor

@jayer95 Oh this makes so much sense!

640/8=80
640/16=40
640/32=20

80^2+40^2+20^2=8400

yolov5 has anchors, so 8400*3=25200

This is really useful, thanks a mil man! 🚀

@jayer95
Author

jayer95 commented Feb 9, 2023

@JustasBart In other words, your yolov8 has 18900 outputs? Was your model exported at 480x640? I didn't actually calculate it, hahaha.

You are also the mil man; we have to learn from each other. BTW, the author has not replied to me, maybe he is too busy.

@JustasBart
Contributor

@jayer95 It's 6300, and yeah, @glenn-jocher is pretty much 200% busy 100% of the time, especially with the release of yolov8...

But I think at this point we know that it's not too difficult to run inference on either of the models... Still, it would be nice to get a reply on this at some stage 🚀

@AyushExel
Contributor

Hey guys, sorry for the late reply. Exports weren't the most stable back then. Is this issue resolved now? If not, here are the ONNX runtime examples in both C++ and Python; the C++ one is by @JustasBart himself!
https://github.com/ultralytics/ultralytics/tree/main/examples/

@dragan07-07

It is resolved. Thanks!

@confusedgreenhand

confusedgreenhand commented Jun 15, 2023

(quoting @jayer95's earlier reply about the transposed "yolov8n_transposed.onnx" model)

@jayer95 Hi, I would like to ask you a question. When you ran "snpe-onnx-to-dlc -i yolov8.onnx", did you encounter the error "ValueError: ERROR_UNSUPPORTED_ATTRIBUTE_VALUE: Attribute axis in Softmax Op with input /model.22/dfl/Transpose_output_0 has an unsupported value. Expected value 1 for this Op attribute, got 3"? If so, how did you solve it? Looking forward to your reply, thank you very much.

@jayer95
Author

jayer95 commented Jun 15, 2023

@confusedgreenhand Hi,

I didn't encounter this kind of problem when converting, but I can try to help you solve it. Can you tell me your environment and versions first?

Are you using Python 3.6 when you convert to the SNPE .dlc?
Which ONNX version?
Did you specify opset=11 when converting .pt to .onnx?
Which SNPE version? (Mine is 1.64.)

@confusedgreenhand

confusedgreenhand commented Jun 16, 2023

(quoting @jayer95's questions above)

Thanks for your reply. Here is my environment:
(1) snpe-1.41.0.2173, python=3.5.6, onnx=1.3.0
(2) when I convert .pt to .onnx, opset=11
(3) when I convert yolov8s-seg.onnx to .dlc, the softmax (axis=3) in the DFL module cannot be converted successfully. I tried modifying the softmax to axis=-1 in the ONNX file, but it cannot be converted either; SNPE only supports softmax with axis=1. I also tried adding a transpose so that the axis becomes 1, but then the ONNX inference output is wrong.
(4) [screenshots: the Softmax node and the conversion error]

@jayer95
Author

jayer95 commented Jun 16, 2023

@confusedgreenhand Hi,

I think the problem lies in point (1): SNPE 1.41 is really too old. Please try updating to 1.64 to match my setup; I even consider 1.64 an old version.

When converting .pt to .onnx, since the master of yolov5 no longer supports Python 3.6, I currently use Python 3.9.16 to export the .onnx.

But when converting .onnx to .dlc, I use Python 3.6 & ONNX 1.11.

By the way, after converting to yolov8.dlc, I quantize it to yolov8_int8.dlc. The last few softmax nodes of yolov8_int8.dlc have bugs, so my approach is to take the outputs from the nodes before them, and implement everything after those nodes in my own code.

@confusedgreenhand

(quoting @jayer95's reply above)

Thank you for your reply. After I moved the output node forward, the conversion from ONNX to DLC succeeded.

@glenn-jocher
Member

@confusedgreenhand Thank you for the update. I'm glad to hear that you were able to resolve the ONNX-to-DLC conversion by moving the output node forward. If you have any further questions or need assistance with anything else related to YOLOv8, please don't hesitate to ask.
