About the tensor with yolov8.onnx #751
84 means bbox xywh + 80 class scores.
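To make that layout concrete, here is a minimal C++ sketch (the helper name is illustrative, not from any code posted in this thread) of indexing the raw (1, 84, 8400) buffer as produced by an ONNX runtime:

```cpp
#include <cstddef>

// yolov8 raw output (1, 84, 8400) is channel-major: attribute `a` of
// detection `d` lives at a * 8400 + d in the flat float buffer.
// Attributes 0..3 are x, y, w, h; attributes 4..83 are the 80 class scores.
inline float attributeAt(const float* output, std::size_t a, std::size_t d) {
    constexpr std::size_t kNumDetections = 8400;
    return output[a * kNumDetections + d];
}
```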
I know; what I mean is changing the output tensor order to be the same as yolov5's. I can modify export.py myself to output transpose[0, 2, 1], but I hope the source code gets corrected, or at least that someone explains why yolov8 changed the order.
@jayer95 I totally agree that this would make it more compatible. That said, if you look at the C++ code I've posted here, the one advantage of having to transpose is that it lets you check (based on which dimension is larger, 84 vs. 8400) whether to use data+4 or data+5 for the class ID (yolov5 is +5, yolov8 is +4). It also likely makes things less efficient overall, though, so I can see it both ways personally. Good luck! 🚀
@JustasBart The following is my transposed model, "yolov8n_transposed.onnx": https://drive.google.com/file/d/1c6rbBni0H-yofMdjsDK_uNIF10vb07Ax/view?usp=sharing
@jayer95 Thinking about it, I've just realised that this line: 'if (dimensions > rows)' could become 'if (dimensions % 2 == 0)', meaning that we could export transposed and still handle the +4/+5 distinction very easily... It would still mean extra C++ code, but at least it would save us the transpose, which would be good... I'm all in for using the yolov5 export indexing for yolov8 🚀
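To make that check concrete, a hedged C++ sketch (assumed names, not the exact code from the linked example) of distinguishing the two layouts by attribute-count parity; note this trick only holds for the default 80-class models:

```cpp
// Sketch: classify the export layout from the per-detection attribute count,
// assuming an output of shape (1, rows, dimensions) and the default 80 classes.
// yolov5 rows have 85 values (xywh + objectness + 80 classes) -> odd count.
// yolov8 rows have 84 values (xywh + 80 classes)              -> even count.
struct Layout {
    bool isYolov8;
    int  classScoreOffset;  // where the class scores begin within a row
};

inline Layout detectLayout(int dimensions) {
    const bool isYolov8 = (dimensions % 2 == 0);
    return {isYolov8, isYolov8 ? 4 : 5};
}
```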
@jayer95 Running with your transposed model, the transpose-handling code becomes redundant, and the whole thing can then be optimised down to a single line, which would actually be really nice! It would only require that one line of code to work for both yolov8 and yolov5 models 🚀
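For context, the kind of transpose step being discussed, which a pre-transposed export makes redundant, might look roughly like this in OpenCV (a sketch under assumed names, not the actual snippet):

```cpp
#include <opencv2/core.hpp>

// Sketch: turn a raw yolov8 blob of shape (1, 84, 8400) into an (8400, 84)
// matrix so that each row is one detection, matching the yolov5 layout.
// With a model already exported as (1, 8400, 84), this step disappears.
cv::Mat toRowPerDetection(const cv::Mat& raw) {
    cv::Mat flat = raw.reshape(1, raw.size[1]);  // view as 84 x 8400
    cv::Mat transposed;
    cv::transpose(flat, transposed);             // now 8400 x 84
    return transposed;
}
```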
Just to add to this as well: perhaps we could simply have a flag for this when exporting? That way no change to the existing code would be needed, but users like me could easily opt in and then consume it via C++. That would be quite good.
It's really good and works nicely for me; let me give it a try. I have developed the C++ code for yolov5 and yolov8 separately, as two plugins on a Qualcomm platform. Since yolov8 is anchor-free, I referred to yolox.onnx during development.
I think your proposal is good, but I would like to ask the author why he changed the order; maybe the order in export.py was simply reversed? I hope he changes it to match yolov5.onnx, since they come from the same framework. At present, yolov8 does not yet have an ONNX demo. What do you think?
I don't know how to verify whether my "yolov8n_transposed.onnx" was exported correctly, because yolov8 currently has no ONNX Runtime demo, haha. I'm about to try your C++ code.
@jayer95 Naturally, the earlier the decision is made the better. I'm all in for yolov8 having the same export format as yolov5 (with the exception of the +4/+5 confidence change).
@jayer95 As mentioned earlier, I'm able to run your provided model just fine (with letterbox mode disabled), and to do so I only had to modify a single line of code.
@JustasBart Excellent; I think you have worked out every detail very well. The letterbox pre-processing is very important. Have you verified with a custom model you trained yourself? I am curious about yolov8's ability to detect small objects.
@jayer95 I just think that in the real world (I'm a vision engineer) you mostly (~95%+ of the time) use 1080p images, i.e. 1920x1080. So if you have a 1920x1080 image, then even if you train your model at, say, 1920x1920, you end up with 840 rows of padding that you'll be discarding.

In reality, what I would do is train my models at around 1280x704; that way I need to scale down my data, but I keep most of the aspect ratio of the original data. The only squishing of the data occurs from a height of 1080 down to 1056, a mere 24 rows of height data, and the rest can be used as-is without any padding.

Otherwise, if you assume a 1280x1280 network, I'd have to either squish my data from 1920x1080 to 1280x1280, which would ruin the horizontal resolution (especially a problem for thin vertical objects), or resize it to 1280x720 and pad the remaining 560 rows of height data. Either way there would be a significant cost to either padding or squishing, and worst of all a training/inference penalty for the padded data, not to mention the padding itself!

So I always use the rect=True flag whenever I'm working with 1080p data, and if the data is close enough to square, I weigh whether to go fully square or keep it somewhat rectangular. The COCO dataset is based on objects "in the wild" and is meant to work well on as many off-the-shelf cameras/setups as possible, whereas someone like me works at a fixed resolution on a fixed camera with a fixed setup (including the lights/background etc.), so there is no need to force the data into a square or even to pad it; I can just use a rectangular network instead, which is optimal! That's why, for the C++ example, I decided to do the somewhat trickier rectangular input by default, and then made sure that square examples would work just fine 🚀
@JustasBart If you choose the squishing method and set the model to 640x352, the pre-processing will squeeze a 1920x1080 (16:9) input before it enters the model. Although the degree of squeezing is small, it destroys the original aspect ratio, which differs from the letterbox algorithm that yolov5/yolov8 use during training; the purpose of the letterbox is to preserve the aspect ratio while keeping each side a multiple of 32. Therefore I think padding should be used and the model set to 640x384 (assuming users input 16:9 images or videos by default). Of course, even that is not universally suitable, because phone screens are portrait and mobile phones are now mainstream; if you want to make a general yolov8 model you should choose 640x640, since it is compatible with both landscape (16:9) and portrait (9:16). By the way, the initial setting of yolov5/yolov8 in the mobile app is 320x192 (hw, portrait) or 640x384 (hw, portrait).

Regarding the example you mentioned: if the input video is 1920x1080 (16:9) and the model size is set to 1920x1920, then 840 rows are filled with gray, so we can drop those 840 rows. But 1080 is not divisible by 32 (1080/32 = 33.75, and 32x34 = 1088), so according to the yolov5/yolov8 letterbox algorithm the minimum height should be 1088. If you instead choose the squishing method and set the model to 1920x(1088-32) = 1920x1056, the pre-processing will squish the 1920x1080 input, which conflicts with the letterbox algorithm.

As for my original question about why your C++ default input size is 640x480: the only reason I can think of is that you wanted compatibility with both 4:3 and 16:9, and assumed your default users only input landscape videos or images. Welcome to discuss.
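Both options above come down to rounding a side to a multiple of the 32-pixel stride; a small illustrative C++ sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cstdio>

// Round a dimension up (pad/letterbox) or down (squish) to the 32-pixel stride.
constexpr int padUpTo32(int x)      { return ((x + 31) / 32) * 32; }
constexpr int squishDownTo32(int x) { return (x / 32) * 32; }

int main() {
    // 1080 / 32 = 33.75, so 1080 is not stride-aligned:
    std::printf("pad:    1080 -> %d\n", padUpTo32(1080));      // 1088 = 34 * 32
    std::printf("squish: 1080 -> %d\n", squishDownTo32(1080)); // 1056 = 33 * 32
    return 0;
}
```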
@jayer95 The 640x480 is completely incidental to the example itself; it's just a rectangular input, not any specific ratio, chosen for testing/messing around in general.
@JustasBart Got it; I thought by your specific size you meant 1080p.
@jayer95 So for a normal actual project I would train on 1080p (1920x1080) images with a network of usually around 1280x704, 960x544, or 768x416; it just depends on how much resolution I feel I need to expose to the model at the expense of training/inference speed. In all cases there are going to be trade-offs in squishing the data ever so slightly, but I wouldn't normally use padding for this setup; I'd just squish the small remainder.
@JustasBart Got it, very good idea: compared to padding (letterbox), slight squishing can make the model a little faster. I noticed that your model sizes are relatively large: 1280x704, 960x544, or 768x416. Is there a speed difference between padding (letterbox) and squishing for models that size? On a powerful GPU or in the cloud, I think the padding (letterbox) pre-processing algorithm is the more principled one, because its goal is to not destroy the original image ratio, which is also the method used by the author of yolov5. The product I am developing is an edge-computing product, so I can follow your proposed method and reduce 320x192 to 320x160, since edge computing places great importance on speed.
@JustasBart I also succeeded!!!
@jayer95 Looks good man! Well done 🚀
@JustasBart In order to achieve the same capability as yolov5, I still have a way to go.
@jayer95 We need a functional C++ implementation for reading yolov8.onnx models; I'll be looking into that over the weekend. I'm sure that in the end it'll simplify the whole thing a lot... It's just one of those things that's easy once you know how, but confusing if you don't...
Can the tensor of yolov8.onnx be changed to [1x8400x84], the same as yolov5.onnx?
@jayer95 I don't think that's the issue. The issue is that the 8400 outputs of yolov8 fundamentally need to be read in and treated very differently in order to actually extract the class IDs/confidences, compared to yolov5, which has 18900 outputs. That is a separate issue from the transpose from (1, 84, 8400) to (1, 8400, 84). Ultimately it's a good thing, because the post-processing code ends up shorter and more efficient, but someone needs to figure out what it would look like in C++... As it stands, if you re-use the yolov5 code it seems to only give confidences for detections of class index 0, which is no good!
@jayer95 Fixed my C++ code to now properly parse yolov8 data for all classes!
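A hedged sketch of the per-row parsing that this fix amounts to: take the arg-max over the 80 class scores starting at offset +4 (yolov8 has no separate objectness score), assuming the transposed (8400 x 84) row-per-detection layout and illustrative names:

```cpp
#include <algorithm>
#include <cstddef>

struct Detection {
    float x, y, w, h;  // box centre and size, as exported
    int   classId;     // best-scoring class index
    float confidence;  // that class's score (yolov8 has no objectness term)
};

// Parse one 84-float row of a transposed (8400 x 84) yolov8 output.
inline Detection parseRow(const float* row, std::size_t numClasses = 80) {
    const float* scores = row + 4;  // class scores start at +4, not +5
    const float* best   = std::max_element(scores, scores + numClasses);
    return {row[0], row[1], row[2], row[3],
            static_cast<int>(best - scores), *best};
}
```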
@jayer95 @JustasBart How do I export this "yolov8n_transposed.onnx" with opset 12?
Hi @dragan07-07, it was @jayer95 who figured that bit out, but you can refer to my C++ code for how to transpose the output from (1, 84, 8400) to (1, 8400, 84) and then run inference (as if it were a yolov5 model, but with a slight change); otherwise you'll have to change the Python code to export the model that way. Good luck! 🚀
@JustasBart Thanks, I was able to run v8 using your code. I needed an idea of how to start in C#, and I succeeded.
Well done @dragan07-07! Enjoy using yolov8 exported models! 🚀
@JustasBart Thanks for your work; it's currently running well for me. 640/8 = 80, and 80^2 + 40^2 + 20^2 = 8400. yolov5 has anchors, so 8400*3 = 25200. @dragan07-07 If you want to convert "yolov8n_transposed.onnx" with opset 12, please refer to the link below: https://drive.google.com/file/d/1c6rbBni0H-yofMdjsDK_uNIF10vb07Ax/view?usp=sharing Then change "opset: 11" in "default.yaml" to "opset: 12".
@jayer95 Oh this makes so much sense! This is really useful, thanks a mil man! 🚀
@JustasBart In other words, your yolov8 has 18900 outputs? Did you export at a size of 480x640? You are also the mil man; we have to learn from each other. BTW, the author has not replied to me; maybe he is too busy.
@jayer95 It's 6300 for yolov8, and yeah, @glenn-jocher is pretty much 200% busy 100% of the time, especially with the release of yolov8... But I think at this point we know that it's not too difficult to run inference on either of the models... Still, it would be nice to get a reply on this at some stage 🚀
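All of the counts quoted in this thread follow from summing the stride-8/16/32 grid cells, times 3 anchors for yolov5; a small C++ sketch checking that arithmetic, assuming the standard three-scale head:

```cpp
// Predictions for a given input size: sum of stride-8/16/32 grid cells,
// multiplied by 3 anchors for yolov5 (yolov8 is anchor-free).
constexpr int cells(int w, int h, int stride) {
    return (w / stride) * (h / stride);
}
constexpr int predictions(int w, int h, int anchors) {
    return anchors * (cells(w, h, 8) + cells(w, h, 16) + cells(w, h, 32));
}

static_assert(predictions(640, 640, 1) == 8400,  "yolov8 @ 640x640");
static_assert(predictions(640, 640, 3) == 25200, "yolov5 @ 640x640");
static_assert(predictions(640, 480, 1) == 6300,  "yolov8 @ 640x480");
static_assert(predictions(640, 480, 3) == 18900, "yolov5 @ 640x480");
```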
Hey guys, sorry for the late reply. Exports weren't the most stable back then. Is this issue resolved now? If not, here are the ONNX Runtime examples in both C++ and Python; the C++ one is by @JustasBart himself!
It's resolved. Thanks!
@jayer95 Hi, I would like to ask you a question. When you ran "snpe-onnx-to-dlc -i yolov8.onnx", did you encounter the error "ValueError: ERROR_UNSUPPORTED_ATTRIBUTE_VALUE: Attribute axis in Softmax Op with input /model.22/dfl/Transpose_output_0 has an unsupported value. Expected value 1 for this Op attribute, got 3"? If so, how did you solve it? Looking forward to your reply; thank you very much.
I didn't encounter this kind of problem when converting, but I can try to help you solve it. Can you tell me about your environment and versions first? Are you using Python 3.6 when you convert to the SNPE .dlc?
I think the problem lies in point (1): SNPE 1.41 is really too old; please try updating to 1.64 to match my setup (I even consider 1.64 an old version). When converting .pt to .onnx, since the yolov5 master branch no longer supports Python 3.6, I am using Python 3.9.16 to export the .onnx right now, but when converting .onnx to .dlc I use Python 3.6 and ONNX 1.11. By the way, after converting to yolov8.dlc, quantize yolov8.dlc to yolov8_int8.dlc. At that point the last few softmax nodes (in yolov8_int8.dlc) have bugs, so my approach is to take the outputs from the earlier nodes and implement everything after those nodes in my own code.
Thank you for your reply. After I moved the output node forward, the conversion from ONNX to DLC succeeded.
@confusedgreenhand Thank you for the update. I'm glad to hear that moving the output node forward resolved the ONNX-to-DLC conversion issue. If you have any further questions or need assistance with anything else related to YOLOv8, please don't hesitate to ask.
Question
Hi @glenn-jocher,
I would like to ask you a question: why does yolov8's export.py specify the ONNX output as [1x84x8400]?
The difference from yolov5.onnx is that yolov8.onnx drops the single objectness score (85 - 1 = 84). The output tensor of yolov8.onnx is [1, 84, 8400], meaning there are 8400 detection results, and each detection result has 84 values: 4 box coordinates (x, y, w, h) + 80 class scores, one per class index.
But the tensor of yolov5.onnx is [1x25200x85], with the 85 placed last.
Can the tensor of yolov8.onnx be changed to [1x8400x84], the same as yolov5.onnx?
Additional
No response