Can you get the mAP as reported in Darknet? #9
Comments
Issue #7 is open on this topic. The mAP calculation is still under development. The current mAP from this repo with the official yolov3 weights is 56.7, compared to 57.9 in Darknet. |
I used your code with the official yolov3 weights and the COCO API to calculate AP on the 5k set, and I get AP[IoU=0.50] = 0.533 when the image size is 416x416. Here are the full results. |
When I change the image size to 608x608, I get results like this. |
@xuzheyuan624 oh thanks for using the official COCO mAP! Do you have code to generate the COCO input text files for the mAP calculation? Maybe we can update the repo with it. I've been updating the mAP code recently, so I'm not sure if you are using the most recent version. Can you tell me the equivalent mAP you end up with if you run test.py for the 5000 images? I get |
@xuzheyuan624 could you share the code you used to generate the COCO text files for the official COCO mAP? It would be very useful if I could integrate that into |
I have modified some code (I wrote my own dataloader), but you can refer to it to see how to calculate the mAP with the COCO API:
import argparse
import json
import torch
from pycocotools.coco import COCO

parser = argparse.ArgumentParser()
# ... argument definitions cut off in the excerpt
opt = parser.parse_args()
cuda = torch.cuda.is_available() and opt.use_cuda
data_config = parse_data_config(opt.data_config_path)
model = Darknet(opt.cfg, opt.img_size)
if opt.weights_path.endswith('.weights'):
    pass  # darknet-format weight loading (body cut off in the excerpt)
model.to(device).eval()
dataloader = torch.utils.data.DataLoader(COCODataset(test_path))  # remaining arguments cut off in the excerpt
index2category = json.load(open("coco_index2category.json"))
print('Start evaling')
|
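For reference, a minimal sketch of the pycocotools evaluation step that would follow such a loop; the file names and the detections format below are assumptions, not taken from the excerpt above:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# detections collected during inference, saved in COCO results format:
# [{"image_id": int, "category_id": int, "bbox": [x, y, w, h], "score": float}, ...]
coco_gt = COCO('annotations/instances_val2014.json')    # ground-truth annotations (assumed path)
coco_dt = coco_gt.loadRes('results.json')               # hypothetical detections file
coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.params.imgIds = sorted(coco_gt.getImgIds())   # optionally restrict to the 5k image ids
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()                                   # prints the AP / AP50 / AP75 table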
@xuzheyuan624 I noticed that your code doesn't shift the boxes back to original image coordinates from before the letterboxing process. Is this done somewhere else internally? |
@xuzheyuan624 this code takes care of scaling back, but what about letterboxing? I mean the process of fitting the image inside a 608x608 square. This is usually done by resizing the larger dimension to 608 (and smaller dimension accordingly) and then padding to the aspect ratio the model was trained on (for COCO - 1x1). @glenn-jocher's original code reverses this process by subtracting the amount of padding done during inference. |
@nirbenz Oh, thanks for reminding me of that. Indeed, I wrote my own dataloader when I tested this code, and in it I didn't pad when resizing the image to 608x608. Maybe that's the reason I got a different mAP; I will try again while keeping the aspect ratio. |
@xuzheyuan624 That would give you aspect ratio invariance, which is probably not what you want for COCO. I wouldn't be surprised if modifying this will get you much closer to the target mAP :) |
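To make the letterboxing discussion concrete, here is a minimal sketch of a letterbox resize and of mapping predicted boxes back to original image coordinates; the function names, padding color, and box layout are illustrative assumptions, not the repo's exact code:
import cv2
import numpy as np

def letterbox(img, new_size=608, color=(127.5, 127.5, 127.5)):
    # resize the longer side to new_size, then pad the shorter side to a square
    h, w = img.shape[:2]
    ratio = new_size / max(h, w)
    new_w, new_h = int(round(w * ratio)), int(round(h * ratio))
    pad_w, pad_h = (new_size - new_w) / 2, (new_size - new_h) / 2
    left, right = int(round(pad_w - 0.1)), int(round(pad_w + 0.1))
    top, bottom = int(round(pad_h - 0.1)), int(round(pad_h + 0.1))
    img = cv2.resize(img, (new_w, new_h))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return img, ratio, (left, top)

def boxes_to_original(boxes_xyxy, ratio, pad):
    # undo letterboxing: subtract the padding, then divide out the resize ratio
    boxes = np.asarray(boxes_xyxy, dtype=np.float32).copy()
    boxes[:, [0, 2]] -= pad[0]   # x padding
    boxes[:, [1, 3]] -= pad[1]   # y padding
    return boxes / ratio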
@xuzheyuan624 @nirbenz the dataloader in Lines 157 to 163 in 24a4197 |
@glenn-jocher your implementation definitely does it (although for custom datasets I would expand it to support different aspect ratios, e.g. when width and height aren't the same). But I believe the dataloader that @xuzheyuan624 was referring to doesn't letterbox, and that could account for a rather large difference in mAP. As for how letterboxing is performed along with data augmentation - it seems that in the examples you sent me there are a few that could be cropped further while keeping the correct AR, for example this image: I am not too sure how the original Darknet implementation does its augmentations, but I would assume the image undergoes all the necessary augmentation and then the minimal bounding box (minx, miny, maxx, maxy over a rotated/skewed rectangle) is used as a reference for letterboxing. This would ensure that you are as tight as possible around the image boundaries and that you keep the grey area to a minimum. |
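As a small illustration of that "minimal bounding box after augmentation" idea - a sketch under the assumption of a generic 2x3 affine warp, not darknet's actual augmentation code - the same helper works whether the rectangle is a label box or the full image frame:
import numpy as np

def axis_aligned_box_after_affine(box_xyxy, M):
    # warp the 4 corners of an axis-aligned rectangle with a 2x3 affine matrix M,
    # then take the min/max over the warped corners as the new axis-aligned box;
    # passing [0, 0, w, h] gives the region to letterbox around the augmented image
    x1, y1, x2, y2 = box_xyxy
    corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)
    warped = corners @ M[:, :2].T + M[:, 2]
    return np.array([warped[:, 0].min(), warped[:, 1].min(),
                     warped[:, 0].max(), warped[:, 1].max()])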
@glenn-jocher I am reviving this to note that even if training succeeds, I wouldn't be surprised if the different letterboxing (which is wasteful in terms of image real estate) caused a slight difference in overall mAP. I will try to fix this code and open a PR once I get to training myself. |
@xuzheyuan624 @glenn-jocher reviving this issue. Other than different aspect ratios, something else that should be taken into account to get performance comparable to Darknet training is different resolutions. When training YOLOv3 using darknet, not only is the image AR changed but the actual final image resolution is also changed (i.e., not just via letterboxing). A YOLOv3 model naturally supports every image for which both height and width are multiples of 32, and this is used during training to get better robustness to varying object sizes. This is also the reason the original YOLOv3 model is the same one for all three resolutions in the paper (320, 416, 608). I am in the process of implementing this, but I wonder if anyone else already did it. @xuzheyuan624, if I recall correctly your trained model achieves a final mAP not too far from the original implementation - correct? I'll note that the existing code performs minor scaling (between 0.8 and 1.2), but this isn't in the same range as the original model and would (to the best of my understanding) crop the original image and its bounding boxes. That is also useful but should be performed regardless of (and after) what I just described. |
@nirbenz I think you are referring to multi-scale training. I have an implementation of this currently commented out in train.py lines 102-106. If these lines are uncommented, each epoch will train on a random image size from 320 to 608. Aspect ratio is a separate matter though; it should always stay constant: for example, if you increase the height of an image by 50% you must also increase its width by 50%.
# Multi-Scale YOLO Training
img_size = random.choice(range(10, 20)) * 32  # 320 - 608 pixels
dataloader = load_images_and_labels(train_path, batch_size=opt.batch_size, img_size=img_size, augment=True)
print('Running this epoch with image size %g' % img_size)
To work properly, |
@xuzheyuan624 @nirbenz Commit dc7b58b updates train.py to use multi-scale training by default, though this can be turned off by setting multi_scale to False:
# Multi-Scale YOLO Training
if opt.multi_scale:
    img_size = random.choice(range(10, 20)) * 32  # 320 - 608 pixels
    dataloader = load_images_and_labels(train_path, batch_size=opt.batch_size, img_size=img_size, augment=True)
    print('Running Epoch %g at multi_scale img_size %g' % (epoch, img_size))
|
I used multi-scale training like @glenn-jocher and @nirbenz: |
@glenn-jocher In the original yolov3 code, the image size is changed every 10 batches. |
@okanlv ah yes, now I remember why I did it only once per epoch. Does darknet vary the image size throughout training, or just in the final few epochs? |
@glenn-jocher Hmm, that is a valid point. Darknet changes the image size throughout training according to the following rule.
I thought changing the image size every 10 batches prevents the model from overfitting to a particular image size. However, your method might work as well imo. |
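A rough Python paraphrase of the rule being described, based on my reading of darknet's detector.c from memory (an assumption, not the original C code):
import random

def next_img_size(batch_i, max_batches, prev_size=608):
    # darknet-style multi-scale rule (paraphrased): every 10 batches pick a new
    # square size that is a multiple of 32 (320 - 608 pixels); for roughly the
    # last 200 batches, lock to the largest size so training finishes at full resolution
    if batch_i + 200 > max_batches:
        return 608
    if batch_i % 10 == 0:
        return random.choice(range(10, 20)) * 32
    return prev_size  # otherwise keep the previous size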
@glenn-jocher @xuzheyuan624 Thanks, I missed that since I was looking for it in the dataloader rather than outside of it. Which really raises the question: wouldn't this be easier, and make more sense, in the dataloader itself? |
@okanlv Are you sure this is the relevant piece of code from Darknet training? It would impose constant aspect ratio images, which is clearly not the case (I've been using Darknet to train non-square native models for a while now). |
@nirbenz Yes, this is the multi-scale training part of Darknet. In order to deal with different image sizes, the layers are resized as follows:
In practice, images with different aspect ratios are padded to the same aspect ratio to train on the GPU. |
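As a rough PyTorch illustration of what "resizing the layers" amounts to in a fully convolutional detector like YOLOv3 - only the grid offsets and stride of each YOLO layer change with the input size; this is a sketch of the idea, not darknet's resize_network code:
import torch

def rebuild_yolo_grid(img_size, grid_size):
    # recompute the per-cell offsets and the stride for a new square input size;
    # the convolutional weights themselves are unchanged, which is why the same
    # model can run at 320, 416, or 608
    stride = img_size / grid_size
    ys, xs = torch.meshgrid(torch.arange(grid_size), torch.arange(grid_size), indexing='ij')
    grid_xy = torch.stack((xs, ys), dim=-1).float()  # shape (grid_size, grid_size, 2)
    return grid_xy, stride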
I am not sure this is the whole picture - the configured width/height (from the cfg file) do affect how anchor locations are used in practice (both in training and inference). So while the model is naturally aspect-ratio invariant internally, if the width/height are configured to be non-square, letterboxing won't be performed at all and the model is trained on actual rectangular images. This isn't relevant to MS-COCO training but is relevant to custom datasets (which is what I'm facing, hence needing to expand the original code). |
@okanlv hello, I ran into some trouble while using this code to train on my own dataset. The data includes 10 classes, and the training result was as follows: |
@sporterman Hi, I have not trained on another dataset, but I have summarized the necessary steps in this comment for the VOC dataset. You should follow similar steps for your dataset. |
@okanlv Appreciated! Actually I want to know whether this code works well for my small dataset, because you guys have run into many issues. I've searched for YOLOv3 implementations, but almost all the blog posts are about how to train YOLOv3 with the official Darknet code. So far I haven't found any PyTorch code that implements YOLOv3 reliably; any advice? |
@sporterman yes, this repository will work for any dataset, not just COCO; just follow @okanlv's comment, which has excellent directions. I used it for the xView challenge this summer. You can see training results and example inference results here: https://github.com/ultralytics/xview-yolov3 |
@glenn-jocher so now can we say you get the mAP as reported in darknet? |
@sanmianjiao It seems not quite. The latest commit produces 0.52 mAP around epoch 62, at 416 x 416. Darknet's reported mAP is 0.58 at 608 x 608 (the paper does not report it at 416 pixels). I have not tried to train fully with |
@glenn-jocher so you get mAP 0.52 at epoch 62 with size 416*416, and the dataset is train2014+val2014-5k? When we use the 5k split and the yolo_weights.pt provided by the author to evaluate the mAP, can we get 0.58 as you said in the README? Forgive my poor English~ |
@glenn-jocher and I want to point out that the paper says the mAP50 is 55.3 at a size of 416*416. |
@sanmianjiao ah yes, you are correct. And yes, this is on the COCO2014 split you mentioned. So the proper comparison is Darknet 55.3 to this repo's 52.2 currently. That is not so far apart; I'm happy to see that. I'm still running experiments to improve the mAP as well, but it is slow going as I only have one GPU, so hopefully I can raise this 52.2 a little higher in the coming weeks and months. |
@glenn-jocher there is no real difference between COCO14 and COCO17, other than (and this is a big thing) the separation of validation and train. With COCO14 there is a roughly 50-50 split, so common practice is to merge them and choose a small subset for test; this is what 5k.part and trainvalno5k.part are. For COCO17 the dataset is already split that way (train+val and test). But because of that, evaluating YOLOv3 (using the original weights) must be done on the 5k split performed by the author; otherwise you are probably testing on some of the train set. Apologies if this is well known already, but I thought it was important to clarify. |
Hi Glenn, thanks for sharing this repo. I noticed there's a difference between this repo and darknet during training that may impact performance. In "build_targets" you use an arbitrary threshold (0.1) to skip anchors that are not good enough, whereas in darknet the (0, 0)-centered IOU is calculated between the target and all anchors (here the total number of anchors is 9). Only when the best of all 9 anchors belongs to the current yolo layer does the prediction join training. In short, a target is only assigned to 1 yolo layer and 1 anchor. In this repo, a target can be assigned to multiple yolo layers, and all of them calculate loss and gradients, which could affect the training significantly. |
@codingantjay yes, you are correct: this repo sets an arbitrary lower threshold (0.1 IOU) for rejecting potential anchors within each of the 3 yolo layers. This is a tunable parameter that I set a while back after some trial and error, though the repo was in a substantially different state at the time, so perhaps it needs retuning. You are also correct that this means an object can be assigned to multiple anchors, perhaps even 3 times, one in each layer. I did not know darknet was only assigning an object to one of the 9 anchors. This would be difficult to replicate in this repo as each of the yolo layers creates its own independent loss function, though you could try to do this and to tune the rejection threshold (I would vary it between 0.0 and 0.3), and if any of these work I'd be all ears!! Unfortunately I only have one GPU and limited resources to devote to further improving the repo. Any help is appreciated! |
@glenn-jocher @codingantjay This would indeed necessitate rebuilding the YOLO layer to be shareable across the network rather than repeated three times. I would assume it shouldn't be too difficult to change the chosen anchors in a forward/backward pass depending on an argument passed to ... In other words, the current behavior of ... I will have a go at this soon. |
@codingantjay I'm actually not 100% sure I understand this part, can you perhaps clarify what you meant there? |
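A minimal sketch of the darknet-style assignment described above: the target's width/height is compared against all 9 anchors with a (0, 0)-centered IoU, and only the single best anchor (and therefore a single yolo layer) receives the target. The function name and the assumption that anchors are listed 3 per layer are mine, not the repo's:
import torch

def best_anchor_assignment(target_wh, anchors_wh):
    # target_wh: (2,) tensor of box width/height; anchors_wh: (9, 2) tensor,
    # assumed grouped 3 anchors per yolo layer, all in the same pixel units
    inter = torch.min(target_wh, anchors_wh).prod(1)            # overlap of (0, 0)-centered boxes
    union = target_wh.prod() + anchors_wh.prod(1) - inter
    iou = inter / union
    best = torch.argmax(iou)                                    # single best of all 9 anchors
    layer, anchor_in_layer = int(best) // 3, int(best) % 3      # which layer / which of its 3 anchors
    return layer, anchor_in_layer, iou[best].item()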
@xuzheyuan624 @codingantjay @sanmianjiao @sporterman pycocotools mAP is 0.550 (416) and 0.579 (608) with |
@xuzheyuan624 can you please share the command and steps you followed to evaluate your model?