Questions on DeT training datasets #11

Open · Zongwei97 opened this issue Jul 25, 2022 · 6 comments

@Zongwei97
Hi,

Thanks for the brilliant work on RGB-D tracking.

I am new to the tracking domain and have several questions regarding the tracking datasets. Could you please help me clarify a few points?

Firstly, the paper mentions that DeT is first pretrained on Pseudo LaSOT and Pseudo COCO, and then finetuned on DepthTrack. Does it make any difference if we instead train directly on all three datasets?

Secondly, the GitHub README mentions that using the default DiMP50 or ATOM pretrained checkpoints can reduce the training time. DiMP and ATOM appear to be pretrained on larger RGB datasets (TrackingNet, GOT-10k, etc.). Were these pretrained weights used to initialize the model for the paper results, or was the network in the paper trained only on Pseudo LaSOT, Pseudo COCO, and DepthTrack?

Finally, one question regarding Table 2 (comparison of the original RGB trackers and their DeT variants): how are the RGB baselines trained? Only with the RGB images from Pseudo LaSOT, Pseudo COCO, and DepthTrack? Are they initialized only with a pretrained (ImageNet) encoder?

Sorry to bother you with all these questions... Looking forward to hearing from you.
Thanks again

@xiaozai (Owner) commented Jul 25, 2022

Thanks for your interest!

1) We also tried to train DeT on all the RGB datasets with pseudo depth maps, but at the time we submitted the paper we did not yet have pseudo depth maps for TrackingNet, which is very large. We think you will get better results if you train with more data, even pseudo depth maps.
Since the RGB datasets are huge, if you train the network on mixed data we think the results will be dominated by the pseudo depth maps, because compared to the RGB datasets with pseudo depth, our DepthTrack has only 150 training sequences.
Still, we recommend using mixed datasets for training, because we found the pseudo depth data is good enough for training.

2) We used the default DiMP and ATOM from pytracking; I think they are pretrained on all three RGB datasets and also initialized with ImageNet.
We also used the pretrained DiMP and ATOM checkpoints for initialization, because we use colormapped depth as input, which can be considered textureless RGB images.
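For illustration, the colormap conversion might look like the sketch below (the JET colormap and the 10 m clipping range are assumptions, not necessarily the exact choices in the DeT code):

```python
import cv2
import numpy as np

def depth_to_colormap(depth: np.ndarray, max_depth: float = 10000.0) -> np.ndarray:
    """Map a raw depth image (e.g. uint16 millimeters) to a 3-channel BGR image."""
    depth = np.clip(depth.astype(np.float32), 0, max_depth)
    depth_8u = (255.0 * depth / max_depth).astype(np.uint8)
    # A fixed colormap gives the depth channel RGB-like statistics,
    # so an RGB-pretrained backbone can consume it directly.
    return cv2.applyColorMap(depth_8u, cv2.COLORMAP_JET)
```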

@xiaozai (Owner) commented Jul 28, 2022

Hi, I checked the code: in the end I put all three datasets, COCO, LaSOT, and the DepthTrack training set, together for training. I also checked the BMVC paper, in which the depth DiMP was trained with COCO, LaSOT, and TrackingNet, but I did not use the BMVC depth DiMP as the initial weights for DeT; maybe you can try it.

For the ResNet50, we still use the ImageNet weights.
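As a rough illustration, a pytracking-style train-settings fragment mixing the three sets could look like this (a sketch modeled on pytracking's dimp50 settings; the DepthTrack dataset class name and the exact sampler arguments are assumptions, not the DeT repo's literal code):

```python
from ltr.dataset import Lasot, MSCOCOSeq, DepthTrack  # DepthTrack class name assumed
from ltr.data import sampler

def build_train_sampler(settings, data_processing_train):
    # Pseudo-depth LaSOT and COCO (large), plus the real DepthTrack train split (150 seqs).
    lasot_train = Lasot(settings.env.lasot_dir, split='train')
    coco_train = MSCOCOSeq(settings.env.coco_dir)
    depthtrack_train = DepthTrack(settings.env.depthtrack_dir, split='train')
    # Equal sampling weights here; up-weighting the small DepthTrack split is a tuning choice.
    return sampler.DiMPSampler([lasot_train, coco_train, depthtrack_train],
                               [1, 1, 1],
                               samples_per_epoch=26000, max_gap=30,
                               num_test_frames=3, num_train_frames=3,
                               processing=data_processing_train)
```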

@Zongwei97 (Author)

Hi,

Thanks for the help!

I am trying to reproduce the paper results. I used your provided checkpoint (DeT_max) to test on DepthTrack.

I obtained 61, 57, 59 as metrics. I used the default run_tracker, so I guess the mode is sequential.
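Concretely, I ran roughly the following (the parameter and dataset names are my guesses from the repo layout, not necessarily the exact ones):

```python
from pytracking.evaluation import Tracker, get_dataset
from pytracking.evaluation.running import run_dataset

trackers = [Tracker('dimp', 'DeT_DiMP50_Max')]  # parameter file for the provided ckpt (assumed name)
dataset = get_dataset('depthtrack')             # DepthTrack test split (assumed name)
run_dataset(dataset, trackers, threads=0)       # threads=0 -> sequential evaluation
```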

The obtained results are therefore significantly higher than the metrics reported in the DeT paper (56, 50, 53).

Do you have any idea about the performance?

@xiaozai (Owner) commented Jul 28, 2022

Hi, thanks for your interest!
Higher results are possible. The reason may be that DepthTrack training set still shares some similar scenarios and objects with the test sequences. You can test it on the CDTB, the results may be lower. Thats why we recommend to use more generated data to increase the general performances on various test datasets. If you need the numbers for your papers, you can just use the lower numbers in the paper.

By the way, could you train DeT with only the COCO and LaSOT data, and then test again?

@Zongwei97 (Author)

Hi,

I haven't retrained the model yet. I only used your provided model, your checkpoint, and your dataset to reproduce the result, which is strangely higher than in your paper ^^ (just curious why this is happening).

I am currently preparing the datasets, since I am new to the domain. I have downloaded all your generated depth maps, but the original LaSOT RGB link seems to be broken... I am trying to figure it out. If I find something, I will definitely let you know.

@xiaozai (Owner) commented Jul 28, 2022

Hi, if you download all the generated data, it may be too large. It is better to pick a good monocular depth estimation method and generate the depth maps on your local machine. There are more recent SOTA methods now; they may give better performance.
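For example, a minimal sketch with MiDaS via torch.hub (the estimator choice here is just an assumption; any recent monocular depth model would work):

```python
import cv2
import numpy as np
import torch

# Load a MiDaS model and its matching input transform from torch.hub.
midas = torch.hub.load('intel-isl/MiDaS', 'DPT_Large')
midas.eval()
midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')
transform = midas_transforms.dpt_transform

img = cv2.cvtColor(cv2.imread('frame.jpg'), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img))
    # MiDaS predicts relative inverse depth; resize it back to the frame size.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode='bicubic', align_corners=False).squeeze().cpu().numpy()

# Save as 16-bit PNG after min-max normalization (the range is relative, not metric).
depth_16u = (65535 * (depth - depth.min()) / (np.ptp(depth) + 1e-8)).astype(np.uint16)
cv2.imwrite('frame_depth.png', depth_16u)
```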
