
Why don't the track queries get updated for two_stage? #61

Closed
owen24819 opened this issue Sep 15, 2022 · 4 comments

owen24819 commented Sep 15, 2022

if self.two_stage:
    output_memory, output_proposals = self.gen_encoder_output_proposals(memory, mask_flatten, spatial_shapes)

    # hack implementation for two-stage Deformable DETR
    enc_outputs_class = self.decoder.class_embed[self.decoder.num_layers](output_memory)
    enc_outputs_coord_unact = self.decoder.bbox_embed[self.decoder.num_layers](output_memory) + output_proposals

    topk = self.two_stage_num_proposals
    topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]
    topk_coords_unact = torch.gather(enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4))
    topk_coords_unact = topk_coords_unact.detach()
    reference_points = topk_coords_unact.sigmoid()
    init_reference_out = reference_points
    pos_trans_out = self.pos_trans_norm(self.pos_trans(self.get_proposal_pos_embed(topk_coords_unact)))
    query_embed, tgt = torch.split(pos_trans_out, c, dim=2)
else:
    query_embed, tgt = torch.split(query_embed, c, dim=1)
    query_embed = query_embed.unsqueeze(0).expand(bs, -1, -1)
    tgt = tgt.unsqueeze(0).expand(bs, -1, -1)
    reference_points = self.reference_points(query_embed).sigmoid()

    if targets is not None and 'track_query_hs_embeds' in targets[0]:
        # print([t['track_query_hs_embeds'].shape for t in targets])
        # prev_hs_embed = torch.nn.utils.rnn.pad_sequence([t['track_query_hs_embeds'] for t in targets], batch_first=True, padding_value=float('nan'))
        # prev_boxes = torch.nn.utils.rnn.pad_sequence([t['track_query_boxes'] for t in targets], batch_first=True, padding_value=float('nan'))
        # print(prev_hs_embed.shape)
        # query_mask = torch.isnan(prev_hs_embed)
        # print(query_mask)

        prev_hs_embed = torch.stack([t['track_query_hs_embeds'] for t in targets])
        prev_boxes = torch.stack([t['track_query_boxes'] for t in targets])

        prev_query_embed = torch.zeros_like(prev_hs_embed)
        # prev_query_embed = self.track_query_embed.weight.expand_as(prev_hs_embed)
        # prev_query_embed = self.hs_embed_to_query_embed(prev_hs_embed)
        # prev_query_embed = None

        prev_tgt = prev_hs_embed
        # prev_tgt = self.hs_embed_to_tgt(prev_hs_embed)

        query_embed = torch.cat([prev_query_embed, query_embed], dim=1)
        tgt = torch.cat([prev_tgt, tgt], dim=1)

        reference_points = torch.cat([prev_boxes[..., :2], reference_points], dim=1)

        # if 'track_queries_placeholder_mask' in targets[0]:
        #     query_attn_mask = torch.stack([t['track_queries_placeholder_mask'] for t in targets])

    init_reference_out = reference_points

I am confused about why the track queries don't get updated in the two-stage branch.
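
For what it's worth, here is a minimal sketch of what I imagine prepending the track queries in the two-stage branch could look like, reusing the tensors from the snippet above (this is not in the repository, just an illustration):

# Hypothetical sketch, not part of TrackFormer: the same concatenation as in the
# else branch, applied after the top-k proposal selection of the two-stage path.
if targets is not None and 'track_query_hs_embeds' in targets[0]:
    prev_hs_embed = torch.stack([t['track_query_hs_embeds'] for t in targets])
    prev_boxes = torch.stack([t['track_query_boxes'] for t in targets])

    # zero positional embedding for the track queries, as in the single-stage branch
    query_embed = torch.cat([torch.zeros_like(prev_hs_embed), query_embed], dim=1)
    # previous decoder output embeddings become the track query content
    tgt = torch.cat([prev_hs_embed, tgt], dim=1)
    # the two-stage reference points are 4-d boxes, so the full track boxes would be used
    reference_points = torch.cat([prev_boxes, reference_points], dim=1)
    init_reference_out = reference_points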

Also, nice work by the way!

timmeinhardt (Owner) commented

TrackFormer is not implemented to work with the two-stage approach of Deformable DETR. In my opinion, the two-stage approach is a step back from the end-to-end unified solution of DETR. Hence, we never tried to combine two-stage with our track query approach.

owen24819 (Author) commented

Ah ok. I see. Thanks for the quick response!

Following up with another question: does TrackFormer work with num_feature_levels greater than 1? I understand why you chose a single feature level, but I would like to try more. Although I could run MultiScaleDeformableAttention with multiple feature levels on TrackFormer, I noticed the MultiScaleDeformableAttention package was different from the one in the Deformable DETR paper. Is it ok to use your MultiScaleDeformableAttention package for multiple feature levels, or should I revert to Deformable DETR's MultiScaleDeformableAttention package?

timmeinhardt (Owner) commented

TrackFormer already works with multiple feature levels. All our trainings/evaluations (except the MOTS20 models) run the deformable option, which loads this config.

The underlying MultiScaleDeformableAttention backend should not make a big difference.
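
In the Deformable DETR reference implementation the number of feature levels is just a constructor argument of the attention module, so the number of levels and the particular backend are largely independent choices. A rough sketch (argument names follow the public MSDeformAttn module; double-check them against the version you build):

# Rough sketch, assuming the MSDeformAttn module from the public Deformable DETR
# reference implementation (models/ops/modules/ms_deform_attn.py).
from models.ops.modules import MSDeformAttn

# n_levels controls how many feature levels the deformable attention samples from;
# the rest of the module is identical for 1 or 4 levels.
attn = MSDeformAttn(d_model=256, n_levels=4, n_heads=8, n_points=4)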

owen24819 (Author) commented

Great! Thanks a lot!
