
patch partition? #42

Closed

zbc-l opened this issue Nov 30, 2022 · 5 comments

Comments

@zbc-l

zbc-l commented Nov 30, 2022

Thank you for such excellent work. I have some questions about COTR. During training, do you divide the scene images into 256×256 patches according to certain rules after scaling, and then feed those patches into the network? (I'm not sure where this step is implemented in the code.) How are the corrs partitioned? Can a corresponding point end up in a different patch from its query point, and if so, how is that handled? Is the validation process handled similarly to the training process after this crop-and-split step?

@jiangwei221
Collaborator

  1. Cropping and scaling is done inside the dataloader:

        # Pick a seed correspondence; resample a different index if none is found.
        seed_corr = self.get_seed_corr(nn_cap, query_cap)
        if seed_corr is None:
            return self.__getitem__(random.randint(0, self.__len__() - 1))
        # Crop both captures around the seed correspondence at a random zoom level.
        s = np.random.choice(self.zooms)
        nn_zoom_cap = self.get_zoomed_cap(nn_cap, seed_corr[:2], s, 0)
        query_zoom_cap = self.get_zoomed_cap(query_cap, seed_corr[2:], s, self.zoom_jitter)
        assert nn_zoom_cap.shape == query_zoom_cap.shape == (constants.MAX_SIZE, constants.MAX_SIZE)
        # Recompute correspondences inside the cropped pair; resample if too few remain.
        corrs = self.get_corrs(query_zoom_cap, nn_zoom_cap)
        if corrs is None or corrs.shape[0] < self.num_kp:
            return self.__getitem__(random.randint(0, self.__len__() - 1))
        # Shuffle and trim to a fixed number of correspondences.
        shuffle = np.random.permutation(corrs.shape[0])
        corrs = np.take(corrs, shuffle, axis=0)
        corrs = self._trim_corrs(corrs)

  2. Validation data is prepared the same way as training data: both are cropped and scaled.
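
To make the crop step concrete, here is a rough standalone sketch of what get_zoomed_cap does conceptually: cut a square window of side MAX_SIZE / zoom around the seed point (with optional jitter) and resize it back to MAX_SIZE. The function below is an illustration of that idea, not the exact implementation from the repository:

    import numpy as np
    import cv2  # assumption: OpenCV for resizing

    MAX_SIZE = 256  # patch side used by COTR

    def zoomed_crop(img, center_xy, zoom, jitter=0.0):
        # Illustrative stand-in for get_zoomed_cap: take a square window of
        # side MAX_SIZE / zoom around center_xy, optionally jittered, then
        # resize it to MAX_SIZE x MAX_SIZE.
        side = MAX_SIZE / zoom
        cx, cy = center_xy
        cx += np.random.uniform(-jitter, jitter) * side
        cy += np.random.uniform(-jitter, jitter) * side
        h, w = img.shape[:2]
        # Clamp the window so it stays inside the image.
        x0 = int(np.clip(cx - side / 2, 0, max(w - side, 0)))
        y0 = int(np.clip(cy - side / 2, 0, max(h - side, 0)))
        patch = img[y0:y0 + int(side), x0:x0 + int(side)]
        return cv2.resize(patch, (MAX_SIZE, MAX_SIZE))

Because both crops are resized back to the same MAX_SIZE, correspondences only need to be recomputed inside the shared window, which is why get_corrs runs on the zoomed captures.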

@zbc-l
Author

zbc-l commented Dec 13, 2022

Thank you for your patient reply. I noticed that during training, the queries and targets are pairs of query points from images a and b, concatenated in reverse order, which suggests that at prediction time the queries from both a and b have to be fed in together.
But in the demo, it seems you can enter queries_a for one image and get the corresponding queries_b on image b.
I am confused about the roles of the variables corr_a, con_a, loc_from, and loc_to.

    corr_a, con_a, resample_a, corr_b, con_b, resample_b = cotr_flow(self.model,
                                                                     img_a_sq,
                                                                     img_b_sq)

    loc_to = (corr_a[tuple(np.floor(pos).astype('int'))].copy() * 0.5 + 0.5) * img_b.shape[:2][::-1]

It looks like loc_from holds the coordinates of the query points on image a and loc_to the corresponding coordinates on image b, but isn't the model's actual prediction done in the infer_batch function?

    out = self.infer_batch(img_batch, query_batch)

@zbc-l
Author

zbc-l commented Dec 13, 2022

Here it seems that pred is overwritten on every iteration of the loop, so the final pred does not store all of the predictions made during validation.

    for batch_idx, data_pack in tqdm.tqdm(
            enumerate(self.val_loader), total=len(self.val_loader),
            desc='Valid iteration=%d' % self.iteration, ncols=80,
            leave=False):
        loss_data, pred = self.validate_batch(data_pack)
        val_loss_list.append(loss_data)
    mean_loss = np.array(val_loss_list).mean()
    validation_data = {'val_loss': mean_loss,
                       'pred': pred,
                       }
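
If the predictions from every batch were needed, I would have expected the loop to collect them, something like this sketch (pred_list here is just an illustrative name, not from the code):

    val_loss_list = []
    pred_list = []
    for batch_idx, data_pack in enumerate(self.val_loader):
        loss_data, pred = self.validate_batch(data_pack)
        val_loss_list.append(loss_data)
        pred_list.append(pred)  # keep every batch's prediction, not just the last
    validation_data = {'val_loss': np.array(val_loss_list).mean(),
                       'pred': pred_list,
                       }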

@jiangwei221
Collaborator

  1. COTR can take query points on both image A and image B. If the X coordinate of a query point is in [0, 0.5], the point lies on the left (A) image; if it is in [0.5, 1], it lies on the right (B) image. In addition, the architecture treats each individual point independently (see the sketch after this list).
  2. In the dataloader, although we input query points from both images, COTR still treats them as independent points.
  3. The inference engine is needed for the recursive zoom-in, but the base model inference happens in infer_batch.
  4. Yes, we only keep the final pred during validation.
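
A minimal sketch of that coordinate convention, assuming both images are resized to the same size before being placed side by side (the helper names below are hypothetical, not from the repository):

    import numpy as np

    def queries_on_side_by_side(pts_a_px, w_a, h_a):
        # Hypothetical helper: map pixel coordinates on image A into the
        # normalized frame of the side-by-side canvas, where the left half
        # (image A) occupies x in [0, 0.5] and y spans [0, 1].
        pts = np.asarray(pts_a_px, dtype=np.float32)
        return np.stack([pts[:, 0] / w_a * 0.5, pts[:, 1] / h_a], axis=1)

    def predictions_to_image_b(preds_norm, w_b, h_b):
        # Hypothetical helper: predicted matches land in the right half,
        # x in [0.5, 1], i.e. on image B; undo the concatenation and
        # rescale back to pixel coordinates on B.
        preds = np.asarray(preds_norm, dtype=np.float32)
        return np.stack([(preds[:, 0] - 0.5) * 2.0 * w_b, preds[:, 1] * h_b], axis=1)

Since each point is treated independently, a batch of queries on A can be mapped this way and fed to infer_batch without needing any queries on B.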

@zbc-l
Author

zbc-l commented Dec 16, 2022

Your reply is very detailed. Thank you for your patience.

zbc-l closed this as completed Dec 16, 2022