
Understanding the loss function for the optimization of the rigid transformation #24

Closed
D0miH opened this issue Jun 21, 2020 · 2 comments



D0miH commented Jun 21, 2020

Hello,
first of all, thank you very much for your awesome work!

I am currently looking through the code that fits the FLAME model to the 2D landmarks of a given image, and I am trying to understand it. However, I am not sure about the logic behind the formula used to build the loss for finding the appropriate scale, translation and rotation to align the projected FLAME landmarks with the target landmarks.
More specifically, I am talking about these two lines:

factor = max(max(target_2d_lmks[:, 0]) - min(target_2d_lmks[:, 0]),
             max(target_2d_lmks[:, 1]) - min(target_2d_lmks[:, 1]))
lmk_dist = weights['lmk'] * tf.reduce_sum(tf.square(tf.subtract(lmks_proj_2d, target_2d_lmks))) / (factor ** 2)

I do understand that the factor is the maximum extent of either the x- or y-coordinates of the target landmarks. But why is the difference between the projected landmarks and the target landmarks squared, summed and then divided by the square of the factor? Why not calculate the average distance between the projected and the target landmarks instead?

@D0miH D0miH changed the title Understanding loss for the optimization of the rigid transformation Understanding the loss function for the optimization of the rigid transformation Jun 21, 2020
TimoBolkart (Owner) commented

Hello,
the loss itself is just the sum of squared L2 distances between the projected 2D landmarks and the target landmarks. One could also use a different distance (e.g., the average of absolute distances, which resembles the L1 distance). Taking the average versus summing over all landmarks only differs by a fixed factor, which is not important since the number of landmarks remains fixed.
The overall minimization is a weighted sum of losses, where the landmark loss is the only objective term that depends on the actual image size (all other regularizers are independent of the image size and only depend on the model dimensions). To compensate for this, we divide by a normalization factor that depends on the face size within the image. Without this normalization, the balance between the regularizers and the landmark loss would strongly depend on the size of the face within the image.
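For intuition, here is a minimal NumPy sketch of that normalization (not the repository code; the landmark values and the 51-landmark count are made up for illustration). It shows that dividing by the squared face size makes the landmark term invariant to how large the face appears in the image, so its weight relative to the regularizers stays stable:

import numpy as np

def landmark_loss(lmks_proj_2d, target_2d_lmks, weight=1.0):
    # Face size in the image: the larger of the x- and y-extents of the target landmarks.
    factor = max(target_2d_lmks[:, 0].max() - target_2d_lmks[:, 0].min(),
                 target_2d_lmks[:, 1].max() - target_2d_lmks[:, 1].min())
    # Sum of squared L2 distances, divided by the squared face size.
    return weight * np.sum((lmks_proj_2d - target_2d_lmks) ** 2) / factor ** 2

rng = np.random.default_rng(0)
target = rng.uniform(0.0, 100.0, size=(51, 2))   # hypothetical landmarks of a face ~100 px wide
error = rng.normal(scale=1.0, size=(51, 2))      # a fixed relative misalignment

loss_small = landmark_loss(target + error, target)           # face ~100 px wide
loss_large = landmark_loss(4 * (target + error), 4 * target)  # same face rendered 4x larger
print(loss_small, loss_large)  # equal up to floating point: the loss no longer scales with face size

Without the division by factor ** 2, the second call would return a loss 16 times larger for the same relative misalignment, and the landmark term would dominate (or be dominated by) the regularizers depending only on the image resolution.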


D0miH commented Jun 23, 2020

Ahh, I see! Thank you very much for your fast and detailed answer 👍🏼

@D0miH D0miH closed this as completed Jun 23, 2020