
Question about 300W-LP label acquisition #61

Closed
FunkyKoki opened this issue Jan 6, 2022 · 6 comments
Labels: question (Further information is requested)

Comments

@FunkyKoki commented Jan 6, 2022

Thanks for your work and help. I now have a lightweight model that uses MobileNetV3-small as its backbone. It achieves the same pose evaluation performance on AFLW2000 as your model (both without fine-tuning on 300W-LP).

Now I am focused on fine-tuning. In your paper, you said:

Training pose rotation labels are obtained by converting the 300W-LP ground-truth Euler angles to rotation vectors, and pose translation labels are created using the ground-truth landmarks, using standard means.

I opened this issue to confirm several things, and I would be very grateful for your help.

Here are my questions:

  1. How did you define the face bounding box for each image in 300W-LP, given that some images contain two or more detectable faces? Did you use only the one bounding box that has landmark annotations? How did you obtain that bounding box? Did you use a face detector, such as InsightFace?
  2. Since only one face in each 300W-LP image is annotated with 68 points, can I use the labeled landmarks directly to build the JSON files, and then use self.threed_68_points in the code here to generate the LMDB file?

That's all. Thank you so much.

@eugeneYz commented Jan 8, 2022

Have you ever found the predicted x, y, z to be wrong (especially z, when I use my webcam for detection)?

@FunkyKoki (Author) commented Jan 10, 2022

> Have you ever found the predicted x, y, z to be wrong (especially z, when I use my webcam for detection)?

@eugeneYz

Which model are you using? Are you using the fine-tuned one?

By the way, your question is not related to this topic at all. Please ask @vitoralbiero in a new issue.

@FunkyKoki (Author)

Hi @vitoralbiero, why haven't you released the 300W-LP annotations? This seems strange to me. Is it because of copyright?

I used InsightFace to detect the faces and their corresponding landmarks (3D, 68 points) for each image in 300W-LP; only the face closest to the center of the image was saved as ground truth (in .json format). Then I used convert_json_list_to_lmdb.py to convert the JSON files into an LMDB file for training.
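For reference, here is a minimal sketch of that center-closest selection step, assuming InsightFace's FaceAnalysis API (the bbox and landmark_3d_68 attributes); the closest_to_center helper is illustrative, not code from this repository:

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

# Detection + 3D 68-point landmark models from InsightFace.
app = FaceAnalysis(allowed_modules=["detection", "landmark_3d_68"])
app.prepare(ctx_id=0, det_size=(640, 640))

def closest_to_center(image_path):
    """Return the detected face whose bbox center is nearest the image center."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    image_center = np.array([w / 2.0, h / 2.0])
    faces = app.get(img)
    if not faces:
        return None

    def distance(face):
        x1, y1, x2, y2 = face.bbox  # detector box as [x1, y1, x2, y2]
        bbox_center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
        return np.linalg.norm(bbox_center - image_center)

    # face.landmark_3d_68 holds the (68, 3) landmarks that get saved to JSON.
    return min(faces, key=distance)
```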

However, the evaluation performance on AFLW2000 did not improve at all.

@vitoralbiero (Owner)

Hello @FunkyKoki,

I don't have time right now to release the annotations. But we may do so in the future.

300W-LP comes with Euler angles and landmarks, so we converted the provided Euler angles to get rotation vectors, and used the landmarks to get translation vectors.
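For anyone reproducing this, here is a hedged sketch of both conversions (not the repository's exact code): SciPy handles Euler-to-rotation-vector, and OpenCV's solvePnP can recover a translation vector from the 68 2D landmarks plus a 3D 68-point reference shape. The "xyz" axis order, radians, and the simple pinhole camera below are assumptions that must be matched to 300W-LP's pose convention:

```python
import cv2
import numpy as np
from scipy.spatial.transform import Rotation

def euler_to_rotvec(pitch, yaw, roll):
    # Axis order and signs are assumptions; align them with how
    # 300W-LP stores its Euler angles (radians assumed here).
    return Rotation.from_euler("xyz", [pitch, yaw, roll]).as_rotvec()

def translation_from_landmarks(landmarks_2d, ref_points_3d, img_w, img_h):
    """Estimate a translation vector from (68, 2) landmarks and a (68, 3)
    generic 3D reference shape (e.g., the repo's threed_68_points).

    A pinhole camera with focal length = image width is assumed.
    """
    camera_matrix = np.array(
        [[img_w, 0, img_w / 2.0],
         [0, img_w, img_h / 2.0],
         [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(
        ref_points_3d.astype(np.float64),
        landmarks_2d.astype(np.float64),
        camera_matrix,
        distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP)
    return tvec.ravel() if ok else None
```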

To fine-tune on 300W-LP, please follow section 4.1 of our paper, which describes the training and fine-tuning steps.

Hope this helps.

@vitoralbiero added the "question" label on Jan 12, 2022
@FunkyKoki (Author)

Hi there, thanks for your answer, but I still cannot figure out how you defined the ground-truth face bounding box for each image.

@vitoralbiero (Owner)

We used the provided landmarks to get a bounding box. You can use this function to get a bbox using landmarks.
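For illustration, the idea reduces to a min/max over the landmark coordinates; a minimal sketch follows (the expand margin is a hypothetical parameter, not necessarily the linked function's signature):

```python
import numpy as np

def bbox_from_landmarks(landmarks, expand=0.0):
    """Tight (x1, y1, x2, y2) box around (N, 2+) landmarks, optionally
    expanded by a fraction of the box width/height on each side."""
    landmarks = np.asarray(landmarks)
    x1, y1 = landmarks[:, :2].min(axis=0)
    x2, y2 = landmarks[:, :2].max(axis=0)
    w, h = x2 - x1, y2 - y1
    return (x1 - expand * w, y1 - expand * h,
            x2 + expand * w, y2 + expand * h)
```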
