Model trained with gpu is not as good as trained by cpu. #10526

Open
2 tasks
springile opened this issue Mar 6, 2022 · 2 comments
Labels
models:research:odapi ODAPI models:research models that come under research directory type:bug Bug in the code

Comments

@springile

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Yes] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • I checked to make sure that this issue has not been filed already.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/...

2. Describe the bug

I want to train a new model with ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.

The model trained on CPU performs better than the one trained on GPU.

For the same picture, the model trained on CPU detects the object, but the model trained on GPU does not.

gpu:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.186

cpu:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.286

3. Steps to reproduce

Steps to reproduce the behavior.

4. Expected behavior

A model trained on GPU should perform the same as one trained on CPU.

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu20.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.8
  • Python version: 3.8
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 8.1
  • GPU model and memory: 16G
@springile springile added models:official models that come under official repository type:bug Bug in the code labels Mar 6, 2022
@saberkun saberkun added models:research models that come under research directory and removed models:official models that come under official repository labels Mar 11, 2022
@saberkun
Member

Did you enable mixed precision (fp16) training? Also, the kernels and implementations are different on GPUs.
For detailed debugging, the TF team may be able to help more.
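
To illustrate why the mixed precision question matters, here is a minimal sketch in plain Python (it is not TensorFlow code and does not reproduce the reported training run). It emulates fp16 and fp32 rounding with the standard `struct` module and repeatedly adds a small update to a value near 1.0. The names `weight`, `update`, and `steps` are hypothetical values chosen only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(start: float, step: float, n: int, rounder) -> float:
    """Add `step` to `start` n times, rounding after every addition."""
    total = rounder(start)
    for _ in range(n):
        total = rounder(total + rounder(step))
    return total


# Hypothetical values, not taken from the issue.
weight = 1.0
update = 1e-4
steps = 1000

fp16_result = accumulate(weight, update, steps, to_fp16)
fp32_result = accumulate(weight, update, steps, to_fp32)

print("fp16:", fp16_result)
print("fp32:", fp32_result)
```

In fp16 the spacing between representable values just above 1.0 is about 0.001, so each 1e-4 update is rounded away and the value never moves, while the fp32 sum ends up close to 1.1. Mixed precision setups normally keep variables in fp32 and apply loss scaling to limit this kind of effect, so this is only one possible contributor to a CPU/GPU gap, not a diagnosis.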

@frostbyte012

Hey, I want to fix this issue, but I'm a complete beginner. Can anyone suggest how I should get started?

7 participants