
Support 30 GPU and bug of random seed #14

Closed
Senwang98 opened this issue Sep 27, 2021 · 7 comments

Comments

@Senwang98

Senwang98 commented Sep 27, 2021

Hi, @xinzhuma
Q1: It seems that the code can't run on an RTX 30x0 GPU. I suspect it doesn't support CUDA 11 yet.

Q2: Setting the random seed does not make experiments reproducible.
You already set a fixed random seed, but it doesn't work as expected: each time I train a model, the final results are different. Do you have any suggestions? Thanks very much! (This is strange, because we always use this method to make models reproducible.)
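For reference, a common way to pin the random seeds in a PyTorch training script is sketched below. This is a generic sketch, not necessarily what this repo does: the helper name `seed_everything` is hypothetical, and the default of 444 only mirrors the seed mentioned later in this thread. Even with fixed seeds, cuDNN autotuning and some nondeterministic CUDA kernels can still make two runs differ, which is why the sketch also disables cuDNN benchmarking.

```python
import os
import random


def seed_everything(seed: int = 444) -> None:
    """Hypothetical helper: seed the RNGs that usually affect a PyTorch run.

    NumPy and PyTorch are seeded only if they are installed.
    """
    random.seed(seed)
    # Only takes effect in Python processes started after this call;
    # the current interpreter's hash seed is already fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
    except ImportError:
        np = None
    if np is not None:
        np.random.seed(seed)

    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)           # seeds the CPU generator and all CUDA devices
    torch.cuda.manual_seed_all(seed)  # explicit; silently ignored if CUDA is absent
    # Trade speed for determinism in cuDNN convolution algorithm selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

The helper is meant to be called once at the very start of training, before the model, optimizer and data loaders are built, so that every random draw afterwards starts from the same state.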

@alfinnurhalim

I used CUDA 11 and it works.

@Senwang98
Author

> I used CUDA 11 and it works.

Thanks for replying. Which GPU are you using for training?

@Senwang98 changed the title from "Support cuda11?" to "Support cuda11 and bug of random seed" on Sep 27, 2021
@Senwang98 changed the title from "Support cuda11 and bug of random seed" to "Support 30 GPU and bug of random seed" on Sep 27, 2021
@Senwang98
Author

Q1:
Training gets stuck at this line:

self.model = torch.nn.DataParallel(model, device_ids=self.gpu_ids).to(self.device)

@xinzhuma
Owner

xinzhuma commented Sep 27, 2021

@Senwang98

  1. The information is limited; you may need to check that the device_id in the config file is consistent with your machine.
  2. It works well in my local environment. Please make sure you use the same environment when you try different seeds.
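Following the first suggestion, one way to catch a mismatch between the configured GPU ids and the machine before the model is wrapped in `torch.nn.DataParallel` is a small check like the one below. The helper `check_gpu_ids` is hypothetical, and it assumes the config yields a list of integer device indices such as the `self.gpu_ids` passed to `DataParallel` above. In a real script you would pass `torch.cuda.device_count()` as `device_count`.

```python
from typing import List, Sequence


def check_gpu_ids(gpu_ids: Sequence[int], device_count: int) -> List[int]:
    """Hypothetical helper: validate configured GPU ids against the machine.

    Fails fast with a readable message before the model is wrapped.
    """
    ids = list(gpu_ids)
    if not ids:
        raise ValueError("gpu_ids is empty; expected at least one device index")
    if device_count == 0:
        raise ValueError("no CUDA devices are visible on this machine")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate entries in gpu_ids: {ids}")
    bad = [i for i in ids if not 0 <= i < device_count]
    if bad:
        raise ValueError(
            f"gpu_ids {bad} are out of range; only {device_count} device(s) "
            f"are visible (valid ids: 0..{device_count - 1})"
        )
    return ids
```

Keep in mind that when `CUDA_VISIBLE_DEVICES` is set, the visible GPUs are renumbered starting from 0, so the ids in the config must refer to that renumbered set rather than to the physical GPU indices.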

@Senwang98
Author

@xinzhuma
Hi,
Q1: It was my fault, sorry! The code works well now. I made a mistake: RTX 30x0 GPUs don't work with PyTorch versions < 1.7. PyTorch >= 1.7 works well apart from some warnings.
Q2: I didn't change the random seed (I kept seed=444), but two training runs gave different results (I used the original code from your repo).
I will try again. Thanks for your reply.
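Since the fix here was upgrading PyTorch, a startup guard can make the requirement explicit instead of letting training hang. The sketch below parses a version string such as `torch.__version__`. The helper name `version_at_least` is hypothetical, and the `(1, 7)` minimum is taken from this comment rather than from any documented requirement of the repo.

```python
import re
from typing import Tuple


def version_at_least(version: str, minimum: Tuple[int, int]) -> bool:
    """Hypothetical helper: compare the major.minor prefix of a version string.

    Local build suffixes such as "+cu110" and pre-release tags such as "a0"
    are ignored; only the leading numeric major.minor is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum
```

For example, `version_at_least("1.6.0", (1, 7))` is `False` and `version_at_least("1.7.1+cu110", (1, 7))` is `True`. Comparing integer tuples rather than raw strings matters here: as strings, `"1.10.0" < "1.7"`, which would wrongly reject newer releases.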

@xinzhuma
Owner

Okay, please let me know if Q2 is confirmed.

@Senwang98
Author

@xinzhuma
I ran training three more times without changing the config file.
Now training is stable and the results are reproducible. The best result is now 14.16.
I will close this issue. Sorry for the disturbance.


3 participants