Issue with lm.one_billion_wds.OneBWdsGPipeTransformer #68
Please try: bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=lm.one_billion_wds.OneBWdsGPipeTransformer --logdir=/tmp/mnist/log --logtostderr --controller_gpus=4 --worker_gpus=4 --worker_split_size=4 (Having to specify controller_gpus is a bug that we will fix.)
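For readability, the command in the comment above can be laid out one flag per line. The sketch below only prints the command rather than running it. `NUM_GPUS` and `print_trainer_cmd` are illustrative names, not part of Lingvo, and tying all three GPU flags to one value is an assumption that simply mirrors the comment, where each is set to 4.

```shell
# NUM_GPUS is a hypothetical helper variable, not a Lingvo flag.
NUM_GPUS=4

# Print the trainer invocation from the comment above, one flag per source line.
# Assumption: the three GPU-related flags share one value, as in that comment.
print_trainer_cmd() {
  echo "bazel-bin/lingvo/trainer" \
    "--run_locally=gpu" \
    "--mode=sync" \
    "--model=lm.one_billion_wds.OneBWdsGPipeTransformer" \
    "--logdir=/tmp/mnist/log" \
    "--logtostderr" \
    "--controller_gpus=${NUM_GPUS}" \
    "--worker_gpus=${NUM_GPUS}" \
    "--worker_split_size=${NUM_GPUS}"
}

print_trainer_cmd
```

To actually launch training, the printed line would be run from the Lingvo workspace after building the trainer target.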
There also seems to be a failing assertion with that model right now; we will look into that too.
Hi, I tried the command you gave me in the above comment. I think it made some progress, but somewhere along the way it hit Aborted (core dumped). I am attaching the error log:
Yes, there is an error in the model configuration right now. We are sorry about the problem and will update this issue when it is resolved.
@jonathanasdf I0530 07:26:44.508102 140140756334336 trainer.py:305] Load from checkpoint /tmp/mnist/log/train/ckpt-00000000.
The VOCAB_SIZE was incorrectly set. We will fix it ASAP.
This issue should now be fixed. Please close it if there are no further problems.
Hi, I am trying to run the above-mentioned model in Docker. I hit an error when I ran the following command:
**Command:** bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=lm.one_billion_wds.OneBWdsGPipeTransformer --logdir=/tmp/mnist/log --logtostderr --worker_split_size=4
I have a 4-GPU system, so I am using --worker_split_size=4. When I asked how to try out GPipe in issue #48, I was given this command and was also asked to modify the OneBWdsGPipeTransformer hparams. I haven't made those hparams changes; is the following error because of that? If I need to change something, could you tell me which hparams to change? I am also posting the error log below:
**Error log:**
err.txt