Inference with 2 GPUs #69
Comments
I would like to ask: I found that running inference with the model is much slower than training it. Is this normal? Are there any tricks to speed this up?
Same question.
Maybe you are using the CPU to do the inference job. Switch to the right environment and try this in Python:
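The original snippet was not included in the scrape. A plausible check, assuming the project uses PyTorch, is to confirm that CUDA is visible from the active environment:

```python
# Hedged sketch: verify that PyTorch can see the GPUs before running inference.
# Assumes the project uses PyTorch; `torch` must be installed in this environment.
import torch

def cuda_report():
    """Return (cuda_available, visible_gpu_count)."""
    return torch.cuda.is_available(), torch.cuda.device_count()

avail, count = cuda_report()
print(f"CUDA available: {avail}")   # False means inference silently runs on the CPU
print(f"Visible GPUs:   {count}")   # should report 2 for a 2-GPU setup
```

If `cuda_available` is `False`, the slowdown is almost certainly a CPU fallback rather than a property of the model.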
Also note that during inference the model runs N times sequentially to generate N tokens: it must wait for the output of the previous forward pass so it can append that token to the input of the next one. This is inherent to autoregressive generation, so inference is expected to be slower per token than training, where whole sequences are processed in parallel.
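The sequential dependency can be illustrated with a toy loop (the model here is a hypothetical stand-in, not the project's actual forward pass):

```python
# Toy sketch of why generation is sequential: each step consumes the tokens
# produced so far, so N new tokens require N dependent "model" calls.

def toy_model(tokens):
    # Stand-in for a forward pass; simply returns the current sequence length
    # as the "next token" so the data flow is easy to follow.
    return len(tokens)

def generate(prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):
        nxt = toy_model(tokens)   # must wait for this result...
        tokens.append(nxt)        # ...before forming the next step's input
    return tokens

print(generate([1, 2, 3], 4))  # → [1, 2, 3, 3, 4, 5, 6]
```

Because step k's input contains step k-1's output, the N calls cannot be batched together the way training steps can.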
Hi everyone,
I got the following exception when running `generate.py` with 2 GPUs. I have tried the solution in #21, but it makes all the workload run on a single GPU.
I changed the code a little bit and here's the diff: