RNG APIs called in Inference engine even when do_sample is passed as False. #7295
-
I'm setting up an inference pipeline that uses DeepSpeed for tensor parallelism, but I'm hitting an error because the RNG APIs are unavailable on my device. Since I pass do_sample=False during token generation, the devices should not need to share an RNG state, as the random number generator APIs shouldn't be called in that case.

Commenting out the following code snippet in DeepSpeed resolves the issue: https://github.com/deepspeedai/DeepSpeed/blob/master/deepspeed/inference/engine.py (lines 174 to 177), the block starting with `if config.tensor_parallel.tp_size > 1:`.

Let me know if anything else is required from my side or if my understanding is incorrect.
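For context, here is a minimal sketch of the kind of pipeline I'm describing (the model name, tp_size, and prompt are placeholders, and the script would be launched with the deepspeed launcher across tp_size processes):

```python
# Minimal sketch of a tensor-parallel inference pipeline with greedy decoding.
# Placeholder model/prompt; launch with e.g. `deepspeed --num_gpus 2 run.py`.
import deepspeed
import torch
from deepspeed.accelerator import get_accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# With tp_size > 1, the inference engine aligns RNG state across ranks during
# initialization, which is where the failure on devices without RNG APIs shows up.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(
    get_accelerator().current_device_name())

# do_sample=False selects greedy decoding, so no random numbers should be
# drawn during generation.
outputs = engine.module.generate(**inputs, do_sample=False, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```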
Replies: 2 comments 5 replies
-
Mentioning @RezaYazdaniAminabadi, who authored the original PR that introduced this code snippet. Random token generation should only be a concern when do_sample=True; when it is set to False, no sampling takes place, so aligning the random states across devices should not be necessary.
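One possible direction (purely a hypothetical change on my side, not an existing DeepSpeed option) would be to tolerate accelerators that don't implement the RNG APIs instead of failing at engine construction, for example:

```python
# Hypothetical guard around the RNG-state alignment in the inference engine;
# the helper name and error handling are my own suggestion, not DeepSpeed code.
import logging

import deepspeed.comm as dist
from deepspeed.accelerator import get_accelerator

logger = logging.getLogger(__name__)


def _align_rng_state(config):
    """Broadcast rank 0's RNG state to all tensor-parallel ranks, but tolerate
    accelerators that do not implement the RNG APIs."""
    if config.tensor_parallel.tp_size <= 1:
        return
    try:
        _rng_state = get_accelerator().get_rng_state().to(
            get_accelerator().current_device_name())
        dist.broadcast(_rng_state, 0)
        get_accelerator().set_rng_state(_rng_state.cpu())
    except (NotImplementedError, RuntimeError) as err:
        # Without aligned RNG state, sampled generation (do_sample=True) could
        # diverge across ranks; greedy decoding (do_sample=False) is unaffected.
        logger.warning("Skipping RNG state alignment: %s", err)
```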
-
@shubhagr-quic, apologies for the delayed response. My guess is that RNG alignment is unnecessary if do_sample=False. However, I have little knowledge of the inference engine internals, and the original authors are no longer on the project. Going forward, our investment in inference is likely to decrease, especially given the rapid progress of alternatives like vllm and sglang. You might want to take that into consideration for your plans.