RNG APIs called in Inference engine even when do_sample is passed as False. #7295
-
I'm setting up an inference pipeline that uses DeepSpeed for tensor parallelism, but I'm hitting an error because the RNG APIs are unavailable on my device. Since I pass do_sample=False during token generation, the devices should not need to share an RNG state, as the random number generator APIs shouldn't be called in that case.

Commenting out the following code snippet in DeepSpeed resolves the issue: https://github.com/deepspeedai/DeepSpeed/blob/master/deepspeed/inference/engine.py (lines 174 to 177), the block starting with `if config.tensor_parallel.tp_size > 1:`.

Let me know if anything else is required from my side or if my understanding is incorrect.
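For context, here is a minimal sketch of the kind of pipeline I'm describing (the model name, tp_size, and prompt are placeholders, and the script would be launched with the deepspeed launcher across tp_size processes):

```python
# Minimal sketch of a tensor-parallel inference pipeline with greedy decoding.
# Placeholder model/prompt; launch with e.g. `deepspeed --num_gpus 2 run.py`.
import deepspeed
import torch
from deepspeed.accelerator import get_accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# With tp_size > 1, the inference engine aligns RNG state across ranks during
# initialization, which is where the failure on devices without RNG APIs shows up.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(
    get_accelerator().current_device_name())

# do_sample=False selects greedy decoding, so no random numbers should be
# drawn during generation.
outputs = engine.module.generate(**inputs, do_sample=False, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```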
Replies: 2 comments 5 replies
-
Mentioning @RezaYazdaniAminabadi, who authored the original PR that introduced this code snippet. Random token generation should only be a concern when do_sample=True; when it is set to False, no sampling takes place, so aligning the random states across devices should not be necessary.
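One possible direction (purely a hypothetical change on my side, not an existing DeepSpeed option) would be to tolerate accelerators that don't implement the RNG APIs instead of failing at engine construction, for example:

```python
# Hypothetical guard around the RNG-state alignment in the inference engine;
# the helper name and error handling are my own suggestion, not DeepSpeed code.
import logging

import deepspeed.comm as dist
from deepspeed.accelerator import get_accelerator

logger = logging.getLogger(__name__)


def _align_rng_state(config):
    """Broadcast rank 0's RNG state to all tensor-parallel ranks, but tolerate
    accelerators that do not implement the RNG APIs."""
    if config.tensor_parallel.tp_size <= 1:
        return
    try:
        _rng_state = get_accelerator().get_rng_state().to(
            get_accelerator().current_device_name())
        dist.broadcast(_rng_state, 0)
        get_accelerator().set_rng_state(_rng_state.cpu())
    except (NotImplementedError, RuntimeError) as err:
        # Without aligned RNG state, sampled generation (do_sample=True) could
        # diverge across ranks; greedy decoding (do_sample=False) is unaffected.
        logger.warning("Skipping RNG state alignment: %s", err)
```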
-
@shubhagr-quic, apologies for the delayed response. My guess is that RNG alignment is unnecessary if do_sample=False. However, I have little knowledge of the inference engine internals, and the original authors are no longer on the project. Going forward, our investment in inference is likely to decrease, especially given the rapid progress of alternatives like vllm and sglang. You might want to take that into consideration for your plans.