[Bugfix] Fix marlin kernel crash on H100 #4218

alexm-neuralmagic · 2024-04-20T02:09:59Z

This PR addresses the Marlin kernel H100 crash that was reported here: neuralmagic#187.
The reason for the crash was the inline PTX assembly that introduced the async_copy with streaming behavior. The solution is to use the more standard PTX for async_copy (without the fractional L2 policy for "evict_first"). There is no performance difference between standard async_copy PTX and the previous one.
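To make the change concrete, here is a minimal sketch of what a "standard" async copy looks like in inline PTX, assuming a 16-byte copy from global to shared memory. It is not the code from this PR: the helper and kernel names (`cp_async16`, `copy_kernel`) are hypothetical, and the test harness is only for illustration. The variant being removed additionally built an L2 eviction policy (PTX `createpolicy.fractional.L2::evict_first`) and passed it to `cp.async` through an `.L2::cache_hint` operand; the sketch simply omits that operand.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (name is an assumption): copy 16 bytes from global to
// shared memory with the plain cp.async form, without any L2 cache-hint operand.
__device__ inline void cp_async16(void* smem_ptr, const void* glob_ptr) {
  const int BYTES = 16;
  uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
  asm volatile("cp.async.cg.shared.global [%0], [%1], %2;\n"
               :: "r"(smem), "l"(glob_ptr), "n"(BYTES)
               : "memory");
}

// Commit the outstanding async copies as one group.
__device__ inline void cp_async_fence() {
  asm volatile("cp.async.commit_group;\n" ::: "memory");
}

// Block until all async copies issued by this thread have completed.
__device__ inline void cp_async_wait_all() {
  asm volatile("cp.async.wait_all;\n" ::: "memory");
}

// Illustrative kernel: stage one int4 per thread through shared memory.
__global__ void copy_kernel(const int4* in, int4* out) {
  __shared__ int4 tile[32];
  cp_async16(&tile[threadIdx.x], &in[threadIdx.x]);
  cp_async_fence();
  cp_async_wait_all();
  __syncthreads();
  out[threadIdx.x] = tile[threadIdx.x];
}

int main() {
  const int n = 32;
  int4 h_in[n], h_out[n];
  for (int i = 0; i < n; ++i) h_in[i] = make_int4(i, 2 * i, 3 * i, 4 * i);

  int4 *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_out));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

  copy_kernel<<<1, n>>>(d_in, d_out);
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }

  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  std::printf("%s\n", std::memcmp(h_in, h_out, sizeof(h_in)) == 0 ? "match" : "mismatch");

  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

`cp.async` needs compute capability 8.0 or newer, so a sketch like this would be built with something like `nvcc -arch=sm_80` (or `sm_90` for H100).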

mgoin

Is there any way to keep the cache hint? It seems pretty useful, but if you measured no difference then it might be alright.

alexm-neuralmagic · 2024-04-20T13:01:37Z

I tried various modifications to the PTX to keep the cache-hint, but it did not work.

pcmoritz

Thanks for fixing this. I validated the fix with the reproduction in neuralmagic#187. Always great to see fixes that make things simpler ❤️

The reason for the crash was the inline PTX assembly that introduced the async_copy with streaming behavior. The solution is to use the more standard PTX for async_copy (without the fractional L2 policy for "evict_first"). There is no performance difference between standard async_copy PTX and the previous one. Ported from dense marlin: vllm-project#4218

[Bugfix] Fix marlin kernel crash on H100

689e2ed

alexm-neuralmagic mentioned this pull request Apr 20, 2024

[Bug]: When running repo hello world: RuntimeError: CUDA error: an illegal instruction was encountered neuralmagic/nm-vllm#187

Closed

mgoin reviewed Apr 20, 2024

pcmoritz approved these changes Apr 24, 2024

pcmoritz merged commit aae0824 into vllm-project:main Apr 24, 2024
47 checks passed

Qubitium mentioned this pull request Apr 27, 2024

[BUG] Fix H100 crash/compat with Marlin AutoGPTQ/AutoGPTQ#654

Merged

3 tasks

dtrifiro mentioned this pull request May 15, 2024

bump ubi base image tag opendatahub-io/vllm#24

Merged

mgoin mentioned this pull request May 15, 2024

[Bugfix] Fix marlin 2:4 kernel crash on H100 neuralmagic/nm-vllm#243

Merged

Qubitium mentioned this pull request Jun 15, 2024

Fix H100 crash with Marlin Qubitium/AutoGPTQ#11

Merged

3 tasks
