[doc][EM] Add a brief introduction to NUMA. #11538
python-package/xgboost/utils.py
Outdated
def _get_uuid(ordinal: int) -> str:
    """Construct a string representation of UUID."""
    from cuda.bindings import runtime as cudart

    status, prop = cudart.cudaGetDeviceProperties(ordinal)
    _checkcu(status)

    dash_pos = {0, 4, 6, 8, 10}
    uuid = "GPU"

    for i in range(16):
        if i in dash_pos:
            uuid += "-"
        h = hex(0xFF & np.int32(prop.uuid.bytes[i]))
        assert h[:2] == "0x"
        h = h[2:]

        while len(h) < 2:
            h = "0" + h
        uuid += h
    return uuid
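For readers following the snippet above, the byte-to-string formatting can be isolated from the CUDA call. The following is a minimal sketch, not the code that was merged: `format_gpu_uuid` is a hypothetical name, and it assumes the caller has already converted the 16 UUID bytes to an unsigned `bytes` object (for example with `bytes(b & 0xFF for b in ...)`).

```python
def format_gpu_uuid(raw: bytes) -> str:
    """Format 16 raw UUID bytes as GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

    Hypothetical helper for illustration; it reproduces the dash positions
    {0, 4, 6, 8, 10} (in bytes) used by the reviewed snippet.
    """
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    h = raw.hex()  # 32 lowercase hex characters, zero-padded per byte
    # Byte groups of 4, 2, 2, 2, 6 correspond to 8, 4, 4, 4, 12 hex characters.
    groups = (h[0:8], h[8:12], h[12:16], h[16:20], h[20:32])
    return "GPU-" + "-".join(groups)


print(format_gpu_uuid(bytes(range(16))))
```

Using `bytes.hex()` removes the need for the manual `0x` stripping and zero-padding loop, since each byte is always rendered as two hex digits.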
Thank you for pointing it out. Got a bit too used to cudart.
Thinking again, I need this cudart version as XGBoost should prefer the CUDA device enumeration instead of the nvml device enumeration.
@pentschev I have simplified the utility using nvml and changed the package name in the documentation.
Let's document that set_device_cpu_affinity is not available on other OSes like Windows.
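To illustrate why such a note matters, here is a hedged sketch of a platform guard around CPU pinning. It is not the implementation of set_device_cpu_affinity; `try_set_cpu_affinity` is a hypothetical helper that relies only on the standard library's `os.sched_setaffinity`, which Python exposes on Linux and some other Unix platforms but not on Windows.

```python
import os
from typing import Iterable


def try_set_cpu_affinity(cpus: Iterable[int]) -> bool:
    """Pin the current process to ``cpus``; return False where unsupported.

    Hypothetical helper for illustration only. ``os.sched_setaffinity`` is
    absent on platforms such as Windows, so its presence is checked first.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
    return True
```

A caller could fall back to a warning when the function returns False instead of failing outright.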
Update:
Hold this PR for now. I think the CPU affinity alone is not sufficient when the memory is under pressure.
Expanded the document for using
Pull Request Overview
This PR enhances the External Memory tutorial by introducing a section on NUMA configuration and updates the Python demos to reference the new cuda-python package and the NUMA guidance.
- Added a table of contents and a new NUMA section explaining how to set CPU and memory affinity.
- Updated demo scripts to switch from python-cuda to cuda-python, adjust the cudart import, and reference the NUMA tutorial.
- Minor history update in the release notes to include support for the Grace Blackwell decompression engine.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
File | Description |
---|---|
doc/tutorials/external_memory.rst | Added TOC and a detailed NUMA section with examples for numactl. |
demo/guide-python/external_memory.py | Swapped python-cuda to cuda-python, updated import, added NUMA note. |
demo/guide-python/distributed_extmem_basic.py | Same updates as above: package name, import path, and NUMA reference. |
Comments suppressed due to low confidence (3)
doc/tutorials/external_memory.rst:297
- [nitpick] Consider adding a brief note or link on installing numactl (e.g., via apt-get install numactl), so readers know how to obtain the tool before using it.
numactl --membind=${NODEID} --cpunodebind=${NODEID} ./myapp
demo/guide-python/external_memory.py:50
- [nitpick] The device_mem_total function is duplicated across demos; consider extracting it into a shared utility module to reduce repetition.
import cuda.bindings.runtime as cudart
demo/guide-python/distributed_extmem_basic.py:44
- [nitpick] Same helper appears here; extracting the GPU memory query into a common helper would improve consistency and reduce maintenance overhead.
import cuda.bindings.runtime as cudart
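The numactl invocation quoted in the first suppressed comment binds both memory and CPUs to one NUMA node. As a rough sketch of how a launcher script might assemble that command, the hypothetical helper `numactl_command` below only builds the argument list; the node ID and application path are placeholders, and only the `--membind` and `--cpunodebind` flags shown in the tutorial excerpt are used.

```python
import shlex
from typing import List, Sequence


def numactl_command(node_id: int, app: Sequence[str]) -> List[str]:
    """Build an argv list that binds memory and CPUs to one NUMA node.

    Hypothetical helper for illustration; it does not run anything.
    """
    if node_id < 0:
        raise ValueError("node_id must be non-negative")
    return [
        "numactl",
        f"--membind={node_id}",
        f"--cpunodebind={node_id}",
        *app,
    ]


# Show the command as it would be typed in a shell.
print(shlex.join(numactl_command(0, ["./myapp"])))
```

The resulting list can be passed to `subprocess.run` on a machine where numactl is installed.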
Add a utility to help set the CPU affinity.