warning: "can't bind memory" seems to be OS-related. #2482
Comments
You seem to use xmrig in a Virtual Machine. Just with Windows 10 (for testing) or as well with Linux? Perhaps that's why the system can't bind the threads because of the virtualization layer between mining software and hardware! |
No, I just had Linux (Fedora and Ubuntu) OS booted up on a live installation USB stick, and tested. I did not install Linux on a hard disk drive. So it's not in a Virtual Machine environment. |
But xmrig recognized your system as a VM. Look at the output line 5 |
Is the hwloc version the same between the two? |
For Portable Hardware Locality, I guess yes as both are booted up on the same hardware. |
I had Hyper-V enabled under Windows 10, which nominally also runs as a virtual machine, but xmrig has direct access to all hardware. |
This is what happened under Pop!_OS: root@pop-os:/home/pop-os# '/home/pop-os/Downloads/xmrig-6.13.1/xmrig'
|
Instead of booting from a live USB stick, I installed Ubuntu on a 64 GB USB flash drive. The same "can't bind memory" warning occurred. |
By the way, I have experimented with Hyper-V both enabled and disabled under Windows 10, and XMRig has always been running properly without any bug prompts. |
Probably needs all 8 sticks or not all nodes get their own RAM, which will anger hwloc pinning cores to RAM they don't have. |
@Spudz76 Threadripper 2990WX has only 4 DRAM channels, not 8. That's the difference from the EPYCs of the same generation. 2 of the 4 dies on the package don't have their own memory controller. Perhaps that's the cause of the issue. But then, all 1st gen Threadrippers would have this problem! |
You are right. This is a bug in the Linux version of XMRig. So I ask the developers to fix this problem, even if only for the 2990WX and other 1st gen Threadrippers. |
What does your |
|
hwloc-ls is a command. Mostly looking for the NUMANode entries.
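As a cross-check on the NUMANode entries, here is a minimal sketch that reads the per-node memory size straight from Linux sysfs. It assumes the usual `/sys/devices/system/node/nodeN/meminfo` layout with a `MemTotal:` line. The function names `mem_total_kb` and `numa_memory` are made up for this example and are not part of hwloc or XMRig.

```python
import re
from pathlib import Path

# Standard Linux sysfs location for NUMA nodes (assumed to exist on the host).
NODE_ROOT = Path("/sys/devices/system/node")


def mem_total_kb(meminfo_text):
    """Return MemTotal in kB from one node's meminfo text, or None if absent."""
    match = re.search(r"MemTotal:\s+(\d+)\s*kB", meminfo_text)
    return int(match.group(1)) if match else None


def numa_memory(root=NODE_ROOT):
    """Map node id -> MemTotal in kB for every nodeN directory under root."""
    result = {}
    if not root.is_dir():
        return result  # no NUMA information exposed (or not Linux)
    for node_dir in sorted(root.glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        result[node_id] = mem_total_kb((node_dir / "meminfo").read_text())
    return result


if __name__ == "__main__":
    for node_id, kb in sorted(numa_memory().items()):
        note = "  <- no local memory" if not kb else ""
        print(f"node{node_id}: {kb} kB{note}")
```

A node reported with 0 kB would be one whose cores have no local RAM to bind to.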
|
Finally found someone else having this issue. Unfortunately, it appears there's no resolution other than "download more ram" :). Would setting the topology/thread affinities to ignore nodes with attached memory work as a stopgap? |
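One reading of that stopgap is to pin threads only to CPUs whose NUMA node has local memory. The sketch below shows just the bookkeeping. It expands Linux cpulist strings (the format found in `/sys/devices/system/node/nodeN/cpulist`) and keeps the CPUs of nodes with non-zero memory. The node layout in `example_nodes` is hypothetical and was not measured on a 2990WX. Whether feeding such a list into the miner's thread affinity settings avoids the warning is untested.

```python
def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8,10-11" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpus_with_local_memory(nodes):
    """nodes: iterable of (cpulist_text, mem_total_kb) pairs, one per NUMA node."""
    keep = []
    for cpulist_text, mem_kb in nodes:
        if mem_kb > 0:  # skip nodes that have cores but no attached RAM
            keep.extend(parse_cpulist(cpulist_text))
    return sorted(keep)


# Hypothetical 4-node layout for illustration only.
example_nodes = [
    ("0-7,32-39", 33554432),
    ("8-15,40-47", 0),
    ("16-23,48-55", 33554432),
    ("24-31,56-63", 0),
]
affinity = cpus_with_local_memory(example_nodes)
print(affinity)
```

The trade-off is that half the cores would sit idle, so this would only make sense as a diagnostic step.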
I gave up :( |
Bump: the issue still exists in 6.17.0 (2990WX, 8x8GB). |
I did begin seeing "can't bind" errors on Intel systems, but after killing it and running it again, it just works. |
So it's likely an OS-related bug. |
Yes, or it's hwloc newer than 2.4.x or something (not sure if the 2.3.x->2.4.x deps version bump was about when it began). Things definitely need 2.x.x or GhostRider breaks, so maybe even 2.1.x or 2.2.x would work better. |
Added some better failure messaging, since the call could fail in a few ways. The hwloc object lookup works, but then the actual binding fails. It actually failed for me the second time I started it. Luck? As usual, on a third relaunch it worked fine, like the first.
|
I fiddled with this now for several hours and the solution is most probably not software related. I documented it here: Hope this helps! |
@lexo-mfleuti Thanks for that, all very true and could be the issue for some here. However the other half of the issue happens regardless of physical memory locations (one of my systems has every slot full, still occasionally fails but works the next run). |
@lexo-mfleuti WX-series Threadrippers have 2 CCDs/CCXs that are not connected directly to memory; they must cross through the other CCDs/CCXs before they can be seen by the memory controller. I think the problem is that, for some of us, XMRig is misidentifying usable CPU cores because of this. The memory latency is higher due to the way the CPUs are made, but the cores are still usable. This increased memory latency might be why XMRig can be restarted and works fine the next time for some.

As a side note, I have been able to replicate a similar issue using CPUMiner for Raptoreum. Sometimes when I perform a tune_config it will disable half the true cores and performance will reflect that (~2.5-3.3Kh average 24 hours). With a "good" tune where all the cores stayed enabled, the performance is approximately what it should be (5Kh average over the last 6 days). The old Threadripper 1950X I had did 2.6-2.7Kh average per 24 hours, so this is probably fairly close to correct.

BTW I run with SMT disabled and memory in Channel interleave mode to force NUMA nodes to enable and to optimize memory access latency as best as possible. I have 8x8GB dual rank Samsung E-Die, so my slots and ranks are literally full, lol.

Something of note is that running the Infinity Fabric faster (3200 MHz vs 3000/2400/etc.) or using Channel/auto interleave mode might have an effect on XMRig for others who can't get it to bind the memory at all. I have done extensive testing of the interleave modes on CPUMiner for RTM, and only Channel mode is able to perform well, even if I managed to get a "clean" tune on any other mode.

I just checked, hwloc 2.1.0 dfsg4 on my Mint box |
There is a warning, "can't bind memory", under Fedora and Ubuntu, but no problem occurs under Windows 10. So this problem seems to be OS-related.
msr register values for "ryzen_17h" preset have been set successfully (4 ms)
[2021-07-10 22:01:35.357] randomx init datasets algo rx/0 (64 threads) seed 36e5ffbb4d64a1f3...
[2021-07-10 22:01:35.657] randomx #0 allocated 2080 MB huge pages 100% (300 ms)
[2021-07-10 22:01:35.954] randomx #2 allocated 2080 MB huge pages 100% (596 ms)
[2021-07-10 22:01:35.995] randomx #0 allocated 256 MB huge pages 100% +JIT (41 ms)
[2021-07-10 22:01:35.995] randomx -- allocated 4416 MB huge pages 100% 2208/2208 (637 ms)
[2021-07-10 22:01:37.738] randomx #0 dataset ready (1743 ms)
[2021-07-10 22:01:38.202] randomx #2 dataset ready (464 ms)
[2021-07-10 22:01:38.202] cpu use profile rx (32 threads) scratchpad 2048 KB
[2021-07-10 22:01:38.204] CPU #32 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #34 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #36 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #38 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #40 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #42 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #44 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #46 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #50 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #52 warning: "can't bind memory"
[2021-07-10 22:01:38.205] CPU #48 warning: "can't bind memory"
[2021-07-10 22:01:38.205] CPU #54 warning: "can't bind memory"
[2021-07-10 22:01:38.205] CPU #58 warning: "can't bind memory"
[2021-07-10 22:01:38.205] CPU #56 warning: "can't bind memory"
[2021-07-10 22:01:38.205] CPU #62 warning: "can't bind memory"
[2021-07-10 22:01:38.205] CPU #60 warning: "can't bind memory"
[2021-07-10 22:01:38.227] cpu READY threads 32/32 (32) huge pages 100% 32/32 memory 65536 KB (25 ms)
[2021-07-10 22:01:52.037] cpu accepted (1/0) diff 480045 (574 ms)
CPU: AMD 2990WX
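To compare runs, it can help to pull the affected thread indices out of a log like the one above. This is a minimal sketch that assumes the warning lines keep exactly the format shown in this report. The helper name `failed_bind_cpus` is made up for this example.

```python
import re

# Matches lines such as: [..] CPU #32 warning: "can't bind memory"
BIND_WARNING = re.compile(r'CPU #(\d+) warning: "can\'t bind memory"')


def failed_bind_cpus(log_text):
    """Return the sorted, de-duplicated CPU indices that logged the warning."""
    return sorted({int(m.group(1)) for m in BIND_WARNING.finditer(log_text)})


sample = '''\
[2021-07-10 22:01:38.204] CPU #32 warning: "can't bind memory"
[2021-07-10 22:01:38.205] CPU #48 warning: "can't bind memory"
[2021-07-10 22:01:38.204] CPU #34 warning: "can't bind memory"
'''
print(failed_bind_cpus(sample))  # [32, 34, 48]
```

In the log above, the warnings cover every even-numbered CPU from #32 to #62, i.e. 16 of the 32 mining threads.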