Describe the bug
I am experiencing hardware initialization failures when using NVIDIA GPUs (RTX 3060, 4080, 4090, 5090) on a Mac mini (M4 Pro, macOS 26.x) via UGreen Thunderbolt 5 and USB4 enclosures. While AMD GPUs (RX 7900 XTX) work flawlessly via macOS native drivers, tinygrad fails during the NV driver handshake.
Ref: #15652
Hardware Environment:
Host: Mac mini (Apple M4 Pro), macOS 26.x
GPUs tested: RTX 3060, 4080, 4090, 5090 (each shows up in the System Information device list)
Enclosures tested:
- UGreen LinkStation ASM2464 (USB4) + RTX 3060 / 4090 and AMD RX 7900 XTX -> Pass (works intermittently, but more reliably than Thunderbolt 5)
- UGreen LinkStation (Thunderbolt 5) + RTX 3060 / 4080 / 4090 / 5090 -> Fail
- UGreen LinkStation (Thunderbolt 5) + AMD RX 7900 XTX -> Pass
Error 1: Architecture Recognition (KeyError)
In nvdev.py, the architecture field is read back as 0x3F instead of the expected 0x19 (Ada) or 0x17 (Ampere):
File "nvdev.py", line 113, in _early_ip_init
    self.chip_name = {0x17: "GA1", 0x19: "AD1", 0x1b: "GB2"}[self.chip_details['architecture']]
KeyError: 63  # 0x3F
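For what it's worth, 0x3F is exactly the all-ones value of a 6-bit field, which would be consistent with the MMIO read returning 0xFFFFFFFF (the usual result of a failed PCIe read over a broken tunnel) rather than a real architecture ID. A minimal sketch of that hypothesis, assuming an illustrative 6-bit field position (not tinygrad's actual register parsing):

```python
# Hypothetical sketch: if the architecture is carried in a 6-bit field of a
# 32-bit boot register, a failed MMIO read (all ones, 0xFFFFFFFF) decodes
# to exactly 0x3F -- the value seen in the KeyError above.
def decode_arch(boot0: int) -> int:
    # Illustrative field position; not tinygrad's real register layout.
    return (boot0 >> 24) & 0x3F

print(hex(decode_arch(0xFFFFFFFF)))  # all-ones read decodes to 0x3f
print(hex(decode_arch(0x19 << 24)))  # a healthy Ada read would decode to 0x19
```

If that hypothesis holds, the KeyError is a symptom of the tunnel dropping the config/BAR read, not of a genuinely unknown chip.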
Error 2: Reset Timeout
After manually patching around the KeyError (adding 0x3F: "GA1" to the chip_name map), the driver hangs in wait_for_reset:
TimeoutError: waiting for reset. Timed out after 10000 ms, condition not met: False != True
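In case the fixed 10 s deadline turns out to be the limiting factor (rather than the reset never completing at all), one workaround sketch is a poll loop whose deadline can be overridden from the environment. NV_RESET_TIMEOUT_MS here is a hypothetical variable name for illustration, not an existing tinygrad option:

```python
import os
import time

def wait_for_reset(cond, poll_ms: int = 10) -> None:
    # Hypothetical knob: NV_RESET_TIMEOUT_MS is not a real tinygrad env var.
    timeout_ms = int(os.getenv("NV_RESET_TIMEOUT_MS", "10000"))
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        if cond():  # poll the reset-complete condition until it holds
            return
        time.sleep(poll_ms / 1000)
    raise TimeoutError(f"waiting for reset. Timed out after {timeout_ms} ms")

wait_for_reset(lambda: True)  # returns immediately once the condition holds
```

Even with a much longer deadline, though, a reset that never asserts would still time out, which is why the analysis below points at the tunnel itself.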
Analysis:
It seems the Thunderbolt 5 controller interferes with the low-level PCIe PERST# signaling or the PCIe atomic operations required by the user-space GSP-RM initialization. The ASM2464's transparent PCIe tunneling appears more compatible with tinygrad's bare-metal approach than the Thunderbolt 5 path.
To Reproduce:
DEV=NV python3 -m tinygrad.llm --benchmark
Additional Context:
I can provide more logs or test on other hardware if needed. Is there a plan to make the RPC timeout configurable, or to adapt the GSP initialization sequence to high-bandwidth/high-latency Thunderbolt 5 tunnels?