Metal gives Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file" #2226
Comments
I noticed the same issue. If you go through Xcode instead of the Metal shader API via |
I managed to get an output by setting the metal library type to be dynamic and adding an install path to the
I managed to dump the shader output, which looks like this:
The code looks correct, but the metadata is not right. I built the same shader locally using the metal command line tools installed with Xcode 15 and got a target triple for macOS 14.0.0 and a different metal version in the later metadata. A working shader has additional metadata for the kernel type:
Could there be any issue with whatever driver the GPU is using for runtime compilation? Or perhaps some bug in This is on MacOS 14.1 by the way. Everything worked fine on this machine before I upgraded to Sonoma. |
I spent some more time looking into this, and I am stuck. It looks like the main issue is in macOS 14 you cannot get a
Buried in the header file I found this comment about
But I have not found anything in the documentation about how to use unqualified functions - and presumably we want to compile to some sort of byte representation in the In summary, it looks like the executable |
I have macOS 13. Is 14 broken? |
I have the same error on macOS 13.4.1 trying to run the mnist example.
|
Runs fine on macOS 14.1.1, macbook pro(m2). The library that is generated has these options:
@alexbaden: Note that this is a |
This seems to be working on macOS 14, MacBook Pro (M1). Here's the IR code for the exact same kernel, which works; the Metal API recognizes it as a valid metallib file.
; ModuleID = 'shader.air'
source_filename = "E_"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx14.1.0"
; Function Attrs: argmemonly mustprogress nofree norecurse nosync nounwind willreturn
define void @E_(float addrspace(1)* nocapture writeonly "air-buffer-no-alias" %data0, float addrspace(1)* nocapture readonly "air-buffer-no-alias" %data1, float addrspace(1)* nocapture readonly "air-buffer-no-alias" %data2, <3 x i32> %gid, <3 x i32> %lid) local_unnamed_addr #0 {
entry:
%0 = load float, float addrspace(1)* %data1, align 4, !tbaa !22, !alias.scope !26, !noalias !29
%1 = load float, float addrspace(1)* %data2, align 4, !tbaa !22, !alias.scope !32, !noalias !33
%add = fadd fast float %1, %0
store float %add, float addrspace(1)* %data0, align 4, !tbaa !22, !alias.scope !34, !noalias !35
ret void
}
attributes #0 = { argmemonly mustprogress nofree norecurse nosync nounwind willreturn "approx-func-fp-math"="true" "frame-pointer"="all" "min-legal-vector-width"="96" "no-builtins" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "unsafe-fp-math"="true" }
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!llvm.ident = !{!8}
!air.version = !{!9}
!air.language_version = !{!10}
!air.compile_options = !{!11, !12, !13}
!air.kernel = !{!14}
Immediate mismatches I see: language_version, compile_options, ident, and kernel. You could try patching the output binary with the above metadata and loading it as a test. |
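For anyone comparing dumps, the check for missing metadata can be done mechanically. This is a rough sketch, assuming both shaders have been dumped as textual LLVM IR like the listing above. The helper names `named_metadata` and `missing_metadata` are hypothetical (not part of tinygrad), and the sketch only reports named metadata nodes that are absent; it does not compare their values.

```python
import re

def named_metadata(ir_text):
    """Return the names of module-level named metadata nodes, e.g. 'air.kernel'."""
    # Named metadata declarations look like: !air.kernel = !{!14}
    # Numbered nodes such as !0 = !{...} are skipped because the name must
    # start with a letter or underscore.
    pattern = r"^!([A-Za-z_][\w.]*)\s*=\s*!\{"
    return set(re.findall(pattern, ir_text, flags=re.MULTILINE))

def missing_metadata(working_ir, broken_ir):
    """Names present in the working dump but absent from the broken one."""
    return sorted(named_metadata(working_ir) - named_metadata(broken_ir))
```

Passing the text of a working dump and a failing dump returns a sorted list of the metadata names that only the working one declares.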
This seems promising: #2369 |
I am still having this problem - M1 MacBook Air (2020) with macOS 14.1.1. Interestingly with #2369 I get a different problem:
where line 61 is the call to |
The underlying issue is that the pyobjc bindings we use inspect the Objective-C runtime for methods, regardless of whether they are documented. In general we should limit ourselves to the API described by Metal.framework. The new error you're seeing is related to raw bytes being passed to |
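A quick sanity check before handing bytes to `newLibraryWithData_error_` (the call that fails in the tracebacks in this thread) is to look at the first four bytes. This is a minimal sketch under an assumption that is not stated anywhere in this thread: that a compiled metallib starts with the ASCII magic `MTLB`. `looks_like_metallib` is a hypothetical helper name.

```python
# Assumption: compiled .metallib files begin with this 4-byte magic.
METALLIB_MAGIC = b"MTLB"

def looks_like_metallib(data):
    """Return True if the buffer starts with the assumed metallib magic."""
    return len(data) >= 4 and data[:4] == METALLIB_MAGIC
```

If this returns False for the compiler output, the bytes are probably something else (for example textual or bitcode AIR), which would be consistent with the "not formatted as a MetalLib file" message.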
MTLDynamicLibrary isn't a viable solution, it isn't supported on all devices. Macs with Intel or AMD gpus will suffer. |
@alexbaden can you try this: #2372 (still WIP) |
#2372 resolves this issue for me! |
Same issue on Mac M1 with Sonoma 14.1.1 and Apple Silicon. It works if I apply patch from #2372 and add: |
Same issue on Mac w/ 13.6 - "Invalid library file" at |
I tried to run the patch from the other repo as mentioned by @bojanbabic above, and set METAL_XCODE, etc., but I am still not able to run, for example, beautiful_mnist.py or beautiful_cartpole.py with the patch. Could someone share a set of build steps with the fix, or alternatively suggest something else? System Info: Sonoma 14.1.1 |
fixed the issue for me. |
This took a while... Running on M1 Macbook Pro 16 inch 2021 (14.2.1 (23C71)) |
Piggybacking on @SamRaymond's work: I hit this issue after trying to run the hello world matmul example in the README.
$ METAL_XCODE=1 DEBUG=3 python -c "from tinygrad import Tensor;N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
*** METAL rand seed 1703711284 size 1048576 dtype dtypes.float
*** METAL rand seed 1703711285 size 1048576 dtype dtypes.float
0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
1 ┗━┳ SUM (1024, 1024, 1)
2 ┗━┳ MUL
3 ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
4 ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
TENSOR CORES [(1, 1024, 1)] [(0, 1024, 1024)] tensor_core<METAL, [8, 8, 8], dtypes.float, dtypes.float>
3 alias 1: idxs= [Variable('gidx0', 0, 31), NumNode(0), NumNode(0), Variable('lidx3', 0, 1), NumNode(0), ((Variable('lidx4', 0, 15)//2)%4), NumNode(0), Variable('ridx7', 0, 127), (((Variable('lidx4', 0, 15)%2)*2)+((Variable('lidx4', 0, 15)//8)*4)+Variable('None', 0, 1)), NumNode(0), Variable('None', 0, 3), NumNode(0)]
4 alias 2: idxs= [NumNode(0), Variable('gidx1', 0, 7), Variable('lidx2', 0, 3), NumNode(0), (Variable('lidx4', 0, 15)//8), NumNode(0), (Variable('lidx4', 0, 15)%2), Variable('ridx7', 0, 127), (((Variable('lidx4', 0, 15)//2)%4)+(Variable('lidx3', 0, 1)*4)), Variable('None', 0, 1), NumNode(0), Variable('None', 0, 3)]
Traceback (most recent call last):
File "<string>", line 4, in <module>
File "/Users/clay/git/forks/tinygrad/tinygrad/tensor.py", line 131, in numpy
return self.cast(self.dtype.scalar()).contiguous().realize().lazydata.base.realized.toCPU().astype(self.dtype.np, copy=True).reshape(self.shape)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clay/git/forks/tinygrad/tinygrad/tensor.py", line 101, in realize
run_schedule(self.lazydata.schedule())
File "/Users/clay/git/forks/tinygrad/tinygrad/realize.py", line 28, in run_schedule
prg = lower_schedule_item(si)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clay/git/forks/tinygrad/tinygrad/realize.py", line 21, in lower_schedule_item
return Device[si.out.device].get_runner(si.ast)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 314, in get_runner
def get_runner(self, ast:LazyOp) -> CompiledASTRunner: return self.to_program(self.get_linearizer(ast))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 286, in to_program
return CompiledASTRunner(k.ast, k.name, src, k.global_size, k.local_size, runtime_args).build(self.compiler, self.runtime)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 255, in build
self.clprg = runtime(self.name, self.lib)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clay/git/forks/tinygrad/tinygrad/runtime/ops_metal.py", line 30, in __init__
self.library = unwrap2(self.device.device.newLibraryWithData_error_(data, None))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clay/git/forks/tinygrad/tinygrad/helpers.py", line 46, in unwrap2
assert err is None, str(err)
^^^^^^^^^^^
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid library file" UserInfo={NSLocalizedDescription=Invalid library file}
with
GPU=1 METAL_XCODE=1 DEBUG=3 python -c "from tinygrad import Tensor;N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
CLDevice: got 1 platforms and 1 devices
*** GPU rand seed 1703711410 size 1048576 dtype dtypes.float
*** GPU rand seed 1703711411 size 1048576 dtype dtypes.float
0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
1 ┗━┳ SUM (1024, 1024, 1)
2 ┗━┳ MUL
3 ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
4 ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
*** GPU 1 r_32_16_8_16_256_4_4_4 arg 3 mem 0.01 GB tm 7579.38us/ 7.58ms ( 283.33 GFLOPS, 1.66 GB/s)
2.5320332e-09
avg: 283.33 GFLOPS 1.66 GB/s total: 1 kernels 2.15 GOPS 0.01 GB 7.58 ms
On Sonoma 14.2.1, MacBook Air M1 2020 |
For everyone facing this issue: just run the existing master with these two env variables: "METAL_XCODE=1 DISABLE_COMPILER_CACHE=1". Essentially, there is some issue when you use the cached Metal library, at least on an M1 Air with macOS 14.x.x. Got the hint from #2372 and the latest discussion there. |
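If you launch scripts from Python rather than a shell, the same workaround can be expressed as an environment dict. A minimal sketch: the two variable names come from the comment above, while `metal_workaround_env` is a hypothetical helper and the script name in the note below is only a placeholder.

```python
import os

def metal_workaround_env(base=None):
    """Copy an environment and add the two workaround variables."""
    # Copy so the caller's mapping (or os.environ) is not modified.
    env = dict(os.environ if base is None else base)
    env["METAL_XCODE"] = "1"
    env["DISABLE_COMPILER_CACHE"] = "1"
    return env
```

It could then be used as `subprocess.run([sys.executable, "beautiful_mnist.py"], env=metal_workaround_env(), check=True)`, which should be equivalent to prefixing the command with the two variables in a shell.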
@certik you probably need to clear the cache; it should be at ~/Library/Caches/tinygrad/ |
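Clearing the cache can be scripted as well. A minimal sketch, assuming the cache really lives at the path given above on your machine; `clear_compiler_cache` is a hypothetical helper, and it deletes the whole directory, so check the path before running it.

```python
import shutil
from pathlib import Path

# Path taken from the comment above; adjust it if your cache lives elsewhere.
DEFAULT_CACHE_DIR = Path.home() / "Library" / "Caches" / "tinygrad"

def clear_compiler_cache(cache_dir=DEFAULT_CACHE_DIR):
    """Delete the cache directory. Returns True if something was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```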
GPU=1 works for me but any ideas how to run with ipynb files? |
@dattienle2573 |
seems related to this issue in pyobjc: ronaldoussoren/pyobjc#580,
the snippet here works in python 3.12 but doesn't in 3.10.12 for me |
update: it was compiler cache, the error in this issue doesn't occur when using python 3.12 |
both python 3.12 and 3.10 use pyobjc-* 10.1.
Seems that we have a problem with pyobjc metal compiler bindings on python <3.12 |
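Since the behaviour seems to depend on the interpreter and the pyobjc build, it may help to attach the same facts to every report. A minimal sketch using only the standard library; `environment_report` is a hypothetical helper, and the default package names are my guess at the relevant PyPI distributions.

```python
import platform
import sys
from importlib import metadata

def environment_report(packages=("pyobjc-core", "pyobjc-framework-Metal")):
    """Collect interpreter, OS and package versions into a dict."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "macos": platform.mac_ver()[0],  # empty string when not on macOS
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report
```

Running `print(environment_report())` in both the working and the failing interpreter gives two dicts that can be compared directly.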
@Leikoe great job investigating this! |
looks like xcode doesn't use the same metal compiler as objc metal bindings (14.0.0 for xcode and 14.1.0 on my machine) |
TLDR: on python 3.10.12, Workarounds: use @10-zin 's answer
or use python 3.12 or 3.8 (just discovered that it works in 3.8 idk at this point) |
Yeah can confirm, I was able to run the program using the following commands
The above works and shows that it's using METAL, but then if I switch to conda's python
then I get the error. Both conda and the virtualenv have same-ish python version
The info strings for conda and homebrew's python are as follows:
|
likely a pyobjc/libobjc problem, sends logs differ with same python + pyobjc* packages |
weirder than expected, objective C logs from the working system python3.10.13 are duplicated but conda's python3.10.13 aren't ?? |
For reference: see also: |
when running the same script with conda python3.10.13 and brew python 3.10.13 with conda's (tinygrad3.10.13):
brew:
The objective C code running is indeed different when using conda's python |
This bug still occurs when using conda packaged PyObjc. |
bug occurred in pyobjc=10.1 and still occurs in pyobjc=10.2 |
Got this error on my conda python 3.12, so I can't say it's working on 3.12 entirely. |
Maybe I wasn't clear enough: every conda python version I've tried triggers this bug. |
I'm getting the same error with miniconda on my Intel Macbook Pro 2020. Installing tinygrad directly on system python works fine. Another confirmation that this is specific to miniconda. |
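To tell quickly whether the interpreter you are running comes from conda, something like the following heuristic can be used. It is a sketch, not an official API: it assumes that conda environments contain a `conda-meta` directory under `sys.prefix` and that conda builds mention "conda" in `sys.version`. `is_conda_python` is a hypothetical helper.

```python
import os
import sys

def is_conda_python(prefix=None, version_string=None):
    """Heuristically detect a conda-provided Python interpreter."""
    prefix = sys.prefix if prefix is None else prefix
    version_string = sys.version if version_string is None else version_string
    # conda environments keep their package records in <prefix>/conda-meta
    has_conda_meta = os.path.isdir(os.path.join(prefix, "conda-meta"))
    # conda builds usually say "Anaconda" or "conda-forge" in sys.version
    mentions_conda = "conda" in version_string.lower()
    return has_conda_meta or mentions_conda
```

If `is_conda_python()` returns True and you hit this error, trying the system or Homebrew Python instead is consistent with the reports above.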
If anyone wants to help:
import ctypes
from ctypes import cdll, c_char_p, c_void_p, c_bool, CDLL, c_uint, c_ulong, util, c_int

libobjc = CDLL(util.find_library("objc"))
# Class objc_getClass(const char *name)
libobjc.objc_getClass.restype = c_void_p
libobjc.objc_getClass.argtypes = [c_char_p]
ensure_bytes = lambda bs: bs if isinstance(bs, bytes) else bs.encode()
getClass = lambda name: libobjc.objc_getClass(ensure_bytes(name))
# SEL sel_registerName(const char *str)
libobjc.sel_registerName.restype = c_void_p
libobjc.sel_registerName.argtypes = [c_char_p]
libobjc.objc_msgSend.restype = c_void_p
libobjc.objc_msgSend.argtypes = [c_void_p, c_void_p]

def objc_msgSend(obj, sel, *args, restype=c_void_p, argtypes=None):
    # objc_msgSend has no fixed prototype, so set it per call: (receiver, selector, *args)
    msgSend = libobjc.objc_msgSend
    msgSend.restype = restype
    msgSend.argtypes = [c_void_p, c_void_p] + (argtypes or [])
    return msgSend(obj, libobjc.sel_registerName(sel.encode()), *args)

# CDLL("/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation")
CDLL("/System/Library/Frameworks/Foundation.framework/Foundation")
NSString = getClass("NSString")
assert NSString is not None

def to_nsstring(s: bytes):
    r = objc_msgSend(NSString, "stringWithUTF8String:", ctypes.create_string_buffer(s), argtypes=[c_char_p])
    assert r is not None
    return r

def from_nsstring(nsstring: c_void_p):
    # length is an integer, so request c_ulong instead of a pointer result
    return ctypes.string_at(objc_msgSend(nsstring, "UTF8String"), size=objc_msgSend(nsstring, "length", restype=c_ulong)).decode()

def from_nsdata(nsdata: c_void_p):
    # read from the argument rather than the global libraryDataContents
    return ctypes.string_at(objc_msgSend(nsdata, "bytes"), size=objc_msgSend(nsdata, "length", restype=c_ulong))

metal = CDLL("/System/Library/Frameworks/Metal.framework/Metal")
# CDLL("/System/Library/Frameworks/MetalTools.framework/MetalTools")
# CDLL("/System/Library/Frameworks/MetalPerformanceShaders.framework/MetalPerformanceShaders")
core_graphics = CDLL("/System/Library/Frameworks/CoreGraphics.framework/CoreGraphics")
metal.MTLCreateSystemDefaultDevice.restype = c_void_p
metal.MTLCreateSystemDefaultDevice.argtypes = []
dev = metal.MTLCreateSystemDefaultDevice()
print(f"device pointer: {dev}")
source = """
#include <metal_stdlib>
using namespace metal;
kernel void r_32_256_2_20_20n1(device float* data0, const device float* data1, const device float* data2, const device float* data3, const device float* data4, const device float* data5, const device float* data6, uint3 gid [[threadgroup_position_in_grid]], uint3 lid [[thread_position_in_threadgroup]]) {
threadgroup float temp[256];
int gidx0 = gid.x; /* 32 */
int lidx1 = lid.x; /* 256 */
float acc0 = 0.0f;
float val0 = *(data2+gidx0);
float val1 = *(data3+gidx0);
for (int ridx0 = 0; ridx0 < 2; ridx0++) {
for (int ridx1 = 0; ridx1 < 20; ridx1++) {
for (int ridx2 = 0; ridx2 < 20; ridx2++) {
float val2 = *(data1+(gidx0*400)+(lidx1*25600)+(ridx0*12800)+(ridx1*20)+ridx2);
int alu0 = ((gidx0*100)+(lidx1*6400)+(ridx0*3200)+((ridx1/2)*10)+(ridx2/2));
float val3 = *(data4+alu0);
float val4 = *(data5+alu0);
float val5 = *(data6+alu0);
float alu1 = (val2-val0);
acc0 = ((alu1*val1*((float)(((alu1*val1)==val3))/val4)*val5)+acc0);
}
}
}
*(temp+lidx1) = acc0;
threadgroup_barrier(mem_flags::mem_threadgroup);
if ((lidx1<1)) {
float acc1 = 0.0f;
for (int ridx3 = 0; ridx3 < 256; ridx3++) {
float val6 = *(temp+ridx3);
acc1 = (val6+acc1);
}
*(data0+gidx0) = acc1;
}
}"""
lib = objc_msgSend(dev, "newLibraryWithSource:options:error:", to_nsstring(source.encode()), c_void_p(0), c_void_p(0), argtypes=[c_void_p, c_void_p, c_void_p])
libraryDataContents = objc_msgSend(lib, "libraryDataContents")
print(from_nsdata(libraryDataContents)) |
After a long investigation, here is some news: this bug is caused by the compilation request having the wrong type (3 instead of 13) when conda's Python is used to call the Metal compiler. This is set in the Investigating the root cause right now. |
Hi, I am going through the tinygrad documentation and trying to run the basic Tensor example on
abstractions.py
Environment:
try_tensor.py:
However I am getting an error:
This was the Metal program generated:
Compile options: