Metal gives Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file" #2226

JChunX · 2023-11-06T18:10:35Z

Hi, I am going through the tinygrad documentation and trying to run the basic Tensor example from abstractions.py.

Environment:

M1 Mac, Sonoma 14.0,
Python 3.10.13
tinygrad source install Nov 6th

try_tensor.py:

from tinygrad.tensor import Tensor
a = Tensor([2])
b = Tensor([3])
result = a + b
print(f"{a.numpy()} + {b.numpy()} = {result.numpy()}")

However I am getting an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tinygrad/tinygrad/tensor.py", line 126, in numpy
    return self.detach().cast(dtypes.from_np(self.dtype.np)).contiguous().to('CPU').realize().lazydata.realized.toCPU().reshape(self.shape)
  File "/tinygrad/tinygrad/tensor.py", line 105, in realize
    run_schedule(self.lazydata.schedule())
  File "/tinygrad/tinygrad/realize.py", line 27, in run_schedule
    si.out.realized = Device[si.out.device].exec_ast(si.ast, output=si.out, inputs=si.inputs, var_vals=si.var_vals, **si.out._device_extra_args())
  File "/tinygrad/tinygrad/ops.py", line 292, in exec_ast
    if ast not in self.method_cache: self.method_cache[ast] = get_program()
  File "/tinygrad/tinygrad/ops.py", line 289, in get_program
    return self.to_program(k)
  File "/tinygrad/tinygrad/ops.py", line 239, in to_program
    display_name=k.display_name, runtime_args=runtime_args).build(self.compiler, self.runtime)
  File "/tinygrad/tinygrad/ops.py", line 193, in build
    self.lib = compiler.__wrapped__(self.prg) if getenv("DISABLE_COMPILER_CACHE") else compiler(self.prg)
  File "/tinygrad/tinygrad/helpers.py", line 206, in wrapper
    return diskcache_put(table, key, func(*args, **kwargs))
  File "/tinygrad/tinygrad/runtime/ops_metal.py", line 51, in compile_metal
    unwrap(library.serializeToURL_error_(Cocoa.NSURL.URLWithString_(f"file://{output_file.name}"), None))
  File "/tinygrad/tinygrad/runtime/ops_metal.py", line 38, in unwrap
    assert err is None, str(err)
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file." UserInfo={NSLocalizedDescription=MTLLibrary is not formatted as a MetalLib file.}

This was the Metal program generated:

#include <metal_stdlib>
using namespace metal;
kernel void E_(device float* data0, 
               const device float* data1, 
               const device float* data2, 
               uint3 gid [[threadgroup_position_in_grid]], 
               uint3 lid [[thread_position_in_threadgroup]]) {
    float val0 = *(data1+0);
    float val1 = *(data2+0);
    *(data0+0) = (val0+val1);
}

Compile options:

<MTLCompileOptionsInternal: 0x10b7eea90>
    preprocessorMacros:  
    fastMathEnabled = 1 
    framebufferReadEnabled = 0 
    preserveInvariance = 0 
    optimizationLevel = MTLLibraryOptimizationLevelDefault 
    libraryType = MTLLibraryTypeExecutable 
    installName = <null> 
    compileSymbolVisibility =  0 
    allowReferencingUndefinedSymbols =  0 
    maxTotalThreadsPerThreadgroup =  0 
    languageVersion = default

alexbaden · 2023-11-08T02:37:56Z

I noticed the same issue. If you go through Xcode instead of the Metal shader API by setting METAL_XCODE, your example works, but other examples are pretty slow (I suppose because each shader compile is slower and the number of compiles goes way up?)

alexbaden · 2023-11-08T04:10:17Z

I managed to get an output by setting the Metal library type to dynamic and adding an install path to the MTLCompileOptions. This gets me an output, but now I am seeing the following error when we load the function from the compiled shader library:

validateMTLFunctionType:8593: failed assertion `type is not a valid MTLFunctionType.'

I managed to dump the shader output, which looks like this:

; ModuleID = 'shader.air'
source_filename = "/var/folders/cb/x67yc6g126nfkr5_tr0cdcnh0000gn/T/tmpw5t7x1sm.metallib"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx14.1.0"

; Function Attrs: argmemonly mustprogress nofree norecurse nosync nounwind willreturn
define void @E_(float addrspace(1)* nocapture noundef writeonly "air-buffer-no-alias" %data0, float addrspace(1)* nocapture noundef readonly "air-buffer-no-alias" %data1, float addrspace(1)* nocapture noundef readonly "air-buffer-no-alias" %data2, <3 x i32> noundef %gid, <3 x i32> noundef %lid) local_unnamed_addr #0 {
entry:
  %0 = load float, float addrspace(1)* %data1, align 4, !tbaa !14, !alias.scope !18, !noalias !21
  %1 = load float, float addrspace(1)* %data2, align 4, !tbaa !14, !alias.scope !24, !noalias !25
  %sub = fsub fast float %0, %1
  store float %sub, float addrspace(1)* %data0, align 4, !tbaa !14, !alias.scope !26, !noalias !27
  ret void
}

attributes #0 = { argmemonly mustprogress nofree norecurse nosync nounwind willreturn "approx-func-fp-math"="true" "frame-pointer"="all" "min-legal-vector-width"="96" "no-builtins" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "unsafe-fp-math"="true" }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!air.version = !{!8}
!air.compile_options = !{!9, !10, !11}
!llvm.ident = !{!12}
!air.language_version = !{!13}

The code looks correct, but the metadata is not right. I built the same shader locally using the metal command line tools installed with Xcode 15 and got a target triple for macOS 14.0.0 and a different metal version in the later metadata. A working shader has additional metadata for the kernel type:

!air.kernel = !{!15}
!15 = !{void (float addrspace(1)*, float addrspace(1)*, float addrspace(1)*, <3 x i32>, <3 x i32>)* @E_, !16, !17}

Could there be any issue with whatever driver the GPU is using for runtime compilation? Or perhaps some bug in pyobjc, but I don't think that's likely as there have been no major changes to the metal shader library in macOS 14.

This is on macOS 14.1, by the way. Everything worked fine on this machine before I upgraded to Sonoma.

alexbaden · 2023-11-09T02:37:39Z

I spent some more time looking into this, and I am stuck. It looks like the main issue is that in macOS 14 you cannot get a MTLFunction from a MTLDynamicLibrary, and MTLDynamicLibrary is the only library object with the serializeTo method. Even a small standalone example in Swift gives the error:

validateMTLFunctionType, line 8593: error 'type is not a valid MTLFunctionType.'

Buried in the header file I found this comment about MTLLibraryType:

 /**
     @property type
     @abstract Which type the library should be compiled as. The default value is MTLLibraryTypeExecutable.
     @discussion MTLLibraryTypeExecutable is suitable to build a library of "kernel", "vertex" and "fragment" qualified functions.
     MTLLibraryTypeDynamic is suitable when the compilation result will instead be used to instantiate a MTLDynamicLibrary.
     MTLDynamicLibrary contains no qualified functions, but it's unqualified functions and variables can be used as an external dependency for compiling other libraries.
    */
    @available(macOS 11.0, *)
    open var libraryType: MTLLibraryType

But I have not found anything in the documentation about how to use unqualified functions - and presumably we want to compile to some sort of byte representation in the compile_metal function, not build some external library and link it elsewhere.

In summary, it looks like the executable MTLLibrary is what we want, but whatever process allowed us to get the executable library as bytes on macOS 13 has changed, and serializable dynamic libraries don't seem to give us access to the function through the runtime (perhaps they did before?).

geohot · 2023-11-09T19:56:08Z

I have macOS 13. Is 14 broken?

boltzmann-brain · 2023-11-10T17:48:59Z

I have the same error on macOS 13.4.1 trying to run the mnist example.

Traceback (most recent call last):
  File "/Users/leif-hancox-li/tinygrad/examples/mnist_gan.py", line 98, in <module>
    loss_d += train_discriminator(optim_d, data_real, data_fake)
  File "/Users/leif-hancox-li/tinygrad/examples/mnist_gan.py", line 61, in train_discriminator
    optimizer.step()
  File "/Users/leif-hancox-li/tinygrad/tinygrad/nn/optim.py", line 52, in step
    self.t.assign(self.t + 1).realize()
  File "/Users/leif-hancox-li/tinygrad/tinygrad/tensor.py", line 105, in realize
    run_schedule(self.lazydata.schedule())
  File "/Users/leif-hancox-li/tinygrad/tinygrad/realize.py", line 27, in run_schedule
    si.out.realized = Device[si.out.device].exec_ast(si.ast, output=si.out, inputs=si.inputs, var_vals=si.var_vals, **si.out._device_extra_args())
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 292, in exec_ast
    if ast not in self.method_cache: self.method_cache[ast] = get_program()
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 289, in get_program
    return self.to_program(k)
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 237, in to_program
    return ASTRunner(k.function_name, src, k.global_size, k.local_size,
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 193, in build
    self.lib = compiler.__wrapped__(self.prg) if getenv("DISABLE_COMPILER_CACHE") else compiler(self.prg)
  File "/Users/leif-hancox-li/tinygrad/tinygrad/helpers.py", line 216, in wrapper
    return diskcache_put(table, key, func(*args, **kwargs))
  File "/Users/leif-hancox-li/tinygrad/tinygrad/runtime/ops_metal.py", line 51, in compile_metal
    unwrap(library.serializeToURL_error_(Cocoa.NSURL.URLWithString_(f"file://{output_file.name}"), None))
  File "/Users/leif-hancox-li/tinygrad/tinygrad/runtime/ops_metal.py", line 38, in unwrap
    assert err is None, str(err)
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file." UserInfo={NSLocalizedDescription=MTLLibrary is not formatted as a MetalLib file.}

roelofvandijk · 2023-11-17T20:24:19Z

Runs fine on macOS 14.1.1, MacBook Pro (M2). The generated library has these options:

<native-selector libraryType of <MTLCompileOptionsInternal: 0x12664bdb0>
    preprocessorMacros:  
    fastMathEnabled = 1 
    framebufferReadEnabled = 0 
    preserveInvariance = 0 
    optimizationLevel = MTLLibraryOptimizationLevelDefault 
    libraryType = MTLLibraryTypeExecutable 
    installName = <null> 
    compileSymbolVisibility =  0 
    allowReferencingUndefinedSymbols =  0 
    maxTotalThreadsPerThreadgroup =  0 
    languageVersion = default>

@alexbaden: Note that this is a MTLLibraryTypeExecutable that serializes without error.

nullhook · 2023-11-17T23:19:54Z

this seems to be working on macOS 14, macbook pro (m1)

here's the IR code for the exact same kernel, which works. the metal api recognizes it as a valid metallib file.

; ModuleID = 'shader.air'
source_filename = "E_"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx14.1.0"

; Function Attrs: argmemonly mustprogress nofree norecurse nosync nounwind willreturn
define void @E_(float addrspace(1)* nocapture writeonly "air-buffer-no-alias" %data0, float addrspace(1)* nocapture readonly "air-buffer-no-alias" %data1, float addrspace(1)* nocapture readonly "air-buffer-no-alias" %data2, <3 x i32> %gid, <3 x i32> %lid) local_unnamed_addr #0 {
entry:
  %0 = load float, float addrspace(1)* %data1, align 4, !tbaa !22, !alias.scope !26, !noalias !29
  %1 = load float, float addrspace(1)* %data2, align 4, !tbaa !22, !alias.scope !32, !noalias !33
  %add = fadd fast float %1, %0
  store float %add, float addrspace(1)* %data0, align 4, !tbaa !22, !alias.scope !34, !noalias !35
  ret void
}

attributes #0 = { argmemonly mustprogress nofree norecurse nosync nounwind willreturn "approx-func-fp-math"="true" "frame-pointer"="all" "min-legal-vector-width"="96" "no-builtins" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "unsafe-fp-math"="true" }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!llvm.ident = !{!8}
!air.version = !{!9}
!air.language_version = !{!10}
!air.compile_options = !{!11, !12, !13}
!air.kernel = !{!14}

immediate mismatches i see: language_version, compile_options, ident, and kernel.

you could try patching the output binary with the above metadata and load it as a test.

roelofvandijk · 2023-11-20T08:30:56Z

This seems promising: #2369

alexbaden · 2023-11-20T20:40:21Z

I am still having this problem - M1 MacBook Air (2020) with macOS 14.1.1. Interestingly with #2369 I get a different problem:

  File "/Users/alexb/Projects/tinygrad/tinygrad/ops.py", line 314, in to_program
    display_name=k.display_name, runtime_args=runtime_args).build(self.compiler, self.runtime)
  File "/Users/alexb/Projects/tinygrad/tinygrad/ops.py", line 280, in build
    self.clprg = runtime(self.name, self.lib)
  File "/Users/alexb/Projects/tinygrad/tinygrad/runtime/ops_metal.py", line 61, in __init__
    self.library = unwrap(METAL.device.newLibraryWithData_error_(data, None))
  File "/Users/alexb/Projects/tinygrad/tinygrad/runtime/ops_metal.py", line 45, in unwrap
    assert err is None, str(err)
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid library file" UserInfo={NSLocalizedDescription=Invalid library file}

where line 61 is the call to newLibraryWithData. It would seem something is wacky with exporting the compiled shader library on certain M-series Macs? This worked fine on the exact same machine w/ macOS 13.x.

nullhook · 2023-11-20T20:58:07Z

The underlying issue is that serializeToURL isn't an exposed method of MTLLibrary instances.

The pyobjc bindings we use inspect the Objective-C runtime for methods, regardless of whether they are documented. In general we should limit ourselves to the API described by Metal.framework.

The new error you're seeing is related to raw bytes being passed to newLibraryWithData, if you use the MetalLibrary instance directly you wouldn't need it.

montezdesousa · 2023-11-20T21:01:33Z

I managed to skip that same error by setting the library type in the options to dynamic. Btw this works the same if writing to file.

But then the kernel just crashes at line 70, because fxn is None even though the function is inside the library...

snapshot of library

nullhook · 2023-11-20T21:05:11Z

MTLDynamicLibrary isn't a viable solution, it isn't supported on all devices. Macs with Intel or AMD gpus will suffer.

nullhook · 2023-11-20T21:19:29Z

@alexbaden can you try this: #2372 (still WIP)

alexbaden · 2023-11-20T22:21:54Z

#2372 resolves this issue for me!

bojanbabic · 2023-11-29T22:10:17Z

Same issue on Mac M1 with Sonoma 14.1.1 and Apple Silicon. It works if I apply the patch from #2372 and add METAL_XCODE=1 to the shell script.

BrianLitwin · 2023-11-30T21:42:37Z

Same issue on Mac w/ 13.6 - "Invalid library file" at METAL.device.newLibraryWithData_error_ - #2372 resolves the issue

arthur-brainchain · 2023-12-03T23:58:21Z

I tried to run the patch from the other repo as mentioned by @bojanbabic above, and set METAL_XCODE, etc., but I am still not able to run, for example, beautiful_mnist.py or beautiful_cartpole.py with the patch. Could someone share a set of build steps with the fix, or alternatively suggest something else?

System Info:

Sonoma 14.1.1

nima10khodaveisi · 2023-12-13T12:16:53Z

xcode-select --switch /Applications/Xcode.app/Contents/Developer

fixed the issue for me.

SamRaymond · 2023-12-27T18:19:56Z

This took a while...
My M2 Air works like a charm: Metal was in the site-packages. On my M1 Pro 16, Metal wasn't in the packages.
If I run METAL_XCODE=1 python docs/abstractions2.py
with DEVICE = "GPU", this works for me; same for other files.

Running on M1 Macbook Pro 16 inch 2021 (14.2.1 (23C71))

claydugo · 2023-12-27T21:13:22Z

Piggybacking on @SamRaymond's work.

I got to this issue after trying to run the hello world matmul example in the README

$ METAL_XCODE=1 DEBUG=3 python -c "from tinygrad import Tensor;N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
*** METAL   rand  seed 1703711284 size 1048576         dtype dtypes.float
*** METAL   rand  seed 1703711285 size 1048576         dtype dtypes.float
  0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
  1  ┗━┳ SUM (1024, 1024, 1)
  2    ┗━┳ MUL
  3      ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
  4      ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
TENSOR CORES [(1, 1024, 1)] [(0, 1024, 1024)] tensor_core<METAL, [8, 8, 8], dtypes.float, dtypes.float>
3 alias 1: idxs= [Variable('gidx0', 0, 31), NumNode(0), NumNode(0), Variable('lidx3', 0, 1), NumNode(0), ((Variable('lidx4', 0, 15)//2)%4), NumNode(0), Variable('ridx7', 0, 127), (((Variable('lidx4', 0, 15)%2)*2)+((Variable('lidx4', 0, 15)//8)*4)+Variable('None', 0, 1)), NumNode(0), Variable('None', 0, 3), NumNode(0)]
4 alias 2: idxs= [NumNode(0), Variable('gidx1', 0, 7), Variable('lidx2', 0, 3), NumNode(0), (Variable('lidx4', 0, 15)//8), NumNode(0), (Variable('lidx4', 0, 15)%2), Variable('ridx7', 0, 127), (((Variable('lidx4', 0, 15)//2)%4)+(Variable('lidx3', 0, 1)*4)), Variable('None', 0, 1), NumNode(0), Variable('None', 0, 3)]
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/Users/clay/git/forks/tinygrad/tinygrad/tensor.py", line 131, in numpy
    return self.cast(self.dtype.scalar()).contiguous().realize().lazydata.base.realized.toCPU().astype(self.dtype.np, copy=True).reshape(self.shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/tensor.py", line 101, in realize
    run_schedule(self.lazydata.schedule())
  File "/Users/clay/git/forks/tinygrad/tinygrad/realize.py", line 28, in run_schedule
    prg = lower_schedule_item(si)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/realize.py", line 21, in lower_schedule_item
    return Device[si.out.device].get_runner(si.ast)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 314, in get_runner
    def get_runner(self, ast:LazyOp) -> CompiledASTRunner: return self.to_program(self.get_linearizer(ast))
                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 286, in to_program
    return CompiledASTRunner(k.ast, k.name, src, k.global_size, k.local_size, runtime_args).build(self.compiler, self.runtime)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 255, in build
    self.clprg = runtime(self.name, self.lib)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/runtime/ops_metal.py", line 30, in __init__
    self.library = unwrap2(self.device.device.newLibraryWithData_error_(data, None))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/helpers.py", line 46, in unwrap2
    assert err is None, str(err)
           ^^^^^^^^^^^
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid library file" UserInfo={NSLocalizedDescription=Invalid library file}

with GPU=1

GPU=1 METAL_XCODE=1 DEBUG=3 python -c "from tinygrad import Tensor;N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
CLDevice: got 1 platforms and 1 devices
*** GPU   rand  seed 1703711410 size 1048576         dtype dtypes.float
*** GPU   rand  seed 1703711411 size 1048576         dtype dtypes.float
  0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
  1  ┗━┳ SUM (1024, 1024, 1)
  2    ┗━┳ MUL
  3      ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
  4      ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
*** GPU        1 r_32_16_8_16_256_4_4_4                arg   3 mem  0.01 GB tm   7579.38us/     7.58ms (  283.33 GFLOPS,    1.66 GB/s)
2.5320332e-09
avg:   283.33 GFLOPS     1.66 GB/s           total:     1 kernels     2.15 GOPS     0.01 GB     7.58 ms

On Sonoma 14.2.1, MacBook Air M1 2020

10-zin · 2023-12-28T19:42:56Z

For all facing this issue: just run the existing master with these two env variables: "METAL_XCODE=1 DISABLE_COMPILER_CACHE=1".
It passes all the tests on M1, macOS Sonoma 14.x.x.

Essentially there is some issue when you use the cached Metal library, at least on an M1 Air with macOS 14.x.x.
With DISABLE_COMPILER_CACHE=1, line 254 in device.py calls the wrapped attribute, so basically the compiler runs without the added diskcache functionality and recompiles every time, which makes it work.

Got the hint from #2372 and the latest discussion there.
This way you don't even need to change any code.
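
For notebooks or scripts where prefixing the shell command is awkward, the same two variables can be set from Python. This is a minimal sketch, not tinygrad's own API: the helper name apply_metal_workaround is made up for illustration, and it assumes tinygrad reads the variables from the process environment, so they must be set before tinygrad is imported.

```python
import os

def apply_metal_workaround(env=os.environ):
    """Set the two variables named in the comment above.

    Assumption: tinygrad picks these up from the process environment,
    so this has to run before tinygrad is imported.
    """
    env["METAL_XCODE"] = "1"             # compile shaders through Xcode's toolchain
    env["DISABLE_COMPILER_CACHE"] = "1"  # skip the on-disk compiler cache
    return env

apply_metal_workaround()
# from tinygrad import Tensor  # import only after the variables are set
```

Passing a plain dict instead of os.environ makes the helper easy to try without touching the real environment.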

DKormann · 2024-02-28T09:32:46Z

@certik you probably need to clear the cache; it should be at ~/Library/Caches/tinygrad/
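
A small sketch of clearing that cache from Python. The helper name clear_tinygrad_cache is hypothetical, and the default path is only the location mentioned in this comment; check where the cache actually lives on your machine before deleting anything.

```python
import shutil
from pathlib import Path

# Location taken from the comment above; adjust if your cache lives elsewhere.
DEFAULT_CACHE_DIR = Path.home() / "Library" / "Caches" / "tinygrad"

def clear_tinygrad_cache(cache_dir=DEFAULT_CACHE_DIR):
    """Delete the cache directory; return True if something was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False  # nothing to clear
    shutil.rmtree(cache_dir)
    return True

# Usage: clear_tinygrad_cache()
```

The function is deliberately not called at import time, so pasting the block does not delete anything until you call it yourself.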

dattienle2573 · 2024-03-04T04:28:15Z

GPU=1 works for me but any ideas how to run with ipynb files?

DKormann · 2024-03-04T16:31:59Z

@dattienle2573
from tinygrad import Device
Device.DEFAULT = "GPU"

Leikoe · 2024-03-14T11:06:53Z

seems related to this issue in pyobjc: ronaldoussoren/pyobjc#580,
people can/can't run the same snippet based on their python version.

I don't know if this helps or explains it, but basically this reproduces the error:

import Metal, libdispatch

prg="""#include <metal_stdlib>
using namespace metal;
kernel void r_5(device int* data0, const device int* data1, uint3 gid [[threadgroup_position_in_grid]], uint3 lid [[thread_position_in_threadgroup]]) {
  int acc0 = -2147483648;
  int val0 = *(data1+0);
  int val1 = *(data1+1);
  int val2 = *(data1+2);
  int val3 = *(data1+3);
  int val4 = *(data1+4);
  int alu0 = max(((val0+1)*2*val0),0);
  int alu1 = max(((val1+1)*2*val1),0);
  int alu2 = max(((val2+1)*2*val2),0);
  int alu3 = max(((val3+1)*2*val3),0);
  int alu4 = max(((val4+1)*2*val4),0);
  int alu5 = max(alu0,acc0);
  int alu6 = max(alu1,alu5);
  int alu7 = max(alu2,alu6);
  int alu8 = max(alu3,alu7);
  int alu9 = max(alu4,alu8);
  *(data0+0) = alu9;
}"""



options = Metal.MTLCompileOptions.new()

print(f"{options=}")
print(f"{options.libraryType=}")
compiler = Metal.MTLCreateSystemDefaultDevice()




def unwrap2(x):
  ret, err = x
  assert err is None, str(err)
  return ret

r = compiler.newLibraryWithSource_options_error_(prg, options, None)
library = unwrap2(r)
lib = library.libraryDataContents().bytes().tobytes()

device = compiler

#### 
data = libdispatch.dispatch_data_create(lib, len(lib), None, None)
library = unwrap2(device.newLibraryWithData_error_(data, None))
fxn = library.newFunctionWithName_("r_5")
pipeline_state = unwrap2(device.newComputePipelineStateWithFunction_error_(fxn, None))

Which outputs

Traceback (most recent call last):
  File "/Users/tyoc213/github/tinygrad/test3.py", line 50, in <module>
    library = unwrap2(device.newLibraryWithData_error_(data, None))
  File "/Users/tyoc213/github/tinygrad/test3.py", line 38, in unwrap2
    assert err is None, str(err)
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid library file" UserInfo={NSLocalizedDescription=Invalid library file}

Don't know if that makes it clearer what is going on?

the snippet here works in python 3.12 but doesn't in 3.10.12 for me
note that running tinygrad with metal still gives the same error even though the snippet runs without error.

Leikoe · 2024-03-14T11:11:25Z

update: it was compiler cache, the error in this issue doesn't occur when using python 3.12

Leikoe · 2024-03-15T08:22:59Z

both python 3.12 and 3.10 use pyobjc-* 10.1.
Did the following tests:

Compile shader with python3.10, load on python3.12 (doesn't work)
Compile shader with python3.12, load on python3.10 (works)

Seems that we have a problem with pyobjc metal compiler bindings on python <3.12
edit: after looking through the shader compiled by python3.10 objc-metal, we see that it doesn't begin with the MTLB magic
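
The magic check described in the edit can be sketched as a quick test on the compiled bytes. The helper name has_metallib_magic is made up for illustration; a matching prefix is only a necessary sign of a valid library, not proof that the rest of the file is well formed.

```python
METALLIB_MAGIC = b"MTLB"

def has_metallib_magic(blob):
    """True if the blob starts with the 4-byte MTLB magic mentioned above."""
    return bytes(blob[:4]) == METALLIB_MAGIC

# Usage with the repro script above (lib holds the bytes from libraryDataContents()):
# print(has_metallib_magic(lib))
```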

Leikoe · 2024-03-15T09:09:51Z

same shader compiled with same script found above on python 3.12 (left) and 3.10 on the right. Both the beginning and end of the compiled shader are wrong on python <3.12

certik · 2024-03-15T14:04:20Z

@Leikoe great job investigating this!

Leikoe · 2024-03-15T14:31:38Z

looks like xcode doesn't use the same metal compiler as objc metal bindings (14.0.0 for xcode and 14.1.0 on my machine)

Leikoe · 2024-03-15T15:48:18Z

looking at metadata in the generated library bytes by objc-metal bindings

(left is python3.12 and right is python3.10)

We see that they both are using the same "Apple metal version 32023.26"

Leikoe · 2024-03-15T17:02:39Z

TLDR: on python 3.10.12, compiler.newLibraryWithSource_options_error_(prg, options, None) produces an invalid library.

Workarounds: use @10-zin's answer

For all facing this issue: just run the existing master with these two env variables: "METAL_XCODE=1 DISABLE_COMPILER_CACHE=1". It passes all the tests on M1, macOS Sonoma 14.x.x.

Essentially there is some issue when you use the cached Metal library, at least on an M1 Air with macOS 14.x.x. With DISABLE_COMPILER_CACHE=1, line 254 in device.py calls the wrapped attribute, so basically the compiler runs without the added diskcache functionality and recompiles every time, which makes it work.

Got the hint from #2372 and the latest discussion there. This way you don't even need to change any code.

or use python 3.12 or 3.8

(just discovered that it works in 3.8 idk at this point)

Leikoe · 2024-03-15T23:27:51Z

new update: It is probably not a python version problem but rather a miniforge3 (or maybe conda?) problem.

my python3.10 was miniforge3's and by using brew's it fixed it.

This still doesn't make much sense as my pyobjc bindings are from pypi anyways and same ver.

update: created two venvs, one with brew's python3.10 and one with miniforge3's conda python3.10, and diff -bur shows all 4 pyobjc packages are identical, so it's not pyobjc.

starting to think it's miniconda3's python binary 😭

pushpendre · 2024-03-16T23:37:01Z

Yeah can confirm, I was able to run the program using the following commands

/Users/$USER/.homebrew/bin/python3 -m venv hb_py_3_12
source hb_py_3_12/bin/activate
python -m pip install git+https://github.com/tinygrad/tinygrad.git
cat > try_tensor.py <<EOF
from tinygrad.tensor import Tensor
a = Tensor([2])
b = Tensor([3])
result = a + b
print(f"{a.numpy()} + {b.numpy()} = {result.numpy()}")
EOF
DEBUG=9 python try_tensor.py

The above works and shows that it's using METAL, but then if I switch to conda's python

deactivate # deactivate existing virtualenv
conda activate tg
which python # /Users/$USER/anaconda3/envs/tg/bin/python
python try_tensor.py

then I get the error.

Both conda and the virtualenv have same-ish python version

/Users/$USER/anaconda3/envs/tg/bin/python3 --version # Python 3.12.1
hb_py_3_12/bin/python --version # Python 3.12.2

The info strings for conda and homebrew's python are as follows:

Python 3.12.1 | packaged by Anaconda, Inc. | (main, Jan 19 2024, 09:45:58) [Clang 14.0.6 ] on darwin
Python 3.12.2 (main, Feb  6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
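
Since the failure tracks which Python build is running, a quick check of the interpreter can save some guessing. This is a heuristic sketch with a made-up helper name, is_conda_python: it looks for the conda-meta directory that conda environments contain and for the "packaged by" banner shown in the info strings above. It does not diagnose the Metal problem itself.

```python
import os
import sys

def is_conda_python(prefixes=(sys.prefix, sys.base_prefix), version=sys.version):
    """Heuristic: does this interpreter look like a conda build?"""
    # Conda environments keep their package metadata in <prefix>/conda-meta.
    has_conda_meta = any(
        os.path.isdir(os.path.join(p, "conda-meta")) for p in prefixes
    )
    # Conda builds usually say "packaged by Anaconda, Inc." or "packaged by conda-forge".
    banner = version.lower()
    return has_conda_meta or "anaconda" in banner or "conda-forge" in banner

print(f"conda python: {is_conda_python()}")
```

Checking sys.base_prefix as well covers a venv that was created from a conda interpreter.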

Leikoe · 2024-03-17T17:09:50Z

likely a pyobjc/libobjc problem: the message-send logs differ with the same python + pyobjc* packages

Leikoe · 2024-03-17T17:27:28Z

weirder than expected, objective C logs from the working system python3.10.13 are duplicated but conda's python3.10.13 aren't ??

Leikoe · 2024-03-17T19:58:10Z

For reference:
NSObjCMessageLoggingEnabled=YES or https://github.com/ronaldoussoren/pyobjc/blob/master/pyobjc-core/Examples/Scripts/instrumentSends.py to log sends
DYLD_PRINT_LIBRARIES=1 to see the libraries loaded by dyld
OBJC_HELP Objective-C runtime debugging. Set variable=YES to enable.

see also:
https://wincent.com/wiki/Instrumenting_Objective-C_message_sends
https://www.dribin.org/dave/blog/archives/2006/04/22/tracing_objc/

Leikoe · 2024-03-17T23:06:50Z

when running the same script with conda python3.10.13 and brew python 3.10.13 with
OBJC_PRINT_CLASS_SETUP=1, the following can be seen:

conda's (tinygrad3.10.13):

OBJC_PRINT_CLASS_SETUP=1 python extra/create_metallib.py                                                                                                               
objc[75281]: CLASS: found 78 classes during launch

brew:

(.venv) leo@leikos-macbook-pro tinygrad % OBJC_PRINT_CLASS_SETUP=1 python extra/create_metallib.py
objc[75405]: CLASS: found 2232 classes during launch

The objective C code running is indeed different when using conda's python
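
To compare interpreters more systematically, the class count can be pulled out of that log line. This is a small sketch with a hypothetical helper name, objc_class_count, and it only assumes the line format shown in the two outputs above.

```python
import re

# Matches lines such as: objc[75281]: CLASS: found 78 classes during launch
_CLASS_COUNT = re.compile(r"CLASS: found (\d+) classes during launch")

def objc_class_count(log_text):
    """Return the class count reported in the log, or None if the line is absent."""
    match = _CLASS_COUNT.search(log_text)
    return int(match.group(1)) if match else None

# Usage: capture the output of a run with OBJC_PRINT_CLASS_SETUP=1
# and pass the text to objc_class_count().
```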

Leikoe · 2024-03-18T10:07:50Z

This bug still occurs when using conda packaged PyObjc.

Leikoe · 2024-03-18T10:12:58Z

bug occurred in pyobjc=10.1 and still occurs in pyobjc=10.2

the-praxs · 2024-03-20T11:00:34Z

Got this error on my conda python 3.12. So can't say it's working on 3.12 entirely.

Leikoe · 2024-03-20T15:18:49Z

Got this error on my conda python 3.12. So can't say it's working on 3.12 entirely.

Maybe I wasn't clear enough: all conda python versions I've tried have made this bug occur.

eugeneteoh · 2024-03-20T19:46:34Z

I'm getting the same error with miniconda on my Intel Macbook Pro 2020. Installing tinygrad directly on system python works fine. Another confirmation that this is specific to miniconda.

Leikoe · 2024-04-17T13:30:13Z

If anyone wants to help: set a breakpoint with b -[_MTLDevice newLibraryWithSource:options:error:] in lldb while running metal_minimal.py, which contains:

import ctypes
from ctypes import cdll, c_char_p, c_void_p, c_bool, CDLL, c_uint, c_ulong, util, c_int


libobjc = CDLL(util.find_library("objc"))

# Class objc_getClass(const char *name)
libobjc.objc_getClass.restype = c_void_p
libobjc.objc_getClass.argtypes = [c_char_p]
ensure_bytes = lambda bs: bs if isinstance(bs, bytes) else bs.encode()
getClass = lambda name: libobjc.objc_getClass(ensure_bytes(name))

libobjc.sel_registerName.restype = c_void_p
libobjc.sel_registerName.argtypes = [c_char_p]

libobjc.objc_msgSend.restype = c_void_p
libobjc.objc_msgSend.argtypes = [c_void_p, c_void_p]

def objc_msgSend(obj, sel, *args, restype=c_void_p, argtypes=[]):
    msgSend = libobjc.objc_msgSend
    msgSend.restype = restype
    msgSend.argtypes = [c_void_p, c_void_p] + argtypes
    return msgSend(obj, libobjc.sel_registerName(sel.encode()), *args)

# CDLL("/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation")
CDLL("/System/Library/Frameworks/Foundation.framework/Foundation")

NSString = getClass("NSString")
assert NSString is not None
def to_nsstring(s: bytes):
    r = objc_msgSend(NSString, "stringWithUTF8String:", ctypes.create_string_buffer(s), argtypes=[c_char_p])
    assert r is not None
    return r
def from_nsstring(nsstring: c_void_p):
    # UTF8String is NUL-terminated, so string_at can find the end on its own
    # ("length" counts UTF-16 code units, not bytes)
    return ctypes.string_at(objc_msgSend(nsstring, "UTF8String")).decode()
def from_nsdata(nsdata: c_void_p):
    # read from the nsdata argument, not a global; length is an NSUInteger
    return ctypes.string_at(objc_msgSend(nsdata, "bytes"), size=objc_msgSend(nsdata, "length", restype=c_ulong))


metal = CDLL("/System/Library/Frameworks/Metal.framework/Metal")
# CDLL("/System/Library/Frameworks/MetalTools.framework/MetalTools")
# CDLL("/System/Library/Frameworks/MetalPerformanceShaders.framework/MetalPerformanceShaders")
core_graphics = CDLL("/System/Library/Frameworks/CoreGraphics.framework/CoreGraphics")

metal.MTLCreateSystemDefaultDevice.restype = c_void_p
metal.MTLCreateSystemDefaultDevice.argtypes = []

dev = metal.MTLCreateSystemDefaultDevice()
print(f"device pointer: {dev}")

source = """
#include <metal_stdlib>
using namespace metal;
kernel void r_32_256_2_20_20n1(device float* data0, const device float* data1, const device float* data2, const device float* data3, const device float* data4, const device float* data5, const device float* data6, uint3 gid [[threadgroup_position_in_grid]], uint3 lid [[thread_position_in_threadgroup]]) {
  threadgroup float temp[256];
  int gidx0 = gid.x; /* 32 */
  int lidx1 = lid.x; /* 256 */
  float acc0 = 0.0f;
  float val0 = *(data2+gidx0);
  float val1 = *(data3+gidx0);
  for (int ridx0 = 0; ridx0 < 2; ridx0++) {
    for (int ridx1 = 0; ridx1 < 20; ridx1++) {
      for (int ridx2 = 0; ridx2 < 20; ridx2++) {
        float val2 = *(data1+(gidx0*400)+(lidx1*25600)+(ridx0*12800)+(ridx1*20)+ridx2);
        int alu0 = ((gidx0*100)+(lidx1*6400)+(ridx0*3200)+((ridx1/2)*10)+(ridx2/2));
        float val3 = *(data4+alu0);
        float val4 = *(data5+alu0);
        float val5 = *(data6+alu0);
        float alu1 = (val2-val0);
        acc0 = ((alu1*val1*((float)(((alu1*val1)==val3))/val4)*val5)+acc0);
      }
    }
  }
  *(temp+lidx1) = acc0;
  threadgroup_barrier(mem_flags::mem_threadgroup);
  if ((lidx1<1)) {
    float acc1 = 0.0f;
    for (int ridx3 = 0; ridx3 < 256; ridx3++) {
      float val6 = *(temp+ridx3);
      acc1 = (val6+acc1);
    }
    *(data0+gidx0) = acc1;
  }
}"""
lib = objc_msgSend(dev, "newLibraryWithSource:options:error:", to_nsstring(source.encode()), c_void_p(0), c_void_p(0), argtypes=[c_void_p, c_void_p, c_void_p])
libraryDataContents = objc_msgSend(lib, "libraryDataContents")
print(from_nsdata(libraryDataContents))
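To tell a good result from a bad one without eyeballing the raw bytes, one option is to check the magic number. A valid metallib blob is generally understood to start with the four bytes `MTLB`; treat that as an assumption here. A minimal sketch, where `looks_like_metallib` is a hypothetical helper name and not something from tinygrad or Metal:

```python
def looks_like_metallib(blob: bytes) -> bool:
    # Assumption: a serialized Metal library begins with the magic b"MTLB".
    return blob[:4] == b"MTLB"

# Stand-in byte strings for illustration only; in the script above the
# argument would be from_nsdata(libraryDataContents).
print(looks_like_metallib(b"MTLB" + bytes(12)))
print(looks_like_metallib(b"not a metallib"))
```

If the check fails on the bytes returned under conda but passes under the system Python, that narrows the problem to what the compiler returns rather than to the serialization step.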

Leikoe · 2024-04-30T16:22:20Z

After a long investigation, here is some news:

This bug is due to the compilation request having the wrong type (3 instead of 13) when conda's Python is used to call the Metal compiler.

This is set in the XPCCompilerConnection::BuildRequestInternal call during compilation (requestType).

Investigating the root cause right now.

DKormann mentioned this issue Nov 8, 2023

Rnnt 2nd attempt #2150

Closed

wozeparrot added the bug Something isn't working label Nov 8, 2023

alexbaden mentioned this issue Nov 20, 2023

WIP: directly use Metallib instance #2372

Closed

2 tasks

PaulGustafson mentioned this issue Nov 28, 2023

Switch from pyobj to metalcompute #2479

Closed

wozeparrot mentioned this issue Nov 29, 2023

Getting started issue #2509

Closed

This was referenced Dec 1, 2023

AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid library file" right after installing and running test program #2535

Closed

Issue on execution on MacOS M1 #2569

Closed

chenyuxyz pinned this issue Dec 2, 2023

chenyuxyz changed the title ~~Running introductory Tensor example on Metal gives Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file"~~ Metal gives Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file" Dec 2, 2023

Leikoe mentioned this issue Mar 18, 2024

Metal library: fail to serialize on new Macos Sonoma 14.1 ronaldoussoren/pyobjc#580

Open

chenyuxyz unpinned this issue Apr 2, 2024

tabashki mentioned this issue Apr 3, 2024

Fix for "MTLLibrary is not formatted as a MetalLib file" issue #4057

Closed

mooihi mentioned this issue May 1, 2024

Fix for MTLLibraryErrorDomain Code=1 "Invalid library file" #4373

Closed

This was referenced May 4, 2024

fix: not using private metal api methods. #4414

Closed

fix: metal on conda on a13 and higher #4483

Closed
