test_dtype fails with a segfault on LLVM on linux #1367
Try setting PYTHONFAULTHANDLER=1; this will add more info to the console.
It works on macOS (Metal & CPU), but I ran into the same issue on Ubuntu 22.04 (clang 14).
It crashes in the generated IR.
I swapped out the LLVM IR version with a NumPy implementation, and it works. It seems the generated LLVM IR conversion for bfloat16 doesn't work correctly.
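A minimal sketch of the kind of NumPy fallback described above. The function name and shape are my assumptions for illustration, not tinygrad's actual API:

```python
import numpy as np

def cast_to_half(buf):
    # Hypothetical fallback: do the float32 -> float16 cast in NumPy instead
    # of JIT-compiled LLVM IR (whose fptrunc lowers to a __gnu_f2h_ieee libcall).
    return np.asarray(buf, dtype=np.float32).astype(np.float16)

print(cast_to_half([1.0, 2.5]))
```

Routing the cast through NumPy sidesteps the JIT entirely, which is why it works even when the libcall symbol can't be resolved.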
$ sha256sum weights/LLaMA-2/7B/consolidated.00.pth
d67a91807d5879d193a694da57f28ff85092e92dc9fbef4888bd05e22b15ab75 weights/LLaMA-2/7B/consolidated.00.pth
$ export PYTHONFAULTHANDLER=1
$ python3 examples/llama.py --prompt="Hello." --temperature=0 --gen 2
using CPU backend
using LLaMA-2-7B model
ram used: 0.00 GB, layers.0.attention.wq.weight : 0%| | 0/292 [00:00<?, ?it/s]Fatal Python error: Segmentation fault
Thread 0x00007f1db286c700 (most recent call first):
File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 306 in wait
File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 558 in wait
File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/tqdm/_monitor.py", line 60 in run
File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 932 in _bootstrap_inner
File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 890 in _bootstrap
Current thread 0x00007f1dd5618180 (most recent call first):
File "/home/x/workspace/project/tinygrad/tinygrad/runtime/ops_llvm.py", line 62 in __call__
File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 142 in __call__
File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 139 in exec
File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 197 in exec_ast
File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 157 in realize
File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 352 in _realize_from
File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 144 in realize
File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 147 in realize
File "/home/x/workspace/project/tinygrad/tinygrad/tensor.py", line 94 in realize
File "/home/x/workspace/project/tinygrad/tinygrad/state.py", line 56 in load_state_dict
File "examples/llama.py", line 246 in build
File "examples/llama.py", line 387 in <module>
Segmentation fault (core dumped)
I did a bit more debugging; it seems to be related to a missing symbol.

Script to reproduce on Ubuntu:

from tinygrad.tensor import Tensor
a = Tensor([1.0]).to("LLVM").half().to('CPU').realize()
print(a.numpy())

The compiled assembly:

	.text
.file "<string>"
.globl exec # -- Begin function exec
.p2align 4, 0x90
.type exec,@function
exec: # @exec
.cfi_startproc
# %bb.0: # %entry
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset %rbx, -16
movq %rdi, %rbx
movss (%rsi), %xmm0 # xmm0 = mem[0],zero,zero,zero
movabsq $__gnu_f2h_ieee, %rax
callq *%rax
movw %ax, (%rbx)
popq %rbx
.cfi_def_cfa_offset 8
retq
.Lfunc_end0:
.size exec, .Lfunc_end0-exec
.cfi_endproc
# -- End function
	.section	".note.GNU-stack","",@progbits

After disassembling the core dump:

(gdb) disassemble exec
Dump of assembler code for function exec:
0x00007fc078c43000 <+0>: push %rbx
0x00007fc078c43001 <+1>: mov %rdi,%rbx
0x00007fc078c43004 <+4>: movss (%rsi),%xmm0
0x00007fc078c43008 <+8>: movabs $0x0,%rax
0x00007fc078c43012 <+18>: call *%rax
0x00007fc078c43014 <+20>: mov %ax,(%rbx)
0x00007fc078c43017 <+23>: pop %rbx
0x00007fc078c43018 <+24>: ret
End of assembler dump.

It seems the call target was never resolved (note the movabs $0x0,%rax before the call, where the assembly had movabsq $__gnu_f2h_ieee). @prusnak @wozeparrot, could you chime in to help here?
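For reference, __gnu_f2h_ieee (the call target that resolves to 0x0 above) just truncates an IEEE-754 float32 to a float16 bit pattern. A pure-Python sketch of that conversion, simplified (truncating rounding, subnormals flushed to zero), to show what the missing helper computes:

```python
import struct

def f2h(f: float) -> int:
    # Model of __gnu_f2h_ieee: float32 -> float16 bit pattern (simplified).
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    mant = bits & 0x7FFFFF
    if exp == 0xFF:                 # inf / NaN
        return sign | 0x7C00 | (0x200 if mant else 0)
    e = exp - 127 + 15              # re-bias exponent: 8-bit -> 5-bit
    if e >= 0x1F:                   # overflow -> infinity
        return sign | 0x7C00
    if e <= 0:                      # underflow (simplified: flush to zero)
        return sign
    return sign | (e << 10) | (mant >> 13)  # drop 13 low mantissa bits

print(hex(f2h(1.0)))   # 1.0 in float16 is 0x3c00
```

The real libgcc/compiler-rt routine also handles round-to-nearest-even and gradual underflow; this sketch only illustrates the bit-level layout.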
Why isn't this failing in CI? But yeah, we probably need compiler-rt for half support on most x86 systems. I thought fptrunc got compiled down to something else, though?
Hopefully this will be fixed in llvmlite 0.41? See numba/llvmlite#909.
You're right: https://reviews.llvm.org/D4927. I tried to compile numba/llvmlite#909 locally and linked it. Still the same segfault; scratching my head.
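One way to narrow this down is to check whether the libcall symbol is even visible in the running process, which is roughly where MCJIT's fallback resolver looks for it on Linux. This ctypes probe is a diagnostic sketch of mine, not anything tinygrad or llvmlite does internally:

```python
import ctypes

def symbol_in_process(name: str) -> bool:
    # On POSIX, CDLL(None) dlopen()s the running process itself, so this
    # checks whether `name` would resolve via the global symbol table.
    try:
        return hasattr(ctypes.CDLL(None), name)
    except OSError:
        return False  # platforms where CDLL(None) is unsupported

print(symbol_in_process("printf"))          # libc symbol, expected True on Linux
print(symbol_in_process("__gnu_f2h_ieee"))  # the half-conversion helper
```

If the second probe prints False, the JIT has nothing to patch into that movabs, which matches the $0x0 call target in the disassembly.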
Tried to compile a dummy LLVM IR file to assembly, linking it; the compiled assembly was the same. So …
Finally fixed the issue by setting the proper flags in …
I've posted a potential workaround for the issue in #2915.
$ python3 examples/llama.py --gen 2
using CPU backend
using LLaMA-2-7B model
ram used: 0.00 GB, layers.0.attention.wq.weight : 0%| | 0/292 [00:00<?, ?it/s]Segmentation fault (core dumped)