test_dtype fails with a segfault on LLVM on linux #1367

ghost · 2023-07-28T14:10:35Z

$ python3 examples/llama.py --gen 2
using CPU backend
using LLaMA-2-7B model
ram used: 0.00 GB, layers.0.attention.wq.weight : 0%| | 0/292 [00:00<?, ?it/s]Segmentation fault (core dumped)

prusnak · 2023-07-28T14:57:39Z

Can you post the result of the following command?

sha256sum weights/LLaMA-2/7B/consolidated.00.pth

Also try to run this before running your command:

export PYTHONFAULTHANDLER=1

This will add more info to the console.
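
For reference, setting PYTHONFAULTHANDLER=1 has the same effect as calling faulthandler.enable() when the interpreter starts. A minimal sketch of enabling it from inside a script instead; the helper name enable_crash_tracebacks is made up for illustration and is not part of tinygrad:

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None) -> bool:
    """Dump Python tracebacks on fatal signals such as SIGSEGV.

    `stream` must be a real file object with a file descriptor;
    it defaults to sys.stderr.
    """
    target = stream if stream is not None else sys.stderr
    # all_threads=True also prints the stacks of background threads.
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

Calling the helper before the weights are loaded should give the same kind of "Fatal Python error" traceback as the environment variable.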

chenyuxyz · 2023-07-28T15:08:27Z

loading works for me, do you have enough ram?

oliverhu · 2023-07-29T05:40:50Z

It works on macOS (Metal & CPU), but I ran into the same issue on Ubuntu 22.04 (clang 14).

(p3) pi@pig ~/py/tinygrad (master) $ CPU=1 PYTHONFAULTHANDLER=1 python  -O examples/llama.py --prompt "Hello." --count 10 --temperature 0 --timing
using CPU backend
using LLaMA-7B model
ram used:  0.00 GB, layers.0.attention.wq.weight                      :   0%| | 0/292 [00:0Fatal Python error: Segmentation fault

Thread 0x00007f56dc0ef640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/home/pi/p3/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f5a2a299000 (most recent call first):
  File "/home/pi/py/tinygrad/tinygrad/runtime/ops_llvm.py", line 62 in __call__
  File "/home/pi/py/tinygrad/tinygrad/ops.py", line 142 in __call__
  File "/home/pi/py/tinygrad/tinygrad/ops.py", line 139 in exec
  File "/home/pi/py/tinygrad/tinygrad/ops.py", line 197 in exec_ast
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 157 in realize
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 352 in _realize_from
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 144 in realize
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 147 in realize
  File "/home/pi/py/tinygrad/tinygrad/tensor.py", line 94 in realize
  File "/home/pi/py/tinygrad/tinygrad/state.py", line 56 in load_state_dict
  File "/home/pi/py/tinygrad/examples/llama.py", line 246 in build
  File "/home/pi/py/tinygrad/examples/llama.py", line 391 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, sentencepiece._sentencepiece (total: 14)
Segmentation fault (core dumped)

IR

; ModuleID = "/home/pi/py/tinygrad/tinygrad/codegen/llvmir.py"
target triple = "unknown-unknown-unknown"
target datalayout = ""

define void @"exec"(half* %".1", float* %".2") "no-nans-fp-math"="true"
{
entry:
  %".4" = getelementptr inbounds float, float* %".2", i64 0
  %".5" = load float, float* %".4"
  %".6" = fptrunc float %".5" to half
  %".7" = getelementptr inbounds half, half* %".1", i64 0
  store half %".6", half* %".7"
  ret void
}

It crashes at store half %".6", half* %".7"

oliverhu · 2023-07-29T07:05:37Z

I swapped the llvmir version for the numpy implementation, and it works. It seems the generated LLVM IR conversion for bfloat16 doesn't work.

ghost · 2023-07-30T06:16:11Z

> Can you post the result of the following command?
> sha256sum weights/LLaMA-2/7B/consolidated.00.pth
> Also try to run this before running your command:
> export PYTHONFAULTHANDLER=1
> This will add more info to the console.

$ sha256sum weights/LLaMA-2/7B/consolidated.00.pth
d67a91807d5879d193a694da57f28ff85092e92dc9fbef4888bd05e22b15ab75  weights/LLaMA-2/7B/consolidated.00.pth

$ export PYTHONFAULTHANDLER=1 
$ python3 examples/llama.py --prompt="Hello." --temperature=0 --gen 2
using CPU backend
using LLaMA-2-7B model
ram used:  0.00 GB, layers.0.attention.wq.weight                      :   0%|                                                                                                      | 0/292 [00:00<?, ?it/s]Fatal Python error: Segmentation fault

Thread 0x00007f1db286c700 (most recent call first):
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 306 in wait
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 558 in wait
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/tqdm/_monitor.py", line 60 in run
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 890 in _bootstrap

Current thread 0x00007f1dd5618180 (most recent call first):
  File "/home/x/workspace/project/tinygrad/tinygrad/runtime/ops_llvm.py", line 62 in __call__
  File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 142 in __call__
  File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 139 in exec
  File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 197 in exec_ast
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 157 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 352 in _realize_from
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 144 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 147 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/tensor.py", line 94 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/state.py", line 56 in load_state_dict
  File "examples/llama.py", line 246 in build
  File "examples/llama.py", line 387 in <module>
Segmentation fault (core dumped)

oliverhu · 2023-07-30T18:45:59Z

I did a bit more debugging; it seems to be related to missing compiler-rt symbols.

Script to reproduce on Ubuntu:

from tinygrad.tensor import Tensor


a = Tensor([1.0]).to("LLVM").half().to('CPU').realize()
print(a.numpy())

The compiled assembly

        .text
        .file   "<string>"
        .globl  exec                            # -- Begin function exec
        .p2align        4, 0x90
        .type   exec,@function
exec:                                   # @exec
        .cfi_startproc
# %bb.0:                                # %entry
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset %rbx, -16
        movq    %rdi, %rbx
        movss   (%rsi), %xmm0                   # xmm0 = mem[0],zero,zero,zero
        movabsq $__gnu_f2h_ieee, %rax
        callq   *%rax
        movw    %ax, (%rbx)
        popq    %rbx
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end0:
        .size   exec, .Lfunc_end0-exec
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

Disassembly of exec from the core dump:

(gdb) disassemble exec
Dump of assembler code for function exec:
   0x00007fc078c43000 <+0>:	push   %rbx
   0x00007fc078c43001 <+1>:	mov    %rdi,%rbx
   0x00007fc078c43004 <+4>:	movss  (%rsi),%xmm0
   0x00007fc078c43008 <+8>:	movabs $0x0,%rax
   0x00007fc078c43012 <+18>:	call   *%rax
   0x00007fc078c43014 <+20>:	mov    %ax,(%rbx)
   0x00007fc078c43017 <+23>:	pop    %rbx
   0x00007fc078c43018 <+24>:	ret
End of assembler dump.

It seems __gnu_f2h_ieee can't be found in the current llvmlite installation on Ubuntu (note the movabs $0x0 above, where the symbol address should be). I then found this: numba/llvmlite#834
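
The movabs $0x0 in the gdb dump suggests the JIT resolved __gnu_f2h_ieee to a null address. One way to probe this without running a kernel is llvmlite's address_of_symbol, which returns None when a name cannot be found in the current process. This is only a sketch: the helper name is hypothetical, it assumes llvmlite is importable, and it assumes this lookup reflects what the JIT sees, which may not hold for every llvmlite version.

```python
from typing import Optional


def lookup_in_process_symbol(name: str) -> Optional[int]:
    """Return the in-process address of `name` as seen by llvmlite, or None."""
    try:
        import llvmlite.binding as llvm
    except (ImportError, OSError):
        # llvmlite is not installed, or its shared library failed to load.
        return None
    return llvm.address_of_symbol(name)


addr = lookup_in_process_symbol("__gnu_f2h_ieee")
print("not found" if addr is None else hex(addr))
```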

@prusnak @wozeparrot if you can chime in to help here..

wozeparrot · 2023-07-30T19:16:56Z

Why isn't this failing in CI? Running test_dtype with LLVM fails for me.

But yeah, we probably need compiler-rt for half support on most x86 systems. I thought fptrunc gets compiled down to something else, though?

prusnak · 2023-07-30T20:56:44Z

Hopefully will be fixed in llvmlite 0.41? See numba/llvmlite#909

oliverhu · 2023-07-31T04:10:30Z

fptrunc is compiled into a call to __gnu_f2h_ieee here; I think it is some trunc* function for float32 -> bfloat. The llvmlite issue has been open for a year already, so I'm not optimistic that it will be fixed very soon.

wozeparrot · 2023-07-31T05:00:43Z

__gnu_f2h_ieee should be for float32 to float16 on systems without native half support.
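
To make the distinction concrete: float16 (IEEE half) and bfloat16 are different 16-bit formats, and fptrunc float to half targets the former. A small sketch using only the Python standard library; the helper names are made up for illustration, and the bfloat16 conversion here simply truncates without rounding:

```python
import struct


def float32_to_half_bits(x: float) -> int:
    """IEEE 754 binary16 bit pattern of x (struct's 'e' format)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def float32_to_bfloat16_bits(x: float) -> int:
    """bfloat16 bit pattern: the top 16 bits of the float32 encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


print(hex(float32_to_half_bits(1.0)))      # 0x3c00
print(hex(float32_to_bfloat16_bits(1.0)))  # 0x3f80
```

The two results differ because half uses 5 exponent bits and 10 mantissa bits, while bfloat16 keeps float32's 8 exponent bits and only 7 mantissa bits.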

oliverhu · 2023-07-31T17:47:10Z

You're right, see https://reviews.llvm.org/D4927. I compiled numba/llvmlite#909 locally, linked libclang_rt.builtins, and verified that __gnu_f2h_ieee is there:

(tinygrad) pi@pig ~/miniconda3/envs/tinygrad $ nm ./lib/python3.10/site-packages/llvmlite/binding/libllvmlite.so | grep __gnu_f2h_ieee

0000000000846df0 t __gnu_f2h_ieee

Still the same seg fault.. scratching my head.
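
One thing worth checking in the nm output above: the symbol type is a lowercase t. In nm's notation a lowercase letter marks a local symbol and an uppercase letter a global one, so the function is present in libllvmlite.so but may not be visible to a lookup by name. A rough sketch of that check, with a hypothetical helper name:

```python
def nm_symbol_is_global(nm_line: str) -> bool:
    """Return True if an `nm` output line describes a defined global symbol.

    Expects the default three-column form: "<address> <type> <name>".
    Undefined symbols ("U name") have no address column and count as not global.
    Weak-symbol subtleties are ignored in this sketch.
    """
    fields = nm_line.split()
    if len(fields) != 3:
        return False
    sym_type = fields[1]
    # nm uses uppercase letters for global symbols and lowercase for local ones.
    return sym_type.isupper() and sym_type != "U"


print(nm_symbol_is_global("0000000000846df0 t __gnu_f2h_ieee"))  # False
print(nm_symbol_is_global("0000000000846df0 T __gnu_f2h_ieee"))  # True
```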

oliverhu · 2023-08-02T05:09:12Z

I tried compiling a dummy LLVM IR module to assembly; linking libclang_rt.builtins.a seems to work:

; ModuleID = "/home/pi/py/tinygrad/tinygrad/codegen/llvmir.py"
; target triple = "unknown-unknown-unknown"
target datalayout = ""
target triple = "x86_64-pc-linux-gnu"

define void @"exec"(half* %".1", float* %".2") "no-nans-fp-math"="true"
{
entry:
  %".4" = getelementptr inbounds float, float* %".2", i64 0
  %".5" = load float, float* %".4"
  %".6" = fptrunc float %".5" to half
  %".7" = getelementptr inbounds half, half* %".1", i64 0
  store half %".6", half* %".7"
  ret void
}

define i32 @main() {
  ret i32 0
}

Disassembly of the linked binary:

code:     file format elf64-x86-64


Disassembly of section .init:

0000000000401000 <_init>:
  401000:       f3 0f 1e fa             endbr64 
  401004:       48 83 ec 08             sub    $0x8,%rsp
  401008:       48 8b 05 e9 2f 00 00    mov    0x2fe9(%rip),%rax        # 403ff8 <__gmon_start__@Base>
  40100f:       48 85 c0                test   %rax,%rax
  401012:       74 02                   je     401016 <_init+0x16>
  401014:       ff d0                   call   *%rax
  401016:       48 83 c4 08             add    $0x8,%rsp
  40101a:       c3                      ret    

Disassembly of section .text:

0000000000401020 <_start>:
  401020:       f3 0f 1e fa             endbr64 
  401024:       31 ed                   xor    %ebp,%ebp
  401026:       49 89 d1                mov    %rdx,%r9
  401029:       5e                      pop    %rsi
  40102a:       48 89 e2                mov    %rsp,%rdx
  40102d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  401031:       50                      push   %rax
  401032:       54                      push   %rsp
  401033:       45 31 c0                xor    %r8d,%r8d
  401036:       31 c9                   xor    %ecx,%ecx
  401038:       48 c7 c7 30 11 40 00    mov    $0x401130,%rdi
  40103f:       ff 15 ab 2f 00 00       call   *0x2fab(%rip)        # 403ff0 <__libc_start_main@GLIBC_2.34>
  401045:       f4                      hlt    
  401046:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  40104d:       00 00 00 

0000000000401110 <exec>:
  401110:       53                      push   %rbx
  401111:       48 89 fb                mov    %rdi,%rbx
  401114:       f3 0f 10 06             movss  (%rsi),%xmm0
  401118:       e8 23 01 00 00          call   401240 <__gnu_f2h_ieee>
  40111d:       66 89 03                mov    %ax,(%rbx)
  401120:       5b                      pop    %rbx
  401121:       c3                      ret    
  401122:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  401129:       00 00 00 
  40112c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000401130 <main>:
  401130:       31 c0                   xor    %eax,%eax
  401132:       c3                      ret    
  401133:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  40113a:       00 00 00 
  40113d:       0f 1f 00                nopl   (%rax)

0000000000401140 <__truncsfhf2>:
  401140:       66 0f 7e c1             movd   %xmm0,%ecx
  401144:       66 0f 7e c2             movd   %xmm0,%edx
  401148:       81 e1 ff ff ff 7f       and    $0x7fffffff,%ecx
  40114e:       8d b1 00 00 80 c7       lea    -0x38800000(%rcx),%esi
  401154:       8d 81 00 00 80 b8       lea    -0x47800000(%rcx),%eax
  40115a:       39 c6                   cmp    %eax,%esi
  40115c:       73 32                   jae    401190 <__truncsfhf2+0x50>
...
  401238:       8d 41 01                lea    0x1(%rcx),%eax
  40123b:       eb d0                   jmp    40120d <__truncsfhf2+0xcd>
  40123d:       0f 1f 00                nopl   (%rax)

0000000000401240 <__gnu_f2h_ieee>:
  401240:       e9 fb fe ff ff          jmp    401140 <__truncsfhf2>

Disassembly of section .fini:

0000000000401248 <_fini>:
  401248:       f3 0f 1e fa             endbr64 
  40124c:       48 83 ec 08             sub    $0x8,%rsp
  401250:       48 83 c4 08             add    $0x8,%rsp
  401254:       c3                      ret

So __gnu_f2h_ieee jumps to __truncsfhf2

oliverhu · 2023-08-03T05:24:51Z

Finally fixed the issue by setting the proper flags in the llvmlite build script. The __gnu_f2h_ieee symbol was hidden when linked into libllvmlite.so. I will submit a patch there and then close this ticket.

readelf -s ./lib/python3.10/site-packages/llvmlite/binding/libllvmlite.so | grep f2h
  4652: 00000000000a560a    32 FUNC    GLOBAL DEFAULT   14 __gnu_f2h_ieee
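
The readelf row can be read field by field: Num, Value, Size, Type, Bind, Vis, Ndx, Name. A symbol is available to a dynamic lookup when Bind is GLOBAL (or WEAK), Vis is DEFAULT, and Ndx is not UND. A small sketch that parses such a row; the function and type names are illustrative only:

```python
from typing import NamedTuple


class ElfSymbol(NamedTuple):
    value: int
    size: int
    type: str
    bind: str
    vis: str
    ndx: str
    name: str


def parse_readelf_symbol(line: str) -> ElfSymbol:
    """Parse one symbol row from `readelf -s` output."""
    _num, value, size, sym_type, bind, vis, ndx, name = line.split()[:8]
    # Value is printed in hex; size is decimal (or 0x-prefixed when large).
    return ElfSymbol(int(value, 16), int(size, 0), sym_type, bind, vis, ndx, name)


def is_exported(sym: ElfSymbol) -> bool:
    """True if the symbol is defined and visible outside the shared object."""
    return sym.bind in ("GLOBAL", "WEAK") and sym.vis == "DEFAULT" and sym.ndx != "UND"


row = "  4652: 00000000000a560a    32 FUNC    GLOBAL DEFAULT   14 __gnu_f2h_ieee"
print(is_exported(parse_readelf_symbol(row)))  # True
```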

sobomax · 2023-12-30T06:21:41Z

I've posted a potential workaround for the issue in #2915

oliverhu mentioned this issue Jul 31, 2023: Add support for compiler-rt inclusion numba/llvmlite#909 (Open)

wozeparrot changed the title from "Segmentation fault (core dumped)" to "test_dtype fails with a segfault on LLVM on linux" Aug 1, 2023

oliverhu mentioned this issue Aug 7, 2023: Statically link compiler-rt numba/llvmlite#976 (Closed)

oliverhu mentioned this issue Aug 24, 2023: Statically link compiler-rt numba/llvmlite#986 (Open)

wozeparrot added the bug and upstream labels Aug 27, 2023

sobomax mentioned this issue Dec 22, 2023: Fix LLVM issue with bitcast+fptrunc vectorization and add type load/downconversion test #2915 (Closed)

chenyuxyz mentioned this issue Mar 18, 2024: LLVM=1 python -m pytest test/test_dtype.py segfault on tinybox #3790 (Closed)

ym1234 mentioned this issue May 9, 2024: Clang jit #4492 (Draft)
