test_dtype fails with a segfault on LLVM on linux #1367

Open
ghost opened this issue Jul 28, 2023 · 14 comments
Labels
bug Something isn't working upstream Something to do with upstream packages

Comments

@ghost

ghost commented Jul 28, 2023

$ python3 examples/llama.py --gen 2
using CPU backend
using LLaMA-2-7B model
ram used: 0.00 GB, layers.0.attention.wq.weight : 0%| | 0/292 [00:00<?, ?it/s]Segmentation fault (core dumped)

@prusnak
Contributor

prusnak commented Jul 28, 2023

  1. Can you post the result of the following command?
sha256sum weights/LLaMA-2/7B/consolidated.00.pth
  2. Also try to run this before running your command:
export PYTHONFAULTHANDLER=1

This will add more info to the console.
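The same handler can also be switched on from inside a script, which may be handy when setting the environment variable is awkward. A minimal sketch using only the standard-library `faulthandler` module (nothing here is specific to tinygrad):

```python
import faulthandler

# Roughly what PYTHONFAULTHANDLER=1 does at interpreter startup: on SIGSEGV
# and similar fatal signals, Python writes the traceback of every thread
# to stderr before the process dies.
faulthandler.enable()

print("faulthandler enabled:", faulthandler.is_enabled())
```

Placing this at the very top of the entry script should give a traceback similar to the one the environment variable produces.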

@chenyuxyz
Collaborator

loading works for me, do you have enough ram?


@oliverhu

oliverhu commented Jul 29, 2023

It works on macOS (Metal & CPU), but I ran into the same issue on Ubuntu 22.04 (clang 14).

(p3) pi@pig ~/py/tinygrad (master) $ CPU=1 PYTHONFAULTHANDLER=1 python  -O examples/llama.py --prompt "Hello." --count 10 --temperature 0 --timing
using CPU backend
using LLaMA-7B model
ram used:  0.00 GB, layers.0.attention.wq.weight                      :   0%| | 0/292 [00:0Fatal Python error: Segmentation fault

Thread 0x00007f56dc0ef640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/home/pi/p3/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f5a2a299000 (most recent call first):
  File "/home/pi/py/tinygrad/tinygrad/runtime/ops_llvm.py", line 62 in __call__
  File "/home/pi/py/tinygrad/tinygrad/ops.py", line 142 in __call__
  File "/home/pi/py/tinygrad/tinygrad/ops.py", line 139 in exec
  File "/home/pi/py/tinygrad/tinygrad/ops.py", line 197 in exec_ast
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 157 in realize
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 352 in _realize_from
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 144 in realize
  File "/home/pi/py/tinygrad/tinygrad/lazy.py", line 147 in realize
  File "/home/pi/py/tinygrad/tinygrad/tensor.py", line 94 in realize
  File "/home/pi/py/tinygrad/tinygrad/state.py", line 56 in load_state_dict
  File "/home/pi/py/tinygrad/examples/llama.py", line 246 in build
  File "/home/pi/py/tinygrad/examples/llama.py", line 391 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, sentencepiece._sentencepiece (total: 14)
Segmentation fault (core dumped)

IR

; ModuleID = "/home/pi/py/tinygrad/tinygrad/codegen/llvmir.py"
target triple = "unknown-unknown-unknown"
target datalayout = ""

define void @"exec"(half* %".1", float* %".2") "no-nans-fp-math"="true"
{
entry:
  %".4" = getelementptr inbounds float, float* %".2", i64 0
  %".5" = load float, float* %".4"
  %".6" = fptrunc float %".5" to half
  %".7" = getelementptr inbounds half, half* %".1", i64 0
  store half %".6", half* %".7"
  ret void
}

It crashes at store half %".6", half* %".7"

@oliverhu

oliverhu commented Jul 29, 2023

I swapped out the llvmir version for the numpy implementation, and it works. It seems the generated llvmir conversion for bfloat16 doesn't work well.

@ghost
Author

ghost commented Jul 30, 2023

  1. Can you post the result of the following command?
sha256sum weights/LLaMA-2/7B/consolidated.00.pth
  2. Also try to run this before running your command:
export PYTHONFAULTHANDLER=1

This will add more info to the console.

$ sha256sum weights/LLaMA-2/7B/consolidated.00.pth
d67a91807d5879d193a694da57f28ff85092e92dc9fbef4888bd05e22b15ab75  weights/LLaMA-2/7B/consolidated.00.pth

$ export PYTHONFAULTHANDLER=1 
$ python3 examples/llama.py --prompt="Hello." --temperature=0 --gen 2
using CPU backend
using LLaMA-2-7B model
ram used:  0.00 GB, layers.0.attention.wq.weight                      :   0%|                                                                                                      | 0/292 [00:00<?, ?it/s]Fatal Python error: Segmentation fault

Thread 0x00007f1db286c700 (most recent call first):
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 306 in wait
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 558 in wait
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/tqdm/_monitor.py", line 60 in run
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/home/x/miniconda3/envs/open-mmlab/lib/python3.8/threading.py", line 890 in _bootstrap

Current thread 0x00007f1dd5618180 (most recent call first):
  File "/home/x/workspace/project/tinygrad/tinygrad/runtime/ops_llvm.py", line 62 in __call__
  File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 142 in __call__
  File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 139 in exec
  File "/home/x/workspace/project/tinygrad/tinygrad/ops.py", line 197 in exec_ast
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 157 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 352 in _realize_from
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 144 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/lazy.py", line 147 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/tensor.py", line 94 in realize
  File "/home/x/workspace/project/tinygrad/tinygrad/state.py", line 56 in load_state_dict
  File "examples/llama.py", line 246 in build
  File "examples/llama.py", line 387 in <module>
Segmentation fault (core dumped)

@oliverhu

I did a bit more debugging, it seems to be related to missing compiler-rt symbols.

Script to reproduce in Ubuntu:

from tinygrad.tensor import Tensor


a = Tensor([1.0]).to("LLVM").half().to('CPU').realize()
print(a.numpy())

The compiled assembly

        .text
        .file   "<string>"
        .globl  exec                            # -- Begin function exec
        .p2align        4, 0x90
        .type   exec,@function
exec:                                   # @exec
        .cfi_startproc
# %bb.0:                                # %entry
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset %rbx, -16
        movq    %rdi, %rbx
        movss   (%rsi), %xmm0                   # xmm0 = mem[0],zero,zero,zero
        movabsq $__gnu_f2h_ieee, %rax
        callq   *%rax
        movw    %ax, (%rbx)
        popq    %rbx
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end0:
        .size   exec, .Lfunc_end0-exec
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

Disassembly from the core dump:

(gdb) disassemble exec
Dump of assembler code for function exec:
   0x00007fc078c43000 <+0>:	push   %rbx
   0x00007fc078c43001 <+1>:	mov    %rdi,%rbx
   0x00007fc078c43004 <+4>:	movss  (%rsi),%xmm0
   0x00007fc078c43008 <+8>:	movabs $0x0,%rax
   0x00007fc078c43012 <+18>:	call   *%rax
   0x00007fc078c43014 <+20>:	mov    %ax,(%rbx)
   0x00007fc078c43017 <+23>:	pop    %rbx
   0x00007fc078c43018 <+24>:	ret
End of assembler dump.

It seems __gnu_f2h_ieee can't be found in the current llvmlite installation on Ubuntu. I then found this: numba/llvmlite#834
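One way to check this from Python, as a rough sketch: llvmlite exposes `address_of_symbol`, which returns the in-process address of a symbol or `None` when it cannot be found. The wrapper name `helper_address` is hypothetical, and the import is guarded so the snippet still runs where llvmlite is not installed. A `None` result would be consistent with the `movabs $0x0,%rax` in the disassembly above, although it does not prove that this is the same lookup the JIT performs.

```python
# Sketch: ask llvmlite whether its process-wide symbol search can see the
# float -> half helper that the generated code calls.
try:
    import llvmlite.binding as llvm
except ImportError:  # llvmlite not installed; nothing to query
    llvm = None


def helper_address(name="__gnu_f2h_ieee"):
    """Return the in-process address of `name`, or None if it is not visible."""
    if llvm is None:
        return None
    return llvm.address_of_symbol(name)


addr = helper_address()
print("__gnu_f2h_ieee:", hex(addr) if addr else "not found")
```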

@prusnak @wozeparrot if you can chime in to help here..

@wozeparrot
Collaborator

Why isn't this failing in CI? Running test_dtypes with LLVM fails for me.

But yea, we probably need compiler-rt for half support on most x86 systems, I thought fptrunc gets compiled down to something else tho?

@prusnak
Contributor

prusnak commented Jul 30, 2023

Hopefully will be fixed in llvmlite 0.41? See numba/llvmlite#909

@oliverhu

fptrunc is compiled into __gnu_f2h_ieee here. I think it is some trunc* function for float32 -> bfloat. The llvmlite issue has been open for a year already, so I'm not optimistic that it will be fixed very soon.

@wozeparrot
Collaborator

__gnu_f2h_ieee should be for float32 to float16 on systems without native half support.
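To make the float16 vs. bfloat16 distinction concrete, here is a small standard-library sketch that prints the 16-bit patterns for both formats. The function names are made up for illustration, and the bfloat16 version simply truncates, which is only one of several possible rounding choices.

```python
import struct


def f16_bits(x: float) -> int:
    """IEEE 754 binary16 (half) bit pattern of x, via struct's 'e' format."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bf16_bits_truncated(x: float) -> int:
    """bfloat16 by truncation: keep the top 16 bits of the float32 pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


print(hex(f16_bits(1.0)))             # 0x3c00
print(hex(bf16_bits_truncated(1.0)))  # 0x3f80
```

The half format has 5 exponent bits and 10 mantissa bits, while bfloat16 keeps float32's 8 exponent bits, so the same value produces different bit patterns.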

@oliverhu

You're right: https://reviews.llvm.org/D4927. I compiled numba/llvmlite#909 locally, linked libclang_rt.builtins, and verified that __gnu_f2h_ieee is there:

(tinygrad) pi@pig ~/miniconda3/envs/tinygrad $ nm ./lib/python3.10/site-packages/llvmlite/binding/libllvmlite.so | grep __gnu_f2h_ieee

0000000000846df0 t __gnu_f2h_ieee

Still the same seg fault.. scratching my head.

wozeparrot changed the title from "Segmentation fault (core dumped)" to "test_dtype fails with a segfault on LLVM on linux" on Aug 1, 2023
@oliverhu

oliverhu commented Aug 2, 2023

I tried compiling a dummy LLVM IR module to assembly, and linking libclang_rt.builtins.a seems to work:

; ModuleID = "/home/pi/py/tinygrad/tinygrad/codegen/llvmir.py"
; target triple = "unknown-unknown-unknown"
target datalayout = ""
target triple = "x86_64-pc-linux-gnu"

define void @"exec"(half* %".1", float* %".2") "no-nans-fp-math"="true"
{
entry:
  %".4" = getelementptr inbounds float, float* %".2", i64 0
  %".5" = load float, float* %".4"
  %".6" = fptrunc float %".5" to half
  %".7" = getelementptr inbounds half, half* %".1", i64 0
  store half %".6", half* %".7"
  ret void
}

define i32 @main() {
  ret i32 0
}

assembly

code:     file format elf64-x86-64


Disassembly of section .init:

0000000000401000 <_init>:
  401000:       f3 0f 1e fa             endbr64 
  401004:       48 83 ec 08             sub    $0x8,%rsp
  401008:       48 8b 05 e9 2f 00 00    mov    0x2fe9(%rip),%rax        # 403ff8 <__gmon_start__@Base>
  40100f:       48 85 c0                test   %rax,%rax
  401012:       74 02                   je     401016 <_init+0x16>
  401014:       ff d0                   call   *%rax
  401016:       48 83 c4 08             add    $0x8,%rsp
  40101a:       c3                      ret    

Disassembly of section .text:

0000000000401020 <_start>:
  401020:       f3 0f 1e fa             endbr64 
  401024:       31 ed                   xor    %ebp,%ebp
  401026:       49 89 d1                mov    %rdx,%r9
  401029:       5e                      pop    %rsi
  40102a:       48 89 e2                mov    %rsp,%rdx
  40102d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  401031:       50                      push   %rax
  401032:       54                      push   %rsp
  401033:       45 31 c0                xor    %r8d,%r8d
  401036:       31 c9                   xor    %ecx,%ecx
  401038:       48 c7 c7 30 11 40 00    mov    $0x401130,%rdi
  40103f:       ff 15 ab 2f 00 00       call   *0x2fab(%rip)        # 403ff0 <__libc_start_main@GLIBC_2.34>
  401045:       f4                      hlt    
  401046:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  40104d:       00 00 00 

0000000000401110 <exec>:
  401110:       53                      push   %rbx
  401111:       48 89 fb                mov    %rdi,%rbx
  401114:       f3 0f 10 06             movss  (%rsi),%xmm0
  401118:       e8 23 01 00 00          call   401240 <__gnu_f2h_ieee>
  40111d:       66 89 03                mov    %ax,(%rbx)
  401120:       5b                      pop    %rbx
  401121:       c3                      ret    
  401122:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  401129:       00 00 00 
  40112c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000401130 <main>:
  401130:       31 c0                   xor    %eax,%eax
  401132:       c3                      ret    
  401133:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  40113a:       00 00 00 
  40113d:       0f 1f 00                nopl   (%rax)

0000000000401140 <__truncsfhf2>:
  401140:       66 0f 7e c1             movd   %xmm0,%ecx
  401144:       66 0f 7e c2             movd   %xmm0,%edx
  401148:       81 e1 ff ff ff 7f       and    $0x7fffffff,%ecx
  40114e:       8d b1 00 00 80 c7       lea    -0x38800000(%rcx),%esi
  401154:       8d 81 00 00 80 b8       lea    -0x47800000(%rcx),%eax
  40115a:       39 c6                   cmp    %eax,%esi
  40115c:       73 32                   jae    401190 <__truncsfhf2+0x50>
...
  401238:       8d 41 01                lea    0x1(%rcx),%eax
  40123b:       eb d0                   jmp    40120d <__truncsfhf2+0xcd>
  40123d:       0f 1f 00                nopl   (%rax)

0000000000401240 <__gnu_f2h_ieee>:
  401240:       e9 fb fe ff ff          jmp    401140 <__truncsfhf2>

Disassembly of section .fini:

0000000000401248 <_fini>:
  401248:       f3 0f 1e fa             endbr64 
  40124c:       48 83 ec 08             sub    $0x8,%rsp
  401250:       48 83 c4 08             add    $0x8,%rsp
  401254:       c3                      ret   

So __gnu_f2h_ieee jumps to __truncsfhf2

@oliverhu

oliverhu commented Aug 3, 2023

Finally fixed the issue by setting the proper flags in the llvmlite build script. The __gnu_f2h_ieee symbol was hidden when linked into libllvmlite.so. I will submit a patch there and then close this ticket.

readelf -s ./lib/python3.10/site-packages/llvmlite/binding/libllvmlite.so | grep f2h
  4652: 00000000000a560a    32 FUNC    GLOBAL DEFAULT   14 __gnu_f2h_ieee
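The lowercase `t` in the earlier `nm` output marks a local symbol, whereas `readelf` now reports it as `GLOBAL DEFAULT`. A rough way to test the difference from Python is to ask the dynamic loader directly, since a local or hidden symbol is not resolvable through `dlsym`. This is only a sketch: `exports_symbol` is a hypothetical helper, and passing `None` as the path (which opens the running process) is POSIX-only.

```python
import ctypes


def exports_symbol(lib_path, name):
    """True if the dynamic loader can resolve `name` through `lib_path`.

    lib_path=None opens the running process itself (POSIX only).
    """
    lib = ctypes.CDLL(lib_path)
    try:
        getattr(lib, name)  # ctypes performs a dlsym lookup here
    except AttributeError:
        return False
    return True


print(exports_symbol(None, "printf"))
print(exports_symbol(None, "no_such_symbol_for_this_demo"))
```

Pointing `lib_path` at the installed `libllvmlite.so` and asking for `__gnu_f2h_ieee` should distinguish a build with the hidden symbol from a patched one, assuming the library loads cleanly on its own.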

@sobomax
Contributor

sobomax commented Dec 30, 2023

I've posted a potential workaround for the issue in #2915

5 participants