
Run on CPU without AVX2 #315

Open

ZanMax opened this issue Apr 14, 2024 · 3 comments

@ZanMax commented Apr 14, 2024

Hello,
I have a server with an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz and 5x WX9100 GPUs, and I want to run Mistral 7B on each GPU.
But I get an "Illegal instruction (core dumped)" error when I try.
Is it possible to run exllama on a CPU without AVX2?
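For reference, the E5-2620 is a Sandy Bridge part: it supports AVX but predates AVX2, which was introduced with Haswell. On Linux, the vector extensions the CPU actually reports can be confirmed with:

grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u

If avx2 is missing from the output, any binary compiled with AVX2 instructions will die with SIGILL ("Illegal instruction") on this machine.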

@turboderp (Owner)

Are you on the latest version?

@ZanMax (Author) commented Apr 15, 2024

Steps:

git clone https://github.com/turboderp/exllama
cd exllama
pip install -r requirements.txt
python test_benchmark_inference.py -d <path_to_model_files> -p -ppl

Result:

python test_benchmark_inference.py -d /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/ -p -ppl
Successfully preprocessed all matching files.
-- Perplexity:
-- - Dataset: datasets/wikitext2_val_sample.jsonl
-- - Chunks: 100
-- - Chunk size: 2048 -> 2048
-- - Chunk overlap: 0
-- - Min. chunk size: 50
-- - Key: text
-- Tokenizer: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/tokenizer.model
-- Model config: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/config.json
-- Model: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/model.safetensors
-- Sequence length: 2048
-- Tuning:
-- --sdp_thd: 8
-- --matmul_recons_thd: 8
-- --fused_mlp_thd: 2
-- --rmsnorm_no_half2
-- --rope_no_half2
-- --matmul_no_half2
-- --silu_no_half2
-- Options: ['perf', 'perplexity']
** Time, Load model: 21.56 seconds
** Time, Load tokenizer: 0.02 seconds
-- Groupsize (inferred): 128
-- Act-order (inferred): yes
** VRAM, Model: [cuda:0] 3,877.87 MB - [cuda:1] 0.00 MB - [cuda:2] 0.00 MB - [cuda:3] 0.00 MB - [cuda:4] 0.00 MB
** VRAM, Cache: [cuda:0] 256.00 MB - [cuda:1] 0.00 MB - [cuda:2] 0.00 MB - [cuda:3] 0.00 MB - [cuda:4] 0.00 MB
-- Warmup pass 1...
Illegal instruction (core dumped)

As far as I know, "Illegal instruction (core dumped)" means there is a problem with an AVX2 instruction. When I tried the GGUF format with llama.cpp, I received the same "Illegal instruction (core dumped)" error.
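One way to check whether the faulting instruction really is an AVX2 op is to catch the SIGILL under gdb and disassemble at the program counter, for example:

gdb --args python3 test_benchmark_inference.py -d <path_to_model_files> -p -ppl
(gdb) run
# ... when the process stops with SIGILL:
(gdb) x/i $pc

x/i $pc prints the instruction the process stopped on; an AVX2 instruction such as vpgatherdd would confirm that the crash comes from AVX2 code in a prebuilt binary rather than from exllama's own kernels.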

@ZanMax (Author) commented Apr 18, 2024

Maybe this gives more information about the error:

gdb --args python3 test_benchmark_inference.py -d /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/ -p -ppl

#0 0x00007fff4e89540e in rocblas_hgemm () from /home/dev/workspace/numpy_no_avx2/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#1 0x00007fff86e491dd in hipblasHgemm () from /home/dev/workspace/numpy_no_avx2/venv/lib/python3.10/site-packages/torch/lib/libhipblas.so
#2 0x00007ffe8ba50855 in q4_matmul_recons_cuda(ExLlamaTuning*, __half const*, int, Q4Matrix*, __half*, void*, bool) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#3 0x00007ffe8ba364e8 in q4_matmul(at::Tensor, unsigned long, at::Tensor) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#4 0x00007ffe8ba4e423 in pybind11::cpp_function::initialize<void (&)(at::Tensor, unsigned long, at::Tensor), void, at::Tensor, unsigned long, at::Tensor, pybind11::name, pybind11::scope, pybind11::sibling, char [10]>(void (&)(at::Tensor, unsigned long, at::Tensor), void (*)(at::Tensor, unsigned long, at::Tensor), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [10])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#5 0x00007ffe8ba4aa4d in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#6 0x00005555556ae10e in ?? ()
#7 0x00005555556a4a7b in _PyObject_MakeTpCall ()
#8 0x000055555569d096 in _PyEval_EvalFrameDefault ()
#9 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#10 0x000055555569ccfa in _PyEval_EvalFrameDefault ()
#11 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#12 0x000055555569745c in _PyEval_EvalFrameDefault ()
#13 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#14 0x000055555569745c in _PyEval_EvalFrameDefault ()
#15 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#16 0x000055555569745c in _PyEval_EvalFrameDefault ()
#17 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#18 0x000055555569745c in _PyEval_EvalFrameDefault ()
#19 0x00005555556bc7f1 in ?? ()
#20 0x000055555569853c in _PyEval_EvalFrameDefault ()
#21 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#22 0x000055555569726d in _PyEval_EvalFrameDefault ()
#23 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#24 0x000055555569726d in _PyEval_EvalFrameDefault ()
#25 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#26 0x000055555569726d in _PyEval_EvalFrameDefault ()
#27 0x00005555556939c6 in ?? ()
#28 0x0000555555789256 in PyEval_EvalCode ()
#29 0x00005555557b4108 in ?? ()
#30 0x00005555557ad9cb in ?? ()
#31 0x00005555557b3e55 in ?? ()
#32 0x00005555557b3338 in _PyRun_SimpleFileObject ()
#33 0x00005555557b2f83 in _PyRun_AnyFileObject ()
#34 0x00005555557a5a5e in Py_RunMain ()
#35 0x000055555577c02d in Py_BytesMain ()
#36 0x00007ffff7c7ed90 in __libc_start_call_main (main=main@entry=0x55555577bff0, argc=argc@entry=6, argv=argv@entry=0x7fffffffe328) at ../sysdeps/nptl/libc_start_call_main.h:58
#37 0x00007ffff7c7ee40 in __libc_start_main_impl (main=0x55555577bff0, argc=6, argv=0x7fffffffe328, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe318)
at ../csu/libc-start.c:392
#38 0x000055555577bf25 in _start ()
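A note on this backtrace: frames #0-#2 put the illegal instruction inside rocblas_hgemm in the prebuilt librocblas.so, reached through exllama's reconstruction matmul path (q4_matmul_recons_cuda). That suggests the shipped ROCm binaries were compiled with AVX2. As a sketch of a possible workaround, assuming the tuning values echoed in the log above (--matmul_recons_thd: 8) correspond to CLI flags of the same name and that 0 disables the reconstruction path:

python test_benchmark_inference.py -d /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/ -p -ppl --matmul_recons_thd 0

This would only route around the hgemm call; if other host-side code in the ROCm stack also uses AVX2, the process may still trap elsewhere, and rebuilding that stack on this machine would be the remaining option.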
