
Why does the model hang halfway through startup? #106

Closed
YukinoshitaKaren opened this issue Apr 10, 2023 · 8 comments

@YukinoshitaKaren

main: seed = 1681116321
llama_model_load: loading model from 'zh-models/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 49954
llama_model_load: n_ctx = 2048
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 1
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 13134.21 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 14926.29 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'zh-models/7B/ggml-model-f16.bin'
llama_model_load: model size = 13133.55 MB / num tensors = 291
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 80 / 80 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
generate: n_ctx = 2048, n_batch = 8, n_predict = 256, n_keep = 21

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in '\'.

Below is an instruction that describes a

The command used was ./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3. The output hangs right at the "a" above, and no interaction is possible.

@ymcui
Owner

ymcui commented Apr 10, 2023

Yes, this does happen sometimes; pressing Enter will occasionally unblock it.
Also, judging from your log you are running the previous version of llama.cpp. Try pulling the latest version and recompiling; in my experience startup is smoother than with the previous version.
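
For reference, a minimal update-and-rebuild sequence (a sketch assuming a plain make-based checkout of llama.cpp; note that llama.cpp occasionally changes its model file format, in which case the .bin file has to be regenerated as well):

# inside the llama.cpp checkout
git pull
make clean && make
# then rerun with the same options as before
./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3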

@YukinoshitaKaren
Author

After updating llama.cpp it does run now, but generation is extremely slow, roughly one character every 5-10 minutes. Is that normal? For example, here is the output after running for 20 minutes:
main: seed = 1681118045
llama.cpp: loading model from zh-models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 49954
llama_model_load_internal: n_ctx = 256
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5896.99 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 128.00 MB

system_info: n_threads = 80 / 80 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
generate: n_ctx = 256, n_batch = 8, n_predict = 256, n_keep = 21

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in '\'.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

你是一个导游,请你介绍一下中国的名胜古迹 (You are a tour guide; please introduce China's famous scenic spots and historic sites)
中国 (China)

@airaria
Contributor

airaria commented Apr 10, 2023

> [quotes YukinoshitaKaren's previous comment in full]

Take a look at #51.

@YukinoshitaKaren
Author

I tried that and it is still very slow. Could it be that the program isn't using my GPU, and that's why it's slow?

@ymcui
Owner

ymcui commented Apr 10, 2023

> I tried that and it is still very slow. Could it be that the program isn't using my GPU, and that's why it's slow?

llama.cpp does not use the GPU. It runs comparatively fast on Mac M-series chips.
You can also try raising -b (batch size) and -t (thread count) for more speed (in my tests the gain was not obvious).
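
For example (illustrative values only; -t is usually best set to the number of physical cores rather than logical threads):

./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 -t 8 -b 16 --repeat_penalty 1.3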

@YulongXia

YulongXia commented Apr 24, 2023

My system is Linux debian 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64.
I quantized the model with llama.cpp and got zh-models/7B/ggml-model-q4_0.bin.
Running inference with main is then extremely slow:
./main -m zh-models/7B/ggml-model-f16.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 -b 2048 --repeat_penalty 1.3
[screenshot: 微信图片_20230424145614]
[screenshot: 微信图片_20230424145228]

I passed -b to set batch_size=2048, but the log still shows 512. Also, main's default thread count is supposed to be 4, yet according to the log the default here isn't 4; it saturates every core (56/56).
After I explicitly set the thread count to 4, the program stopped stalling.
[screenshot]
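
A quick way to check for this kind of stall (a sketch assuming a Linux box with standard coreutils; the 56/56 and 80/80 in the logs above suggest the default thread count was oversubscribing hyperthreaded cores):

# logical CPUs visible to the scheduler
nproc
# sockets / cores per socket / threads per core
lscpu | grep -E 'Socket|Core|Thread'
# pin llama.cpp to a small, explicit thread count, e.g.:
./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 -t 4 --repeat_penalty 1.3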

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

@github-actions github-actions bot added the stale label May 14, 2023
@github-actions

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

@github-actions github-actions bot closed this as not planned May 17, 2023