
Why does the model hang halfway through startup? #106

Closed
YukinoshitaKaren opened this issue Apr 10, 2023 · 8 comments

@YukinoshitaKaren

main: seed = 1681116321
llama_model_load: loading model from 'zh-models/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 49954
llama_model_load: n_ctx = 2048
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 1
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 13134.21 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 14926.29 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'zh-models/7B/ggml-model-f16.bin'
llama_model_load: model size = 13133.55 MB / num tensors = 291
llama_init_from_file: kv self size = 1024.00 MB

system_info: n_threads = 80 / 80 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
generate: n_ctx = 2048, n_batch = 8, n_predict = 256, n_keep = 21

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in '\'.

Below is an instruction that describes a

The command used was ./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3. The output hangs right at the "a" above, and no interaction is possible.

@ymcui
Owner

ymcui commented Apr 10, 2023

Yes, this does happen sometimes; pressing Enter will occasionally unblock it.
Also, judging from your log you are running the previous version of llama.cpp. Try pulling the latest version and recompiling; in my experience startup is smoother than with the previous version.
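
For reference, a minimal update-and-rebuild sequence (a sketch assuming a plain make-based checkout of llama.cpp; note that llama.cpp occasionally changes its model file format, in which case the .bin file has to be regenerated as well):

# inside the llama.cpp checkout
git pull
make clean && make
# then rerun with the same options as before
./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3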

@YukinoshitaKaren
Author

After updating llama.cpp it does run now, but generation is extremely slow, roughly one character every 5-10 minutes. Is that normal? For example, here is the output after running for 20 minutes:
main: seed = 1681118045
llama.cpp: loading model from zh-models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 49954
llama_model_load_internal: n_ctx = 256
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5896.99 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 128.00 MB

system_info: n_threads = 80 / 80 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
generate: n_ctx = 256, n_batch = 8, n_predict = 256, n_keep = 21

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in '\'.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

你是一个导游,请你介绍一下中国的名胜古迹 (You are a tour guide; please introduce China's famous scenic spots and historic sites)
中国 (China)

@airaria
Contributor

airaria commented Apr 10, 2023

> [quotes YukinoshitaKaren's previous comment in full]

Take a look at #51.

@YukinoshitaKaren
Author

I tried that and it is still very slow. Could it be that the program isn't using my GPU, and that's why it's slow?

@ymcui
Owner

ymcui commented Apr 10, 2023

> I tried that and it is still very slow. Could it be that the program isn't using my GPU, and that's why it's slow?

llama.cpp does not use the GPU. It runs comparatively fast on Mac M-series chips.
You can also try raising -b (batch size) and -t (thread count) for more speed (in my tests the gain was not obvious).
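
For example (illustrative values only; -t is usually best set to the number of physical cores rather than logical threads):

./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 -t 8 -b 16 --repeat_penalty 1.3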

@YulongXia

YulongXia commented Apr 24, 2023

My system is Linux debian 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64.
I quantized the model with llama.cpp and got zh-models/7B/ggml-model-q4_0.bin.
Running inference with main is then extremely slow:
./main -m zh-models/7B/ggml-model-f16.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 -b 2048 --repeat_penalty 1.3
[screenshot: 微信图片_20230424145614]
[screenshot: 微信图片_20230424145228]

I passed -b to set batch_size=2048, but the log still shows 512. Also, main's default thread count is supposed to be 4, yet according to the log the default here isn't 4; it saturates every core (56/56).
After I explicitly set the thread count to 4, the program stopped stalling.
[screenshot]
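
A quick way to check for this kind of stall (a sketch assuming a Linux box with standard coreutils; the 56/56 and 80/80 in the logs above suggest the default thread count was oversubscribing hyperthreaded cores):

# logical CPUs visible to the scheduler
nproc
# sockets / cores per socket / threads per core
lscpu | grep -E 'Socket|Core|Thread'
# pin llama.cpp to a small, explicit thread count, e.g.:
./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 -t 4 --repeat_penalty 1.3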

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

@github-actions github-actions bot added the stale label May 14, 2023
@github-actions

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

@github-actions github-actions bot closed this as not planned May 17, 2023