I followed https://github.com/THUDM/GLM-130B/blob/main/docs/inference-with-fastertransformer.md to serve GLM-130B, and it failed with the error below. What could be causing it?
Caught signal 7 (Bus error: nonexistent physical address)
==== backtrace (tid: 12126) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
==== backtrace (tid: 12129) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
==== backtrace (tid: 12130) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
==== backtrace (tid: 12128) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
==== backtrace (tid: 12131) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
==== backtrace (tid: 12133) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
==== backtrace (tid: 12132) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
==== backtrace (tid: 12127) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x00000000000755bd ncclGroupEnd() ???:0
3 0x000000000006af86 ncclGroupEnd() ???:0
4 0x0000000000008609 start_thread() ???:0
5 0x000000000011f133 clone() ???:0
=================================
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 12043 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 12037) of binary: /opt/conda/bin/python