(newvisglm) zzz@zzz:~/yz/AllVscodes/VisualGLM-6B-main$ bash finetune/finetune_visualglm_qlora.sh
NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2 deepspeed --master_port 16666 --include localhost:0,1,2,3 --hostfile hostfile_single finetune_visualglm.py --experiment-name finetune-visualglm-6b --model-parallel-size 1 --mode finetune --train-iters 300 --resume-dataloader --max_source_length 64 --max_target_length 256 --lora_rank 10 --layer_range 0 14 --pre_seq_len 4 --train-data ./fewshot-data/dataset.json --valid-data ./fewshot-data/dataset.json --distributed-backend nccl --lr-decay-style cosine --warmup .02 --checkpoint-activations --save-interval 300 --eval-interval 10000 --save ./checkpoints --split 1 --eval-iters 10 --eval-batch-size 8 --zero-stage 1 --lr 0.0001 --batch-size 1 --gradient-accumulation-steps 4 --skip-init --fp16 --use_qlora
[2024-01-10 21:42:21,222] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so'), PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
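The "Found duplicate libcudart" warning repeats once per rank. In a typical conda env the two paths it lists are a versioned library and its unversioned symlink, so the coin-flip is usually harmless; a quick way to see exactly what bitsandbytes is choosing between is a small helper like the one below (a hypothetical diagnostic, not part of the repo or of this run):

```python
# Hypothetical diagnostic: list every libcudart file under an
# environment's lib directory, mirroring the duplicate check that
# bitsandbytes' cuda_setup performs.
import os
import sys
from pathlib import Path

def find_cudart(prefix):
    """Return the libcudart.so* names under <prefix>/lib, sorted."""
    lib_dir = Path(prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("libcudart.so*"))

if __name__ == "__main__":
    # CONDA_PREFIX points at the active conda env; fall back to sys.prefix.
    print(find_cudart(os.environ.get("CONDA_PREFIX", sys.prefix)))
```

If more than one real (non-symlink) runtime shows up, keeping only the version matching the detected toolkit (11.3 here) removes the ambiguity.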
[2024-01-10 21:43:44,628] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-01-10 21:43:44,628] [INFO] [runner.py:571:main] cmd = /home/zzz/anaconda3/envs/newvisglm/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=16666 --enable_each_rank_log=None finetune_visualglm.py --experiment-name finetune-visualglm-6b --model-parallel-size 1 --mode finetune --train-iters 300 --resume-dataloader --max_source_length 64 --max_target_length 256 --lora_rank 10 --layer_range 0 14 --pre_seq_len 4 --train-data ./fewshot-data/dataset.json --valid-data ./fewshot-data/dataset.json --distributed-backend nccl --lr-decay-style cosine --warmup .02 --checkpoint-activations --save-interval 300 --eval-interval 10000 --save ./checkpoints --split 1 --eval-iters 10 --eval-batch-size 8 --zero-stage 1 --lr 0.0001 --batch-size 1 --gradient-accumulation-steps 4 --skip-init --fp16 --use_qlora
[2024-01-10 21:43:46,326] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so'), PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
[2024-01-10 21:43:48,984] [INFO] [launch.py:138:main] 0 NCCL_IB_DISABLE=0
[2024-01-10 21:43:48,984] [INFO] [launch.py:138:main] 0 NCCL_DEBUG=info
[2024-01-10 21:43:48,984] [INFO] [launch.py:138:main] 0 NCCL_NET_GDR_LEVEL=2
[2024-01-10 21:43:48,984] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2024-01-10 21:43:48,984] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=4, node_rank=0
[2024-01-10 21:43:48,984] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2024-01-10 21:43:48,984] [INFO] [launch.py:163:main] dist_world_size=4
[2024-01-10 21:43:48,984] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2024-01-10 21:43:50,865] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-10 21:43:50,916] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-10 21:43:50,938] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-10 21:43:50,944] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so'), PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so'), PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
bin /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so'), PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so'), PosixPath('/home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/zzz/anaconda3/envs/newvisglm/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
[2024-01-10 21:44:24,179] [INFO] using world size: 4 and model-parallel size: 1
[2024-01-10 21:44:24,179] [INFO] > padded vocab (size: 100) with 28 dummy tokens (new size: 128)
[2024-01-10 21:44:25,235] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-01-10 21:44:25,276] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=4. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-01-10 21:44:25,285] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-10 21:44:25,285] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-10 21:44:25,286] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2024-01-10 21:44:25,287] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-10 21:44:25,288] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2024-01-10 21:44:25,288] [INFO] [checkpointing.py:1045:_configure_using_config_file] {'partition_activations': False, 'contiguous_memory_optimization': False, 'cpu_checkpointing': False, 'number_checkpoints': None, 'synchronize_checkpoint_boundary': False, 'profile': False}
[2024-01-10 21:44:25,288] [INFO] [checkpointing.py:227:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
[2024-01-10 21:44:25,289] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2024-01-10 21:44:25,295] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-10 21:44:25,296] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
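The repeated `cpu_offload is deprecated` warnings come from the DeepSpeed config used by the finetune script still carrying the old field name; renaming it to `offload_optimizer` silences them. A sketch of the relevant fragment (the surrounding values are assumptions matching the command line; only the rename is the point):

```python
# Deprecated spelling (what the warning is about):
#   "zero_optimization": {"stage": 1, "cpu_offload": true}
# Current spelling, if optimizer offload is actually wanted:
ds_config = {
    "zero_optimization": {
        "stage": 1,  # matches --zero-stage 1 on the command line
        "offload_optimizer": {"device": "cpu"},
    }
}
print("cpu_offload" in ds_config["zero_optimization"])  # False
```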
[2024-01-10 21:44:25,422] [INFO] [RANK 0] building FineTuneVisualGLMModel model ...
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/torch/nn/init.py:403: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/torch/nn/init.py:403: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/torch/nn/init.py:403: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
/home/zzz/anaconda3/envs/newvisglm/lib/python3.10/site-packages/torch/nn/init.py:403: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
[2024-01-10 21:44:44,798] [INFO] [RANK 0] replacing layer 0 attention with lora
[2024-01-10 21:44:45,735] [INFO] [RANK 0] replacing layer 14 attention with lora
[2024-01-10 21:44:46,718] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2024-01-10 21:48:30,236] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2432905
[2024-01-10 21:48:31,608] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2432906
[2024-01-10 21:48:32,635] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2432907
[2024-01-10 21:48:32,636] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2432908
[2024-01-10 21:48:33,659] [ERROR] [launch.py:321:sigkill_handler] ['/home/zzz/anaconda3/envs/newvisglm/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=3', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '300', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--layer_range', '0', '14', '--pre_seq_len', '4', '--train-data', './fewshot-data/dataset.json', '--valid-data', './fewshot-data/dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '8', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '1', '--gradient-accumulation-steps', '4', '--skip-init', '--fp16', '--use_qlora'] exits with return code = -9
(newvisglm) zzz@zzz:~/yz/AllVscodes/VisualGLM-6B-main$
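`exits with return code = -9` means the workers were killed by SIGKILL (signal 9) rather than raising a Python error; on Linux that is most often the kernel OOM killer, plausible here since all four ranks load and 4-bit-quantize the 6B model in host RAM at the same time (the kill arrives right after "replacing chatglm linear layer with 4bit"). The sign convention can be checked directly with a standalone sketch, not tied to DeepSpeed:

```python
# A subprocess killed by a signal reports returncode = -<signal number>,
# which is exactly how DeepSpeed's launcher prints "return code = -9".
import signal
import subprocess

# Spawn a shell that immediately SIGKILLs itself.
proc = subprocess.run(["sh", "-c", "kill -KILL $$"])
print(proc.returncode)                        # -9
print(signal.Signals(-proc.returncode).name)  # SIGKILL
```

If `dmesg -T | grep -i 'out of memory'` confirms the OOM killer, the usual workarounds are launching fewer ranks (e.g. `--include localhost:0`) or freeing host RAM before the quantization step.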