Out of memory error on GPU 1. Cannot allocate 14.563977GB memory on GPU 1, 17.327148GB memory has been allocated and available memory is only 14.421387GB.
Please check whether there is any other process using GPU 1.
If yes, please stop them, or start PaddlePaddle on another GPU.
If no, please decrease the batch size of your model.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is export FLAGS_use_cuda_managed_memory=false.
(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
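The message suggests three remedies: free the GPU, decrease the batch size, or fall back to CUDA managed memory. A minimal sketch of the managed-memory route, set before launching training; note that the value printed in the message above is `false`, but enabling the managed allocator is assumed here to require `true`, so check the Paddle docs for the installed version:

```bash
# Hedged sketch: opt in to CUDA managed (unified) memory before training.
# The error text prints "=false", but the allocator is presumably enabled
# with "=true"; confirm against the Paddle documentation for this build.
export FLAGS_use_cuda_managed_memory=true
```

Managed memory lets oversized allocations spill to host RAM instead of failing outright, usually at the cost of throughput.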
Please describe your question
Training with PaddleNLP runs out of GPU memory when saving the model.
Paddle version: 2.4.2
PaddleNLP version: 2.5.2.post
Python version: 3.7.13
Task type: text classification (single label)
Hardware: 4 × V100 32 GB GPUs
Total training dataset size: 88.3 MB
Full error output:
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
I0316 08:06:42.398526 41084 tcp_utils.cc:130] Successfully connected to 172.17.0.2:41382
W0316 08:06:47.663002 41084 gpu_resources.cc:61] Please NOTE: device: 1, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0316 08:06:47.665423 41084 gpu_resources.cc:91] device: 1, cuDNN Version: 8.2.
[2023-03-16 08:06:53,052] [ INFO] topology.py:215 - HybridParallelInfo: rank_id: 1, mp_degree: 1, sharding_degree: 4, pp_degree: 1, dp_degree: 1, mp_group: [1], sharding_group: [0, 1, 2, 3], pp_group: [1], dp_group: [1], check/clip group: [0, 1, 2, 3]
[2023-03-16 08:06:53,054] [ INFO] - +==============================================================================+
| |
| DistributedStrategy Overview |
| |
+==============================================================================+
| a_sync=True <-> a_sync_configs |
+------------------------------------------------------------------------------+
| k_steps -1 |
| max_merge_var_num 1 |
| send_queue_size 16 |
| independent_recv_thread False |
| min_send_grad_num_before_recv 1 |
| thread_pool_size 1 |
| send_wait_times 1 |
| runtime_split_send_recv False |
| launch_barrier True |
| heter_worker_device_guard cpu |
| lr_decay_steps 10 |
| use_ps_gpu 0 |
+==============================================================================+
| Environment Flags, Communication Flags |
+------------------------------------------------------------------------------+
| mode 1 |
| elastic False |
| auto False |
| sync_nccl_allreduce True |
| nccl_comm_num 1 |
| use_hierarchical_allreduce False |
| hierarchical_allreduce_inter_nranks 1 |
| sync_batch_norm False |
| fuse_all_reduce_ops True |
| fuse_grad_size_in_MB 32 |
| fuse_grad_size_in_TFLOPS 50.0 |
| cudnn_exhaustive_search False |
| conv_workspace_size_limit 512 |
| cudnn_batchnorm_spatial_persistent False |
| fp16_allreduce False |
| last_comm_group_size_MB 1.0 |
| find_unused_parameters False |
| without_graph_optimization False |
| fuse_grad_size_in_num 8 |
| calc_comm_same_stream False |
| asp False |
| fuse_grad_merge False |
| semi_auto False |
| adam_d2sum False |
| auto_search False |
| heter_ccl_mode False |
| is_fl_ps_mode False |
| with_coordinator False |
| split_data True |
| downpour_table_param [] |
| fs_client_param |
+==============================================================================+
| Build Strategy |
+------------------------------------------------------------------------------+
| enable_sequential_execution False |
| fuse_elewise_add_act_ops False |
| fuse_bn_act_ops False |
| fuse_relu_depthwise_conv False |
| fuse_broadcast_ops False |
| fuse_all_optimizer_ops False |
| enable_inplace False |
| enable_backward_optimizer_op_deps True |
| cache_runtime_context False |
| fuse_bn_add_act_ops True |
| enable_auto_fusion False |
| enable_addto False |
| fix_op_run_order False |
| allow_cuda_graph_capture False |
| reduce_strategy 0 |
| fuse_gemm_epilogue False |
| debug_graphviz_path |
+==============================================================================+
| Execution Strategy |
+------------------------------------------------------------------------------+
| num_threads 1 |
| num_iteration_per_drop_scope 10 |
| num_iteration_per_run 1 |
| use_thread_barrier False |
+==============================================================================+
[2023-03-16 08:06:53,055] [ INFO] - The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-03-16 08:06:53,056] [ INFO] - ============================================================
[2023-03-16 08:06:53,056] [ INFO] - Model Configuration Arguments
[2023-03-16 08:06:53,056] [ INFO] - paddle commit id :0e92adceae06b6b7463f2dc7790ffb0601730009
[2023-03-16 08:06:53,056] [ INFO] - export_model_dir :/home/project/deploy/export/sort/cn
[2023-03-16 08:06:53,056] [ INFO] - model_name_or_path :ernie-3.0-tiny-micro-v2-zh
[2023-03-16 08:06:53,056] [ INFO] -
[2023-03-16 08:06:53,056] [ INFO] - ============================================================
[2023-03-16 08:06:53,056] [ INFO] - Data Configuration Arguments
[2023-03-16 08:06:53,057] [ INFO] - paddle commit id :0e92adceae06b6b7463f2dc7790ffb0601730009
[2023-03-16 08:06:53,057] [ INFO] - bad_case_path :./data/bad_case.txt
[2023-03-16 08:06:53,057] [ INFO] - debug :False
[2023-03-16 08:06:53,057] [ INFO] - dev_path :/home/project/paddle_class/data/target/sort/cn/dev.txt
[2023-03-16 08:06:53,057] [ INFO] - early_stopping :True
[2023-03-16 08:06:53,057] [ INFO] - early_stopping_patience :3
[2023-03-16 08:06:53,057] [ INFO] - label_path :/home/project/paddle_class/data/target/sort/cn/label.txt
[2023-03-16 08:06:53,057] [ INFO] - max_length :128
[2023-03-16 08:06:53,057] [ INFO] - test_path :./data/dev.txt
[2023-03-16 08:06:53,057] [ INFO] - train_path :/home/project/paddle_class/data/target/sort/cn/train.txt
[2023-03-16 08:06:53,057] [ INFO] -
[2023-03-16 08:06:53,148] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.modeling.ErnieForSequenceClassification'> to load 'ernie-3.0-tiny-micro-v2-zh'.
[2023-03-16 08:06:54,312] [ INFO] - All model checkpoint weights were used when initializing ErnieForSequenceClassification.
[2023-03-16 08:06:54,312] [ WARNING] - Some weights of ErnieForSequenceClassification were not initialized from the model checkpoint at ernie-3.0-tiny-micro-v2-zh and are newly initialized: ['ernie.pooler.dense.bias', 'classifier.bias', 'ernie.pooler.dense.weight', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-03-16 08:06:54,313] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'ernie-3.0-tiny-micro-v2-zh'.
[2023-03-16 08:06:54,313] [ INFO] - Already cached /root/.paddlenlp/models/ernie-3.0-tiny-micro-v2-zh/ernie_3.0_tiny_micro_v2_vocab.txt
[2023-03-16 08:06:54,335] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/ernie-3.0-tiny-micro-v2-zh/tokenizer_config.json
[2023-03-16 08:06:54,335] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/ernie-3.0-tiny-micro-v2-zh/special_tokens_map.json
[2023-03-16 08:06:55,785] [ WARNING] - Accessing enable_recompute through model.enable_recompute will be deprecated after v2.6.0. Instead, do model.config.enable_recompute
[2023-03-16 08:06:55,786] [ WARNING] - Accessing enable_recompute through model.enable_recompute will be deprecated after v2.6.0. Instead, do model.config.enable_recompute
[2023-03-16 08:06:55,786] [ INFO] - ============================================================
[2023-03-16 08:06:55,786] [ INFO] - Training Configuration Arguments
[2023-03-16 08:06:55,786] [ INFO] - paddle commit id :0e92adceae06b6b7463f2dc7790ffb0601730009
[2023-03-16 08:06:55,786] [ INFO] - _no_sync_in_gradient_accumulation:True
[2023-03-16 08:06:55,786] [ INFO] - activation_quantize_type :None
[2023-03-16 08:06:55,786] [ INFO] - adam_beta1 :0.9
[2023-03-16 08:06:55,786] [ INFO] - adam_beta2 :0.999
[2023-03-16 08:06:55,786] [ INFO] - adam_epsilon :1e-08
[2023-03-16 08:06:55,787] [ INFO] - algo_list :None
[2023-03-16 08:06:55,787] [ INFO] - batch_num_list :None
[2023-03-16 08:06:55,787] [ INFO] - batch_size_list :None
[2023-03-16 08:06:55,787] [ INFO] - bf16 :False
[2023-03-16 08:06:55,787] [ INFO] - bf16_full_eval :False
[2023-03-16 08:06:55,787] [ INFO] - bias_correction :False
[2023-03-16 08:06:55,787] [ INFO] - current_device :gpu:1
[2023-03-16 08:06:55,787] [ INFO] - dataloader_drop_last :False
[2023-03-16 08:06:55,787] [ INFO] - dataloader_num_workers :0
[2023-03-16 08:06:55,787] [ INFO] - device :gpu
[2023-03-16 08:06:55,787] [ INFO] - disable_tqdm :True
[2023-03-16 08:06:55,787] [ INFO] - do_compress :False
[2023-03-16 08:06:55,787] [ INFO] - do_eval :True
[2023-03-16 08:06:55,787] [ INFO] - do_export :True
[2023-03-16 08:06:55,788] [ INFO] - do_predict :False
[2023-03-16 08:06:55,788] [ INFO] - do_train :True
[2023-03-16 08:06:55,788] [ INFO] - dp_degree :1
[2023-03-16 08:06:55,788] [ INFO] - eval_batch_size :420
[2023-03-16 08:06:55,788] [ INFO] - eval_steps :None
[2023-03-16 08:06:55,788] [ INFO] - evaluation_strategy :IntervalStrategy.EPOCH
[2023-03-16 08:06:55,788] [ INFO] - flatten_param_grads :False
[2023-03-16 08:06:55,788] [ INFO] - fp16 :False
[2023-03-16 08:06:55,788] [ INFO] - fp16_full_eval :False
[2023-03-16 08:06:55,788] [ INFO] - fp16_opt_level :O1
[2023-03-16 08:06:55,788] [ INFO] - gradient_accumulation_steps :1
[2023-03-16 08:06:55,788] [ INFO] - greater_is_better :True
[2023-03-16 08:06:55,788] [ INFO] - ignore_data_skip :False
[2023-03-16 08:06:55,788] [ INFO] - input_dtype :int64
[2023-03-16 08:06:55,789] [ INFO] - input_infer_model_path :None
[2023-03-16 08:06:55,789] [ INFO] - label_names :None
[2023-03-16 08:06:55,789] [ INFO] - lazy_data_processing :True
[2023-03-16 08:06:55,789] [ INFO] - learning_rate :3e-05
[2023-03-16 08:06:55,789] [ INFO] - load_best_model_at_end :True
[2023-03-16 08:06:55,789] [ INFO] - local_process_index :1
[2023-03-16 08:06:55,789] [ INFO] - local_rank :1
[2023-03-16 08:06:55,789] [ INFO] - log_level :-1
[2023-03-16 08:06:55,789] [ INFO] - log_level_replica :-1
[2023-03-16 08:06:55,789] [ INFO] - log_on_each_node :True
[2023-03-16 08:06:55,789] [ INFO] - logging_dir :/home/project/paddle_class/checkpoint/sort/cn/runs/Mar16_08-06-42_6aac71f1edd1
[2023-03-16 08:06:55,789] [ INFO] - logging_first_step :False
[2023-03-16 08:06:55,789] [ INFO] - logging_steps :5
[2023-03-16 08:06:55,789] [ INFO] - logging_strategy :IntervalStrategy.STEPS
[2023-03-16 08:06:55,789] [ INFO] - lr_scheduler_type :SchedulerType.LINEAR
[2023-03-16 08:06:55,790] [ INFO] - max_grad_norm :1.0
[2023-03-16 08:06:55,790] [ INFO] - max_steps :-1
[2023-03-16 08:06:55,790] [ INFO] - metric_for_best_model :accuracy
[2023-03-16 08:06:55,790] [ INFO] - minimum_eval_times :None
[2023-03-16 08:06:55,790] [ INFO] - moving_rate :0.9
[2023-03-16 08:06:55,790] [ INFO] - no_cuda :False
[2023-03-16 08:06:55,790] [ INFO] - num_train_epochs :100.0
[2023-03-16 08:06:55,790] [ INFO] - onnx_format :True
[2023-03-16 08:06:55,790] [ INFO] - optim :OptimizerNames.ADAMW
[2023-03-16 08:06:55,790] [ INFO] - output_dir :/home/project/paddle_class/checkpoint/sort/cn
[2023-03-16 08:06:55,790] [ INFO] - overwrite_output_dir :False
[2023-03-16 08:06:55,790] [ INFO] - past_index :-1
[2023-03-16 08:06:55,790] [ INFO] - per_device_eval_batch_size :420
[2023-03-16 08:06:55,790] [ INFO] - per_device_train_batch_size :420
[2023-03-16 08:06:55,790] [ INFO] - prediction_loss_only :False
[2023-03-16 08:06:55,791] [ INFO] - process_index :1
[2023-03-16 08:06:55,791] [ INFO] - prune_embeddings :False
[2023-03-16 08:06:55,791] [ INFO] - recompute :True
[2023-03-16 08:06:55,791] [ INFO] - remove_unused_columns :True
[2023-03-16 08:06:55,791] [ INFO] - report_to :['visualdl']
[2023-03-16 08:06:55,791] [ INFO] - resume_from_checkpoint :True
[2023-03-16 08:06:55,791] [ INFO] - round_type :round
[2023-03-16 08:06:55,791] [ INFO] - run_name :/home/project/paddle_class/checkpoint/sort/cn
[2023-03-16 08:06:55,791] [ INFO] - save_on_each_node :False
[2023-03-16 08:06:55,791] [ INFO] - save_steps :100
[2023-03-16 08:06:55,791] [ INFO] - save_strategy :IntervalStrategy.EPOCH
[2023-03-16 08:06:55,791] [ INFO] - save_total_limit :1
[2023-03-16 08:06:55,791] [ INFO] - scale_loss :32768
[2023-03-16 08:06:55,791] [ INFO] - seed :42
[2023-03-16 08:06:55,792] [ INFO] - sharding :[<ShardingOption.SHARD_GRAD_OP: 'stage2'>]
[2023-03-16 08:06:55,792] [ INFO] - sharding_degree :4
[2023-03-16 08:06:55,792] [ INFO] - should_log :False
[2023-03-16 08:06:55,792] [ INFO] - should_save :False
[2023-03-16 08:06:55,792] [ INFO] - skip_memory_metrics :True
[2023-03-16 08:06:55,792] [ INFO] - strategy :dynabert+ptq
[2023-03-16 08:06:55,792] [ INFO] - train_batch_size :420
[2023-03-16 08:06:55,792] [ INFO] - use_pact :True
[2023-03-16 08:06:55,792] [ INFO] - warmup_ratio :0.1
[2023-03-16 08:06:55,792] [ INFO] - warmup_steps :0
[2023-03-16 08:06:55,792] [ INFO] - weight_decay :0.0
[2023-03-16 08:06:55,792] [ INFO] - weight_quantize_type :channel_wise_abs_max
[2023-03-16 08:06:55,792] [ INFO] - width_mult_list :None
[2023-03-16 08:06:55,792] [ INFO] - world_size :4
[2023-03-16 08:06:55,793] [ INFO] -
WARNING:root:While using ClipGradByGlobalNorm in GroupShardedOptimizerStage2, the grad clip of original optimizer will be changed.
[2023-03-16 08:06:57,190] [ INFO] - ***** Running training *****
[2023-03-16 08:06:57,191] [ INFO] - Num examples = 800000
[2023-03-16 08:06:57,191] [ INFO] - Num Epochs = 100
[2023-03-16 08:06:57,191] [ INFO] - Instantaneous batch size per device = 420
[2023-03-16 08:06:57,191] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 1680
[2023-03-16 08:06:57,191] [ INFO] - Gradient Accumulation steps = 1
[2023-03-16 08:06:57,191] [ INFO] - Total optimization steps = 47700.0
[2023-03-16 08:06:57,191] [ INFO] - Total num train samples = 80000000.0
[2023-03-16 08:06:57,193] [ INFO] - Number of trainable parameters = 98054787
Can not add param: embedding_0.w_0, param's shape: [40000, 384], param align: 0, grad_storages fill: 0,
Can not add param: linear_25.w_0, param's shape: [384, 193923], param align: 0, grad_storages fill: 2419200,
[2023-03-16 08:10:42,126] [ INFO] - ***** Running Evaluation *****
[2023-03-16 08:10:42,126] [ INFO] - Num examples = 199993
[2023-03-16 08:10:42,127] [ INFO] - Total prediction steps = 120
[2023-03-16 08:10:42,127] [ INFO] - Pre device batch size = 420
[2023-03-16 08:10:42,127] [ INFO] - Total Batch size = 1680
terminate called after throwing an instance of 'paddle::memory::allocation::BadAlloc'
what():
C++ Traceback (most recent call last):
0 concat_ad_func(std::vector<paddle::experimental::Tensor, std::allocator<paddle::experimental::Tensor> > const&, paddle::experimental::ScalarBase<paddle::experimental::Tensor>)
1 paddle::experimental::concat(std::vector<paddle::experimental::Tensor, std::allocator<paddle::experimental::Tensor> > const&, paddle::experimental::ScalarBase<paddle::experimental::Tensor> const&)
2 void phi::ConcatKernel<float, phi::GPUContext>(phi::GPUContext const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, phi::DenseTensor*)
3 phi::DenseTensor::mutable_data(phi::Place const&, paddle::experimental::DataType, unsigned long)
4 paddle::memory::AllocShared(phi::Place const&, unsigned long)
5 paddle::memory::allocation::AllocatorFacade::AllocShared(phi::Place const&, unsigned long)
6 paddle::memory::allocation::AllocatorFacade::Alloc(phi::Place const&, unsigned long)
7 paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
8 paddle::memory::allocation::Allocator::Allocate(unsigned long)
9 paddle::memory::allocation::Allocator::Allocate(unsigned long)
10 paddle::memory::allocation::Allocator::Allocate(unsigned long)
11 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
12 std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
13 phi::enforce::GetCurrentTraceBackString[abi:cxx11]
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 1. Cannot allocate 14.563977GB memory on GPU 1, 17.327148GB memory has been allocated and available memory is only 14.421387GB.
Please check whether there is any other process using GPU 1.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is export FLAGS_use_cuda_managed_memory=false. (at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
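Given the error message's own advice to decrease the batch size, and that this run uses per_device_train_batch_size and per_device_eval_batch_size of 420 (see the configuration dump above), one hedged mitigation is to relaunch with smaller per-device batches. The script name train.py below is a placeholder for the actual text-classification training script; only flags that appear in the configuration dump are assumed to exist:

```bash
# Hypothetical relaunch with smaller per-device batches; "train.py" stands in
# for the PaddleNLP text-classification training script actually used here,
# and the remaining script arguments are assumed to stay unchanged.
python -m paddle.distributed.launch --gpus "0,1,2,3" train.py \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 128 \
    --gradient_accumulation_steps 4
```

Gradient accumulation keeps the effective train batch size roughly comparable to the original 1680, while the smaller eval batch reduces peak memory during the evaluation loop, where the failing allocation occurs.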