-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segmentation fault #18
Comments
It seems that you are using the Python version of CIFAR-10 data set.
Please use the binary version instead. Same issue as this one. |
|
It seems PocketFlow failed to find an idle GPU device. Can you post the result of nvidia-smi? $ nvidia-smi |
`(tf-1.10-cp3) [zgz@localhost models]$ nvidia-smi +-----------------------------------------------------------------------------+ `>>> from tensorflow.python.client import device_lib
|
In To temporarily override this, you may skip calling |
OK, thanks |
Closing this issue. Reopen it if there is any further questions. |
python3.6 cuda9 cudnn7 tensorflow:1.10.0
`(tf-1.10-cp3) [zgz@localhost PocketFlow]$ ./scripts/run_local.sh nets/resnet_at_cifar10_run.py
Python script: nets/resnet_at_cifar10_run.py
of GPUs: 1
extra arguments: --model_http_url https://api.ai.tencent.com/pocketflow --data_dir_local /home/zgz/project/data_set/cifar-10-batches-py
Traceback (most recent call last):
File "utils/get_idle_gpus.py", line 54, in
raise ValueError('not enough idle GPUs; idle GPUs are: {}'.format(idle_gpus))
ValueError: not enough idle GPUs; idle GPUs are: []
‘nets/resnet_at_cifar10_run.py’ -> ‘main.py’
multi-GPU training disabled
[WARNING] TF-Plus & Horovod cannot be imported; multi-GPU training is unsupported
INFO:tensorflow:FLAGS:
INFO:tensorflow:data_disk: local
INFO:tensorflow:data_hdfs_host: None
INFO:tensorflow:data_dir_local: /home/zgz/project/data_set/cifar-10-batches-py
INFO:tensorflow:data_dir_hdfs: None
INFO:tensorflow:cycle_length: 4
INFO:tensorflow:nb_threads: 8
INFO:tensorflow:buffer_size: 1024
INFO:tensorflow:prefetch_size: 8
INFO:tensorflow:nb_classes: 10
INFO:tensorflow:nb_smpls_train: 50000
INFO:tensorflow:nb_smpls_val: 5000
INFO:tensorflow:nb_smpls_eval: 10000
INFO:tensorflow:batch_size: 128
INFO:tensorflow:batch_size_eval: 100
INFO:tensorflow:resnet_size: 20
INFO:tensorflow:lrn_rate_init: 0.1
INFO:tensorflow:batch_size_norm: 128.0
INFO:tensorflow:momentum: 0.9
INFO:tensorflow:loss_w_dcy: 0.0002
INFO:tensorflow:model_http_url: https://api.ai.tencent.com/pocketflow
INFO:tensorflow:summ_step: 100
INFO:tensorflow:save_step: 10000
INFO:tensorflow:save_path: ./models/model.ckpt
INFO:tensorflow:save_path_eval: ./models_eval/model.ckpt
INFO:tensorflow:enbl_dst: False
INFO:tensorflow:enbl_warm_start: False
INFO:tensorflow:loss_w_dst: 4.0
INFO:tensorflow:tempr_dst: 4.0
INFO:tensorflow:save_path_dst: ./models_dst/model.ckpt
INFO:tensorflow:nb_epochs_rat: 1.0
INFO:tensorflow:ddpg_actor_depth: 2
INFO:tensorflow:ddpg_actor_width: 64
INFO:tensorflow:ddpg_critic_depth: 2
INFO:tensorflow:ddpg_critic_width: 64
INFO:tensorflow:ddpg_noise_type: param
INFO:tensorflow:ddpg_noise_prtl: tdecy
INFO:tensorflow:ddpg_noise_std_init: 1.0
INFO:tensorflow:ddpg_noise_dst_finl: 0.01
INFO:tensorflow:ddpg_noise_adpt_rat: 1.03
INFO:tensorflow:ddpg_noise_std_finl: 1e-05
INFO:tensorflow:ddpg_rms_eps: 0.0001
INFO:tensorflow:ddpg_tau: 0.01
INFO:tensorflow:ddpg_gamma: 0.9
INFO:tensorflow:ddpg_lrn_rate: 0.001
INFO:tensorflow:ddpg_loss_w_dcy: 0.0
INFO:tensorflow:ddpg_record_step: 1
INFO:tensorflow:ddpg_batch_size: 64
INFO:tensorflow:ddpg_enbl_bsln_func: True
INFO:tensorflow:ddpg_bsln_decy_rate: 0.95
INFO:tensorflow:ws_save_path: ./models_ws/model.ckpt
INFO:tensorflow:ws_prune_ratio: 0.75
INFO:tensorflow:ws_prune_ratio_prtl: optimal
INFO:tensorflow:ws_nb_rlouts: 200
INFO:tensorflow:ws_nb_rlouts_min: 50
INFO:tensorflow:ws_reward_type: single-obj
INFO:tensorflow:ws_lrn_rate_rg: 0.03
INFO:tensorflow:ws_nb_iters_rg: 20
INFO:tensorflow:ws_lrn_rate_ft: 0.0003
INFO:tensorflow:ws_nb_iters_ft: 400
INFO:tensorflow:ws_nb_iters_feval: 25
INFO:tensorflow:ws_prune_ratio_exp: 3.0
INFO:tensorflow:ws_iter_ratio_beg: 0.1
INFO:tensorflow:ws_iter_ratio_end: 0.5
INFO:tensorflow:ws_mask_update_step: 500.0
INFO:tensorflow:cp_lasso: True
INFO:tensorflow:cp_quadruple: False
INFO:tensorflow:cp_reward_policy: accuracy
INFO:tensorflow:cp_nb_points_per_layer: 10
INFO:tensorflow:cp_nb_batches: 60
INFO:tensorflow:cp_prune_option: auto
INFO:tensorflow:cp_prune_list_file: ratio.list
INFO:tensorflow:cp_best_path: ./models/best_model.ckpt
INFO:tensorflow:cp_original_path: ./models/original_model.ckpt
INFO:tensorflow:cp_preserve_ratio: 0.5
INFO:tensorflow:cp_uniform_preserve_ratio: 0.6
INFO:tensorflow:cp_noise_tolerance: 0.15
INFO:tensorflow:cp_lrn_rate_ft: 0.0001
INFO:tensorflow:cp_nb_iters_ft_ratio: 0.2
INFO:tensorflow:cp_finetune: False
INFO:tensorflow:cp_retrain: False
INFO:tensorflow:cp_list_group: 1000
INFO:tensorflow:cp_nb_rlouts: 200
INFO:tensorflow:cp_nb_rlouts_min: 50
INFO:tensorflow:dcp_save_path: ./models_dcp/model.ckpt
INFO:tensorflow:dcp_save_path_eval: ./models_dcp_eval/model.ckpt
INFO:tensorflow:dcp_prune_ratio: 0.5
INFO:tensorflow:dcp_nb_stages: 3
INFO:tensorflow:dcp_lrn_rate_adam: 0.001
INFO:tensorflow:dcp_nb_iters_block: 10000
INFO:tensorflow:dcp_nb_iters_layer: 500
INFO:tensorflow:uql_equivalent_bits: 4
INFO:tensorflow:uql_nb_rlouts: 200
INFO:tensorflow:uql_w_bit_min: 2
INFO:tensorflow:uql_w_bit_max: 8
INFO:tensorflow:uql_tune_layerwise_steps: 100
INFO:tensorflow:uql_tune_global_steps: 2000
INFO:tensorflow:uql_tune_save_path: ./rl_tune_models/model.ckpt
INFO:tensorflow:uql_tune_disp_steps: 300
INFO:tensorflow:uql_enbl_random_layers: True
INFO:tensorflow:uql_enbl_rl_agent: False
INFO:tensorflow:uql_enbl_rl_global_tune: True
INFO:tensorflow:uql_enbl_rl_layerwise_tune: False
INFO:tensorflow:uql_weight_bits: 4
INFO:tensorflow:uql_activation_bits: 32
INFO:tensorflow:uql_use_buckets: False
INFO:tensorflow:uql_bucket_size: 256
INFO:tensorflow:uql_quant_epochs: 60
INFO:tensorflow:uql_save_quant_model_path: ./uql_quant_models/uql_quant_model.ckpt
INFO:tensorflow:uql_quantize_all_layers: False
INFO:tensorflow:uql_bucket_type: channel
INFO:tensorflow:uqtf_save_path: ./models_uqtf/model.ckpt
INFO:tensorflow:uqtf_save_path_eval: ./models_uqtf_eval/model.ckpt
INFO:tensorflow:uqtf_weight_bits: 8
INFO:tensorflow:uqtf_activation_bits: 8
INFO:tensorflow:uqtf_quant_delay: 0
INFO:tensorflow:uqtf_freeze_bn_delay: None
INFO:tensorflow:uqtf_lrn_rate_dcy: 0.01
INFO:tensorflow:nuql_equivalent_bits: 4
INFO:tensorflow:nuql_nb_rlouts: 200
INFO:tensorflow:nuql_w_bit_min: 2
INFO:tensorflow:nuql_w_bit_max: 8
INFO:tensorflow:nuql_tune_layerwise_steps: 100
INFO:tensorflow:nuql_tune_global_steps: 2101
INFO:tensorflow:nuql_tune_save_path: ./rl_tune_models/model.ckpt
INFO:tensorflow:nuql_tune_disp_steps: 300
INFO:tensorflow:nuql_enbl_random_layers: True
INFO:tensorflow:nuql_enbl_rl_agent: False
INFO:tensorflow:nuql_enbl_rl_global_tune: True
INFO:tensorflow:nuql_enbl_rl_layerwise_tune: False
INFO:tensorflow:nuql_init_style: quantile
INFO:tensorflow:nuql_opt_mode: weights
INFO:tensorflow:nuql_weight_bits: 4
INFO:tensorflow:nuql_activation_bits: 32
INFO:tensorflow:nuql_use_buckets: False
INFO:tensorflow:nuql_bucket_size: 256
INFO:tensorflow:nuql_quant_epochs: 60
INFO:tensorflow:nuql_save_quant_model_path: ./nuql_quant_models/model.ckpt
INFO:tensorflow:nuql_quantize_all_layers: False
INFO:tensorflow:nuql_bucket_type: split
INFO:tensorflow:log_dir: ./logs
INFO:tensorflow:enbl_multi_gpu: False
INFO:tensorflow:learner: full-prec
INFO:tensorflow:exec_mode: train
INFO:tensorflow:debug: False
INFO:tensorflow:h: False
INFO:tensorflow:help: False
INFO:tensorflow:helpfull: False
INFO:tensorflow:helpshort: False
2018-11-06 12:51:30.550266: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-06 12:51:30.582314: E tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-11-06 12:51:30.582432: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: localhost.localdomain
2018-11-06 12:51:30.582455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: localhost.localdomain
2018-11-06 12:51:30.582543: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 396.26.0
2018-11-06 12:51:30.582644: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 396.26.0
2018-11-06 12:51:30.582667: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 396.26.0
./scripts/run_local.sh: line 45: 43704 Segmentation fault (core dumped) python main.py ${extra_args}
`
path.conf
`# data files
data_hdfs_host = None
data_dir_local_cifar10 = /home/zgz/project/data_set/cifar-10-batches-py
data_dir_hdfs_cifar10 = None
data_dir_seven_cifar10 = None
data_dir_docker_cifar10 = /opt/ml/data # DO NOT EDIT
data_dir_local_ilsvrc12 = None
data_dir_hdfs_ilsvrc12 = None
data_dir_seven_ilsvrc12 = None
data_dir_docker_ilsvrc12 = /opt/ml/data # DO NOT EDIT
model files
model_http_url = https://api.ai.tencent.com/pocketflow
`
The text was updated successfully, but these errors were encountered: