Can't use GPU in local mode #34

Closed
howtocodewang opened this issue Nov 8, 2018 · 12 comments · Fixed by #75

@howtocodewang

Describe the bug
My environment settings are:

  1. Anaconda environment, Python = 3.6
  2. TensorFlow = 1.11.0
  3. CUDA version = 9.0

To Reproduce
Steps to reproduce the behavior:

  1. I ran the demo in local mode:
    "./scripts/run_local.sh nets/resnet_at_cifar10_run.py"
  2. Then I got the following output:
```
(pruning_tf) daisy@deep-learning:~/Pruning_and_Compression/PocketFlow$ ./scripts/run_local.sh nets/resnet_at_cifar10_run.py
Python script: nets/resnet_at_cifar10_run.py
# of GPUs: 1
extra arguments: --model_http_url https://api.ai.tencent.com/pocketflow --data_dir_local /home/daisy/Pruning_and_Compression/PocketFlow/data/cifar-10-batches-bin
Traceback (most recent call last):
  File "utils/get_idle_gpus.py", line 54, in <module>
    raise ValueError('not enough idle GPUs; idle GPUs are: {}'.format(idle_gpus))
ValueError: not enough idle GPUs; idle GPUs are: []
'nets/resnet_at_cifar10_run.py' -> 'main.py'
multi-GPU training disabled
[WARNING] TF-Plus & Horovod cannot be imported; multi-GPU training is unsupported
INFO:tensorflow:FLAGS:
INFO:tensorflow:data_disk: local
INFO:tensorflow:data_hdfs_host: None
INFO:tensorflow:data_dir_local: /home/daisy/Pruning_and_Compression/PocketFlow/data/cifar-10-batches-bin
INFO:tensorflow:data_dir_hdfs: None
INFO:tensorflow:cycle_length: 4
INFO:tensorflow:nb_threads: 8
INFO:tensorflow:buffer_size: 1024
INFO:tensorflow:prefetch_size: 8
INFO:tensorflow:nb_classes: 10
INFO:tensorflow:nb_smpls_train: 50000
INFO:tensorflow:nb_smpls_val: 5000
INFO:tensorflow:nb_smpls_eval: 10000
INFO:tensorflow:batch_size: 128
INFO:tensorflow:batch_size_eval: 100
INFO:tensorflow:resnet_size: 20
INFO:tensorflow:lrn_rate_init: 0.1
INFO:tensorflow:batch_size_norm: 128.0
INFO:tensorflow:momentum: 0.9
INFO:tensorflow:loss_w_dcy: 0.0002
INFO:tensorflow:model_http_url: https://api.ai.tencent.com/pocketflow
INFO:tensorflow:summ_step: 100
INFO:tensorflow:save_step: 10000
INFO:tensorflow:save_path: ./models/model.ckpt
INFO:tensorflow:save_path_eval: ./models_eval/model.ckpt
INFO:tensorflow:enbl_dst: False
INFO:tensorflow:enbl_warm_start: False
INFO:tensorflow:loss_w_dst: 4.0
INFO:tensorflow:tempr_dst: 4.0
INFO:tensorflow:save_path_dst: ./models_dst/model.ckpt
INFO:tensorflow:nb_epochs_rat: 1.0
INFO:tensorflow:ddpg_actor_depth: 2
INFO:tensorflow:ddpg_actor_width: 64
INFO:tensorflow:ddpg_critic_depth: 2
INFO:tensorflow:ddpg_critic_width: 64
INFO:tensorflow:ddpg_noise_type: param
INFO:tensorflow:ddpg_noise_prtl: tdecy
INFO:tensorflow:ddpg_noise_std_init: 1.0
INFO:tensorflow:ddpg_noise_dst_finl: 0.01
INFO:tensorflow:ddpg_noise_adpt_rat: 1.03
INFO:tensorflow:ddpg_noise_std_finl: 1e-05
INFO:tensorflow:ddpg_rms_eps: 0.0001
INFO:tensorflow:ddpg_tau: 0.01
INFO:tensorflow:ddpg_gamma: 0.9
INFO:tensorflow:ddpg_lrn_rate: 0.001
INFO:tensorflow:ddpg_loss_w_dcy: 0.0
INFO:tensorflow:ddpg_record_step: 1
INFO:tensorflow:ddpg_batch_size: 64
INFO:tensorflow:ddpg_enbl_bsln_func: True
INFO:tensorflow:ddpg_bsln_decy_rate: 0.95
INFO:tensorflow:ws_save_path: ./models_ws/model.ckpt
INFO:tensorflow:ws_prune_ratio: 0.75
INFO:tensorflow:ws_prune_ratio_prtl: optimal
INFO:tensorflow:ws_nb_rlouts: 200
INFO:tensorflow:ws_nb_rlouts_min: 50
INFO:tensorflow:ws_reward_type: single-obj
INFO:tensorflow:ws_lrn_rate_rg: 0.03
INFO:tensorflow:ws_nb_iters_rg: 20
INFO:tensorflow:ws_lrn_rate_ft: 0.0003
INFO:tensorflow:ws_nb_iters_ft: 400
INFO:tensorflow:ws_nb_iters_feval: 25
INFO:tensorflow:ws_prune_ratio_exp: 3.0
INFO:tensorflow:ws_iter_ratio_beg: 0.1
INFO:tensorflow:ws_iter_ratio_end: 0.5
INFO:tensorflow:ws_mask_update_step: 500.0
INFO:tensorflow:cp_lasso: True
INFO:tensorflow:cp_quadruple: False
INFO:tensorflow:cp_reward_policy: accuracy
INFO:tensorflow:cp_nb_points_per_layer: 10
INFO:tensorflow:cp_nb_batches: 60
INFO:tensorflow:cp_prune_option: auto
INFO:tensorflow:cp_prune_list_file: ratio.list
INFO:tensorflow:cp_best_path: ./models/best_model.ckpt
INFO:tensorflow:cp_original_path: ./models/original_model.ckpt
INFO:tensorflow:cp_preserve_ratio: 0.5
INFO:tensorflow:cp_uniform_preserve_ratio: 0.6
INFO:tensorflow:cp_noise_tolerance: 0.15
INFO:tensorflow:cp_lrn_rate_ft: 0.0001
INFO:tensorflow:cp_nb_iters_ft_ratio: 0.2
INFO:tensorflow:cp_finetune: False
INFO:tensorflow:cp_retrain: False
INFO:tensorflow:cp_list_group: 1000
INFO:tensorflow:cp_nb_rlouts: 200
INFO:tensorflow:cp_nb_rlouts_min: 50
INFO:tensorflow:dcp_save_path: ./models_dcp/model.ckpt
INFO:tensorflow:dcp_save_path_eval: ./models_dcp_eval/model.ckpt
INFO:tensorflow:dcp_prune_ratio: 0.5
INFO:tensorflow:dcp_nb_stages: 3
INFO:tensorflow:dcp_lrn_rate_adam: 0.001
INFO:tensorflow:dcp_nb_iters_block: 10000
INFO:tensorflow:dcp_nb_iters_layer: 500
INFO:tensorflow:uql_equivalent_bits: 4
INFO:tensorflow:uql_nb_rlouts: 200
INFO:tensorflow:uql_w_bit_min: 2
INFO:tensorflow:uql_w_bit_max: 8
INFO:tensorflow:uql_tune_layerwise_steps: 100
INFO:tensorflow:uql_tune_global_steps: 2000
INFO:tensorflow:uql_tune_save_path: ./rl_tune_models/model.ckpt
INFO:tensorflow:uql_tune_disp_steps: 300
INFO:tensorflow:uql_enbl_random_layers: True
INFO:tensorflow:uql_enbl_rl_agent: False
INFO:tensorflow:uql_enbl_rl_global_tune: True
INFO:tensorflow:uql_enbl_rl_layerwise_tune: False
INFO:tensorflow:uql_weight_bits: 4
INFO:tensorflow:uql_activation_bits: 32
INFO:tensorflow:uql_use_buckets: False
INFO:tensorflow:uql_bucket_size: 256
INFO:tensorflow:uql_quant_epochs: 60
INFO:tensorflow:uql_save_quant_model_path: ./uql_quant_models/uql_quant_model.ckpt
INFO:tensorflow:uql_quantize_all_layers: False
INFO:tensorflow:uql_bucket_type: channel
INFO:tensorflow:uqtf_save_path: ./models_uqtf/model.ckpt
INFO:tensorflow:uqtf_save_path_eval: ./models_uqtf_eval/model.ckpt
INFO:tensorflow:uqtf_weight_bits: 8
INFO:tensorflow:uqtf_activation_bits: 8
INFO:tensorflow:uqtf_quant_delay: 0
INFO:tensorflow:uqtf_freeze_bn_delay: None
INFO:tensorflow:uqtf_lrn_rate_dcy: 0.01
INFO:tensorflow:nuql_equivalent_bits: 4
INFO:tensorflow:nuql_nb_rlouts: 200
INFO:tensorflow:nuql_w_bit_min: 2
INFO:tensorflow:nuql_w_bit_max: 8
INFO:tensorflow:nuql_tune_layerwise_steps: 100
INFO:tensorflow:nuql_tune_global_steps: 2101
INFO:tensorflow:nuql_tune_save_path: ./rl_tune_models/model.ckpt
INFO:tensorflow:nuql_tune_disp_steps: 300
INFO:tensorflow:nuql_enbl_random_layers: True
INFO:tensorflow:nuql_enbl_rl_agent: False
INFO:tensorflow:nuql_enbl_rl_global_tune: True
INFO:tensorflow:nuql_enbl_rl_layerwise_tune: False
INFO:tensorflow:nuql_init_style: quantile
INFO:tensorflow:nuql_opt_mode: weights
INFO:tensorflow:nuql_weight_bits: 4
INFO:tensorflow:nuql_activation_bits: 32
INFO:tensorflow:nuql_use_buckets: False
INFO:tensorflow:nuql_bucket_size: 256
INFO:tensorflow:nuql_quant_epochs: 60
INFO:tensorflow:nuql_save_quant_model_path: ./nuql_quant_models/model.ckpt
INFO:tensorflow:nuql_quantize_all_layers: False
INFO:tensorflow:nuql_bucket_type: split
INFO:tensorflow:log_dir: ./logs
INFO:tensorflow:enbl_multi_gpu: False
INFO:tensorflow:learner: full-prec
INFO:tensorflow:exec_mode: train
INFO:tensorflow:debug: False
INFO:tensorflow:h: False
INFO:tensorflow:help: False
INFO:tensorflow:helpfull: False
INFO:tensorflow:helpshort: False
2018-11-08 09:10:59.811925: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-11-08 09:10:59.814345: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2018-11-08 09:10:59.814367: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: deep-learning
2018-11-08 09:10:59.814374: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: deep-learning
2018-11-08 09:10:59.814396: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 390.12.0
2018-11-08 09:10:59.814417: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 390.12.0
2018-11-08 09:10:59.814424: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 390.12.0
INFO:tensorflow:iter #100: lr = 1.0000e-01 | loss = 1.7772e+00 | accuracy = 3.7500e-01 | speed = 95.59 pics / sec
```

Expected behavior
First, I found that the training speed was too slow, which made me suspect the GPU device was not being used. Then I noticed
`failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected` in the printed output. So I checked the GPU device using `nvidia-smi` in a terminal and got the following:
```
(pruning_tf) daisy@deep-learning:~$ nvidia-smi
Thu Nov 8 09:25:12 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12 Driver Version: 390.12 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 980 Ti Off | 00000000:01:00.0 On | N/A |
| 30% 67C P0 68W / 250W | 244MiB / 6080MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1091 G /usr/lib/xorg/Xorg 242MiB |
+-----------------------------------------------------------------------------+
```

which shows that the CUDA version and NVIDIA driver are installed correctly.
Finally, I tried importing the tensorflow module in Python, and it did not report any CUDA-related error. It printed only:
```
(pruning_tf) daisy@deep-learning:~$ python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/home/daisy/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
I don't think the FutureWarning is the root cause.

The slow training speed and low GPU utilization are because the GPU device is not being used.
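
For reference (this snippet is not part of the original report), a minimal way to confirm whether TensorFlow itself can see the GPU when run directly, outside the PocketFlow launch script:

```python
# TF 1.x check: does TensorFlow see a usable GPU in this environment?
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())                         # expected: True
print([d.name for d in device_lib.list_local_devices()])  # should include '/device:GPU:0'
```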

So can anyone help me solve this problem? Thanks a lot!

Desktop (please complete the following information):

  • OS: Ubuntu 16.04
  • CUDA: 9.0
  • GPU: GTX 980 Ti
  • Python: 3.6
  • TensorFlow: 1.11.0
@jiaxiang-wu
Contributor

Thanks for the detailed description.
This is the same problem as discussed in issue #29. PocketFlow fails to use the GPU device because of the overly strict definition of an "idle GPU" in utils/get_idle_gpus.py: in that script, only GPUs with no running processes at all are treated as idle, which is often not the case when the GPU is also used for desktop rendering.

We will fix this problem as soon as possible. Sorry for the trouble.
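
For illustration, a rough sketch of this kind of process-based check (not the repository's actual utils/get_idle_gpus.py code): a GPU counts as idle only if it owns no entry at all in nvidia-smi's process table, so a single-GPU desktop where Xorg occupies GPU 0 yields an empty list and the launcher aborts with the ValueError shown above.

```python
# Sketch only: "idle" means no process of any kind (compute or graphics)
# is listed for the GPU in nvidia-smi's Processes table.
import re
import subprocess

def idle_gpus_by_process_count():
    # number of installed GPUs
    n_gpus = len(subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index', '--format=csv,noheader']
    ).decode().strip().splitlines())
    # process rows look like: "|    0      1091      G   /usr/lib/xorg/Xorg   242MiB |"
    proc_table = subprocess.check_output(['nvidia-smi']).decode().split('Processes')[-1]
    busy = set(int(m) for m in re.findall(r'^\|\s+(\d+)\s+\d+\s+[CG]\s', proc_table, re.M))
    return [idx for idx in range(n_gpus) if idx not in busy]
```

On the single-GPU machine above this returns [], matching the reported "ValueError: not enough idle GPUs; idle GPUs are: []".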

@jiaxiang-wu jiaxiang-wu self-assigned this Nov 8, 2018
@jiaxiang-wu jiaxiang-wu added the enhancement New feature or request label Nov 8, 2018
@jiaxiang-wu
Contributor

Enhancement required: change how available GPUs are detected.

@howtocodewang
Author

@jiaxiang-wu Thanks for the reply and explanation. Now I see why the test passed on a machine with 4 GPUs but failed on this machine with a single GPU. I will follow your updates and test again later.

@howtocodewang
Author

@jiaxiang-wu Also, please notify me once the problem is fixed. Thanks.

@jiaxiang-wu
Contributor

@howtocodewang No problem. We will update this issue once the fix is done and also send you an e-mail notification.

@KranthiGV
Contributor

I'd be glad to send a PR fixing this, but I'd like clarification on the new policy:
what should count as an idle GPU? (Currently, a GPU with no running processes is defined as idle in utils/get_idle_gpus.py.)

@howtocodewang
Author

@KranthiGV As I understand it, the authors' current definition of an idle GPU is one with no processes at all, not even as a display device. That means if your PC has only one GPU, it handles display tasks at the same time as any GPU-based algorithm you run, so it is never considered idle. That is why I hit this issue on a PC with a single GPU but ran the demo successfully on a PC with 4 GPUs.

@jiaxiang-wu
Contributor

@howtocodewang Yes, you are right.
@KranthiGV How about treating a GPU with at least 50% of its memory free as idle? Or, if multiple GPUs are present, returning the GPU with the most free memory? What's your opinion?

@KranthiGV
Contributor

  1. Requiring at least 50% of the GPU's memory to be free seems like a good criterion, since display processes won't take more than 50%; they usually use only a few hundred MiB.
  2. If multiple GPUs are present, returning only the GPU with the most free memory is not desirable. Currently we return a list of GPUs (e.g. 0,1,2,3 or 0,1), and returning a single GPU would prevent multi-GPU usage.

I'll go ahead and implement the first policy (a rough sketch of the idea follows below).
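
For illustration, a rough sketch of what that first policy could look like (this is an assumed implementation based on the discussion above, not the actual fix merged in #75):

```python
# Sketch only: treat a GPU as idle when at least `free_ratio` of its memory
# is free, so a GPU that merely hosts a display server (a few hundred MiB,
# e.g. Xorg above) still qualifies as idle.
import subprocess

def idle_gpus_by_free_memory(free_ratio=0.5):
    rows = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,memory.total,memory.used',
         '--format=csv,noheader,nounits']
    ).decode().strip().splitlines()
    idle = []
    for row in rows:
        index, total, used = (int(x) for x in row.split(','))  # values in MiB
        if total - used >= free_ratio * total:
            idle.append(index)
    return idle  # still a list (e.g. [0, 1, 2, 3]), so multi-GPU runs keep working
```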

@jiaxiang-wu
Contributor

@KranthiGV Great, looking forward to your contribution.

@KranthiGV
Contributor

@jiaxiang-wu
I cannot find a dev branch to send the PR against; there is only master.
Should I raise the PR against the master branch?
My fix is here: idle_gpu_fix

@jiaxiang-wu
Contributor

@KranthiGV
Yes, please raise a pull request to the master branch.
