Can't use GPU in local mode #34

Closed
howtocodewang opened this issue Nov 8, 2018 · 12 comments · Fixed by #75

@howtocodewang

Describe the bug
My environment settings are:

  1. Anaconda environment, Python = 3.6
  2. TensorFlow = 1.11.0
  3. CUDA version = 9.0

To Reproduce
Steps to reproduce the behavior:

  1. I ran the demo in local mode:
    "./scripts/run_local.sh nets/resnet_at_cifar10_run.py"
  2. Then I got the following output:
```
(pruning_tf) daisy@deep-learning:~/Pruning_and_Compression/PocketFlow$ ./scripts/run_local.sh nets/resnet_at_cifar10_run.py
Python script: nets/resnet_at_cifar10_run.py
# of GPUs: 1
extra arguments: --model_http_url https://api.ai.tencent.com/pocketflow --data_dir_local /home/daisy/Pruning_and_Compression/PocketFlow/data/cifar-10-batches-bin
Traceback (most recent call last):
  File "utils/get_idle_gpus.py", line 54, in <module>
    raise ValueError('not enough idle GPUs; idle GPUs are: {}'.format(idle_gpus))
ValueError: not enough idle GPUs; idle GPUs are: []
'nets/resnet_at_cifar10_run.py' -> 'main.py'
multi-GPU training disabled
[WARNING] TF-Plus & Horovod cannot be imported; multi-GPU training is unsupported
INFO:tensorflow:FLAGS:
INFO:tensorflow:data_disk: local
INFO:tensorflow:data_hdfs_host: None
INFO:tensorflow:data_dir_local: /home/daisy/Pruning_and_Compression/PocketFlow/data/cifar-10-batches-bin
INFO:tensorflow:data_dir_hdfs: None
INFO:tensorflow:cycle_length: 4
INFO:tensorflow:nb_threads: 8
INFO:tensorflow:buffer_size: 1024
INFO:tensorflow:prefetch_size: 8
INFO:tensorflow:nb_classes: 10
INFO:tensorflow:nb_smpls_train: 50000
INFO:tensorflow:nb_smpls_val: 5000
INFO:tensorflow:nb_smpls_eval: 10000
INFO:tensorflow:batch_size: 128
INFO:tensorflow:batch_size_eval: 100
INFO:tensorflow:resnet_size: 20
INFO:tensorflow:lrn_rate_init: 0.1
INFO:tensorflow:batch_size_norm: 128.0
INFO:tensorflow:momentum: 0.9
INFO:tensorflow:loss_w_dcy: 0.0002
INFO:tensorflow:model_http_url: https://api.ai.tencent.com/pocketflow
INFO:tensorflow:summ_step: 100
INFO:tensorflow:save_step: 10000
INFO:tensorflow:save_path: ./models/model.ckpt
INFO:tensorflow:save_path_eval: ./models_eval/model.ckpt
INFO:tensorflow:enbl_dst: False
INFO:tensorflow:enbl_warm_start: False
INFO:tensorflow:loss_w_dst: 4.0
INFO:tensorflow:tempr_dst: 4.0
INFO:tensorflow:save_path_dst: ./models_dst/model.ckpt
INFO:tensorflow:nb_epochs_rat: 1.0
INFO:tensorflow:ddpg_actor_depth: 2
INFO:tensorflow:ddpg_actor_width: 64
INFO:tensorflow:ddpg_critic_depth: 2
INFO:tensorflow:ddpg_critic_width: 64
INFO:tensorflow:ddpg_noise_type: param
INFO:tensorflow:ddpg_noise_prtl: tdecy
INFO:tensorflow:ddpg_noise_std_init: 1.0
INFO:tensorflow:ddpg_noise_dst_finl: 0.01
INFO:tensorflow:ddpg_noise_adpt_rat: 1.03
INFO:tensorflow:ddpg_noise_std_finl: 1e-05
INFO:tensorflow:ddpg_rms_eps: 0.0001
INFO:tensorflow:ddpg_tau: 0.01
INFO:tensorflow:ddpg_gamma: 0.9
INFO:tensorflow:ddpg_lrn_rate: 0.001
INFO:tensorflow:ddpg_loss_w_dcy: 0.0
INFO:tensorflow:ddpg_record_step: 1
INFO:tensorflow:ddpg_batch_size: 64
INFO:tensorflow:ddpg_enbl_bsln_func: True
INFO:tensorflow:ddpg_bsln_decy_rate: 0.95
INFO:tensorflow:ws_save_path: ./models_ws/model.ckpt
INFO:tensorflow:ws_prune_ratio: 0.75
INFO:tensorflow:ws_prune_ratio_prtl: optimal
INFO:tensorflow:ws_nb_rlouts: 200
INFO:tensorflow:ws_nb_rlouts_min: 50
INFO:tensorflow:ws_reward_type: single-obj
INFO:tensorflow:ws_lrn_rate_rg: 0.03
INFO:tensorflow:ws_nb_iters_rg: 20
INFO:tensorflow:ws_lrn_rate_ft: 0.0003
INFO:tensorflow:ws_nb_iters_ft: 400
INFO:tensorflow:ws_nb_iters_feval: 25
INFO:tensorflow:ws_prune_ratio_exp: 3.0
INFO:tensorflow:ws_iter_ratio_beg: 0.1
INFO:tensorflow:ws_iter_ratio_end: 0.5
INFO:tensorflow:ws_mask_update_step: 500.0
INFO:tensorflow:cp_lasso: True
INFO:tensorflow:cp_quadruple: False
INFO:tensorflow:cp_reward_policy: accuracy
INFO:tensorflow:cp_nb_points_per_layer: 10
INFO:tensorflow:cp_nb_batches: 60
INFO:tensorflow:cp_prune_option: auto
INFO:tensorflow:cp_prune_list_file: ratio.list
INFO:tensorflow:cp_best_path: ./models/best_model.ckpt
INFO:tensorflow:cp_original_path: ./models/original_model.ckpt
INFO:tensorflow:cp_preserve_ratio: 0.5
INFO:tensorflow:cp_uniform_preserve_ratio: 0.6
INFO:tensorflow:cp_noise_tolerance: 0.15
INFO:tensorflow:cp_lrn_rate_ft: 0.0001
INFO:tensorflow:cp_nb_iters_ft_ratio: 0.2
INFO:tensorflow:cp_finetune: False
INFO:tensorflow:cp_retrain: False
INFO:tensorflow:cp_list_group: 1000
INFO:tensorflow:cp_nb_rlouts: 200
INFO:tensorflow:cp_nb_rlouts_min: 50
INFO:tensorflow:dcp_save_path: ./models_dcp/model.ckpt
INFO:tensorflow:dcp_save_path_eval: ./models_dcp_eval/model.ckpt
INFO:tensorflow:dcp_prune_ratio: 0.5
INFO:tensorflow:dcp_nb_stages: 3
INFO:tensorflow:dcp_lrn_rate_adam: 0.001
INFO:tensorflow:dcp_nb_iters_block: 10000
INFO:tensorflow:dcp_nb_iters_layer: 500
INFO:tensorflow:uql_equivalent_bits: 4
INFO:tensorflow:uql_nb_rlouts: 200
INFO:tensorflow:uql_w_bit_min: 2
INFO:tensorflow:uql_w_bit_max: 8
INFO:tensorflow:uql_tune_layerwise_steps: 100
INFO:tensorflow:uql_tune_global_steps: 2000
INFO:tensorflow:uql_tune_save_path: ./rl_tune_models/model.ckpt
INFO:tensorflow:uql_tune_disp_steps: 300
INFO:tensorflow:uql_enbl_random_layers: True
INFO:tensorflow:uql_enbl_rl_agent: False
INFO:tensorflow:uql_enbl_rl_global_tune: True
INFO:tensorflow:uql_enbl_rl_layerwise_tune: False
INFO:tensorflow:uql_weight_bits: 4
INFO:tensorflow:uql_activation_bits: 32
INFO:tensorflow:uql_use_buckets: False
INFO:tensorflow:uql_bucket_size: 256
INFO:tensorflow:uql_quant_epochs: 60
INFO:tensorflow:uql_save_quant_model_path: ./uql_quant_models/uql_quant_model.ckpt
INFO:tensorflow:uql_quantize_all_layers: False
INFO:tensorflow:uql_bucket_type: channel
INFO:tensorflow:uqtf_save_path: ./models_uqtf/model.ckpt
INFO:tensorflow:uqtf_save_path_eval: ./models_uqtf_eval/model.ckpt
INFO:tensorflow:uqtf_weight_bits: 8
INFO:tensorflow:uqtf_activation_bits: 8
INFO:tensorflow:uqtf_quant_delay: 0
INFO:tensorflow:uqtf_freeze_bn_delay: None
INFO:tensorflow:uqtf_lrn_rate_dcy: 0.01
INFO:tensorflow:nuql_equivalent_bits: 4
INFO:tensorflow:nuql_nb_rlouts: 200
INFO:tensorflow:nuql_w_bit_min: 2
INFO:tensorflow:nuql_w_bit_max: 8
INFO:tensorflow:nuql_tune_layerwise_steps: 100
INFO:tensorflow:nuql_tune_global_steps: 2101
INFO:tensorflow:nuql_tune_save_path: ./rl_tune_models/model.ckpt
INFO:tensorflow:nuql_tune_disp_steps: 300
INFO:tensorflow:nuql_enbl_random_layers: True
INFO:tensorflow:nuql_enbl_rl_agent: False
INFO:tensorflow:nuql_enbl_rl_global_tune: True
INFO:tensorflow:nuql_enbl_rl_layerwise_tune: False
INFO:tensorflow:nuql_init_style: quantile
INFO:tensorflow:nuql_opt_mode: weights
INFO:tensorflow:nuql_weight_bits: 4
INFO:tensorflow:nuql_activation_bits: 32
INFO:tensorflow:nuql_use_buckets: False
INFO:tensorflow:nuql_bucket_size: 256
INFO:tensorflow:nuql_quant_epochs: 60
INFO:tensorflow:nuql_save_quant_model_path: ./nuql_quant_models/model.ckpt
INFO:tensorflow:nuql_quantize_all_layers: False
INFO:tensorflow:nuql_bucket_type: split
INFO:tensorflow:log_dir: ./logs
INFO:tensorflow:enbl_multi_gpu: False
INFO:tensorflow:learner: full-prec
INFO:tensorflow:exec_mode: train
INFO:tensorflow:debug: False
INFO:tensorflow:h: False
INFO:tensorflow:help: False
INFO:tensorflow:helpfull: False
INFO:tensorflow:helpshort: False
2018-11-08 09:10:59.811925: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-11-08 09:10:59.814345: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2018-11-08 09:10:59.814367: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: deep-learning
2018-11-08 09:10:59.814374: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: deep-learning
2018-11-08 09:10:59.814396: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 390.12.0
2018-11-08 09:10:59.814417: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 390.12.0
2018-11-08 09:10:59.814424: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 390.12.0
INFO:tensorflow:iter #100: lr = 1.0000e-01 | loss = 1.7772e+00 | accuracy = 3.7500e-01 | speed = 95.59 pics / sec
```

Expected behavior
First, I found that the training speed was too slow, which made me suspect the GPU device was not being used. Then I noticed
`failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected` in the printed output. So I checked the GPU device using `nvidia-smi` in a terminal and got the following:
```
(pruning_tf) daisy@deep-learning:~$ nvidia-smi
Thu Nov 8 09:25:12 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12 Driver Version: 390.12 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 980 Ti Off | 00000000:01:00.0 On | N/A |
| 30% 67C P0 68W / 250W | 244MiB / 6080MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1091 G /usr/lib/xorg/Xorg 242MiB |
+-----------------------------------------------------------------------------+
```

which shows that the CUDA version and NVIDIA driver are installed correctly.
Finally, I tried importing the tensorflow module in Python, and it did not report any CUDA-related error. It printed only:
```
(pruning_tf) daisy@deep-learning:~$ python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/home/daisy/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
```
I don't think the FutureWarning is the root cause.

The slow training speed and low GPU utilization are because the GPU device is not being used.
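
For reference (this snippet is not part of the original report), a minimal way to confirm whether TensorFlow itself can see the GPU when run directly, outside the PocketFlow launch script:

```python
# TF 1.x check: does TensorFlow see a usable GPU in this environment?
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())                         # expected: True
print([d.name for d in device_lib.list_local_devices()])  # should include '/device:GPU:0'
```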

So can anyone help me solve this problem? Thanks a lot!

Desktop (please complete the following information):

  • OS: Ubuntu 16.04
  • CUDA: 9.0
  • GPU: GTX 980 Ti
  • Python: 3.6
  • TensorFlow: 1.11.0
@jiaxiang-wu
Contributor

Thanks for the detailed description.
This is the same problem as discussed in issue #29. PocketFlow fails to use the GPU device because of the overly strict definition of an "idle GPU" in utils/get_idle_gpus.py: in that script, only GPUs with no running processes at all are treated as idle, which is often not the case when the GPU is also used for desktop rendering.

We will fix this problem as soon as possible. Sorry for the trouble.
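
For illustration, a rough sketch of this kind of process-based check (not the repository's actual utils/get_idle_gpus.py code): a GPU counts as idle only if it owns no entry at all in nvidia-smi's process table, so a single-GPU desktop where Xorg occupies GPU 0 yields an empty list and the launcher aborts with the ValueError shown above.

```python
# Sketch only: "idle" means no process of any kind (compute or graphics)
# is listed for the GPU in nvidia-smi's Processes table.
import re
import subprocess

def idle_gpus_by_process_count():
    # number of installed GPUs
    n_gpus = len(subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index', '--format=csv,noheader']
    ).decode().strip().splitlines())
    # process rows look like: "|    0      1091      G   /usr/lib/xorg/Xorg   242MiB |"
    proc_table = subprocess.check_output(['nvidia-smi']).decode().split('Processes')[-1]
    busy = set(int(m) for m in re.findall(r'^\|\s+(\d+)\s+\d+\s+[CG]\s', proc_table, re.M))
    return [idx for idx in range(n_gpus) if idx not in busy]
```

On the single-GPU machine above this returns [], matching the reported "ValueError: not enough idle GPUs; idle GPUs are: []".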

@jiaxiang-wu jiaxiang-wu self-assigned this Nov 8, 2018
@jiaxiang-wu jiaxiang-wu added the enhancement New feature or request label Nov 8, 2018
@jiaxiang-wu
Contributor

Enhancement required: change how available GPUs are detected.

@howtocodewang
Author

@jiaxiang-wu Thanks for the reply and explanation. Now I see why the test passed on a machine with 4 GPUs but failed on this machine with a single GPU. I will follow your updates and test again later.

@howtocodewang
Author

@jiaxiang-wu Also, please notify me once the problem is fixed. Thanks.

@jiaxiang-wu
Contributor

@howtocodewang No problem. We will update this issue once the fix is done and also send you an e-mail notification.

@KranthiGV
Contributor

I'd be glad to send a PR fixing this, but I'd like clarification on the new policy:
what should count as an idle GPU? (Currently, a GPU with no running processes is defined as idle in utils/get_idle_gpus.py.)

@howtocodewang
Author

@KranthiGV As I understand it, the authors' current definition of an idle GPU is one with no processes at all, not even as a display device. That means if your PC has only one GPU, it handles display tasks at the same time as any GPU-based algorithm you run, so it is never considered idle. That is why I hit this issue on a PC with a single GPU but ran the demo successfully on a PC with 4 GPUs.

@jiaxiang-wu
Contributor

@howtocodewang Yes, you are right.
@KranthiGV How about treating a GPU with at least 50% of its memory free as idle? Or, if multiple GPUs are present, returning the GPU with the most free memory? What's your opinion?

@KranthiGV
Contributor

  1. Requiring at least 50% of the GPU's memory to be free seems like a good criterion, since display processes won't take more than 50%; they usually use only a few hundred MiB.
  2. If multiple GPUs are present, returning only the GPU with the most free memory is not desirable. Currently we return a list of GPUs (e.g. 0,1,2,3 or 0,1), and returning a single GPU would prevent multi-GPU usage.

I'll go ahead and implement the first policy (a rough sketch of the idea follows below).
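
For illustration, a rough sketch of what that first policy could look like (this is an assumed implementation based on the discussion above, not the actual fix merged in #75):

```python
# Sketch only: treat a GPU as idle when at least `free_ratio` of its memory
# is free, so a GPU that merely hosts a display server (a few hundred MiB,
# e.g. Xorg above) still qualifies as idle.
import subprocess

def idle_gpus_by_free_memory(free_ratio=0.5):
    rows = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,memory.total,memory.used',
         '--format=csv,noheader,nounits']
    ).decode().strip().splitlines()
    idle = []
    for row in rows:
        index, total, used = (int(x) for x in row.split(','))  # values in MiB
        if total - used >= free_ratio * total:
            idle.append(index)
    return idle  # still a list (e.g. [0, 1, 2, 3]), so multi-GPU runs keep working
```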

@jiaxiang-wu
Contributor

@KranthiGV Great, looking forward to your contribution.

@KranthiGV
Contributor

@jiaxiang-wu
I cannot find a dev branch to send the PR against; there is only master.
Should I raise the PR against the master branch?
My fix is here: idle_gpu_fix

@jiaxiang-wu
Contributor

@KranthiGV
Yes, please raise a pull request to the master branch.
