"Illegal instruction (core dumped)" when running the program with tensorflow with gpu #5

Open
galoiscch opened this issue Jul 5, 2017 · 20 comments

@galoiscch

I succeeded in running the program with TensorFlow without GPU support. However, I can't run it with TensorFlow with GPU support.
The following error appears when I run the program:

Using TensorFlow backend.
2017-07-05 10:18:44.115782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-05 10:18:44.116126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.72GiB
2017-07-05 10:18:44.116175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-05 10:18:44.116189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-05 10:18:44.116214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
Illegal instruction (core dumped)

Is this program compatible with TensorFlow with GPU support?
My system is as follows:
Ubuntu 16.04, Python 2.7.12, Keras 2.0.5, TensorFlow 1.2.0, CUDA 8.0 (V8.0.61), cuDNN 6.0

@galoiscch
Author

Update:
I now realize that I never actually tried this age-estimation program on this computer. I only produced a successful result on another computer with an i5 CPU. The problem with this computer is its very old CPU (E5200), which is not supported by the dlib build installed from the prebuilt .whl (sudo pip install dlib).
The solution is described here:
davisking/dlib#620
If you download dlib and compile it yourself, the build will suit your hardware configuration.

I downloaded dlib here: https://github.com/davisking/dlib/

Before compiling dlib, I edited dlib's tools/python/CMakeLists.txt file from:

set(USE_SSE4_INSTRUCTIONS ON CACHE BOOL "Use SSE4 instructions")

to:

set(USE_SSE2_INSTRUCTIONS ON CACHE BOOL "Use SSE2 instructions")

Then I ran

python3 setup.py install
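
Before choosing between USE_SSE2_INSTRUCTIONS and USE_SSE4_INSTRUCTIONS, it can help to check which SIMD extensions the CPU actually advertises. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo lists the flags; the helper names are made up for illustration and are not part of dlib.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("sse2", "sse4_1", "sse4_2", "avx")):
    """List the wanted flags (Linux spelling) that the CPU does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


# Usage on a real machine (Linux only):
#     with open("/proc/cpuinfo") as f:
#         print(missing_flags(f.read()))
```

If sse4_1 or sse4_2 shows up as missing, a binary compiled with SSE4 enabled is a plausible cause of an "Illegal instruction" crash.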

But now I have encountered another problem. When I run the program, a window showing the webcam image pops up. However, as soon as the webcam captures a human face, the program crashes.
The error is as follows:

Using TensorFlow backend.
2017-07-06 09:35:29.039507: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-06 09:35:29.039853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.71GiB
2017-07-06 09:35:29.039903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-06 09:35:29.039917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-06 09:35:29.039941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
2017-07-06 09:35:33.377692: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-07-06 09:35:33.377776: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-07-06 09:35:33.377796: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)

"could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" only appears when a human face is captured.
P.S. I changed the program a little. I added the line "if len(results)>0:" before the line "predicted_genders = results[0]", so that the window still appears even when there is no human face in the frame.
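
The idea behind that guard can be sketched as follows. This is only an illustration: detect and predict are hypothetical stand-ins for the face detector and the model, and the check is placed before prediction rather than after it, which differs slightly from the change described above.

```python
def annotate_frame(frame, detect, predict):
    """Return (frame, labels); labels is empty when no face was found.

    detect and predict are hypothetical callables standing in for the
    face detector and the age/gender model.
    """
    faces = detect(frame)
    if len(faces) == 0:
        # No face: skip prediction but still hand the frame back for display.
        return frame, []
    genders, ages = predict(faces)
    return frame, list(zip(genders, ages))
```

Either way, the frame is still shown when nobody is in view, and results[0] is never indexed on an empty prediction.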

@galoiscch
Author

Update:
I suspected that the problem stems from TensorFlow's memory allocation method. Since I could not find a way to limit the GPU memory usage when using Keras with the TensorFlow backend (for example with gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)), I switched to Keras with Theano.
It works. However, the ages are misjudged by a relatively large margin, and the results are worse than the output produced on the CPU (i5). Therefore, I wonder whether this program is incompatible with Theano, or whether the problem is just the insufficient computing power of my GPU (GTX 1050).
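
For reference, Keras on the TensorFlow 1.x backend can usually be given a custom session through K.set_session, so a memory cap may be possible after all. The sketch below assumes TensorFlow 1.x and Keras 2.0.x as listed in this issue; the two helper functions are hypothetical names, and whether this avoids CUDNN_STATUS_INTERNAL_ERROR on a 2 GB card has not been confirmed in this thread.

```python
def make_session_config(memory_fraction=None, allow_growth=True):
    """Build a TF 1.x ConfigProto that limits how much GPU memory is reserved."""
    if memory_fraction is not None and not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")
    import tensorflow as tf  # TF 1.x API, imported lazily

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving almost all of it up front.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


def install_keras_session(config):
    """Make Keras use a TF 1.x session created from the given config."""
    import tensorflow as tf
    from keras import backend as K

    K.set_session(tf.Session(config=config))


# Usage, before the model is built:
#     install_keras_session(make_session_config(memory_fraction=0.333))
```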

@yu4u
Owner

yu4u commented Jul 13, 2017

Thank you for the useful information.
First, I fixed demo.py according to your comment (I added the line "if len(results)>0:").

As I have not tried training the model with the Theano backend, I'm not sure my program is fully compatible with Theano, but I think it should be.
I think the problem is that you are using weights obtained with TensorFlow. I'm afraid that TensorFlow-trained weights are not compatible with Theano-trained ones. You can convert the weights in either direction, as explained here, to solve this problem.

@galoiscch
Author

import os
from keras import backend as K
from keras.utils.conv_utils import convert_kernel
from wide_resnet import WideResNet

img_size=64
model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

for layer in model.layers:
   if layer.__class__.__name__ in ['Convolution1D', 'Convolution2D']:
      original_w = K.get_value(layer.W)
      converted_w = convert_kernel(original_w)
      K.set_value(layer.W, converted_w)

model.save_weights(os.path.join("pretrained_models", 'weights.18-4.06_theano.h5'))

Will this Python script convert the weight file correctly? I tried using 'weights.18-4.06_theano.h5', but the output is the same: the predicted age for most people is around 40 years.
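
One possible explanation for the unchanged output, offered as an assumption rather than a confirmed diagnosis: in Keras 2 the convolution layers report the class names Conv1D and Conv2D and keep their weights in layer.kernel, while the loop above checks the Keras 1 names and reads layer.W. If no layer matches, nothing is converted and the saved file equals the input. The sketch below uses stand-in classes (not the real Keras layers) to show how such a filter can silently match nothing.

```python
KERAS1_CONV_NAMES = ("Convolution1D", "Convolution2D")
KERAS2_CONV_NAMES = ("Conv1D", "Conv2D")


def select_layers(layers, class_names):
    """Return the layers whose class name is in class_names."""
    return [layer for layer in layers if layer.__class__.__name__ in class_names]


# Stand-in classes that only mimic Keras 2 layer class names.
class Conv2D(object):
    pass


class BatchNormalization(object):
    pass


class Dense(object):
    pass


layers = [Conv2D(), BatchNormalization(), Conv2D(), Dense()]
print(len(select_layers(layers, KERAS1_CONV_NAMES)))  # 0
print(len(select_layers(layers, KERAS2_CONV_NAMES)))  # 2
```

On the real model, printing the number of matched layers before calling save_weights would show whether the conversion loop did any work.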

@yu4u
Owner

yu4u commented Jul 18, 2017

The above code looks correct according to the instructions I referred to, but it does not work for me either...
I trained the model with the Theano backend, so please try these weights:
https://drive.google.com/file/d/0B_cG1nzvVZlQWGJMc2JjdzkwcVk/view?usp=sharing

@galoiscch
Author

Thanks a lot.

@galoiscch
Author

How long does the training process take? I trained on the wiki dataset using the CPU, and it only reached the fourth epoch in one day. What is the hardware configuration of your computer?

@yu4u
Owner

yu4u commented Jul 19, 2017

I trained on a GPU machine (CPU: i7-7700 3.60GHz, GPU: GeForce GTX1080).
Training takes 1-2 hours for imdb and 6 minutes for wiki.

If the problem is memory allocation, please try a smaller model and a smaller batch size:

python3 train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 16

If the image size is 64, the number of parameters can also be reduced by changing

pool = AveragePooling2D(pool_size=(8, 8), strides=(1, 1), padding="same")(relu)

to

pool = AveragePooling2D(pool_size=(16, 16), strides=(1, 1), padding="same")(relu)

@galoiscch
Author

The Theano weight works well.

@galoiscch
Author

After running the command
python train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 32
I can run the training program with TensorFlow on the GPU.
However, when I test the new weight file, the following error appears:

Using TensorFlow backend.
Traceback (most recent call last):
  File "demo.py", line 97, in <module>
    main()
  File "demo.py", line 25, in main
    model.load_weights(os.path.join("pretrained_models", "weights.15-4.02.hdf5"))
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2572, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2981, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 19 layers into a model with 31 layers.

I wonder if it is due to the options I changed on the command line. Thanks.
I didn't change the number of parameters.

@galoiscch
Author

The size of the weight file is really smaller. Your weight file is 195.8 MB in size, while my weight file is just 63.7 MB.

@yu4u
Owner

yu4u commented Jul 21, 2017

demo.py is just a demo script that assumes the pre-trained model is used, as you can see:

model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

But I have added options to demo.py to specify the weight file, depth, and width parameters. Please refer to the latest version of demo.py.
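
A minimal sketch of what such options could look like with argparse. The option names --weight_file, --depth and --width and the defaults 16 and 8 are assumptions based on the pre-trained model quoted above; the latest demo.py is the authoritative source.

```python
import argparse


def get_args(argv=None):
    """Parse demo options; the names and defaults are assumptions, see demo.py."""
    parser = argparse.ArgumentParser(
        description="Age and gender estimation demo (illustrative sketch)")
    parser.add_argument("--weight_file", type=str, default=None,
                        help="path to the weight file (.hdf5)")
    parser.add_argument("--depth", type=int, default=16,
                        help="depth of the network used for training")
    parser.add_argument("--width", type=int, default=8,
                        help="width of the network used for training")
    return parser.parse_args(argv)


# Example: a model trained with --depth 10 --width 4
args = get_args(["--weight_file", "weights.15-4.02.hdf5",
                 "--depth", "10", "--width", "4"])
print(args.weight_file, args.depth, args.width)
```

The depth and width passed to the demo have to match the values used for training, otherwise load_weights fails with a layer-count mismatch like the one reported above.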

@yu4u
Owner

yu4u commented Jul 21, 2017

The size of the weight file is really smaller. Your weight file is 195.8 MB in size, while my weight file is just 63.7 MB.

These options --depth 10 --width 4 control the number of parameters used in the CNN, so it is natural that the size of the weight file changes.
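
The effect of --depth on the layer count can be made concrete with the usual Wide ResNet convention depth = 6n + 4, where n is the number of residual blocks per group. Assuming wide_resnet.py follows that convention (not confirmed in this thread), depth 16 gives n = 2 and depth 10 gives n = 1, so the two models contain different numbers of layers. The helper names below are hypothetical.

```python
def blocks_per_group(depth):
    """Number of residual blocks per group under the convention depth = 6n + 4."""
    if depth < 10 or (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6n + 4 with n >= 1")
    return (depth - 4) // 6


def same_architecture(trained, requested):
    """Check that two (depth, width) pairs describe the same network."""
    return tuple(trained) == tuple(requested)


print(blocks_per_group(16))  # 2
print(blocks_per_group(10))  # 1
print(same_architecture((10, 4), (16, 8)))  # False
```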

@galoiscch
Author

Much obliged. I can run the demo.py with my weight file now.

@sbharadwajj

Hi, did you run with tensorflow backend using GPU?

@galoiscch
Author

I think I tried running the program with the TensorFlow backend on the GPU, but it failed. It has been a long time and my memory of this project has become quite rusty. I am sorry about that.

@sbharadwajj

Thank you.
@yu4u do you run it on a GPU? Do you have any suggestions on how to fix it for GPU?

@yu4u
Owner

yu4u commented Oct 17, 2018

I did not run demo.py on a machine with GPUs but I think it works.
Is there any problem?

@sbharadwajj

Works perfectly with tensorflow-gpu 1.10.

@nyck33

nyck33 commented Sep 11, 2019

@galoiscch

I'm running into memory issues when running this in a conda env with Pytorch GPU and Tensorflow GPU (detection done by Tencent DSFD and not dlib):
https://github.com/TencentYoutuResearch/FaceDetection-DSFD

So I want to use a shallower and narrower model. Can you provide the smaller weights?
