"Illegal instruction (core dumped)" when running the program with tensorflow with gpu #5

galoiscch · 2017-07-05T03:07:21Z

I can run the program successfully with TensorFlow on the CPU. However, I can't run it with the GPU build of TensorFlow.
The following error appears when I run the program:

Using TensorFlow backend.
2017-07-05 10:18:44.115782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-05 10:18:44.116126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.72GiB
2017-07-05 10:18:44.116175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-05 10:18:44.116189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-05 10:18:44.116214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
Illegal instruction (core dumped)

Is this program compatible with the GPU build of TensorFlow?
My system is as follows:
Ubuntu 16.04, Python 2.7.12, Keras 2.0.5, TensorFlow 1.2.0, CUDA 8.0 (V8.0.61), cuDNN 6.0

galoiscch · 2017-07-06T02:27:37Z

Update:
I now realize that I never actually tried this age-estimation program on this computer. I only got a successful result on another computer with an i5 CPU. The problem with this computer is its very old CPU (E5200), which is not supported by the dlib build installed from the .whl (sudo pip install dlib).
The solution is described here:
davisking/dlib#620
If you download dlib and compile it yourself, the build will match your computer's hardware configuration.

I downloaded dlib here: https://github.com/davisking/dlib/

Before compiling dlib, I edited dlib's tools/python/CMakeLists.txt file from:

set(USE_SSE4_INSTRUCTIONS ON CACHE BOOL "Use SSE4 instructions")

to:

set(USE_SSE2_INSTRUCTIONS ON CACHE BOOL "Use SSE2 instructions")

Then I ran

python3 setup.py install
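
To see in advance whether a CPU lacks the instruction sets a prebuilt binary expects, the flags line of /proc/cpuinfo can be inspected. The following is a minimal sketch, assuming Linux on x86; the helper names parse_cpu_flags and missing_flags are hypothetical, and the default list of required flags is only an illustration matching the SSE4 option above, not something dlib reports itself.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux the line looks like "flags\t\t: fpu vme ... sse2 ssse3 ..."
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_flags(cpuinfo_text, required=("sse4_1", "sse4_2")):
    """List the required flags (assumed names, Linux spelling) the CPU does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in flags]


# Example usage on a real machine (not executed here):
#     with open("/proc/cpuinfo") as fh:
#         print(missing_flags(fh.read()))
```

An empty list means every requested flag is present; any names returned are instruction sets a binary built with those options could trip over.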

But now I have run into another problem. When I run the program, a window showing the webcam image pops up. However, as soon as the webcam captures a human face, the program crashes.
The error is as follows:

Using TensorFlow backend.
2017-07-06 09:35:29.039507: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-06 09:35:29.039853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.71GiB
2017-07-06 09:35:29.039903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-06 09:35:29.039917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-06 09:35:29.039941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
2017-07-06 09:35:33.377692: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-07-06 09:35:33.377776: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-07-06 09:35:33.377796: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)

"could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" only appears when a human is captured.
P.S. I changed the program slightly: I added the line "if len(results) > 0:" before the line "predicted_genders = results[0]", so that the window still appears even when there is no human face in the frame.

galoiscch · 2017-07-07T03:16:21Z

Update:
I suspected that the problem stems from TensorFlow's memory allocation method. Since I could not find a way to limit the GPU's memory usage when using Keras with the TensorFlow backend (such as gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)), I switched to Keras with Theano.
It works. However, the ages were misjudged by a relatively large amount, and the result is worse than the output produced on the CPU (i5). Therefore, I wonder whether this program is incompatible with Theano, or whether the problem is just the insufficient computing power of my GPU (GTX 1050).
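
For reference, Keras with the TensorFlow 1.x backend can be given a pre-configured session, which is one way to cap GPU memory. The sketch below assumes TensorFlow 1.x and Keras 2.0.x as listed earlier in this thread; the function name limit_gpu_memory is hypothetical, and whether this resolves the CUDNN_STATUS_INTERNAL_ERROR above has not been tested here.

```python
def limit_gpu_memory(fraction=0.333, allow_growth=True):
    """Register a TensorFlow 1.x session with capped GPU memory as the Keras session.

    Hypothetical helper; assumes TensorFlow 1.x and Keras with the TensorFlow backend.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got %r" % (fraction,))

    # Imported lazily so the argument check above works without TensorFlow installed.
    import tensorflow as tf
    from keras import backend as K

    gpu_options = tf.GPUOptions(
        per_process_gpu_memory_fraction=fraction,  # upper bound on the share of GPU memory
        allow_growth=allow_growth,                 # allocate on demand instead of all at once
    )
    session = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
    K.set_session(session)  # make Keras use this session
    return session
```

A call such as limit_gpu_memory(0.333) would go once at the top of demo.py, before the model is built.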

yu4u · 2017-07-13T16:53:44Z

Thank you for the useful information.
First, I fixed demo.py according to your comment about adding the line "if len(results) > 0:".

As I have not tried training the model with the Theano backend, I'm not sure my program is perfectly compatible with Theano, but I think it should be.
I think the problem lies in using the weights obtained with TensorFlow. I'm afraid that weights trained with one backend are not directly compatible with the other. You can convert the weights in either direction, as explained here, to solve this problem.
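
The conversion is needed because Theano's convolution flips the kernel while TensorFlow's does not, so a kernel trained under one backend has to be reversed along its spatial axes for the other. Below is a minimal pure-Python sketch of that flip for a single 2D kernel; the name flip_kernel_2d is hypothetical, and as far as I understand Keras's convert_kernel does the equivalent on whole weight arrays.

```python
def flip_kernel_2d(kernel):
    """Reverse a 2D kernel (a list of rows) along both spatial axes."""
    return [list(reversed(row)) for row in reversed(kernel)]


kernel = [[1, 2, 3],
          [4, 5, 6]]
flipped = flip_kernel_2d(kernel)
print(flipped)                            # [[6, 5, 4], [3, 2, 1]]
print(flip_kernel_2d(flipped) == kernel)  # True
```

Applying the flip twice returns the original kernel, which is why the same routine works in both directions.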

galoiscch · 2017-07-17T02:09:13Z

import os
from keras import backend as K
from keras.utils.conv_utils import convert_kernel
from wide_resnet import WideResNet

img_size = 64
model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

# Keras 2 names these layers Conv1D/Conv2D and stores the weights in `kernel`;
# Keras 1 used Convolution1D/Convolution2D and `W`.
conv_layer_names = ['Conv1D', 'Conv2D', 'Convolution1D', 'Convolution2D']
for layer in model.layers:
    if layer.__class__.__name__ in conv_layer_names:
        kernel = layer.kernel if hasattr(layer, 'kernel') else layer.W
        # Flip the kernel's spatial axes and write it back in place
        K.set_value(kernel, convert_kernel(K.get_value(kernel)))

model.save_weights(os.path.join("pretrained_models", 'weights.18-4.06_theano.h5'))

Will this Python script convert the weight file correctly? I tried using 'weights.18-4.06_theano.h5', but the output is the same: the predicted age for most people is around 40 years old.

yu4u · 2017-07-18T14:59:30Z

The above code seems correct according to the instructions I referred to, but it does not work for me either...
I trained the model with the Theano backend, so please try these weights:
https://drive.google.com/file/d/0B_cG1nzvVZlQWGJMc2JjdzkwcVk/view?usp=sharing

galoiscch · 2017-07-18T15:20:23Z

Thanks a lot.

galoiscch · 2017-07-18T15:25:37Z

How long does the training process take? I trained on the CPU with the wiki dataset, and it only reached the fourth epoch in one day. What is the hardware configuration of your computer?

yu4u · 2017-07-19T18:13:44Z

I trained on a GPU. My machine has an i7-7700 (3.60 GHz) CPU and a GeForce GTX 1080 GPU.
Training takes 1-2 hours for imdb and 6 minutes for wiki.

If the problem is memory allocation, please try a smaller model and a smaller batch size:

python3 train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 16

If the image size is 64, the number of parameters can also be reduced by changing

pool = AveragePooling2D(pool_size=(8, 8), strides=(1, 1), padding="same")(relu)

to

pool = AveragePooling2D(pool_size=(16, 16), strides=(1, 1), padding="same")(relu)

galoiscch · 2017-07-20T02:52:25Z

The Theano weights work well.

galoiscch · 2017-07-21T01:43:26Z

After running the command
python train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 32
I can run the training program with TensorFlow on the GPU.
However, when I test the new weight file, the following error appears:

Using TensorFlow backend.
Traceback (most recent call last):
  File "demo.py", line 97, in <module>
    main()
  File "demo.py", line 25, in main
    model.load_weights(os.path.join("pretrained_models", "weights.15-4.02.hdf5"))
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2572, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2981, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 19 layers into a model with 31 layers.

I wonder if it is due to the change I made in the command line. Thanks.
I didn't change the number of parameters.

galoiscch · 2017-07-21T01:48:21Z

The size of the weight file is really smaller. Your weight file is 195.8 MB in size, while my weight file is just 63.7 MB.

yu4u · 2017-07-21T17:58:02Z

demo.py is just a demo script that assumes the pre-trained model is used, as you can see:

model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

However, I have added options to demo.py for specifying the weight file, depth, and width. Please refer to the latest version of demo.py.
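
A command-line interface of that kind could look roughly like the following sketch. The flag names --weight_file, --depth and --width and their defaults are assumptions chosen to mirror the hard-coded values above; check the current demo.py for the real ones.

```python
import argparse
import os


def get_args(argv=None):
    """Parse demo options; flag names and defaults here are assumed, not confirmed."""
    parser = argparse.ArgumentParser(
        description="Age and gender estimation demo (illustrative option parsing)")
    parser.add_argument(
        "--weight_file", type=str,
        default=os.path.join("pretrained_models", "weights.18-4.06.hdf5"),
        help="path to the weight file to load")
    parser.add_argument("--depth", type=int, default=16,
                        help="depth of the network used when the weights were trained")
    parser.add_argument("--width", type=int, default=8,
                        help="width factor k used when the weights were trained")
    return parser.parse_args(argv)
```

With something like this in place, weights trained with --depth 10 --width 4 would be loaded by passing the same two values to the demo, so that the model is built as WideResNet(img_size, depth=args.depth, k=args.width)() before load_weights is called.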

yu4u · 2017-07-21T18:00:18Z

> The size of the weight file is really smaller. Your weight file is 195.8 MB in size, while my weight file is just 63.7 MB.

The options --depth 10 --width 4 control the number of parameters in the CNN, so it is natural that the size of the weight file changes.
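
As a rough illustration of how the two options shape the network, the sketch below follows the usual Wide ResNet convention, in which depth = 6n + 4 gives n residual blocks per stage and the width factor k scales the stage channels to 16k, 32k and 64k. That convention is an assumption about wide_resnet.py rather than something confirmed in this thread, and the function names are hypothetical.

```python
def blocks_per_stage(depth):
    """Number of residual blocks per stage under the depth = 6n + 4 convention."""
    if depth < 10 or (depth - 4) % 6 != 0:
        raise ValueError("depth should be 6n + 4 with n >= 1, got %d" % depth)
    return (depth - 4) // 6


def stage_channels(width):
    """Channel counts of the initial convolution and the three stages for width factor k."""
    return [16, 16 * width, 32 * width, 64 * width]


for depth, width in [(16, 8), (10, 4)]:
    print(depth, width, blocks_per_stage(depth), stage_channels(width))
# 16 8 2 [16, 128, 256, 512]
# 10 4 1 [16, 64, 128, 256]
```

Fewer blocks per stage and narrower stages are consistent with both the smaller weight file and the different layer count reported in the error above.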

galoiscch · 2017-07-24T02:26:56Z

Much obliged. I can run demo.py with my weight file now.

sbharadwajj · 2018-10-15T07:10:57Z

Hi, did you run with tensorflow backend using GPU?

galoiscch · 2018-10-16T13:45:11Z

I think I tried running the program with the TensorFlow backend on the GPU, but it failed. It has been a long time, and my memory of this project has become quite rusty. I am sorry about that.

sbharadwajj · 2018-10-17T06:04:46Z

Thank you.
@yu4u did you run it on a GPU? Do you have any suggestions on how to fix it for the GPU?

yu4u · 2018-10-17T16:42:08Z

I did not run demo.py on a machine with GPUs but I think it works.
Is there any problem?

sbharadwajj · 2018-10-17T16:43:41Z

Works perfectly with tensorflow-gpu 1.10.

nyck33 · 2019-09-11T12:21:05Z

@galoiscch

I'm running into memory issues when running this in a conda env with PyTorch GPU and TensorFlow GPU (detection is done by Tencent DSFD, not dlib):
https://github.com/TencentYoutuResearch/FaceDetection-DSFD

So I want to use a shallower and narrower model. Can you provide the smaller weights?
