ubuntu@ubuntu:~/projects/tf_examples/mnist$ python convolutional.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
Initialized!
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
Aborted (core dumped)
ubuntu@ubuntu:~/projects/tf_examples/alexnet$ python alexnet_benchmark.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
conv1 [128, 56, 56, 64]
pool1 [128, 27, 27, 64]
conv2 [128, 27, 27, 192]
pool2 [128, 13, 13, 192]
conv3 [128, 13, 13, 384]
conv4 [128, 13, 13, 256]
conv5 [128, 13, 13, 256]
pool5 [128, 6, 6, 256]
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
2016-06-14 10:31:35.247875: step 10, duration = 0.119
2016-06-14 10:31:36.442826: step 20, duration = 0.119
2016-06-14 10:31:37.631731: step 30, duration = 0.119
2016-06-14 10:31:38.826521: step 40, duration = 0.121
2016-06-14 10:31:40.019796: step 50, duration = 0.119
2016-06-14 10:31:41.216240: step 60, duration = 0.120
2016-06-14 10:31:42.406382: step 70, duration = 0.119
2016-06-14 10:31:43.602085: step 80, duration = 0.119
2016-06-14 10:31:44.793611: step 90, duration = 0.120
2016-06-14 10:31:45.867695: Forward across 100 steps, 0.118 +/- 0.012 sec / batch
2016-06-14 10:31:55.405647: step 10, duration = 0.349
2016-06-14 10:31:58.908079: step 20, duration = 0.350
2016-06-14 10:32:02.417997: step 30, duration = 0.348
2016-06-14 10:32:05.920378: step 40, duration = 0.352
2016-06-14 10:32:09.420454: step 50, duration = 0.349
2016-06-14 10:32:12.922577: step 60, duration = 0.351
2016-06-14 10:32:16.427018: step 70, duration = 0.353
2016-06-14 10:32:19.929480: step 80, duration = 0.353
2016-06-14 10:32:23.436328: step 90, duration = 0.358
2016-06-14 10:32:26.595217: Forward-backward across 100 steps, 0.347 +/- 0.035 sec / batch
ubuntu@ubuntu:~/projects/tf_examples/imagenet$ python classify_image.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>> Downloading inception-2015-12-05.tgz 100.0%
Succesfully downloaded inception-2015-12-05.tgz 88931400 bytes.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89233)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00859)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00264)
custard apple (score = 0.00141)
earthstar (score = 0.00107)
ubuntu@ubuntu:~/projects/tf_examples/cifar10$ python cifar10_multi_gpu_train.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
2016-06-14 13:11:45.038193: step 0, loss = 4.68 (4.2 examples/sec; 30.480 sec/batch)
2016-06-14 13:11:48.198885: step 10, loss = 4.66 (610.0 examples/sec; 0.210 sec/batch)
2016-06-14 13:11:50.423817: step 20, loss = 4.64 (672.0 examples/sec; 0.190 sec/batch)
2016-06-14 13:11:52.543108: step 30, loss = 4.62 (581.4 examples/sec; 0.220 sec/batch)
2016-06-14 13:11:54.715308: step 40, loss = 4.60 (609.9 examples/sec; 0.210 sec/batch)
2016-06-14 13:11:56.863078: step 50, loss = 4.58 (653.9 examples/sec; 0.196 sec/batch)
2016-06-14 13:11:59.099276: step 60, loss = 4.57 (566.5 examples/sec; 0.226 sec/batch)