ubuntu@ubuntu:~/projects/tf_examples/mnist$ python convolutional.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
Initialized!
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
Aborted (core dumped)

ubuntu@ubuntu:~/projects/tf_examples/alexnet$ python alexnet_benchmark.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
conv1   [128, 56, 56, 64]
pool1   [128, 27, 27, 64]
conv2   [128, 27, 27, 192]
pool2   [128, 13, 13, 192]
conv3   [128, 13, 13, 384]
conv4   [128, 13, 13, 256]
conv5   [128, 13, 13, 256]
pool5   [128, 6, 6, 256]
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
2016-06-14 10:31:35.247875: step 10, duration = 0.119
2016-06-14 10:31:36.442826: step 20, duration = 0.119
2016-06-14 10:31:37.631731: step 30, duration = 0.119
2016-06-14 10:31:38.826521: step 40, duration = 0.121
2016-06-14 10:31:40.019796: step 50, duration = 0.119
2016-06-14 10:31:41.216240: step 60, duration = 0.120
2016-06-14 10:31:42.406382: step 70, duration = 0.119
2016-06-14 10:31:43.602085: step 80, duration = 0.119
2016-06-14 10:31:44.793611: step 90, duration = 0.120
2016-06-14 10:31:45.867695: Forward across 100 steps, 0.118 +/- 0.012 sec / batch
2016-06-14 10:31:55.405647: step 10, duration = 0.349
2016-06-14 10:31:58.908079: step 20, duration = 0.350
2016-06-14 10:32:02.417997: step 30, duration = 0.348
2016-06-14 10:32:05.920378: step 40, duration = 0.352
2016-06-14 10:32:09.420454: step 50, duration = 0.349
2016-06-14 10:32:12.922577: step 60, duration = 0.351
2016-06-14 10:32:16.427018: step 70, duration = 0.353
2016-06-14 10:32:19.929480: step 80, duration = 0.353
2016-06-14 10:32:23.436328: step 90, duration = 0.358
2016-06-14 10:32:26.595217: Forward-backward across 100 steps, 0.347 +/- 0.035 sec / batch
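The spatial sizes in the conv1 through pool5 lines above can be reproduced from TensorFlow's output-size rules for 'SAME' and 'VALID' padding. This is a minimal sketch, and the layer settings are assumptions about the benchmark script, not read from this log: a 224x224 input, a stride-4 'SAME' conv1, stride-1 'SAME' conv2 through conv5, and 3x3 stride-2 'VALID' max pooling. The helper names `conv_same`, `pool_valid` and `alexnet_spatial_sizes` are hypothetical. The batch dimension (128) and channel counts are taken as given from the log and are not computed.

```python
import math


def conv_same(size: int, stride: int) -> int:
    # 'SAME' padding: output = ceil(input / stride), independent of kernel size
    return math.ceil(size / stride)


def pool_valid(size: int, window: int, stride: int) -> int:
    # 'VALID' padding: output = ceil((input - window + 1) / stride)
    return math.ceil((size - window + 1) / stride)


def alexnet_spatial_sizes(image_size: int = 224) -> dict:
    """Height/width after each layer, under the ASSUMED layer settings above."""
    sizes = {}
    s = conv_same(image_size, stride=4)      # assumed: conv1 stride 4, SAME
    sizes["conv1"] = s
    s = pool_valid(s, window=3, stride=2)    # assumed: 3x3 stride-2 VALID pool
    sizes["pool1"] = s
    s = conv_same(s, stride=1)               # assumed: conv2 stride 1, SAME
    sizes["conv2"] = s
    s = pool_valid(s, window=3, stride=2)
    sizes["pool2"] = s
    for name in ("conv3", "conv4", "conv5"):  # assumed: stride 1, SAME
        s = conv_same(s, stride=1)
        sizes[name] = s
    s = pool_valid(s, window=3, stride=2)
    sizes["pool5"] = s
    return sizes


if __name__ == "__main__":
    for name, size in alexnet_spatial_sizes().items():
        print(f"{name:7s} {size}x{size}")
```

Under these assumptions the sketch yields 56, 27, 27, 13, 13, 13, 13 and 6, which matches the second and third dimensions printed for conv1 through pool5.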


ubuntu@ubuntu:~/projects/tf_examples/imagenet$ python classify_image.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>> Downloading inception-2015-12-05.tgz 100.0%
Succesfully downloaded inception-2015-12-05.tgz 88931400 bytes.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89233)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00859)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00264)
custard apple (score = 0.00141)
earthstar (score = 0.00107)

ubuntu@ubuntu:~/projects/tf_examples/cifar10$ python cifar10_multi_gpu_train.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:0f:00.0
Total memory: 1.99GiB
Free memory: 1.89GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:0f:00.0)
2016-06-14 13:11:45.038193: step 0, loss = 4.68 (4.2 examples/sec; 30.480 sec/batch)
2016-06-14 13:11:48.198885: step 10, loss = 4.66 (610.0 examples/sec; 0.210 sec/batch)
2016-06-14 13:11:50.423817: step 20, loss = 4.64 (672.0 examples/sec; 0.190 sec/batch)
2016-06-14 13:11:52.543108: step 30, loss = 4.62 (581.4 examples/sec; 0.220 sec/batch)
2016-06-14 13:11:54.715308: step 40, loss = 4.60 (609.9 examples/sec; 0.210 sec/batch)
2016-06-14 13:11:56.863078: step 50, loss = 4.58 (653.9 examples/sec; 0.196 sec/batch)
2016-06-14 13:11:59.099276: step 60, loss = 4.57 (566.5 examples/sec; 0.226 sec/batch)
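Each cifar10 step line reports both examples/sec and sec/batch. On a single GPU these should satisfy examples/sec = batch_size / sec_per_batch, so multiplying the two printed values gives the implied batch size. The pairs below are copied from the step 0 to 60 lines above. The expectation that the products land near 128 assumes the example's default batch size, which this log does not state. `implied_batch_sizes` is a hypothetical helper.

```python
# (examples/sec, sec/batch) pairs copied from the step 0-60 log lines above
STEPS = [
    (4.2, 30.480),
    (610.0, 0.210),
    (672.0, 0.190),
    (581.4, 0.220),
    (609.9, 0.210),
    (653.9, 0.196),
    (566.5, 0.226),
]


def implied_batch_sizes(steps):
    # examples/sec * sec/batch = examples processed per batch
    return [round(rate * secs, 1) for rate, secs in steps]


if __name__ == "__main__":
    print(implied_batch_sizes(STEPS))
```

Every product comes out within about 0.5 of 128. The small residual arises because sec/batch is printed rounded to three decimals.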