# GPU vs. CPU
### Performance test using TensorFlow
Code written for the tests in the article **GPU for Deep Learning**, available at [vitormeriat.com.br/2018/09/27/gpu-for-deeplearning](http://www.vitormeriat.com.br/2018/09/27/gpu-for-deeplearning/).

In [1]:
import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

### Checking the GPU

In [2]:
!lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                6
On-line CPU(s) list:   0-5
Thread(s) per core:    1
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               2596.994
BogoMIPS:              5193.98
Hypervisor vendor:     Microsoft
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-5
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c r

In [3]:
!lspci
#!lspci -v | grep "3D" -A 12

0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
e0e1:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)


In [4]:
from tensorflow.python.client import device_lib

tf.test.is_gpu_available()

True

In [5]:
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 668097085573383147, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 11281927373
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 3535949983722528147
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: e0e1:00:00.0, compute capability: 3.7"]

In [6]:
!nvidia-smi

Sat Sep 29 21:14:49 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 0000E0E1:00:00.0 Off |                    0 |
| N/A   42C    P0    71W / 149W |  10877MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage    
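In the `nvidia-smi` output above, memory usage is 10877MiB of 11441MiB while GPU utilization is 0%. A plausible explanation, stated here as an assumption rather than something the notebook confirms, is that TensorFlow 1.x reserves most of the GPU's memory once the earlier cells have initialized the device. The sketch below only parses the memory line copied from that output and computes the fraction in use; the helper name `memory_fraction` is hypothetical.

```python
import re

# Status line copied verbatim from the nvidia-smi output above.
STATUS_LINE = "| N/A   42C    P0    71W / 149W |  10877MiB / 11441MiB |      0%      Default |"

def memory_fraction(line):
    """Return (used_mib, total_mib, used/total) parsed from an nvidia-smi status line."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        raise ValueError("no 'usedMiB / totalMiB' field found")
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, used / total

used, total, fraction = memory_fraction(STATUS_LINE)
print(f"{used} MiB of {total} MiB in use ({fraction:.1%})")
```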

### Defining the shape used to create the matrices

In [11]:
#shapeMtx = 50
shapeMtx = 10000  # value that matches the "Shape: 10000" outputs recorded below
#shapeMtx = 20000
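To get a feel for what each candidate shape costs in memory, the following sketch estimates the size of a single square matrix. It relies on two facts about the libraries used in this notebook: `np.random.normal` returns `float64` values (8 bytes each) and `tf.random_uniform` defaults to `float32` (4 bytes each). The helper name `matrix_bytes` is made up for illustration.

```python
def matrix_bytes(n, bytes_per_element):
    """Memory needed for one dense n x n matrix."""
    return n * n * bytes_per_element

MIB = 2 ** 20

for n in (50, 10000, 20000):
    f64_mib = matrix_bytes(n, 8) / MIB  # NumPy default dtype (float64)
    f32_mib = matrix_bytes(n, 4) / MIB  # tf.random_uniform default dtype (float32)
    print(f"n={n:>6}: float64 {f64_mib:10.2f} MiB, float32 {f32_mib:10.2f} MiB")
```

Each benchmark cell holds two inputs and at least one result at a time, so the working set is roughly three times the per-matrix figure.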

## NumPy CPU

In [15]:
%%time

print("Numpy Version:", np.__version__, '\n')

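# The timer starts before the matrices are generated, so every elapsed time
# printed below is cumulative and includes the random sampling.
startTime = datetime.now()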
A=np.random.normal(size=(shapeMtx, shapeMtx))
B=np.random.normal(size=(shapeMtx, shapeMtx))

subtract = np.subtract(A, B)
print("Subtract time:\t", datetime.now() - startTime)
divide = np.divide(A, B)
print("Divide time:\t", datetime.now() - startTime)
multiply = np.multiply(A, B)
print("Multiply time:\t", datetime.now() - startTime)

print("\nShape:", shapeMtx, '\n')

Numpy Version: 1.13.3 

Subtract time:	 0:00:09.057307
Divide time:	 0:00:09.451058
Multiply time:	 0:00:09.791327

Shape: 10000 

CPU times: user 9.06 s, sys: 740 ms, total: 9.8 s
Wall time: 9.79 s
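In the NumPy cell, `startTime` is set once, before `A` and `B` are generated, so the three printed times are running totals that include the random sampling. A minimal sketch of how each step could be timed on its own is shown below. It is not the notebook's original method: the function name `time_elementwise_ops`, the fixed seed, and the small default size are assumptions chosen to keep the example quick to run.

```python
import time
import numpy as np

def time_elementwise_ops(n, seed=0):
    """Time matrix generation and each element-wise operation separately."""
    rng = np.random.RandomState(seed)

    t0 = time.perf_counter()
    a = rng.normal(size=(n, n))
    b = rng.normal(size=(n, n))
    timings = {"generate": time.perf_counter() - t0}

    operations = (
        ("subtract", np.subtract),
        ("divide", np.divide),
        ("multiply", np.multiply),
    )
    for name, op in operations:
        t0 = time.perf_counter()  # reset the timer for every operation
        op(a, b)
        timings[name] = time.perf_counter() - t0
    return timings

timings = time_elementwise_ops(500)
for name, seconds in timings.items():
    print(f"{name}:\t{seconds:.6f} s")
```

Calling `time_elementwise_ops(10000)` would reproduce the matrix size used in this notebook, at the cost of several gigabytes of memory.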


## TensorFlow CPU

In [16]:
%%time

device_name = "/cpu:0"

shape = (shapeMtx, shapeMtx)

print("Tensorflow Version:", tf.__version__, '\n')

with tf.device(device_name):
    A = tf.random_uniform(shape=shape, minval=0, maxval=1)
    B = tf.random_uniform(shape=shape, minval=0, maxval=1)
    
    subtract_operation = tf.subtract(A, B)
    divide_operation = tf.divide(A, B)
    multiply_operation = tf.multiply(A, B)

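# The timer starts after the graph is built. Elapsed times below are cumulative,
# and the first one also includes creating the session. Each session.run()
# re-samples A and B and returns the result as a NumPy array.
startTime = datetime.now()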
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as session:
    session.run(subtract_operation)
    print("subtract time:\t", datetime.now() - startTime)
    session.run(divide_operation)
    print("divide time:\t", datetime.now() - startTime)
    session.run(multiply_operation)
    print("multiply time:\t", datetime.now() - startTime)

print("\nShape:", shape, "Device:", device_name, '\n')

Tensorflow Version: 1.10.0 

subtract time:	 0:00:00.268926
divide time:	 0:00:00.536718
multiply time:	 0:00:00.800083

Shape: (10000, 10000) Device: /cpu:0 

CPU times: user 3.87 s, sys: 524 ms, total: 4.4 s
Wall time: 816 ms
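The three times printed by the TensorFlow CPU cell are all measured from the same `startTime`, so each one includes the steps before it. The sketch below converts those running totals into per-operation durations, using the values copied from the output above; the helper name `per_operation` is hypothetical.

```python
from datetime import timedelta

# Cumulative elapsed times copied from the TensorFlow CPU output above.
cumulative = [
    ("subtract", timedelta(microseconds=268926)),
    ("divide", timedelta(microseconds=536718)),
    ("multiply", timedelta(microseconds=800083)),
]

def per_operation(cumulative_times):
    """Turn running totals into the time spent in each individual step."""
    previous = timedelta(0)
    durations = {}
    for name, elapsed in cumulative_times:
        durations[name] = elapsed - previous
        previous = elapsed
    return durations

durations = per_operation(cumulative)
for name, duration in durations.items():
    print(f"{name}:\t{duration.total_seconds():.6f} s")
```

The first entry still includes whatever ran between `startTime` and the first `session.run`, which in this cell is the creation of the session.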


## TensorFlow GPU

In [17]:
%%time

device_name = "/gpu:0"

shape = (shapeMtx, shapeMtx)

print("Tensorflow Version:", tf.__version__, '\n')

with tf.device(device_name):
    A = tf.random_uniform(shape=shape, minval=0, maxval=1)
    B = tf.random_uniform(shape=shape, minval=0, maxval=1)
    
    subtract_operation = tf.subtract(A, B)
    divide_operation = tf.divide(A, B)
    multiply_operation = tf.multiply(A, B)

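# Elapsed times below are cumulative from startTime; session.run() also copies
# each result from the GPU back to host memory.
startTime = datetime.now()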
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as session:
    session.run(subtract_operation)
    print("subtract time:\t", datetime.now() - startTime)
    session.run(divide_operation)
    print("divide time:\t", datetime.now() - startTime)
    session.run(multiply_operation)
    print("multiply time:\t", datetime.now() - startTime)

print("\nShape:", shape, "Device:", device_name, '\n')

Tensorflow Version: 1.10.0 

subtract time:	 0:00:00.089966
divide time:	 0:00:00.175842
multiply time:	 0:00:00.261440

Shape: (10000, 10000) Device: /gpu:0 

CPU times: user 187 ms, sys: 108 ms, total: 295 ms
Wall time: 273 ms
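As a rough summary, the wall times reported by `%%time` in the three benchmark cells can be turned into speedup ratios. The sketch below uses only the figures printed above. The ratios should be read with caution: the NumPy cell samples from a normal distribution in `float64` and times the sampling together with the arithmetic, while the TensorFlow cells sample from a uniform distribution in `float32`, so the workloads are not identical. The names `wall_times` and `speedup` are illustrative.

```python
# Wall times reported by %%time in the three benchmark cells, in seconds.
wall_times = {
    "numpy_cpu": 9.79,
    "tensorflow_cpu": 0.816,
    "tensorflow_gpu": 0.273,
}

def speedup(baseline, candidate, times=wall_times):
    """How many times faster `candidate` ran compared with `baseline`."""
    return times[baseline] / times[candidate]

comparisons = [
    ("numpy_cpu", "tensorflow_cpu"),
    ("tensorflow_cpu", "tensorflow_gpu"),
    ("numpy_cpu", "tensorflow_gpu"),
]
for baseline, candidate in comparisons:
    ratio = speedup(baseline, candidate)
    print(f"{candidate} vs {baseline}: {ratio:.1f}x")
```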
