# Tesla V100-DGXS-32GB: FP32 vs. FP16 test

#### Note: both runs use the same data, but the FP16 run uses double the batch size.

TL;DR version:

FP32: bs=105, VRAM usage peaks at `11623` MB **(see NOTE)**, wall time: `1:04`, final val_loss `0.023873`

FP16: bs=210, VRAM usage peaks at `11782` MB **(see NOTE)**, wall time: `0:51`, final val_loss `0.053197`

*NOTE: `ipyexperiments` fails to report VRAM usage correctly on the V100, so the VRAM figures were read off `gpustat` by eye.*
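Reading the peak off a monitor by eye can miss short spikes. As a rough alternative, the sketch below polls `nvidia-smi` and keeps the largest reading. The helper names (`parse_memory_used`, `query_memory_used`, `track_peak`) are made up for this illustration, and it was not used to produce the numbers in this notebook.

```python
import subprocess
import time


def parse_memory_used(output):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`.

    Returns one integer (MiB, as reported by nvidia-smi) per GPU line.
    """
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_memory_used(gpu_id):
    """Ask nvidia-smi for the memory currently in use on a single GPU."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_id}",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory_used(out)[0]


def track_peak(sample, n_samples, interval=0.5):
    """Call `sample()` `n_samples` times and return the largest reading.

    `sample` is any zero-argument callable returning a number, which keeps
    the polling loop independent of nvidia-smi itself.
    """
    peak = 0
    for _ in range(n_samples):
        peak = max(peak, sample())
        time.sleep(interval)
    return peak
```

Run from a separate process while training, something like `track_peak(lambda: query_memory_used(3), n_samples=240)` would sample GPU 3 for about two minutes. Polling can still miss spikes shorter than the interval.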

---
For comparison, the GTX 1080 Ti achieved:

FP32: bs=105, VRAM usage peaks at `10074` MB, wall time: `1:45`, final val_loss `0.020775`

FP16: bs=210, VRAM usage peaks at `10162` MB, wall time: `1:37`, final val_loss `0.056734`
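The wall times quoted above can be turned into speedup ratios with a few lines of Python. This is only arithmetic on the four summary figures; the dictionary layout and function names are choices made for this sketch.

```python
def to_seconds(wall_time):
    """Convert an 'm:ss' wall-time string to seconds."""
    minutes, seconds = wall_time.split(":")
    return int(minutes) * 60 + int(seconds)


# Wall times copied from the summaries above (5 epochs each).
wall = {
    ("V100", "FP32"): "1:04",
    ("V100", "FP16"): "0:51",
    ("1080 Ti", "FP32"): "1:45",
    ("1080 Ti", "FP16"): "1:37",
}
secs = {key: to_seconds(value) for key, value in wall.items()}


def speedup(slow, fast):
    """How many times faster the `fast` run finished than the `slow` run."""
    return secs[slow] / secs[fast]


# FP16 vs. FP32 on the same card.
for gpu in ("V100", "1080 Ti"):
    ratio = speedup((gpu, "FP32"), (gpu, "FP16"))
    print(f"{gpu}: FP16 finishes {ratio:.2f}x faster than FP32")

# V100 vs. 1080 Ti at the same precision.
for precision in ("FP32", "FP16"):
    ratio = speedup(("1080 Ti", precision), ("V100", precision))
    print(f"{precision}: V100 finishes {ratio:.2f}x faster than the 1080 Ti")
```

With these figures, FP16 gives roughly a 1.25x speedup on the V100 but only about 1.08x on the 1080 Ti, and the V100 is about 1.64x (FP32) to 1.90x (FP16) faster than the 1080 Ti. Keep in mind that the FP16 runs also use twice the batch size, so the ratios mix the effect of precision with the effect of batch size.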

---

In [1]:
%matplotlib inline

from fastai import *
from fastai.vision import *
import re
import scipy.ndimage
from ipyexperiments import *
import fastai
fastai.__version__

torch.cuda.set_device(3)
path    = Path('/raid/data/DATASET3/TESTVALID/')
fp32exp = IPyExperimentsPytorch()


*** Experiment started with the Pytorch backend
Device: ID 3, Tesla V100-DGXS-32GB (32478 RAM)


*** Current state:
RAM:     Used     Free    Total        Util
CPU:    2,447  230,583  257,868 MB   0.95% 
GPU:      953   31,524   32,478 MB   2.94% 


･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:00:00.000
･ CPU:          0          0      2,448 MB |
･ GPU:          0        491        953 MB |


In [2]:
path = Path('/raid/data/DATASET3/TESTVALID/')
bs=105
data = ImageDataBunch.from_folder(path,
                                  train='backup224',
                                  valid='backup224valid',
                                  size=224, bs=bs,
                                  ).normalize(imagenet_stats)

･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:00:00.782
･ CPU:          3          2      2,546 MB |
･ GPU:          0        491        953 MB |


In [3]:
learn = create_cnn(data, models.resnet50, metrics=accuracy)

･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:00:01.287
･ CPU:          0          0      2,658 MB |
･ GPU:        106        385      1,059 MB |


In [4]:
%%time
learn.fit_one_cycle(5)

epoch,train_loss,valid_loss,accuracy
1,1.088248,0.447044,0.864670
2,0.667350,0.104905,0.976876
3,0.402612,0.045042,0.991660
4,0.255537,0.027617,0.996588
5,0.171024,0.023873,0.997346


CPU times: user 31.9 s, sys: 41.9 s, total: 1min 13s
Wall time: 1min 4s
･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:01:04.826
･ CPU:          0          0      2,702 MB |
･ GPU:        300         85      1,359 MB |


### The kernel is now restarted for the FP16 run.

In [1]:
%matplotlib inline

from fastai import *
from fastai.vision import *
import re
import scipy.ndimage
from ipyexperiments import *
import fastai
fastai.__version__

torch.cuda.set_device(3)
path    = Path('/raid/data/DATASET3/TESTVALID/')
fp16exp = IPyExperimentsPytorch()


*** Experiment started with the Pytorch backend
Device: ID 3, Tesla V100-DGXS-32GB (32478 RAM)


*** Current state:
RAM:     Used     Free    Total        Util
CPU:    2,419  230,608  257,868 MB   0.94% 
GPU:      953   31,524   32,478 MB   2.94% 


･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:00:00.000
･ CPU:          0          0      2,421 MB |
･ GPU:          0        491        953 MB |


In [2]:
path = Path('/raid/data/DATASET3/TESTVALID/')
bs=210
data = ImageDataBunch.from_folder(path,
                                  train='backup224',
                                  valid='backup224valid',
                                  size=224, bs=bs,
                                  ).normalize(imagenet_stats)

･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:00:01.405
･ CPU:          3          2      2,608 MB |
･ GPU:          0        491        953 MB |


In [3]:
learn = create_cnn(data, models.resnet50, metrics=accuracy).to_fp16()

･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:00:01.144
･ CPU:          0          0      2,639 MB |
･ GPU:         82        409      1,035 MB |


In [4]:
%%time
learn.fit_one_cycle(5)

epoch,train_loss,valid_loss,accuracy
1,1.159473,0.532623,0.854435
2,0.839299,0.227999,0.952616
3,0.592292,0.089819,0.981425
4,0.429999,0.057339,0.990523
5,0.327299,0.053197,0.992419


CPU times: user 17.3 s, sys: 32 s, total: 49.3 s
Wall time: 51.8 s
･ RAM:  △Consumed    △Peaked    Used Total | Exec time 0:00:51.770
･ CPU:          0          0      2,684 MB |
･ GPU:        662       -252      1,697 MB |
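Because the FP16 run uses bs=210 instead of bs=105 on the same data for the same 5 epochs, it performs roughly half as many optimizer steps. The sketch below makes that arithmetic explicit. The training-set size is not stated in this notebook, so `n_train` is a hypothetical placeholder, and `optimizer_steps` is a helper written for this illustration.

```python
import math


def optimizer_steps(n_train, bs, epochs, drop_last=False):
    """Number of weight updates for a run with one update per batch.

    With drop_last=True the final incomplete batch of each epoch is skipped.
    """
    per_epoch = n_train // bs if drop_last else math.ceil(n_train / bs)
    return per_epoch * epochs


n_train = 10_000  # hypothetical: the real dataset size is not given here

for bs in (105, 210):
    steps = optimizer_steps(n_train, bs, epochs=5)
    print(f"bs={bs}: {steps} optimizer steps over 5 epochs")
```

Fewer updates at an unchanged learning-rate schedule may contribute to the higher final val_loss of the FP16 run, but these two runs alone cannot separate that effect from the effect of reduced precision. A run with FP16 at bs=105 would be needed to tell them apart.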
