Merge branch 'master' into lr-warmup-gammainit
hma02 committed Aug 2, 2017
2 parents 80580c0 + 645d183 commit f8e9c9a
Showing 6 changed files with 25 additions and 24 deletions.
12 changes: 8 additions & 4 deletions README.md
```diff
@@ -105,16 +105,20 @@ More examples can be found [here](https://github.com/uoguelph-mlrg/Theano-MPI/tr
 
 ## Example Performance
 
-###BSP tested on up to eight Tesla K80 GPUs
-Time per 5120 images in seconds: [allow_gc = True]
+### BSP tested on up to eight Tesla K80 GPUs
+
+Training (+communication) time per 5120 images in seconds: [allow_gc = True, using nccl32 on [copper](https://www.sharcnet.ca/my/systems/show/108)]
 
 | Model | 1GPU | 2GPU | 4GPU | 8GPU |
 | :---: | :---: | :---: | :---: | :---: |
 | AlexNet-128b | 20.50 | 10.35+0.78 | 5.13+0.54 | 2.63+0.61 |
 | GoogLeNet-32b | 63.89 | 31.40+1.00 | 15.51+0.71 | 7.69+0.80 |
 | VGG16-16b | 358.29 | 176.08+13.90 | 90.44+9.28 | 55.12+12.59 |
-| VGG16-32b | 343.37 | 169.12+7.14 | 86.97++4.80 | 43.29+5.41 |
+| VGG16-32b | 343.37 | 169.12+7.14 | 86.97+4.80 | 43.29+5.41 |
 | ResNet50-64b | 163.15 | 80.09+0.81 | 40.25+0.56 | 20.12+0.57 |
 
+More details on the benchmark can be found in this [notebook](https://github.com/uoguelph-mlrg/Theano-MPI/blob/master/examples/speedup-n_workers.ipynb).
+
 <img src=https://github.com/uoguelph-mlrg/Theano-MPI/raw/master/show/val_a.png width=500/>
 <img src=https://github.com/uoguelph-mlrg/Theano-MPI/raw/master/show/val_g.png width=500/>
```
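As a quick sanity check on the table above: each multi-GPU entry is training time plus communication time, so the implied speedup is the 1-GPU time divided by their sum. A minimal sketch of that arithmetic for the AlexNet-128b row, with the values copied straight from the table:

```python
# Speedup implied by the AlexNet-128b row of the table above.
# All times are seconds per 5120 images; multi-GPU entries are train + comm.
t_1gpu = 20.50
t_8gpu = 2.63 + 0.61              # training + communication on 8 GPUs
speedup = t_1gpu / t_8gpu         # ~6.3x over a single GPU
efficiency = speedup / 8.0        # ~79% scaling efficiency
print('speedup: %.1fx, efficiency: %.0f%%' % (speedup, efficiency * 100))
```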

```diff
@@ -123,7 +127,7 @@ Time per 5120 images in seconds: [allow_gc = True]
 
 * You may want to use those helper functions in `/theanompi/lib/opt.py` to construct optimizers in order to avoid common pitfalls mentioned in (#22) and get better convergence.
 
-* To get the best running speed performance, the memory cache may need to be cleaned before running.
+* Binding cores according to your NUMA topology may give better performance. Try the `-bind` option with the launcher (needs the [hwloc](https://www.open-mpi.org/projects/hwloc/) dependency).
 
 * Binding cores according to your NUMA topology may give better performance. Try the `-bind` option with the launcher (needs the [hwloc](https://www.open-mpi.org/projects/hwloc/) dependency).
```
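The first tip above points at `theanompi/lib/opt.py` without showing it, and issue #22 is not quoted in this diff, so the following is only a hedged sketch of the kind of pitfall such optimizer helpers typically guard against: building momentum SGD with the learning rate held in a Theano shared variable, so a warmup or decay schedule can adjust it between iterations without recompiling the training function. The function and variable names here are illustrative assumptions, not the module's actual API.

```python
# Hypothetical sketch (not the real opt.py API): momentum-SGD updates with the
# learning rate in a shared variable, so a warmup/decay schedule can call
# lr.set_value(...) at run time without recompiling the train function.
import numpy as np
import theano

lr = theano.shared(np.float32(0.01))  # adjust via lr.set_value(...) per schedule

def momentum_sgd_updates(params, grads, lr, mu=0.9):
    updates = []
    for p, g in zip(params, grads):
        v = theano.shared(np.zeros_like(p.get_value()))  # per-param momentum buffer
        v_new = mu * v - lr * g
        updates.append((v, v_new))      # update the buffer
        updates.append((p, p + v_new))  # take the step
    return updates
```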
27 changes: 14 additions & 13 deletions examples/speedup-n_workers.ipynb

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion theanompi/models/alex_net.py
```diff
@@ -520,7 +520,6 @@ def val_iter(self, count,recorder):
 
         else:
 
-
             img_mean = self.data.rawdata[4]
             img_std = self.data.rawdata[5]
             import hickle as hkl
```
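The four model files in this commit all touch the same `val_iter` code path, whose visible context shows the shared pattern: a validation batch is loaded from a hickle file and standardized with the dataset-wide mean and std kept in `data.rawdata`. A minimal sketch of that pattern follows; the file path and array layout are assumptions for illustration, not the repo's exact code:

```python
# Sketch of the val_iter pattern visible in these hunks: batches are hickle
# (HDF5) dumps, standardized with the dataset mean/std. The path format and
# array layout below are illustrative assumptions.
import hickle as hkl

def load_val_batch(path, img_mean, img_std):
    batch = hkl.load(path).astype('float32')  # e.g. shape (3, 256, 256, batch_size)
    return (batch - img_mean) / img_std       # standardize as in the models above

# usage, mirroring the context lines above:
# img_mean, img_std = self.data.rawdata[4], self.data.rawdata[5]
# batch = load_val_batch('/path/to/a/val_batch.hkl', img_mean, img_std)
```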
1 change: 0 additions & 1 deletion theanompi/models/googlenet.py
```diff
@@ -866,7 +866,6 @@ def val_iter(self, count,recorder):
 
         else:
 
-
             img_mean = self.data.rawdata[4]
             img_std = self.data.rawdata[5]
             import hickle as hkl
```
3 changes: 1 addition & 2 deletions theanompi/models/lasagne_model_zoo/resnet50.py
```diff
@@ -597,8 +597,7 @@ def val_iter(self, count,recorder):
 
 
         else:
-
-
+
            img_mean = self.data.rawdata[4]
            img_std = self.data.rawdata[5]
 
```
5 changes: 2 additions & 3 deletions theanompi/models/lasagne_model_zoo/vgg16.py
```diff
@@ -392,7 +392,7 @@ def train_iter(self, count,recorder):
 
 
         else:
-
+
            img_mean = self.data.rawdata[4]
            img_std = self.data.rawdata[5]
            import hickle as hkl
@@ -492,8 +492,7 @@ def val_iter(self, count,recorder):
 
 
         else:
-
-
+
            img_mean = self.data.rawdata[4]
            img_std = self.data.rawdata[5]
            import hickle as hkl
```
