[DoReFa] more models

ppwwyyxx committed Aug 1, 2018
1 parent deaf1de commit 67e1ac5b1d8018207429c25e1cda34e3749212a2
@@ -81,5 +81,6 @@ To run distributed training, first install horovod properly, then refer to the
documentation of [HorovodTrainer](../modules/train.html#tensorpack.train.HorovodTrainer).
Tensorpack has implemented some other distributed trainers using TF's native API,
but TF's native support for distributed training isn't very high-performance even today.
but TensorFlow is not actively supporting its distributed training features, and
its native distributed performance isn't very good even today.
Therefore those trainers are not actively maintained and are not recommended for use.
@@ -13,18 +13,18 @@ Alternative link to this page: [http://dorefa.net](http://dorefa.net)
This is a good set of baselines for research in model quantization.
These quantization techniques, when applied to AlexNet, achieve the following ImageNet performance in this implementation:
| Model | Bit Width <br/> (weights, activations, gradients) | Top 1 Validation Error <sup>[1](#ft1)</sup> |
|:----------------------------------:|:-------------------------------------------------:|:-------------------------------------------------------------------------------:|
| Full Precision<sup>[2](#ft2)</sup> | 32,32,32 | 40.3% |
| TTQ | t,32,32 | 42.0% |
| BWN | 1,32,32 | 44.6% |
| BNN | 1,1,32 | 51.9% |
| DoReFa | 8,8,8 | 42.0% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-8,8,8.npz) |
| DoReFa | 1,2,32 | 46.6% |
| DoReFa | 1,2,6 | 46.8% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-1,2,6.npz) |
| DoReFa | 1,2,4 | 54.0% |
<a id="ft1">1</a>: These numbers were obtained by training on 8 GPUs with a total batch size of 256.
| Model | Bit Width <br/> (weights, activations, gradients) | Top 1 Validation Error <sup>[1](#ft1)</sup> |
|:----------------------------------:|:-------------------------------------------------:|:--------------------------------------------------------------------------------:|
| Full Precision<sup>[2](#ft2)</sup> | 32,32,32 | 40.3% |
| TTQ | t,32,32 | 42.0% |
| BWN | 1,32,32 | 44.3% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-1,32,32.npz) |
| BNN | 1,1,32 | 51.5% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-1,1,32.npz) |
| DoReFa | 8,8,8 | 42.0% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-8,8,8.npz) |
| DoReFa | 1,2,32 | 46.6% |
| DoReFa | 1,2,6 | 46.8% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-1,2,6.npz) |
| DoReFa | 1,2,4 | 54.0% |
<a id="ft1">1</a>: These numbers were obtained by training on 8 GPUs with a total batch size of 256 (otherwise the performance may become slightly different).
The DoReFa-Net models reach slightly better performance than reported in our paper, due to
more sophisticated augmentations.
@@ -13,11 +13,13 @@
from tensorpack import *
from tensorpack.tfutils.summary import add_param_summary
from tensorpack.tfutils.sessinit import get_model_loader
from tensorpack.tfutils.varreplace import remap_variables
from tensorpack.dataflow import dataset
from tensorpack.utils.gpu import get_num_gpu
from imagenet_utils import get_imagenet_dataflow, fbresnet_augmentor, ImageNetModel
from imagenet_utils import (
    get_imagenet_dataflow, fbresnet_augmentor, ImageNetModel, eval_on_ILSVRC12)
from dorefa import get_dorefa, ternarize
"""
@@ -199,6 +201,7 @@ def run_image(model, sess_init, inputs):
parser.add_argument('--dorefa', required=True,
help='number of bits for W,A,G, separated by comma. W="t" means TTQ')
parser.add_argument('--run', help='run on a list of images with the pretrained model', nargs='*')
parser.add_argument('--eval', action='store_true')
args = parser.parse_args()
dorefa = args.dorefa.split(',')
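As a rough illustration of how the `--dorefa` value could be turned into bit widths, the sketch below splits the string on commas and treats `t` as the TTQ marker for weights, as the help text describes. The helper name `parse_dorefa` and its validation rules are assumptions made for illustration; they are not taken from this commit.

```python
def parse_dorefa(spec):
    """Parse a 'W,A,G' string such as '1,2,6' or 't,32,32'.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    parts = spec.split(',')
    if len(parts) != 3:
        raise ValueError("expected three comma-separated values: W,A,G")
    w, a, g = parts
    # W may be the literal 't' (TTQ); every other field must be an integer.
    bitw = 't' if w == 't' else int(w)
    return bitw, int(a), int(g)
```

For example, `parse_dorefa('1,2,6')` yields integer bit widths for weights, activations, and gradients, while `parse_dorefa('t,32,32')` keeps `'t'` as the weight setting.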
@@ -215,6 +218,11 @@ def run_image(model, sess_init, inputs):
assert args.load.endswith('.npz')
run_image(Model(), DictRestore(dict(np.load(args.load))), args.run)
sys.exit()
if args.eval:
    BATCH_SIZE = 128
    ds = get_data('val')
    eval_on_ILSVRC12(Model(), get_model_loader(args.load), ds)
    sys.exit()
nr_tower = max(get_num_gpu(), 1)
BATCH_SIZE = TOTAL_BATCH_SIZE // nr_tower
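The per-tower batch size computed above can be sketched in isolation as follows. This is a minimal sketch: the function name `per_tower_batch_size` and the divisibility check are assumptions added for illustration, and the script itself only performs the integer division.

```python
def per_tower_batch_size(total_batch_size, num_gpu):
    """Split a total batch size evenly across towers (hypothetical helper)."""
    # Fall back to a single tower when no GPU is available.
    nr_tower = max(num_gpu, 1)
    # Assumption: reject sizes that do not divide evenly, so that the
    # effective total batch size stays what the caller asked for.
    if total_batch_size % nr_tower != 0:
        raise ValueError("total batch size must be divisible by the number of towers")
    return total_batch_size // nr_tower
```

With a total batch size of 256 on 8 GPUs, each tower would process 32 images per step.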
@@ -80,7 +80,7 @@ MaskRCNN results contain both box and mask mAP.
| R50-FPN | 39.8;35.5 | 39.5;34.4<sup>[2](#ft2)</sup> | 34h | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details> |
| R50-FPN | 40.3;36.4 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50FPN-MaskRCNN-StandardGN.npz) | 40.3;35.7 | 44h | <details><summary>standard+GN</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` </details> |
| R101-C4 | 41.7;35.5 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101C4-MaskRCNN-Standard.npz) | | 63h | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
| R101-FPN | 40.7;36.9[:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101FPN-MaskRCNN-Standard.npz) | 40.9;36.4 | 40h | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
| R101-FPN | 40.7;36.9 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101FPN-MaskRCNN-Standard.npz) | 40.9;36.4 | 40h | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
<a id="ft1">1</a>: This implementation has slightly different configurations from detectron (e.g. batch size).
@@ -102,8 +102,8 @@ def image_preprocess(image, bgr=True):
mean = mean[::-1]
std = std[::-1]
image_mean = tf.constant(mean, dtype=tf.float32)
image_std = tf.constant(std, dtype=tf.float32)
image = (image - image_mean) / image_std
image_invstd = tf.constant(1.0 / std, dtype=tf.float32)
image = (image - image_mean) * image_invstd
return image
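The change in this hunk replaces a per-pixel division by the standard deviation with a multiplication by a precomputed reciprocal. The NumPy sketch below shows that the two forms agree up to floating-point rounding. It is not the TensorFlow code from the commit: the function names and the mean/std values are assumptions for illustration, and it assumes `std` is a NumPy array so that `1.0 / std` is element-wise.

```python
import numpy as np

def normalize_div(image, mean, std):
    """Normalize by dividing by the per-channel std."""
    return (image - mean) / std

def normalize_mul(image, mean, std):
    """Normalize by multiplying with a precomputed inverse std."""
    invstd = (1.0 / std).astype(np.float32)  # computed once, reused for every pixel
    return (image - mean) * invstd

# Example per-channel statistics (illustrative values, assumed here).
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(4, 4, 3)).astype(np.float32)

a = normalize_div(image, mean, std)
b = normalize_mul(image, mean, std)
print(np.allclose(a, b, rtol=1e-5, atol=1e-6))
```

The per-channel arrays broadcast over the trailing channel axis of the image, so no explicit reshaping is needed in either form.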
@@ -81,7 +81,7 @@ def rpn_losses(anchor_labels, anchor_boxes, label_logits, box_logits):
add_moving_summary(*summaries)
# Per-level loss summaries in FPN may appear lower due to the use of a small placeholder.
# But the total loss is still the same. TODO make the summary op smarter
# But the total RPN loss will be fine. TODO make the summary op smarter
placeholder = 0.
label_loss = tf.nn.sigmoid_cross_entropy_with_logits(
labels=tf.to_float(valid_anchor_labels), logits=valid_label_logits)
@@ -3,7 +3,7 @@ ImageNet training code of ResNet, ShuffleNet, DoReFa-Net, AlexNet, Inception, VG
To train any of the models, just do `./{model}.py --data /path/to/ilsvrc`.
Expected format of data directory is described in [docs](http://tensorpack.readthedocs.io/en/latest/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12).
Pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/).
Some pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/).
### ShuffleNet
@@ -35,11 +35,7 @@ accuracy after 100 epochs (21 hours on 2 V100s).
It also puts in tensorboard the first-layer filter visualizations similar to the paper.
See `./alexnet.py --help` for usage.
### Inception-BN, VGG16
This Inception-BN script reaches 27% single-crop validation error after 300k steps with 6 GPUs.
The training recipe is very different from the original paper because the paper
is a bit vague on these details.
### VGG16
This VGG16 script, when trained with 32x8 batch size, reaches the following
validation error after 100 epochs (30h with 8 P100s). This is the code for the VGG
@@ -53,6 +49,12 @@ See `./vgg16.py --help` for usage.
Note that the purpose of this experiment in the paper is not to claim GroupNorm is better
than BatchNorm, therefore the training settings and hyperparameters have not been individually tuned for best accuracy.
### Inception-BN
This Inception-BN script reaches 27% single-crop validation error after 300k steps with 6 GPUs.
The training recipe is very different from the original paper because the paper
is a bit vague on these details.
### ResNet
See [ResNet examples](../ResNet). It includes variants like pre-activation
@@ -72,9 +72,10 @@ class DistributedParameterServerBuilder(DataParallelBuilder, DistributedBuilderB
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
However this implementation hasn't been well tested.
It probably still has issues in model saving, etc.
Also, the TensorFlow team is not actively maintaining distributed training features.
Check :class:`HorovodTrainer` and
`ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
for faster distributed examples.
for better distributed training support.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
@@ -143,10 +144,11 @@ class DistributedReplicatedBuilder(DataParallelBuilder, DistributedBuilderBase):
It is an equivalent of ``--variable_update=distributed_replicated`` in
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
Note that the performance of this trainer is still not satisfactory.
Note that the performance of this trainer is still not satisfactory,
and the TensorFlow team is not actively maintaining distributed training features.
Check :class:`HorovodTrainer` and
`ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
for faster distributed examples.
for better distributed training support.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
@@ -11,6 +11,7 @@
__all__ = []
# TODO should also describe model_variables
def describe_trainable_vars():
"""
Print a description of the current model parameters.
