diff --git a/docs/tutorial/inference.md b/docs/tutorial/inference.md
index b135a83a3..dbeda7d93 100644
--- a/docs/tutorial/inference.md
+++ b/docs/tutorial/inference.md
@@ -14,7 +14,7 @@ There are two ways to do inference during training.
"evaluate some tensors for each input, and aggregate the results in the end".
You can use the `InferenceRunner` interface with some `Inferencer`.
This also supports prefetching & data-parallel inference.
-
+
Currently this lacks documentation, but you can refer to examples
that use `InferenceRunner` or a custom `Inferencer` to learn more.
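
As a hedged sketch (not from the docs): hooking an `InferenceRunner` into training typically looks like the following, where `MyModel`, `dataset_train`, `dataset_val` and the tensor names are hypothetical placeholders for your own model and dataflows:

```python
from tensorpack import TrainConfig, InferenceRunner, ScalarStats, ClassificationError

# Run both inferencers over dataset_val at the end of every epoch;
# their aggregated results are reported through the training monitors.
config = TrainConfig(
    model=MyModel(),
    dataflow=dataset_train,
    callbacks=[
        InferenceRunner(
            dataset_val,
            [ScalarStats('cost'), ClassificationError('wrong-top1')]),
    ],
)
```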
@@ -55,8 +55,8 @@ predictor = OfflinePredictor(pred_config)
output1_array, output2_array = predictor(input1_array, input2_array)
```
-It's __common to use a different graph for inference__,
-e.g., use NHWC format, support encoded image format, etc.
+It's __common to use a different graph for inference__,
+e.g., use NHWC format, support encoded image format, etc.
You can make these changes inside the `model` or `tower_func` in your `PredictConfig`.
The example in [examples/basics/export-model.py](../examples/basics/export-model.py) demonstrates such an altered inference graph.
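
For reference, a minimal sketch of the `PredictConfig` behind the predictor above (hedged; `MyModel` and the tensor names are hypothetical and must match names in your own graph):

```python
from tensorpack import PredictConfig, OfflinePredictor
from tensorpack.tfutils.sessinit import get_model_loader

pred_config = PredictConfig(
    model=MyModel(),                      # may build a different graph than training
    session_init=get_model_loader('/path/to/checkpoint'),
    input_names=['input1', 'input2'],     # tensor names in the inference graph
    output_names=['output1', 'output2'],  # tensors to fetch
)
predictor = OfflinePredictor(pred_config)
```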
@@ -90,7 +90,7 @@ you can also save your models into other formats after training, so it may be mo
- Removes all unnecessary operations (training-only ops, e.g., learning-rate) to compress the graph.
This creates a self-contained graph which includes all necessary information to run inference.
-
+
To load the saved graph, you can simply:
```python
graph_def = tf.GraphDef()
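# A hedged sketch of how such a frozen graph is typically loaded with the
# plain TF1 API; the filename "frozen_graph.pb" is hypothetical:
with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name='')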
@@ -116,7 +116,7 @@ training:
1. The model (the graph): you've already written it yourself with TF symbolic functions.
Nothing about it is related to the tensorpack interface.
- If you use tensorpack layers, they are mainly just wrappers around `tf.layers`.
+ If you use tensorpack layers, they are not so different from `tf.layers`.
2. The trained parameters: tensorpack saves them in standard TF checkpoint format.
Nothing about the format is related to tensorpack.
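
Since the checkpoint is in the standard format, plain TensorFlow tools can read it. A hedged sketch (the checkpoint path is hypothetical):

```python
import tensorflow as tf

# Print every variable name and shape stored in the checkpoint.
for name, shape in tf.train.list_variables('train_log/run1/model-10000'):
    print(name, shape)
```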
@@ -139,14 +139,16 @@ with TowerContext('', is_training=False):
```eval_rst
.. note:: **Do not use metagraph for inference!**
- Metagraph is the wrong abstraction for a "model".
+ Tensorpack saves a metagraph during training. Users should not try to load it for inference.
+
+ Metagraph is the wrong abstraction for a "model".
It stores the entire graph which contains not only the mathematical model, but also all the
training settings (queues, iterators, summaries, evaluations, multi-gpu replications).
Therefore it is usually wrong to import a training metagraph for inference.
- It's especially error-prone to load a metagraph on top of a non-empty graph.
- The potential name conflicts between the current graph and the nodes in the
- metagraph can lead to esoteric bugs or sometimes completely ruin the model.
+ It's especially error-prone to load a metagraph on top of a non-empty graph.
+ The potential name conflicts between the current graph and the nodes in the
+ metagraph can lead to esoteric bugs or sometimes completely ruin the model.
It's also very common to change the graph for inference.
For example, you may need a different data layout for CPU inference,
@@ -161,7 +163,7 @@ with TowerContext('', is_training=False):
You can just use `tf.train.Saver` for all the work.
Alternatively, use tensorpack's `get_model_loader(path).init(tf.get_default_session())`.
-Now, you've already built a graph for inference, and the checkpoint is also loaded.
+Now, you've already built a graph for inference, and the checkpoint is also loaded.
You may now:
1. use `sess.run` to do inference
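
Putting it together, a hedged sketch of a hand-written inference script; `my_tower_func`, the shapes and the paths are hypothetical placeholders for your own model:

```python
import tensorflow as tf
from tensorpack.tfutils.tower import TowerContext
from tensorpack.tfutils.sessinit import get_model_loader

image = tf.placeholder(tf.float32, [None, 224, 224, 3], name='input')
with TowerContext('', is_training=False):
    logits = my_tower_func(image)  # your own graph-building function

sess = tf.Session()
get_model_loader('/path/to/checkpoint').init(sess)
predictions = sess.run(logits, feed_dict={image: image_batch})
```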
diff --git a/examples/A3C-Gym/README.md b/examples/A3C-Gym/README.md
index ae5b0af44..6586666ef 100644
--- a/examples/A3C-Gym/README.md
+++ b/examples/A3C-Gym/README.md
@@ -28,7 +28,7 @@ Some practical notes:
### To test a model:
-Download models from [model zoo](http://models.tensorpack.com/OpenAIGym/).
+Download models from [model zoo](http://models.tensorpack.com/#OpenAIGym).
Watch the agent play:
`./train-atari.py --task play --env Breakout-v0 --load Breakout-v0.npz`
diff --git a/examples/CaffeModels/README.md b/examples/CaffeModels/README.md
index c847ec77a..2e5a9cf0a 100644
--- a/examples/CaffeModels/README.md
+++ b/examples/CaffeModels/README.md
@@ -1,8 +1,7 @@
Example code to convert, load and run inference of some Caffe models.
Requires caffe python bindings to be installed.
-Converted models can also be found at [tensorpack model zoo](http://models.tensorpack.com).
-
+Converted models can also be found at [tensorpack model zoo](http://models.tensorpack.com/#Caffe-Converted).
## AlexNet:
Download: https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet
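
A hedged sketch of the conversion step these scripts perform, using tensorpack's `load_caffe` helper (the file paths are hypothetical):

```python
import numpy as np
from tensorpack.utils.loadcaffe import load_caffe

# Returns a dict mapping parameter names to numpy arrays.
params = load_caffe('deploy.prototxt', 'bvlc_alexnet.caffemodel')
np.savez_compressed('alexnet.npz', **params)
```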
diff --git a/examples/DoReFa-Net/README.md b/examples/DoReFa-Net/README.md
index 8f0e46faf..abc52f8fe 100644
--- a/examples/DoReFa-Net/README.md
+++ b/examples/DoReFa-Net/README.md
@@ -46,7 +46,7 @@ In this implementation, quantized operations are all performed through `tf.float
+ Look at the docstring in `*-dorefa.py` to see detailed usage and performance.
Pretrained model for (1,4,32)-ResNet18 and several AlexNet are available at
-[tensorpack model zoo](http://models.tensorpack.com/DoReFa-Net/).
+[tensorpack model zoo](http://models.tensorpack.com/#DoReFa-Net).
They're provided in the format of a numpy dictionary.
The __binary-weight 4-bit-activation ResNet-18__ model has 59.2% top-1 validation accuracy.
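
Since the models are numpy dictionaries, a quick hedged way to peek inside one (the filename is hypothetical):

```python
import numpy as np

params = np.load('alexnet-dorefa.npz')
# Each key is a variable name; each value is the weight array.
for name in sorted(params.keys())[:5]:
    print(name, params[name].shape)
```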
diff --git a/examples/FasterRCNN/README.md b/examples/FasterRCNN/README.md
index fb2b8fefb..792b69ecd 100644
--- a/examples/FasterRCNN/README.md
+++ b/examples/FasterRCNN/README.md
@@ -18,8 +18,8 @@ This is likely the best-performing open source TensorFlow reimplementation of th
## Dependencies
+ Python 3.3+; OpenCV
+ TensorFlow ≥ 1.6
-+ pycocotools: `for i in cython 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'; do pip install $i; done`
-+ Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/FasterRCNN/)
++ pycocotools/scipy: `for i in cython 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI' scipy; do pip install $i; done`
++ Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/#FasterRCNN)
from tensorpack model zoo
+ [COCO data](http://cocodataset.org/#download). It needs to have the following directory structure:
```
@@ -83,24 +83,24 @@ prediction have to be run with the corresponding configs used in training.
These models are trained on train2017 and evaluated on val2017 using mAP@IoU=0.50:0.95.
Unless otherwise noted, all models are fine-tuned from ImageNet pre-trained R50/R101 models in
-[tensorpack model zoo](http://models.tensorpack.com/FasterRCNN/),
+[tensorpack model zoo](http://models.tensorpack.com/#FasterRCNN),
using 8 NVIDIA V100s.
Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be reproduced.
| Backbone | mAP<br/>(box;mask) | Detectron mAP [1](#ft1)<br/>(box;mask) | Time<br/>(on 8 V100s) | Configurations<br/>(click to expand) |
| - | - | - | - | - |
- | R50-C4 | 34.1 | | 7.5h | super quick<br/>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[140000,180000,200000]` |
- | R50-C4 | 35.6 | 34.8 | 23h | standard<br/>`MODE_MASK=False` |
- | R50-FPN | 37.5 | 36.7 | 11h | standard<br/>`MODE_MASK=False MODE_FPN=True` |
- | R50-C4 | 36.2;31.8 [:arrow_down:][R50C41x] | 35.8;31.4 | 23.5h | standard<br/>this is the default, no changes in config needed |
- | R50-FPN | 38.2;34.8 | 37.7;33.9 | 13.5h | standard<br/>`MODE_FPN=True` |
- | R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 25h | 2x<br/>`MODE_FPN=True`<br/>`TRAIN.LR_SCHEDULE=2x` |
- | R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 31h | 2x+GN<br/>`MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`TRAIN.LR_SCHEDULE=2x` |
- | R50-FPN | 41.7;36.2 | | 17h | +Cascade<br/>`MODE_FPN=True FPN.CASCADE=True` |
- | R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 28h | standard<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` |
- | R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 18h | standard<br/>`MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` |
- | R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] [2](#ft2) | | 69h | 3x+Cascade+TrainAug<br/>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` |
+ | R50-C4 | 34.1 | | 7h | super quick<br/>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[140000,180000,200000]` |
+ | R50-C4 | 35.6 | 34.8 | 22.5h | standard<br/>`MODE_MASK=False` |
+ | R50-FPN | 37.5 | 36.7 | 10.5h | standard<br/>`MODE_MASK=False MODE_FPN=True` |
+ | R50-C4 | 36.2;31.8 [:arrow_down:][R50C41x] | 35.8;31.4 | 23h | standard<br/>this is the default, no changes in config needed |
+ | R50-FPN | 38.2;34.8 | 37.7;33.9 | 12.5h | standard<br/>`MODE_FPN=True` |
+ | R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 24h | 2x<br/>`MODE_FPN=True`<br/>`TRAIN.LR_SCHEDULE=2x` |
+ | R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 29h | 2x+GN<br/>`MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`TRAIN.LR_SCHEDULE=2x` |
+ | R50-FPN | 41.7;36.2 | | 16h | +Cascade<br/>`MODE_FPN=True FPN.CASCADE=True` |
+ | R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 27h | standard<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` |
+ | R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 17h | standard<br/>`MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` |
+ | R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] [2](#ft2) | | 64h | 3x+Cascade+TrainAug<br/>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` |
| R101-FPN-GN<br/>(From Scratch) | 47.7;41.7 [:arrow_down:][R101FPN9xGNCasAugScratch] [3](#ft3) | 47.4;40.5 | 28h (on 64 V100s) | 9x+GN+Cascade+TrainAug<br/>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=9x`<br/>`BACKBONE.FREEZE_AT=0` |
[R50C41x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50C41x.npz
diff --git a/examples/FasterRCNN/config.py b/examples/FasterRCNN/config.py
index 70ad7ae3f..508106c40 100644
--- a/examples/FasterRCNN/config.py
+++ b/examples/FasterRCNN/config.py
@@ -140,7 +140,7 @@ def __ne__(self, _):
# Therefore, there is *no need* to modify the config if you only change the number of GPUs.
_C.TRAIN.LR_SCHEDULE = "1x" # "1x" schedule in detectron
-_C.TRAIN.EVAL_PERIOD = 25 # period (epochs) to run evaluation
+_C.TRAIN.EVAL_PERIOD = 50 # period (epochs) to run evaluation
_C.TRAIN.CHECKPOINT_PERIOD = 20 # period (epochs) to save model
# preprocessing --------------------
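
For illustration, a hedged sketch of overriding these periods programmatically, assuming the attribute-style `config` object this file defines:

```python
from config import config as cfg

cfg.TRAIN.EVAL_PERIOD = 50        # evaluate every 50 "epochs"
cfg.TRAIN.CHECKPOINT_PERIOD = 20  # keep saving every 20 "epochs"
```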
diff --git a/examples/GAN/BEGAN.py b/examples/GAN/BEGAN.py
index d0ade4d21..2a76285dc 100755
--- a/examples/GAN/BEGAN.py
+++ b/examples/GAN/BEGAN.py
@@ -17,7 +17,7 @@
Boundary Equilibrium GAN.
See the docstring in DCGAN.py for usage.
-A pretrained model on CelebA is at http://models.tensorpack.com/GAN/
+A pretrained model on CelebA is at http://models.tensorpack.com/#GAN
"""
diff --git a/examples/GAN/DCGAN.py b/examples/GAN/DCGAN.py
index a558e645c..56a43aa74 100755
--- a/examples/GAN/DCGAN.py
+++ b/examples/GAN/DCGAN.py
@@ -30,7 +30,7 @@
You can also train on other images (just use any directory of jpg files in
`--data`). But you may need to change the preprocessing.
-A pretrained model on CelebA is at http://models.tensorpack.com/GAN/
+A pretrained model on CelebA is at http://models.tensorpack.com/#GAN
"""
diff --git a/examples/GAN/InfoGAN-mnist.py b/examples/GAN/InfoGAN-mnist.py
index 6302850ba..1b993a7c9 100755
--- a/examples/GAN/InfoGAN-mnist.py
+++ b/examples/GAN/InfoGAN-mnist.py
@@ -24,7 +24,7 @@
To visualize:
./InfoGAN-mnist.py --sample --load path/to/model
-A pretrained model is at http://models.tensorpack.com/GAN/
+A pretrained model is at http://models.tensorpack.com/#GAN
"""
BATCH = 128
diff --git a/examples/HED/README.md b/examples/HED/README.md
index ed415496d..cc51fdc9f 100644
--- a/examples/HED/README.md
+++ b/examples/HED/README.md
@@ -33,4 +33,4 @@ To run inference (produce a heatmap at each level at out*.png):
```bash
./hed.py --load pretrained.model --run a.jpg
```
-Models I trained can be downloaded [here](http://models.tensorpack.com/HED/).
+Models I trained can be downloaded [here](http://models.tensorpack.com/#HED).
diff --git a/examples/ImageNetModels/README.md b/examples/ImageNetModels/README.md
index 3ad4efb61..a6f070a1f 100644
--- a/examples/ImageNetModels/README.md
+++ b/examples/ImageNetModels/README.md
@@ -4,7 +4,7 @@ ImageNet training code of ResNet, ShuffleNet, DoReFa-Net, AlexNet, Inception, VG
To train any of the models, just do `./{model}.py --data /path/to/ilsvrc`.
More options are available in `./{model}.py --help`.
Expected format of data directory is described in [docs](http://tensorpack.readthedocs.io/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12).
-Some pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/).
+Some pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/#ImageNetModels).
### ShuffleNet
diff --git a/examples/Saliency/README.md b/examples/Saliency/README.md
index 8e33f334a..30eb48b6c 100644
--- a/examples/Saliency/README.md
+++ b/examples/Saliency/README.md
@@ -39,7 +39,7 @@ Usage:
./CAM-resnet.py --data /path/to/imagenet [--load ImageNet-ResNet18-Preact.npz] [--gpu 0,1,2,3]
```
Pretrained and fine-tuned ResNet can be downloaded
-in the [model zoo](http://models.tensorpack.com/).
+in the [model zoo](http://models.tensorpack.com/#Visualization).
2. Generate CAM on ImageNet validation set:
```bash
diff --git a/examples/SpatialTransformer/README.md b/examples/SpatialTransformer/README.md
index 275bf9008..484e3dcc1 100644
--- a/examples/SpatialTransformer/README.md
+++ b/examples/SpatialTransformer/README.md
@@ -20,7 +20,7 @@ To train (takes about 300 epochs to reach 8.8% error):
./mnist-addition.py
```
-To draw the above visualization with [pretrained model](http://models.tensorpack.com/SpatialTransformer/):
+To draw the above visualization with [pretrained model](http://models.tensorpack.com/#SpatialTransformer):
```bash
./mnist-addition.py --load mnist-addition.npz --view
```
diff --git a/examples/SuperResolution/README.md b/examples/SuperResolution/README.md
index c1137d2b6..96c74e587 100644
--- a/examples/SuperResolution/README.md
+++ b/examples/SuperResolution/README.md
@@ -35,7 +35,7 @@ python enet-pat.py --vgg19 /path/to/vgg19.npz --data train2017.lmdb
Training is highly unstable and often fails to give good results.
The pretrained model may also fail on different types of images.
-You can download and play with the pretrained model [here](http://models.tensorpack.com/SuperResolution/).
+You can download and play with the pretrained model [here](http://models.tensorpack.com/#SuperResolution).
3. Inference on an image and output in current directory:
diff --git a/tensorpack/graph_builder/training.py b/tensorpack/graph_builder/training.py
index a22be6484..a7280f84b 100644
--- a/tensorpack/graph_builder/training.py
+++ b/tensorpack/graph_builder/training.py
@@ -304,7 +304,7 @@ def build(self, grad_list, get_opt_fn):
             with tf.name_scope('sync_variables'):
                 post_init_op = SyncMultiGPUReplicatedBuilder.get_post_init_ops()
         else:
-            post_init_op = tf.no_op(name='empty_sync_variables')
+            post_init_op = None
         return train_op, post_init_op
# Adopt from https://github.com/tensorflow/benchmarks/blob/master/scripts/tf_cnn_benchmarks/variable_mgr.py
diff --git a/tensorpack/train/trainers.py b/tensorpack/train/trainers.py
index 476590fe5..0afd52f48 100644
--- a/tensorpack/train/trainers.py
+++ b/tensorpack/train/trainers.py
@@ -190,13 +190,16 @@ def _setup_graph(self, input, get_cost_fn, get_opt_fn):
         grad_list = self._builder.call_for_each_tower(tower_fn)
         self.train_op, post_init_op = self._builder.build(grad_list, get_opt_fn)
-        cb = RunOp(
-            post_init_op,
-            run_before=True,
-            run_as_trigger=self.BROADCAST_EVERY_EPOCH,
-            verbose=True)
-        cb.name_scope = "SyncVariables"
-        return [cb]
+        if post_init_op is not None:
+            cb = RunOp(
+                post_init_op,
+                run_before=True,
+                run_as_trigger=self.BROADCAST_EVERY_EPOCH,
+                verbose=True)
+            cb.name_scope = "SyncVariables"
+            return [cb]
+        else:
+            return []
class DistributedTrainerBase(SingleCostTrainer):