diff --git a/docs/tutorial/inference.md b/docs/tutorial/inference.md
index b135a83a3..dbeda7d93 100644
--- a/docs/tutorial/inference.md
+++ b/docs/tutorial/inference.md
@@ -14,7 +14,7 @@ There are two ways to do inference during training.
    "evaluate some tensors for each input, and aggregate the results in the end".
    You can use the `InferenceRunner` interface with some `Inferencer`.
    This will further support prefetch & data-parallel inference.
-   
+
    Currently this lacks documentation, but you can refer to examples
    that uses `InferenceRunner` or custom `Inferencer` to learn more.
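+
+   For instance, a minimal sketch (illustrative only; `MyModel`, `train_df`
+   and `val_df` are placeholders for your own model and dataflows, and the
+   tensor names must match the ones defined in your graph):
+
+   ```python
+   from tensorpack import (TrainConfig, InferenceRunner,
+                           ScalarStats, ClassificationError)
+
+   config = TrainConfig(
+       model=MyModel(),
+       dataflow=train_df,
+       callbacks=[
+           # run every Inferencer over val_df after each epoch:
+           # log the average 'cost' and the error rate from 'wrong-top1'
+           InferenceRunner(val_df,
+                           [ScalarStats('cost'),
+                            ClassificationError('wrong-top1')]),
+       ],
+   )
+   ```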
@@ -55,8 +55,8 @@ predictor = OfflinePredictor(pred_config)
 output1_array, output2_array = predictor(input1_array, input2_array)
 ```
 
-It's __common to use a different graph for inference__, 
-e.g., use NHWC format, support encoded image format, etc. 
+It's __common to use a different graph for inference__,
+e.g., use NHWC format, support encoded image format, etc.
 You can make these changes inside the `model` or `tower_func` in your `PredictConfig`.
 The example in [examples/basics/export-model.py](../examples/basics/export-model.py) demonstrates such an altered inference graph.
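+
+As an illustration (`MyInferenceModel` and the tensor names below are
+placeholders, not part of this example), such a config might look like:
+
+```python
+from tensorpack import PredictConfig, OfflinePredictor, get_model_loader
+
+pred_config = PredictConfig(
+    model=MyInferenceModel(),   # may build a different graph than training
+    session_init=get_model_loader('/path/to/checkpoint'),
+    input_names=['input'],      # tensor names in the inference graph
+    output_names=['prob'])
+predictor = OfflinePredictor(pred_config)
+prob = predictor(input_array)[0]   # returns a list, one array per output name
+```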
@@ -90,7 +90,7 @@ you can also save your models into other formats after training, so it may be mo
   - Removes all unnecessary operations (training-only ops, e.g., learning-rate) to compress the graph.
 
   This creates a self-contained graph which includes all necessary information to run inference.
-  
+
  To load the saved graph, you can simply:
  ```python
  graph_def = tf.GraphDef()
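+ # a possible continuation of the snippet (sketch only;
+ # `model_path` is a placeholder for your frozen-graph file):
+ graph_def.ParseFromString(open(model_path, 'rb').read())
+ tf.import_graph_def(graph_def)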
@@ -116,7 +116,7 @@ training:
 
 1. The model (the graph): you've already written it yourself with TF symbolic functions.
    Nothing about it is related to the tensorpack interface.
-   If you use tensorpack layers, they are mainly just wrappers around `tf.layers`.
+   If you use tensorpack layers, they are not so different from `tf.layers`.
 
 2. The trained parameters: tensorpack saves them in standard TF checkpoint format.
    Nothing about the format is related to tensorpack.
@@ -139,14 +139,16 @@ with TowerContext('', is_training=False):
 ```eval_rst
 .. note:: **Do not use metagraph for inference!**
 
-   Metagraph is the wrong abstraction for a "model".
+   Tensorpack saves a metagraph during training. Users should not try to load it for inference.
+
+   Metagraph is the wrong abstraction for a "model".
    It stores the entire graph which contains not only the mathematical model, but also all the
    training settings (queues, iterators, summaries, evaluations, multi-gpu replications).
    Therefore it is usually wrong to import a training metagraph for inference.
 
-   It's especially error-prone to load a metagraph on top of a non-empty graph. 
-   The potential name conflicts between the current graph and the nodes in the 
-   metagraph can lead to esoteric bugs or sometimes completely ruin the model. 
+   It's especially error-prone to load a metagraph on top of a non-empty graph.
+   The potential name conflicts between the current graph and the nodes in the
+   metagraph can lead to esoteric bugs or sometimes completely ruin the model.
 
    It's also very common to change the graph for inference.
    For example, you may need a different data layout for CPU inference,
@@ -161,7 +163,7 @@ with TowerContext('', is_training=False):
 You can just use `tf.train.Saver` for all the work.
 Alternatively, use tensorpack's `get_model_loader(path).init(tf.get_default_session())`
 
-Now, you've already built a graph for inference, and the checkpoint is also loaded. 
+Now, you've already built a graph for inference, and the checkpoint is also loaded.
 You may now:
 
 1. use `sess.run` to do inference
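+
+   For example, a bare-TF sketch (illustrative only; `MyModel`, the tensor
+   names and `batch_of_images` are placeholders for your own graph and data):
+
+   ```python
+   import tensorflow as tf
+   from tensorpack.tfutils.tower import TowerContext
+   from tensorpack.tfutils.sessinit import get_model_loader
+
+   image = tf.placeholder(tf.float32, [None, 224, 224, 3], name='input')
+   with TowerContext('', is_training=False):
+       MyModel().build_graph(image)      # build the inference graph
+   prob = tf.get_default_graph().get_tensor_by_name('prob:0')
+
+   with tf.Session() as sess:
+       get_model_loader('/path/to/checkpoint').init(sess)
+       print(sess.run(prob, feed_dict={image: batch_of_images}))
+   ```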
diff --git a/examples/A3C-Gym/README.md b/examples/A3C-Gym/README.md
index ae5b0af44..6586666ef 100644
--- a/examples/A3C-Gym/README.md
+++ b/examples/A3C-Gym/README.md
@@ -28,7 +28,7 @@ Some practicical notes:
 
 ### To test a model:
 
-Download models from [model zoo](http://models.tensorpack.com/OpenAIGym/).
+Download models from [model zoo](http://models.tensorpack.com/#OpenAIGym).
 
 Watch the agent play:
 `./train-atari.py --task play --env Breakout-v0 --load Breakout-v0.npz`
diff --git a/examples/CaffeModels/README.md b/examples/CaffeModels/README.md
index c847ec77a..2e5a9cf0a 100644
--- a/examples/CaffeModels/README.md
+++ b/examples/CaffeModels/README.md
@@ -1,8 +1,7 @@
 Example code to convert, load and run inference of some Caffe models.
 Require caffe python bindings to be installed.
 
-Converted models can also be found at [tensorpack model zoo](http://models.tensorpack.com).
-
+Converted models can also be found at [tensorpack model zoo](http://models.tensorpack.com/#Caffe-Converted).
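+
+A converted model is a numpy dictionary of parameter names to values, so it
+can be loaded without caffe. A sketch (the file name is a placeholder):
+
+```python
+import numpy as np
+from tensorpack.tfutils.sessinit import DictRestore
+
+# use as the session_init of a TrainConfig / PredictConfig
+session_init = DictRestore(dict(np.load('alexnet.npz')))
+```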
 
 ## AlexNet:
 Download: https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet
diff --git a/examples/DoReFa-Net/README.md b/examples/DoReFa-Net/README.md
index 8f0e46faf..abc52f8fe 100644
--- a/examples/DoReFa-Net/README.md
+++ b/examples/DoReFa-Net/README.md
@@ -46,7 +46,7 @@ In this implementation, quantized operations are all performed through `tf.float
 + Look at the docstring in `*-dorefa.py` to see detailed usage and performance.
 
 Pretrained model for (1,4,32)-ResNet18 and several AlexNet are available at
-[tensorpack model zoo](http://models.tensorpack.com/DoReFa-Net/).
+[tensorpack model zoo](http://models.tensorpack.com/#DoReFa-Net).
 They're provided in the format of numpy dictionary.
 The __binary-weight 4-bit-activation ResNet-18__ model has 59.2% top-1 validation accuracy.
diff --git a/examples/FasterRCNN/README.md b/examples/FasterRCNN/README.md
index fb2b8fefb..792b69ecd 100644
--- a/examples/FasterRCNN/README.md
+++ b/examples/FasterRCNN/README.md
@@ -18,8 +18,8 @@ This is likely the best-performing open source TensorFlow reimplementation of th
 ## Dependencies
 + Python 3.3+; OpenCV
 + TensorFlow ≥ 1.6
-+ pycocotools: `for i in cython 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'; do pip install $i; done`
-+ Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/FasterRCNN/)
++ pycocotools/scipy: `for i in cython 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI' scipy; do pip install $i; done`
++ Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/#FasterRCNN)
   from tensorpack model zoo
 + [COCO data](http://cocodataset.org/#download). It needs to have the following directory structure:
 ```
@@ -83,24 +83,24 @@ prediction have to be run with the corresponding configs used in training.
 These models are trained on train2017 and evaluated on val2017 using mAP@IoU=0.50:0.95.
 Unless otherwise noted, all models are fine-tuned from ImageNet pre-trained R50/R101 models in
-[tensorpack model zoo](http://models.tensorpack.com/FasterRCNN/),
+[tensorpack model zoo](http://models.tensorpack.com/#FasterRCNN),
 using 8 NVIDIA V100s.
 Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be reproduced.
 
 | Backbone | mAP<br/>(box;mask) | Detectron mAP <sup>[1](#ft1)</sup><br/>(box;mask) | Time <br/>(on 8 V100s) | Configurations <br/>(click to expand) |
 | - | - | - | - | - |
-| R50-C4 | 34.1 | | 7.5h | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[140000,180000,200000]` </details> |
-| R50-C4 | 35.6 | 34.8 | 23h | <details><summary>standard</summary>`MODE_MASK=False` </details> |
-| R50-FPN | 37.5 | 36.7 | 11h | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> |
-| R50-C4 | 36.2;31.8 [:arrow_down:][R50C41x] | 35.8;31.4 | 23.5h | <details><summary>standard</summary>this is the default, no changes in config needed </details> |
-| R50-FPN | 38.2;34.8 | 37.7;33.9 | 13.5h | <details><summary>standard</summary>`MODE_FPN=True` </details> |
-| R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 25h | <details><summary>2x</summary>`MODE_FPN=True`<br/>`TRAIN.LR_SCHEDULE=2x` </details> |
-| R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 31h | <details><summary>2x+GN</summary>`MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`TRAIN.LR_SCHEDULE=2x` </details> |
-| R50-FPN | 41.7;36.2 | | 17h | <details><summary>+Cascade</summary>`MODE_FPN=True FPN.CASCADE=True` </details> |
-| R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 28h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
-| R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 18h | <details><summary>standard</summary>`MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
-| R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] <sup>[2](#ft2)</sup> | | 69h | <details><summary>3x+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` </details> |
+| R50-C4 | 34.1 | | 7h | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[140000,180000,200000]` </details> |
+| R50-C4 | 35.6 | 34.8 | 22.5h | <details><summary>standard</summary>`MODE_MASK=False` </details> |
+| R50-FPN | 37.5 | 36.7 | 10.5h | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> |
+| R50-C4 | 36.2;31.8 [:arrow_down:][R50C41x] | 35.8;31.4 | 23h | <details><summary>standard</summary>this is the default, no changes in config needed </details> |
+| R50-FPN | 38.2;34.8 | 37.7;33.9 | 12.5h | <details><summary>standard</summary>`MODE_FPN=True` </details> |
+| R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 24h | <details><summary>2x</summary>`MODE_FPN=True`<br/>`TRAIN.LR_SCHEDULE=2x` </details> |
+| R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 29h | <details><summary>2x+GN</summary>`MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`TRAIN.LR_SCHEDULE=2x` </details> |
+| R50-FPN | 41.7;36.2 | | 16h | <details><summary>+Cascade</summary>`MODE_FPN=True FPN.CASCADE=True` </details> |
+| R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 27h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
+| R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 17h | <details><summary>standard</summary>`MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
+| R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] <sup>[2](#ft2)</sup> | | 64h | <details><summary>3x+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` </details> |
 | R101-FPN-GN<br/>(From Scratch) | 47.7;41.7 [:arrow_down:][R101FPN9xGNCasAugScratch] <sup>[3](#ft3)</sup> | 47.4;40.5 | 28h (on 64 V100s) | <details><summary>9x+GN+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=9x`<br/>`BACKBONE.FREEZE_AT=0` </details> |
 
 [R50C41x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50C41x.npz
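+
+The configurations in the table are command-line overrides, passed to the
+training script as e.g. `./train.py --config MODE_FPN=True TRAIN.LR_SCHEDULE=2x`.
+A programmatic sketch of the same overrides (assuming the `update_args`
+helper defined in this example's `config.py`):
+
+```python
+from config import config as cfg
+
+# each entry is a 'KEY=VALUE' string, keys as in the table above
+cfg.update_args(['MODE_FPN=True', 'TRAIN.LR_SCHEDULE=2x'])
+```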
diff --git a/examples/FasterRCNN/config.py b/examples/FasterRCNN/config.py
index 70ad7ae3f..508106c40 100644
--- a/examples/FasterRCNN/config.py
+++ b/examples/FasterRCNN/config.py
@@ -140,7 +140,7 @@ def __ne__(self, _):
 # Therefore, there is *no need* to modify the config if you only change the number of GPUs.
 _C.TRAIN.LR_SCHEDULE = "1x"      # "1x" schedule in detectron
-_C.TRAIN.EVAL_PERIOD = 25  # period (epochs) to run evaluation
+_C.TRAIN.EVAL_PERIOD = 50  # period (epochs) to run evaluation
 _C.TRAIN.CHECKPOINT_PERIOD = 20  # period (epochs) to save model
 
 # preprocessing --------------------
diff --git a/examples/GAN/BEGAN.py b/examples/GAN/BEGAN.py
index d0ade4d21..2a76285dc 100755
--- a/examples/GAN/BEGAN.py
+++ b/examples/GAN/BEGAN.py
@@ -17,7 +17,7 @@
 Boundary Equilibrium GAN.
 See the docstring in DCGAN.py for usage.
 
-A pretrained model on CelebA is at http://models.tensorpack.com/GAN/
+A pretrained model on CelebA is at http://models.tensorpack.com/#GAN
 """
diff --git a/examples/GAN/DCGAN.py b/examples/GAN/DCGAN.py
index a558e645c..56a43aa74 100755
--- a/examples/GAN/DCGAN.py
+++ b/examples/GAN/DCGAN.py
@@ -30,7 +30,7 @@
 You can also train on other images (just use any directory of jpg files in
 `--data`). But you may need to change the preprocessing.
 
-A pretrained model on CelebA is at http://models.tensorpack.com/GAN/
+A pretrained model on CelebA is at http://models.tensorpack.com/#GAN
 """
diff --git a/examples/GAN/InfoGAN-mnist.py b/examples/GAN/InfoGAN-mnist.py
index 6302850ba..1b993a7c9 100755
--- a/examples/GAN/InfoGAN-mnist.py
+++ b/examples/GAN/InfoGAN-mnist.py
@@ -24,7 +24,7 @@
 To visualize:
     ./InfoGAN-mnist.py --sample --load path/to/model
 
-A pretrained model is at http://models.tensorpack.com/GAN/
+A pretrained model is at http://models.tensorpack.com/#GAN
 """
 
 BATCH = 128
diff --git a/examples/HED/README.md b/examples/HED/README.md
index ed415496d..cc51fdc9f 100644
--- a/examples/HED/README.md
+++ b/examples/HED/README.md
@@ -33,4 +33,4 @@ To inference (produce a heatmap at each level at out*.png):
 ```bash
 ./hed.py --load pretrained.model --run a.jpg
 ```
-Models I trained can be downloaded [here](http://models.tensorpack.com/HED/).
+Models I trained can be downloaded [here](http://models.tensorpack.com/#HED).
diff --git a/examples/ImageNetModels/README.md b/examples/ImageNetModels/README.md
index 3ad4efb61..a6f070a1f 100644
--- a/examples/ImageNetModels/README.md
+++ b/examples/ImageNetModels/README.md
@@ -4,7 +4,7 @@ ImageNet training code of ResNet, ShuffleNet, DoReFa-Net, AlexNet, Inception, VG
 To train any of the models, just do `./{model}.py --data /path/to/ilsvrc`.
 More options are available in `./{model}.py --help`.
 Expected format of data directory is described in [docs](http://tensorpack.readthedocs.io/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12).
-Some pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/).
+Some pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/#ImageNetModels).
 
 ### ShuffleNet
diff --git a/examples/Saliency/README.md b/examples/Saliency/README.md
index 8e33f334a..30eb48b6c 100644
--- a/examples/Saliency/README.md
+++ b/examples/Saliency/README.md
@@ -39,7 +39,7 @@ Usage:
    ./CAM-resnet.py --data /path/to/imagenet [--load ImageNet-ResNet18-Preact.npz] [--gpu 0,1,2,3]
    ```
    Pretrained and fine-tuned ResNet can be downloaded
-   in the [model zoo](http://models.tensorpack.com/). 
+   in the [model zoo](http://models.tensorpack.com/#Visualization).
 
 2. Generate CAM on ImageNet validation set:
    ```bash
diff --git a/examples/SpatialTransformer/README.md b/examples/SpatialTransformer/README.md
index 275bf9008..484e3dcc1 100644
--- a/examples/SpatialTransformer/README.md
+++ b/examples/SpatialTransformer/README.md
@@ -20,7 +20,7 @@ To train (takes about 300 epochs to reach 8.8% error):
 ./mnist-addition.py
 ```
 
-To draw the above visualization with [pretrained model](http://models.tensorpack.com/SpatialTransformer/):
+To draw the above visualization with [pretrained model](http://models.tensorpack.com/#SpatialTransformer):
 ```bash
 ./mnist-addition.py --load mnist-addition.npz --view
 ```
diff --git a/examples/SuperResolution/README.md b/examples/SuperResolution/README.md
index c1137d2b6..96c74e587 100644
--- a/examples/SuperResolution/README.md
+++ b/examples/SuperResolution/README.md
@@ -35,7 +35,7 @@ python enet-pat.py --vgg19 /path/to/vgg19.npz --data train2017.lmdb
 
 Training is highly unstable and does not often give good results.
 The pretrained model may also fail on different types of images.
-You can download and play with the pretrained model [here](http://models.tensorpack.com/SuperResolution/).
+You can download and play with the pretrained model [here](http://models.tensorpack.com/#SuperResolution).
 
 3. Inference on an image and output in current directory:
diff --git a/tensorpack/graph_builder/training.py b/tensorpack/graph_builder/training.py
index a22be6484..a7280f84b 100644
--- a/tensorpack/graph_builder/training.py
+++ b/tensorpack/graph_builder/training.py
@@ -304,7 +304,7 @@ def build(self, grad_list, get_opt_fn):
             with tf.name_scope('sync_variables'):
                 post_init_op = SyncMultiGPUReplicatedBuilder.get_post_init_ops()
         else:
-            post_init_op = tf.no_op(name='empty_sync_variables')
+            post_init_op = None
         return train_op, post_init_op
 
     # Adopt from https://github.com/tensorflow/benchmarks/blob/master/scripts/tf_cnn_benchmarks/variable_mgr.py
diff --git a/tensorpack/train/trainers.py b/tensorpack/train/trainers.py
index 476590fe5..0afd52f48 100644
--- a/tensorpack/train/trainers.py
+++ b/tensorpack/train/trainers.py
@@ -190,13 +190,16 @@ def _setup_graph(self, input, get_cost_fn, get_opt_fn):
         grad_list = self._builder.call_for_each_tower(tower_fn)
         self.train_op, post_init_op = self._builder.build(grad_list, get_opt_fn)
 
-        cb = RunOp(
-            post_init_op,
-            run_before=True,
-            run_as_trigger=self.BROADCAST_EVERY_EPOCH,
-            verbose=True)
-        cb.name_scope = "SyncVariables"
-        return [cb]
+        if post_init_op is not None:
+            cb = RunOp(
+                post_init_op,
+                run_before=True,
+                run_as_trigger=self.BROADCAST_EVERY_EPOCH,
+                verbose=True)
+            cb.name_scope = "SyncVariables"
+            return [cb]
+        else:
+            return []
 
 
 class DistributedTrainerBase(SingleCostTrainer):