update docs
ppwwyyxx committed Sep 2, 2020
1 parent 63f656c commit 229e991
Showing 3 changed files with 17 additions and 9 deletions.
8 changes: 4 additions & 4 deletions examples/PennTreebank/README.md
@@ -6,12 +6,12 @@ This example is mainly to demonstrate:
1. How to train an RNN with persistent state between iterations. Here it simply manages the state inside the graph.
2. How to use a TF reader pipeline instead of a DataFlow, for both training & inference.

-It trains an language model on PTB dataset, basically an equivalent of the PTB example
-in [tensorflow/models](https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb)
+It trains a language model on the PTB dataset, and reimplements an equivalent of the PTB example
+in [tensorflow/models](https://github.com/tensorflow/models/blob/v1.13.0/tutorials/rnn/ptb/ptb_word_lm.py)
with its "medium" config.
-It has the same performance & speed as the original example as well.
+It has the same performance as the original example.

-Note that the data pipeline is completely copied from the tensorflow example.
+Note that the input data pipeline is completely copied from the tensorflow example.
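As a rough illustration of point 1 above (keeping the RNN state inside the graph between iterations), one way to do this is to hold the state in non-trainable variables and write it back after each step. The following is only a minimal sketch with made-up shapes and plain TF1 APIs, not the example's actual code:

```python
import tensorflow as tf

# Minimal sketch (not the example's code): persist LSTM state across
# iterations by storing it in non-trainable variables inside the graph.
BATCH, HIDDEN = 20, 650  # made-up sizes

cell = tf.nn.rnn_cell.LSTMCell(HIDDEN)
state_c = tf.get_variable('state_c', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())
state_h = tf.get_variable('state_h', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())

inputs = tf.placeholder(tf.float32, [BATCH, None, HIDDEN])
outputs, new_state = tf.nn.dynamic_rnn(
    cell, inputs,
    initial_state=tf.nn.rnn_cell.LSTMStateTuple(state_c, state_h))

# Write the final state back so the next sess.run() continues from it.
update_state = tf.group(tf.assign(state_c, new_state.c),
                        tf.assign(state_h, new_state.h))
with tf.control_dependencies([update_state]):
    outputs = tf.identity(outputs)
```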

To Train:
```
6 changes: 3 additions & 3 deletions tensorpack/models/batch_norm.py
@@ -163,12 +163,12 @@ def BatchNorm(inputs, axis=None, *, training=None, momentum=0.9, epsilon=1e-5,
* "default": same as "collection". Because this is the default behavior in TensorFlow.
* "skip": do not update EMA. This can be useful when you reuse a batch norm layer in several places
but do not want them to all update your EMA.
* "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS`.
* "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS` in the first training tower.
The ops in the collection will be run automatically by the callback :class:`RunUpdateOps`, along with
your training iterations. This can waste compute if your training iterations do not always depend
on the BatchNorm layer.
* "internal": EMA is updated inside this layer itself by control dependencies.
In standard scenarios, it has similar speed to "collection". But it has some more benefits:
* "internal": EMA is updated in the first training tower inside this layer itself by control dependencies.
In standard scenarios, it has similar speed to "collection". But it supports more scenarios:
1. BatchNorm is used inside dynamic control flow.
The collection-based update does not support dynamic control flows.
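As a usage sketch for the `ema_update` options documented above: the argument name follows the docstring, but the surrounding model code is made up.

```python
# Sketch: choosing an EMA update strategy per layer. The `ema_update` values
# follow the docstring above; the model itself is made up.
from tensorpack.models import BatchNorm, Conv2D

def build_graph(image):
    x = Conv2D('conv1', image, 64, 3)
    # "internal": EMA updated via control dependencies inside the layer,
    # so it also works under dynamic control flow.
    x = BatchNorm('bn1', x, ema_update='internal')
    # "skip": reuse a normalization layer without touching its EMA.
    x = BatchNorm('bn2', x, ema_update='skip')
    return x
```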
12 changes: 10 additions & 2 deletions tensorpack/train/trainers.py
@@ -158,7 +158,11 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
are supposed to be in-sync).
But this cheap operation may help prevent
certain numerical issues in practice.
-Note that in cases such as BatchNorm, the variables may not be in sync.
+Note that in cases such as BatchNorm, the variables may not be in sync:
+e.g., non-master workers may not maintain EMAs.
+For benchmarking, disable this option.
"""

@map_arg(gpus=_int_to_range)
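The broadcast behavior described in the docstring above is controlled by a trainer attribute; the name `BROADCAST_EVERY_EPOCH` below is assumed from recent tensorpack versions. A sketch of disabling it for benchmarking:

```python
# Sketch: disable the per-epoch variable broadcast when benchmarking raw
# training speed. The attribute name BROADCAST_EVERY_EPOCH is assumed here.
from tensorpack import SyncMultiGPUTrainerReplicated

trainer = SyncMultiGPUTrainerReplicated(gpus=8)
trainer.BROADCAST_EVERY_EPOCH = False  # keep the default (True) for real training
```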
@@ -403,7 +407,11 @@ class HorovodTrainer(SingleCostTrainer):
Theoretically this is a no-op (because the variables
are supposed to be in-sync).
But this cheap operation may help prevent certain numerical issues in practice.
-Note that in cases such as BatchNorm, the variables may not be in sync.
+Note that in cases such as BatchNorm, the variables may not be in sync:
+e.g., non-master workers may not maintain EMAs.
+For benchmarking, disable this option.
"""

def __init__(self, average=True, compression=None):
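Based on the `__init__` signature shown above, a construction sketch for the Horovod trainer; the fp16 compression value comes from Horovod itself and is only one possible choice:

```python
# Sketch based on the signature above; requires Horovod built with TensorFlow.
import horovod.tensorflow as hvd
from tensorpack import HorovodTrainer

trainer = HorovodTrainer(
    average=True,                      # average (rather than sum) gradients across workers
    compression=hvd.Compression.fp16)  # optional: compress allreduce traffic to fp16
```

Such a trainer is typically passed to `launch_train_with_config` and launched with `mpirun` or `horovodrun`, as in tensorpack's usual multi-process workflow.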
