update docs
ppwwyyxx committed May 16, 2019
1 parent f6313a0 commit 1e9342a
Showing 2 changed files with 29 additions and 12 deletions.
19 changes: 12 additions & 7 deletions docs/tutorial/efficient-dataflow.md
Since it is simply a generator interface, you can use the DataFlow in any Python-based framework,
or your own code as well.
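Concretely, a DataFlow is just an iterable that yields datapoints (lists of components). A minimal pure-Python sketch of the interface (the class name and contents here are illustrative, not a tensorpack API):

```python
class ToyDataFlow:
    """Anything whose __iter__ yields datapoints (lists of components)
    can act as a DataFlow-style data source."""
    def __init__(self, size):
        self.size = size

    def __iter__(self):
        for i in range(self.size):
            yield [i, i * i]  # a datapoint with two components

points = list(ToyDataFlow(3))
```

Any consumer that accepts an iterable — a training loop, a generator adapter, or plain Python code — can drain it.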



**What we are going to do**: We'll use the ILSVRC12 dataset, which contains 1.28 million images.
The original images (JPEG compressed) are 140G in total.
The average resolution is about 400x350 <sup>[[1]]</sup>.
Some things to know before reading:
before doing any optimizations.

The benchmark code for this tutorial can be found in [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks/tree/master/ImageNet),
including comparison with a similar pipeline built with `tf.data`.

## Random Read

### Basic
We start from a simple DataFlow:
```python
from tensorpack.dataflow import *
ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
ds1 = BatchData(ds0, 256, use_list=True)
TestDataSpeed(ds1).start()
```

On a good filesystem you probably can already observe good speed here (e.g. 5 it/s, i.e. 1280 images/s),
because we are doing heavy random read on the filesystem (regardless of whether `shuffle` is True).
Image decoding in `cv2.imread` could also be a bottleneck at this early stage.
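Before optimizing anything, it helps to benchmark raw iteration speed. A simplified, pure-Python stand-in for tensorpack's `TestDataSpeed` (the function name here is ours):

```python
import time

def measure_speed(df, nr_iter=1000):
    """Iterate `df` for nr_iter datapoints and return datapoints/sec."""
    it = iter(df)
    start = time.time()
    for _ in range(nr_iter):
        next(it)
    return nr_iter / (time.time() - start)

# Benchmark a trivial generator; substitute your real DataFlow here.
rate = measure_speed(([i] for i in range(10**6)), nr_iter=1000)
```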

### Parallel Prefetch

We will now add the cheapest pre-processing to get an ndarray in the end instead of a list
(because training will need ndarray eventually):
```eval_rst
.. code-block:: python
    :emphasize-lines: 2

    ds = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
    ds = AugmentImageComponent(ds, [imgaug.Resize(224)])
    ds = BatchData(ds, 256)
```

Now it's time to add threads or processes:
```python
ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
ds1 = AugmentImageComponent(ds0, [imgaug.Resize(224)])
ds = PrefetchDataZMQ(ds1, nr_proc=25)
ds = BatchData(ds, 256)
```
Here we fork 25 processes to run `ds1`, and collect their output through the ZMQ IPC protocol,
which is faster than `multiprocessing.Queue`. You can also apply prefetch after batch, of course.
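What process-based prefetch buys you is overlap: the producer keeps running while the consumer trains. The idea can be sketched with a background thread and a bounded queue (a simplification — `PrefetchDataZMQ` uses forked processes and ZMQ, not a thread and `queue.Queue`):

```python
import queue
import threading

def prefetch(df, buffer_size=16):
    """Run `df` in a background thread, buffering up to buffer_size
    datapoints so production overlaps with consumption."""
    q = queue.Queue(maxsize=buffer_size)
    _END = object()  # sentinel marking exhaustion

    def worker():
        for dp in df:
            q.put(dp)
        q.put(_END)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        dp = q.get()
        if dp is _END:
            return
        yield dp

out = list(prefetch([[i] for i in range(5)]))
```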

### Parallel Map
The above DataFlow might be fast, but since it forks the ImageNet reader (`ds0`),
it's **not a good idea to use it for validation** (for reasons mentioned at top; more details in the [documentation](../modules/dataflow.html#tensorpack.dataflow.PrefetchDataZMQ)).
Alternatively, you can use multi-threaded preprocessing like this:
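In pure Python, the thread-based mapping idea looks roughly like this (a simplified stand-in for tensorpack's `MultiThreadMapData`; the map function here is a toy):

```python
from concurrent.futures import ThreadPoolExecutor

def multithread_map(df, map_func, nr_thread=4):
    """Apply map_func to each datapoint with a thread pool,
    preserving input order (Executor.map keeps ordering)."""
    with ThreadPoolExecutor(max_workers=nr_thread) as pool:
        yield from pool.map(map_func, df)

out = list(multithread_map(range(5), lambda x: x * x, nr_thread=2))
```

Because only the mapper runs in threads, the single reader is not duplicated — which is why this form is safe for validation.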

Let's summarize what the above dataflow does:
3. Both 1 and 2 happen together in a separate process, and the results are sent back to the main process through ZeroMQ.
4. The main process makes batches, and other tensorpack modules will then take care of how they should go into the graph.
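Step 4 — making batches — can be sketched in pure Python (tensorpack's `BatchData` additionally stacks components into ndarrays; this toy version keeps lists):

```python
def batch_data(df, batch_size):
    """Group consecutive datapoints into component-wise batches."""
    batch = []
    for dp in df:
        batch.append(dp)
        if len(batch) == batch_size:
            # transpose: a list of datapoints -> a list of components
            yield [list(comp) for comp in zip(*batch)]
            batch = []

batches = list(batch_data(([i, i * i] for i in range(4)), 2))
```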

Note that in an actual training setup, I used the above multiprocess version for the training set, since
it's faster to run heavy preprocessing in processes, and used this multithread version only for the validation set.
There is also `MultiProcessMapData` for you to use.

## Sequential Read

### Save and Load a Single-File DataFlow
Random read may not be a good idea when the data is not on an SSD.
We can also dump the dataset into one single LMDB file and read it sequentially.
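The single-file idea, illustrated in pure Python with `pickle` instead of LMDB (tensorpack's actual serializers use LMDB; all names here are illustrative):

```python
import os
import pickle
import tempfile

def dump_dataflow(df, path):
    """Serialize every datapoint of `df` into a single file, in order."""
    with open(path, "wb") as f:
        for dp in df:
            pickle.dump(dp, f)

def load_dataflow(path):
    """Read the datapoints back in sequence -- no random seeks needed."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

fd, path = tempfile.mkstemp()
os.close(fd)
dump_dataflow(([i, i * i] for i in range(3)), path)
restored = list(load_dataflow(path))
os.remove(path)
```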

A `LocallyShuffleData` layer added above maintains a buffer of datapoints and shuffles them once in a while.
It will not affect the model as long as the buffer is large enough,
but it will also consume more memory if the buffer is too large.
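The buffered-shuffle trick can be sketched as follows (a simplified analogue of what the shuffle layer does; the fixed seed is only to make the sketch reproducible):

```python
import random

def local_shuffle(df, buffer_size):
    """Keep a buffer of buffer_size datapoints; once full, emit a
    randomly chosen element for each new arrival. Approximates a
    full shuffle when the buffer is large."""
    rng = random.Random(0)
    buf = []
    for dp in df:
        buf.append(dp)
        if len(buf) >= buffer_size:
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    rng.shuffle(buf)  # drain the remainder
    for dp in buf:
        yield dp

out = list(local_shuffle(range(10), buffer_size=4))
```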

### Augmentations & Parallel Prefetch

Then we add the necessary transformations (JPEG decoding and augmentations).

So DataFlow will not be a serious bottleneck if configured properly.

## Distributed DataFlow

To further scale your DataFlow, you can even run it on multiple machines and collect the results on the
training machine. E.g.:
```python
# Data Machine #1, process 1-20:
```

22 changes: 17 additions & 5 deletions docs/tutorial/extend/input-source.md
down your training by 10%. Think about how many more copies are made during your preprocessing.

Failure to hide the data preparation latency is the major reason why people
cannot see good GPU utilization. You should __always choose a framework that enables latency hiding.__
However most other TensorFlow wrappers are designed without latency hiding in mind.
Tensorpack has built-in mechanisms to hide latency of the above stages.
This is one of the reasons why tensorpack is [faster](https://github.com/tensorpack/benchmarks).

People often think they should use `tf.data` because it's fast.
* Indeed it's often fast, but not necessarily. With Python you have access to many other fast libraries, which might be unsupported in TF.
* Python may be just fast enough.

Keep in mind: as long as data loading speed can keep up with training, and the latency of all four blocks in the
above figure is hidden, __a faster reader brings no gains to overall throughput__.
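A back-of-the-envelope model makes this concrete (all numbers are hypothetical): when the reader runs in parallel with training, each step costs roughly the maximum of the two stage latencies, so speeding up a reader that is already hidden changes nothing.

```python
def step_time(reader_ms, train_ms):
    """Per-step cost when the reader overlaps with training:
    the slower stage dominates."""
    return max(reader_ms, train_ms)

train_ms = 100.0                          # hypothetical GPU time per batch
slow_reader = step_time(80.0, train_ms)   # reader already hidden
fast_reader = step_time(20.0, train_ms)   # 4x faster reader, same throughput
```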

For most types of problems, up to the scale of multi-GPU ImageNet training,
Python can offer enough speed if you use a fast library (e.g. `tensorpack.dataflow`).
See the [Efficient DataFlow](/tutorial/efficient-dataflow.html) tutorial on how to build a fast Python reader with `tensorpack.dataflow`.

### TensorFlow Reader: Cons
The disadvantage of the TF reader is obvious and it's huge: it's __too complicated__.
To support all these features which could've been done with __3 lines of code in Python__, you need either a new TF
API, or ask [Dataset.from_generator](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/data/Dataset#from_generator)
(i.e. Python again) to the rescue.

It only makes sense to use TF to read data, if your data is originally very clean and well-formatted.
If not, you may feel like writing a script to format your data, but then you're almost writing a Python loader already!

Think about it: it's a waste of time to write a Python script to transform from some format to TF-friendly format,
If you need to use TF reading ops directly, either define a `tf.data.Dataset`
and use `TFDatasetInput`, or use `TensorInput`.

Refer to the documentation of these `InputSource` for more details.

```eval_rst
.. note:: **InputSource requires tensorpack**

    `tensorpack.dataflow` is a pure Python library for efficient data loading, which can be used
    independently, without TensorFlow or tensorpack trainers.
    The `InputSource` interface, however, requires tensorpack and cannot be
    used without tensorpack trainers.
    Without tensorpack trainers, you'll have to optimize the copy latency by yourself.
```
