
reading_data/fully_connected_reader.py VERY slow relative to fully_connected_feed.py #837

Closed
nryant opened this issue Jan 22, 2016 · 10 comments

@nryant

nryant commented Jan 22, 2016

I noticed that when using a data reader to provide minibatches of examples to a model, performance is greatly reduced relative to just supplying the examples via feed_dict. For instance, when running reading_data/fully_connected_reader.py with the following flags:

--hidden1 512 --hidden2 512 --batch_size 128

it takes 28.7 seconds to process 600 minibatches with a GPU utilization of 13%. If I edit the code so that num_threads=16 (instead of num_threads=2) when shuffle_batch is called, these numbers improve to 14.9 seconds and 23% GPU utilization. However, training the same model via fully_connected_feed.py takes only 2.63 seconds and achieves a GPU utilization of 55%. This is hardly rigorous, but it seems that the overhead involved in reading the Example protos from the TFRecords file, putting them into a queue, etc., is much higher than I would expect.

These numbers were collected at commit 039981f, running on a Titan X card with no other background processes.
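For reference, here is a minimal sketch of the queue-based pipeline being benchmarked, written against the tf.train queue API; the filename, feature names, and queue capacities are illustrative rather than the tutorial's exact code:

```python
import tensorflow as tf

# Minimal sketch of the queue-based pipeline used by fully_connected_reader.py
# (filename, feature names, and queue sizes are illustrative, not the exact
# tutorial code).
filename_queue = tf.train.string_input_producer(['train.tfrecords'])
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

features = tf.parse_single_example(
    serialized_example,
    features={
        'image_raw': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape([28 * 28])
image = tf.cast(image, tf.float32) * (1.0 / 255)
label = tf.cast(features['label'], tf.int32)

# num_threads is the knob mentioned above: 2 in the tutorial, 16 in the run
# that cut the time from 28.7 s to 14.9 s.
images, labels = tf.train.shuffle_batch(
    [image, label],
    batch_size=128,
    num_threads=16,
    capacity=1000 + 3 * 128,
    min_after_dequeue=1000)
```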

@nryant
Author

nryant commented Jan 22, 2016

Related to #551, #763?

@yaroslavvb
Contributor

I was able to read and decode 68x68 images from TF Examples fast enough to saturate my K40, with an input pipeline using 6 threads on 1 CPU. These tutorials have not been tuned to work efficiently on GPU, so some small ops may be placed on the GPU suboptimally and cause unnecessary data transfers -- try pinning your input pipeline to the CPU manually. See #838 for an example of suboptimal placement making the GPU version run 100x slower.
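A sketch of what that manual pinning looks like; build_input_pipeline and inference are hypothetical stand-ins for the tutorial's input and model code:

```python
import tensorflow as tf

# Sketch of manual device pinning: keep every reading/decoding/batching op on
# the CPU so small input ops never land on the GPU and force extra
# host<->device copies; only the model itself runs on the GPU.
with tf.device('/cpu:0'):
    # build_input_pipeline is a hypothetical helper wrapping the
    # TFRecordReader / parse_single_example / shuffle_batch ops shown above.
    images, labels = build_input_pipeline('train.tfrecords', batch_size=128)

with tf.device('/gpu:0'):
    # inference is a hypothetical stand-in for the tutorial's two-layer model.
    logits = inference(images, hidden1_units=512, hidden2_units=512)
```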

@yaroslavvb
Contributor

Re: "Pinning the pipeline to the CPU":

Here's how I would optimize it -- pin everything to CPU (export TF_MIN_GPU_MULTIPROCESSOR_COUNT=800) and remove all the non-reading ops. Tweak your pipeline design and number of threads until you get maximum throughput. Then re-enable the GPU and use manual pinning to make sure that your input pipeline throughput is unchanged. Then attach your processing ops (on the GPU).
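A rough sketch of that first step (the environment variable has to be set before TensorFlow is imported so no GPU is picked up; build_input_pipeline is again a hypothetical stand-in, and the 600-step loop mirrors the benchmark above):

```python
import os
# Set before importing TensorFlow: with an absurdly high multiprocessor-count
# threshold, no GPU qualifies, so only the CPU input pipeline is measured.
os.environ['TF_MIN_GPU_MULTIPROCESSOR_COUNT'] = '800'

import time
import tensorflow as tf

# Hypothetical helper returning only the reading/parsing/batching ops.
images, labels = build_input_pipeline('train.tfrecords', batch_size=128)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    start = time.time()
    for _ in range(600):  # same number of minibatches as the benchmark above
        sess.run([images, labels])
    print('input-pipeline-only time: %.2f s' % (time.time() - start))
    coord.request_stop()
    coord.join(threads)
```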

On Fri, Jan 22, 2016 at 2:59 PM, nryant notifications@github.com wrote:

Pinning the pipeline to the CPU helps somewhat, but still is worse than I would expect. For num_threads=2 the time reduces to 12.6 seconds with 13% GPU utilization, and for num_threads=16 to 9.4 seconds with 18% utilization.



@vrv closed this as completed in ebe109b on Jan 23, 2016
@nryant
Author

nryant commented Jan 23, 2016

This actually fixes #838. Pinning the pipeline to CPU for fully_connected_reader.py helps somewhat, but performance still lags fully_connected_feed.py. I did some benchmarking this afternoon before leaving the office, and most of the remaining performance gap seems to come from the fact that reading an epoch's worth of images takes 20-25x as long using a reader (0.2 seconds vs. about 5 seconds; sorry, I'm out of the office and don't have the precise timings with me). For a more realistically sized network, this additional overhead wouldn't be such an issue, so this probably should be closed after fully_connected_reader.py has been modified.

On Friday, January 22, 2016, Vijay Vasudevan notifications@github.com wrote:

Closed #837 via ebe109b.



Neville Ryant
Linguistic Data Consortium
University of Pennsylvania

@vrv reopened this on Jan 23, 2016
@yaroslavvb
Contributor

Oops, off-by-one error on my part.
When you are using a reader, there is more work done at the beginning because of prefetching; could it be that the extra time is due to filling up the queue of examples?
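One way to check that is to exclude the initial queue fill from the timed region, e.g. (a sketch only; train_op stands in for the tutorial's training step, and the warm-up count is arbitrary):

```python
import time
import tensorflow as tf

# Sketch: time only the steady state, after the prefetch threads have had a
# chance to fill the shuffle queue up to min_after_dequeue.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    for _ in range(20):        # warm-up steps, excluded from the measurement
        sess.run(train_op)     # train_op: the tutorial's training step (assumed)

    start = time.time()
    for _ in range(600):
        sess.run(train_op)
    print('steady-state time for 600 steps: %.2f s' % (time.time() - start))

    coord.request_stop()
    coord.join(threads)
```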

@girving
Contributor

girving commented Jun 6, 2016

@ebrevdo: Could you take a look since it's queue related?

@ebrevdo
Contributor

ebrevdo commented Aug 10, 2016

@josh11b could this be due to lack of caching in the readers? Not sure if this bug is still relevant given the changes that have been pushed between when this bug was reported and now.

@josh11b
Contributor

josh11b commented Aug 10, 2016

I believe there is now the ability to read batches from a reader that can reduce overhead, assuming there is no problem with the examples having different dimensions. Also I recall someone is working on ParseExample performance improvements?
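A sketch of that batched path with the queue-based API: read_up_to pulls a vector of serialized records per op invocation, and tf.parse_example parses the whole vector in a single call instead of one parse_single_example per record (feature names here are illustrative):

```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(['train.tfrecords'])
reader = tf.TFRecordReader()

# Pull up to a full batch of serialized Examples per op invocation ...
_, serialized_batch = reader.read_up_to(filename_queue, num_records=128)

# ... and parse them all at once with the batched ParseExample op. This
# assumes every example has the same fixed-length features, per the caveat
# about differing dimensions above.
features = tf.parse_example(
    serialized_batch,
    features={
        'image_raw': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
images = tf.decode_raw(features['image_raw'], tf.uint8)
labels = tf.cast(features['label'], tf.int32)
```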

@ebrevdo
Contributor

ebrevdo commented Aug 11, 2016

Yes; the ParseExample work will hopefully get checked in within a week or two.


@drpngx
Contributor

drpngx commented Jan 24, 2017

I'm assuming that this is checked in. Feel free to open a new GitHub issue if the problem persists in recent versions.

@drpngx closed this as completed on Jan 24, 2017