
Branch 178996911 #15356

Merged
merged 91 commits into from
Dec 14, 2017

Conversation

teamdandelion
Contributor

  • Improvement over the last PR: we rolled back the change that broke the CPU tests
  • Fixed the same merge conflicts in tensorflow/core/platform/cloud/gcs_dns_cache.cc
  • Fixed a lot of trivial merge conflicts in keras

Sanjoy Das and others added 30 commits December 11, 2017 16:35
dot(concat(..), constant) and dot(constant, concat(..)) can be rewritten to
avoid the concatenate.  This can itself be a win, but can also help unlock other
optimization opportunities.

PiperOrigin-RevId: 178691585
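A quick way to see why the rewrite above is sound is the algebraic identity it relies on: when the concatenation is along the contracting dimension, a dot against a constant equals the sum of dots against the matching slices of that constant. A small numpy sketch of that identity (just the math, not the XLA pass itself; shapes are made up for illustration):

```python
# Numeric check of the identity behind rewriting dot(concat(..), constant).
import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(4, 5)
C = np.random.rand(3 + 5, 6)                  # the constant operand

direct = np.concatenate([A, B], axis=1) @ C   # dot(concat(A, B), C)
rewritten = A @ C[:3] + B @ C[3:]             # avoids materializing the concat
assert np.allclose(direct, rewritten)
```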
PiperOrigin-RevId: 178701096
… errors

when building externally using either the Makefile or Bazel.  The macros use
stderr and fprintf which may not be defined depending on the order of
headers included by the .cc files.

PiperOrigin-RevId: 178708839
… tensor of unknown

rank as a scalar.

PiperOrigin-RevId: 178710185
There is no great need for this yet, but I noticed that the test cases were
broken (they were constructing dots with unset dimension numbers), and one thing
led to another.

PiperOrigin-RevId: 178713597
PiperOrigin-RevId: 178715353
If the stream is not OK, the timer might not have been initialized and
finalized, in which case calling timer->Nanoseconds() is illegal and
will crash.

PiperOrigin-RevId: 178717089
This way when a test fails, it prints out useful information about the
failure, instead of

  "<48-byte object with these bytes: de ad be ef ...>"

PiperOrigin-RevId: 178719733
… is one.

 * TestUtils now supports generating random literals with more than one constraint.
     There is still an error if the constraints conflict.

PiperOrigin-RevId: 178720092
PiperOrigin-RevId: 178723108
PiperOrigin-RevId: 178740804
PiperOrigin-RevId: 178751067
1) It fixes a bug that manifested as `OutOfRange` being returned prematurely.

2) It changes the behavior on sequences of elements whose size is not a multiple of `batch_size`. Previously, the implementation would drop the last small batch (similar to `batch_and_drop_remainder`). Now, the implementation returns the last small batch (similar to `batch`).

PiperOrigin-RevId: 178764508
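For reference, the behavior it now matches is that of `batch`, which keeps the final partial batch. A minimal TF 1.x-style illustration (assuming the graph-mode Session API):

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10).batch(4)
next_batch = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    print(sess.run(next_batch))  # [0 1 2 3]
    print(sess.run(next_batch))  # [4 5 6 7]
    print(sess.run(next_batch))  # [8 9]  <- last small batch is returned, not dropped
```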
Without this change, the C++ ImportGraphDef API returns unused
input_map keys (which are plumbed through to the C API as
well). However, the Python import_graph_def API requires slightly
different semantics: it throws an error for unused input_map keys that
are missing from the GraphDef.

This change modifies the C and C++ APIs to limit the returned keys to
those missing from the GraphDef, and plumbs this through to the C
API-enabled import_graph_def implementation.

Note that this is a change to the existing C API. Luckily the modified
method hasn't been released yet, so it's ok to change it.

PiperOrigin-RevId: 178783957
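A hedged sketch of the resulting Python-side behavior (TF 1.x graph mode; the node names here are made up for illustration): an `input_map` key naming a tensor that is missing from the GraphDef is rejected.

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf.constant(1.0, name='x')
gdef = g.as_graph_def()

with tf.Graph().as_default():
    replacement = tf.constant(2.0)
    try:
        # 'not_in_graph:0' does not appear anywhere in gdef.
        tf.import_graph_def(gdef, input_map={'not_in_graph:0': replacement})
    except ValueError as err:
        print(err)  # reports the input_map keys not found in the GraphDef
```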
Push constants down through add/mul to canonicalize chains and possibly create constant nodes at the bottom. Example:

      +                +             +
     / \              / \           / \
    c1   +     -->   x   +    -->  x c1+c2
        / \             / \
       c2  x           c2 c1

Small cleanup: Consolidate code for manipulating names of nodes added or modified during constant folding.

PiperOrigin-RevId: 178785218
PiperOrigin-RevId: 178787158
CompositeNodeManager keeps a per-device LIFO manager plus FirstReadyManagers for _Send and _Recv ops, and chooses the first-ready op among those offered by the per-device LIFOManagers and the _Send/_Recv FirstReadyManagers.

This maximizes producer-consumer locality within a device (via LIFO), while avoiding the previously reported scheduling inefficiency for multi-device execution, since _Send and _Recv ops are managed separately under a global FirstReady policy across devices.

It's implemented, but not enabled; VirtualScheduler still uses
FirstReadyManager.

PiperOrigin-RevId: 178787352
…graph mode

Fixes a bug in which EagerTensors were provided as input to an op.

PiperOrigin-RevId: 178957283
- adds support for legacy "BatchMatMul" operators
- adds constant scalar values to graphviz output

PiperOrigin-RevId: 178957498
… new implementation will exist alongside the old one (selectable through the scheduler options) until its superiority is confirmed, at which point the old rate-based implementation will be removed.

The new implementation requires fewer options and no user feedback to achieve low-latency batching. Instead of processing batches at an adjustable rate, we limit the number of batches that can be processed concurrently. Below the limit, batches are processed immediately upon creation. At the limit, the oldest batch is processed once an in-processing batch finishes. The scheduler continuously adjusts the limit in order to maintain the smallest overall latency.

PiperOrigin-RevId: 178960621
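To make the concurrency-limit idea concrete, here is a minimal, hypothetical Python sketch (not the actual scheduler, and with a fixed limit rather than the continuously adjusted one described above): below the limit a batch starts immediately, and at the limit the oldest queued batch starts as soon as an in-flight batch finishes.

```python
import threading
from collections import deque

class ConcurrencyLimitedScheduler(object):
    """Toy sketch: process at most `max_in_flight` batches at a time."""

    def __init__(self, process_batch, max_in_flight=2):
        self._process_batch = process_batch  # callback that handles one batch
        self._max_in_flight = max_in_flight
        self._in_flight = 0
        self._queue = deque()                # oldest pending batch on the left
        self._lock = threading.Lock()

    def submit(self, batch):
        with self._lock:
            if self._in_flight < self._max_in_flight:
                self._in_flight += 1
                self._start(batch)           # below the limit: process immediately
            else:
                self._queue.append(batch)    # at the limit: wait for a free slot

    def _start(self, batch):
        threading.Thread(target=self._run, args=(batch,)).start()

    def _run(self, batch):
        try:
            self._process_batch(batch)
        finally:
            with self._lock:
                if self._queue:
                    self._start(self._queue.popleft())  # oldest batch goes next
                else:
                    self._in_flight -= 1
```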
PiperOrigin-RevId: 178962340
This avoids the need for users to add `loss = loss / num_of_towers` code and is more in line with current best practices.

I verified this by running cnn_mnist.

PiperOrigin-RevId: 178963334
Previously, Python serialization and deserialization used the half_val field of TensorProto, whereas C++ serialization used the int_val field. However, C++ bfloat16 deserialization was always broken, so it was never possible to correctly deserialize a bfloat16 Tensor.

The only reason serialization worked at all was because of the generic `tensor_content` bytes serialization.

PiperOrigin-RevId: 178966536
…on of

conditional HloInstruction.

PiperOrigin-RevId: 178966782
I had to roll in the change to generalize CPU layout assignment as without it we
lose the make-rhs-column-major optimization and that causes a performance
regression.

PiperOrigin-RevId: 178970986
PiperOrigin-RevId: 178977412
* Removed the clustered-graph part, since it was difficult to keep it updated with the rest of the graph; instead we now operate on the graph directly.

PiperOrigin-RevId: 178980836
PiperOrigin-RevId: 178989673
PiperOrigin-RevId: 178995589
This allows Variants to sit on resource variables. Previously, even though the ReadValue op was enabled for Variants on GPU, assignment happened on CPU, so Variant-based resource variables always had to reside on CPU due to the associated colocation constraints.

PiperOrigin-RevId: 178996911
@gunan gunan merged commit f5f2f78 into tensorflow:master Dec 14, 2017
@teamdandelion teamdandelion deleted the branch_178996911 branch December 14, 2017 21:16