
Branch 178996911 #15356

Merged
merged 91 commits into from
Dec 14, 2017

Conversation

teamdandelion
Contributor

  • Improvement over the last PR: we rolled back the change that broke the CPU tests
  • Fixed the same merge conflicts in tensorflow/core/platform/cloud/gcs_dns_cache.cc
  • Fixed a lot of trivial merge conflicts in keras

Sanjoy Das and others added 30 commits December 11, 2017 16:35
dot(concat(..), constant) and dot(constant, concat(..)) can be rewritten to
avoid the concatenate.  This can itself be a win, but can also help unlock other
optimization opportunities.

PiperOrigin-RevId: 178691585
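A quick way to see why the rewrite above is sound is the algebraic identity it relies on: when the concatenation is along the contracting dimension, a dot against a constant equals the sum of dots against the matching slices of that constant. A small numpy sketch of that identity (just the math, not the XLA pass itself; shapes are made up for illustration):

```python
# Numeric check of the identity behind rewriting dot(concat(..), constant).
import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(4, 5)
C = np.random.rand(3 + 5, 6)                  # the constant operand

direct = np.concatenate([A, B], axis=1) @ C   # dot(concat(A, B), C)
rewritten = A @ C[:3] + B @ C[3:]             # avoids materializing the concat
assert np.allclose(direct, rewritten)
```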
PiperOrigin-RevId: 178701096
… errors

when building externally using either the Makefile or Bazel.  The macros use
stderr and fprintf which may not be defined depending on the order of
headers included by the .cc files.

PiperOrigin-RevId: 178708839
… tensor of unknown

rank as a scalar.

PiperOrigin-RevId: 178710185
There is no great need for this yet, but I noticed that the test cases were
broken (they were constructing dots with unset dimension numbers), and one thing
led to another.

PiperOrigin-RevId: 178713597
PiperOrigin-RevId: 178715353
If the stream is not OK, the timer might not have been initialized and
finalized, in which case calling timer->Nanoseconds() is illegal and
will crash.

PiperOrigin-RevId: 178717089
This way when a test fails, it prints out useful information about the
failure, instead of

  "<48-byte object with these bytes: de ad be ef ...>"

PiperOrigin-RevId: 178719733
… is one.

 * TestUtils now supports generating random literals with more than one constraint.
     There is still an error if the constraints conflict.

PiperOrigin-RevId: 178720092
PiperOrigin-RevId: 178723108
PiperOrigin-RevId: 178740804
PiperOrigin-RevId: 178751067
1) It fixes a bug that manifested as `OutOfRange` being returned prematurely.

2) It changes the behavior on sequences of elements whose size is not a multiple of `batch_size`. Previously, the implementation would drop the last small batch (similar to `batch_and_drop_remainder`). Now, the implementation returns the last small batch (similar to `batch`).

PiperOrigin-RevId: 178764508
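For reference, the behavior it now matches is that of `batch`, which keeps the final partial batch. A minimal TF 1.x-style illustration (assuming the graph-mode Session API):

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10).batch(4)
next_batch = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    print(sess.run(next_batch))  # [0 1 2 3]
    print(sess.run(next_batch))  # [4 5 6 7]
    print(sess.run(next_batch))  # [8 9]  <- last small batch is returned, not dropped
```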
Without this change, the C++ ImportGraphDef API returns unused
input_map keys (which are plumbed through to the C API as
well). However, the Python import_graph_def API requires slightly
different semantics: it throws an error for unused input_map keys that
are missing from the GraphDef.

This change modifies the C and C++ APIs to limit the returned keys to
those missing from the GraphDef, and plumbs this through to the C
API-enabled import_graph_def implementation.

Note that this is a change to the existing C API. Luckily the modified
method hasn't been released yet, so it's ok to change it.

PiperOrigin-RevId: 178783957
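A hedged sketch of the resulting Python-side behavior (TF 1.x graph mode; the node names here are made up for illustration): an `input_map` key naming a tensor that is missing from the GraphDef is rejected.

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf.constant(1.0, name='x')
gdef = g.as_graph_def()

with tf.Graph().as_default():
    replacement = tf.constant(2.0)
    try:
        # 'not_in_graph:0' does not appear anywhere in gdef.
        tf.import_graph_def(gdef, input_map={'not_in_graph:0': replacement})
    except ValueError as err:
        print(err)  # reports the input_map keys not found in the GraphDef
```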
Push constants down through add/mul to canonicalize chains and possibly create constant nodes at the bottom. Example:

      +                +             +
     / \              / \           / \
    c1   +     -->   x   +    -->  x c1+c2
        / \             / \
       c2  x           c2 c1

Small cleanup: Consolidate code for manipulating names of nodes added or modified during constant folding.

PiperOrigin-RevId: 178785218
PiperOrigin-RevId: 178787158
CompositeNodeManager keeps a per-device LIFO manager plus FirstReadyManagers for _Send and _Recv ops, and chooses the first-ready op among those offered by the per-device LIFOManagers and the _Send/_Recv FirstReadyManagers.

This maximizes producer-consumer locality within a device (via LIFO), while avoiding the previously reported scheduling inefficiency for multi-device execution, since _Send and _Recv ops are managed separately under a global FirstReady policy across devices.

It's implemented, but not enabled; VirtualScheduler still uses
FirstReadyManager.

PiperOrigin-RevId: 178787352
…graph mode

Fixes a bug in which EagerTensors were provided as input to an op.

PiperOrigin-RevId: 178957283
- adds support for legacy "BatchMatMul" operators
- adds constant scalar values to graphviz output

PiperOrigin-RevId: 178957498
… new implementation will exist alongside the old one (selectable through the scheduler options) until its superiority is confirmed, at which point the old rate-based implementation will be removed.

The new implementation requires fewer options and no user feedback to achieve low-latency batching. Instead of processing batches at an adjustable rate, we limit the number of batches that can be processed concurrently. Below the limit, batches are processed immediately upon creation. At the limit, the oldest batch is processed once an in-processing batch finishes. The scheduler continuously adjusts the limit in order to maintain the smallest overall latency.

PiperOrigin-RevId: 178960621
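To make the concurrency-limit idea concrete, here is a minimal, hypothetical Python sketch (not the actual scheduler, and with a fixed limit rather than the continuously adjusted one described above): below the limit a batch starts immediately, and at the limit the oldest queued batch starts as soon as an in-flight batch finishes.

```python
import threading
from collections import deque

class ConcurrencyLimitedScheduler(object):
    """Toy sketch: process at most `max_in_flight` batches at a time."""

    def __init__(self, process_batch, max_in_flight=2):
        self._process_batch = process_batch  # callback that handles one batch
        self._max_in_flight = max_in_flight
        self._in_flight = 0
        self._queue = deque()                # oldest pending batch on the left
        self._lock = threading.Lock()

    def submit(self, batch):
        with self._lock:
            if self._in_flight < self._max_in_flight:
                self._in_flight += 1
                self._start(batch)           # below the limit: process immediately
            else:
                self._queue.append(batch)    # at the limit: wait for a free slot

    def _start(self, batch):
        threading.Thread(target=self._run, args=(batch,)).start()

    def _run(self, batch):
        try:
            self._process_batch(batch)
        finally:
            with self._lock:
                if self._queue:
                    self._start(self._queue.popleft())  # oldest batch goes next
                else:
                    self._in_flight -= 1
```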
PiperOrigin-RevId: 178962340
This avoids the need for users to add `loss = loss / num_of_towers` code and is more in line with current best practices.

I verified this by running cnn_mnist.

PiperOrigin-RevId: 178963334
Previously, Python serialization and deserialization used the half_val field of TensorProto, whereas C++ serialization used the int_val field. However, C++ bfloat16 deserialization was always broken, so it was never possible to correctly deserialize a bfloat16 Tensor.

The only reason serialization worked at all was because of the generic `tensor_content` bytes serialization.

PiperOrigin-RevId: 178966536
…on of

conditional HloInstruction.

PiperOrigin-RevId: 178966782
I had to roll in the change to generalize CPU layout assignment as without it we
lose the make-rhs-column-major optimization and that causes a performance
regression.

PiperOrigin-RevId: 178970986
PiperOrigin-RevId: 178977412
* Removed the clustered-graph part, since it was difficult to keep it updated with the rest of the graph; instead we now operate on the graph directly.

PiperOrigin-RevId: 178980836
PiperOrigin-RevId: 178989673
PiperOrigin-RevId: 178995589
This allows Variants to sit on resource variables. Previously, even though the ReadValue op was enabled for Variants on GPU, assignment happened on CPU, so Variant-based resource variables always had to reside on CPU due to the associated colocation constraints.

PiperOrigin-RevId: 178996911
@gunan gunan merged commit f5f2f78 into tensorflow:master Dec 14, 2017
@teamdandelion teamdandelion deleted the branch_178996911 branch December 14, 2017 21:16