
g-1 #41154

Closed
wants to merge 131 commits into from

Conversation

caltrops6490

just trying a thing

wwwind and others added 30 commits January 17, 2020 12:34
Change-Id: I51baab7d117fb4656fa09f4b9ff78ef34cbbe8a0
* Instead of referencing tf.summary.experimental.set_step, spell out the
  relevant information directly.
* Fix a typo
* Add example to the `as_default` method.
Four new tests are added:
- we test both constants and variables as step values
- we test as_default and set_as_default methods

For as_default, we also check that the original value is
restored if and only if a non-None step was passed.
Change-Id: I2cf421ddda7f9e802202239136ab062bcd63b4aa
Change-Id: I61d7d4a94087d052a782890799211031f6ed3015
After the upstream commit 4de4c60, the following unit tests started failing on the ROCm platform:

```
//tensorflow/python/keras/optimizer_v2:adam_test_gpu
//tensorflow/compiler/xla/tests:scatter_test_gpu
//tensorflow/compiler/tests:scatter_nd_op_test_gpu
```

The cause seems to be a change in the commit above that updates the LLVM version in use.

The LLVM version change (more specifically, some AMDGPU backend change contained within it) either introduces an issue or exposes an existing one with respect to `alloca` instructions outside the entry basic block of a function. The AMDGPU backend expects all `alloca` instructions to be inside the entry basic block; breaking this assumption leads to the regressions listed above.

This PR/commit changes IR generation for the "scatter" op to ensure that the `alloca` instruction is emitted in the entry basic block of the function, which makes the above unit tests pass again. This commit also updates other places in the XLA code where `alloca` instructions were being added outside the entry basic block of a function.
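A toy sketch of the fix (hypothetical list-of-strings blocks, not XLA's actual IR emitter): every `alloca` found in a later block is hoisted into the entry block, so the AMDGPU backend's assumption holds.

```python
def hoist_allocas(blocks):
    """Toy illustration only. blocks is a list of lists of instruction
    strings; blocks[0] is the entry block. Move every alloca from the
    later blocks to the top of the entry block."""
    entry, rest = blocks[0], blocks[1:]
    hoisted = []
    for block in rest:
        allocas = [i for i in block if i.startswith("alloca")]
        block[:] = [i for i in block if not i.startswith("alloca")]
        hoisted.extend(allocas)
    # Prepend the hoisted allocas to the entry block.
    entry[:0] = hoisted
    return blocks
```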

-----------------------------

Details on how to isolate the change that causes the `//tensorflow/python/keras/optimizer_v2:adam_test_gpu` test case to fail:

Build TF using commit a1ae008 (the parent commit of 4de4c60) and run the unit test; it should pass.

Commit 4de4c60 contains several changes in addition to the LLVM version change, so apply the following patch to pick up just the LLVM version change:

```diff
root@prj47-rack-37:/root/tensorflow# git diff
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index bf64405..15fd1f7 100755
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -655,8 +655,8 @@ def tf_repositories(path_prefix = "", tf_repo_name = ""):
     )

     # Check out LLVM and MLIR from llvm-project.
-    LLVM_COMMIT = "cf86a234ba86acf0bb875e21d27833be36e08be4"
-    LLVM_SHA256 = "5375bdcdabd4886ab86eddfddef6e21dbc3cac9df67af7d3c44fadb527f74e25"
+    LLVM_COMMIT = "b726d071b4aa46004228fc38ee5bfd167f999bfe"
+    LLVM_SHA256 = "d7e67036dc89906cb2f80df7b0b7de6344d86eddf6e98bb4d01a578242889a73"
     LLVM_URLS = [
         "https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/{commit}.tar.gz".format(commit = LLVM_COMMIT),
         "https://github.com/llvm/llvm-project/archive/{commit}.tar.gz".format(commit = LLVM_COMMIT),
diff --git a/third_party/mlir/BUILD b/third_party/mlir/BUILD
index df875eb..624f17e 100644
--- a/third_party/mlir/BUILD
+++ b/third_party/mlir/BUILD
@@ -1176,28 +1176,6 @@ cc_library(
     ],
 )

-cc_library(
-    name = "GPURuntimeTransforms",
-    srcs = [
-        "lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp",
-        "lib/Conversion/PassDetail.h",
-    ],
-    hdrs = [
-        "include/mlir/Conversion/GPUCommon/GPUCommonPass.h",
-    ],
-    includes = ["include"],
-    deps = [
-        ":ConversionPassIncGen",
-        ":GPUDialect",
-        ":IR",
-        ":LLVMDialect",
-        ":Pass",
-        ":Support",
-        "@llvm-project//llvm:core",
-        "@llvm-project//llvm:support",
-    ],
-)
-
 gentbl(
     name = "GPUToNVVMGen",
     strip_include_prefix = "lib/Conversion/GPUToNVVM",
@@ -1307,12 +1285,13 @@ cc_library(
 )

 cc_library(
-    name = "GPUToCUDATransforms",
+    name = "GPUToGPURuntimeTransforms",
     srcs = [
-        "lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp",
+        "lib/Conversion/GPUCommon/ConvertKernelFuncToBlob.cpp",
+        "lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp",
         "lib/Conversion/PassDetail.h",
     ],
-    hdrs = ["include/mlir/Conversion/GPUToCUDA/GPUToCUDAPass.h"],
+    hdrs = ["include/mlir/Conversion/GPUCommon/GPUCommonPass.h"],
     includes = ["include"],
     deps = [
         ":ConversionPassIncGen",
@@ -2490,7 +2469,7 @@ cc_library(
     includes = ["include"],
     deps = [
         ":Analysis",
-        ":GPURuntimeTransforms",
+        ":GPUToGPURuntimeTransforms",
         ":GPUToNVVMTransforms",
         ":GPUToROCDLTransforms",
         ":GPUToSPIRVTransforms",
@@ -2570,8 +2549,7 @@ cc_library(
         ":ConversionPassIncGen",
         ":GPUDialect",
         ":GPUPassIncGen",
-        ":GPURuntimeTransforms",
-        ":GPUToCUDATransforms",
+        ":GPUToGPURuntimeTransforms",
         ":GPUToNVVMTransforms",
         ":GPUToROCDLTransforms",
         ":GPUToSPIRVTransforms",
@@ -2776,7 +2754,7 @@ cc_binary(
         ":AllPassesAndDialectsNoRegistration",
         ":ExecutionEngineUtils",
         ":GPUDialect",
-        ":GPURuntimeTransforms",
+        ":GPUToGPURuntimeTransforms",
         ":GPUToNVVMTransforms",
         ":GPUToROCDLTransforms",
         ":GPUTransforms",
@@ -2786,6 +2764,7 @@ cc_binary(
         ":MlirJitRunner",
         ":NVVMDialect",
         ":Pass",
+        ":TargetNVVMIR",
         ":Transforms",
         "//devtools/build/runtime:get_runfiles_dir",
         "//third_party/gpus/cuda:cuda_headers",
diff --git a/third_party/mlir/test.BUILD b/third_party/mlir/test.BUILD
index 24b310f..9b6cb28 100644
--- a/third_party/mlir/test.BUILD
+++ b/third_party/mlir/test.BUILD
@@ -158,7 +158,7 @@ cc_library(
         "@llvm-project//mlir:Analysis",
         "@llvm-project//mlir:EDSC",
         "@llvm-project//mlir:GPUDialect",
-        "@llvm-project//mlir:GPUToCUDATransforms",
+        "@llvm-project//mlir:GPUToGPURuntimeTransforms",
         "@llvm-project//mlir:GPUTransforms",
         "@llvm-project//mlir:IR",
         "@llvm-project//mlir:LinalgOps",
@@ -167,6 +167,8 @@ cc_library(
         "@llvm-project//mlir:SCFDialect",
         "@llvm-project//mlir:StandardOps",
         "@llvm-project//mlir:Support",
+        "@llvm-project//mlir:TargetNVVMIR",
+        "@llvm-project//mlir:TargetROCDLIR",
         "@llvm-project//mlir:TransformUtils",
         "@llvm-project//mlir:Transforms",
         "@llvm-project//mlir:VectorOps",
```

Re-run the unit test, and it will fail.
…tribution settings based on model garden team
Fixes #40758. Go map iteration order is random. If we send randomly ordered lists of inputs to the C core, it will create a key in the session's `executors_` cache for each of the random orderings. With more than a handful of inputs, this cache can grow very large very quickly.
Change-Id: I9ea50c75014cc03ac91fdef0f5b4fe11395f7074
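A minimal sketch of the idea behind the fix (hypothetical `session_cache_key` helper; the real change lives in the Go binding layer): canonicalize the input ordering before using it as a cache key, so all random orderings collapse to a single entry.

```python
def session_cache_key(input_names):
    # Hypothetical sketch: Go map iteration order is random, so the
    # same set of inputs can arrive in many orders. Sorting before
    # building the key maps every ordering to one cache entry.
    return tuple(sorted(input_names))
```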
allenlavoie and others added 22 commits July 6, 2020 18:00
…mpy()

Also tweaks tensor_util.constant_value to return None on unimplemented .numpy() for custom devices. This fixes some errors while computing gradients with a parallel device.

PiperOrigin-RevId: 319891520
Change-Id: I84ce5b1adc8fa066502afbba5afa7362c4464761
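The `constant_value` tweak described above can be sketched as (a hypothetical standalone function, not the actual `tensor_util` code):

```python
def constant_value(tensor):
    # Hypothetical sketch of the documented behavior: fall back to
    # None when .numpy() is unimplemented (e.g. on custom devices)
    # instead of letting the error propagate.
    try:
        return tensor.numpy()
    except NotImplementedError:
        return None
```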
… values.

PiperOrigin-RevId: 319892460
Change-Id: Ie1c393dc0a7fcccaccd4399115963de06ff50057
…n't drop real error messages.

PiperOrigin-RevId: 319893870
Change-Id: I8b0a4f62fbfd74c80a6f281f5490e51051a4c7df
…executor dialect.

Island coarsening has been replaced by the executor-dialect-to-functional conversion pass, so TPU cluster formation will no longer get input IRs that use the tf_executor dialect (graph, island).

PiperOrigin-RevId: 319895769
Change-Id: Ice164ed781a38ac23adffe2c56c544199fbbaced
PiperOrigin-RevId: 319900453
Change-Id: I7466d1186d361d11913a2f953ecc8723de423e8e
PiperOrigin-RevId: 319900804
Change-Id: If1f777676754eb1b6275aedf126fb82810f08f6d
…tutorial will use the same pieces of code to demonstrate how to save and load (with MWMS) in a multi-worker environment.

PiperOrigin-RevId: 319902843
Change-Id: If3e00764c0b5f545da0e003f9c14a88be48a1ff6
TPUReplicatedInputs now support packed per-replica tensors, which passes will need to know about so they can handle such operands to a TPU computation specially. To model packed per-replica tensors, an additional variadic operand has been added to `tf_device.replicate`, along with updates to the parser, printer, and verifier.

PiperOrigin-RevId: 319903540
Change-Id: I44fc719ddef752877cee7167a3512f2a0101ee37
PiperOrigin-RevId: 319904600
Change-Id: Ibd84524c647a04eb0b740ec921bca8e45b38d0af
…e read only, so this is always equal to NumOutputs. Use that function instead.

PiperOrigin-RevId: 319904789
Change-Id: I19e71e3538fa2b3446f83e5a037b344d322fd20d
…ation's name when computing fingerprint.

PiperOrigin-RevId: 319911247
Change-Id: Id8ced870114fcbfd26e7ef1d4fd2589d85b5d06b
PiperOrigin-RevId: 319914621
Change-Id: I5436113a6d2dea9094651b78fb2ebeb8624372fd
This is part of the current refactoring of the HLO related dialect.
`xla_hlo` will be reintroduced in a new form later.

PiperOrigin-RevId: 319916753
Change-Id: I2c1b426b8a293927af5569bd35990a54b6b0743e
…ebugging.

cr/286899513 introduced a verbose network name to help debugging for TensorRT
6+. However, the code block was added to the wrong branch for the older
TensorRT versions as dead code. This CL relocates the code block to make it
useful.

PiperOrigin-RevId: 319919276
Change-Id: I16b8fdb376652df94ed82ee1b4b455535c958223
PiperOrigin-RevId: 319921252
Change-Id: I31897286d641f4214dcdb05ea72b048938f51411
PiperOrigin-RevId: 319924478
Change-Id: I37ad2d6fa4e5a0f4338cc7442c9b56e1e0b01314
PiperOrigin-RevId: 319926991
Change-Id: I18a4eb0f23c7b80cc1ccbefa16f6ce75e15a378b
PiperOrigin-RevId: 319937355
Change-Id: Ic7378bf11ca36db6cb05eb8a0a8ef41a05fca566
PiperOrigin-RevId: 319943458
Change-Id: I4a53eb7a62463aab68864eaec38d0763e5b3f731
PiperOrigin-RevId: 319943459
Change-Id: Ifb618ab6f204fdba775cff4066aaa99309ce7bee
PiperOrigin-RevId: 319955712
Change-Id: Ie6edbda86ef01823559979b30356e6a2664f2c42
…nctions. Fixes #39832.

This method is 10-30x slower in benchmarks, and scales poorly with file size. However, it can be sped up by moving processing to C++ or by detecting safe cases by looking at the surrounding code.

Unlike the regular approach from `inspect`, which relied on a coarse mechanism based on line numbers alone and used brittle regexp/lexer searches, this approach parses the entire source file and then searches for matching `ast.Lambda` nodes based on line-number information and argument signature. The mechanism errors out when this resolution cannot be done precisely (e.g. `lambda x: lambda x: x`). This in turn guarantees that the parsed AST matches exactly what the interpreter sees.

A more permanent fix would address the issue in `inspect` which makes an incorrect assumption here: https://github.com/python/cpython/blob/53d2b715d10e5c81cb9c86fd25e88ace4e41bc25/Lib/inspect.py#L924

PiperOrigin-RevId: 319972026
Change-Id: Ie3f522486860f512add9409c246a13b87462f4aa
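The AST-based resolution described above can be sketched as follows (a hypothetical `resolve_lambda` helper operating on a source string, not the actual TF implementation):

```python
import ast

def resolve_lambda(source, lineno, arg_names):
    """Hypothetical sketch: parse the full source, then pick the unique
    ast.Lambda node whose line number and argument signature both
    match; error out when the match is ambiguous."""
    tree = ast.parse(source)
    matches = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Lambda)
        and node.lineno == lineno
        and [a.arg for a in node.args.args] == list(arg_names)
    ]
    if len(matches) != 1:
        # Ambiguous, e.g. `lambda x: lambda x: x` on a single line.
        raise ValueError("cannot resolve lambda source precisely")
    return ast.get_source_segment(source, matches[0])
```

For example, `resolve_lambda("f = lambda x: x + 1", 1, ["x"])` returns the exact source of that lambda, while two same-signature lambdas on one line raise an error.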
@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.
