
Add calls to reserve() before populating vectors #51739

Closed
wants to merge 24 commits

Conversation

SamuelMarks
Contributor

…and am at capacity

PS: WiP. Will finish going through your codebase adding capacity hints to all vectors with an obvious opportunity for this optimisation.

…sorflow/cc/ops/while_loop.cc,tensorflow/compiler/jit/extract_outside_compilation_pass.cc,tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc] Space height
@google-ml-butler google-ml-butler bot added the size:XS CL Change Size: Extra Small label Aug 30, 2021
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Aug 30, 2021
@google-cla google-cla bot added the cla: yes label Aug 30, 2021
@gbaned gbaned self-assigned this Aug 30, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Aug 30, 2021
@gbaned gbaned requested a review from cheshire October 1, 2021 14:43
@cheshire cheshire removed their request for review October 1, 2021 16:13
@SamuelMarks
Contributor Author

Following the huge amount of interest from @sanjoy @kkimdev @sherhut @qqfish and @joker-eph over the past 5 weeks, I merged in the latest master and added vector reservations in another 82 files of the TensorFlow codebase.

@joker-eph
Contributor

Nice! I had missed this pull-request originally.

It is likely that I read the title as spam somehow though; can you retitle this explicitly, e.g. "Add calls to `reserve()` before populating vectors" or something like that?

@SamuelMarks SamuelMarks changed the title I have reservations Add calls to reserve() before populating vectors Oct 4, 2021
@SamuelMarks
Contributor Author

123 CAN HAZ REZERVATIONS

Sure thing @joker-eph; and whilst I was at it I finished the .cc files in the compiler dir.

…l_ir_emitter}.cc,tensorflow/compiler/tf2xla/kernels/cross_op.cc] Properly reserve vector space
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Oct 4, 2021
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Oct 4, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Oct 4, 2021
@gbaned gbaned removed awaiting review Pull request awaiting review ready to pull PR ready for merge process labels Oct 4, 2021
@joker-eph
Contributor

so I didn't have much option left 😕

There is a wide spectrum between "everything in one PR" and "one PR per file": you went from one extreme to the other; I'm saying there is also the possibility of exercising some reasonable judgement.
For example, it is quite common that a large piece of software has many components, with different people working on those various components. Sharding along these lines helps get the right people reviewing the right PRs, without exploding the overhead of many PRs to click through individually.

(I'm puzzled that you still like the one big PR after suffering through rebase/merge conflicts... the larger the code change, the more likely it is to suffer through these.)

@bhack
Contributor

bhack commented Oct 15, 2021

For example, it is quite common that a large piece of software has many components, with different people working on those various components.

This is true, but if we took more care in the maintainership of our CODEOWNERS file it could be easier to identify a one-PR-per-component logic:

https://github.com/tensorflow/tensorflow/blob/master/CODEOWNERS

Currently it seems to me quite partial.
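For anyone unfamiliar with the format: a CODEOWNERS file maps path patterns to the GitHub users or teams responsible for them, and GitHub auto-requests reviews from the matching owners. A hypothetical fragment (the team names here are invented for illustration, not taken from TensorFlow's actual file):

```
# Hypothetical entries -- owner names are illustrative only.
# The last matching pattern wins for any given path.
/tensorflow/compiler/mlir/  @tensorflow/mlir-reviewers
/tensorflow/compiler/xla/   @tensorflow/xla-reviewers
/tensorflow/core/           @tensorflow/core-maintainers
```

A file like this doubles as documentation of the component boundaries, which is the segmentation question being debated here.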

@james-martens
Member

james-martens commented Oct 15, 2021 via email

@bhack
Contributor

bhack commented Oct 15, 2021

@james-martens I don't see your user name in any comment.

@james-martens
Member

james-martens commented Oct 15, 2021 via email

@bhack
Contributor

bhack commented Oct 15, 2021

OK, I think it was a typo from the alias auto-completion search: mark → mart.

@joker-eph
Contributor

For example, it is quite common that a large piece of software has many components, with different people working on those various components.

This is true, but if we took more care in the maintainership of our CODEOWNERS file it could be easier to identify a one-PR-per-component logic:

https://github.com/tensorflow/tensorflow/blob/master/CODEOWNERS

Currently it seems to me quite partial.

Thanks @bhack, I wasn't aware of this file. There is another tool we have that auto-assigns based on path (for example, I get all reviews in tensorflow/compiler/mlir auto-assigned). Seems like we could use this CODEOWNERS file instead!

@mihaimaruseac
Collaborator

The auto-assignment is via a GitHub bot configured by the GitHub team (cc @gbaned): https://github.com/tensorflow/tensorflow/blob/master/.github/bot_config.yml

@bhack
Contributor

bhack commented Oct 15, 2021

Thanks @bhack, I wasn't aware of this file. There is another tool we have that auto-assigns based on path (for example, I get all reviews in tensorflow/compiler/mlir auto-assigned). Seems like we could use this CODEOWNERS file instead!

I think it could be useful to maintain a reference GitHub account for every folder; if not in CODEOWNERS (as we don't want to notify team members directly), then in another reference file.

This could help us write, in the contribution guide, how to segment a code contribution into multiple PRs (like this one), one per team review unit, if and when this is possible.

@bhack
Contributor

bhack commented Oct 15, 2021

This was transformed into 116 open PRs.
I suggest that they could be aggregated by folder, as this approach is also going to spam CI resources.

@bhack
Contributor

bhack commented Oct 15, 2021

Just to make an example:

git fetch origin pull/51739/head:reserve
git diff --dirstat=files,0 reserve
   0.3% tensorflow/c/eager/
   0.3% tensorflow/c/experimental/filesystem/plugins/gcs/
   1.9% tensorflow/c/
   0.7% tensorflow/cc/gradients/
   0.3% tensorflow/cc/ops/
   0.7% tensorflow/cc/saved_model/
   0.3% tensorflow/compiler/aot/
   2.2% tensorflow/compiler/jit/
   1.1% tensorflow/compiler/mlir/hlo/include/mlir-hlo/Dialect/mhlo/IR/
   0.3% tensorflow/compiler/mlir/hlo/lib/Analysis/
   0.3% tensorflow/compiler/mlir/hlo/
   0.3% tensorflow/compiler/mlir/lite/ir/
   0.3% tensorflow/compiler/mlir/lite/transforms/
   0.3% tensorflow/compiler/mlir/python/
   1.1% tensorflow/compiler/mlir/tensorflow/ir/
   2.2% tensorflow/compiler/mlir/tensorflow/tests/
  13.3% tensorflow/compiler/mlir/tensorflow/transforms/
   0.3% tensorflow/compiler/mlir/tensorflow/utils/
   0.3% tensorflow/compiler/mlir/tensorflow/
   0.3% tensorflow/compiler/mlir/tfr/examples/mnist/
   0.3% tensorflow/compiler/mlir/tfr/integration/
   0.3% tensorflow/compiler/mlir/tfrt/jit/
   0.3% tensorflow/compiler/mlir/tfrt/python_tests/regression_tests/
   0.3% tensorflow/compiler/mlir/tfrt/python_tests/
   0.3% tensorflow/compiler/mlir/tfrt/tests/tf_to_corert/
   0.7% tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data/
   0.3% tensorflow/compiler/mlir/tfrt/transforms/lmhlo_to_gpu/
   0.3% tensorflow/compiler/mlir/tfrt/
   1.5% tensorflow/compiler/mlir/tools/kernel_gen/transforms/
   0.3% tensorflow/compiler/mlir/tools/kernel_gen/
   0.3% tensorflow/compiler/mlir/xla/experimental/conv_emitter/
   0.3% tensorflow/compiler/mlir/xla/ir/
   0.3% tensorflow/compiler/mlir/xla/tests/
   0.7% tensorflow/compiler/mlir/xla/transforms/
   0.3% tensorflow/compiler/mlir/
   0.3% tensorflow/compiler/tests/
   1.1% tensorflow/compiler/tf2tensorrt/convert/
   0.3% tensorflow/compiler/tf2tensorrt/kernels/
   4.1% tensorflow/compiler/tf2xla/kernels/
   0.3% tensorflow/compiler/tf2xla/lib/
   2.2% tensorflow/compiler/tf2xla/
   2.2% tensorflow/compiler/xla/client/lib/
   1.5% tensorflow/compiler/xla/client/
   0.3% tensorflow/compiler/xla/pjrt/
   0.7% tensorflow/compiler/xla/python/tpu_driver/
   1.5% tensorflow/compiler/xla/python/xla_extension/
   1.5% tensorflow/compiler/xla/python/
   1.9% tensorflow/compiler/xla/service/cpu/
   0.7% tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/
   1.1% tensorflow/compiler/xla/service/gpu/tests/
   2.2% tensorflow/compiler/xla/service/gpu/
   0.3% tensorflow/compiler/xla/service/interpreter/
   1.9% tensorflow/compiler/xla/service/spmd/
  12.9% tensorflow/compiler/xla/service/
   4.1% tensorflow/compiler/xla/tests/
   1.9% tensorflow/compiler/xla/
   0.3% tensorflow/compiler/xrt/kernels/
   0.3% tensorflow/compiler/xrt/tests/
   0.3% tensorflow/core/common_runtime/eager/
   0.3% tensorflow/core/common_runtime/
   0.3% tensorflow/core/distributed_runtime/eager/
   0.3% tensorflow/core/framework/
   0.3% tensorflow/core/kernels/
   0.3% tensorflow/core/ops/
   0.7% tensorflow/core/profiler/utils/
   1.1% tensorflow/core/protobuf/
   0.3% tensorflow/core/public/
   1.1% tensorflow/core/runtime_fallback/kernel/
   0.3% tensorflow/core/runtime_fallback/runtime/
   0.3% tensorflow/core/runtime_fallback/
   0.3% tensorflow/go/op/
   0.3% tensorflow/lite/kernels/
   0.3% tensorflow/lite/toco/graph_transformations/
   0.3% tensorflow/python/autograph/impl/
   0.3% tensorflow/python/compat/
   0.3% tensorflow/python/data/ops/
   0.3% tensorflow/python/distribute/integration_test/
   1.1% tensorflow/python/distribute/
   1.5% tensorflow/python/eager/
   1.1% tensorflow/python/framework/
   0.7% tensorflow/python/keras/estimator/
   0.3% tensorflow/python/keras/
   0.3% tensorflow/python/training/
   0.3% tensorflow/python/types/
   0.7% tensorflow/tools/api/golden/v1/
   0.7% tensorflow/tools/api/golden/v2/
   0.3% tensorflow/tools/api/lib/
   0.7% tensorflow/tools/ci_build/
   0.7% tensorflow/
   0.3% third_party/llvm/
   0.3% third_party/tf_runtime/

So what kind of PR clustering by folder do you suggest?

  1. tensorflow/c
  2. tensorflow/cc
  3. tensorflow/compiler/jit/
  4. tensorflow/compiler/mlir
  5. tensorflow/compiler/xla
  6. ....ETC...
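A mechanical first cut at such a clustering can be derived from the changed paths themselves, e.g. by truncating each path to its first two components and deduplicating. A rough sketch on a few sample paths (the depth cutoff of two is only a heuristic; as discussed in this thread, the right split is ultimately semantic):

```shell
#!/bin/sh
# In the real workflow the path list would come from
# `git diff --name-only master..reserve`; sample paths are used here.
printf '%s\n' \
  tensorflow/c/eager/a.cc \
  tensorflow/cc/gradients/b.cc \
  tensorflow/compiler/jit/c.cc \
  tensorflow/compiler/mlir/d.cc \
| cut -d/ -f1-2 | sort -u
# Output:
# tensorflow/c
# tensorflow/cc
# tensorflow/compiler
```

Each resulting prefix would then become one candidate PR; deeper cutoffs (`-f1-3`) would split tensorflow/compiler into jit, mlir, xla, etc.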

@bhack
Contributor

bhack commented Oct 16, 2021

P.S.: now we have 118 PRs.

@mihaimaruseac
Collaborator

5 PRs (compiler/mlir, compiler/xla, python, core, everything else) would be a good compromise, I think.

@SamuelMarks
Contributor Author

@mihaimaruseac So I made #52532 through:

git checkout master
git checkout -b 'tensorflow.compiler.xla'
git branch -a | grep -F ' tensorflow.compiler.xla' | xargs -n 1 git merge
git push --set-upstream offscale
gh pr create --title '[tensorflow/compiler/xla/**/*.cc] Add calls to `reserve()` before populating vectors' \
             --body '#51739#issuecomment-945027209 told me to merge into one PR per "large module/namespace"'

Is that correct? - If so, I'll do the same for the others.

PS: I purposefully didn't squash… do you want me to, or are you happy to just use the GitHub button shortcut?

@bhack
Contributor

bhack commented Oct 18, 2021

Now I think that your changes are better aggregated.

@SamuelMarks
Contributor Author

So what do you want me to do?

@bhack
Contributor

bhack commented Oct 18, 2021

So what do you want me to do?

That you run the tests for your PRs locally, as I've already commented at #52532 (comment).

@mihaimaruseac
Collaborator

Let's keep it now to one PR per file, as those have been reviewed already and are in the pipeline. Assuming they build, things should progress from here.

In the future though, please split per directory; #52532 is still quite large. Also, please run at least a bazel build locally with the PR to make sure it builds.

@bhack
Contributor

bhack commented Oct 18, 2021

If we adopt a split per directory as general advice, we are still going to generate 91 PRs in cases like this one. Not too different from the 118 PRs generated by one file per PR.

@joker-eph
Contributor

I don't think there is an automated way to tell where to split: this is a semantic kind of thing; for example, under tensorflow/compiler/ every single directory could be a separate component, and similarly under tensorflow/core, but not under tensorflow/c.
In general, with minimal judgement it isn't too hard to figure out a reasonable grouping.
It is also likely not the common case to have contributions from people who don't really understand the software's high-level component organization.

@bhack
Contributor

bhack commented Oct 18, 2021

I don't think there is an automated way to tell where to split: this is a semantic kind of thing; for example, under tensorflow/compiler/ every single directory could be a separate component, and similarly under tensorflow/core, but not under tensorflow/c.

That's why in my previous comment I preferred to have a reference file on GitHub with a reference team or user account for each component. You could also use a file different from CODEOWNERS if you don't want to be notified with that logic.

But in that way we are at least aware of how the code is organized on your side at a semantic level.

@mihaimaruseac
Collaborator

I think CODEOWNERS is orthogonal: one is for automatic assignment of reviewers, the other is about using judgement to split large PRs.

@bhack
Contributor

bhack commented Oct 18, 2021

But right now we partially use the default GitHub CODEOWNERS assignment/notification logic, while in other cases we use a more complex triaging/notification/assignment logic, as you mentioned in #51739 (comment).

What I meant here is that, as we don't have a unique traditional assignment file, it would be nice to have a file in the repository where we maintain the proxy ownership and segmentation of folders/components. With a project as large as TensorFlow this is also not documented on the official website, so we don't have any source on this topic.

If you think that the community would then abuse notifications on these GitHub aliases, you could also omit them and just push, in that file, an overview of the folder/component semantics.

@mihaimaruseac
Collaborator

So most PRs have been merged. Can you go through the comments on the remaining ones and try to fix them? There are a few with no comments that are going through the pipeline right now, but given we've had 100 PRs this resulted in somewhat of a denial of service on the CI runners, so it will take a while.

@SamuelMarks
Contributor Author

Happy to field-test your CI runners =)

@mihaimaruseac Just double-checked and did a bunch of comments & commits. That should cover all open (and a couple of closed) PRs.

copybara-service bot pushed a commit to tensorflow/mlir-hlo that referenced this pull request Oct 26, 2021
…is.cc] Add calls to `reserve()` before populating vector

Imported from GitHub PR tensorflow/tensorflow#52466

tensorflow/tensorflow#51739 (comment) told me to split the larger PR into one PR per file; thus this (thanks `bash`, `git` and `gh`!)
Copybara import of the project:

--
cc99fd8ad768f2354674d130357a3efcce9ba475 by Samuel Marks <807580+SamuelMarks@users.noreply.github.com>:

[tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc] Add calls to `reserve()` before populating vectors

PiperOrigin-RevId: 405540491