Implement cpu refCounter, Modularize kernels Reshape and SpaceToBatchND #3659

lina128 · 2020-07-27T01:58:42Z

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

This change is

lina128

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)

tfjs-core/src/engine.ts, line 821 at r1 (raw file):

      // We added this to keep backend refCount to be in sync with engine
      // refCount.
      info.backend.decRef(a.dataId);

I added decRef to KernelBackend and added an empty method for each backend. The reason I couldn't implement it as a cpu backend method only is because, to check whether it is instanceof cpu backend here, I need to import cpu backend which will cause circular dependency. Any idea to avoid that? Otherwise, we may have to have decRef in KernelBackend.

tafsiri

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

tfjs-core/src/engine.ts, line 821 at r1 (raw file):

Previously, lina128 (Na Li) wrote…

I added decRef to KernelBackend and added an empty method for each backend. The reason I couldn't implement it as a cpu backend method only is because, to check whether it is instanceof cpu backend here, I need to import cpu backend which will cause circular dependency. Any idea to avoid that? Otherwise, we may have to have decRef in KernelBackend.

Quick comment on this (before doing rest of review), in engine you could check if this.backendName === 'cpu' to control whether to call this.

lina128

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)

tfjs-core/src/engine.ts, line 821 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Quick comment on this (before doing rest of review), in engine you could check if this.backendName === 'cpu' to control whether to call this.

But I still need to add decRef to KernelBackend and implement an empty function for each backend that is not cpu, right?

tafsiri

Thanks Na.

Reviewed 16 of 19 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @lina128)

tfjs-backend-cpu/src/backend_cpu.ts, line 109 at r1 (raw file):

      tensorData.refCount++;
    }
    // Do not throw error when dataId not found for testing. Some tests

Could you say more about this? This suggests there are tests which use the backend and call something that will eventually call incRec but wouldn't have written that data to the backend storage? Do you have any pointers to which tests do this? I'm curious if using a specific flag would be more appropriate, or if those tests should explicitly handle this..

tfjs-backend-cpu/src/kernels/Reshape.ts, line 28 at r1 (raw file):

  const {x} = inputs;
  const {shape} = attrs;
  const cpuBackend = backend;

i don't think you need this here since backend is already of type MathBackendCPU?

tfjs-backend-cpu/src/kernels/Reshape_test.ts, line 25 at r1 (raw file):

import {describeWithFlags, ALL_ENVS} from '@tensorflow/tfjs-core/dist/jasmine_util';

describeWithFlags('Reshape.', ALL_ENVS, () => {

could you add a test that calls the reshape kernel on a tensor twice before disposing.

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 64 at r1 (raw file):

  const reshapeInputs: ReshapeInputs = {x: paddedX};
  const reshapeAttrs: ReshapeAttrs = {shape: reshapedPaddedShape};
  const paddedXReshaped = reshapeConfig.kernelFunc({

If you want to avoid the reshapeConfig. part, you can export the reshape function from the reshape kernel, any particular reason you went for this style?

tfjs-backend-wasm/src/backend_wasm.ts, line 60 at r1 (raw file):

  }

  decRef(dataId: DataId): void {}

Let's see if we can remove this.

tfjs-backend-webgl/src/backend_webgl.ts, line 270 at r1 (raw file):

  }

  decRef(dataId: DataId): void {}

Let's see if we can remove this.

tfjs-backend-webgpu/src/backend_webgpu.ts, line 1240 at r1 (raw file):

  }

  decRef(dataId: DataId): void {}

Same as above

tfjs-core/src/engine.ts, line 821 at r1 (raw file):

Previously, lina128 (Na Li) wrote…

But I still need to add decRef to KernelBackend and implement an empty function for each backend that is not cpu, right?

Why? They don't use the incRef mechanism you use so there is nothing to keep in sync. If they add it they should add a similar block here. But since this is temporary I don't think we want to add the public api of backend.

tfjs-core/src/util.ts, line 742 at r1 (raw file):

 * @param strides Tensor strides.
 */
export function coordToIndex(

Same as below with regard to renaming this.

tfjs-core/src/util.ts, line 764 at r1 (raw file):

 * @param strides Strides of tensor.
 */
export function indexToCoord(

Curious to know more about this rename? We use "locs" in a few places for the non flat index, including our TensorBuffer class. coord also has a different specific meaning in webgl that this doesn't overlap with.

tfjs-core/src/backends/backend.ts, line 116 at r1 (raw file):

    return notYetImplemented('write');
  }
  decRef(dataId: DataId): void {

Let's try and remove this.

tfjs-core/src/ops/space_to_batch_nd_test.ts, line 34 at r1 (raw file):

    expectArraysClose(await res.data(), [1, 2, 3, 4]);

    const afterResDataIds = tf.engine().backend.numDataIds();

Is this code left over from development? I think if you want to add a more specific cpu related test you could do that for the kernel, else I would try to generalise what this is testing to a test in engine. Else its not so clear why this is here specifically.

lina128

Thank you for the review Yannick! Please take another look.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)

tfjs-backend-cpu/src/backend_cpu.ts, line 109 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Could you say more about this? This suggests there are tests which use the backend and call something that will eventually call incRec but wouldn't have written that data to the backend storage? Do you have any pointers to which tests do this? I'm curious if using a specific flag would be more appropriate, or if those tests should explicitly handle this..

Removed. Throw error.

tfjs-backend-cpu/src/kernels/Reshape.ts, line 28 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

i don't think you need this here since backend is already of type MathBackendCPU?

Nice catch!

tfjs-backend-cpu/src/kernels/Reshape_test.ts, line 25 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

could you add a test that calls the reshape kernel on a tensor twice before disposing.

Done.

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 64 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

If you want to avoid the reshapeConfig. part, you can export the reshape function from the reshape kernel, any particular reason you went for this style?

Discussed offline, export the function.

tfjs-backend-wasm/src/backend_wasm.ts, line 60 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Let's see if we can remove this.

Done.

tfjs-backend-webgl/src/backend_webgl.ts, line 270 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Let's see if we can remove this.

Done.

tfjs-backend-webgpu/src/backend_webgpu.ts, line 1240 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Same as above

Done.

tfjs-core/src/engine.ts, line 821 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Why? They don't use the incRef mechanism you use so there is nothing to keep in sync. If they add it they should add a similar block here. But since this is temporary I don't think we want to add the public api of backend.

Done.

tfjs-core/src/util.ts, line 742 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Same as below with regard to renaming this.

Reverted.

tfjs-core/src/util.ts, line 764 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Curious to know more about this rename? We use "locs" in a few places for the non flat index, including our TensorBuffer class. coord also has a different specific meaning in webgl that this doesn't overlap with.

Reverted. I just thought coord is more readable without realizing that this has specific meaning in webgl.

tfjs-core/src/backends/backend.ts, line 116 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Let's try and remove this.

Done.

tfjs-core/src/ops/space_to_batch_nd_test.ts, line 34 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Is this code left over from development? I think if you want to add a more specific cpu related test you could do that for the kernel, else I would try to generalise what this is testing to a test in engine. Else its not so clear why this is here specifically.

Done. Created a new test in cpu package.

annxingyuan

Thank you Na - this will be an amazing reference for me when I work on the webgl backend :)

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

tfjs-backend-cpu/src/backend_cpu_test.ts, line 147 at r3 (raw file):

    expectArraysClose(await res.data(), 15);
    tf.dispose([a, b, res]);
    tf.removeBackend('my-cpu');

Does this test fail with the changes in this PR?

tfjs-backend-cpu/src/kernels/PadV2.ts, line 3 at r3 (raw file):

/**
 * @license
 * Copyright 2019 Google LLC. All Rights Reserved.

2020

tfjs-backend-cpu/src/kernels/Reshape_test.ts, line 28 at r3 (raw file):

  it('does not have memory leak.', async () => {
    const beforeDataIds = tf.engine().backend.numDataIds();

would it make sense to have these tests in core? or are you specifically testing the memory implications of calling runKernel as opposed to calling the reshape op directly?

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 3 at r3 (raw file):

/**
 * @license
 * Copyright 2019 Google LLC. All Rights Reserved.

2020

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 38 at r3 (raw file):

  assertNotComplex([x], 'spaceToBatchND');

  const prod = blockShape.reduce((a, b) => a * b);

you could use util.sizeFromShape here

tfjs-backend-cpu/src/kernels/SpaceToBatchND_test.ts, line 40 at r3 (raw file):

    // 1 input tensor and 1 result tensor
    expect(afterResDataIds).toEqual(initialDataIds + 2);

are you thinking we would add a test like this for every kernel with intermediate outputs? I feel like the engine checkKernelForMemLeak method takes care of this check, but maybe I'm missing something?

tfjs-core/src/engine.ts, line 826 at r3 (raw file):

      if (this.backendName === 'cpu') {
        // tslint:disable-next-line: no-any
        (info.backend as any).decRef(a.dataId);

just for my own understanding, are you saying that right now the cpu backend refCounter can get out of sync with engine because clone and cast fail to incRef on the cpu backend? does this mean that when we implement refCounters on other backends, we will either need to modularize the shallow-copy kernels all at once, or we will need to add more conditional logic here?

tfjs-core/src/backends/backend.ts, line 112 at r3 (raw file):

  disposeData(dataId: object): void {
    return notYetImplemented('disposeData');
  }

sorry if we discussed this already - but I remember at one point you had added incRef / decRef methods to the backend interface - could you remind me why you decided to remove them? we still want other backends to implement the same methods no?

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @lina128 and @tafsiri)

tfjs-core/src/engine.ts, line 826 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

just for my own understanding, are you saying that right now the cpu backend refCounter can get out of sync with engine because clone and cast fail to incRef on the cpu backend? does this mean that when we implement refCounters on other backends, we will either need to modularize the shallow-copy kernels all at once, or we will need to add more conditional logic here?

nevermind - i just read the section you added to the design doc - i understand now!

tafsiri

Looking good, I think just one main pending question about that test in cpu backend and whether we can do it in engine.

Reviewed 14 of 18 files at r3.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @lina128)

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 64 at r1 (raw file):

Previously, lina128 (Na Li) wrote…

Discussed offline, export the function.

Should the call here/import in this file change then?

tfjs-backend-cpu/src/kernels/SpaceToBatchND_test.ts, line 40 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

are you thinking we would add a test like this for every kernel with intermediate outputs? I feel like the engine checkKernelForMemLeak method takes care of this check, but maybe I'm missing something?

Maybe changing the name of this test to reflect that you are trying to check that the ref count didn't go too low inside spaceToBatchND would help? Since this is seems to be less about leaking memory but more about early disposal (if i remember your earlier description correctly)?

However I agree with Ann that it is worth documenting what this catches that the engine checkKernelForMemLeak method cannot, and that isn't covered in the op level test (e.g. if res gets disposed early the op tests will fail). It would be preferable to change that method if you find a discrepancy so that all kernels would get that checked. Maybe rather than calling res.dispose and t.dispose you can check their isDisposed property and assert it to be false (in engine).

tfjs-core/src/backends/backend.ts, line 112 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

sorry if we discussed this already - but I remember at one point you had added incRef / decRef methods to the backend interface - could you remind me why you decided to remove them? we still want other backends to implement the same methods no?

ultimately we don't want core to know/care how backends handle refcounting, so it shouldn't be part of the public interface (ideal state is that no one outside of the backend calls incRef and decRef). In the intermediate mix of unmodularized and modularized things we have to call decRef on (in here the CPU backend) once in engine (the place with a cast to any).

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

tfjs-core/src/backends/backend.ts, line 112 at r3 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

ultimately we don't want core to know/care how backends handle refcounting, so it shouldn't be part of the public interface (ideal state is that no one outside of the backend calls incRef and decRef). In the intermediate mix of unmodularized and modularized things we have to call decRef on (in here the CPU backend) once in engine (the place with a cast to any).

i see! thanks :)

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

e2e/integration_tests/backends_test.ts, line 66 at r4 (raw file):

      await tfc.setBackend('webgl');
      const webglAfter = tfc.engine().backend.numDataIds();
      expect(webglAfter).toBe(0);

should we assert this equal to before, rather than 0, like you do with the CPU? same for the assertion at the bottom of the next test.

e2e/integration_tests/backends_test.ts, line 78 at r4 (raw file):

      const input = tfc.tensor2d([1, 1, 1, 1], [2, 2], 'float32');

      // input tensorData refCount = 2, tensorInfo refCount = 1.

iiuc, this comment refers to what happens after reshape is called? if so, do you think it would be clearer if the comment was placed below the line with reshape?

e2e/integration_tests/backends_test.ts, line 100 at r4 (raw file):

      await tfc.setBackend('cpu');
      const cpuAfter = tfc.engine().backend.numDataIds();
      expect(cpuAfter).toBe(0);

iiuc these tests will fail without the backend.data.get call you introduced to the reshape kernel right? they will fail because reshape will not find the dataid of its input?

tfjs-backend-cpu/src/backend_cpu.ts, line 166 at r4 (raw file):

  }

  disposeData(dataId: DataId, force?: boolean): void {

could you include a comment regarding when you might want to force delete

lina128

Hi Ann and Yannick, thank you for your detailed review! I addressed the comments, changed diposeData to its previous form and created a diposeDataSoft method that will only dispose if refCount < 1, added a e2e test (check custom_op_test.ts), we can add any hypothetical kernels in here, to help test refCounter idea, also refactored backends_test to check more intermediate memory states. TLDR after these changes; we won't have accidental early data delete issue, because data will only truly get disposed when tensorInfo.refCount < 1, so as long as tensorInfo.refCount >= tensorData.refCount, and there is actually no scenario where tensorInfo.refCount < tensorData.refCount, so we are safe. We won't have memory leak issue, because in engine, when tensorData gets disposed, it is the same as before, it's a force delete. All the tests pass now. Please take another look. Thank you for your help!

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

e2e/integration_tests/backends_test.ts, line 66 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

should we assert this equal to before, rather than 0, like you do with the CPU? same for the assertion at the bottom of the next test.

Right. Thank you for pointing this out Ann. Also refactored the test to check more in-between memory status. PTAL.

e2e/integration_tests/backends_test.ts, line 78 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

iiuc, this comment refers to what happens after reshape is called? if so, do you think it would be clearer if the comment was placed below the line with reshape?

Good idea. Done.

e2e/integration_tests/backends_test.ts, line 100 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

iiuc these tests will fail without the backend.data.get call you introduced to the reshape kernel right? they will fail because reshape will not find the dataid of its input?

Right. I ended up having the data.get in incRef, so reshape, clone, and other kernels that may call incRef can all be assured to have the data to increase reference.

tfjs-backend-cpu/src/backend_cpu.ts, line 166 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

could you include a comment regarding when you might want to force delete

Removed. Instead, disposeData keep its function before, i.e. force delete the data. And then introduced a disposeDataSoft that check refCount and only truly delete when refCount < 1.

tfjs-backend-cpu/src/backend_cpu_test.ts, line 147 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Does this test fail with the changes in this PR?

You mean the deleted test? It couldn't find reshape, because it is moved out of the cpu backend class. So we need to explicitly register the kernels here. Then the test should pass again. I just didn't find this test much useful, like why do we want to test this use case, don't we want to encourage always call setBackend with await? Or is there a reason we need this test?

tfjs-backend-cpu/src/kernels/PadV2.ts, line 3 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

2020

Done.

tfjs-backend-cpu/src/kernels/Reshape_test.ts, line 28 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

would it make sense to have these tests in core? or are you specifically testing the memory implications of calling runKernel as opposed to calling the reshape op directly?

Yes, I'm specifically testing the memory implications, Yannick suggests they be tested in each backends.

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 64 at r1 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Should the call here/import in this file change then?

Done. I also found additional benefit of using reshape rather than reshapeConfig, then the type check can check specifically ReshapeInputs and ReshapeAttrs instead of NamedTensorMap and NamedAttrMap. Also no need to cast result as TensorInfo because reshape function indicates whether it will return TensorInfo or TensorInfo[].

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 3 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

2020

Done.

tfjs-backend-cpu/src/kernels/SpaceToBatchND.ts, line 38 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

you could use util.sizeFromShape here

Done.

tfjs-backend-cpu/src/kernels/SpaceToBatchND_test.ts, line 40 at r3 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Maybe changing the name of this test to reflect that you are trying to check that the ref count didn't go too low inside spaceToBatchND would help? Since this is seems to be less about leaking memory but more about early disposal (if i remember your earlier description correctly)?

However I agree with Ann that it is worth documenting what this catches that the engine checkKernelForMemLeak method cannot, and that isn't covered in the op level test (e.g. if res gets disposed early the op tests will fail). It would be preferable to change that method if you find a discrepancy so that all kernels would get that checked. Maybe rather than calling res.dispose and t.dispose you can check their isDisposed property and assert it to be false (in engine).

This test was added during experimenting the refCounter and reshape idea to help me check intermediate states. Removed now.

lina128

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

tfjs-backend-cpu/src/backend_cpu.ts, line 177 at r5 (raw file):

  }

  disposeDataSoft(dataId: DataId): void {

Any suggestion on naming?

tafsiri

LGTM modulo a few requested changes. I'd also wait for @annxingyuan review. We can also discuss my comments synchronously.

Reviewed 11 of 12 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

e2e/integration_tests/backends_test.ts, line 35 at r6 (raw file):

    it(`from webgl to cpu.`, async () => {
      // A backend with no refCounter.

i'd remove this because it shouldn't matter if there is a refcounter or not in the backend.

e2e/integration_tests/backends_test.ts, line 41 at r6 (raw file):

      const input = tfc.tensor2d([1, 1, 1, 1], [2, 2], 'float32');
      // input is stored in webgl backend. tensorInfo refCount = 1.

rather than have these as comments, see if you can assert them by getting the count from the engine. (you might have to ts-ignore some things). If we don't assert them (and these were just guides for writing the test) then maybe it is better to remove then as they reference an internal detail that might change and make the comment obsolete

e2e/integration_tests/backends_test.ts, line 55 at r6 (raw file):

      const inputReshaped2 = tfc.reshape(inputReshaped, [2, 2]);
      // input moved to cpu. tensorData refCount = 1.

In general I would remove comments about tensorData refcount from these tests, since that refCount is purely internal to the cpu backend. These tests should continue to be valid irrespective of how backends internally do refCounting.

e2e/integration_tests/backends_test.ts, line 60 at r6 (raw file):

      expect(cpuAfter).toEqual(cpuBefore + 1);
      // input tensorData refCount = 3, reshape increase by 1, then output

this comment is confusing (and doesn't seem to match the assertion above—maybe its not related to it?) If its about tensorData refcounts I would

e2e/integration_tests/backends_test.ts, line 66 at r6 (raw file):

      // Because input is moved to cpu, data should be deleted from webgl.
      expect(tfc.engine().backend.numDataIds()).toEqual(webglAfter - 1);

this is true before the line above (it should be true right after the call to reshape, could you move it up. You can use findBackend('webgl') to explicitly get the webgl backend even if its not active. then you can remove the line above and below.

e2e/integration_tests/backends_test.ts, line 96 at r6 (raw file):

      const input = tfc.tensor2d([1, 1, 1, 1], [2, 2], 'float32');
      // input is stored in cpu backend. tensorData refCount = 1, tensorInfo

similar comments as above. lets prefer assertions to comments.

e2e/integration_tests/backends_test.ts, line 121 at r6 (raw file):

      // Because input is moved to webgl, data should be deleted from cpu.
      expect(tfc.engine().backend.numDataIds()).toEqual(cpuAfter - 1);

same as above.

e2e/integration_tests/custom_op_test.ts, line 1 at r6 (raw file):

import '@tensorflow/tfjs-backend-cpu';

i think we need a different name for this file since its not primarily trying to test custom_ops? it is using them to test something else.

e2e/integration_tests/custom_op_test.ts, line 54 at r6 (raw file):

    ]);

    // A custom op that calls unmodularized kernels and modularized kernels.

something like this should be the name of the test suite, and then the name of the test can be does not leak memory. similar to above because nothing in this test pokes into the refCounter itself it really asserts something greater about memory manangement.

tfjs-backend-cpu/src/backend_cpu.ts, line 177 at r5 (raw file):

Previously, lina128 (Na Li) wrote…

Any suggestion on naming?

disposeIntermedateData (i suggest this because I want it to imply it should only be used on intermediate tensorInfos in kernels and if you are unable to create an intermediate tensor you should not call this method). In fact you could make it take a tensorInfo and call it disposeIntermediateTensorInfo. This might be clearer at the call site as well.

tfjs-backend-cpu/src/backend_cpu_test.ts, line 147 at r3 (raw file):

Previously, lina128 (Na Li) wrote…

You mean the deleted test? It couldn't find reshape, because it is moved out of the cpu backend class. So we need to explicitly register the kernels here. Then the test should pass again. I just didn't find this test much useful, like why do we want to test this use case, don't we want to encourage always call setBackend with await? Or is there a reason we need this test?

in the tests all kernels should be registered (by index.ts, or if that is not happening then setup test should do that). So i'm not sure why this would fail. I don't think you should remove this test. this backend does not have asynchronous initialization and we should know if there is something that introduces that and decide at that point if it is a bug or a feature.

tfjs-backend-webgl/src/backend_webgl.ts, line 2203 at r6 (raw file):

  reshape<R extends Rank>(x: Tensor, shape: ShapeMap[R]): Tensor<R> {
    const texData = this.texData.get(x.dataId);

remove this newline change.

tafsiri · 2020-08-07T18:41:31Z

@lina128 Really great work on this btw!

lina128

Reviewable status: complete! 2 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

e2e/integration_tests/backends_test.ts, line 35 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

i'd remove this because it shouldn't matter if there is a refcounter or not in the backend.

Done.

e2e/integration_tests/backends_test.ts, line 41 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

rather than have these as comments, see if you can assert them by getting the count from the engine. (you might have to ts-ignore some things). If we don't assert them (and these were just guides for writing the test) then maybe it is better to remove then as they reference an internal detail that might change and make the comment obsolete

Done.

e2e/integration_tests/backends_test.ts, line 55 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

In general I would remove comments about tensorData refcount from these tests, since that refCount is purely internal to the cpu backend. These tests should continue to be valid irrespective of how backends internally do refCounting.

Done.

e2e/integration_tests/backends_test.ts, line 60 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

this comment is confusing (and doesn't seem to match the assertion above—maybe its not related to it?) If its about tensorData refcounts I would

Done.

e2e/integration_tests/backends_test.ts, line 66 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

this is true before the line above (it should be true right after the call to reshape, could you move it up. You can use findBackend('webgl') to explicitly get the webgl backend even if its not active. then you can remove the line above and below.

Great suggestion, thank you!

e2e/integration_tests/backends_test.ts, line 96 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

similar comments as above. lets prefer assertions to comments.

Done.

e2e/integration_tests/custom_op_test.ts, line 1 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

i think we need a different name for this file since its not primarily trying to test custom_ops? it is using them to test something else.

Changed to memory_leak_test.ts

e2e/integration_tests/custom_op_test.ts, line 54 at r6 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

something like this should be the name of the test suite, and then the name of the test can be does not leak memory. similar to above because nothing in this test pokes into the refCounter itself it really asserts something greater about memory manangement.

Done.

tfjs-backend-cpu/src/backend_cpu.ts, line 177 at r5 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

disposeIntermedateData (i suggest this because I want it to imply it should only be used on intermediate tensorInfos in kernels and if you are unable to create an intermediate tensor you should not call this method). In fact you could make it take a tensorInfo and call it disposeIntermediateTensorInfo. This might be clearer at the call site as well.

Done.

googlebot added the cla: yes label Jul 27, 2020

lina128 force-pushed the space branch 3 times, most recently from 4b45150 to d339d4c Compare July 31, 2020 02:53

lina128 marked this pull request as ready for review July 31, 2020 05:04

lina128 requested review from annxingyuan and tafsiri July 31, 2020 05:04

lina128 changed the title ~~Space~~ Implement cpu refCounter, Modularize kernels Reshape and SpaceToBatchND Jul 31, 2020

lina128 commented Jul 31, 2020

View reviewed changes

tafsiri reviewed Jul 31, 2020

View reviewed changes

lina128 commented Jul 31, 2020

View reviewed changes

lina128 force-pushed the space branch from 861d7ce to 8193aed Compare July 31, 2020 17:35

tafsiri suggested changes Jul 31, 2020

View reviewed changes

lina128 force-pushed the space branch from 8193aed to 8f6de93 Compare August 4, 2020 00:04

lina128 commented Aug 4, 2020

View reviewed changes

annxingyuan reviewed Aug 4, 2020

View reviewed changes

tafsiri suggested changes Aug 4, 2020

View reviewed changes

annxingyuan reviewed Aug 4, 2020

View reviewed changes

lina128 force-pushed the space branch from 6c88d1a to 0a46f12 Compare August 5, 2020 06:05

annxingyuan reviewed Aug 5, 2020

View reviewed changes

lina128 force-pushed the space branch 2 times, most recently from 28f7078 to 97aa0da Compare August 6, 2020 19:41

lina128 commented Aug 6, 2020

View reviewed changes

tafsiri approved these changes Aug 7, 2020

View reviewed changes

annxingyuan approved these changes Aug 7, 2020

View reviewed changes

lina128 force-pushed the space branch from edc52b2 to d66785a Compare August 8, 2020 01:20

lina128 commented Aug 8, 2020

View reviewed changes

lina128 added 22 commits August 24, 2020 13:00

Resolve merge conflict.

6c9bd45

Modularize kernel spacetobatchnd.

476e19e

Modularize kernel spacetobatchnd.

b18d210

Modularize kernel spacetobatchnd.

6f288b2

Move decRef to the right place.

aa20f7e

Add memory test.

0342781

Clean up.

19ad1dd

Fix test.

56c986f

Fix test.

892c488

.

5917635

.

6af6244

.

6bd3eb4

.

6195555

.

13b2b60

Add move data logic.

1a8cbc1

Add custom op test.

243f23c

Refactor test.

28cbe2d

.

f698808

.

e8de897

Clean up.

737ac90

Remove decRef from engine.

9d1cec2

.

a77f655

lina128 force-pushed the space branch from c6aaee6 to a77f655 Compare August 24, 2020 20:03

Merge branch 'master' into space

a13a364

lina128 merged commit 2009703 into tensorflow:master Aug 24, 2020

lina128 deleted the space branch August 24, 2020 22:03

Implement cpu refCounter, Modularize kernels Reshape and SpaceToBatchND #3659

Implement cpu refCounter, Modularize kernels Reshape and SpaceToBatchND #3659

Uh oh!

Conversation

lina128 commented Jul 27, 2020 • edited by nsthorat Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lina128 left a comment

Choose a reason for hiding this comment

Uh oh!

tafsiri left a comment

Choose a reason for hiding this comment

Uh oh!

lina128 left a comment

Choose a reason for hiding this comment

Uh oh!

tafsiri left a comment

Choose a reason for hiding this comment

Uh oh!

lina128 left a comment

Choose a reason for hiding this comment

Uh oh!

annxingyuan left a comment

Choose a reason for hiding this comment

Uh oh!

annxingyuan left a comment

Choose a reason for hiding this comment

Uh oh!

tafsiri left a comment

Choose a reason for hiding this comment

Uh oh!

annxingyuan left a comment

Choose a reason for hiding this comment

Uh oh!

annxingyuan left a comment

Choose a reason for hiding this comment

Uh oh!

lina128 left a comment

Choose a reason for hiding this comment

Uh oh!

lina128 left a comment

Choose a reason for hiding this comment

Uh oh!

tafsiri left a comment

Choose a reason for hiding this comment

Uh oh!

tafsiri commented Aug 7, 2020

Uh oh!

lina128 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

lina128 commented Jul 27, 2020 •

edited by nsthorat

Loading