Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HashTable ops #4060

Merged
merged 18 commits into from
Oct 14, 2020
Merged

Implement HashTable ops #4060

merged 18 commits into from
Oct 14, 2020

Conversation

lina128
Copy link
Collaborator

@lina128 lina128 commented Oct 13, 2020

Background:
For saved model graphs that have hashtable or lookuptable operations, the underlying hashtable needs to be initialized before being used. From graph architecture, the initialization subgraph is detached from the inference subgraph, so that initialization doesn't happen in every inference. To support hashtable ops in TF.js, we made below changes:

  1. Extract the initialization subgraph and store its topology in modelInitializer field in model.json. See PR: [converter]Support table initializer in converter. #3958
  2. Load the initialization subgraph and execute it during model load. See PR: [converter] Add table initializer to graph execution. #3975.
  3. Add hashtable ops implementation. See this PR.

Changes in this PR:

  1. Add ResourceManager class, used to share model resources, such as HashTables. Currently, the resourceManager only stores hashTable maps, but it is easy to extend to store other resources. For the hashtable, there are two maps, hashTableNameToHandle and hashTableMap. hashTableNameToHandle is used to get the hashTable handle from node name. node name is used as inputs for a node, if any node needs hashTable as input, the name of the hashTable name will be in the inputs list. The step happens when an executor getParams for a node. hashTableMap is used to get the actual hashTable from the handle's id. This step happens when after getParams, a handle is found.
  2. Add HashTable class, which contains operations on hashTables, such as import and find.
  3. The HashTable receives keys and values as tensors, but will transform keys to primitive value to store. This allows O(1) lookup time. As a comparison, if keys are stored as tensors, lookup time is O(N), N = # of elements in the table. This is because Javascript doesn't have a mechanism to override hashing and equal operator. The downside of this approach is the key tensor values need to be read out, but the time cost should be minimal.
  4. Also fixes an error in e2e suite, graph_models => graph_model, to allow tests actually run in the convert_predict.ts file.
  5. Because of above fix, actually found a memory leak in loading graph with complex64 data, that is also fixed in this PR.
  6. Added a e2e test for a model that takes a key tensor, initializes a hashtable, and lookup the key tensor in the hashtable and outputs the result.

This change is Reviewable

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful, thank you! just a couple minor comments.

Reviewable status: 0 of 1 approvals obtained (waiting on @lina128 and @pyu10055)


e2e/integration_tests/convert_predict.py, line 390 at r1 (raw file):

      keys_tensor = tf.constant([1, 2])
      vals_tensor = tf.constant([3, 4])
      input_tensor = tf.constant([1, 5])

this variable is not used?


tfjs-converter/python/tensorflowjs/op_list/hash_table.json, line 33 at r1 (raw file):

    "category": "hash_table",
    "inputs": [],
    "attrs": [

we are using the container attribute?


tfjs-converter/src/executor/hash_table.ts, line 67 at r1 (raw file):

   * @param values Values to store in the hashtable.
   */
  async import(keys: Tensor, values: Tensor): Promise<Tensor> {

curious if the keys dtype is string, what is return values from keys.data()?
Is it uint8Array? or string?


tfjs-converter/src/executor/resource_manager.ts, line 3 at r1 (raw file):

/**
 * @license
 * Copyright 2018 Google LLC. All Rights Reserved.

2020


tfjs-converter/src/executor/resource_manager.ts, line 28 at r1 (raw file):

      readonly hashTableMap: HashTableMap = {}) {}

  addHashTable(name: string, hashTable: HashTable) {

please add jsdoc for the public methods


tfjs-converter/src/operations/executors/hash_table_executor_test.ts, line 3 at r1 (raw file):

/**
 * @license
 * Copyright 2018 Google LLC. All Rights Reserved.

2020


tfjs-core/src/io/io_utils.ts, line 217 at r1 (raw file):

        const imageTensor = tensor(image, shape, 'float32');
        out[name] = complex(realTensor, imageTensor);
        realTensor.dispose();

is this change related?

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 30 of 30 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128 and @pyu10055)

Copy link
Collaborator Author

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055)


e2e/integration_tests/convert_predict.py, line 390 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

this variable is not used?

Removed.


tfjs-converter/python/tensorflowjs/op_list/hash_table.json, line 33 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

we are using the container attribute?

We are not using container attribute. In this case, should we still map it or skip it?


tfjs-converter/src/executor/hash_table.ts, line 67 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

curious if the keys dtype is string, what is return values from keys.data()?
Is it uint8Array? or string?

It returns string: https://github.com/tensorflow/tfjs/blob/master/tfjs-core/src/tensor.ts#L336
I can change the e2e test to use string as key to verify it.


tfjs-converter/src/executor/resource_manager.ts, line 3 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

2020

Done.


tfjs-converter/src/executor/resource_manager.ts, line 28 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

please add jsdoc for the public methods

Done.


tfjs-converter/src/operations/executors/hash_table_executor_test.ts, line 3 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

2020

Done.


tfjs-core/src/io/io_utils.ts, line 217 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

is this change related?

This is not related, but fixes a memory leak bug. See PR description #4 and #5.

Copy link
Collaborator Author

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review, Ping! I addressed the comments, please take another look.

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055)

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @lina128 and @pyu10055)


tfjs-converter/python/tensorflowjs/op_list/hash_table.json, line 33 at r1 (raw file):

Previously, lina128 (Na Li) wrote…

We are not using container attribute. In this case, should we still map it or skip it?

Maybe just add for the sake of completeness, since some of the attribute we don't use are listed anyway.


tfjs-converter/src/executor/resource_manager.ts, line 60 at r2 (raw file):

  /**
   * Dispose `ResourceManager`, including its hashTables and tensors in them/

replace / at the end with .

Copy link
Collaborator Author

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @pyu10055)


tfjs-converter/python/tensorflowjs/op_list/hash_table.json, line 33 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Maybe just add for the sake of completeness, since some of the attribute we don't use are listed anyway.

Ack.


tfjs-converter/src/executor/resource_manager.ts, line 60 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

replace / at the end with .

Done.

@lina128 lina128 merged commit 763a133 into tensorflow:master Oct 14, 2020
@lina128 lina128 deleted the temp2 branch October 14, 2020 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants