[Feat]Add the features for expanding and shrinking the number of tables in distributed training by independently saving files. #305

MoFHeka · 2023-03-09T14:42:24Z

Description

Add support for expanding and shrinking the number of tables in distributed training by saving files independently.
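To make the idea concrete, here is a minimal sketch of how independently saved shard files can be re-partitioned when the number of workers changes. It is not TFRA's implementation: the `shard-<rank>.json` file layout, the helper names `save_shard` and `load_resharded`, and the "key modulo worker count" partition rule are all assumptions chosen for illustration.

```python
import json
import os
import tempfile


def save_shard(directory, rank, table):
    """Write one worker's key/value pairs to its own file (hypothetical layout)."""
    path = os.path.join(directory, f"shard-{rank}.json")
    with open(path, "w") as f:
        # JSON object keys must be strings, so store the pairs as a list.
        json.dump([[k, v] for k, v in table.items()], f)


def load_resharded(directory, rank, world_size):
    """Scan every saved shard file and keep only the keys this worker owns."""
    table = {}
    for name in sorted(os.listdir(directory)):
        if not (name.startswith("shard-") and name.endswith(".json")):
            continue
        with open(os.path.join(directory, name)) as f:
            for key, value in json.load(f):
                # Assumed partition rule: key modulo the new worker count.
                if key % world_size == rank:
                    table[key] = value
    return table


def demo():
    """Save with 2 workers, then restore with 3 workers."""
    full = {k: [float(k), float(k) * 0.5] for k in range(10)}
    with tempfile.TemporaryDirectory() as d:
        for rank in range(2):
            save_shard(d, rank, {k: v for k, v in full.items() if k % 2 == rank})
        return [load_resharded(d, rank, 3) for rank in range(3)]
```

Because each file is self-contained, the number of readers does not have to match the number of writers; every new worker simply filters the saved keys by its own ownership rule.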

Also improve the performance of the CPU table by using std::copy_n.

Also make generating _DEFAULT_CUDA_COMPUTE_CAPABILITIES in build_deps/toolchains/gpu/cuda_configure.bzl more compatible and concise.

Also add compatibility with TF 2.9, which passes the parameter validate_shape to _init_from_args.
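One common way to stay compatible when a base class starts passing a new keyword argument is to accept it explicitly in the override and tolerate any further unknown keywords. The sketch below uses plain stand-in classes rather than TensorFlow itself; `OldBase`, `NewBase`, `CompatInit`, and the attribute names are hypothetical and only mimic the situation described above.

```python
class OldBase:
    """Stand-in for an older release that does not pass validate_shape."""

    def __init__(self, initial_value, name=None):
        self._init_from_args(initial_value=initial_value, name=name)


class NewBase:
    """Stand-in for a newer release that forwards validate_shape."""

    def __init__(self, initial_value, name=None, validate_shape=True):
        self._init_from_args(
            initial_value=initial_value,
            name=name,
            validate_shape=validate_shape,
        )


class CompatInit:
    """Override that works whether or not the caller passes validate_shape."""

    def _init_from_args(self, initial_value=None, name=None,
                        validate_shape=True, **unused_kwargs):
        # Extra keywords from future releases are accepted and ignored.
        self.initial_value = initial_value
        self.name = name
        self.validate_shape = validate_shape


class VariableOnOld(CompatInit, OldBase):
    pass


class VariableOnNew(CompatInit, NewBase):
    pass
```

With this shape, `VariableOnOld(1)` falls back to the default `validate_shape=True`, while `VariableOnNew(1, validate_shape=False)` receives the value forwarded by the newer base class.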

Also fix the RedisTableOfTensors node missing its user-defined name.

Also fix the 'checkpoint' parameter not being passed through when using DE BasicEmbedding.

Also support 'find_namespace_packages' when using setuptools, because 'find_packages' has been deprecated.

Type of change

Checklist:

I've properly formatted my code according to the guidelines
- By running yapf
- By running clang-format
This PR addresses an already submitted issue for TensorFlow Recommenders-Addons
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works

How Has This Been Tested?

Read the doc (docs/api_docs/tfra/dynamic_embedding/FileSystemSaver.md) and run the new tests.

…les in distributed training by independently saving files. Users can use DE filesystem KV files without any code changes: simply use the SavedModel/checkpoint API to save and restore DE parameters. A better implementation for TFRA training with Horovod.

rhdong

LGTM

MoFHeka added 4 commits March 9, 2023 22:40

[feat] make generating _DEFAULT_CUDA_COMPUTE_CAPABILITIES more compatible and concise.

301001a

[fix] Fix mistakenly loading embedding slot files when load_entire_dir=True in LoadFromFileSystem function.

514318b

[feat] Improve the performance of CPU table by using std::copy_n. Fix a coredump bug.

d85f82b

[fix] Fix problem with the parameter checkpoint passing not working when using DE BasicEmbedding.

bd5c013

MoFHeka requested a review from rhdong as a code owner March 9, 2023 14:42

MoFHeka requested a review from Lifann March 9, 2023 14:42

MoFHeka added 5 commits March 10, 2023 12:09

[fix] CI throw error "module 'tensorflow.python.keras.layers' has no attribute 'BatchNormalization'"

2acf683

[fix] Compatible with TF 2.9, which would pass parameter validate_shape to _init_from_args [fix] Compatible with TF 2.9 function read_value_no_copy.

963deb5

[fix] Fix RedisTableOfTensors Node missing user-defined name.

efc8152

[fix] Compatible with 'find_namespace_packages' when using setuptools, because 'find_packages' has been deprecated.

f553d43

MoFHeka force-pushed the master-dev branch from bdee030 to f553d43 Compare March 10, 2023 04:10

rhdong approved these changes Mar 10, 2023

rhdong merged commit 373c729 into tensorflow:master Mar 10, 2023
