Redis is added as a backend for KV storage, and the Python interface changes accordingly: you instantiate a Config class and create and pass a KVCreator class to select the backend. Backend parameters are set through a JSON file.
The implementation of the TFRA backend relies on the Redis-plus-plus and hiredis components. For instructions on installing Redis-plus-plus, please visit the following page: https://github.com/sewenew/redis-plus-plus
Redis-plus-plus is a Redis client based on hiredis and written in C++11. It supports scripting, pub/sub, pipelines, transactions, Redis Cluster, Redis Sentinel, connection pooling, ACL, SSL, and thread safety.
The method for parsing the JSON file was created by James McLaughlin et al.
For performance reasons, TFRA-Redis assigns each key in the embedding table to a bucket by taking the remainder of the key modulo a power of two. So never use a key with an artificial ending, for example 341445_1 and 341445_0. Put your custom edit to the key at the beginning instead, for example 1_341445 and 0_341445.
You could solve this problem by hashing the keys with a function that scrambles the bits more thoroughly, such as SHA or MD5 rather than CRC32/CRC16, but even with Intel's hardware instructions this still costs too much performance.
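To see why suffixed keys hurt remainder-based bucketing, here is a small illustration. The decimal encodings of the underscore patterns below (id * 1000 + tag for a suffix, tag * 10**6 + id for a prefix) are hypothetical, chosen only to demonstrate the effect:

```python
STORAGE_SLICE = 8  # bucket count; assumed to be a power of two, as in TFRA-Redis

def bucket(key: int) -> int:
    # Bucket choice by plain remainder, the default TFRA-Redis policy
    return key % STORAGE_SLICE

ids = range(100000, 110000)

# Suffix-style keys (hypothetical encoding of "341445_1": id * 1000 + tag).
# id * 1000 is always a multiple of 8, so the bucket is decided by the tag alone.
suffix_buckets = {bucket(i * 1000 + t) for i in ids for t in (0, 1)}
# Prefix-style keys (hypothetical encoding of "1_341445": tag * 10**6 + id).
# 10**6 is a multiple of 8, so the bucket is decided by the id's low bits.
prefix_buckets = {bucket(t * 10**6 + i) for i in ids for t in (0, 1)}

print(sorted(suffix_buckets))  # [0, 1] -- six of the eight buckets stay empty
print(sorted(prefix_buckets))  # [0, 1, 2, 3, 4, 5, 6, 7] -- all buckets used
```

With the suffix encoding, every key's low bits come from the small tag set, so most buckets receive nothing; with the prefix encoding, the low bits of the original id survive and spread keys across all buckets.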
Below is an example JSON file, with comments explaining each item. Please use UTF-8 encoding. Attention: JSON files cannot contain comments when actually used!
{
"redis_connection_mode": 2, // ClusterMode = 0, SentinelMode = 1, StandaloneMode = 2
"redis_master_name": "master",
// connection_options
"redis_host_ip": ["127.0.0.1"],
"redis_host_port": [6379],
"redis_user": "default",
"redis_password": "",
"redis_db": 0,
"redis_read_access_slave": false, // set to true in inference or training mode if you like
"redis_connect_keep_alive": false, // keep the TCP connection alive
"redis_connect_timeout": 1000, // milliseconds
"redis_socket_timeout": 1000, // milliseconds
// connection_pool_options
"redis_conn_pool_size": 20,
"redis_wait_timeout": 100000000, // milliseconds
"redis_connection_lifetime": 100, // minutes
// sentinel_connection_options
"redis_sentinel_user": "default",
"redis_sentinel_password": "",
"redis_sentinel_connect_timeout": 1000, // milliseconds
"redis_sentinel_socket_timeout": 1000, // milliseconds
// The parameters below are user-defined for this custom op; they are not Redis settings
"storage_slice_import": 2, // If storage_slice_import is not equal to storage_slice, a rehash will happen. A value of -1 means the same as storage_slice.
"storage_slice": 2, // Decides the bucket number, which is usually how many Redis instances may be used during training.
"using_hash_storage_slice": false, // If true, a hash (CRC32) of each ID is calculated and then taken MOD to decide which bucket it belongs to. If false, only the remainder is calculated.
"keys_sending_size": 1024, // Determines how many keys to send at a time, for performance tuning
"using_md5_prefix_name": false,
"redis_hash_tags_hypodispersion": true, // Distribute the storage_slice buckets evenly across the 16384 slots regardless of cluster slots; still overridden by redis_hash_tags_import/runtime if they are not empty.
"model_tag_import": "test", // model_tag_import carries the version and any other information from last time.
"redis_hash_tags_import": ["{6379}","{26379}"], // Decides the hash tag for every bucket from last time. Note that a hash tag must be wrapped in curly braces {}.
"model_tag_runtime": "test", // model_tag_runtime carries the version and any other information for now.
"redis_hash_tags_runtime": ["{3560}","{120}"], // Decides the hash tag for every bucket for now. Note that a hash tag must be wrapped in curly braces {}.
"expire_model_tag_in_seconds": 604800, // Eliminates unwanted model versions in Redis to ensure sufficient storage space. It does not take effect if it is less than zero.
"table_store_mode": 1, // table_store_mode = 0: save and restore the table into a tensor in the TF SavedModel variable file. table_store_mode = 1: save and restore the table as a Redis RDB file in model_lib_abs_dir. table_store_mode = 2: save and restore nothing, keeping the data in the Redis servers.
"model_lib_abs_dir": "/tmp/" // If table_store_mode equals 1, it will try to save or restore the table from model_lib_abs_dir, which must be mounted in the system
}
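Since comments are not allowed in the real file, a minimal valid configuration that relies on defaults for the remaining parameters might look like this (the values are illustrative):

```json
{
  "redis_connection_mode": 2,
  "redis_host_ip": ["127.0.0.1"],
  "redis_host_port": [6379],
  "storage_slice": 2,
  "model_tag_import": "test",
  "model_tag_runtime": "test"
}
```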
If you create a new model, then "model_tag_import" equals "model_tag_runtime". If you want to import the embedding table
from an old model, keep "model_tag_import" as the old tag and assign a new "model_tag_runtime".
Likewise, if you want to change the Redis hash tags for "storage_slice", you can assign "redis_hash_tags_import" and
"redis_hash_tags_runtime" in the same way, or change only "redis_hash_tags_runtime" without modifying "model_tag_runtime".
Remember! The number of tags in the "redis_hash_tags_import" and "redis_hash_tags_runtime" arrays must equal "storage_slice"!
Attention! TFRA-Redis will run with "model_tag_runtime"!
When TFRA-Redis is loaded for the first time, it checks whether the current "model_tag_runtime" bucket partition in Redis matches the "storage_slice" parameter. If not, the buckets of "model_tag_runtime" are re-bucketized to the number of buckets set by the "storage_slice" parameter.
Generally, "storage_slice" should equal the number of nodes in the Redis cluster, but you can still set it to any other number. There is also a table inside the program that generates the Redis hash tags sequentially, so for a particular "storage_slice" value, the target Redis node and slot number are fixed. Of course, you can set the tags yourself with the "redis_hash_tags_runtime" parameter instead of having the program generate them.
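As background on why a fixed hash tag pins a bucket to a fixed slot: Redis Cluster maps a key to one of 16384 slots by CRC16, and when the key contains a non-empty {...} hash tag, only the tag content is hashed. A minimal sketch of that slot computation (not TFRA's actual code):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def redis_hash_slot(key: str) -> int:
    """Slot of a key: if the key contains a non-empty {...} hash tag,
    only the tag content is hashed, so all keys sharing a tag share a slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag content
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Keys carrying the same hash tag always land in the same slot:
print(redis_hash_slot("model_a{6379}bucket0") == redis_hash_slot("{6379}"))  # True
```

This is why wrapping each bucket's tag in curly braces, as in "redis_hash_tags_runtime", keeps every key of that bucket on one deterministic node.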
By default, TFRA-Redis reads the JSON file at the path stored in the environment variable named by the op attribute redis_config_abs_dir_env. If that environment variable does not exist, or the path it points to has no corresponding JSON file, TFRA-Redis next looks for the JSON file at the path in the TFRA_REDIS_CONFIG_PATH environment variable. As a last step, it reads the JSON file path configured in the Python operator attribute redis_config_abs_dir.
So when you deploy for inference, you can set the environment variable TFRA_REDIS_CONFIG_PATH to configure the Redis service for all models on the inference side. For example, "$ export TFRA_REDIS_CONFIG_PATH=/tmp/redis.config".
Or, to specify a particular Redis service for a particular model in a container, you can agree on a common environment variable name via redis_config_abs_dir_env, used both online and offline, from which to read the path of the config file.
For example, set redis_config_abs_dir_env="model1_tfra_redis_config_path", and then run
"$ export model1_tfra_redis_config_path=/tmp/redis_offline.config" offline, and
"$ export model1_tfra_redis_config_path=/tmp/redis_online.config" online.
Besides, the default value of redis_config_abs_dir_env is "TFRA_REDIS_CONFIG_PATH".
The following Python code shows how to introduce the Redis backend for training under the TFRA framework:
redis_config1 = tfra.dynamic_embedding.RedisTableConfig(
    redis_config_abs_dir_env="model1_tfra_redis_config_path",
    redis_config_abs_dir="/tmp/test/"
)
redis_creator1 = tfra.dynamic_embedding.RedisTableCreator(redis_config1)
self.user_embeddings = tfra.dynamic_embedding.get_variable(
    name="user_dynamic_embeddings",
    dim=self.embedding_size,
    initializer=tf.keras.initializers.RandomNormal(-1, 1),
    checkpoint=False,
    kv_creator=redis_creator1
)
or you can do it like this:
redis_config_dir = os.path.join(tempfile.mkdtemp(dir=os.environ.get('TEST_TMPDIR')), "save_restore")
redis_config_path = os.path.join(tempfile.mkdtemp(prefix=redis_config_dir), "hash")
os.makedirs(redis_config_path)
redis_config_path = os.path.join(redis_config_path, "redis_config.json")
redis_config_params = {
    "redis_host_ip": ["127.0.0.1"],
    "redis_host_port": [6379],
    "using_model_lib": False
}
with open(redis_config_path, 'w', encoding='utf-8') as f:
    f.write(json.dumps(redis_config_params, indent=2, ensure_ascii=True))
redis_config = tfra.dynamic_embedding.RedisTableConfig(
    redis_config_abs_dir_env="model1_tfra_redis_config_path",
    redis_config_abs_dir=redis_config_path
)
redis_creator = tfra.dynamic_embedding.RedisTableCreator(redis_config)
self.user_embeddings = tfra.dynamic_embedding.get_variable(
    name="user_dynamic_embeddings",
    dim=self.embedding_size,
    initializer=tf.keras.initializers.RandomNormal(-1, 1),
    checkpoint=False,
    kv_creator=redis_creator
)