I tried to write a Python program that saves a tf.keras.layers.TextVectorization layer to disk and loads it back, following the answer at https://stackoverflow.com/questions/65103526/how-to-save-textvectorization-to-disk-in-tensorflow.
The TextVectorization layer rebuilt from the saved config outputs a vector of the wrong length when output_sequence_length is not None and output_mode='int'.
For example, with output_sequence_length=10 and output_mode='int', TextVectorization is expected to output a vector of length 10 for any input text; see vectorizer and new_v2 in the code below.
However, when output_mode is taken from the saved config, the layer does not output a vector of length 10. The output has length 9, the actual number of tokens in the sentence, so it seems output_sequence_length is not applied. See new_v1 in the code below.
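To make the expected lengths concrete, here is a minimal pure-Python sketch of the pad-or-truncate behaviour that output_sequence_length is documented to apply in 'int' mode. The helper `pad_or_truncate` and the placeholder ids are hypothetical and only illustrate the arithmetic; this is not TensorFlow's implementation.

```python
# Hypothetical helper, not part of TensorFlow: mimics padding or truncating
# a list of token ids to exactly `output_sequence_length` entries.
def pad_or_truncate(token_ids, output_sequence_length, pad_value=0):
    clipped = token_ids[:output_sequence_length]
    return clipped + [pad_value] * (output_sequence_length - len(clipped))


# The test sentence used in this report, split on whitespace.
sentence = "Jack likes computer scinece, computer games, and foreign language"
tokens = sentence.split()

# Placeholder ids for illustration; these are not real vocabulary indices.
ids = list(range(1, len(tokens) + 1))
padded = pad_or_truncate(ids, 10)

print(len(tokens), len(padded))
```

The sentence has 9 tokens, so a correctly configured layer should append one padding value to reach length 10, while a layer that skips this step returns only the 9 token ids.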
Interestingly, I compared from_disk['config']['output_mode'] with 'int', and the two are equal.
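One possible explanation, which is only an assumption and is not confirmed in this report, is that the layer checks output_mode by object identity (`is`) rather than by value (`==`). A string that has gone through pickle compares equal to 'int' but need not be the same object. The sketch below uses only the standard library; the constant name `INT` is a stand-in chosen for illustration.

```python
import pickle

# Stand-in for a module-level constant; the name is an assumption.
INT = "int"

# Round-trip a config-like dict through pickle, as the report does on disk.
config = {"output_mode": "int"}
restored = pickle.loads(pickle.dumps(config))
mode = restored["output_mode"]

# Equality compares the characters of the two strings.
print("equal by value:", mode == INT)

# Identity compares the objects themselves; the result can differ from the
# equality check once the string has been recreated by unpickling.
print("same object:", mode is INT)
```

If the two lines print different results, any code path guarded by an identity check would treat the restored value as a different mode even though `==` reports a match.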
import tensorflow as tf
from tensorflow.keras.models import load_model
import pickle
# In[]
max_len = 10 # Sequence length to pad the outputs to.
text_dataset = tf.data.Dataset.from_tensor_slices([
    "I like natural language processing",
    "You like computer vision",
    "I like computer games and computer science"])
# Fit a TextVectorization layer
VOCAB_SIZE = 10 # Maximum vocab size.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=None,
    standardize="lower_and_strip_punctuation",
    split="whitespace",
    output_mode='int',
    output_sequence_length=max_len
)
vectorizer.adapt(text_dataset.batch(64))
# In[]
#print(vectorizer.get_vocabulary())
#print(vectorizer.get_config())
#print(vectorizer.get_weights())
# In[]
# Pickle the config and weights (the ./models directory must already exist)
with open("./models/tv_layer.pkl", "wb") as f:
    pickle.dump({'config': vectorizer.get_config(),
                 'weights': vectorizer.get_weights()}, f)
# Later you can unpickle and use
# `config` to create object and
# `weights` to load the trained weights.
with open("./models/tv_layer.pkl", "rb") as f:
    from_disk = pickle.load(f)
# new_v1 takes output_mode from the saved config
new_v1 = tf.keras.layers.TextVectorization(
    max_tokens=None,
    standardize="lower_and_strip_punctuation",
    split="whitespace",
    output_mode=from_disk['config']['output_mode'],
    output_sequence_length=from_disk['config']['output_sequence_length'],
)
# You have to call `adapt` with some dummy data (BUG in Keras)
new_v1.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
new_v1.set_weights(from_disk['weights'])
# new_v2 is identical except that output_mode is the literal 'int'
new_v2 = tf.keras.layers.TextVectorization(
    max_tokens=None,
    standardize="lower_and_strip_punctuation",
    split="whitespace",
    output_mode='int',
    output_sequence_length=from_disk['config']['output_sequence_length'],
)
# You have to call `adapt` with some dummy data (BUG in Keras)
new_v2.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
new_v2.set_weights(from_disk['weights'])
print("*" * 10)
# In[]
test_sentence="Jack likes computer scinece, computer games, and foreign language"
print(vectorizer(test_sentence))
print(new_v1(test_sentence))
print(new_v2(test_sentence))
print(from_disk['config']['output_mode']=='int')
@lankuohsing We see that this issue was also posted in the Keras repo and that PR 15422 has been merged for it. Please let us know if we can close this ticket here. Thanks!
@sushreebarsa The solution in that PR is useful! Thanks! You can close the issue.
Here are the print() outputs:
Does anyone know why?
I have also raised the same issue in the Keras repo: keras-team/keras#15382