
Infinite loop when trying to save model with input_signature on function decorator #65256

Open
Chm-vinicius opened this issue Apr 8, 2024 · 3 comments
Assignees
Labels
comp:tf.function (tf.function related issues) · TF 2.15 (issues related to 2.15.x) · type:bug

Comments

@Chm-vinicius

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15.1

Custom code

Yes

OS platform and distribution

Linux; Windows

Mobile device

No response

Python version

3.9.5

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I have a custom BruteForce layer; it is almost a copy of tfrs.layers.factorized_top_k.BruteForce, differing in that the candidates are a variable input, just like the queries in the original layer. I was able to save and load the model without any issues, and I can bring up TensorFlow Serving as well, but when I call the served endpoint I receive the message "Serving signature name: "serving_default" not found in signature def". When I try to set up the input signature on the call function, whether as a list of dicts of tensors or a dataset of dicts of tensors, the kernel gets stuck in an infinite loop and never outputs anything. The issue occurs on Windows and on a GCloud Vertex AI Workbench with a Linux environment.

In the code below, candidaster and data_ds are datasets of dicts of tensors.
The model is a two-tower recommendation model where "self.model.query_model" and "self.model.candidate_model" are concatenations of embeddings plus some normalization layers.
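For context, hypothetical stand-ins for data_ds and candidaster (the names come from the report, but their contents are assumed here) would be tf.data datasets yielding dicts of string tensors, for example:

import tensorflow as tf

# Hypothetical stand-ins, NOT the reporter's real data: each dataset yields a
# dict of string tensors whose element_spec matches the specs used below.
data_ds = tf.data.Dataset.from_tensor_slices(
    {f"key_{i}": [["some value"]] for i in range(1, 8)}
)
candidaster = tf.data.Dataset.from_tensor_slices(
    {"experience.jobTitle": [["data engineer"]], "key_2": [["some value"]]}
)
print(data_ds.element_spec)      # {'key_1': TensorSpec(shape=(1,), dtype=tf.string), ...}
print(candidaster.element_spec)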

Standalone code to reproduce the issue

# Imports assumed by this snippet (not listed in the original report):
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_recommenders as tfrs
from typing import Dict, List, Text, Tuple, Union

@tf.keras.saving.register_keras_serializable(package="MyLayers")
class BruteForce2(tf.keras.layers.Layer):
  """Brute force retrieval."""

  def __init__(self, model, k=5, **kwargs):
        super().__init__(**kwargs)
        self.model = model
        self.k = k
        self._k = k
        self._candidates = None
        
  def _compute_score(self, queries: tf.Tensor,
                     candidates: tf.Tensor) -> tf.Tensor:

        print(queries, candidates)
    
        return tf.matmul(queries, candidates, transpose_b=True)
  
  def _topk_ds(self, data):
    # data is expected to be a pandas DataFrame here.
    df = data.copy()
    df = {key: np.array(value)[:, tf.newaxis] for key, value in df.items()}
    ds = tf.data.Dataset.from_tensor_slices(dict(df))
    ds = ds.prefetch(data.size)  # prefetch returns a new dataset, so keep the result

    return ds
  
  positions_inputs = {
                      'key_1': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_2': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_3': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_4': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_5': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_6': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_7': tf.TensorSpec(shape=(1,), dtype=tf.string)
                      }
  
  @tf.function(input_signature=[positions_inputs, [candidaster.element_spec], tf.TensorSpec(shape=(), dtype=tf.int32)])
  def _encode(self, queries, raw_candidates, k):
    # data = pd.DataFrame(raw_candidates)
    # data.head()
    # returnment = self._topk_ds(data)
    return queries, raw_candidates, k
  
  # @tf.function(input_signature=[
  #     positions_inputs, 
  #     tf.TensorSpec(shape=[None], dtype=tf.int64),
  #     tf.TensorSpec(shape=(), dtype=tf.int64)
  #   ])
  @tf.function(input_signature=[positions_inputs, [candidaster.element_spec], tf.TensorSpec(shape=(), dtype=tf.int32)])
  def call(self, queries: Union[tf.Tensor, Dict[Text, tf.Tensor]], candidates_raw: List[tf.Tensor], k: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
    
    # queries, candidates_raw, k = self._encode(queries, candidates_raw, k)
    # candidates_raw = self._topk_ds(pd.DataFrame(candidates_raw))
    # candidates_raw = self._encode(candidates_raw)
    
    parse_one = pd.DataFrame.from_dict(candidates_raw).to_dict(orient='list')
    candidates_raw = tf.data.Dataset.from_tensor_slices(parse_one)
    
    candidates = tf.data.Dataset.zip(candidates_raw.batch(1).map(lambda x: x['experience.jobTitle']), 
                                           candidates_raw.batch(1).map(self.model.candidate_model)
                                          )
    
    spec = candidates.element_spec

    if isinstance(spec, tuple):
      identifiers_and_candidates = list(candidates)
      candidates = tf.concat(
          [embeddings for _, embeddings in identifiers_and_candidates],
          axis=0
      )
      identifiers = tf.concat(
          [identifiers for identifiers, _ in identifiers_and_candidates],
          axis=0
      )
    else:
      candidates = tf.concat(list(candidates), axis=0)
      identifiers = None
    
    # self._candidates = candidates

    if identifiers is None:
      identifiers = tf.range(candidates.shape[0])
    if tf.rank(candidates) != 2:
      raise ValueError(
          f"The candidates tensor must be 2D (got {candidates.shape}).")
    if candidates.shape[0] != identifiers.shape[0]:
      raise ValueError(
          "The candidates and identifiers tensors must have the same number of rows "
          f"(got {candidates.shape[0]} candidates rows and {identifiers.shape[0]} "
          "identifier rows). "
      )
    # We need any value that has the correct dtype.
    identifiers_initial_value = tf.zeros((), dtype=identifiers.dtype)
    self._identifiers = self.add_weight(
        name="identifiers",
        dtype=identifiers.dtype,
        shape=identifiers.shape,
        initializer=tf.keras.initializers.Constant(
            value=identifiers_initial_value),
        trainable=False)
    self._candidates = self.add_weight(
        name="candidates",
        dtype=candidates.dtype,
        shape=candidates.shape,
        initializer=tf.keras.initializers.Zeros(),
        trainable=False)
    self._identifiers.assign(identifiers)
    self._candidates.assign(candidates)
    # self._reset_tf_function_cache()
    

    k = k if k is not None else self._k

    if self._candidates is None:
      raise ValueError("The `index` method must be called first to "
                       "create the retrieval index.")

    if self.model.query_model is not None:
      queries = self.model.query_model(queries)

    scores = self._compute_score(queries, self._candidates)

    values, indices = tf.math.top_k(scores, k=k)

    return values, tf.gather(self._identifiers, indices)

# save model
custom_index = BruteForce2(final_model, k=5)

for l in data_ds.take(1):
    # scores, chose = custom_index(l, mapped_candidates, 2)

    # custom_index.call = tf.function(custom_index.call, input_signature=[positions_inputs, [candidaster.element_spec], tf.TensorSpec(shape=(), dtype=tf.int32)])
    # print(scores, chose)
    
    custom_index.model.task_retrieval = tfrs.tasks.Retrieval()
    
    signatures = {"serving_default": custom_index.call.get_concrete_function()}
    
    tf.saved_model.save(custom_index, "./serveModels/exportedMode19", signatures=signatures)
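As a side note, the export pattern itself can be written with plain TF/Keras and no TFRS or pandas dependency. The following is a minimal sketch (hypothetical layer, specs and path, not the reporter's model) of decorating call with a dict-of-TensorSpec input_signature and exporting it through get_concrete_function(); whether this stripped-down version also hangs on 2.15 is what would help narrow the report down:

import tensorflow as tf

class DictSignatureLayer(tf.keras.layers.Layer):
  """Toy layer: call carries a dict input_signature, like BruteForce2 above."""

  @tf.function(input_signature=[
      {"key_1": tf.TensorSpec(shape=(1,), dtype=tf.string)},
      tf.TensorSpec(shape=(), dtype=tf.int32),
  ])
  def call(self, queries, k):
    # Toy body so the exported signature has concrete outputs.
    return tf.strings.length(queries["key_1"]), k

layer = DictSignatureLayer()
signatures = {"serving_default": layer.call.get_concrete_function()}
tf.saved_model.save(layer, "./serveModels/toyExport", signatures=signatures)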

Relevant log output

No response

@Chm-vinicius
Author

To simplify reproducing the bug, here is code where the inputs are two simple dicts of tensors and one int tensor.
This code does the same thing as the issue code and reproduces the error.

@tf.keras.saving.register_keras_serializable(package="MyLayers")

class BruteForce3(tf.keras.layers.Layer):

  def __init__(self, model, k=5, **kwargs):
      super().__init__(**kwargs)
      self.model = model
      self.k = k

  @tf.function(input_signature=[data_ds.element_spec, candidaster.element_spec, tf.TensorSpec([], dtype=tf.int32)])
  # @tf.function
  def call(self, queries, candidates, ks):
      
      standard_brute = tfrs.layers.factorized_top_k.BruteForce(self.model.query_model)
      
      candidates = [candidates]
      
      parse_one = pd.DataFrame.from_dict(candidates).to_dict(orient='list')
      candidates = tf.data.Dataset.from_tensor_slices(parse_one)
      
      standard_brute.index_from_dataset(tf.data.Dataset.zip(candidates.batch(2).map(lambda x: x['experience.jobTitle']), 
                                         candidates.batch(2).map(self.model.candidate_model)
                                        ))
      scores, ids = standard_brute(queries, ks)
      
      return scores, ids

and save model:

custom = BruteForce3(final_model)

for l in data_ds.take(1):
    scores, jobs = custom(l, cand[0], 2)
    
    print(scores, jobs)
    
    tf.saved_model.save(custom, "./serveModels/exportedMode36")

When I save the model without the function decorator and input_signature, I get the following error when calling the loaded model:
ValueError: Found zero restored functions for caller function.
And if I serve it with TensorFlow Serving, I get the following response from the server endpoint:
"error": "Serving signature name: \"serving_default\" not found in signature def"

@SuryanarayanaY
Collaborator

Hi @Chm-vinicius ,

Could you please provide a minimal code snippet for reproducing the issue? Thanks!

It seems the code contains TensorFlow Recommenders layers, which we do not support here in the TF repo. I can see this has already been reported in the concerned repo.

If you are able to reproduce the issue with TF/Keras alone, please submit the code snippet for it. Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Apr 18, 2024
@Chm-vinicius
Author

Hi @SuryanarayanaY,

Thanks for your response!
Unfortunately, due to an NDA contract I can't share more than that; we are also still in permanent contact with Google engineers to find and solve this issue.
When a solution is reached, I will certainly share it.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 22, 2024