
Infinite loop when trying to save model with input_signature on function decorator #65256

Open
Chm-vinicius opened this issue Apr 8, 2024 · 3 comments
Assignees
Labels
comp:tf.function (tf.function related issues) · TF 2.15 (issues related to 2.15.x) · type:bug

Comments

@Chm-vinicius

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15.1

Custom code

Yes

OS platform and distribution

Linux; Windows

Mobile device

No response

Python version

3.9.5

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I have a custom BruteForce layer; it is almost a copy of tfrs.layers.factorized_top_k.BruteForce, differing in that the candidates are a variable input, just like the queries in the original layer. I was able to save and load the model without any issues, and I can bring up TensorFlow Serving as well, but when I call the served endpoint I receive the message "Serving signature name: "serving_default" not found in signature def". When I try to set up the input signature on the call function, whether as a list of dicts of tensors or a dataset of dicts of tensors, the kernel gets stuck in an infinite loop and never outputs anything. The issue occurs on Windows and on a GCloud Vertex AI Workbench with a Linux environment.

In the code below, candidaster and data_ds are datasets of dicts of tensors.
The model is a two-tower recommendation model where "self.model.query_model" and "self.model.candidate_model" are concatenations of embeddings plus some normalization layers.
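For context, hypothetical stand-ins for data_ds and candidaster (the names come from the report, but their contents are assumed here) would be tf.data datasets yielding dicts of string tensors, for example:

import tensorflow as tf

# Hypothetical stand-ins, NOT the reporter's real data: each dataset yields a
# dict of string tensors whose element_spec matches the specs used below.
data_ds = tf.data.Dataset.from_tensor_slices(
    {f"key_{i}": [["some value"]] for i in range(1, 8)}
)
candidaster = tf.data.Dataset.from_tensor_slices(
    {"experience.jobTitle": [["data engineer"]], "key_2": [["some value"]]}
)
print(data_ds.element_spec)      # {'key_1': TensorSpec(shape=(1,), dtype=tf.string), ...}
print(candidaster.element_spec)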

Standalone code to reproduce the issue

# Imports assumed by this snippet (not listed in the original report):
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_recommenders as tfrs
from typing import Dict, List, Text, Tuple, Union

@tf.keras.saving.register_keras_serializable(package="MyLayers")
class BruteForce2(tf.keras.layers.Layer):
  """Brute force retrieval."""

  def __init__(self, model, k=5, **kwargs):
        super().__init__(**kwargs)
        self.model = model
        self.k = k
        self._k = k
        self._candidates = None
        
  def _compute_score(self, queries: tf.Tensor,
                     candidates: tf.Tensor) -> tf.Tensor:

        print(queries, candidates)
    
        return tf.matmul(queries, candidates, transpose_b=True)
  
  def _topk_ds(self, data):
    # data is expected to be a pandas DataFrame here.
    df = data.copy()
    df = {key: np.array(value)[:, tf.newaxis] for key, value in df.items()}
    ds = tf.data.Dataset.from_tensor_slices(dict(df))
    ds = ds.prefetch(data.size)  # prefetch returns a new dataset, so keep the result

    return ds
  
  positions_inputs = {
                      'key_1': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_2': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_3': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_4': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_5': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_6': tf.TensorSpec(shape=(1,), dtype=tf.string), 
                      'key_7': tf.TensorSpec(shape=(1,), dtype=tf.string)
                      }
  
  @tf.function(input_signature=[positions_inputs, [candidaster.element_spec], tf.TensorSpec(shape=(), dtype=tf.int32)])
  def _encode(self, queries, raw_candidates, k):
    # data = pd.DataFrame(raw_candidates)
    # data.head()
    # returnment = self._topk_ds(data)
    return queries, raw_candidates, k
  
  # @tf.function(input_signature=[
  #     positions_inputs, 
  #     tf.TensorSpec(shape=[None], dtype=tf.int64),
  #     tf.TensorSpec(shape=(), dtype=tf.int64)
  #   ])
  @tf.function(input_signature=[positions_inputs, [candidaster.element_spec], tf.TensorSpec(shape=(), dtype=tf.int32)])
  def call(self, queries: Union[tf.Tensor, Dict[Text, tf.Tensor]], candidates_raw: List[tf.Tensor], k: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
    
    # queries, candidates_raw, k = self._encode(queries, candidates_raw, k)
    # candidates_raw = self._topk_ds(pd.DataFrame(candidates_raw))
    # candidates_raw = self._encode(candidates_raw)
    
    parse_one = pd.DataFrame.from_dict(candidates_raw).to_dict(orient='list')
    candidates_raw = tf.data.Dataset.from_tensor_slices(parse_one)
    
    candidates = tf.data.Dataset.zip(candidates_raw.batch(1).map(lambda x: x['experience.jobTitle']), 
                                           candidates_raw.batch(1).map(self.model.candidate_model)
                                          )
    
    spec = candidates.element_spec

    if isinstance(spec, tuple):
      identifiers_and_candidates = list(candidates)
      candidates = tf.concat(
          [embeddings for _, embeddings in identifiers_and_candidates],
          axis=0
      )
      identifiers = tf.concat(
          [identifiers for identifiers, _ in identifiers_and_candidates],
          axis=0
      )
    else:
      candidates = tf.concat(list(candidates), axis=0)
      identifiers = None
    
    # self._candidates = candidates

    if identifiers is None:
      identifiers = tf.range(candidates.shape[0])
    if tf.rank(candidates) != 2:
      raise ValueError(
          f"The candidates tensor must be 2D (got {candidates.shape}).")
    if candidates.shape[0] != identifiers.shape[0]:
      raise ValueError(
          "The candidates and identifiers tensors must have the same number of rows "
          f"(got {candidates.shape[0]} candidates rows and {identifiers.shape[0]} "
          "identifier rows). "
      )
    # We need any value that has the correct dtype.
    identifiers_initial_value = tf.zeros((), dtype=identifiers.dtype)
    self._identifiers = self.add_weight(
        name="identifiers",
        dtype=identifiers.dtype,
        shape=identifiers.shape,
        initializer=tf.keras.initializers.Constant(
            value=identifiers_initial_value),
        trainable=False)
    self._candidates = self.add_weight(
        name="candidates",
        dtype=candidates.dtype,
        shape=candidates.shape,
        initializer=tf.keras.initializers.Zeros(),
        trainable=False)
    self._identifiers.assign(identifiers)
    self._candidates.assign(candidates)
    # self._reset_tf_function_cache()
    

    k = k if k is not None else self._k

    if self._candidates is None:
      raise ValueError("The `index` method must be called first to "
                       "create the retrieval index.")

    if self.model.query_model is not None:
      queries = self.model.query_model(queries)

    scores = self._compute_score(queries, self._candidates)

    values, indices = tf.math.top_k(scores, k=k)

    return values, tf.gather(self._identifiers, indices)

# save model
custom_index = BruteForce2(final_model, k=5)

for l in data_ds.take(1):
    # scores, chose = custom_index(l, mapped_candidates, 2)

    # custom_index.call = tf.function(custom_index.call, input_signature=[positions_inputs, [candidaster.element_spec], tf.TensorSpec(shape=(), dtype=tf.int32)])
    # print(scores, chose)
    
    custom_index.model.task_retrieval = tfrs.tasks.Retrieval()
    
    signatures = {"serving_default": custom_index.call.get_concrete_function()}
    
    tf.saved_model.save(custom_index, "./serveModels/exportedMode19", signatures=signatures)
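As a side note, the export pattern itself can be written with plain TF/Keras and no TFRS or pandas dependency. The following is a minimal sketch (hypothetical layer, specs and path, not the reporter's model) of decorating call with a dict-of-TensorSpec input_signature and exporting it through get_concrete_function(); whether this stripped-down version also hangs on 2.15 is what would help narrow the report down:

import tensorflow as tf

class DictSignatureLayer(tf.keras.layers.Layer):
  """Toy layer: call carries a dict input_signature, like BruteForce2 above."""

  @tf.function(input_signature=[
      {"key_1": tf.TensorSpec(shape=(1,), dtype=tf.string)},
      tf.TensorSpec(shape=(), dtype=tf.int32),
  ])
  def call(self, queries, k):
    # Toy body so the exported signature has concrete outputs.
    return tf.strings.length(queries["key_1"]), k

layer = DictSignatureLayer()
signatures = {"serving_default": layer.call.get_concrete_function()}
tf.saved_model.save(layer, "./serveModels/toyExport", signatures=signatures)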

Relevant log output

No response

@Chm-vinicius
Author

To simplify reproducing the bug, here is code where the inputs are two simple dicts of tensors and one int tensor.
This code does the same thing as the issue code and reproduces the error.

@tf.keras.saving.register_keras_serializable(package="MyLayers")

class BruteForce3(tf.keras.layers.Layer):

  def __init__(self, model, k=5, **kwargs):
      super().__init__(**kwargs)
      self.model = model
      self.k = k

  @tf.function(input_signature=[data_ds.element_spec, candidaster.element_spec, tf.TensorSpec([], dtype=tf.int32)])
  # @tf.function
  def call(self, queries, candidates, ks):
      
      standard_brute = tfrs.layers.factorized_top_k.BruteForce(self.model.query_model)
      
      candidates = [candidates]
      
      parse_one = pd.DataFrame.from_dict(candidates).to_dict(orient='list')
      candidates = tf.data.Dataset.from_tensor_slices(parse_one)
      
      standard_brute.index_from_dataset(tf.data.Dataset.zip(candidates.batch(2).map(lambda x: x['experience.jobTitle']), 
                                         candidates.batch(2).map(self.model.candidate_model)
                                        ))
      scores, ids = standard_brute(queries, ks)
      
      return scores, ids

and save model:

custom = BruteForce3(final_model)

for l in data_ds.take(1):
    scores, jobs = custom(l, cand[0], 2)
    
    print(scores, jobs)
    
    tf.saved_model.save(custom, "./serveModels/exportedMode36")

When I save the model without the function decorator and input_signature, I get the following error when calling the loaded model:
ValueError: Found zero restored functions for caller function.
And if I serve it with TensorFlow Serving, I get the following response from the server endpoint:
"error": "Serving signature name: \"serving_default\" not found in signature def"

@SuryanarayanaY
Collaborator

Hi @Chm-vinicius ,

Could you please provide a minimal code snippet for reproducing the issue? Thanks!

It seems the code contains TensorFlow Recommenders layers, which we do not support here in the TF repo. I can see this has already been reported in the concerned repo.

If you are able to reproduce the issue with TF/Keras alone, please submit the code snippet for it. Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Apr 18, 2024
@Chm-vinicius
Author

Hi @SuryanarayanaY,

Thanks for your response!
Unfortunately, due to an NDA contract I can't share more than that; we are also still in permanent contact with Google engineers to find and solve this issue.
When a solution is reached, I will certainly share it.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 22, 2024