
Bug: with with strategy.scope(): BERT output loses its shape #870

Closed
maifeeulasad opened this issue Dec 14, 2022 · 6 comments

@maifeeulasad

What happened?

I was trying to use the BERT encoder hosted on TF Hub (https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4).

It should give me multiple outputs, and their shapes should look something like (None, 128, 768). Sometimes it works, but inside with strategy.scope(): the output loses its shape and becomes (None, None, 768).

Relevant code

!pip install tensorflow-text==2.7.0

import tensorflow as tf
import tensorflow_text as text  # registers the custom ops the preprocess layer needs
import tensorflow_hub as hub
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import Accuracy

strategy = tf.distribute.MirroredStrategy()
print('Number of GPUs: ' + str(strategy.num_replicas_in_sync))  # 1 or 2, shouldn't matter

NUM_CLASS = 2

with strategy.scope():
    bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
    bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")


def get_model():
    text_input = Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)

    output_sequence = outputs['sequence_output']  # expected static shape: (None, 128, 768)
    x = Dense(NUM_CLASS, activation='sigmoid')(output_sequence)

    model = Model(inputs=[text_input], outputs=[x])
    return model


optimizer = Adam()
model = get_model()
model.compile(loss=CategoricalCrossentropy(from_logits=True), optimizer=optimizer, metrics=[Accuracy()])
model.summary()  # <- look at output 1
tf.keras.utils.plot_model(model, show_shapes=True, to_file='model.png')  # <- look at figure 1


with strategy.scope():
    optimizer = Adam()
    model = get_model()
    model.compile(loss=CategoricalCrossentropy(from_logits=True), optimizer=optimizer, metrics=[Accuracy()])

model.summary()  # <- compare with output 1: the sequence dimension has already lost its shape
tf.keras.utils.plot_model(model, show_shapes=True, to_file='model_scoped.png')  # <- compare this figure too, for ease

Relevant log output

Model (without scope):

Model: "model_6"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer_2 (KerasLayer)     {'input_mask': (Non  0           ['text[0][0]']                   
                                e, 128),                                                          
                                 'input_word_ids':                                                
                                (None, 128),                                                      
                                 'input_type_ids':                                                
                                (None, 128)}                                                      
                                                                                                  
 keras_layer_3 (KerasLayer)     multiple             109482241   ['keras_layer_2[6][0]',          
                                                                  'keras_layer_2[6][1]',          
                                                                  'keras_layer_2[6][2]']          
                                                                                                  
 dense_6 (Dense)                (None, 128, 2)       1538        ['keras_layer_3[6][14]']         
                                                                                                  
==================================================================================================
Total params: 109,483,779
Trainable params: 1,538
Non-trainable params: 109,482,241
__________________________________________________________________________________________________




Model (WITH scope):

Model: "model_7"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer_2 (KerasLayer)     {'input_mask': (Non  0           ['text[0][0]']                   
                                e, 128),                                                          
                                 'input_word_ids':                                                
                                (None, 128),                                                      
                                 'input_type_ids':                                                
                                (None, 128)}                                                      
                                                                                                  
 keras_layer_3 (KerasLayer)     multiple             109482241   ['keras_layer_2[7][0]',          
                                                                  'keras_layer_2[7][1]',          
                                                                  'keras_layer_2[7][2]']          
                                                                                                  
 dense_7 (Dense)                (None, None, 2)      1538        ['keras_layer_3[7][14]']         
                                                                                                  
==================================================================================================
Total params: 109,483,779
Trainable params: 1,538
Non-trainable params: 109,482,241
__________________________________________________________________________________________________

tensorflow_hub Version

0.12.0 (latest stable release)

TensorFlow Version

2.7

Other libraries

tensorflow-text==2.7.0

Python Version

3.x

OS

Linux

singhniraj08 self-assigned this Dec 14, 2022
@singhniraj08

@maifeeulasad,

I was able to replicate the same behaviour, although it does not look like a blocker: both models (with strategy.scope() and without) produce the same output with the same shape, (1, 128, 2). Please find the attached gist.
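For reference, a minimal sketch of that check (the variable names are illustrative; it reuses get_model() and strategy from the repro above):

sample = tf.constant(["this is a test sentence"])

model_plain = get_model()        # built outside the scope
with strategy.scope():
    model_scoped = get_model()   # built inside the scope

# The scoped summary reports (None, None, 2), but at runtime
# both models return a concrete (1, 128, 2) tensor.
print(model_plain(sample).shape)   # (1, 128, 2)
print(model_scoped(sample).shape)  # (1, 128, 2)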

Please let us know if this blocks you. Thank you!

@maifeeulasad
Author

@singhniraj08 What if we put a layer after that which requires the shape of the previous layer, such as Flatten, Conv1D, Conv2D, or any other layer?
I simply used Dense, but it could be anything, and then that None in the sequence_length dimension is a bit problematic (a sketch of this failure mode follows the notebook link below).

Here is the notebook; you may find it helpful: https://www.kaggle.com/code/maifeeulasad/tfhub-bert-with-scope/
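A minimal sketch of that failure mode (hypothetical, reusing bert_preprocess, bert_encoder, and NUM_CLASS from the repro; the exact error text varies by TF version):

with strategy.scope():
    text_input = Input(shape=(), dtype=tf.string, name='text')
    outputs = bert_encoder(bert_preprocess(text_input))

    seq = outputs['sequence_output']       # static shape under scope: (None, None, 768)
    flat = tf.keras.layers.Flatten()(seq)  # collapses to (None, None)
    # Dense requires a defined last dimension, so this raises a ValueError
    # along the lines of "The last dimension of the inputs to a Dense layer
    # should be defined. Found None."
    head = Dense(NUM_CLASS, activation='sigmoid')(flat)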

@singhniraj08

@maifeeulasad,

I tried the same setup with Conv1D and AveragePooling1D layers added, and it worked for me. Gist attached for reference.

The Flatten layer resulted in an error; the suggested workaround is to use the pooled output from the BERT encoder, as shown below. This gives an already-flat (None, 768) output from the encoder.

output_sequence = outputs["pooled_output"]
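In context, a minimal sketch of that workaround applied to get_model() from the repro (get_model_pooled is a hypothetical name; pooled_output is one vector per example, so no sequence dimension is involved):

def get_model_pooled():
    text_input = Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)

    pooled = outputs['pooled_output']                    # (None, 768)
    x = Dense(NUM_CLASS, activation='sigmoid')(pooled)   # (None, NUM_CLASS)

    return Model(inputs=[text_input], outputs=[x])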

@alenarepina, can you please look into this issue, where the output shape from the BERT encoder shows (None, None, 768) instead of (None, 128, 768) when the model is built under a tf.distribute.MirroredStrategy() scope?

@maifeeulasad
Author

@singhniraj08, here is a gist: https://colab.research.google.com/drive/1NLeVirYdVeHGit7QGsIvook6K_ezFtgB?usp=sharing

You aren't using version 2.7.0; on that version it breaks.

And maybe it works in some updated version, but the lost shape also breaks more vital TF features, like plot_model.
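One possible mitigation (an assumption, not verified in this thread): the preprocess model emits fixed-length sequences of 128 tokens, so a Reshape layer can pin the static shape back onto the scoped model. This only restores the shape metadata for downstream layers and tools like plot_model; it does not address the underlying bug:

def get_model_reshaped():  # hypothetical variant of get_model()
    text_input = Input(shape=(), dtype=tf.string, name='text')
    outputs = bert_encoder(bert_preprocess(text_input))

    seq = outputs['sequence_output']                # (None, None, 768) under scope
    seq = tf.keras.layers.Reshape((128, 768))(seq)  # back to (None, 128, 768)
    x = Dense(NUM_CLASS, activation='sigmoid')(seq)

    return Model(inputs=[text_input], outputs=[x])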

@akhorlin
Collaborator

It's best to forward this type of question to the main TensorFlow repo. On the tfhub.dev side, we host models pre-trained by various publishers; the semantics of how code behaves under different distribution strategies are under the control of the TensorFlow library.

@google-ml-butler

Are you satisfied with the resolution of your issue?
