
Bug: with with strategy.scope(): BERT output loses its shape #870

Closed
maifeeulasad opened this issue Dec 14, 2022 · 6 comments

@maifeeulasad

What happened?

I was trying to use the BERT encoder hosted on TF Hub (https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4).

It should give me multiple outputs, and their shapes should look something like (None, 128, 768). Sometimes it works, but inside with strategy.scope(): the output loses its shape and becomes (None, None, 768).

Relevant code

!pip install tensorflow-text==2.7.0

import tensorflow as tf
import tensorflow_text as text  # registers the custom ops the preprocess layer needs
import tensorflow_hub as hub
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import Accuracy

strategy = tf.distribute.MirroredStrategy()
print('Number of GPUs: ' + str(strategy.num_replicas_in_sync))  # 1 or 2, shouldn't matter

NUM_CLASS = 2

with strategy.scope():
    bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
    bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")


def get_model():
    text_input = Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)

    output_sequence = outputs['sequence_output']  # expected static shape: (None, 128, 768)
    x = Dense(NUM_CLASS, activation='sigmoid')(output_sequence)

    model = Model(inputs=[text_input], outputs=[x])
    return model


optimizer = Adam()
model = get_model()
model.compile(loss=CategoricalCrossentropy(from_logits=True), optimizer=optimizer, metrics=[Accuracy()])
model.summary()  # <- look at output 1
tf.keras.utils.plot_model(model, show_shapes=True, to_file='model.png')  # <- look at figure 1


with strategy.scope():
    optimizer = Adam()
    model = get_model()
    model.compile(loss=CategoricalCrossentropy(from_logits=True), optimizer=optimizer, metrics=[Accuracy()])

model.summary()  # <- compare with output 1: the sequence dimension has already lost its shape
tf.keras.utils.plot_model(model, show_shapes=True, to_file='model_scoped.png')  # <- compare this figure too, for ease

Relevant log output

Model (without scope):

Model: "model_6"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer_2 (KerasLayer)     {'input_mask': (Non  0           ['text[0][0]']                   
                                e, 128),                                                          
                                 'input_word_ids':                                                
                                (None, 128),                                                      
                                 'input_type_ids':                                                
                                (None, 128)}                                                      
                                                                                                  
 keras_layer_3 (KerasLayer)     multiple             109482241   ['keras_layer_2[6][0]',          
                                                                  'keras_layer_2[6][1]',          
                                                                  'keras_layer_2[6][2]']          
                                                                                                  
 dense_6 (Dense)                (None, 128, 2)       1538        ['keras_layer_3[6][14]']         
                                                                                                  
==================================================================================================
Total params: 109,483,779
Trainable params: 1,538
Non-trainable params: 109,482,241
__________________________________________________________________________________________________




Model (WITH scope):

Model: "model_7"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer_2 (KerasLayer)     {'input_mask': (Non  0           ['text[0][0]']                   
                                e, 128),                                                          
                                 'input_word_ids':                                                
                                (None, 128),                                                      
                                 'input_type_ids':                                                
                                (None, 128)}                                                      
                                                                                                  
 keras_layer_3 (KerasLayer)     multiple             109482241   ['keras_layer_2[7][0]',          
                                                                  'keras_layer_2[7][1]',          
                                                                  'keras_layer_2[7][2]']          
                                                                                                  
 dense_7 (Dense)                (None, None, 2)      1538        ['keras_layer_3[7][14]']         
                                                                                                  
==================================================================================================
Total params: 109,483,779
Trainable params: 1,538
Non-trainable params: 109,482,241
__________________________________________________________________________________________________

tensorflow_hub Version

0.12.0 (latest stable release)

TensorFlow Version

2.7

Other libraries

tensorflow-text==2.7.0

Python Version

3.x

OS

Linux

singhniraj08 self-assigned this Dec 14, 2022
@singhniraj08

@maifeeulasad,

I was able to replicate the same behaviour, although it does not look like a blocker: both models (with strategy.scope() and without) produce the same output with the same shape, (1, 128, 2). Please find the attached gist.
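For reference, a minimal sketch of that check (the variable names are illustrative; it reuses get_model() and strategy from the repro above):

sample = tf.constant(["this is a test sentence"])

model_plain = get_model()        # built outside the scope
with strategy.scope():
    model_scoped = get_model()   # built inside the scope

# The scoped summary reports (None, None, 2), but at runtime
# both models return a concrete (1, 128, 2) tensor.
print(model_plain(sample).shape)   # (1, 128, 2)
print(model_scoped(sample).shape)  # (1, 128, 2)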

Please let us know if this blocks you. Thank you!

@maifeeulasad
Author

@singhniraj08 What if we put a layer after that which requires the shape of the previous layer, such as Flatten, Conv1D, Conv2D, or any other layer?
I simply used Dense, but it could be anything, and then that None in the sequence_length dimension is a bit problematic (a sketch of this failure mode follows the notebook link below).

Here is the notebook; you may find it helpful: https://www.kaggle.com/code/maifeeulasad/tfhub-bert-with-scope/
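A minimal sketch of that failure mode (hypothetical, reusing bert_preprocess, bert_encoder, and NUM_CLASS from the repro; the exact error text varies by TF version):

with strategy.scope():
    text_input = Input(shape=(), dtype=tf.string, name='text')
    outputs = bert_encoder(bert_preprocess(text_input))

    seq = outputs['sequence_output']       # static shape under scope: (None, None, 768)
    flat = tf.keras.layers.Flatten()(seq)  # collapses to (None, None)
    # Dense requires a defined last dimension, so this raises a ValueError
    # along the lines of "The last dimension of the inputs to a Dense layer
    # should be defined. Found None."
    head = Dense(NUM_CLASS, activation='sigmoid')(flat)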

@singhniraj08

@maifeeulasad,

I tried the same setup with Conv1D and AveragePooling1D layers added, and it worked for me. Gist attached for reference.

The Flatten layer resulted in an error; the suggested workaround is to use the pooled output from the BERT encoder, as shown below. This gives an already-flat (None, 768) output from the encoder.

output_sequence = outputs["pooled_output"]
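In context, a minimal sketch of that workaround applied to get_model() from the repro (get_model_pooled is a hypothetical name; pooled_output is one vector per example, so no sequence dimension is involved):

def get_model_pooled():
    text_input = Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)

    pooled = outputs['pooled_output']                    # (None, 768)
    x = Dense(NUM_CLASS, activation='sigmoid')(pooled)   # (None, NUM_CLASS)

    return Model(inputs=[text_input], outputs=[x])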

@alenarepina, can you please look into this issue, where the output shape from the BERT encoder shows (None, None, 768) instead of (None, 128, 768) when the model is built under a tf.distribute.MirroredStrategy() scope?

@maifeeulasad
Author

@singhniraj08, here is a gist: https://colab.research.google.com/drive/1NLeVirYdVeHGit7QGsIvook6K_ezFtgB?usp=sharing

You aren't using version 2.7.0; on that version it breaks.

And maybe it works in some updated version, but the lost shape also breaks more vital TF features, like plot_model.
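One possible mitigation (an assumption, not verified in this thread): the preprocess model emits fixed-length sequences of 128 tokens, so a Reshape layer can pin the static shape back onto the scoped model. This only restores the shape metadata for downstream layers and tools like plot_model; it does not address the underlying bug:

def get_model_reshaped():  # hypothetical variant of get_model()
    text_input = Input(shape=(), dtype=tf.string, name='text')
    outputs = bert_encoder(bert_preprocess(text_input))

    seq = outputs['sequence_output']                # (None, None, 768) under scope
    seq = tf.keras.layers.Reshape((128, 768))(seq)  # back to (None, 128, 768)
    x = Dense(NUM_CLASS, activation='sigmoid')(seq)

    return Model(inputs=[text_input], outputs=[x])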

@akhorlin
Collaborator

It's best to forward this type of question to the main TensorFlow repo. On the tfhub.dev side, we host models pre-trained by various publishers; the semantics of how code behaves under different distribution strategies are under the control of the TensorFlow library.

@google-ml-butler

Are you satisfied with the resolution of your issue?
