
MultiHeadAttention padding mask example #49237

Closed
dbouchabou opened this issue May 17, 2021 · 18 comments
Assignees
Labels
comp:apis Highlevel API related issues comp:keras Keras related issues stat:awaiting response Status - Awaiting response from author type:docs-bug Document issues

Comments

@dbouchabou

dbouchabou commented May 17, 2021

URL(s) with the issue:

Please provide a link to the documentation entry, for example:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention

Description of issue:

I am trying to implement a transformer layer, but there is no example of using this MultiHeadAttention layer with a padding mask.
Is it possible to get one?

@UsharaniPagadala UsharaniPagadala added comp:apis Highlevel API related issues comp:keras Keras related issues labels May 17, 2021
@UsharaniPagadala

@dbouchabou
Could you please check, and let us know if it helps. Thanks.

@UsharaniPagadala UsharaniPagadala added stat:awaiting response Status - Awaiting response from author type:docs-bug Document issues labels May 17, 2021
@dbouchabou
Author

dbouchabou commented May 17, 2021

Thank you for the link. I've already checked it, but the attention_mask is a boolean mask of shape [B, T, S] in tf.keras.layers.MultiHeadAttention. In the link you shared, the shape is different: [batch_size, 1, 1, seq_len].

I don't understand how to create a mask with shape [B, T, S]. I suppose B is the batch size, T the target length, and S the source length. But how and where do I place the 1s and 0s in the mask?

@bhack
Contributor

bhack commented May 17, 2021

@dbouchabou
Author

@bhack thanks, but it did not help me, because in this tutorial the mask shape is not the same as in the native multi-head attention of TF 2.5:

mask shape in tutorial: [batch_size, 1, 1, seq_len]
mask shape in tf.keras.layers.MultiHeadAttention: [B, T, S]

I don't understand how I can create a mask with the shape [B, T, S].

@bhack
Contributor

bhack commented May 17, 2021

If you want to use the Keras one, you can see mask creation in the test_masked_* tests in https://github.com/tensorflow/tensorflow/blob/v2.5.0/tensorflow/python/keras/layers/multi_head_attention_test.py

@dbouchabou
Author

dbouchabou commented May 17, 2021

@bhack thanks for this pointer, I understand better. But I still have one problem: the example in your link uses a random mask. How can I create a mask for multi-head self-attention with the Keras implementation, for example for this sequence:

seq= [A,B,C,D,0,0,0,0,0,0]

Since it is self-attention, the source and the target are the same sequence:
T = [A,B,C,D,0,0,0,0,0,0]
S = [A,B,C,D,0,0,0,0,0,0]

The mask should be [1,1,1,1,0,0,0,0,0,0], but that does not respect the [B, T, S] shape. I don't know if the mask must be transformed like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
....
[1,1,1,1,0,0,0,0,0,0] ]

or like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
...
[0,0,0,0,0,0,0,0,0,0] ]
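For reference, both candidate masks above can be built from the same 1-D validity mask by broadcasting; a minimal NumPy sketch (names are illustrative, not from any API). The first form lets every query row attend to the valid key positions; the second is the outer product of the query and key masks, which additionally blanks the rows of padded queries (whose outputs are discarded anyway):

```python
import numpy as np

# 1-D validity mask for one sequence of length 10 (4 real tokens, 6 padding)
valid = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# First form: broadcast the key mask across the T (query) axis -> shape [T, S].
key_only = np.broadcast_to(valid, (10, 10))

# Second form: outer product of query mask and key mask -> shape [T, S],
# with all-zero rows at padded query positions.
outer = valid[:, None] * valid[None, :]

print(key_only[5])  # every row equals the key mask
print(outer[4])     # all zeros: row of a padded query
```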

@dbouchabou
Author

dbouchabou commented May 17, 2021

Is this a correct function to create a padding mask for the Keras multi-head attention implementation?

import numpy as np
import tensorflow as tf
from tqdm import tqdm

def create_padding_mask(seq):
    # Returns a (batch_size, seq_len, seq_len) mask where entry [b, t, s]
    # is 1 only if both positions t and s are non-padding tokens.
    att_mask = []
    with tqdm(total=seq.shape[0] + 1, desc='Create padding mask') as pbar:
        # 1 at real tokens, 0 at padding (token id 0)
        masks = 1 - tf.cast(tf.math.equal(seq, 0), tf.float32)
        pbar.update(1)

        for m in masks:
            m1 = np.expand_dims(m, axis=0)  # (1, seq_len): key mask
            m2 = np.expand_dims(m, axis=1)  # (seq_len, 1): query mask
            att_mask.append(m1 * m2)        # outer product, (seq_len, seq_len)
            pbar.update(1)

    return np.array(att_mask)  # (batch_size, seq_len, seq_len)
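For what it's worth, the per-sequence loop can be replaced by a single broadcasted outer product. A NumPy sketch of the same mask (the function name is illustrative; the TF port via tf.cast and the same broadcasting is one-to-one):

```python
import numpy as np

def create_padding_mask_vectorized(seq):
    """Same (batch_size, seq_len, seq_len) mask, without a Python loop.
    seq is an integer array where token id 0 marks padding."""
    m = (seq != 0).astype(np.float32)      # (B, S): 1 at real tokens, 0 at padding
    # Outer product per batch element: mask[b, t, s] = m[b, t] * m[b, s]
    return m[:, :, None] * m[:, None, :]   # (B, S, S)

seq = np.array([[7, 3, 5, 9, 0, 0, 0, 0, 0, 0]])
mask = create_padding_mask_vectorized(seq)
print(mask.shape)  # (1, 10, 10)
```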

@bhack
Contributor

bhack commented May 17, 2021

OK, as it was already semi-requested in #45854, we could probably add something to the documentation or refactor the tutorial to use the Keras layer.

/cc @MarkDaoust to evaluate this.

@dbouchabou
Author

Thanks for this answer. I think it would be good to add a new example with the Keras implementation, in addition to the current transformer implementation with TensorFlow.

Maybe in future TensorFlow versions you could provide transformer encoder and decoder Keras layers, and also a full transformer Keras implementation?

@UsharaniPagadala

@dbouchabou
Could you please confirm whether the issue still persists. Thanks.

@dbouchabou
Author

The issue persists as long as no official documentation or example is provided.

@UsharaniPagadala UsharaniPagadala removed the stat:awaiting response Status - Awaiting response from author label May 24, 2021
@MarkDaoust
Member

I've got half a solution being submitted (it addresses this problem for the nmt_with_attention tutorial instead of the transformer tutorial).

The transformer tutorial should be updated to use MultiHeadAttention, and many of the same tricks as the new nmt_with_attention (image_captioning too).

tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 370250185
tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 375597559
@dbouchabou
Author

Is it possible to update the transformer tutorial to use the MultiHeadAttention layer?

@MarkDaoust
Member

This is the right thing to do; it's just a matter of finding time.

@tilakrayal
Contributor

@dbouchabou,
The related PR tensorflow/text#626 was merged and updated nmt_with_attention. Could you please take a look at these official doc links for reference:
https://www.tensorflow.org/text/tutorials/nmt_with_attention
https://www.tensorflow.org/text/tutorials/transformer

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Mar 5, 2023
@MarkDaoust
Member

Yes. The "Used in" links at the bottom of the API page (https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention) totally have this covered now.

We'll be adding links to keras-nlp soon as well.


@tilakrayal tilakrayal self-assigned this Mar 6, 2023
@alifare

alifare commented Aug 3, 2023

T = [A,B,C,D,0,0,0,0,0,0]
S = [A,B,C,D,0,0,0,0,0,0]

the mask should be [1,1,1,1,0,0,0,0,0,0], but that does not respect the [B, T, S] shape. I don't know if the mask must be transformed like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
....
[1,1,1,1,0,0,0,0,0,0] ]

or like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
...
[0,0,0,0,0,0,0,0,0,0] ]

Hi dbouchabou, have you got an answer to your question?


7 participants