
MultiHeadAttention padding mask example #49237

Closed
dbouchabou opened this issue May 17, 2021 · 18 comments
Assignees
Labels
comp:apis Highlevel API related issues comp:keras Keras related issues stat:awaiting response Status - Awaiting response from author type:docs-bug Document issues

Comments

@dbouchabou

dbouchabou commented May 17, 2021

URL(s) with the issue:

Please provide a link to the documentation entry, for example:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention

Description of issue:

I am trying to implement a transformer layer, but there is no example of using this MultiHeadAttention layer with a padding mask.
Is it possible to get one?

@UsharaniPagadala UsharaniPagadala added comp:apis Highlevel API related issues comp:keras Keras related issues labels May 17, 2021
@UsharaniPagadala

@dbouchabou
Could you please check, and let us know if it helps. Thanks.

@UsharaniPagadala UsharaniPagadala added stat:awaiting response Status - Awaiting response from author type:docs-bug Document issues labels May 17, 2021
@dbouchabou
Author

dbouchabou commented May 17, 2021

Thank you for the link. I've already checked it, but the attention_mask is a boolean mask of shape [B, T, S] in tf.keras.layers.MultiHeadAttention. In the link you shared, the shape is different: [batch_size, 1, 1, seq_len].

I don't understand how to create a mask with shape [B, T, S]. I suppose B is the batch size, T the target length, and S the source length. But how and where do I place the 1s and 0s in the mask?

@bhack
Contributor

bhack commented May 17, 2021

@dbouchabou
Author

@bhack thanks, but it did not help me, because in this tutorial the mask shape is not the same as in the native multi-head attention of TF 2.5:

mask shape in tutorial: [batch_size, 1, 1, seq_len]
mask shape in tf.keras.layers.MultiHeadAttention: [B, T, S]

I don't understand how I can create a mask with the shape [B, T, S].

@bhack
Contributor

bhack commented May 17, 2021

If you want to use the Keras one, you can see mask creation in the test_masked_* tests in https://github.com/tensorflow/tensorflow/blob/v2.5.0/tensorflow/python/keras/layers/multi_head_attention_test.py

@dbouchabou
Author

dbouchabou commented May 17, 2021

@bhack thanks for this pointer, I understand better. But I still have one problem: the example in your link uses a random mask. How can I create a mask for multi-head self-attention with the Keras implementation, for example for this sequence:

seq= [A,B,C,D,0,0,0,0,0,0]

Since it is self-attention, the source and the target are the same sequence:
T = [A,B,C,D,0,0,0,0,0,0]
S = [A,B,C,D,0,0,0,0,0,0]

The mask should be [1,1,1,1,0,0,0,0,0,0], but that does not respect the [B, T, S] shape. I don't know if the mask must be transformed like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
....
[1,1,1,1,0,0,0,0,0,0] ]

or like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
...
[0,0,0,0,0,0,0,0,0,0] ]
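For reference, both candidate masks above can be built from the same 1-D validity mask by broadcasting; a minimal NumPy sketch (names are illustrative, not from any API). The first form lets every query row attend to the valid key positions; the second is the outer product of the query and key masks, which additionally blanks the rows of padded queries (whose outputs are discarded anyway):

```python
import numpy as np

# 1-D validity mask for one sequence of length 10 (4 real tokens, 6 padding)
valid = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# First form: broadcast the key mask across the T (query) axis -> shape [T, S].
key_only = np.broadcast_to(valid, (10, 10))

# Second form: outer product of query mask and key mask -> shape [T, S],
# with all-zero rows at padded query positions.
outer = valid[:, None] * valid[None, :]

print(key_only[5])  # every row equals the key mask
print(outer[4])     # all zeros: row of a padded query
```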

@dbouchabou
Author

dbouchabou commented May 17, 2021

Is this a correct function to create a padding mask for the Keras multi-head attention implementation?

import numpy as np
import tensorflow as tf
from tqdm import tqdm

def create_padding_mask(seq):
    # Returns a (batch_size, seq_len, seq_len) mask where entry [b, t, s]
    # is 1 only if both positions t and s are non-padding tokens.
    att_mask = []
    with tqdm(total=seq.shape[0] + 1, desc='Create padding mask') as pbar:
        # 1 at real tokens, 0 at padding (token id 0)
        masks = 1 - tf.cast(tf.math.equal(seq, 0), tf.float32)
        pbar.update(1)

        for m in masks:
            m1 = np.expand_dims(m, axis=0)  # (1, seq_len): key mask
            m2 = np.expand_dims(m, axis=1)  # (seq_len, 1): query mask
            att_mask.append(m1 * m2)        # outer product, (seq_len, seq_len)
            pbar.update(1)

    return np.array(att_mask)  # (batch_size, seq_len, seq_len)
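For what it's worth, the per-sequence loop can be replaced by a single broadcasted outer product. A NumPy sketch of the same mask (the function name is illustrative; the TF port via tf.cast and the same broadcasting is one-to-one):

```python
import numpy as np

def create_padding_mask_vectorized(seq):
    """Same (batch_size, seq_len, seq_len) mask, without a Python loop.
    seq is an integer array where token id 0 marks padding."""
    m = (seq != 0).astype(np.float32)      # (B, S): 1 at real tokens, 0 at padding
    # Outer product per batch element: mask[b, t, s] = m[b, t] * m[b, s]
    return m[:, :, None] * m[:, None, :]   # (B, S, S)

seq = np.array([[7, 3, 5, 9, 0, 0, 0, 0, 0, 0]])
mask = create_padding_mask_vectorized(seq)
print(mask.shape)  # (1, 10, 10)
```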

@bhack
Contributor

bhack commented May 17, 2021

OK, as it was already semi-requested in #45854, we could probably add something to the documentation or refactor the tutorial to use the Keras layer.

/cc @MarkDaoust to evaluate this.

@dbouchabou
Author

Thanks for this answer. I think it would be good to add a new example with the Keras implementation, in addition to the current transformer implementation with TensorFlow.

Maybe in future TensorFlow versions you could provide transformer encoder and decoder Keras layers, and also a full transformer Keras implementation?

@UsharaniPagadala

@dbouchabou
Could you please confirm whether the issue still persists. Thanks.

@dbouchabou
Author

The issue persists as long as no official documentation or example is provided.

@UsharaniPagadala UsharaniPagadala removed the stat:awaiting response Status - Awaiting response from author label May 24, 2021
@MarkDaoust
Member

I've got half a solution being submitted (it addresses this problem for the nmt_with_attention tutorial instead of the transformer tutorial).

The transformer tutorial should be updated to use MultiHeadAttention, and many of the same tricks as the new nmt_with_attention (image_captioning too).

tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 370250185
tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 375597559
@dbouchabou
Author

Is it possible to update the transformer tutorial to use the MultiHeadAttention layer?

@MarkDaoust
Member

This is the right thing to do; it's just a matter of finding time.

@tilakrayal
Contributor

@dbouchabou,
The related PR tensorflow/text#626 was merged and updated nmt_with_attention. Could you please take a look at these official doc links for reference:
https://www.tensorflow.org/text/tutorials/nmt_with_attention
https://www.tensorflow.org/text/tutorials/transformer

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Mar 5, 2023
@MarkDaoust
Member

Yes. The "Used in" links at the bottom of the API page (https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention) totally have this covered now.

We'll be adding links to keras-nlp soon as well.


@tilakrayal tilakrayal self-assigned this Mar 6, 2023
@alifare

alifare commented Aug 3, 2023

T = [A,B,C,D,0,0,0,0,0,0]
S = [A,B,C,D,0,0,0,0,0,0]

the mask should be [1,1,1,1,0,0,0,0,0,0], but that does not respect the [B, T, S] shape. I don't know if the mask must be transformed like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
....
[1,1,1,1,0,0,0,0,0,0] ]

or like this:

[[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[1,1,1,1,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0]
...
[0,0,0,0,0,0,0,0,0,0] ]

Hi dbouchabou, have you got an answer to your question?


7 participants