MultiHeadAttention padding mask example #49237
Comments
@dbouchabou
Thank you for the link. I've already checked it, but the attention_mask is a boolean mask of shape [B, T, S] in tf.keras.layers.MultiHeadAttention, while in the link you shared the shape is different: [batch_size, 1, 1, seq_len]. I don't understand how to create a mask with shape [B, T, S]. I suppose B is for batch size, T for target length, and S for source length, but how and where do I place the 1s and 0s in the mask?
Can you check https://www.tensorflow.org/tutorials/text/transformer ?
@bhack thanks, but it did not help me, because in this tutorial the mask shape is not the same as the native multi-head attention mask shape in TF 2.5. Mask shape in the tutorial: [batch_size, 1, 1, seq_len]. I don't understand how I can create the mask with the shape [B, T, S].
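For illustration of the two conventions being compared here, a minimal sketch (not from the thread) assuming TF 2.x and that 0 is the padding token id. The tutorial-style mask marks padding with 1.0, while the Keras layer expects a boolean mask where True means "may attend":

```python
import tensorflow as tf

seq = tf.constant([[7, 6, 5, 4, 0, 0]])  # [batch_size=1, seq_len=6], 0 = padding

# Tutorial-style mask: 1.0 marks PADDING, shape [batch_size, 1, 1, seq_len].
tutorial_mask = tf.cast(tf.math.equal(seq, 0), tf.float32)[:, tf.newaxis, tf.newaxis, :]

# Keras MultiHeadAttention-style mask: True marks positions that MAY be
# attended to, shape [B, T, S]. For self-attention T == S == seq_len, so the
# per-token validity vector is repeated along the target dimension.
valid = tf.math.not_equal(seq, 0)                       # [B, S]
keras_mask = tf.repeat(valid[:, tf.newaxis, :],         # [B, 1, S] -> [B, T, S]
                       repeats=tf.shape(seq)[1], axis=1)
```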
If you want to use the Keras one, you can see Mask creation in
@bhack thanks for this pointer, I understand better. But I still have one problem: the example in your link uses a random mask. How can I create a mask for multi-head self-attention with the Keras implementation, for example for this sequence: seq = [A, B, C, D, 0, 0, 0, 0, 0, 0]? In self-attention the source and the target are the same sequence, so the mask should be [1, 1, 1, 1, 0, 0, 0, 0, 0, 0], but that does not respect the [B, T, S] shape. Should it be something like [[1, 1, 1, 1, 0, 0, 0, 0, 0, 0], ...] with that row repeated for each target position, or laid out some other way?
Is this a correct function to create a padding mask for the Keras multi-head attention implementation?
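The function referred to above is not included in this extract, but a minimal sketch of such a padding-mask helper might look like the following, assuming TF 2.x, 0 as the padding id, and self-attention (so T == S):

```python
import tensorflow as tf

def create_padding_mask(seq, pad_value=0):
    """Builds a boolean [B, T, S] self-attention mask from padded token ids
    of shape [B, S]; True means the position may be attended to."""
    valid = tf.math.not_equal(seq, pad_value)                  # [B, S]
    # Outer product of the validity vector with itself: a query position can
    # attend to a key position only if both are real (non-padding) tokens.
    return valid[:, :, tf.newaxis] & valid[:, tf.newaxis, :]   # [B, T, S]

# Usage with a padded sequence like the one in the comment above:
seq = tf.constant([[1, 2, 3, 4, 0, 0, 0, 0, 0, 0]])       # [B=1, S=10]
mask = create_padding_mask(seq)                            # [1, 10, 10]

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)
x = tf.random.normal([1, 10, 32])                          # dummy embeddings
out = mha(query=x, value=x, attention_mask=mask)           # [1, 10, 32]
```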
Ok, as it was already semi-requested in #45854, we could probably add something to the documentation or refactor the tutorial with the Keras layer. /cc @MarkDaoust to evaluate this.
Thanks for this answer. I think it would be good to add a new example with the Keras implementation in addition to the current transformer implementation in TensorFlow. Maybe in future TensorFlow versions you could provide Transformer encoder and decoder as Keras layers, and also a full transformer Keras implementation?
@dbouchabou
The issue persists as long as no official documentation or example is provided.
I've got half a solution being submitted (addressing this problem for the ...). The transformers tutorial should be updated to use MultiHeadAttention, with a lot of the same tricks as the new ...
For TF2.5:
- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 370250185
For TF2.5:
- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 375597559
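As an aside on the "shape assertions" mentioned in these commit messages, TensorFlow ships a helper for that kind of inline check; a small illustrative example (not taken from the tutorial, tensor names are made up):

```python
import tensorflow as tf

x = tf.random.normal([8, 10, 32])      # e.g. [batch, seq_len, d_model]
mask = tf.ones([8, 10, 10], tf.bool)   # e.g. [batch, target_len, source_len]

# Raises an error if the named dimensions are inconsistent across tensors.
tf.debugging.assert_shapes([
    (x,    ('B', 'T', 'D')),
    (mask, ('B', 'T', 'S')),
])
```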
Is it possible to update the transformer tutorial to use the MultiHeadAttention layer?
This is the right thing to do; it's just a matter of finding time.
@dbouchabou,
Yes. The "Used in" links at the bottom of the API page (https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention) totally have this covered now. We'll be adding links to keras-nlp soon as well.
Hi dbouchabou, have you got an answer to your question? |
URL(s) with the issue:
Please provide a link to the documentation entry, for example:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention
Description of issue:
I am trying to implement a transformer layer, but there is no example of using this MultiHeadAttention layer with a padding mask.
Is it possible to get one?