tf.keras.layers.Softmax does not support masking? #27010
Comments
@chwang85 Can you elaborate a bit? The output looks correct to me, I guess; it is a tensor of the expected shape.
I think my question is clear enough...
Sir, according to my understanding, masking just skips those values which are equal to the mask value.
The default mask value is zero, so Softmax should skip all values in my given case.
@chwang85 Masking actually replaces the masked values with 0. For example:

```python
t = tf.fill([2, 2], 5.0)  # a 2x2 tensor filled with 5.0
```
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Masking Masks a sequence by using a mask value to skip timesteps. For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
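A minimal sketch of the documented behavior quoted above; the tensor values are my own illustration (the exact snippet from the earlier comment is not preserved), and it assumes TF 2.x eager execution:

```python
import tensorflow as tf

t = tf.constant([[[5., 5.],    # timestep 0: every feature equals mask_value
                  [1., 5.]]])  # timestep 1: not every feature equals mask_value
masked = tf.keras.layers.Masking(mask_value=5.0)(t)

print(masked)              # [[[0. 0.] [1. 5.]]] -- the masked timestep is zeroed out
print(masked._keras_mask)  # [[False  True]]    -- the mask propagated to downstream layers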
Can no one explain why?
@erikchwang: My notes here might help you understand masking better keras-team/keras#3086 (comment) |
So, can you answer the following question?
Since the default mask value of Masking is zero, Softmax should skip all values in the above case, and its behavior should be like a sparse softmax. Therefore, I suppose the output should be all zeros, but that is not the case.
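For what it's worth, an all-zero output is impossible for a softmax by construction: the outputs must sum to 1 along the softmax axis, so an all-zero input produces a uniform distribution rather than zeros. A minimal illustration:

```python
import tensorflow as tf

x = tf.zeros([1, 4])
print(tf.keras.layers.Softmax()(x))  # [[0.25 0.25 0.25 0.25]] -- uniform, never all zeros
```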
@erikchwang: If you look at the second, third and fourth bullets in my comment, you will understand this. Yes, the output is not supposed to be zero all the time. "Masking is not that complicated if we understand how the loss is computed with masking. For instance, let us assume we have a sequence of length 256, with a mask in which only 4 elements are 1 (the others are 0). I thought the loss would be computed as the average over these 4 elements. Guess what, it is not! The summed loss is divided by 256 instead. For this reason the loss will sometimes be extremely small (0.0something) if we have only a few 1 elements and a long sequence.
0 0 0 1 1 0 0 0: in this case, the first three elements with mask value 0 have an output of 0. However, the last three zeros have the same output as the last element with mask value 1."
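A minimal sketch of the division behavior described in that quote; the names and the constant per-step loss of 1.0 are my own illustration, not Keras internals:

```python
import tensorflow as tf

per_step_loss = tf.ones([256])                             # pretend loss of 1.0 at every timestep
mask = tf.concat([tf.ones([4]), tf.zeros([252])], axis=0)  # only 4 unmasked steps

masked_sum = tf.reduce_sum(per_step_loss * mask)           # 4.0
print(masked_sum / 256.0)                # 0.015625 -- averaged over the full sequence length
print(masked_sum / tf.reduce_sum(mask))  # 1.0      -- the mean over unmasked steps one might expect
```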
I did not find this relevant to my question. Please just explain why the outputs of Softmax are not all zeros when all the inputs are masked?
@erikchwang: Even if you mask, the softmax layer still treats everything as usual. For instance, if you put [3., 1., 2., 2., 0., 0.] into the softmax, regardless of whether you do masking or not, the output is always the same. What masking does is notify the loss computation not to take the masked "neurons" into account, and that is it, no more no less. This is extremely useful, of course, because we do padding all the time. In summary, don't expect the output under masking to be zeros, except in LSTMs, and only in the specific case I showed.
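A short sketch of the point above: the softmax activations are computed as usual over all positions, including the padded zeros (output values are approximate):

```python
import tensorflow as tf

x = tf.constant([3., 1., 2., 2., 0., 0.])
print(tf.nn.softmax(x))
# ~[0.507 0.069 0.187 0.187 0.025 0.025] -- the padding zeros still get probability mass
```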
You made too many assumptions. I do not use an LSTM, nor do I calculate a loss; I just want to verify whether Softmax supports masking. Now it seems that the answer is NO.
Sometimes we need more flexibility than just stacking Keras layers...
I think this is a perfectly valid question, which needs to be addressed. Can we reopen? Added on SO https://stackoverflow.com/questions/65745053/tensorflow-softmax-does-not-ignore-masking-value |