tf.keras.layers.Softmax does not support masking? #27010

Closed
erikchwang opened this issue Mar 22, 2019 · 16 comments
Labels: comp:keras (Keras related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · type:support (Support issues)

Comments

erikchwang commented Mar 22, 2019

import tensorflow as tf
outputs = tf.keras.layers.Softmax().apply(
  tf.keras.layers.Masking().apply(
    tf.zeros([3,5,7])
  )
)

Since the default mask value of Masking is zero, Softmax should skip all values in the above case, and its behavior should be like sparse softmax. Therefore, I suppose the output should be all zeros, but that is not the case.
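
For reference, the same experiment written with direct layer calls instead of the deprecated .apply(); this is a minimal sketch assuming TF 2.x eager execution, and the uniform 1/7 value it prints matches the figure reported in the reply below:

import tensorflow as tf

# Reproduce the report with plain layer calls (TF 2.x eager assumed).
x = tf.zeros([3, 5, 7])                    # every timestep equals the default mask_value (0.0)
masked = tf.keras.layers.Masking()(x)      # values stay 0.0; a boolean mask is attached
outputs = tf.keras.layers.Softmax()(masked)

print(outputs[0, 0])                       # uniform 1/7, i.e. about 0.14285715, across the last axis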

@Gurpreetsingh9465 (Contributor)

@chwang85 Can you elaborate a bit? The output looks correct to me: a tensor of shape [3,5,7] filled with 0.14285715 (i.e. 1/7), which matches the softmax definition (see the Wikipedia article).

@erikchwang (Author)

I think my question is clear enough...
Maybe you need to learn what masking is first...

@Gurpreetsingh9465 (Contributor)

Sir, according to my understanding, a mask just skips those values which are equal to the mask value.

@erikchwang (Author)

The default mask value is zero, so Softmax should skip all values in my given case.

@Gurpreetsingh9465 (Contributor)

@chwang85 Masking actually replaces the masked values with 0. For example:
t = tf.fill([2, 2], 5.0)
m = tf.keras.layers.Masking(5)
print(m.apply(t))
""" output
tf.Tensor(
[[0. 0.]
 [0. 0.]], shape=(2, 2), dtype=float32)
"""

t = tf.fill([2, 2], 5.0)
m = tf.keras.layers.Masking()  # default mask_value 0.0
print(m.apply(t))
""" output
tf.Tensor(
[[5. 5.]
 [5. 5.]], shape=(2, 2), dtype=float32)
"""

erikchwang (Author) commented Mar 25, 2019

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Masking

Masks a sequence by using a mask value to skip timesteps.

For each timestep in the input tensor (dimension # 1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
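
A quick way to see the per-timestep semantics described in the docs, assuming TF 2.x: compute_mask returns one boolean flag per timestep, not one per value:

import tensorflow as tf

masking = tf.keras.layers.Masking()          # default mask_value = 0.0
x = tf.zeros([3, 5, 7])

mask = masking.compute_mask(x)
print(mask.shape)                            # (3, 5): one flag per (sample, timestep)
print(bool(tf.reduce_any(mask)))             # False: every timestep is entirely 0.0, so all are masked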

erikchwang (Author) commented Mar 26, 2019

Can no one explain why?
Does Softmax support masking?
If so, why are the masked values not skipped in Softmax (the "downstream" layer of Masking)?
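
One way to probe this, assuming TF 2.x: every Keras layer exposes a supports_masking flag. Its value for Softmax differs across TF/Keras versions, so print it rather than assuming:

import tensorflow as tf

masking = tf.keras.layers.Masking()
softmax = tf.keras.layers.Softmax()

# Whether Softmax declares mask support has changed across TF/Keras versions.
print("Masking supports_masking:", masking.supports_masking)
print("Softmax supports_masking:", softmax.supports_masking)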

@ymodak ymodak self-assigned this Mar 27, 2019
@ymodak ymodak added comp:keras Keras related issues type:support Support issues labels Mar 27, 2019
@ymodak ymodak assigned pavithrasv and unassigned ymodak Mar 27, 2019
@ymodak ymodak added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 27, 2019
@hoangcuong2011

@erikchwang: My notes here might help you understand masking better: keras-team/keras#3086 (comment)

@erikchwang (Author)

So, can you explain the following case?

import tensorflow as tf
outputs = tf.keras.layers.Softmax().apply(
  tf.keras.layers.Masking().apply(
    tf.zeros([3,5,7])
  )
)

Since the default mask value of Masking is zero, Softmax should skip all values in the above case, and its behavior should be like sparse softmax. Therefore, I suppose the output should be all zeros, but that is not the case.

@hoangcuong2011

@erikchwang: If you look at the second, third, and fourth bullets in my comment, you will understand this. Yes, the output is not supposed to be zero all the time.

"- Masking is not that complicated if we understand how the loss is computed with masking. For instance let us assume we have a sequence with length 256. From this sequence we have a masking with only 4 elements that are with masking of 1 (others are with masking 0). I thought the loss is computed as the average between these 4 elements. Guess what - it is not! The average loss will be divided by 256 instead. For this reason sometimes the loss will be extremely small (0.0something) if we have only few 1 elements and long sequence.
Does it matter? I guess not, as what we need is the gradient of loss, rather than the loss itself.

  • When we use softmax as the last layer, the denominator would be the sum of exponential of all elements, regarding whether their masking is 1 or 0.
  • I thought the output of masking inputs is zeros all the time in LSTM. But it is not the case. Let us assume we have a masking:

0 0 0 1 1 0 0 0

With this case, the three first elements with masking zero has output of 0. However, the three last zeros have output that is as the same as the output of the last element with masking 1."

@erikchwang (Author)

I did not find this relevant to my question. Please just explain: why are the outputs of Softmax not all zeros when all the inputs are masked?

import tensorflow as tf
outputs = tf.keras.layers.Softmax().apply(
  tf.keras.layers.Masking().apply(
    tf.zeros([3,5,7])
  )
)

hoangcuong2011 commented Dec 8, 2019

@erikchwang: Even if you mask, the softmax layer still treats everything as usual. For instance, if you feed [3., 1., 2., 2., 0., 0.] into the softmax, regardless of whether you do masking or not, the output is always:
array([[ 0.50744212, 0.06867483, 0.18667753, 0.18667753, 0.02526405, 0.02526405]])
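
Those numbers can be reproduced with a plain softmax, assuming TF 2.x, which illustrates the point: the denominator sums the exponentials of every element, masked or not:

import tensorflow as tf

logits = tf.constant([3., 1., 2., 2., 0., 0.])
print(tf.nn.softmax(logits).numpy())
# [0.50744212 0.06867483 0.18667753 0.18667753 0.02526405 0.02526405]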

What masking does is notify the loss computation not to take into account the "neurons" that are masked - that is it, no more, no less. This is extremely useful, of course, because we do padding all the time.
Also, it is very useful for LSTMs, as they skip inputs that are zeros (i.e. missing inputs - see my picture for an example of why we need that). Note that an LSTM behaves a bit differently: if you have a mask of, say, 0 0 0 1 1 0 0 0, the output for the first three zeros is actually 0, but the output for the last three zeros is not 0.

In summary, don't expect the output of masking to be zeros, except for LSTMs, and only in the specific case I showed.

@erikchwang (Author)

You made too many assumptions. I do not use an LSTM, nor do I calculate a loss; I just want to verify whether Softmax supports masking. Now it seems that the answer is NO.
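
For what it is worth, the usual workaround when a softmax should actually ignore some positions is to push the masked logits toward -inf before the softmax, instead of relying on Keras mask propagation. A minimal sketch, assuming TF 2.x; masked_softmax is an illustrative name, not an existing API:

import tensorflow as tf

def masked_softmax(logits, mask):
    # Give (near-)zero probability to positions where mask is False by adding
    # a large negative number to those logits before normalizing.
    neg_inf = tf.constant(-1e9, dtype=logits.dtype)
    return tf.nn.softmax(tf.where(mask, logits, neg_inf), axis=-1)

logits = tf.constant([3., 1., 2., 2., 0., 0.])
mask = tf.constant([True, True, True, True, False, False])
print(masked_softmax(logits, mask).numpy())  # last two entries are ~0, the rest renormalize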

@erikchwang (Author)

Sometimes we need more flexibility than just stacking keras layers...
The graph-style tf.layers is much more flexible than the dynamic tf.keras.layers, but it has been DEPRECATED...

bw4sz commented Jan 15, 2021

I think this is a perfectly valid question which needs to be addressed. Can we reopen? Also asked on SO: https://stackoverflow.com/questions/65745053/tensorflow-softmax-does-not-ignore-masking-value
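
Follow-up note: in more recent TF releases, tf.keras.layers.Softmax accepts an explicit boolean mask argument in its call and internally adds a large negative value to the masked logits; whether the mask attached by Masking reaches it automatically depends on the version, so passing it explicitly is the safer route. A hedged sketch, worth checking against the installed version:

import tensorflow as tf

logits = tf.constant([[3., 1., 2., 2., 0., 0.]])
mask = tf.constant([[True, True, True, True, False, False]])

# The mask call argument is documented for newer tf.keras.layers.Softmax releases;
# verify it exists in the version you are running.
probs = tf.keras.layers.Softmax()(logits, mask=mask)
print(probs.numpy())  # masked positions get ~0 probability, the rest renormalize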

@pavithrasv pavithrasv removed their assignment Jan 16, 2021
bw4sz commented Jan 16, 2021
