LSTM prediction is numerically inconsistent for the last few instances. #30995
This is not a Build/Installation or Bug/Performance issue. Please post this kind of support question on Stack Overflow. There is a big community there to support and learn from your questions. GitHub is mainly for addressing bugs in installation and performance. Thanks!
This is a bug report. (It is not about build, installation, or performance; it is about correctness.) I provided a small reproducible example illustrating the bug. (In our real code, we trained a bigger model on a training set of thousands of instances, and then found that the trained model behaved oddly.) The same input should give the same response. And it does, except when the length of the input isn't divisible by 4; then the remaining 1 to 3 instances differ. To make this obvious, I repeated the same input 11 times in my reprex. This bug in Keras or TensorFlow also manifests when applied to time series: if predicting n days, the first n-1 predictions should match what you would get with one day less of data, generating just n-1 predictions. But the prediction on the historical data does change!
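The time-series consistency property described above can be sketched as follows. This is a minimal illustration only: `predict` here is a hypothetical deterministic stand-in for `model.predict`, not the actual LSTM, so the check passes; the bug report is that the real model fails it.

```python
import numpy as np

def predict(x):
    # Hypothetical stand-in for model.predict: any deterministic
    # per-row function of the input tensor.
    return np.tanh(x.sum(axis=(1, 2), keepdims=True))

x = np.random.default_rng(1).standard_normal((11, 30, 5)).astype(np.float32)
p_full = predict(x)       # predictions with all n days of history
p_less = predict(x[:-1])  # predictions with one day less of history
# The overlapping historical predictions should be bit-identical.
assert (p_full[:-1] == p_less).all()
```

A per-row deterministic function passes this check trivially; the surprise in the thread is that the LSTM prediction for a given row depends on the total batch length.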
@Quiigi Can you provide standalone code in Python to reproduce the issue? Thanks!
After looking at Python basics, I rewrote my R example above in Python:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

def fake(shape_):  # arbitrary but reproducible values
    f = np.reshape(range(np.prod(shape_)), shape_, order="F") + 1
    return f % 2.71 - 1.04

shape = (30, 5)
model = Sequential()
model.add(LSTM(units=2, input_shape=shape))
model.set_weights([fake((5, 8)), fake((2, 8)), fake(8)])

for n in [8, 7]:
    print("\nn = " + str(n))
    x = np.broadcast_to(fake(shape), (n,) + shape)  # n copies of identical input
    p = model.predict(x)           # all predictions should match
    print(p == p[1])               # but the last n%4 rows differ
    print((p - p[1]) * 2**26)      # the difference is in the low bits
    assert (p == p[1]).all()       # fails iff n%4 > 0
```

I also do two experiments: for n=8, a multiple of 4, my check passes; for 7 it fails. Here's my output:
Thanks for reporting the issue. Let me take a look.
@qlzh727 What did you find?
Sorry for the late reply. I was able to reproduce the issue, and I think it somehow happens when batch_size is not a perfect 2^n number. I can see the value difference between batches 0-3 and 4-6. If I change the batch size to 9, then the difference is between 0-7 and 8, and the same for batch size 17. The cause of this might be numerical instability in the underlying numerical libraries. Also, given that the diff is so small, this is usually ignored in unit tests (the default atol and rtol for numpy's assert_allclose() are around 1e-6). In fact, if I change the assert in the code to np.allclose(), the issue goes away. Could you give more details about why this issue concerns you in your application, and what specific problem it causes?
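The distinction being drawn here, bitwise equality versus tolerance-based comparison, can be shown with a minimal check (values illustrative; `np.allclose` defaults are rtol=1e-05, atol=1e-08):

```python
import numpy as np

a = np.float32(1.0)
b = a + np.finfo(np.float32).eps  # differs from a in the last mantissa bit only
assert not (a == b)               # bitwise equality fails
assert np.allclose(a, b)          # passes under default rtol=1e-05, atol=1e-08
```

A one-ulp difference like the one in the repro is invisible to `np.allclose` but fatal to `==`, which is exactly why the original assertion trips while the usual test tolerances would not.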
We are predicting financial time series. In this application, snooping future …

We have found a "snoop test" to be a useful tool: every day, we generate …

The numerical difference is very small, on the order of floating point …

Ideally, the fix should be in the library. The way "multiple of 4" gets …

We have a workaround: we tack on additional 0, 1, 2, or 3 irrelevant data …
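The padding workaround mentioned above, appending 0 to 3 dummy instances so the batch length is a multiple of 4, might be sketched like this (`pad_to_multiple` is a hypothetical helper, not code from this thread):

```python
import numpy as np

def pad_to_multiple(x, m=4):
    """Hypothetical helper: pad the batch dimension up to a multiple of m
    with dummy zero rows, returning the padded array and the real length."""
    n = x.shape[0]
    pad = (-n) % m
    if pad:
        filler = np.zeros((pad,) + x.shape[1:], dtype=x.dtype)
        x = np.concatenate([x, filler])
    return x, n

x = np.ones((7, 30, 5), np.float32)   # 7 instances: not a multiple of 4
xp, n = pad_to_multiple(x)            # padded to 8 instances
# p = model.predict(xp)[:n]          # then keep only the real predictions
```

Every real row then goes through the same code path as a full block of 4, so its prediction no longer depends on the batch length.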
Thanks for the detailed explanation. After some debugging, the lowest op I could trace as causing the difference was the "recurrent_activation" function (sigmoid), where inputs with the same value produce slightly different results. The underlying implementation of sigmoid on CPU goes to Eigen, which I don't have any knowledge of. Adding @rmlarsen, who is the expert for Eigen on the TF team, to this issue.
Thank you for the update! Trying to replicate your debugging, I wasn't able to find a node called "recurrent_activation" or "sigmoid" in my TensorFlow graph. The closest I see is "Tanh", and its output (Tanh:0) shows the tiny discrepancies at the end. I see the issue in the nodes feeding it directly (add_5) and indirectly (MatMul_6, BiasAdd_2, MatMul_2). Notably, the inputs to the latter (Enter and TensorArrayReadV3) are clean. So I would guess the difference starts around there.
The recurrent_activation I am talking about is at …
I have …

(And activation is 'tanh', presumably resulting in the node from which I was able to trace the mod-4 issue back to matrix multiplication.)
In my case, the inconsistency originates in MatMul. It's possible that the activation functions 'sigmoid' or 'hard_sigmoid' have a similar issue. This reduced example demonstrates the issue:

```python
import numpy as np
import tensorflow as tf

a = np.broadcast_to(np.float32([.6, -.8, -.3, 0]), (5, 4))  # 5 identical rows
b = np.float32([[8, 1], [6, 4], [-9, 1], [0, 0]])
print(tf.matmul(a, b).eval(session=tf.Session()))  # rows should be identical
```

My output:
This is an improvement, because LSTM generates a graph with 268 nodes, including While loops, whereas the above reproducible example has just one simple node. I'm not sure where exactly the discrepancy creeps in; maybe in the function …

but that's just a guess.
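A possible mechanism consistent with the mod-4 pattern (my speculation, not traced to TF's actual Eigen kernel): vectorized matmul kernels accumulate in SIMD blocks of 4 lanes, and changing the accumulation order changes the low mantissa bits. A NumPy-only sketch of order-dependent float32 summation:

```python
import numpy as np

v = np.random.default_rng(0).standard_normal(1024).astype(np.float32)

s_seq = np.float32(0.0)
for x in v:                        # strict left-to-right accumulation
    s_seq = np.float32(s_seq + x)

s_blk = v.reshape(-1, 4).sum(axis=0).sum()  # 4-lane blocked order, SIMD-style

# The two orders agree to float32 tolerance but may differ in the low bits,
# which is the magnitude of discrepancy reported in this issue.
assert abs(float(s_seq) - float(s_blk)) < 1e-2
```

If a kernel uses the blocked path for full groups of 4 rows and a scalar tail path for the remaining n % 4 rows, identical inputs in the two paths can legitimately round differently.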
Due to the numerical instability, I don't think there is anything we can address here (the diff is smaller than the normal tolerance we use in tests), so I am going to close this bug.
I had hoped to learn where exactly the instability arises. I assume we are seeing an artifact of optimization, trading accuracy for speed. And as the error is within your normal tolerance, the result we see must be deemed "correct". Therefore, closing this issue is appropriate.
The predictions you get may differ slightly depending on input length and position within it. E.g., if you have 11 instances of input, you get one answer for the first 8, and a different answer for the last 3.
I write "may" because it happens to me with probability around 0.4. "Slightly" means on the order of the least significant bits of the float32 mantissa.
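As an aside on the "first 8 of 11" boundary: the split point is n rounded down to a multiple of 4, which the bug description below writes as `n & -4`. A quick check of the bit trick:

```python
# n & -4 clears the two low bits of n, rounding it down to a multiple of 4
# (in two's complement, -4 is ...11111100).
for n in [7, 8, 11]:
    print(n, n & -4)   # 7 -> 4, 8 -> 8, 11 -> 8
```

So with 11 instances, rows 0..7 take one code path and rows 8..10 take another.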
System information
Platform A:
VERSION "1.7.0"
GIT_VERSION "v1.7.0-3-g024aecf414"
COMPILER_VERSION "4.8.4"
Platform B:
VERSION "1.12.0"
GIT_VERSION "v1.12.0-0-ga6d8ffae09"
COMPILER_VERSION "4.8.5"
both:
Describe the current behavior
If the first dimension of `x` is `n`, "row" `i` will get one value if `0 <= i < (n&-4)`, but a possibly different value for `(n&-4) <= i < n`. (These are C++/Python style 0-based indices. For R, 1-based, it's `0 < i <= bitwAnd(n, -4)` versus `bitwAnd(n, -4) < i <= n`.)
Describe the expected behavior
Reproducible prediction from the same input instance, independent of row number or input length. I use generalized "row" for a slice of a tensor with a given fixed first index, e.g., `x[i,,]` or `pred[i,]`.
Code to reproduce the issue
This is a reprex written in R. I'd be happy to port it to other languages if that's preferable.
Created on 2019-07-25 by the reprex package (v0.2.1.9000)
Other info / logs