Description
I have a tfjs model that takes about 1.1 s of inference time when I pass it a batch of inputs. I came across this, which suggests that it switches the shader compilation mode from sequential to parallel and should therefore lower the inference time.
So, I added this to my code:
```js
const t = Date.now();
tf.tidy(() => {
  tf.env().set('ENGINE_COMPILE_ONLY', true);
  console.log("warm up output", model.predict({ "input_2:0": tf.tensor(arr) }).dataSync());
  tf.backend().checkCompileCompletion();
  tf.backend().getUniformLocations();
  tf.env().set('ENGINE_COMPILE_ONLY', false);
});
console.log("warm up time", Date.now() - t);
```
It decreased my inference time to 120 ms, but now I'm getting a tensor of 0s as output. It seems that the output being all 0s is the reason for the lower inference time.
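For reference, my understanding of the intended warm-up-then-predict flow is something like the sketch below. The separate second `predict()` call after resetting `ENGINE_COMPILE_ONLY` is my assumption; I haven't confirmed this is the intended API usage:

```js
const input = tf.tensor(arr);

// 1. Warm-up pass: with ENGINE_COMPILE_ONLY set, predict() should only compile the
//    shader programs (in parallel), so its output is not read here.
tf.env().set('ENGINE_COMPILE_ONLY', true);
const warmup = model.predict({ "input_2:0": input });
tf.backend().checkCompileCompletion();
tf.backend().getUniformLocations();
tf.env().set('ENGINE_COMPILE_ONLY', false);
tf.dispose(warmup); // discard the compile-only result

// 2. Real inference with the flag off -- this call should return actual values
//    (assumption: the compile-only pass never produces usable outputs).
const t = Date.now();
const out = model.predict({ "input_2:0": input });
console.log("output", out.dataSync());
console.log("inference time", Date.now() - t);
```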
I have set tf.setBackend('webgl') and I'm using tfjs@4.2.0.
My model architecture looks like this:
I take the (x, y) coordinates as input, then apply positional encoding to lift this 2D data into a 42-dimensional space. After that there are 5 dense layers: the first (input) layer has 42 neurons with sine activation, the next 3 layers have 256 neurons each with sine activation, and the final layer has 3 neurons.
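Roughly, the forward pass looks something like the sketch below (a sketch only: the positional-encoding frequencies, and the weight names `w1`..`w5` / `b1`..`b5`, are placeholders I'm using for illustration, not the exact model):

```js
import * as tf from '@tensorflow/tfjs';

// Positional encoding: lift [batch, 2] (x, y) into 42 features.
// Assumption: 10 frequency bands, giving 2 + 2*2*10 = 42 features.
function posEncode(xy) {
  const bands = tf.range(0, 10);                          // [10]
  const freqs = tf.pow(2, bands).mul(Math.PI);            // [10]
  const scaled = xy.expandDims(2).mul(freqs);             // [batch, 2, 10]
  const flat = scaled.reshape([-1, 20]);                  // [batch, 20]
  return tf.concat([xy, tf.sin(flat), tf.cos(flat)], 1);  // [batch, 42]
}

// Five dense layers with sine activations, final layer linear with 3 outputs.
function forward(xy, p) {
  let h = posEncode(xy);                                  // [batch, 42]
  h = tf.sin(h.matMul(p.w1).add(p.b1));                   // 42  -> 42,  sine
  h = tf.sin(h.matMul(p.w2).add(p.b2));                   // 42  -> 256, sine
  h = tf.sin(h.matMul(p.w3).add(p.b3));                   // 256 -> 256, sine
  h = tf.sin(h.matMul(p.w4).add(p.b4));                   // 256 -> 256, sine
  return h.matMul(p.w5).add(p.b5);                        // 256 -> 3,   linear
}
```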
Could anyone please suggest what's wrong with this, or any other way to decrease my inference time?