
TF python vs TFJS models generating significantly different results #8025

Open · danielgoldelman opened this issue Oct 20, 2023 · 4 comments
Labels: comp:converter, type:bug (Something isn't working)

@danielgoldelman

System information

  • I have written custom code (described below)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): attempted on Google Colab and an Intel MacBook Pro (i7)
  • TensorFlow.js installed from (npm or script link): npm
  • TensorFlow.js version: 4.11.0
  • Browser version: Brave Version 1.59.120 Chromium: 118.0.5993.88 (Official Build) (arm64)
  • Tensorflow.js Converter Version: tfjs-v4.11.0

Describe the current behavior
There is a significant drop in performance between the TensorFlow (Python) TFBertForSequenceClassification model and the corresponding TensorFlow.js graph model created with tensorflowjs_converter: the two produce significantly different results.

Describe the expected behavior
The TF Python model and the TFJS model should produce similar or identical results. Please let us know if the conversion was done incorrectly.
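
To make "similar results" concrete, here is a minimal sketch of the check we would expect to pass (the logit values below are placeholders, not real outputs):

```python
# Sketch only: placeholder arrays standing in for real model outputs.
# py_logits would come from the Python model and js_logits from the TFJS
# graph model, both run on identical input_ids / attention_mask inputs.
import numpy as np

py_logits = np.array([[0.12, -1.30, 0.84]])  # placeholder values
js_logits = np.array([[0.11, -1.29, 0.85]])  # placeholder values

print("max abs diff:", np.max(np.abs(py_logits - js_logits)))
assert np.allclose(py_logits, js_logits, atol=1e-3), "outputs diverge"
```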

Standalone code to reproduce the issue

Converter scripts:

```python
! pip3 install transformers
! pip3 install tensorflowjs  # PyPI package name; provides the tensorflowjs_converter CLI

import tensorflow as tf
from transformers import TFAutoModel

MODEL_NAME = './MultitaskModel'
model = TFAutoModel.from_pretrained(MODEL_NAME, from_pt=True)

# Wrap the model's call in a tf.function with fixed [None, 384] int32 specs.
callable = tf.function(model.call)
concrete_function = callable.get_concrete_function([
    tf.TensorSpec([None, 384], tf.int32, name="input_ids"),
    tf.TensorSpec([None, 384], tf.int32, name="attention_mask"),
])

tf.saved_model.save(model, 'multitaskModelForJS', signatures=concrete_function)
! saved_model_cli show --dir multitaskModelForJS --tag_set serve --signature_def serving_default

! tensorflowjs_converter \
    --input_format=tf_saved_model \
    --output_format=tfjs_graph_model \
    --signature_name=serving_default \
    --saved_model_tags=serve \
    ./multitaskModelForJS/ \
    ./multitaskModelForJSWeb/
```
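
As a sanity check on the exported SavedModel before conversion, one can run the serving signature directly in Python (a hedged sketch, not part of the original scripts; the dummy inputs are assumptions matching the TensorSpecs above):

```python
import numpy as np
import tensorflow as tf

# Load the SavedModel exported above and grab the serving signature.
loaded = tf.saved_model.load('multitaskModelForJS')
infer = loaded.signatures['serving_default']

# Dummy inputs shaped [1, 384] to match the exported TensorSpecs.
input_ids = tf.constant(np.zeros((1, 384), dtype=np.int32))
attention_mask = tf.constant(np.ones((1, 384), dtype=np.int32))

outputs = infer(input_ids=input_ids, attention_mask=attention_mask)
for name, tensor in outputs.items():
    print(name, tensor.shape)
```

The printed outputs give a Python-side reference to compare against what the TFJS graph model returns for the same inputs.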

Model location:
https://github.com/danielgoldelman/modelrepo/tree/main/MultitaskModel

Results for the Python model:
[screenshot: Python model results, 2023-10-19]

Results for the JS model (tested both in Node and in the browser):
[screenshot: JS model results, 2023-10-19]

danielgoldelman added the type:bug (Something isn't working) label Oct 20, 2023
gaikwadrahul8 self-assigned this Oct 20, 2023
@gaikwadrahul8 (Contributor)

Hi, @danielgoldelman

Thank you for bringing this issue to our attention. I am trying to replicate it on my end. Could you please share the code snippet/example you are using to produce the results shown above for the Python model and the TensorFlow.js model, so that I can try to reproduce the same issue? Thank you!

@danielgoldelman (Author)

Hello @gaikwadrahul8, here is a link to a repo that reproduces these results. The results shared in the screenshots above can be generated via the test.ipynb Jupyter notebook. Please let me know if there is any other information you would like me to supply.

https://github.com/danielgoldelman/tfjs_conv_issue

@mattsoulanille (Member)

I compared CPU execution vs WebGL (you can reproduce this by running `npx http-server --cors="Access-Control-Allow-Origin: *, Access-Control-Allow-Private-Network: true"` in the tfjs model directory and clicking the run button). The CPU and WebGL results appear identical, so this probably isn't a GPU rounding issue. Maybe it's a model conversion issue, but your call to tfjs-converter looks correct.

You could try converting the model without grappler graph optimization. This shouldn't be the problem, but it's possible something went wrong during optimization. In any case, I would probably do this before trying to compare intermediate tensors between the TF version and the TFJS version. Unfortunately, we don't have a flag for this yet, but if you comment out these lines and replace them with `return graph_def`, you should be able to run the modified converter with `npx bazel run //tfjs-converter/python/tensorflowjs/converters:converter -- --help`.
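
Roughly, the bypass would look something like the following sketch; the function name and file (tf_saved_model_conversion_v2.py) are assumptions about what the linked lines refer to, so treat the details as illustrative only:

```python
# Hypothetical sketch: short-circuit the grappler pass in
# tfjs-converter/python/tensorflowjs/converters/tf_saved_model_conversion_v2.py
# so the unoptimized graph_def is used as-is.
def _run_grappler(config, graph_def, graph, signature_def):
    # Skip tf_optimizer.OptimizeGraph(...) entirely and return the
    # input graph unchanged.
    return graph_def
```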

@danielgoldelman (Author)

@mattsoulanille Sorry, but I would like some help understanding where to place the BUILD file in the tfjs file structure. I am getting this error:

```
ERROR: Skipping '//tfjs-converter/python/tensorflowjs/converters:converter': no such package 'tfjs-converter/python/tensorflowjs/converters': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /Users/danielgoldelman/Desktop/tfjs_conv_issue/tfjs-converter/python/tensorflowjs/converters
WARNING: Target pattern parsing failed.
ERROR: no such package 'tfjs-converter/python/tensorflowjs/converters': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /Users/danielgoldelman/Desktop/tfjs_conv_issue/tfjs-converter/python/tensorflowjs/converters
INFO: Elapsed time: 0.056s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
ERROR: Build failed. Not running target
```

Could you help me get on track?
