bge-small-en ONNX component fails to load with "Opset 19 is under development" #28161

eostis opened this issue Aug 27, 2023 · 11 comments

@eostis
eostis commented Aug 27, 2023

From https://blog.vespa.ai/bge-embedding-models-in-vespa-using-bfloat16/

Vespa 8.216.8

(no issues with model multilingual-e5-small)

The ONNX export trace (I also tried with --opset 17):

  • optimum-cli export onnx --task sentence-similarity -m BAAI/bge-small-en --optimize O3 wpsolr/models/bge-small-en-onnx
    Framework not specified. Using pt to export to ONNX.
    Downloading model.safetensors: 100% 133M/133M [00:06<00:00, 20.6MB/s]
    Downloading (…)okenizer_config.json: 100% 366/366 [00:00<00:00, 1.89MB/s]
    Using framework PyTorch: 2.0.1
    Overriding 1 configuration item(s)
    - use_cache -> False
    ================ Diagnostic Run torch.onnx.export version 2.0.1 ================
    verbose: False, log level: Level.ERROR
    ======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

/usr/local/lib/python3.11/site-packages/optimum/onnxruntime/configuration.py:765: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
warnings.warn(
Optimizing model...
symbolic shape inference disabled or failed.
Configuration saved in wpsolr/models/bge-small-en-onnx/ort_config.json
Optimized model saved at: wpsolr/models/bge-small-en-onnx (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Validating models in subprocesses...
Validating ONNX model wpsolr/models/bge-small-en-onnx/model.onnx...
-[✓] ONNX model output names match reference model (last_hidden_state)
- Validating ONNX Model output "last_hidden_state":
-[✓] (2, 16, 384) matches (2, 16, 384)
-[x] values not close enough, max diff: 1.8654606342315674 (atol: 0.0001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:

  • last_hidden_state: max diff = 1.8654606342315674.
    The exported model was saved at: wpsolr/models/bge-small-en-onnx

The component definition:

  <component id="wpsolr_bge_small_en_onnx" type="hugging-face-embedder">
    <transformer-model url="https://www.dropbox.com/scl/fi/91x8qmxsq87plberfv238/model.onnx?rlkey=(...)&amp;dl=1"/>
    <tokenizer-model url="https://www.dropbox.com/scl/fi/nmhz3pzwhcc13ypz33kh1/tokenizer.json?rlkey=(...)&amp;dl=1"/>
    <pooling-strategy>cls</pooling-strategy>
    <normalize>true</normalize>
  </component>

The error in the Vespa logs:

Container.com.yahoo.jdisc.core.StandaloneMain	JDisc exiting: Throwable caught:
exception=
com.yahoo.container.di.componentgraph.core.ComponentNode$ComponentConstructorException: Error constructing 'wpsolr_bge_small_en_onnx' of type 'ai.vespa.embedding.huggingface.HuggingFaceEmbedder': null
Caused by: java.lang.RuntimeException: ONNX Runtime exception
    at ai.vespa.modelintegration.evaluator.OnnxEvaluator.createSession(OnnxEvaluator.java:161)
    at ai.vespa.modelintegration.evaluator.OnnxEvaluator.createSession(OnnxEvaluator.java:156)
    at ai.vespa.modelintegration.evaluator.OnnxEvaluator.<init>(OnnxEvaluator.java:36)
    at ai.vespa.modelintegration.evaluator.OnnxRuntime.evaluatorOf(OnnxRuntime.java:81)
Caused by: ai.onnxruntime.OrtException: Error code - ORT_FAIL - message: Load model from /opt/vespa/var/db/vespa/download/-1287799194143085460/contents failed:/builddir/build/BUILD/vespa-onnxruntime-1.13.1/onnxruntime/core/graph/model_load_utils.h:47 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::__cxx11::basic_string, int>&, const onnxruntime::logging::Logger&, bool, const string&, int) ONNX Runtime only guarantees support for models stamped with official released onnx opset versions. Opset 19 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain com.ms.internal.nhwc is till opset 17.
    at ai.onnxruntime.OrtSession.createSession(Native Method)
    at ai.onnxruntime.OrtSession.<init>(OrtSession.java:73)
    at ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:222)
    at ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:208)
    at ai.vespa.modelintegration.evaluator.OnnxRuntime$1.create(OnnxRuntime.java:46)
    at ai.vespa.modelintegration.evaluator.OnnxRuntime.acquireSession(OnnxRuntime.java:149)
    at ai.vespa.modelintegration.evaluator.OnnxEvaluator.createSession(OnnxEvaluator.java:144)
    ... 3 more

@jobergum
Member

Hey - this is likely because you exported the model with a newer version of onnxruntime than the one Vespa bundles (vespa-onnxruntime 1.13.1). Try to downgrade onnxruntime to 1.13.1.
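
For reference, a minimal sketch of that downgrade, assuming a pip-based environment (the 1.13.1 version matches the vespa-onnxruntime build path visible in the stack trace above), followed by re-running the export command from the original report:

    pip install "onnxruntime==1.13.1"
    optimum-cli export onnx --task sentence-similarity -m BAAI/bge-small-en --optimize O3 wpsolr/models/bge-small-en-onnx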

@jobergum jobergum self-assigned this Aug 28, 2023
@jobergum jobergum added this to the soon milestone Aug 28, 2023
@jobergum
Member

We should document an easy way to find which onnxruntime version is used in any given Vespa version, to avoid exporting a model with a newer version than what Vespa uses.
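
Until that is documented, one possible workaround (an assumption, not an official interface: it relies on the runtime shipping as the vespa-onnxruntime RPM, as the build path in the stack trace suggests, and on a container named vespa) is to query the package inside the running container:

    docker exec vespa rpm -q vespa-onnxruntime
    # expected to print something like vespa-onnxruntime-1.13.1-... if the assumption holds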

@eostis
Author

eostis commented Aug 28, 2023

Would you know why exporting the model "multilingual-e5-small" works, while "bge-small-en" does not?

@jobergum
Member

They have slightly different model architectures, which can produce different compute graphs; that is what triggers this forward-compatibility issue with onnxruntime. What I want this ticket to be about is making it easy to target a specific Vespa version with a matching onnxruntime version.

@eostis
Author

eostis commented Aug 28, 2023

I will wait for the FR :)

@baldersheim
Member

We have fallen behind on onnxruntime versions; 1.15 is on the way.

@eostis
Author

eostis commented Aug 28, 2023

This makes sense to me now: multilingual-e5-small is an older model than bge-small-en.

@jobergum
Member

@arnej27959 or @lesters, does the ONNX file include which version was used to export it? If so, we could potentially sniff that and reject the model if it was exported with a newer runtime than the one Vespa uses.

@lesters
Member

lesters commented Sep 1, 2023

ONNX files contain both the opset version and the IR version of the file.
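
For illustration, a minimal sketch of reading both fields with the onnx Python package (the model path is a placeholder):

    import onnx

    # Load the model proto; ir_version and opset_import are top-level fields.
    model = onnx.load("model.onnx")

    print("IR version:", model.ir_version)
    for opset in model.opset_import:
        # An empty domain string means the default "ai.onnx" operator set.
        print("domain:", opset.domain or "ai.onnx", "opset:", opset.version)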

@jobergum
Member

jobergum commented Sep 1, 2023

What is the IR version?

@lesters
Member

lesters commented Sep 4, 2023

The IR (intermediate representation) version refers to the representation of the graph and the model structure, i.e. the overall computation that should be done and how it is laid out in the file. The opset version refers to the versions of the individual operators, which can change behavior or gain optimizations between releases. When new operators are introduced, the opset version must increase, and the IR version may increase along with it if the file format changes. So the opset version is in general the most important one to follow.
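
Based on that, a sketch of the reject-on-load check suggested earlier in this thread (the ceiling of 17 is taken from the error message above and is a stand-in for whatever limit the bundled onnxruntime actually supports; a real check would need per-domain limits rather than one global value):

    import onnx

    # Hypothetical ceiling: the error above reports support "till opset 17"
    # for the failing domain in vespa-onnxruntime 1.13.1.
    MAX_SUPPORTED_OPSET = 17

    def validate_opsets(path: str) -> None:
        model = onnx.load(path)
        for opset in model.opset_import:
            domain = opset.domain or "ai.onnx"
            if opset.version > MAX_SUPPORTED_OPSET:
                raise ValueError(
                    f"{path}: domain {domain} is stamped with opset {opset.version}, "
                    f"newer than the supported maximum {MAX_SUPPORTED_OPSET}"
                )

    validate_opsets("model.onnx")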
