I get an error when trying to load the unquantized version but the quantized version works just fine. Apologies in advance if this is a feature request rather than a bug.
Trace:
ort-wasm-simd.wasm:0x82c2bc D:/a/_work/1/s/onnxruntime/core/optimizer/initializer.cc:31 onnxruntime::Initializer::Initializer(const onnx::TensorProto &, const Path &) !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.
lt @ ort-web.min.js:7
P @ ort-web.min.js:7
$func11504 @ ort-wasm-simd.wasm:0x82c2bc
$func2149 @ ort-wasm-simd.wasm:0x16396e
$func584 @ ort-wasm-simd.wasm:0x48a63
$func11428 @ ort-wasm-simd.wasm:0x8296b1
$func631 @ ort-wasm-simd.wasm:0x4d0e8
v @ ort-web.min.js:7
$func92 @ ort-wasm-simd.wasm:0xb052
o @ ort-web.min.js:7
$func339 @ ort-wasm-simd.wasm:0x28ce8
$Ra @ ort-wasm-simd.wasm:0x6ebffb
e._OrtCreateSession @ ort-web.min.js:7
e.createSessionFinalize @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
loadModel @ ort-web.min.js:7
await in loadModel (async)
createSessionHandler @ ort-web.min.js:7
create @ inference-session-impl.js:176
await in create (async)
constructSession @ models.js:418
await in constructSession (async)
from_pretrained @ models.js:1087
from_pretrained @ models.js:5492
await in from_pretrained (async)
loadItems @ pipelines.js:3099
pipeline @ pipelines.js:3047
getInstance @ worker.js:22
eval @ worker.js:32
ort-wasm-simd.wasm:0x82c2bc
lt @ ort-web.min.js:7
P @ ort-web.min.js:7
$func11504 @ ort-wasm-simd.wasm:0x82c2bc
$func2149 @ ort-wasm-simd.wasm:0x16396e
$func584 @ ort-wasm-simd.wasm:0x48a63
$func11427 @ ort-wasm-simd.wasm:0x829582
$func4164 @ ort-wasm-simd.wasm:0x339b6f
$func4160 @ ort-wasm-simd.wasm:0x339aff
j @ ort-web.min.js:7
$func356 @ ort-wasm-simd.wasm:0x2e215
j @ ort-web.min.js:7
$func339 @ ort-wasm-simd.wasm:0x28e06
$Ra @ ort-wasm-simd.wasm:0x6ebffb
e._OrtCreateSession @ ort-web.min.js:7
e.createSessionFinalize @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
loadModel @ ort-web.min.js:7
await in loadModel (async)
createSessionHandler @ ort-web.min.js:7
create @ inference-session-impl.js:176
await in create (async)
constructSession @ models.js:418
await in constructSession (async)
from_pretrained @ models.js:1087
from_pretrained @ models.js:5492
await in from_pretrained (async)
loadItems @ pipelines.js:3099
pipeline @ pipelines.js:3047
getInstance @ worker.js:22
eval @ worker.js:32
ort-web.min.js:7 Uncaught (in promise) Error: Can't create a session
at e.createSessionFinalize (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:450870)
at e.createSession (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:451468)
at e.createSession (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:443694)
at e.OnnxruntimeWebAssemblySessionHandler.loadModel (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:446588)
at async Object.createSessionHandler (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:156416)
at async InferenceSession.create (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-common/dist/lib/inference-session-impl.js:176:25)
at async constructSession (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:418:16)
at async Promise.all (index 1)
at async XLMRobertaModel.from_pretrained (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:1085:20)
at async AutoModel.from_pretrained (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:5492:20)
Reproduction
import { env, pipeline } from '@xenova/transformers';

// Specify a custom location for models in the public folder
// env.localModelPath = "/models";
// Disable the loading of remote models from the Hugging Face Hub:
env.allowRemoteModels = true;
// env.allowLocalModels = true;
// env.useBrowserCache = false;

// Use the Singleton pattern to enable lazy construction of the pipeline.
// The model should be a directory in public/models (and in this case the onnx folder is hardcoded).
class PipelineSingleton {
    static task = 'feature-extraction';
    static model = 'Xenova/bge-m3';
    static instance = null;

    static async getInstance(progress_callback = null) {
        if (this.instance === null) {
            console.log(this.model);
            this.instance = pipeline(this.task, this.model, {
                progress_callback,
                quantized: false,
            });
        }
        return this.instance;
    }
}

// Listen for messages from the main thread
self.addEventListener('message', async (event) => {
    // Retrieve the feature-extraction pipeline. When called for the first time,
    // this will load the pipeline and save it for future use.
    let embedder = await PipelineSingleton.getInstance(x => {
        // We also add a progress callback to the pipeline so that we can
        // track model loading.
        self.postMessage(x);
    });

    // Actually perform the feature-extraction
    let output = await embedder(event.data.text, { pooling: 'avg', normalize: true });
    // console.log(output.tolist()[0].length);

    // Send the output back to the main thread
    self.postMessage({
        status: 'complete',
        output: output.tolist(),
    });
});
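For context, the worker above would be driven from the main thread roughly like this (a minimal sketch; the worker file name and the message shape are assumptions inferred from the worker code, not part of the original report):

// Hypothetical main-thread counterpart for the worker above.
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });

worker.addEventListener('message', (event) => {
    if (event.data.status === 'complete') {
        // output is an array of embeddings; log the dimension of the first one.
        console.log(event.data.output[0].length);
    } else {
        // Everything else is a progress update forwarded from the loading callback.
        console.log(event.data);
    }
});

worker.postMessage({ text: 'Hello world' });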
This is because the current version of transformers.js does not yet support the external data format. See #105. This will be fixed when we upgrade to onnxruntime-web v1.17.0 (#596).
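For anyone landing here later: once the upgrade lands, loading a model whose weights live in a separate external-data file should look roughly like this when using onnxruntime-web directly (a sketch against the session options added in onnxruntime-web 1.17, outside of transformers.js; the file paths below are assumptions for illustration):

import * as ort from 'onnxruntime-web';

// Sketch: create a session for a model with external weights (run inside an async context).
const session = await ort.InferenceSession.create('/models/bge-m3/model.onnx', {
    // Map the external-data file name referenced inside the graph to a fetchable URL,
    // so the runtime has a non-empty model path to resolve the weights against.
    externalData: [{ path: 'model.onnx_data', data: '/models/bge-m3/model.onnx_data' }],
    executionProviders: ['wasm'],
});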
Ah, I don't think I even realized what that file was for. Makes sense. Thanks!
While I have you, I know it's unrelated, but when I ran the quantized model on GPU via Python it was quite a bit slower, since some operations still ran on the CPU. So is there any way to get the best of both worlds: run the quantized version on GPU via Python performantly, and then use transformers.js for embedding just the queries?
Apologies in advance if this is unrelated and better suited for a discussion/forum.
Update: For those wondering, I was not able to load the model with the TensorrtExecutionProvider, but it was fast enough as-is using the quantized version from Xenova via the CPUExecutionProvider.
System Info
NextJS client side