
Unquantized models don't load for bge-m3 #600

Closed
boltonn opened this issue Feb 22, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments

boltonn commented Feb 22, 2024

System Info

NextJS client side

Environment/Platform

  • [x] Website/web-app
  • [ ] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

I get an error when trying to load the unquantized version, but the quantized version works just fine. Apologies in advance if this is a feature request rather than a bug.
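In short, the failing and working calls side by side (a minimal sketch distilled from the reproduction below):

```js
import { pipeline } from '@xenova/transformers';

// Works: default (quantized) weights
const ok = await pipeline('feature-extraction', 'Xenova/bge-m3');

// Fails with the trace below: unquantized weights
const broken = await pipeline('feature-extraction', 'Xenova/bge-m3', { quantized: false });
```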

Trace:

```
ort-wasm-simd.wasm:0x82c2bc D:/a/_work/1/s/onnxruntime/core/optimizer/initializer.cc:31 onnxruntime::Initializer::Initializer(const onnx::TensorProto &, const Path &) !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.
lt @ ort-web.min.js:7
P @ ort-web.min.js:7
$func11504 @ ort-wasm-simd.wasm:0x82c2bc
$func2149 @ ort-wasm-simd.wasm:0x16396e
$func584 @ ort-wasm-simd.wasm:0x48a63
$func11428 @ ort-wasm-simd.wasm:0x8296b1
$func631 @ ort-wasm-simd.wasm:0x4d0e8
v @ ort-web.min.js:7
$func92 @ ort-wasm-simd.wasm:0xb052
o @ ort-web.min.js:7
$func339 @ ort-wasm-simd.wasm:0x28ce8
$Ra @ ort-wasm-simd.wasm:0x6ebffb
e._OrtCreateSession @ ort-web.min.js:7
e.createSessionFinalize @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
loadModel @ ort-web.min.js:7
await in loadModel (async)
createSessionHandler @ ort-web.min.js:7
create @ inference-session-impl.js:176
await in create (async)
constructSession @ models.js:418
await in constructSession (async)
from_pretrained @ models.js:1087
from_pretrained @ models.js:5492
await in from_pretrained (async)
loadItems @ pipelines.js:3099
pipeline @ pipelines.js:3047
getInstance @ worker.js:22
eval @ worker.js:32

ort-wasm-simd.wasm:0x82c2bc
lt @ ort-web.min.js:7
P @ ort-web.min.js:7
$func11504 @ ort-wasm-simd.wasm:0x82c2bc
$func2149 @ ort-wasm-simd.wasm:0x16396e
$func584 @ ort-wasm-simd.wasm:0x48a63
$func11427 @ ort-wasm-simd.wasm:0x829582
$func4164 @ ort-wasm-simd.wasm:0x339b6f
$func4160 @ ort-wasm-simd.wasm:0x339aff
j @ ort-web.min.js:7
$func356 @ ort-wasm-simd.wasm:0x2e215
j @ ort-web.min.js:7
$func339 @ ort-wasm-simd.wasm:0x28e06
$Ra @ ort-wasm-simd.wasm:0x6ebffb
e._OrtCreateSession @ ort-web.min.js:7
e.createSessionFinalize @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
loadModel @ ort-web.min.js:7
await in loadModel (async)
createSessionHandler @ ort-web.min.js:7
create @ inference-session-impl.js:176
await in create (async)
constructSession @ models.js:418
await in constructSession (async)
from_pretrained @ models.js:1087
from_pretrained @ models.js:5492
await in from_pretrained (async)
loadItems @ pipelines.js:3099
pipeline @ pipelines.js:3047
getInstance @ worker.js:22
eval @ worker.js:32

ort-web.min.js:7 Uncaught (in promise) Error: Can't create a session
    at e.createSessionFinalize (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:450870)
    at e.createSession (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:451468)
    at e.createSession (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:443694)
    at e.OnnxruntimeWebAssemblySessionHandler.loadModel (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:446588)
    at async Object.createSessionHandler (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:156416)
    at async InferenceSession.create (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-common/dist/lib/inference-session-impl.js:176:25)
    at async constructSession (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:418:16)
    at async Promise.all (index 1)
    at async XLMRobertaModel.from_pretrained (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:1085:20)
    at async AutoModel.from_pretrained (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:5492:20)
```

Reproduction

```js
import { env, pipeline } from '@xenova/transformers';

// Specify a custom location for models in the public folder:
// env.localModelPath = "/models";

// Allow loading remote models from the Hugging Face Hub:
env.allowRemoteModels = true;
// env.allowLocalModels = true;
// env.useBrowserCache = false;

// Use the Singleton pattern to enable lazy construction of the pipeline.
// The model should be a directory in public/models (the onnx subfolder is hardcoded).
class PipelineSingleton {
    static task = 'feature-extraction';
    static model = 'Xenova/bge-m3';
    static instance = null;

    static async getInstance(progress_callback = null) {
        if (this.instance === null) {
            console.log(this.model);
            this.instance = pipeline(this.task, this.model, { progress_callback, quantized: false });
        }
        return this.instance;
    }
}

// Listen for messages from the main thread
self.addEventListener('message', async (event) => {
    // Retrieve the feature-extraction pipeline. When called for the first time,
    // this will load the pipeline and save it for future use.
    let embedder = await PipelineSingleton.getInstance(x => {
        // We also add a progress callback to the pipeline so that we can
        // track model loading.
        self.postMessage(x);
    });

    // Actually perform the feature extraction. Note: transformers.js expects
    // pooling: 'mean' (not 'avg').
    let output = await embedder(event.data.text, { pooling: 'mean', normalize: true });
    // console.log(output.tolist()[0].length);

    // Send the output back to the main thread
    self.postMessage({
        status: 'complete',
        output: output.tolist(),
    });
});
```
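For context, a minimal main-thread counterpart (not part of the original report; the file path and message shape are assumptions matching the worker above):

```js
// main.js (hypothetical): drives the worker above
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });

worker.addEventListener('message', (event) => {
    if (event.data.status === 'complete') {
        // One embedding per input text; bge-m3 produces 1024-dim dense vectors
        console.log('Embedding length:', event.data.output[0].length);
    } else {
        // Progress updates forwarded from the pipeline's progress_callback
        console.log('Progress:', event.data);
    }
});

worker.postMessage({ text: 'What is BGE-M3?' });
```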
@boltonn added the bug label on Feb 22, 2024
@boltonn changed the title from "Unquantized models don't appear to load for bge-m3" to "Unquantized models don't load for bge-m3" on Feb 22, 2024
xenova (Collaborator) commented Feb 22, 2024

Duplicate of #553.

This is because the current version of transformers.js does not yet support the external data format (see #105). This will be fixed when we upgrade to onnxruntime-web v1.17.0 (#596).
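In the meantime, the default quantized weights do load, per the report above (a minimal sketch; the model name comes from this thread, and the pooling choice is an assumption):

```js
import { pipeline } from '@xenova/transformers';

// `quantized: true` is the default, so simply omit `quantized: false`
// until external-data support lands with onnxruntime-web v1.17.0.
const embedder = await pipeline('feature-extraction', 'Xenova/bge-m3');
const output = await embedder('some query', { pooling: 'mean', normalize: true });
```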

boltonn (Author) commented Feb 23, 2024

Ah, I don't think I even realized what that file was for. Makes sense. Thanks!

While I have you, I know it's unrelated, but when I ran the quantized model on GPU via Python it was quite a bit slower, since some operations still ran on the CPU. Is there any way to get the best of both worlds: run the quantized version performantly on GPU via Python, and use transformers.js for embedding just the queries?

Apologies in advance if this is unrelated and better suited for a discussion/forum.

Update: For those wondering, I was not able to load the model with the TensorrtExecutionProvider, but it was fast enough as-is using the quantized version from Xenova via the CPUExecutionProvider.

@boltonn closed this as completed on Feb 24, 2024