
Unquantized models don't load for bge-m3 #600

Closed
boltonn opened this issue Feb 22, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments

boltonn commented Feb 22, 2024

System Info

NextJS client side

Environment/Platform

  • [x] Website/web-app
  • [ ] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

I get an error when trying to load the unquantized version, but the quantized version works just fine. Apologies in advance if this is a feature request rather than a bug.
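In short, the failing and working calls side by side (a minimal sketch distilled from the reproduction below):

```js
import { pipeline } from '@xenova/transformers';

// Works: default (quantized) weights
const ok = await pipeline('feature-extraction', 'Xenova/bge-m3');

// Fails with the trace below: unquantized weights
const broken = await pipeline('feature-extraction', 'Xenova/bge-m3', { quantized: false });
```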

Trace:

```
ort-wasm-simd.wasm:0x82c2bc D:/a/_work/1/s/onnxruntime/core/optimizer/initializer.cc:31 onnxruntime::Initializer::Initializer(const onnx::TensorProto &, const Path &) !model_path.IsEmpty() was false. model_path must not be empty. Ensure that a path is provided when the model is created or loaded.
lt @ ort-web.min.js:7
P @ ort-web.min.js:7
$func11504 @ ort-wasm-simd.wasm:0x82c2bc
$func2149 @ ort-wasm-simd.wasm:0x16396e
$func584 @ ort-wasm-simd.wasm:0x48a63
$func11428 @ ort-wasm-simd.wasm:0x8296b1
$func631 @ ort-wasm-simd.wasm:0x4d0e8
v @ ort-web.min.js:7
$func92 @ ort-wasm-simd.wasm:0xb052
o @ ort-web.min.js:7
$func339 @ ort-wasm-simd.wasm:0x28ce8
$Ra @ ort-wasm-simd.wasm:0x6ebffb
e._OrtCreateSession @ ort-web.min.js:7
e.createSessionFinalize @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
loadModel @ ort-web.min.js:7
await in loadModel (async)
createSessionHandler @ ort-web.min.js:7
create @ inference-session-impl.js:176
await in create (async)
constructSession @ models.js:418
await in constructSession (async)
from_pretrained @ models.js:1087
from_pretrained @ models.js:5492
await in from_pretrained (async)
loadItems @ pipelines.js:3099
pipeline @ pipelines.js:3047
getInstance @ worker.js:22
eval @ worker.js:32

ort-wasm-simd.wasm:0x82c2bc
lt @ ort-web.min.js:7
P @ ort-web.min.js:7
$func11504 @ ort-wasm-simd.wasm:0x82c2bc
$func2149 @ ort-wasm-simd.wasm:0x16396e
$func584 @ ort-wasm-simd.wasm:0x48a63
$func11427 @ ort-wasm-simd.wasm:0x829582
$func4164 @ ort-wasm-simd.wasm:0x339b6f
$func4160 @ ort-wasm-simd.wasm:0x339aff
j @ ort-web.min.js:7
$func356 @ ort-wasm-simd.wasm:0x2e215
j @ ort-web.min.js:7
$func339 @ ort-wasm-simd.wasm:0x28e06
$Ra @ ort-wasm-simd.wasm:0x6ebffb
e._OrtCreateSession @ ort-web.min.js:7
e.createSessionFinalize @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
e.createSession @ ort-web.min.js:7
loadModel @ ort-web.min.js:7
await in loadModel (async)
createSessionHandler @ ort-web.min.js:7
create @ inference-session-impl.js:176
await in create (async)
constructSession @ models.js:418
await in constructSession (async)
from_pretrained @ models.js:1087
from_pretrained @ models.js:5492
await in from_pretrained (async)
loadItems @ pipelines.js:3099
pipeline @ pipelines.js:3047
getInstance @ worker.js:22
eval @ worker.js:32

ort-web.min.js:7 Uncaught (in promise) Error: Can't create a session
    at e.createSessionFinalize (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:450870)
    at e.createSession (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:451468)
    at e.createSession (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:443694)
    at e.OnnxruntimeWebAssemblySessionHandler.loadModel (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:446588)
    at async Object.createSessionHandler (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-web/dist/ort-web.min.js:7:156416)
    at async InferenceSession.create (webpack-internal:///(app-pages-browser)/./node_modules/onnxruntime-common/dist/lib/inference-session-impl.js:176:25)
    at async constructSession (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:418:16)
    at async Promise.all (index 1)
    at async XLMRobertaModel.from_pretrained (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:1085:20)
    at async AutoModel.from_pretrained (webpack-internal:///(app-pages-browser)/./node_modules/@xenova/transformers/src/models.js:5492:20)
```

Reproduction

```js
import { env, pipeline } from '@xenova/transformers';

// Specify a custom location for models in the public folder:
// env.localModelPath = "/models";

// Allow loading remote models from the Hugging Face Hub:
env.allowRemoteModels = true;
// env.allowLocalModels = true;
// env.useBrowserCache = false;

// Use the Singleton pattern to enable lazy construction of the pipeline.
// The model should be a directory in public/models (the onnx subfolder is hardcoded).
class PipelineSingleton {
    static task = 'feature-extraction';
    static model = 'Xenova/bge-m3';
    static instance = null;

    static async getInstance(progress_callback = null) {
        if (this.instance === null) {
            console.log(this.model);
            this.instance = pipeline(this.task, this.model, { progress_callback, quantized: false });
        }
        return this.instance;
    }
}

// Listen for messages from the main thread
self.addEventListener('message', async (event) => {
    // Retrieve the feature-extraction pipeline. When called for the first time,
    // this will load the pipeline and save it for future use.
    let embedder = await PipelineSingleton.getInstance(x => {
        // We also add a progress callback to the pipeline so that we can
        // track model loading.
        self.postMessage(x);
    });

    // Actually perform the feature extraction. Note: transformers.js expects
    // pooling: 'mean' (not 'avg').
    let output = await embedder(event.data.text, { pooling: 'mean', normalize: true });
    // console.log(output.tolist()[0].length);

    // Send the output back to the main thread
    self.postMessage({
        status: 'complete',
        output: output.tolist(),
    });
});
```
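For context, a minimal main-thread counterpart (not part of the original report; the file path and message shape are assumptions matching the worker above):

```js
// main.js (hypothetical): drives the worker above
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });

worker.addEventListener('message', (event) => {
    if (event.data.status === 'complete') {
        // One embedding per input text; bge-m3 produces 1024-dim dense vectors
        console.log('Embedding length:', event.data.output[0].length);
    } else {
        // Progress updates forwarded from the pipeline's progress_callback
        console.log('Progress:', event.data);
    }
});

worker.postMessage({ text: 'What is BGE-M3?' });
```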
@boltonn added the bug label on Feb 22, 2024
@boltonn changed the title from "Unquantized models don't appear to load for bge-m3" to "Unquantized models don't load for bge-m3" on Feb 22, 2024
xenova (Collaborator) commented Feb 22, 2024

Duplicate of #553.

This is because the current version of transformers.js does not yet support the external data format (see #105). This will be fixed when we upgrade to onnxruntime-web v1.17.0 (#596).
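In the meantime, the default quantized weights do load, per the report above (a minimal sketch; the model name comes from this thread, and the pooling choice is an assumption):

```js
import { pipeline } from '@xenova/transformers';

// `quantized: true` is the default, so simply omit `quantized: false`
// until external-data support lands with onnxruntime-web v1.17.0.
const embedder = await pipeline('feature-extraction', 'Xenova/bge-m3');
const output = await embedder('some query', { pooling: 'mean', normalize: true });
```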

boltonn (Author) commented Feb 23, 2024

Ah, I don't think I even realized what that file was for. Makes sense. Thanks!

While I have you, I know it's unrelated, but when I ran the quantized model on GPU via Python it was quite a bit slower, since some operations still ran on the CPU. Is there any way to get the best of both worlds: run the quantized version performantly on GPU via Python, and use transformers.js for embedding just the queries?

Apologies in advance if this is unrelated and better suited for a discussion/forum.

Update: For those wondering, I was not able to load the model with the TensorrtExecutionProvider, but it was fast enough as-is using the quantized version from Xenova via the CPUExecutionProvider.

@boltonn closed this as completed on Feb 24, 2024