
mutex issue on Mac only for release 1.21.X only #24579


Open
giorgosHadji opened this issue Apr 28, 2025 · 6 comments

Comments

@giorgosHadji

Describe the issue

Hello,

I am trying to use the latest onnxruntime release (1.21.X, tried both .0 and .1). It works fine on Unix and Windows, but on Mac I get the following error at the end of inference, when everything seems to be shutting down:

libc++abi: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
There are no code changes on my side; the issue is only present on the latest onnxruntime release and only on Mac.
On previous releases everything works just fine; the only thing that changed is the onnxruntime version.

Googling suggests it has something to do with static mutexes and the order in which they get destroyed/accessed - see more here.
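
To illustrate that failure mode, here is a minimal sketch of my own (not onnxruntime code; all names are made up): a static object whose destructor locks a mutex can outlive the mutex, and on libc++ the lock then surfaces as exactly the system_error shown above.

#include <mutex>

std::mutex& SharedMutex() {
  // Function-local static: constructed on first use, destroyed during exit.
  static std::mutex m;
  return m;
}

struct LateUser {
  ~LateUser() {
    // If `m` was already destroyed when this runs, locking it is undefined
    // behavior; on libc++ it typically reports
    // "mutex lock failed: Invalid argument".
    std::lock_guard<std::mutex> lock(SharedMutex());
  }
};

static LateUser g_late_user;  // constructed before main()

int main() {
  SharedMutex();  // mutex constructed after g_late_user, so it is destroyed first
  return 0;       // at exit, ~LateUser locks an already-destroyed mutex
}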

This happens in both the CPU case and the GPU (CoreML) case.

I only have access to macOS 12 and macOS 13, and both fail. Both arm64 and x86 fail in all scenarios.

It is important to note that I can compile and link against onnxruntime just fine; it is at runtime that the issue appears, I believe more specifically when inference has finished and things are being destructed/shut down.

To reproduce

At the moment I can't share a model to reproduce this, and I'm not sure I will ever get approval to do so.
I'm sharing this to see if anyone has run into it or if it brings something to mind.

Urgency

No response

Platform

Mac

OS Version

macOS 12 and macOS 13

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.21.0 and 1.21.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the ep:CoreML (issues related to CoreML execution provider) label on Apr 28, 2025
@fs-eire
Contributor

fs-eire commented Apr 29, 2025

Since there is no model and no repro steps, it may be difficult to investigate.

Before more info is shared, could you please verify whether a simple test model works (e.g., this model: https://github.com/onnx/onnx/blob/v1.16.2/onnx/backend/test/data/node/test_abs/model.onnx)?

If the runtime error still occurs with this test model, perhaps it would be OK to share your code and repro steps?
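
Something along these lines should be enough to check on the CPU EP (a rough sketch; the input name and shape are read from the model rather than hardcoded):

#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "abs-test");
  Ort::SessionOptions opts;
  Ort::Session session(env, "model.onnx", opts);  // the test_abs model linked above

  // Query input/output names and the input shape from the model itself.
  Ort::AllocatorWithDefaultOptions alloc;
  auto input_name = session.GetInputNameAllocated(0, alloc);
  auto output_name = session.GetOutputNameAllocated(0, alloc);
  auto shape = session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();

  size_t count = 1;
  for (auto d : shape) count *= static_cast<size_t>(d);
  std::vector<float> data(count, -1.0f);  // Abs(-1) should produce all 1s

  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem, data.data(), data.size(), shape.data(), shape.size());

  const char* in_names[] = {input_name.get()};
  const char* out_names[] = {output_name.get()};
  auto outputs = session.Run(Ort::RunOptions{nullptr}, in_names, &input, 1, out_names, 1);

  std::cout << "first output value: " << outputs[0].GetTensorData<float>()[0] << std::endl;
  return 0;  // the reported crash, if any, would happen after this, during static destruction
}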

fs-eire removed the ep:CoreML (issues related to CoreML execution provider) label on Apr 30, 2025
@xenova
Contributor

xenova commented May 5, 2025

I can confirm that I get this error too with onnxruntime-node dev build: https://www.npmjs.com/package/onnxruntime-node/v/1.22.0-dev.20250418-c19a49615b

libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument

It seems to occur at shutdown, after loading any model with the WebGPU EP.

@fs-eire
Contributor

fs-eire commented May 5, 2025

@xenova does it happen on CPU EP or CoreML EP?

@xenova
Contributor

xenova commented May 6, 2025

It doesn't happen on the CPU EP, but with the CoreML EP I get a different error:

Context leak detected, msgtracer returned -1

Here's a reproduction:

import { AutoTokenizer, MusicgenForConditionalGeneration, RawAudio } from '@huggingface/transformers';

// Load tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
const model = await MusicgenForConditionalGeneration.from_pretrained('Xenova/musicgen-small', {
  dtype: {
    text_encoder: 'q4',
    decoder_model_merged: 'q4',
    encodec_decode: 'fp32',
  },
  device: "webgpu", // or "coreml"
});

// Prepare text input
const prompt = 'a light and cheerly EDM track, with syncopated drums, aery pads, and strong emotions bpm: 130';
const inputs = tokenizer(prompt);

// Generate audio
const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 500,
  do_sample: true,
  guidance_scale: 3,
});

// (Optional) Write the output to a WAV file
const audio = new RawAudio(audio_values.data, model.config.audio_encoder.sampling_rate);
audio.save('musicgen.wav');

throw new Error('dummy error'); // If we throw a dummy error here, the libc++abi error appears at exit.

But the libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument message appears to be printed whenever the process exits with a non-zero status code (?). You should be able to reproduce this with any model, not just musicgen.

@clouds56

clouds56 commented May 6, 2025

I think I'm on the CPU EP, with this code:

#include <iostream>
#include <cassert>
#include <onnxruntime_cxx_api.h>
#include <onnxruntime_c_api.h>
using namespace std;

int main() {
  // C++ API equivalent (RAII-managed, released automatically):
  // auto env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "test");

  // C API: create an OrtEnv but never release it.
  auto api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  auto env_ptr = (OrtEnv*)nullptr;
  auto result = api->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "test", &env_ptr);
  assert(result == nullptr && env_ptr != nullptr);
  // api->ReleaseEnv(env_ptr);
  cout << "Hello, World!" << endl;
  return 0;  // aborts during static destruction because the env was never released
}

Uncomment api->ReleaseEnv(env_ptr); and it works.

Update:
No model is needed to reproduce the abort:

  • C++ API auto env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "test"); works, since RAII releases it
  • C API api->CreateEnv(...) followed by api->ReleaseEnv(env_ptr); works
  • C API api->CreateEnv(...) without a manual release fails; this can happen in a framework that keeps a static OrtEnv singleton (see the sketch below)
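
A rough sketch of that failing singleton pattern (my own illustration; the names are made up, not any particular framework's code):

#include <onnxruntime_cxx_api.h>
#include <memory>

// Framework-style singleton that owns the environment for the whole process.
static std::unique_ptr<Ort::Env> g_env;

Ort::Env& GetEnv() {
  if (!g_env) {
    g_env = std::make_unique<Ort::Env>(ORT_LOGGING_LEVEL_WARNING, "singleton");
  }
  return *g_env;
}

int main() {
  GetEnv();   // create the env; nothing ever calls g_env.reset()
  return 0;   // g_env is destroyed during static destruction, after onnxruntime's
              // own statics may already be gone, reproducing the abort
}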

@fs-eire
Copy link
Contributor

fs-eire commented May 15, 2025

Found the root cause:

node::errors::TriggerUncaughtException() calls into node::Exit(), which eventually calls into libsystem_c.dylib!__cxa_finalize_ranges (destruction of static/global variables in libonnxruntime.dylib) without calling the finalizer set by napi_set_instance_data. The later destruction of the std::unique_ptr<Ort::Env> (in onnxruntime_binding.node) then refers to an already-destroyed static mutex and crashes.

If the program exits normally (not via node::errors::TriggerUncaughtException()), the finalizer gets called correctly and the std::unique_ptr<Ort::Env> is reset to nullptr, so the problem does not occur.

I am not sure whether this behavior is expected (the finalizer set by napi_set_instance_data not being called).

I am still trying to figure out how to fix this.
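
For reference, a rough sketch of the pattern involved (illustrative only, not the actual onnxruntime-node binding source; names are made up):

#include <node_api.h>
#include <onnxruntime_cxx_api.h>
#include <memory>

// The binding keeps the environment alive behind a smart pointer.
static std::unique_ptr<Ort::Env> g_ort_env;

// Finalizer registered with napi_set_instance_data. On a normal exit this runs
// before libonnxruntime's statics are torn down, so destruction order is safe.
static void CleanupOrtEnv(napi_env /*env*/, void* /*data*/, void* /*hint*/) {
  g_ort_env.reset();
}

static napi_value Init(napi_env env, napi_value exports) {
  g_ort_env = std::make_unique<Ort::Env>(ORT_LOGGING_LEVEL_WARNING, "node-binding");
  // Ask Node to call CleanupOrtEnv when the addon instance is torn down.
  // If node::Exit() skips this finalizer, g_ort_env is instead destroyed during
  // __cxa_finalize and touches a static mutex that no longer exists.
  napi_set_instance_data(env, g_ort_env.get(), CleanupOrtEnv, nullptr);
  return exports;
}

NAPI_MODULE(onnxruntime_binding, Init)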
