
feat: inference API #501

@kallebysantos commented Feb 25, 2025

What kind of change does this PR introduce?

feature

What is the current behavior?

Since PR #436, it is possible to use onnx inference by calling globalThis[Symbol.for('onnxruntime')].

What is the new behavior?

Coming from Issue #479, the Inference API is a user-friendly interface that lets developers easily run their own models on top of the low-level onnx Rust backend.

It's based on two core components: RawSession and RawTensor.

  • RawSession: A low-level Supabase.ai.Session that can execute any .onnx model. It's recommended for use cases that need more control over the pre/post-processing steps, like the text-to-audio example, as well as for linear regression, tabular classification, and self-made models.

For common tasks like NLP, audio, or computer vision, huggingface/transformers.js is recommended, since it already handles all the pre/post-processing.

  • RawTensor: A low-level data representation of the model input/output. Inference API tensors are fully compatible with Transformers.js tensors, which means developers can keep using the high-level abstractions that transformers.js provides, like .sum(), .normalize(), .min().
Examples:

Basic usage:

Loading a RawSession:

// hosted on Supabase Storage
const session = await RawSession.fromStorage('models/model.onnx');
// or from a Hugging Face repo
const session = await RawSession.fromHuggingFace('Supabase/gte-small');
// or using the model file URL directly
const session = await RawSession.fromUrl("https://example.com/model.onnx");
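
Once a session is loaded, its expected input names can be inspected before building tensors. A minimal sketch, assuming `session.inputs` is a plain array of input names (as used in the self-made model example further below):

const session = await RawSession.fromHuggingFace('Supabase/gte-small');

// The model's expected input names, useful for keying the tensors passed to `run()`
console.log(session.inputs); // e.g. ['input_ids', 'attention_mask', 'token_type_ids']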

Executing a RawSession with RawTensor:

const session = await RawSession.fromUrl("https://example.com/model.onnx");

// Prepare the input tensors
const inputs = {
  input1: new RawTensor("float32", [1.0, 2.0, 3.0], [1, 3]),
  input2: new RawTensor("float32", [4.0, 5.0, 6.0], [1, 3]),
};

const outputs = await session.run(inputs);
console.log(outputs.output1); // Output tensor
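
Since Inference API tensors are designed to be compatible with Transformers.js tensors, the outputs can presumably be inspected the same way. A minimal sketch, assuming the output exposes the Transformers.js-style `type`, `dims` and `data` fields:

const { output1 } = await session.run(inputs);

// Field names assumed from the Transformers.js `Tensor` shape
console.log(output1.type); // 'float32'
console.log(output1.dims); // e.g. [1, 3]
console.log(output1.data); // typed array with the raw output values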

Generating embeddings from scratch:

This example demonstrates how the Inference API can be used in complex scenarios while taking advantage of Transformers.js high-level functions.

import { Tensor } from "@huggingface/transformers";
const { RawTensor, RawSession } = Supabase.ai;

const session = await RawSession.fromHuggingFace('Supabase/gte-small');

// Example only; in a real 'feature-extraction' pipeline these tensors come from the tokenizer step.
// Consider 'n' as the batch size
const inputs = {
  input_ids: new RawTensor('float32', [1, 2, 3...], [n, 2]),
  attention_mask: new RawTensor('float32', [...], [n, 2]),
  // @ts-ignore: mixing Tensors from both libraries
  token_type_ids: new Tensor('float32', [...], [n, 2])
};
   
const { last_hidden_state } = await session.run(inputs);
   
// Using `transformers.js` APIs
const hfTensor = Tensor.mean_pooling(last_hidden_state, inputs.attention_mask).normalize();
   
return hfTensor.tolist();
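
For completeness, here is a hedged sketch of the tokenizer step mentioned in the comment above, using Transformers.js's AutoTokenizer to build the input tensors. The model name is illustrative, and passing the tokenizer's tensors straight to session.run is assumed to work, given the tensor mixing shown above:

import { AutoTokenizer } from "@huggingface/transformers";
const { RawSession } = Supabase.ai;

const session = await RawSession.fromHuggingFace('Supabase/gte-small');
const tokenizer = await AutoTokenizer.from_pretrained('Supabase/gte-small');

// Tokenizing a batch of sentences yields `input_ids`, `attention_mask`
// and `token_type_ids` tensors, ready to be fed to the session
const inputs = await tokenizer(['Hello world', 'The cat sits on the mat'], {
  padding: true,
  truncation: true,
});

const { last_hidden_state } = await session.run(inputs);
// ...then mean pooling + normalization as in the example above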

Self-made models

This example illustrates how users can train their own model and execute it directly from edge-runtime.

Here you can check a deployable example of it, built with the current Supabase stack.

The model was trained to expect the following object payload:

[
  {
    "Model_Year": 2021,
    "Engine_Size": 2.9,
    "Cylinders": 6,
    "Fuel_Consumption_in_City": 13.9,
    "Fuel_Consumption_in_City_Hwy": 10.3,
    "Fuel_Consumption_comb": 12.3,
    "Smog_Level": 3,
  },
  {
    "Model_Year": 2023,
    "Engine_Size": 2.4,
    "Cylinders": 4,
    "Fuel_Consumption_in_City": 9.9,
    "Fuel_Consumption_in_City_Hwy": 7.0,
    "Fuel_Consumption_comb": 8.6,
    "Smog_Level": 3,
  }
]

Then the model inference can be done inside a common Edge Function:

const { RawTensor, RawSession } = Supabase.ai;

// Load the self-trained model from Supabase Storage
const session = await RawSession.fromStorage('models/vehicle-emission.onnx');

Deno.serve(async (req: Request) => {
  const carsBatchInput = await req.json();

  // Parse the request objects into tensor inputs, keyed by the model's input names
  const inputTensors: Record<string, any> = {};
  session.inputs.forEach((inputKey) => {
    const values = carsBatchInput.map((item) => item[inputKey]);

    // This model uses `float32` tensors, but other models may mix tensor types
    inputTensors[inputKey] = new RawTensor('float32', values, [values.length, 1]);
  });

  const { emissions } = await session.run(inputTensors);

  return Response.json({ result: emissions });  // [ 289.01, 199.53]
});
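
And a rough sketch of how the deployed function could be called from a client. The function URL, the anon key, and the `carsBatch` variable are placeholders, not part of this PR:

// Hypothetical endpoint and key: replace with the project's own values
const res = await fetch("https://<project-ref>.supabase.co/functions/v1/vehicle-emission", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${SUPABASE_ANON_KEY}`,
  },
  body: JSON.stringify(carsBatch), // the array of vehicle objects shown above
});

const { result } = await res.json(); // e.g. [289.01, 199.53]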

TODO:

  • Add Supabase Storage integration
  • Possibility of external request authentication
  • Tensor audio support with tryEncodeAudio(); check out the text-to-audio example
  • Cache revalidation
  • Fine-grained control of the Session Id
  • Model size constraints: checking the size before downloading the model

Commits:

- Exposing a user-friendly interface to consume the `onnx` backend
- Using `InferenceAPI` to perform `text-to-audio`
- Encoding `wave` audio tensors from the Rust land; documenting the "magic numbers" of the `text-to-audio` example ([original paper](https://arxiv.org/pdf/2306.07691))
- Adding a `fromStorage` method to InferenceAPI, which allows model loading from Supabase Storage with public/private bucket support