
feat: inference API #501

@kallebysantos commented Feb 25, 2025

What kind of change does this PR introduce?

feature

What is the current behavior?

Since PR #436, it is possible to use onnx inference by calling globalThis[Symbol.for('onnxruntime')].

What is the new behavior?

Coming from Issue #479, the Inference API is a user-friendly interface that lets developers easily run their own models on top of the low-level onnx Rust backend.

It's based on two core components: RawSession and RawTensor.

  • RawSession: A low-level Supabase.ai.Session that can execute any .onnx model. It's recommended for use cases that need more control over the pre/post-processing steps, like the text-to-audio example, as well as for linear regression, tabular classification, and self-made models.

For common tasks like NLP, audio, or computer vision, huggingface/transformers.js is recommended, since it already handles all the pre/post-processing.

  • RawTensor: A low-level data representation of the model input/output. Inference API tensors are fully compatible with Transformers.js tensors, which means developers can keep using the high-level abstractions that transformers.js provides, like .sum(), .normalize(), .min().
Examples:

Basic usage:

Loading a RawSession:

// hosted on Supabase Storage
const session = await RawSession.fromStorage('models/model.onnx');
// or from a Hugging Face repo
const session = await RawSession.fromHuggingFace('Supabase/gte-small');
// or using the model file URL directly
const session = await RawSession.fromUrl("https://example.com/model.onnx");
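
Once a session is loaded, its expected input names can be inspected before building tensors. A minimal sketch, assuming `session.inputs` is a plain array of input names (as used in the self-made model example further below):

const session = await RawSession.fromHuggingFace('Supabase/gte-small');

// The model's expected input names, useful for keying the tensors passed to `run()`
console.log(session.inputs); // e.g. ['input_ids', 'attention_mask', 'token_type_ids']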

Executing a RawSession with RawTensor:

const session = await RawSession.fromUrl("https://example.com/model.onnx");

// Prepare the input tensors
const inputs = {
  input1: new RawTensor("float32", [1.0, 2.0, 3.0], [1, 3]),
  input2: new RawTensor("float32", [4.0, 5.0, 6.0], [1, 3]),
};

const outputs = await session.run(inputs);
console.log(outputs.output1); // Output tensor
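
Since Inference API tensors are designed to be compatible with Transformers.js tensors, the outputs can presumably be inspected the same way. A minimal sketch, assuming the output exposes the Transformers.js-style `type`, `dims` and `data` fields:

const { output1 } = await session.run(inputs);

// Field names assumed from the Transformers.js `Tensor` shape
console.log(output1.type); // 'float32'
console.log(output1.dims); // e.g. [1, 3]
console.log(output1.data); // typed array with the raw output values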

Generating embeddings from scratch:

This example demonstrates how the Inference API can be used in complex scenarios while taking advantage of Transformers.js high-level functions.

import { Tensor } from "@huggingface/transformers";
const { RawTensor, RawSession } = Supabase.ai;

const session = await RawSession.fromHuggingFace('Supabase/gte-small');

// Example only; in a real 'feature-extraction' pipeline these tensors come from the tokenizer step.
// Consider 'n' as the batch size
const inputs = {
  input_ids: new RawTensor('float32', [1, 2, 3...], [n, 2]),
  attention_mask: new RawTensor('float32', [...], [n, 2]),
  // @ts-ignore: mixing Tensors from both libraries
  token_type_ids: new Tensor('float32', [...], [n, 2])
};
   
const { last_hidden_state } = await session.run(inputs);
   
// Using `transformers.js` APIs
const hfTensor = Tensor.mean_pooling(last_hidden_state, inputs.attention_mask).normalize();
   
return hfTensor.tolist();
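
For completeness, here is a hedged sketch of the tokenizer step mentioned in the comment above, using Transformers.js's AutoTokenizer to build the input tensors. The model name is illustrative, and passing the tokenizer's tensors straight to session.run is assumed to work, given the tensor mixing shown above:

import { AutoTokenizer } from "@huggingface/transformers";
const { RawSession } = Supabase.ai;

const session = await RawSession.fromHuggingFace('Supabase/gte-small');
const tokenizer = await AutoTokenizer.from_pretrained('Supabase/gte-small');

// Tokenizing a batch of sentences yields `input_ids`, `attention_mask`
// and `token_type_ids` tensors, ready to be fed to the session
const inputs = await tokenizer(['Hello world', 'The cat sits on the mat'], {
  padding: true,
  truncation: true,
});

const { last_hidden_state } = await session.run(inputs);
// ...then mean pooling + normalization as in the example above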

Self-made models

This example illustrates how users can train their own model and execute it directly from edge-runtime.

Here you can check a deployable example of it, built with the current Supabase stack.

The model was trained to expect the following object payload:

[
  {
    "Model_Year": 2021,
    "Engine_Size": 2.9,
    "Cylinders": 6,
    "Fuel_Consumption_in_City": 13.9,
    "Fuel_Consumption_in_City_Hwy": 10.3,
    "Fuel_Consumption_comb": 12.3,
    "Smog_Level": 3,
  },
  {
    "Model_Year": 2023,
    "Engine_Size": 2.4,
    "Cylinders": 4,
    "Fuel_Consumption_in_City": 9.9,
    "Fuel_Consumption_in_City_Hwy": 7.0,
    "Fuel_Consumption_comb": 8.6,
    "Smog_Level": 3,
  }
]

Then the model inference can be done inside a common Edge Function:

const { RawTensor, RawSession } = Supabase.ai;

// Load the self-trained model from Supabase Storage
const session = await RawSession.fromStorage('models/vehicle-emission.onnx');

Deno.serve(async (req: Request) => {
  const carsBatchInput = await req.json();

  // Parse the request objects into tensor inputs, keyed by the model's input names
  const inputTensors: Record<string, any> = {};
  session.inputs.forEach((inputKey) => {
    const values = carsBatchInput.map((item) => item[inputKey]);

    // This model uses `float32` tensors, but other models may mix tensor types
    inputTensors[inputKey] = new RawTensor('float32', values, [values.length, 1]);
  });

  const { emissions } = await session.run(inputTensors);

  return Response.json({ result: emissions });  // [ 289.01, 199.53]
});
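
And a rough sketch of how the deployed function could be called from a client. The function URL, the anon key, and the `carsBatch` variable are placeholders, not part of this PR:

// Hypothetical endpoint and key: replace with the project's own values
const res = await fetch("https://<project-ref>.supabase.co/functions/v1/vehicle-emission", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${SUPABASE_ANON_KEY}`,
  },
  body: JSON.stringify(carsBatch), // the array of vehicle objects shown above
});

const { result } = await res.json(); // e.g. [289.01, 199.53]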

TODO:

  • Add Supabase Storage integration
  • Possibility of external request authentication
  • Tensor audio support with tryEncodeAudio(); check out the text-to-audio example
  • Cache revalidation
  • Fine-grained control of the Session Id
  • Model size constraints: checking the size before downloading the model

Commits:

- Exposing a user-friendly interface to consume the `onnx` backend
- Using `InferenceAPI` to perform `text-to-audio`
- Encoding `wave` audio tensors from the Rust land; documenting the "magic numbers" of the `text-to-audio` example ([original paper](https://arxiv.org/pdf/2306.07691))
- Adding a `fromStorage` method to InferenceAPI, which allows model loading from Supabase Storage with public/private bucket support