microsoft/aoai-realtime-multi-assistants

Azure OpenAI /realtime: an interactive chat with multi-assistants

This repo contains a Node.js sample application that uses the AOAI Realtime Audio endpoint. For more detail about the SDK, see AOAI Realtime Audio SDK.

This sample switches seamlessly between multiple assistants (each one a system prompt plus a tool set) depending on your intent.

Scenario

You can ask about mobile services and related topics, such as:

  • Weather
  • Mobile phone billing
  • Mobile phone current plan
  • Mobile phone options
  • Consultation on usage
  • Mobile phone store related questions, etc.

You can find the assistant definitions in assistants.ts. Review the tool set of each assistant to understand what it can do, and modify the definitions as needed.

Prereqs

  1. Node.js installation (https://nodejs.org)
  2. Azure OpenAI account
  3. GPT-4o realtime model
  4. Bing Search resource

Using the sample

  1. Navigate to this folder
  2. Run npm install to download a small number of dependency packages (see package.json)
  3. Rename .env_sample to .env and update the variables
  4. Run npm run dev to start the web server, navigating any firewall permissions prompts
  5. Use any of the provided URIs from the console output, e.g. http://localhost:5173/, in a browser
  6. To debug the application, press F5, which launches the browser for debugging
  7. Check Chat Only if you prefer text input only; otherwise you can use both speech and text
  8. Click the "Start" button to start the session; accept any microphone permissions dialog
  9. You should see a << Session Started >> message in the left-side output, after which you can speak to the app
  10. You can interrupt the chat at any time by speaking and completely stop the chat by using the "Stop" button
  11. Optionally, type in the chat area to talk to the bot instead of speaking
  12. The assistant name is displayed in the assistant name text input whenever an assistant is loaded
  13. To delete a specific message, enter the message Id (shown in the chat history) in Delete Item and click Delete, which strikes through the item

Known issues

  1. Connection errors are not yet handled gracefully, and repeated error output may appear in the script debug output. If an error appears, refresh the web page.
  2. Voice selection is not yet supported.
  3. More authentication mechanisms, including keyless support via Entra, will come in a future service update.

Code description

This sample uses a custom client to simplify use of the realtime API. The client package is included in this repo as the rt-client-0.4.7.tgz file. If you need the latest version, check the AOAI Realtime Audio SDK for a newer package.

The primary file demonstrating /realtime use is src/main.ts; the first few functions demonstrate connecting to /realtime using the client, sending an inference configuration message, and then processing the send/receive of messages on the connection.

Assistants

In this repo, we define an assistant as something that:

  • has a system prompt
  • has tools (function calling definitions)

We use the function calling feature to switch to another assistant.

For example, the generic assistant has the following function calling definition.

{
    name: 'Assistant_MobileAssistant',
    description: 'Help user to answer mobile related question, such as billing, contract, etc.',
    parameters: {
        type: 'object',
        properties: {}
    },
    returns: async (arg: string) => "Assistant_MobileAssistant"
}

This function is called whenever you ask a mobile phone related question. When we execute the function, instead of returning the function calling result to the LLM, we send:

  1. SessionUpdateMessage to switch the assistant.
  2. response.create to let the model continue the conversation.
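
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/main.ts: the Assistant, ToolDefinition, and OutgoingMessage types, the registry map, and the buildSwitchMessages helper are all hypothetical names, and the message shapes are assumed to follow the realtime session.update and response.create events rather than the rt-client types.

```typescript
// Hypothetical shapes for illustration only; the sample's own types in
// assistants.ts and the rt-client package may differ.
interface Tool {
  name: string;
  description: string;
  parameters: { type: "object"; properties: Record<string, unknown> };
  returns: (arg: string) => Promise<string>;
}

interface Assistant {
  instructions: string; // the system prompt
  tools: Tool[];
}

// Tool metadata as advertised to the model (no `returns` callback).
interface ToolDefinition {
  type: "function";
  name: string;
  description: string;
  parameters: Tool["parameters"];
}

// Assumed wire shapes of the two messages sent when switching.
type OutgoingMessage =
  | { type: "session.update"; session: { instructions: string; tools: ToolDefinition[] } }
  | { type: "response.create" };

// Assumption: a tool result that matches a registry key (for example
// "Assistant_MobileAssistant") means "switch to that assistant".
function buildSwitchMessages(
  toolResult: string,
  registry: Map<string, Assistant>,
): OutgoingMessage[] | undefined {
  const target = registry.get(toolResult);
  if (target === undefined) {
    return undefined; // ordinary result: hand it back to the model as usual
  }
  const tools = target.tools.map((t): ToolDefinition => ({
    type: "function",
    name: t.name,
    description: t.description,
    parameters: t.parameters,
  }));
  return [
    // 1. Replace the system prompt and tool set with the new assistant's.
    { type: "session.update", session: { instructions: target.instructions, tools } },
    // 2. Ask the model to continue with the new configuration.
    { type: "response.create" },
  ];
}
```

Returning undefined for any result that is not a registry key keeps the normal function calling path untouched, so only the Assistant_ tools trigger a switch.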

Function Calling

To simplify the demo, we combine the function calling metadata and the function definition into one object. The returns property contains an anonymous function that returns the function calling result.

The example below is the get_weather function, which always reports the weather as 40F and rainy for the given location.

{
    name: 'get_weather',
    description: 'get the weather of the location',
    parameters: {
        type: 'object',
        properties: {
            location: { type: 'string', description: 'location for the weather' }
        }
    },
    returns: async (arg: string) => `the weather of ${JSON.parse(arg).location} is 40F and rainy`
}
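
Because metadata and implementation live in one object, a single lookup by name is enough to run a tool when the model requests it. The sketch below shows one way this could work; the Tool interface, toMetadata, and callTool are hypothetical names introduced for illustration and are not taken from the sample's source.

```typescript
// Hypothetical type describing the combined object shown above.
interface Tool {
  name: string;
  description: string;
  parameters: { type: "object"; properties: Record<string, unknown> };
  returns: (arg: string) => Promise<string>;
}

const getWeather: Tool = {
  name: "get_weather",
  description: "get the weather of the location",
  parameters: {
    type: "object",
    properties: {
      location: { type: "string", description: "location for the weather" },
    },
  },
  // The model passes its arguments as a JSON string.
  returns: async (arg: string) =>
    `the weather of ${JSON.parse(arg).location} is 40F and rainy`,
};

// Metadata half: what would be advertised to the model (no callback).
function toMetadata(tool: Tool) {
  return {
    name: tool.name,
    description: tool.description,
    parameters: tool.parameters,
  };
}

// Implementation half: find the tool the model asked for and run it.
async function callTool(tools: Tool[], name: string, args: string): Promise<string> {
  const tool = tools.find((t) => t.name === name);
  if (tool === undefined) {
    throw new Error(`unknown function: ${name}`);
  }
  return tool.returns(args);
}
```

For example, callTool([getWeather], "get_weather", '{"location":"Seattle"}') resolves to "the weather of Seattle is 40F and rainy".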
