This repo contains a Node.js sample application that uses the Azure OpenAI (AOAI) Realtime Audio endpoint. For more detail about the SDK, see the AOAI Realtime Audio SDK.
This sample switches seamlessly between multiple assistants (system prompt + tool set) depending on your intent.
You can ask about mobile services, such as:
- Weather
- Mobile phone billing
- Mobile phone current plan
- Mobile phone options
- Consultation on usage
- Mobile phone store related questions, etc.
You can find the assistant definitions in assistants.ts. Review the tool set of each assistant to understand what it can do, or modify it as you need.
Prerequisites:
- Node.js installation (https://nodejs.org)
- Azure OpenAI account
- GPT-4o realtime model
- Bing Search Resource
To set up and run the sample:
- Navigate to this folder
- Run npm install to download a small number of dependency packages (see package.json)
- Rename .env_sample to .env and update the variables
- Run npm run dev to start the web server, navigating any firewall permissions prompts
- Use any of the provided URIs from the console output, e.g. http://localhost:5173/, in a browser
- If you want to debug the application, press F5, which launches the browser for debugging.
To use the sample:
- Check Chat Only if you prefer to use text input only; otherwise you can use both speech and text.
- Click the "Start" button to start the session; accept any microphone permissions dialog
- You should see a << Session Started >> message in the left-side output, after which you can speak to the app
- You can interrupt the chat at any time by speaking and completely stop the chat by using the "Stop" button
- Optionally, you can type in the chat area to talk to the bot instead of speaking.
- The assistant name is displayed in the assistant name text input whenever an assistant is loaded.
- To delete a specific message, enter the Id of the message, which you can find in the chat history, into Delete Item and click Delete. The item is then shown with a strikethrough.
Known issues and limitations:
- Connection errors are not yet gracefully handled, and looping error spew may be observed in script debug output. If an error appears, refresh the web page.
- Voice selection is not yet supported.
- More authentication mechanisms, including keyless support via Entra, will come in a future service update.
This sample uses a custom client to simplify the usage of the realtime API. The client package is included in this repo as the rt-client-0.4.7.tgz file. If you need the latest version of the SDK, check the AOAI Realtime Audio SDK to see if there is a newer version of the package.
The primary file demonstrating /realtime use is src/main.ts; the first few functions demonstrate connecting to /realtime using the client, sending an inference configuration message, and then processing the send/receive of messages on the connection.
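The receive side of that flow can be sketched as a loop over incoming messages. This is a minimal sketch under stated assumptions, not the code in src/main.ts: ServerMessage, handleMessages and the log callback are hypothetical names, the connection is modeled as a plain async iterable rather than the real client, and tying the << Session Started >> line to session.created is a guess. The message types shown are event names from the preview Realtime API.

```typescript
// Hypothetical subset of messages received from the /realtime connection.
type ServerMessage =
  | { type: "session.created" }
  | { type: "response.audio_transcript.delta"; delta: string }
  | { type: "response.done" }
  | { type: "error"; error: { message: string } };

// Consume messages until the stream ends, reporting progress through `log`.
// `messages` stands in for whatever async stream the client exposes (assumption).
async function handleMessages(
  messages: AsyncIterable<ServerMessage>,
  log: (line: string) => void
): Promise<void> {
  let transcript = "";
  for await (const message of messages) {
    switch (message.type) {
      case "session.created":
        // Assumed trigger for the session banner shown in the UI.
        log("<< Session Started >>");
        break;
      case "response.audio_transcript.delta":
        // Partial text arrives in pieces; accumulate until the response ends.
        transcript += message.delta;
        break;
      case "response.done":
        log(transcript);
        transcript = "";
        break;
      case "error":
        log(`error: ${message.error.message}`);
        break;
    }
  }
}
```

Passing the log callback in, instead of writing to the page directly, keeps the loop easy to exercise with a fake message stream.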
In this repo, an assistant is defined as something that:
- has a system prompt
- has tools (function calling definitions)
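As a rough illustration of that definition, an assistant could be typed as follows. This is a sketch only: the ToolDefinition and AssistantDefinition interfaces, the id and systemPrompt field names, the Assistant_Generic id and the prompt strings are all hypothetical and may not match assistants.ts. Only the tool fields (name, description, parameters, returns) and the Assistant_MobileAssistant name come from this README.

```typescript
// Hypothetical types; field names are illustrative, not taken from assistants.ts.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: { type: "object"; properties: Record<string, unknown> };
  // Runs the tool; receives the raw JSON argument string from the model.
  returns: (arg: string) => Promise<string>;
}

interface AssistantDefinition {
  id: string;            // assumed identifier used to look the assistant up
  systemPrompt: string;  // instructions given to the model
  tools: ToolDefinition[];
}

// Placeholder prompts; the real prompts live in assistants.ts.
const genericAssistant: AssistantDefinition = {
  id: "Assistant_Generic",
  systemPrompt: "You are a general-purpose assistant.",
  tools: [
    {
      name: "Assistant_MobileAssistant",
      description: "Help user to answer mobile related question, such as billing, contract, etc.",
      parameters: { type: "object", properties: {} },
      // The argument is ignored; the result names the assistant to switch to.
      returns: async () => "Assistant_MobileAssistant",
    },
  ],
};

const mobileAssistant: AssistantDefinition = {
  id: "Assistant_MobileAssistant",
  systemPrompt: "You answer questions about mobile phone services.",
  tools: [],
};

// Find an assistant by id, or undefined when no assistant has that id.
function findAssistant(
  assistants: AssistantDefinition[],
  id: string
): AssistantDefinition | undefined {
  return assistants.find((a) => a.id === id);
}
```

Using the tool name as the target assistant's id is one simple way to connect a function call to the assistant it should load.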
We use the function calling feature to switch to another assistant.
For example, the generic assistant has the following function calling definition.
{
  name: 'Assistant_MobileAssistant',
  description: 'Help user to answer mobile related question, such as billing, contract, etc.',
  parameters: {
    type: 'object',
    properties: {}
  },
  // The result is the name of the assistant to switch to.
  returns: async (arg: string) => "Assistant_MobileAssistant"
}
This function is called whenever you ask a mobile-phone-related question. When we execute the function, instead of returning the function calling result to the LLM, we send:
- SessionUpdateMessage to switch the assistant
- response.create to let the model continue the message
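The switch described above could look roughly like this. It is a sketch under assumptions, not the sample's actual code: switchAssistantIfRequested, AssistantDefinition, ClientMessage and the send callback are hypothetical names, and detecting a switch by matching the tool result against a list of assistant ids is a guess at how the sample decides. The session.update and response.create message types and the instructions and tools session fields are part of the Realtime API.

```typescript
// Hypothetical minimal shape of an assistant; the real one is in assistants.ts.
interface AssistantDefinition {
  id: string;
  systemPrompt: string;
  tools: { name: string; description: string; parameters: object }[];
}

// Subset of Realtime API client messages used by this sketch.
type ClientMessage =
  | { type: "session.update"; session: { instructions: string; tools: object[] } }
  | { type: "response.create" };

// If `result` names a known assistant, switch to it and return true.
// Otherwise return false so the caller can treat `result` as a normal tool result.
function switchAssistantIfRequested(
  result: string,
  assistants: AssistantDefinition[],
  send: (message: ClientMessage) => void
): boolean {
  const next = assistants.find((a) => a.id === result);
  if (!next) {
    return false;
  }
  // 1. Replace the system prompt and tool set for the current session.
  send({
    type: "session.update",
    session: {
      instructions: next.systemPrompt,
      tools: next.tools.map((t) => ({
        type: "function",
        name: t.name,
        description: t.description,
        parameters: t.parameters,
      })),
    },
  });
  // 2. Ask the model to continue, now acting as the new assistant.
  send({ type: "response.create" });
  return true;
}
```

Sending response.create right after the update lets the new assistant answer the question that triggered the switch without the user repeating it.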
To simplify the demo, we combine the function calling metadata and the function definition in one object. The returns property contains the anonymous function that returns the function calling result.
The example below is the get_weather function, which always reports the weather as 40F and rainy for the given location name.
{
  name: 'get_weather',
  description: 'get the weather of the location',
  parameters: {
    type: 'object',
    properties: {
      location: { type: 'string', description: 'location for the weather' }
    }
  },
  // arg is the JSON argument string supplied by the model.
  returns: async (arg: string) => `the weather of ${JSON.parse(arg).location} is 40F and rainy`
}
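Because metadata and implementation share one object, the two halves have to be separated at runtime: the metadata goes to the model and the returns function runs locally. The following is a minimal sketch of that split, assuming hypothetical helper names (ToolDefinition, toToolMetadata, callTool) that are not taken from the sample.

```typescript
// Hypothetical type for the combined tool object described above.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: object;
  returns: (arg: string) => Promise<string>;
}

const getWeather: ToolDefinition = {
  name: "get_weather",
  description: "get the weather of the location",
  parameters: {
    type: "object",
    properties: {
      location: { type: "string", description: "location for the weather" },
    },
  },
  returns: async (arg: string) =>
    `the weather of ${JSON.parse(arg).location} is 40F and rainy`,
};

// Build the part that is sent to the model: everything except `returns`.
function toToolMetadata(tool: ToolDefinition) {
  return {
    type: "function" as const,
    name: tool.name,
    description: tool.description,
    parameters: tool.parameters,
  };
}

// Run the tool the model asked for and resolve with its string result.
// Rejects when no tool with that name is registered.
async function callTool(
  tools: ToolDefinition[],
  name: string,
  arg: string
): Promise<string> {
  const tool = tools.find((t) => t.name === name);
  if (!tool) {
    throw new Error(`unknown tool: ${name}`);
  }
  return tool.returns(arg);
}
```

For example, callTool([getWeather], "get_weather", '{"location":"Seattle"}') resolves to a string that mentions Seattle, while toToolMetadata(getWeather) contains no function and can be serialized as JSON.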