tts - audio generation: what model to use to generate >4min (longer audio)? or audio at all (tts-hd to deprecate on Sat, Mar 1, 2025) #10645

joslat · 2025-02-24T00:00:22Z

Hi,

I've managed to generate proper text-to-speech output by following this sample:
https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/TextToAudio/OpenAI_TextToAudio.cs

However, the only models I can use are tts and tts-hd, and both have a cap of 4,096 characters per request.
That allows a maximum of roughly 4 to 8 minutes of audio, not more.
On top of that, this model is due to be deprecated on Sat, Mar 1, 2025...

I am building a language teacher and would like to generate audio sessions of 20 minutes or more.

Is there any way to overcome this "hard cap"? If not, what should I use instead? TTS seems to offer only these models...

What would you suggest using?

Best,
José

Answered by joslat on Feb 26, 2025 (the full answer is the final comment in this thread).

joslat (Author) · 2025-02-24T00:38:47Z

Is gpt-4o-audio-preview supported already?
It is mentioned here: https://platform.openai.com/docs/guides/audio

Otherwise, the most "practical" solution is to split the text into chunks of 4,096 characters (or a bit less, just in case) and concatenate the output audio into a single file.
Of course, this means taking care that no more than 3 requests are sent per minute...
But... there should be a better way, right? ;)

For the curious, this works. (NOTE: I removed the code from this comment, as it is now in a public repo along with more examples using the Azure OpenAI SDK, which supports gpt-4o-audio-preview.)
You can find the two separate projects in the following repo: https://github.com/joslat/PlayingWithAudio
(It contains the original code, now in the tts project.)
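
Since the code itself lives in the repo, here is a minimal sketch of the two ideas described above: splitting text into chunks below the character cap, and pacing requests to stay within a per-minute limit. It is written in Python for illustration only and is not the repo's C# implementation. The names `split_text`, `paced`, and `MAX_CHARS`, as well as the safety margin of 4,000 characters, are assumptions made for this example.

```python
import re
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumption: stay a little below the 4,096-character cap discussed above.
MAX_CHARS = 4000


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def paced(
    items: Iterable[T],
    max_per_minute: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items so that at most max_per_minute are released per 60 seconds."""
    released: List[float] = []  # release timestamps, oldest first
    for item in items:
        if len(released) >= max_per_minute:
            # Wait until 60 s have passed since the release max_per_minute items ago.
            wait = 60.0 - (clock() - released[-max_per_minute])
            if wait > 0:
                sleep(wait)
        released.append(clock())
        yield item
```

A caller would loop with `for chunk in paced(split_text(long_text)):` and send each chunk to whichever text-to-speech client is in use. The `sleep` and `clock` parameters exist only so the pacing can be exercised without real waiting.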

joslat (Author) · 2025-02-25T21:58:25Z

Quoted in issue: #10655

I have also managed to use the gpt-4o-audio-preview model (it seems to expire in May...). It is a bit flaky or slow (I got some timeouts), but the code works too. Thanks to @rogerbarreto for suggesting this and using the Azure OpenAI SDK.

Over the next few days I will look into how to adapt this already working code to support the model, and whether it is not too hard to provide support for this. I might need help, though ;)

joslat (Author) · 2025-02-26T20:06:49Z

Update: as of 24-02-2025, the tts and tts-hd models have a new expiration date: 01-02-2026. Bad timing, I guess...

Anyhow, the limit of 3 requests per minute seems to remain, as does the limit of 4,096 characters per request.

To work around this, I've implemented:

A sample using SK with the tts-hd model that splits the text into chunks of fewer than 4,096 characters, generates the audio for each chunk, and puts it together.
A sample using the Azure OpenAI SDK with gpt-4o-audio-preview that does the same.

Note: for the latter, I did one version that uses the file system and another, cleaner one that works in memory.

You can find the two separate projects in the following repo: https://github.com/joslat/PlayingWithAudio
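
To illustrate the "in memory" variant mentioned above, the following is a rough sketch of joining several audio segments without touching the file system. It is Python rather than the repo's C#, and it assumes every segment is a well-formed PCM WAV byte string with the same channel count, sample width, and sample rate. The function name `concat_wav` is hypothetical, and compressed formats such as MP3 would need a different approach.

```python
import io
import wave
from typing import Iterable


def concat_wav(segments: Iterable[bytes]) -> bytes:
    """Join WAV byte strings that share one format into a single WAV, in memory."""
    parts = list(segments)
    if not parts:
        raise ValueError("no audio segments to join")

    out_buf = io.BytesIO()
    fmt = None  # (channels, sample width in bytes, frame rate) of the first segment
    with wave.open(out_buf, "wb") as out:
        for data in parts:
            with wave.open(io.BytesIO(data), "rb") as seg:
                seg_fmt = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
                if fmt is None:
                    # The first segment decides the output format.
                    fmt = seg_fmt
                    out.setnchannels(seg_fmt[0])
                    out.setsampwidth(seg_fmt[1])
                    out.setframerate(seg_fmt[2])
                elif seg_fmt != fmt:
                    raise ValueError("all segments must share channels, sample width and rate")
                # Copy the raw audio frames; the header is rewritten on close.
                out.writeframes(seg.readframes(seg.getnframes()))
    return out_buf.getvalue()
```

The result can be written to disk once at the end or streamed to a client. Copying only the frames, rather than concatenating the raw bytes, avoids leaving extra WAV headers in the middle of the output.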
