tts - audio generation: what model to use to generate >4min (longer audio)? or audio at all (tts-hd to deprecate on Sat, Mar 1, 2025) #10645

Answered by joslat
joslat asked this question in Q&A

Update: as of 24-02-2025, the tts and tts-hd models have a new expiration date: 01-02-2026. Bad timing, I guess...

Anyhow, the limit of 3 requests per minute seems to remain, as does the limit of 4096 characters per request.
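To get a feel for what these two limits mean for a long text, here is a rough back-of-the-envelope sketch. It assumes the rate limit behaves like a fixed window (3 requests go out immediately, the next 3 after a minute), which may not match how the service actually enforces it; the function names are made up for illustration.

```python
import math

MAX_CHARS_PER_REQUEST = 4096  # per-request character limit mentioned above
REQUESTS_PER_MINUTE = 3       # rate limit mentioned above


def estimate_requests(text_length: int) -> int:
    """Lower bound on the number of requests needed for a text of this length."""
    return math.ceil(text_length / MAX_CHARS_PER_REQUEST)


def estimate_min_wait_seconds(request_count: int) -> float:
    """Minimum time spent waiting on the rate limit, ignoring generation time.

    Assumption: a fixed one-minute window, so the first 3 requests are free
    and every further group of up to 3 requests costs one more minute.
    """
    if request_count <= 0:
        return 0.0
    full_windows_to_wait = (request_count - 1) // REQUESTS_PER_MINUTE
    return full_windows_to_wait * 60.0


if __name__ == "__main__":
    length = 20_000  # characters in a hypothetical long script
    requests_needed = estimate_requests(length)
    print(requests_needed, estimate_min_wait_seconds(requests_needed))
```

Splitting on sentence boundaries usually leaves chunks partly empty, so the real request count can be higher than this lower bound.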

To work around this, I've implemented:

  • A sample built with Semantic Kernel (SK) and the tts-hd model that splits the text into chunks of fewer than 4096 characters, generates audio for each chunk, and joins the results.
  • A sample built with the Azure OpenAI SDK that uses gpt-4o-audio-preview and does the same.
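The splitting step could look roughly like the sketch below. This is not the code from the repo (which I have not reproduced here), just a minimal stdlib-only illustration of the idea: break the text on sentence endings, pack sentences into chunks that stay within the limit, and hard-split any single sentence that is too long on its own. `split_text` and `MAX_CHARS` are names chosen for this example.

```python
import re

MAX_CHARS = 4096  # per-request limit; lower it if you want a safety margin


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit has to be cut mid-sentence.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the speech model in order, pausing as needed to stay under the requests-per-minute limit, and the returned audio segments joined in the same order.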

Note: for the latter, I wrote one version that works through the file system and another, cleaner one that works in memory.
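As an illustration of the in-memory approach, here is a minimal sketch that joins audio segments without touching the disk. It assumes each segment arrives as complete WAV bytes with a well-formed header and identical channel count, sample width, and sample rate; other formats such as MP3 would need a different approach. `concat_wav_bytes` is a hypothetical helper, not something taken from the repo.

```python
import io
import wave


def concat_wav_bytes(segments: list[bytes]) -> bytes:
    """Join WAV segments in memory; all segments must share one audio format."""
    if not segments:
        raise ValueError("no audio segments to join")
    output = io.BytesIO()
    with wave.open(output, "wb") as writer:
        expected = None
        for segment in segments:
            with wave.open(io.BytesIO(segment), "rb") as reader:
                fmt = (
                    reader.getnchannels(),
                    reader.getsampwidth(),
                    reader.getframerate(),
                )
                if expected is None:
                    # The first segment decides the output format.
                    expected = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"format mismatch: {fmt} != {expected}")
                # Append the raw audio frames of this segment.
                writer.writeframes(reader.readframes(reader.getnframes()))
    # The wave module leaves a caller-supplied buffer open, so it is still readable.
    return output.getvalue()
```

The file-system variant would do the same thing with temporary files; keeping everything in `io.BytesIO` buffers avoids the cleanup step.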

You can find the two separate projects in the following repo: https://github.com/joslat/PlayingWithAudio

Answer selected by sophialagerkranspandey