Add support for openai tts api #354

Closed
tarasglek opened this issue Jan 21, 2024 · 7 comments · Fixed by #357

@tarasglek
Owner

As a warm-up for #310, we could add a TTS feature to ChatCraft.
OpenAI can do TTS using really high-quality voices: https://platform.openai.com/docs/guides/text-to-speech

We should add a "Speak" menu entry to each message so it can be spoken by these voices. Once the message has been spoken, it should also get a "Download speech" menu entry so we can download the generated audio.

TTS could also allow ChatCraft to respond in voice when we ask it questions using the voice feature.

This would be handy for recording voiceovers for demos on YouTube, presentations, etc.
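
For reference, here is a rough sketch of what the "Speak" action might do under the hood. It assumes the `POST /v1/audio/speech` endpoint from the guide linked above, with the `tts-1` model and the `alloy` voice as defaults. The function names and the injected `fetchFn` parameter are made up for illustration and are not existing ChatCraft code.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface FetchResponseLike {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<FetchResponseLike>;

// Build the HTTP request for OpenAI's text-to-speech endpoint.
// The voice and model defaults are assumptions taken from the guide.
function buildSpeechRequest(
  apiKey: string,
  text: string,
  voice = "alloy",
  model = "tts-1"
): SpeechRequest {
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, voice, input: text }),
  };
}

// Send the request and return the raw audio bytes (MP3 unless another
// response format is requested).
async function synthesizeSpeech(
  fetchFn: FetchLike,
  apiKey: string,
  text: string
): Promise<ArrayBuffer> {
  const { url, ...init } = buildSpeechRequest(apiKey, text);
  const response = await fetchFn(url, init);
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```

In the browser, `fetch` can be passed as `fetchFn`. The returned bytes could then be wrapped in a `Blob` of type `audio/mpeg`, and the same blob could back both playback and the download entry via `URL.createObjectURL`.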

@tarasglek
Owner Author

tarasglek commented Jan 21, 2024

We should also support using browser-local VTT and TTS, à la https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

Unfortunately these are a bit limited in that they don't support working with raw sound :(
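
A small sketch of the browser-local TTS path, assuming the standard `speechSynthesis` and `SpeechSynthesisUtterance` globals from the Web Speech API. `pickVoice` and `speakInBrowser` are hypothetical helper names, and the voice-selection order is just one reasonable choice.

```typescript
// Structural subset of the browser's SpeechSynthesisVoice.
interface VoiceLike {
  name: string;
  lang: string;
  default: boolean;
}

// Prefer an exact language match, then a voice in the same base language
// (e.g. "en-GB" for "en-AU"), then the browser default, then the first voice.
function pickVoice<T extends VoiceLike>(voices: T[], lang: string): T | undefined {
  const wanted = lang.toLowerCase();
  const base = (wanted.split("-")[0] ?? wanted);
  return (
    voices.find((v) => v.lang.toLowerCase() === wanted) ??
    voices.find((v) => v.lang.toLowerCase().startsWith(base)) ??
    voices.find((v) => v.default) ??
    voices[0]
  );
}

// Speak text with the browser's built-in synthesizer.
// Returns false when the Web Speech API is not available (e.g. outside a browser).
function speakInBrowser(text: string, lang = "en-US"): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
    return false;
  }
  const utterance = new g.SpeechSynthesisUtterance(text);
  utterance.lang = lang;
  // getVoices() can return an empty list until the "voiceschanged" event fires;
  // in that case the browser falls back to its default voice.
  const voice = pickVoice(g.speechSynthesis.getVoices() as VoiceLike[], lang);
  if (voice) {
    utterance.voice = voice;
  }
  g.speechSynthesis.speak(utterance);
  return true;
}
```

Note that `speak()` plays the audio directly and does not hand back the sound data, which is the limitation mentioned above.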

@Amnish04
Collaborator

Amnish04 commented Jan 21, 2024

Sounds interesting!

I tried to play with it a bit, but for some reason I can't access the method from the documentation
image

Am I using the wrong object?
image
image

@Amnish04
Collaborator

Just found that the version of the openai package we are using does not support the speech property.

Upgrading to the latest version fixed it, and I can listen to the generated audio.

image
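
A tiny guard for the version problem described above, as a sketch only. It assumes the SDK client exposes TTS as `audio.speech.create`, as in the OpenAI docs. `supportsSpeech` is a hypothetical helper, not part of the SDK.

```typescript
// Returns true when the given client object exposes audio.speech.create.
// Older SDK versions lack the `speech` property, so this returns false for them.
function supportsSpeech(client: unknown): boolean {
  const audio = (
    client as { audio?: { speech?: { create?: unknown } } } | null | undefined
  )?.audio;
  return typeof audio?.speech?.create === "function";
}
```

A check like this would let the UI hide the speak entry instead of throwing when the installed SDK is too old.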

@humphd
Collaborator

humphd commented Jan 21, 2024

@Amnish04 nice, want to turn your investigation into a PR?

One thing we'd have to do here is make this aware of different providers/models (cc @kliu57). For example, if I'm using OpenRouter.ai vs. OpenAI for my provider and API Key, this won't work.

To start, maybe we only do this if you're using OpenAI as your provider?

Another thing that would be cool is to allow users to set the voice to use as a setting: https://platform.openai.com/docs/guides/text-to-speech/voice-options

Also, it looks like we can stream it vs. waiting to download: https://platform.openai.com/docs/guides/text-to-speech
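
To make the provider check and the voice setting concrete, here is a minimal sketch. The `TtsSettings` shape and all function names are assumptions for illustration, not ChatCraft's actual settings. The voice list is the one shown in the voice options guide at the time of writing and may have grown since.

```typescript
// Voices listed in the OpenAI voice options guide (assumed current at time of writing).
const OPENAI_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type OpenAiVoice = (typeof OPENAI_VOICES)[number];

// Hypothetical slice of the user's settings.
interface TtsSettings {
  apiUrl: string;
  voice?: string;
}

// Only treat the provider as OpenAI when the base URL points at api.openai.com.
function isOpenAiProvider(apiUrl: string): boolean {
  return /^https:\/\/api\.openai\.com(\/|$)/i.test(apiUrl.trim());
}

// Fall back to a known voice if the stored setting is missing or unrecognized.
function resolveVoice(
  requested: string | undefined,
  fallback: OpenAiVoice = "alloy"
): OpenAiVoice {
  const match = OPENAI_VOICES.find((v) => v === requested);
  return match ?? fallback;
}

// Decide whether to offer TTS at all, and with which voice.
function ttsOptions(settings: TtsSettings): { enabled: boolean; voice: OpenAiVoice } {
  return {
    enabled: isOpenAiProvider(settings.apiUrl),
    voice: resolveVoice(settings.voice),
  };
}
```

With this, an OpenRouter.ai base URL would simply leave the feature disabled rather than producing a failing request.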

@tarasglek
Owner Author

tarasglek commented Jan 21, 2024

I think the model should show up in the list endpoint. We can use that to enable or disable this feature.

Love how quickly you got an experimental result!
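
A sketch of the list-endpoint idea, assuming the `GET /v1/models` response has a `data` array of objects with an `id` field and that TTS model ids start with `tts-` (e.g. `tts-1`). The prefix check and the function names are assumptions.

```typescript
// Structural subset of the models list response.
interface ModelListResponse {
  data: { id: string }[];
}

// Return the ids of models that look like TTS models.
function findTtsModels(response: ModelListResponse): string[] {
  return response.data.map((m) => m.id).filter((id) => id.startsWith("tts-"));
}

// The feature is enabled only when the provider lists at least one TTS model.
function isTtsAvailable(response: ModelListResponse): boolean {
  return findTtsModels(response).length > 0;
}
```

This keys the feature off what the provider reports instead of hard-coding a provider name, so any compatible provider that lists a TTS model would get it too.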

@humphd
Collaborator

humphd commented Jan 22, 2024

> I think the model should show up in list endpoint.

Do you mean in the list of models we show the user, so you can "Ask" or "Retry" with the TTS models?

@tarasglek
Owner Author

I did not mean that, but that's a good idea. Let's do that.

@Amnish04 Amnish04 linked a pull request Jan 24, 2024 that will close this issue
3 participants