Xinference provides audio-to-text functionality that is compatible with the OpenAI Audio API. This notebook demonstrates how to use Xinference for speech recognition.

# Preparation

First, you need to install Xinference:
```shell
pip install xinference
```

The Whisper model requires the command-line tool [ffmpeg](https://ffmpeg.org/) to be installed on your system. It is available from most package managers:

```shell
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```
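
Before launching the model, it can help to confirm that `ffmpeg` is actually on your `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `find_ffmpeg` is illustrative and not part of Xinference.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("ffmpeg was not found on PATH; install it before launching the Whisper model.")
else:
    print(f"Found ffmpeg at {ffmpeg_path}")
```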

Then, start the Xinference server with the following command:
```shell
xinference-local
```

The server logs its endpoint once it has started:

```shell
2023-11-02 16:04:55,278 xinference   38878 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-11-02 16:04:55,280 xinference.core.supervisor 38878 INFO     Worker 127.0.0.1:32187 has been added successfully
2023-11-02 16:04:55,281 xinference.deploy.worker 38878 INFO     Xinference worker successfully started.
```

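Finally, we launch the `whisper-large-v3` audio model and give it the model UID `whisper-1`, which is the name the client code below passes as `model`.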
```shell
xinference launch -u whisper-1 -n whisper-large-v3 -t audio
```
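
The same launch can probably be done from Python instead of the command line. The sketch below assumes the Xinference Python client exposes `launch_model` with `model_name`, `model_type`, and `model_uid` keyword arguments that mirror the CLI flags `-n`, `-t`, and `-u`; check the client documentation for your installed version, since these names are an assumption here. The helper `launch_whisper` is hypothetical and takes the client as a parameter.

```python
def launch_whisper(client, model_uid="whisper-1"):
    """Launch whisper-large-v3 as an audio model and return whatever the client returns.

    `client` is assumed to be an Xinference client object whose `launch_model`
    method accepts the keyword arguments used below (an assumption, not verified
    against every Xinference release).
    """
    return client.launch_model(
        model_name="whisper-large-v3",  # same as `-n whisper-large-v3`
        model_type="audio",             # same as `-t audio`
        model_uid=model_uid,            # same as `-u whisper-1`
    )
```

With a running server, this would be called as `launch_whisper(Client("http://127.0.0.1:9997"))` after `from xinference.client import Client`.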

# Audio to text

This is an example audio clip from [Common Voice](https://commonvoice.mozilla.org/zh-CN). We transcribe it to text and then translate it to English.


In [1]:
import IPython
IPython.display.Audio("../xinference/model/audio/tests/common_voice_zh-CN_38026095.mp3")

In [2]:
import openai

# The api_key can't be empty; any non-empty string works.
client = openai.Client(api_key="not empty", base_url="http://127.0.0.1:9997/v1")
audio_file = open("../xinference/model/audio/tests/common_voice_zh-CN_38026095.mp3", "rb")
# Transcription
completion = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
completion

Transcription(text='本列表列出香港航空的航点')

In [3]:
# Translation
completion = client.audio.translations.create(model="whisper-1", file=audio_file)
completion

Translation(text=' This list lists the airlines in Hong Kong.')

In [4]:
audio_file.close()
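
The cells above open the file once, reuse it for both requests, and close it by hand. A slightly more defensive variant is sketched below: it uses a `with` block so the file is closed even if a request fails, and rewinds the file before the second request. The helper name `transcribe_and_translate` is illustrative only, and `client` is assumed to be an `openai.Client` configured as in the cell above.

```python
def transcribe_and_translate(client, audio_path, model="whisper-1"):
    """Return (transcription_text, translation_text) for one audio file.

    `client` is expected to expose `audio.transcriptions.create` and
    `audio.translations.create`, as the OpenAI Python client does.
    """
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(model=model, file=audio_file)
        # Rewind before reusing the handle. Some HTTP clients rewind on their own;
        # doing it explicitly avoids depending on that behaviour.
        audio_file.seek(0)
        translation = client.audio.translations.create(model=model, file=audio_file)
    # Strip surrounding whitespace, such as the leading space in the output above.
    return transcription.text.strip(), translation.text.strip()
```

With the client from the earlier cell, `transcribe_and_translate(client, "../xinference/model/audio/tests/common_voice_zh-CN_38026095.mp3")` should return the same two strings shown above, minus the leading space.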

# Conclusion

Xinference is a model inference platform that exposes an OpenAI-compatible API, so the standard `openai` client can be pointed at it for tasks such as speech recognition, text conversation, and image generation.