In [1]:
import azure.cognitiveservices.speech as speechsdk

#### About the settings
The <b>endpoint</b> should look like:
<br>
https://your-resource-name.cognitiveservices.azure.com/
<br>
or for some regions:
<br>
https://your-resource-name.region.cognitiveservices.azure.com/
<br>
Make sure you do not use the generic documentation endpoint (https://endpoint.api.cognitive.microsoft.com/).<br>
Instead, use the endpoint from your Speech resource in Azure.
<br>
<br>
The speech config can be created from either a region or an endpoint, for example with a region:<br>
speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region="eastus")
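
The endpoint rules above can be checked before the config is built. The following is a minimal sketch, not part of the SDK: `speech_config_kwargs` and `GENERIC_DOC_HOST` are hypothetical names, and requiring exactly one of `region` or `endpoint` is a choice made here to keep the configuration unambiguous.

```python
from urllib.parse import urlparse

# Placeholder host from the generic documentation; it is not a real resource endpoint.
GENERIC_DOC_HOST = "endpoint.api.cognitive.microsoft.com"


def speech_config_kwargs(subscription_key, region=None, endpoint=None):
    """Build keyword arguments for speechsdk.SpeechConfig from a region OR an endpoint.

    Hypothetical helper: it only validates the inputs and returns a dict,
    so it can be checked without the Speech SDK installed.
    """
    if bool(region) == bool(endpoint):
        raise ValueError("pass exactly one of region or endpoint")
    if region:
        return {"subscription": subscription_key, "region": region}
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"endpoint must be an https URL: {endpoint!r}")
    if parsed.hostname == GENERIC_DOC_HOST:
        raise ValueError("this is the generic documentation endpoint, not your resource's")
    return {"subscription": subscription_key, "endpoint": endpoint}
```

With the SDK imported, the result could then be passed on as `speechsdk.SpeechConfig(**speech_config_kwargs(subscription_key, region="eastus"))`.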

In [2]:
import os

# Read the key from an environment variable instead of hard-coding it in the notebook.
# The variable name SPEECH_KEY is an assumption; use whatever name you export.
subscription_key = os.environ["SPEECH_KEY"]
speech_endpoint = "https://speech_assessment_1.eastus.api.cognitive.microsoft.com/"
region="eastus"
language="en-US"  # Change the language as needed

In [5]:
audio_file = "./testdata/test01_v2.wav"
transcribe_txt = "GIVE NOT SO EARNEST A MIND TO THESE MUMMERIES CHILD"

In [6]:
audio_config = speechsdk.audio.AudioConfig(filename=audio_file)
# Create the pronunciation assessment config
pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text=transcribe_txt,
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True
)
pronunciation_config.enable_prosody_assessment()

In [7]:
speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region )

In [8]:
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config, language=language)

In [9]:
# Apply the pronunciation assessment config to the recognizer
pronunciation_config.apply_to(speech_recognizer)

In [10]:
# Run recognition
result = speech_recognizer.recognize_once_async().get()
pronunciation_result=None
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    pronunciation_result = speechsdk.PronunciationAssessmentResult(result)
else:
    raise Exception(f"Speech recognition failed: {result.reason}")

In [13]:
print(pronunciation_result.accuracy_score)
print(pronunciation_result.fluency_score)
print(pronunciation_result.pronunciation_score)

97.0
100.0
95.7
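
The three numbers above are utterance-level scores. Because the config requests phoneme granularity and miscue detection, the service response also carries per-word details. The sketch below parses that JSON with the standard library only. `word_scores`, `flagged_words`, and the threshold of 80 are hypothetical choices, and `SAMPLE_JSON` is made-up illustrative data that only mirrors the field layout of the response; it is not output from the recording used here.

```python
import json

# Illustrative payload only: the field names follow the pronunciation assessment
# JSON layout, but the words and scores below are invented for this sketch.
SAMPLE_JSON = """
{
  "NBest": [
    {
      "Words": [
        {"Word": "give", "PronunciationAssessment": {"AccuracyScore": 98.0, "ErrorType": "None"}},
        {"Word": "earnest", "PronunciationAssessment": {"AccuracyScore": 62.0, "ErrorType": "Mispronunciation"}},
        {"Word": "child", "PronunciationAssessment": {"AccuracyScore": 0.0, "ErrorType": "Omission"}}
      ]
    }
  ]
}
"""


def word_scores(json_text):
    """Return (word, accuracy_score, error_type) for each word of the top hypothesis."""
    best = json.loads(json_text)["NBest"][0]
    rows = []
    for entry in best.get("Words", []):
        assessment = entry.get("PronunciationAssessment", {})
        rows.append((entry["Word"],
                     assessment.get("AccuracyScore"),
                     assessment.get("ErrorType", "None")))
    return rows


def flagged_words(json_text, threshold=80.0):
    """List words that have an error type or an accuracy score below the threshold."""
    return [word for word, score, error in word_scores(json_text)
            if error != "None" or (score is not None and score < threshold)]


print(flagged_words(SAMPLE_JSON))
```

For a real run, the JSON text would come from `result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)` in place of `SAMPLE_JSON`.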


#### Bug and Fix
- Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
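  Main cause: the Azure Speech SDK does not accept MP3 files as direct input; by default it only handles standard PCM/WAV audio files.<br>
  Analysis:<br>
      - By default, the Azure Speech SDK only supports 16 kHz, 16-bit, mono WAV (PCM) files.<br>
      - The input here was an MP3 file, which triggers the "invalid header" error because the SDK cannot read the PCM information it expects.<br>
      - If you must use MP3, extra handling is needed (for example GStreamer, or converting the file first).<br>
  Fix:<br>
    1. Convert the MP3 to WAV (16 kHz, 16-bit, mono)<br>
        You can convert with ffmpeg or pydub. Two common methods:<br>

        Method 1: <b>ffmpeg (recommended)</b><br>
        Run in a terminal: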
    ```bash
        ffmpeg -i ./testdata/test01.mp3 -ar 16000 -ac 1 -acodec pcm_s16le ./testdata/test01.wav
    ```
    Method 2: <b>Python pydub</b> (pydub itself relies on ffmpeg to decode MP3)
    ```python
        from pydub import AudioSegment
        sound = AudioSegment.from_mp3("./testdata/test01.mp3")
        sound = sound.set_frame_rate(16000).set_channels(1).set_sample_width(2)
        sound.export("./testdata/test01.wav", format="wav")
    ```
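  Summary:<br>
    SPXERR_INVALID_HEADER is almost always caused by an unsupported audio format, most often an MP3 file passed in directly.  
    Convert the audio to a 16 kHz, 16-bit, mono WAV file before running recognition.  
    After conversion, the original code should run as is.  
    For more advanced MP3 support, see the official documentation on GStreamer integration; for most projects, converting to WAV is the simplest option.  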
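
To catch this problem before calling the SDK, the file can be inspected with Python's standard `wave` module. This is a minimal sketch under the assumption that the target format is the 16 kHz, 16-bit, mono WAV described above; `check_wav_format` is a hypothetical helper name, not an SDK function.

```python
import wave


def check_wav_format(path, rate=16000, sample_width=2, channels=1):
    """Return a list of format problems; an empty list means the file matches.

    sample_width is in bytes, so 2 means 16-bit samples.
    """
    try:
        with wave.open(path, "rb") as wav:
            actual_rate = wav.getframerate()
            actual_width = wav.getsampwidth()
            actual_channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        # An MP3 or otherwise non-RIFF file ends up here.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if actual_rate != rate:
        problems.append(f"sample rate is {actual_rate} Hz, expected {rate} Hz")
    if actual_width != sample_width:
        problems.append(f"sample width is {actual_width * 8}-bit, expected {sample_width * 8}-bit")
    if actual_channels != channels:
        problems.append(f"{actual_channels} channel(s), expected {channels}")
    return problems
```

Calling `check_wav_format("./testdata/test01.wav")` after conversion and getting an empty list back suggests the file is in the format the recognizer expects.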