# Speech To Text

## Real-time Speech to Text
* https://learn.microsoft.com/ko-kr/azure/ai-services/speech-service/get-started-speech-to-text?tabs=linux%2Cterminal&pivots=programming-language-python

In [None]:
subscription = ""
region = "eastus"
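
The comments in the cells below mention the `SPEECH_KEY` and `SPEECH_REGION` environment variables. As a minimal sketch, the two values could be read from the environment rather than typed into the notebook. The helper name `load_speech_credentials` and its `env` parameter are hypothetical, not part of the SDK.

```python
import os


def load_speech_credentials(env=None):
    """Return (key, region) read from SPEECH_KEY and SPEECH_REGION.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to try out.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in (("SPEECH_KEY", key), ("SPEECH_REGION", region))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return key, region
```

With both variables exported in the shell that starts the notebook, `subscription, region = load_speech_credentials()` would fill in the two names used by the later cells.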

In [2]:
import os
import azure.cognitiveservices.speech as speechsdk

def recognize_from_microphone():
    # Uses the `subscription` key and `region` defined in the first cell.
    speech_config = speechsdk.SpeechConfig(subscription=subscription, region=region)

    # Set the language to recognize.
    speech_config.speech_recognition_language = "en-US"

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    # To recognize from an audio file instead:
    # audio_config = speechsdk.audio.AudioConfig(filename="YourAudioFile.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    print("Speak into your microphone.")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(speech_recognition_result.text))
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")

recognize_from_microphone()

Speak into your microphone.
Recognized: Everybody.
Info: on_underlying_io_bytes_received: Close frame received
Info: on_underlying_io_bytes_received: closing underlying io.
Info: on_underlying_io_close_complete: uws_state: 6.
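
`recognize_once_async` returns after a single utterance, which is why the run above stops at one short phrase. For longer input, the SDK recognizer also exposes `start_continuous_recognition` / `stop_continuous_recognition` together with the `recognized`, `session_stopped` and `canceled` events. The following is a minimal sketch, assuming it is handed a `speechsdk.SpeechRecognizer` built the same way as in the cell above. The function name `transcribe_continuously` and the `max_seconds` parameter are hypothetical.

```python
import threading


def transcribe_continuously(recognizer, max_seconds=30.0):
    """Collect recognized phrases until the session stops or max_seconds pass."""
    transcripts = []
    done = threading.Event()

    def on_recognized(evt):
        # Final result for one phrase; skip events that carry no text.
        if evt.result.text:
            transcripts.append(evt.result.text)

    def on_stop(evt):
        # Fired when the session ends or recognition is canceled.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    try:
        done.wait(timeout=max_seconds)
    finally:
        recognizer.stop_continuous_recognition()
    return transcripts
```

A call such as `transcribe_continuously(speech_recognizer, max_seconds=15)` would listen for up to 15 seconds and return the phrases as a list. Passing the recognizer in, rather than building it inside, keeps the language and audio source choices in one place.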


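## Custom Speech (custom speech-to-text)
* Rather than transcribing speech verbatim, it converts it to text in `adapted wording (not a full paraphrase, but the phrasing normally used)`.
* Feature overview: https://speech.microsoft.com/portal/62bc2d13edec4c4c86194c6d4c0432b6/customspeech/1d79a569-9083-4469-b11e-6b25d438ad2b/overview
* Follow the lab handout: Language Services handout no. 6, page 87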

# Text To Speech

## Voice Gallery
* Converts text to spoken audio.
    + List of available voices: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt#prebuilt-neural-voices

`Hands-on with the SDK`

In [10]:
import os
import azure.cognitiveservices.speech as speechsdk

# Uses the `subscription` key and `region` defined in the first cell.
speech_config = speechsdk.SpeechConfig(subscription=subscription, region=region)
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

# The neural multilingual voice can speak different languages based on the input text.
# To use a different voice, pick one from the list at this link and change the name below: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt#prebuilt-neural-voices
speech_config.speech_synthesis_voice_name='en-US-AvaMultilingualNeural'

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# Get text from the console and synthesize to the default speaker.
print("Enter some text that you want to speak >")
text = input()

speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

# if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
#     print("Speech synthesized for text [{}]".format(text))
# elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
#     cancellation_details = speech_synthesis_result.cancellation_details
#     print("Speech synthesis canceled: {}".format(cancellation_details.reason))
#     if cancellation_details.reason == speechsdk.CancellationReason.Error:
#         if cancellation_details.error_details:
#             print("Error details: {}".format(cancellation_details.error_details))
#             print("Did you set the speech resource key and region values?")

Info: on_underlying_io_bytes_received: Close frame received
Info: on_underlying_io_bytes_received: closing underlying io.
Info: on_underlying_io_close_complete: uws_state: 6.
Enter some text that you want to speak >


Info: on_underlying_io_bytes_received: Close frame received
Info: on_underlying_io_bytes_received: received close frame, sending a close response frame.
Info: on_underlying_io_close_sent: uws_client=0x120258660, io_send_result:0
Info: on_underlying_io_close_sent: closing underlying io.
Info: on_underlying_io_close_complete: uws_state: 6.


> Hands-on with the REST API

```
curl --location --request POST "https://${SPEECH_REGION}.tts.speech.microsoft.com/cognitiveservices/v1" \
--header "Ocp-Apim-Subscription-Key: ${SPEECH_KEY}" \
--header 'Content-Type: application/ssml+xml' \
--header 'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3' \
--header 'User-Agent: curl' \
--data-raw '<speak version="1.0" xml:lang="en-US">
    <voice xml:lang="en-US" xml:gender="Female" name="en-US-AvaMultilingualNeural">
        my voice is my passport verify me
    </voice>
</speak>' --output output.mp3
```

In [6]:
import requests
from xml.sax.saxutils import escape

def text_to_speech(text, region, subscription_key, voice="en-US-AvaMultilingualNeural", output_file="output.mp3"):
    # Azure TTS endpoint
    endpoint = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"

    # SSML (Speech Synthesis Markup Language) payload.
    # Escape the text so characters such as & and < cannot break the XML.
    ssml = f"""
    <speak version="1.0" xml:lang="en-US">
        <voice xml:lang="en-US" xml:gender="Female" name="{voice}">
            {escape(text)}
        </voice>
    </speak>
    """

    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-16khz-32kbitrate-mono-mp3",
        "User-Agent": "PythonTTSClient"
    }

    # A timeout keeps the call from hanging indefinitely if the service does not respond.
    response = requests.post(endpoint, headers=headers, data=ssml.encode('utf-8'), timeout=30)

    if response.status_code == 200:
        with open(output_file, "wb") as audio:
            audio.write(response.content)
        print(f"✅ Audio saved to '{output_file}'")
    else:
        print(f"❌ Error {response.status_code}: {response.text}")

if __name__ == '__main__':
    text_to_speech(
        "The extractive summarization feature uses natural language processing techniques to locate key sentences in an unstructured text document.",
        region,
        subscription,
    )

✅ Audio saved to 'output.mp3'
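
The `X-Microsoft-OutputFormat` header decides how the returned audio is encoded, so the name of the saved file should match it (both requests above ask for MP3, at different bit rates). As a rough sketch, an extension can be guessed from the format string. The helper name `extension_for_output_format` is hypothetical, and the rule relies only on the naming pattern of the format strings, so anything it does not recognize raises an error instead of guessing.

```python
def extension_for_output_format(output_format):
    """Guess a file extension from an X-Microsoft-OutputFormat value.

    Heuristic: looks only at the prefix or suffix of the format name.
    """
    fmt = output_format.lower()
    if fmt.endswith("-mp3"):
        return ".mp3"
    if fmt.startswith("riff-"):
        return ".wav"  # RIFF container, i.e. a WAV file
    if fmt.startswith("ogg-"):
        return ".ogg"
    if fmt.startswith("webm-"):
        return ".webm"
    raise ValueError(f"Unrecognized output format: {output_format!r}")


print(extension_for_output_format("audio-16khz-32kbitrate-mono-mp3"))
```

The result could be used to build the `output_file` argument, for example `"output" + extension_for_output_format(fmt)`.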


## Custom Neural Voice
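* Custom neural voice (CNV) lets you create a natural-sounding synthetic voice trained on recordings of a human voice. A custom voice can be adapted across languages and speaking styles and is well suited to giving a TTS solution a unique voice.
* It learns your own voice from recordings and then reads text aloud in that voice.
* Follow the lab handout: Language Services handout no. 6, page 105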
* https://speech.microsoft.com/portal/62bc2d13edec4c4c86194c6d4c0432b6/customvoice/overview

# Speech Translation

## Speech To Text Translation
* https://learn.microsoft.com/ko-kr/azure/ai-services/speech-service/get-started-speech-translation?tabs=macos%2Cterminal&pivots=programming-language-python

In [8]:
import os
import azure.cognitiveservices.speech as speechsdk

def recognize_from_microphone():
    # Uses the `subscription` key and `region` defined in the first cell.
    speech_translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=subscription, region=region)
    speech_translation_config.speech_recognition_language = "en-US"

    to_language = "it"
    speech_translation_config.add_target_language(to_language)

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    translation_recognizer = speechsdk.translation.TranslationRecognizer(translation_config=speech_translation_config, audio_config=audio_config)

    print("Speak into your microphone.")
    translation_recognition_result = translation_recognizer.recognize_once_async().get()

    if translation_recognition_result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print("Recognized: {}".format(translation_recognition_result.text))
        print("""Translated into '{}': {}""".format(
            to_language, 
            translation_recognition_result.translations[to_language]))
    elif translation_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(translation_recognition_result.no_match_details))
    elif translation_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = translation_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")

recognize_from_microphone()

Speak into your microphone.
Recognized: The instructive summarization feature use Nature.
Translated into 'it': La funzione di riepilogo istruttiva utilizza la natura.
Info: on_underlying_io_bytes_received: Close frame received
Info: on_underlying_io_bytes_received: closing underlying io.
Info: on_underlying_io_close_complete: uws_state: 6.


## Video Translation
* Translates the speech contained in a video.
* See handout no. 6 (AI Speech), page 132
* https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/video-translation/python
* https://learn.microsoft.com/ko-kr/azure/ai-services/speech-service/video-translation-get-started?tabs=webvtt&pivots=rest-api

# Keyword Recognition

## Custom Keyword
* Responds when a user-defined keyword is spoken into the microphone, like "Hey Siri" or "Hi Bixby".
* See handout no. 6 (AI Speech), page 146
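
A minimal sketch of waiting for a custom keyword with the Speech SDK, assuming a keyword model file (`.table`) has already been created and downloaded. The function name `listen_for_keyword` and the model path are hypothetical placeholders. The SDK import sits inside the function only so that the snippet can be loaded on a machine without the SDK. Platform support for keyword recognition should be checked against the SDK documentation before relying on this.

```python
def listen_for_keyword(model_path):
    """Block until the keyword in the given model is heard on the default microphone.

    Returns the recognized keyword text, or None if recognition ended another way.
    """
    # Imported here so the module loads even where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    model = speechsdk.KeywordRecognitionModel(model_path)
    keyword_recognizer = speechsdk.KeywordRecognizer()  # uses the default microphone

    # Waits until the keyword is detected or recognition is canceled.
    result = keyword_recognizer.recognize_once_async(model).get()
    if result.reason == speechsdk.ResultReason.RecognizedKeyword:
        return result.text
    return None
```

A call would look like `listen_for_keyword("YourKeywordModel.table")`, with the file name replaced by the downloaded model.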