# A tutorial series on using and developing WeChat chat-bots
# A workshop to develop & use an intelligent and interactive chat-bot in WeChat

### WeChat is a popular social media app, which has more than 800 million monthly active users.

<img src='http://www.kudosdata.com/wp-content/uploads/2016/11/cropped-KudosLogo1.png' width=30% style="float: right;">
<img src='reference/WeChat_SamGu_QR.png' width=10% style="float: right;">

### http://www.KudosData.com

by: Sam.Gu@KudosData.com


May 2017 ========== Scan the QR code to add the trainer as a WeChat friend ========>>

### Lesson 3: Natural Language Processing
* Speech synthesis: converting message text to voice
* Speech recognition: converting voice to message text
* Text-based translation of messages between languages

### Using Google Cloud Platform's Machine Learning APIs

First, visit the <a href="http://console.cloud.google.com/apis">API console</a> and choose "Credentials" in the left-hand menu. Choose "Create Credentials" and generate an API key for your application. You should restrict the key (for example, by IP address) to prevent abuse. For this demo you can leave that field blank, but delete the API key after trying out the demo.

Copy and paste your API key into the cell below (or, as done here, read it from a file):

In [1]:
import io
import os
import subprocess
import sys

In [2]:
# Here I read in my own API_KEY from a file, which is not shared in the GitHub repository:
with io.open('../../API_KEY.txt') as fp:
    APIKEY = fp.read().strip()  # strip the trailing newline so it is not sent as part of the key

# Alternatively, un-comment the line below and replace the placeholder with your own GCP API key:
# APIKEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
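Keeping the key out of the notebook itself makes it harder to leak by accident. Below is a minimal sketch, assuming the key is stored either in an environment variable (the name `GCP_API_KEY` is a hypothetical choice) or on the first non-empty line of a text file such as `../../API_KEY.txt`; `load_api_key` is a hypothetical helper, not part of any Google library. Stripping whitespace matters because a key read from a file usually ends with a newline.

```python
import io
import os


def load_api_key(path='../../API_KEY.txt', env_var='GCP_API_KEY'):
    """Return the GCP API key from env_var if it is set, else from the first non-empty line of path."""
    key = os.environ.get(env_var, '').strip()
    if key:
        return key
    with io.open(path, encoding='utf-8') as fp:
        for line in fp:
            line = line.strip()  # drop the trailing newline and surrounding spaces
            if line:
                return line
    raise ValueError('no API key found in $%s or %s' % (env_var, path))


# Usage in the notebook:
# APIKEY = load_api_key()
```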

From the same API console, choose "Dashboard" in the left-hand menu and click "Enable API".

Enable the following APIs for your project (search for them) if they are not already enabled:
<ol>
<li> Google Translate API </li>
<li> Google Cloud Vision API </li>
<li> Google Natural Language API </li>
<li> Google Cloud Speech API </li>
</ol>

Finally, because we are calling the APIs from Python (clients in many other languages are available), install the Google API Python client package (it is not installed by default on Datalab):

In [3]:
# Copyright 2016 Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License"); 

# !pip install --upgrade google-api-python-client

### Import the required libraries:

In [4]:
import time, datetime, requests, itchat
from itchat.content import *
from googleapiclient.discovery import build
# Below is for GCP Language Translation API
service = build('translate', 'v2', developerKey=APIKEY)
# Below is for GCP Speech API (the older beta version was 'v1beta1')
# speech_service = build('speech', 'v1beta1', developerKey=APIKEY)
speech_service = build('speech', 'v1', developerKey=APIKEY)


### Base64 encoding of binary image and audio data (Define pre-processing functions)

In [5]:
# Import the base64 encoding library.
import base64

# Read an image file and return its contents as a base64-encoded string.
def encode_image(image_file):
    with io.open(image_file, 'rb') as f:
        image_content = f.read()
    if sys.version_info[0] < 3:
        # Python 2: b64encode already returns a str
        return base64.b64encode(image_content)
    # Python 3: b64encode returns bytes, so decode to str for the JSON request body
    return base64.b64encode(image_content).decode('utf-8')

# Read an audio file and return its contents as a base64-encoded string.
def encode_audio(audio_file):
    with io.open(audio_file, 'rb') as f:
        audio_content = f.read()
    if sys.version_info[0] < 3:
        # Python 2
        return base64.b64encode(audio_content)
    # Python 3
    return base64.b64encode(audio_content).decode('utf-8')


### Machine-intelligence API control parameters (Define control parameters for APIs)

In [6]:
# control parameter for Image API:
parm_image_maxResults = 10 # max objects or faces to be extracted from image analysis

# control parameters for Language Translation API:
parm_translation_origin_language = '' # original language of the text: to be overwritten by TEXT_DETECTION
parm_translation_target_language = 'zh' # target language for translation: Chinese

# control parameter for Speech API:
parm_speech_origin_language = 'en-US' # language of the audio for Speech API 'voice to text' (BCP-47 code)

### * Speech synthesis: converting message text to voice

### * Speech recognition: converting voice to message text

The Speech API can process streaming audio, audio content that is base64-encoded and embedded directly in the POST request body, or an audio file stored on Cloud Storage. This notebook uses the second option.
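The two non-streaming options differ only in the `audio` field of the request body. Below is a minimal sketch, assuming the Speech API v1 `recognize` request format used in this notebook: `build_recognize_body` is a hypothetical helper, and `gs://my-bucket/sample.flac` is a placeholder Cloud Storage URI. It only builds the body (no network call), so the result could be passed as `speech_service.speech().recognize(body=...)`.

```python
import base64


def build_recognize_body(language_code, audio_bytes=None, gcs_uri=None,
                         encoding=None, sample_rate_hertz=None):
    """Build a Speech API v1 'recognize' request body (hypothetical helper).

    Exactly one of audio_bytes (embedded content) or gcs_uri (file on
    Cloud Storage) must be given.
    """
    if (audio_bytes is None) == (gcs_uri is None):
        raise ValueError('pass exactly one of audio_bytes or gcs_uri')

    config = {'languageCode': language_code}
    # encoding / sampleRateHertz can be left out for FLAC and WAV files,
    # but headerless audio such as raw LINEAR16 PCM needs them.
    if encoding is not None:
        config['encoding'] = encoding
    if sample_rate_hertz is not None:
        config['sampleRateHertz'] = sample_rate_hertz

    if audio_bytes is not None:
        # Option 1: embed the audio in the POST body as a base64 string
        audio = {'content': base64.b64encode(audio_bytes).decode('ascii')}
    else:
        # Option 2: point the API at a file already stored on Cloud Storage
        audio = {'uri': gcs_uri}
    return {'config': config, 'audio': audio}


# Usage (builds the request body only):
body = build_recognize_body('en-US', gcs_uri='gs://my-bucket/sample.flac')
print(body)
```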

In [7]:
#    msg.download(msg.fileName)
#    print('\nDownloaded image file name is: %s' % msg['FileName'])

#    audio_file_input = msg['FileName']
#    audio_file_input = 'reference/eng_sample.mp3'

#    audio_type = {'flac', 'wav'}

# Running Speech API

def KudosData_voice_to_text(audio_file_input, audio_type):
    # Output file name, e.g. 'reference/eng_sample.mp3.flac'
    audio_file_output = str(audio_file_input) + '.' + str(audio_type)
    print('audio_file_input  : %s' % audio_file_input)
    print('audio_file_output : %s' % audio_file_output)

    # Remove a stale output file, if it exists, so ffmpeg does not stop to ask about overwriting it
    if os.path.exists(audio_file_output):
        os.remove(audio_file_output)

    # Convert the input (e.g. MP3) into a mono (-ac 1) FLAC/WAV file for the Speech API.
    # Shell equivalent: !ffmpeg -i reference/eng_sample.mp3 -ac 1 reference/eng_sample.mp3.flac
    with io.open(os.devnull, 'w') as FNULL:  # suppress ffmpeg console output
        retcode = subprocess.call(['ffmpeg', '-i', audio_file_input, '-ac', '1', audio_file_output],
                                  stdout=FNULL, stderr=subprocess.STDOUT)
    if retcode != 0:
        raise RuntimeError('ffmpeg failed to convert %s (exit code %d)' % (audio_file_input, retcode))

    # Call GCP Speech API ('recognize' in v1; the v1beta1 equivalent was 'syncrecognize').
    # encoding and sampleRateHertz are omitted: for FLAC and WAV they are read from the file header.
    response = speech_service.speech().recognize(
        body={
            'config': {
                'languageCode': parm_speech_origin_language
            },
            'audio': {
                # base64 of the converted audio file, embedded directly in the request
                'content': encode_audio(audio_file_output)
            }
        }).execute()
    print('Completed: Speech API: Voice -> Text ...')
    return response

In [8]:
##########################
# main()
##########################

#    msg.download(msg.fileName)
#    print('\nDownloaded image file name is: %s' % msg['FileName'])

#    audio_file_input = msg['FileName']


response = KudosData_voice_to_text('reference/eng_sample.mp3', 'flac')
# response = KudosData_voice_to_text('reference/eng_sample.mp3', 'wav')

audio_file_input  : reference/eng_sample.mp3
audio_file_output : reference/eng_sample.mp3.flac
Completed: Speech API: Voice -> Text ...


In [9]:
if response != {}:
    print (response['results'][0]['alternatives'][0]['transcript'])
    print ('( confidence: %f )' % response['results'][0]['alternatives'][0]['confidence'])

in central Japan a spider weeds or web in a field of growing rice rice has been apart of Japan for so long that has a shape the land indeed it has become so much a part of the Japanese landscape it is created a unique environment is Central to both the people who created it and the wild animals and not share it
( confidence: 0.893852 )
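The cell above prints only the first result. For longer recordings the Speech API v1 returns several entries in `results`, one per consecutive portion of the audio, so a safer sketch joins the top alternative of each. `extract_transcript` is a hypothetical helper, and the sample response below is hand-made to mirror the response shape used above; it is not real API output.

```python
def extract_transcript(response):
    """Join the top alternative of every result in a Speech API v1 response.

    Returns (transcript, average_confidence), or ('', None) when nothing was
    recognized (the API returns an empty dict in that case).
    """
    texts, confidences = [], []
    for result in response.get('results', []):
        alternatives = result.get('alternatives', [])
        if not alternatives:
            continue
        best = alternatives[0]  # alternatives are ordered most likely first
        texts.append(best.get('transcript', '').strip())
        if 'confidence' in best:
            confidences.append(best['confidence'])
    average = sum(confidences) / len(confidences) if confidences else None
    return ' '.join(t for t in texts if t), average


# Hand-made example shaped like the responses above (not real API output):
sample = {'results': [
    {'alternatives': [{'transcript': 'in central Japan', 'confidence': 0.9}]},
    {'alternatives': [{'transcript': ' a spider weaves a web', 'confidence': 0.8}]},
]}
text, confidence = extract_transcript(sample)
print(text)  # in central Japan a spider weaves a web
print(confidence)
```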


In [11]:
##########################
# main()
##########################

#    msg.download(msg.fileName)
#    print('\nDownloaded image file name is: %s' % msg['FileName'])

#    audio_file_input = msg['FileName']


# response = KudosData_voice_to_text('reference/eng_sample.mp3', 'flac')
response = KudosData_voice_to_text('reference/eng_sample.mp3', 'wav')

audio_file_input  : reference/eng_sample.mp3
audio_file_output : reference/eng_sample.mp3.wav
Completed: Speech API: Voice -> Text ...


In [12]:
if response != {}:
    print (response['results'][0]['alternatives'][0]['transcript'])
    print ('( confidence: %f )' % response['results'][0]['alternatives'][0]['confidence'])

in central Japan a spider weaves a web in a field of growing rice rice has been apart of Japan for so long that has a shape the land indeed it has become so much a part of the Japanese landscape it is created a unique environment is Central to both the people who created it and the wild animals in their share it
( confidence: 0.921722 )


In [None]:
FNULL = io.open(os.devnull, "w")

In [None]:
# for customization
audio_file_input = 'reference/eng_sample.mp3'
audio_file_output = audio_file_input + '.flac'
print('audio_file_input  : %s' % audio_file_input)
print('audio_file_output : %s' % audio_file_output)

In [None]:
# for customization

# !rm -rf reference/eng_sample.mp3.flac
retcode = subprocess.call(['rm', audio_file_output], stdout=FNULL, stderr=subprocess.STDOUT)
print(retcode)

# !ffmpeg -i reference/eng_sample.mp3 -ac 1 reference/eng_sample.mp3.flac
retcode = subprocess.call(['ffmpeg', '-i', audio_file_input, '-ac', '1', audio_file_output], 
                          stdout=FNULL, stderr=subprocess.STDOUT)
print(retcode)

In [None]:
# audio_file_path = 'reference/eng_sample.mp3.flac' # mono
audio_file_path = str(audio_file_output) # mono

In [None]:
# response = speech_service.speech().syncrecognize(
response = speech_service.speech().recognize(
    body={
        'config': {
#             'encoding': 'LINEAR16',
#             'sampleRateHertz': 16000,
            'languageCode': parm_speech_origin_language
        },
        'audio': {
            'content': encode_audio(audio_file_path)
            }
        }).execute()
print(response)

In [None]:
if response != {}:
    print (response['results'][0]['alternatives'][0]['transcript'])
    print ('( confidence: %f )' % response['results'][0]['alternatives'][0]['confidence'])

In [None]:
# for customization
audio_file_input = 'reference/eng_sample.mp3'
audio_file_output = audio_file_input + '.wav'
print('audio_file_input  : %s' % audio_file_input)
print('audio_file_output : %s' % audio_file_output)

In [None]:
# for customization

# !rm -rf reference/eng_sample.mp3.wav
retcode = subprocess.call(['rm', audio_file_output], stdout=FNULL, stderr=subprocess.STDOUT)
print(retcode)

# !ffmpeg -i reference/eng_sample.mp3 -ac 1 reference/eng_sample.mp3.wav
retcode = subprocess.call(['ffmpeg', '-i', audio_file_input, '-ac', '1', audio_file_output], 
                          stdout=FNULL, stderr=subprocess.STDOUT)
print(retcode)

In [None]:
# audio_file_path = 'reference/eng_sample.mp3.flac' # mono
audio_file_path = str(audio_file_output) # mono

In [None]:
# response = speech_service.speech().syncrecognize(
response = speech_service.speech().recognize(
    body={
        'config': {
#             'encoding': 'LINEAR16',
#             'sampleRateHertz': 16000,
            'languageCode': parm_speech_origin_language
        },
        'audio': {
            'content': encode_audio(audio_file_path)
            }
        }).execute()
print (response)

In [None]:
if response != {}:
    print (response['results'][0]['alternatives'][0]['transcript'])
    print ('( confidence: %f )' % response['results'][0]['alternatives'][0]['confidence'])

In [None]:
audio_file_path = 'reference/audio.raw'

In [None]:
# response = speech_service.speech().syncrecognize(
response = speech_service.speech().recognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRateHertz': 16000,
            'languageCode': parm_speech_origin_language
        },
        'audio': {
            'content': encode_audio(audio_file_path)
            }
        }).execute()
print (response)

In [None]:
if response != {}:
    print (response['results'][0]['alternatives'][0]['transcript'])
    print ('( confidence: %f )' % response['results'][0]['alternatives'][0]['confidence'])
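Headerless audio such as `reference/audio.raw` carries no format information, which is why the cell above has to state `encoding` and `sampleRateHertz` explicitly. Below is a minimal sketch of producing such a file from the sample MP3 with ffmpeg, assuming 16 kHz mono signed 16-bit little-endian PCM (what `LINEAR16` at 16000 Hz describes). `raw_linear16_command` is a hypothetical helper that only builds the argument list, and `reference/eng_sample.raw` is a placeholder output path.

```python
def raw_linear16_command(src, dst, sample_rate_hertz=16000):
    """Return an ffmpeg argument list converting src to headerless LINEAR16 PCM."""
    return ['ffmpeg', '-y',                    # overwrite dst without prompting
            '-i', src,
            '-f', 's16le',                     # raw output: signed 16-bit little-endian samples
            '-acodec', 'pcm_s16le',            # uncompressed 16-bit PCM, i.e. LINEAR16
            '-ac', '1',                        # mono
            '-ar', str(sample_rate_hertz),     # must match sampleRateHertz in the request
            dst]


cmd = raw_linear16_command('reference/eng_sample.mp3', 'reference/eng_sample.raw')
print(' '.join(cmd))
# To run the conversion (requires ffmpeg): subprocess.call(cmd)
```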

# WIP

In [None]:
from pydub import AudioSegment

In [None]:
AudioSegment.from_file("reference/voice.wav").export("reference/voicewav.mp3", format="mp3")

In [None]:
wav_audio = AudioSegment.from_file("reference/voice.wav", format="wav")

In [None]:
sound = AudioSegment.from_mp3("reference/audio.mp3")

In [None]:
wav_audio = AudioSegment.from_file("voice.wav", format="wav")

In [None]:
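# audio_file_path is 'reference/audio.raw': headerless 16 kHz mono 16-bit PCM (LINEAR16), not WAV
raw_audio = AudioSegment.from_file(audio_file_path, format="raw", frame_rate=16000, channels=1, sample_width=2)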

In [None]:
import subprocess
import requests
import shutil
import glob
import json

In [None]:
!which sox

In [None]:
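# Experimental: transcode an MP3 with SoX and send it to the older www.google.com/speech-api/v2
# endpoint (not the Cloud Speech API used above).
audio = requests.get('http://somesite.com/some.mp3')
sox = shutil.which('sox') or glob.glob(r'C:\Program Files*\sox*\sox.exe')[0]
# '-t mp3 -' reads MP3 from stdin, '-t flac -' writes FLAC to stdout, 'rate 16k' resamples to 16 kHz.
# Passing a list (no shell) keeps paths with spaces, such as 'Program Files', intact.
p = subprocess.Popen([sox, '-t', 'mp3', '-', '-t', 'flac', '-', 'rate', '16k'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
stdout, stderr = p.communicate(audio.content)
url = 'http://www.google.com/speech-api/v2/recognize?client=chromium&lang=en-US&key=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw'
headers = {'Content-Type': 'audio/x-flac; rate=16000'}
response = requests.post(url, data=stdout, headers=headers).text

# The endpoint returns one JSON object per line; keep the first transcript found.
result = None
for line in response.split('\n'):
    try:
        result = json.loads(line)['result'][0]['alternative'][0]['transcript']
        break
    except (ValueError, KeyError, IndexError):
        pass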

In [None]:
# Running Vision API
# 'LANDMARK_DETECTION'
def KudosData_LANDMARK_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 地标识别 ]\n'  # 地标识别 = "landmark detection"
    # 'LANDMARK_DETECTION': one annotation per recognized landmark
    for annotation in responses['responses'][0].get('landmarkAnnotations', []):
        image_analysis_reply += annotation['description'] \
        + '\n( score ' + str(annotation['score']) + ' )\n'
    return image_analysis_reply

### * Text-based translation of messages between languages

In [None]:
# Running Vision API
# 'TEXT_DETECTION'
def KudosData_TEXT_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 文字提取 ]\n'  # 文字提取 = "text extraction"
    # 'TEXT_DETECTION': the first textAnnotation holds the full detected text and its locale
    annotations = responses['responses'][0].get('textAnnotations', [])
    if annotations:
        full_text = annotations[0]['description']
        detected_language = str(annotations[0].get('locale', ''))
        image_analysis_reply += u'----- Start Original Text -----\n'
        image_analysis_reply += u'( Original Language 原文: ' + detected_language + ' )\n'
        image_analysis_reply += full_text + '----- End Original Text -----\n'

        ##############################################################################################################
        #                                        translation of detected text                                        #
        ##############################################################################################################
        parm_translation_origin_language = detected_language
        # Call translation only if parm_translation_origin_language differs from parm_translation_target_language
        if parm_translation_origin_language != parm_translation_target_language:
            translate_args = {'target': parm_translation_target_language,
                              'q': [full_text],  # TEXT_DETECTION OCR results only
                              'format': 'text'}  # plain text, so the reply contains no HTML entities
            if parm_translation_origin_language:
                translate_args['source'] = parm_translation_origin_language  # otherwise auto-detected
            outputs = service.translations().list(**translate_args).execute()
            image_analysis_reply += u'\n----- Start Translation -----\n'
            image_analysis_reply += u'( Target Language 译文: ' + parm_translation_target_language + ' )\n'
            image_analysis_reply += outputs['translations'][0]['translatedText'] + '\n' + '----- End Translation -----\n'
            print('Completed: Translation    API ...')
        ##############################################################################################################

    return image_analysis_reply
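The function above only translates text found in an image, but the same Translation API v2 call also works on plain message text. Below is a minimal sketch, assuming the `translations().list(...).execute()` call pattern used above: `translate_text` is a hypothetical helper, and the stub classes exist only so the example runs offline (in the notebook you would pass the `service` object built earlier). When `source` is omitted, the API detects the language and reports it as `detectedSourceLanguage`.

```python
def translate_text(translate_service, text, target_language='zh', source_language=None):
    """Translate one string with a Translation API v2 service object.

    Returns (translated_text, source_language). When source_language is None,
    the API auto-detects it and reports it as 'detectedSourceLanguage'.
    """
    params = {'q': [text], 'target': target_language, 'format': 'text'}
    if source_language:
        params['source'] = source_language
    outputs = translate_service.translations().list(**params).execute()
    translation = outputs['translations'][0]
    return (translation['translatedText'],
            translation.get('detectedSourceLanguage', source_language))


# Offline stub mimicking the shape of service.translations().list(...).execute()
class _StubRequest(object):
    def __init__(self, params):
        self.params = params

    def execute(self):
        return {'translations': [{'translatedText': u'你好',
                                  'detectedSourceLanguage': 'en'}]}


class _StubTranslations(object):
    def list(self, **params):
        return _StubRequest(params)


class StubTranslateService(object):
    def translations(self):
        return _StubTranslations()


print(translate_text(StubTranslateService(), 'Hello'))
# In the notebook: translate_text(service, 'Hello', target_language=parm_translation_target_language)
```

Such a helper could, for example, be called from an itchat handler registered for `TEXT` messages to reply with the translation.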

### Log in automatically by scanning the QR code image with the WeChat app

In [None]:
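itchat.auto_login(hotReload=True) # hotReload=True: keep the login state after the program exits; if it is restarted within a certain time, you do not need to scan the QR code again.
# itchat.auto_login(enableCmdQR=-2) # enableCmdQR=-2: display the QR code in the command line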

In [None]:
@itchat.msg_register([PICTURE])
# @itchat.msg_register([PICTURE], isGroupChat=True)
def download_files(msg):
    # Note: KudosData_LABEL_DETECTION, KudosData_LOGO_DETECTION, KudosData_FACE_DETECTION and
    # KudosData_SAFE_SEARCH_DETECTION are not defined in this notebook; define them before running this cell.
    msg.download(msg.fileName)
    print('\nDownloaded image file name is: %s' % msg['FileName'])
    image_base64 = encode_image(msg['FileName'])
    
    ##############################################################################################################
    #                                          call image analysis APIs                                          #
    ##############################################################################################################
    
    image_analysis_reply = u'[ Image Analysis Results 图像识别结果 ]\n'

    # 1. LABEL_DETECTION:
    image_analysis_reply += KudosData_LABEL_DETECTION(image_base64, 'LABEL_DETECTION', parm_image_maxResults)
    # 2. LANDMARK_DETECTION:
    image_analysis_reply += KudosData_LANDMARK_DETECTION(image_base64, 'LANDMARK_DETECTION', parm_image_maxResults)
    # 3. LOGO_DETECTION:
    image_analysis_reply += KudosData_LOGO_DETECTION(image_base64, 'LOGO_DETECTION', parm_image_maxResults)
    # 4. TEXT_DETECTION (also translates the detected text):
    image_analysis_reply += KudosData_TEXT_DETECTION(image_base64, 'TEXT_DETECTION', parm_image_maxResults)
    # 5. FACE_DETECTION:
    image_analysis_reply += KudosData_FACE_DETECTION(image_base64, 'FACE_DETECTION', parm_image_maxResults)
    # 6. SAFE_SEARCH_DETECTION:
    image_analysis_reply += KudosData_SAFE_SEARCH_DETECTION(image_base64, 'SAFE_SEARCH_DETECTION', parm_image_maxResults)

    print('Completed: Image Analysis API ...')
    
    # itchat sends the returned string back to the sender as a reply
    return image_analysis_reply

In [None]:
itchat.run()

In [None]:
# Interrupt the kernel, then log out
itchat.logout() # log out safely

### Congratulations! You have completed:
### Lesson 3: Natural Language Processing
* Speech synthesis: converting message text to voice
* Speech recognition: converting voice to message text
* Text-based translation of messages between languages

### The next lesson is:
### Lesson 3: Natural Language Processing
* Speech synthesis: converting message text to voice
* Speech recognition: converting voice to message text
* Text-based translation of messages between languages


