Description:
This is a streaming client for the Yandex speech recognition service (aka Yandex ASR).
Compared to the HTTP API, it provides much more information about the recognized text and the recognition process itself.
It also has no limit on the input file length.
Install on Mac OS:
Install Python pip and Python protobuf, for example with MacPorts
(an open-source software package manager; installation instructions are here: https://www.macports.org/install.php):
bash-3.2$ sudo port install git py27-pip py27-protobuf
...
Continue? [Y/n]: Y
...
After installing pip and protobuf, check out the SpeechKit client and build it:
bash-3.2$ git clone https://github.com/yandex/speechkitcloud
...
bash-3.2$ cd speechkitcloud/python
bash-3.2$ protoc -I=asrclient --python_out=asrclient asrclient/*.proto
bash-3.2$ python ./setup.py sdist
...
bash-3.2$ cd dist
bash-3.2$ ls
asrclient-0.5.0.tar.gz
Your file name may differ (for example, for a newer version); pass that file to pip install:
bash-3.2$ sudo pip install asrclient-0.5.0.tar.gz
...
Successfully installed asrclient-0.5.0 futures-3.1.1
If the default Python is the system macOS Python, asrclient-cli.py and ttsclient-cli.py are installed into
/Library/Frameworks/Python.framework/Versions/2.7/bin/
otherwise (if the default Python is the MacPorts one), look for them in
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/
A short help message is available via the --help option:
bash-3.2$ /Library/Frameworks/Python.framework/Versions/2.7/bin/asrclient-cli.py --help
bash-3.2$ /Library/Frameworks/Python.framework/Versions/2.7/bin/ttsclient-cli.py --help
or, for the MacPorts Python installation:
bash-3.2$ /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/asrclient-cli.py --help
bash-3.2$ /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/ttsclient-cli.py --help
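If you are not sure which of these folders applies, you can ask the interpreter itself. The sketch below assumes you run it with the same python that your pip belongs to; scripts_dir is just an illustrative name, and pip options such as --user (or distribution-patched Pythons) can put scripts elsewhere, so treat the result as a hint.

```python
import sys
import sysconfig


def scripts_dir():
    """Folder where this interpreter's installers put console scripts.

    With default settings, pip typically places asrclient-cli.py and
    ttsclient-cli.py here; some distributions patch this path.
    """
    return sysconfig.get_path("scripts")


if __name__ == "__main__":
    print("Interpreter: %s" % sys.executable)
    print("Scripts folder: %s" % scripts_dir())
```

Run it with the same python you used for pip install and look for asrclient-cli.py in the printed folder.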
Xcode TROUBLESHOOTING:
If you get this error after installing MacPorts:
Warning: xcodebuild exists but failed to execute
Warning: Xcode does not appear to be installed; most ports will likely fail to build.
fix it with the following commands:
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
xcodebuild -license
Install on Ubuntu/Debian:
You need to install some Python dependencies. We suggest something like this:
sudo apt-get install python2.7 python-setuptools python-pip git protobuf-compiler
git clone https://github.com/yandex/speechkitcloud
cd speechkitcloud/python
protoc -I=asrclient --python_out=asrclient asrclient/*.proto
python ./setup.py sdist
cd dist
sudo pip install <generated-file-name>
Alternatively, install the dependencies manually and run ./asrclient-cli.py directly, without installing the package.
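Running ./asrclient-cli.py straight from the checkout only works after the protoc step: with -I=asrclient --python_out=asrclient, protoc writes asrclient/foo_pb2.py for every asrclient/foo.proto. A small sanity check along these lines can catch a skipped step; missing_pb2_modules is a hypothetical helper, not part of asrclient.

```python
import glob
import os


def missing_pb2_modules(package_dir):
    """Return the .proto files in package_dir that have no generated *_pb2.py.

    With the protoc command above, each foo.proto yields foo_pb2.py in the
    same folder, so a missing *_pb2.py usually means protoc was not run.
    """
    missing = []
    for proto in sorted(glob.glob(os.path.join(package_dir, "*.proto"))):
        name = os.path.splitext(os.path.basename(proto))[0]
        if not os.path.exists(os.path.join(package_dir, name + "_pb2.py")):
            missing.append(os.path.basename(proto))
    return missing


if __name__ == "__main__":
    # Run from speechkitcloud/python, the folder that contains asrclient/.
    if not os.path.isdir("asrclient"):
        print("asrclient/ not found; run this from speechkitcloud/python")
    else:
        missing = missing_pb2_modules("asrclient")
        if missing:
            print("Not compiled yet: %s" % ", ".join(missing))
        else:
            print("All .proto files have generated *_pb2.py modules.")
```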
1. asrclient-cli.py
Usage:
asrclient-cli.py [OPTIONS] [FILES]...
Options:
-k, --key TEXT You could get it at
https://developer.tech.yandex.ru/. Default
is "paste-your-own-key".
Use "internal" with Speechkit Box.
-s, --server TEXT Default is asr.yandex.net.
-p, --port INTEGER Default is 80.
--format TEXT Input file format. Default is
audio/x-pcm;bit=16;rate=16000.
--model TEXT Recognition model: freeform, maps, general, etc.
Use general if your sound comes from a
phone call. It's just a model name; the sound
format may be different. Default is
freeform.
--lang TEXT Recognition language. ru-RU | en-EN | tr-TR
| uk-UA. Default is ru-RU.
--chunk-size INTEGER Default value 65536 bytes roughly equals two
seconds of audio in default format (32000 bytes per second).
--start-with-chunk INTEGER Use it to send only some part of the input
file. Default is 0.
--max-chunks-count INTEGER Use it to send only some part of the input
file. Default means no limit is set.
--reconnect-delay FLOAT Take a pause in case of network problems.
Default value is 0.5 seconds.
--inter-utt-silence FLOAT Length of the pause between phrases that
triggers phrase finalization. Default value is 1.2 seconds.
--cmn-latency INTEGER CMN latency parameter. Default value is 50.
--reconnect-retry-count INTEGER
Number of sequential reconnects before giving up.
Default is 5.
--silent Don't print debug messages, only recognized
text.
--record Grab audio from system audio input instead
of files.
--nopunctuation Disable punctuation.
--uuid TEXT UUID of your request. It can be helpful for
later log analysis. Default is random.
--ipv4 Use an IPv4-only connection.
--realtime Emulate realtime record recognition.
--callback-module TEXT Name of a Python module that implements
advanced_callback(AddDataResponse). The function
takes the corresponding protobuf message as its
parameter. See advanced_callback_example.py
for details.
--help Show this message and exit.
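The chunk options are easier to pick with a little arithmetic. In the default format (16-bit samples, 16000 Hz, single channel) raw audio takes 16000 × 2 = 32000 bytes per second, so a 65536-byte chunk holds about 2.05 seconds. The helper sent_byte_range below assumes the file is cut into consecutive chunk-size pieces numbered from 0 and ignores any WAV header; it only illustrates --start-with-chunk and --max-chunks-count and is not the client's own code.

```python
def bytes_per_second(rate=16000, bits=16, channels=1):
    """Raw PCM data rate: samples per second * bytes per sample * channels."""
    return rate * (bits // 8) * channels


def chunk_seconds(chunk_size=65536, rate=16000, bits=16, channels=1):
    """Seconds of audio held by one chunk of chunk_size bytes."""
    return chunk_size / float(bytes_per_second(rate, bits, channels))


def sent_byte_range(file_size, chunk_size=65536, start_with_chunk=0,
                    max_chunks_count=None):
    """Byte range [start, end) of the input that would be sent.

    ASSUMPTION: the file is split into consecutive chunk_size-byte chunks
    numbered from 0, and max_chunks_count=None means "no limit".
    """
    start = min(start_with_chunk * chunk_size, file_size)
    if max_chunks_count is None:
        end = file_size
    else:
        end = min(start + max_chunks_count * chunk_size, file_size)
    return start, end


if __name__ == "__main__":
    print("%d bytes per second" % bytes_per_second())   # 32000
    print("%.3f s per default chunk" % chunk_seconds())  # 2.048
    # Skip the first 2 chunks of a 10-chunk file, then send at most 3 chunks.
    print(sent_byte_range(10 * 65536, start_with_chunk=2, max_chunks_count=3))
    # (131072, 327680)
```

For example, to get roughly one-second chunks in the default format you would pass --chunk-size 32000.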
Examples:
asrclient-cli.py --help
asrclient-cli.py --key=active-key-from-your-account sound.wav
asrclient-cli.py --key=active-key-from-your-account --silent sound.wav
asrclient-cli.py --key=active-key-from-your-account --silent --callback-module advanced_callback_example sound.wav
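A minimal module for --callback-module could look like this sketch. Only the function name advanced_callback and its single AddDataResponse argument come from the option description; the module name my_callback and the received list are hypothetical, and the sketch deliberately reads no message fields (see advanced_callback_example.py for those).

```python
# my_callback.py (hypothetical name): use it as --callback-module my_callback
import time

# Every message seen so far, paired with the local time it arrived.
received = []


def advanced_callback(response):
    """Receives an AddDataResponse protobuf message from asrclient-cli.py.

    No particular message fields are assumed: the message is stored and
    printed (str() of a protobuf message gives its text format).
    """
    received.append((time.time(), response))
    print("response #%d:\n%s" % (len(received), response))
```

Make sure the module is importable (for example, by adding its directory to PYTHONPATH), then run asrclient-cli.py --key=active-key-from-your-account --silent --callback-module my_callback sound.wav.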
More:
We expect incoming sound in the specific format audio/x-pcm;bit=16;rate=16000 (single channel).
To convert an arbitrary sound file to this format, we suggest:
sox sound.mp3 -t wav -c 1 --rate 16000 -b 16 -e signed-integer sound.wav
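To check a file before sending it, Python's standard wave module can read the WAV header. This sketch (check_asr_wav is a hypothetical name) only handles uncompressed PCM WAV files, such as the one produced by the sox command above; wave.open raises wave.Error for other encodings.

```python
import wave


def check_asr_wav(path):
    """Return a list of problems; empty if the WAV file is single channel,
    16-bit, 16000 Hz (the default input format described above)."""
    problems = []
    w = wave.open(path, "rb")
    try:
        if w.getnchannels() != 1:
            problems.append("channels=%d, expected 1" % w.getnchannels())
        if w.getsampwidth() != 2:
            problems.append("sample width=%d bytes, expected 2 (16-bit)"
                            % w.getsampwidth())
        if w.getframerate() != 16000:
            problems.append("rate=%d Hz, expected 16000" % w.getframerate())
    finally:
        w.close()
    return problems


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:  # e.g. python check_wav.py sound.wav
        problems = check_asr_wav(path)
        print("%s: %s" % (path, "; ".join(problems) if problems else "OK"))
```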
2. ttsclient-cli.py
Usage: ttsclient-cli.py [OPTIONS] [FILE] [TEXTS]...
Options:
-k, --key TEXT You could get it at https://developer.tech.yandex.ru/.
Default is "paste-your-own-key".
-s, --server TEXT Default is tts.voicetech.yandex.net.
-p, --port INTEGER Default is 80.
--lang TEXT Synthesis language. ru-RU | en-EN | tr-TR | uk-UA.
Default is ru-RU.
--speaker TEXT Speaker for speech synthesis. Call this script with
--list-speakers flag to get speakers list.
--emotion TEXT Emotion for speech synthesis. Available values: good,
neutral, evil. Default value depends on speaker's
original emotion.
--gender TEXT Speaker's gender for speech synthesis. Available
values: male, female. Default value depends on
speaker's original gender.
--textfile FILENAME Read text from this file instead of command line
arguments.
--uuid TEXT UUID of your request. It can be helpful for later
log analysis. Default is random.
--ipv4 Use an IPv4-only connection.
--list-speakers Only list available speakers, don't try to generate
anything.
--silent Don't print debug messages.
--help Show this message and exit.
Examples:
ttsclient-cli.py --help
ttsclient-cli.py --key=active-key-from-your-account --list-speakers
ttsclient-cli.py --key=active-key-from-your-account --speaker jane --lang en-EN out.wav "Hello!"
ttsclient-cli.py --key=active-key-from-your-account --speaker jane --textfile request.txt out.wav
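When synthesizing many phrases from a script, it can help to set --uuid yourself so that each output file can be matched to its request later. The sketch below only builds an argument list from the options documented above; tts_command is a hypothetical helper, it assumes ttsclient-cli.py is on your PATH, and it assumes a 32-character hex string (uuid4().hex) is an acceptable --uuid value.

```python
import uuid


def tts_command(out_path, text, key, speaker="jane", lang="en-EN",
                request_id=None):
    """Argument list for ttsclient-cli.py built from the documented options.

    ASSUMPTIONS: ttsclient-cli.py is on PATH, and a 32-character hex string
    is an acceptable value for --uuid.
    """
    if request_id is None:
        request_id = uuid.uuid4().hex  # keep this to find the request in logs
    return [
        "ttsclient-cli.py",
        "--key=%s" % key,
        "--speaker", speaker,
        "--lang", lang,
        "--uuid", request_id,
        "--silent",
        out_path,  # FILE
        text,      # TEXTS...
    ]


if __name__ == "__main__":
    phrases = ["Hello!", "Goodbye!"]  # hypothetical batch
    for i, phrase in enumerate(phrases):
        cmd = tts_command("out%d.wav" % i, phrase,
                          key="active-key-from-your-account")
        print(" ".join(cmd))
```

Pass each list to subprocess.check_call(cmd) to actually run the client.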
More:
We generate sound in the audio/x-wav format: single channel, 16000 Hz, 16-bit signed integer PCM encoding.
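Because the output is plain PCM WAV, the standard wave module can read it back, for example to check how long the synthesized speech is. wav_duration_seconds is a hypothetical helper: the duration is simply the number of frames divided by the frame rate (16000 per second here).

```python
import wave


def wav_duration_seconds(path):
    """Length of a PCM WAV file in seconds: frame count / frames per second."""
    w = wave.open(path, "rb")
    try:
        return w.getnframes() / float(w.getframerate())
    finally:
        w.close()


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:  # e.g. python wav_duration.py out.wav
        print("%s: %.2f s" % (path, wav_duration_seconds(path)))
```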
Useful links:
http://sox.sourceforge.net/ - sound conversion library and utility.
https://pypi.python.org/pypi/pip - Python package manager.
https://developer.tech.yandex.ru - obtain your key.
https://tech.yandex.ru/speechkit/cloud/ - more about Yandex ASR.