This document explains how to set up the text-based speech editor, as seen in Content-based Tools for Editing Audio Stories.
Download and install Vagrant
Download and install VirtualBox (or possibly another Vagrant host like VMware Fusion).
Clone the speech editor repository:
# (> means a prompt on your computer)
> git clone https://github.com/ucbvislab/speecheditor.git
> cd speecheditor
> git submodule init
> git submodule update
In the /speecheditor directory, spin up the vagrant environment by running
> vagrant up
This will take some time: Vagrant will set up a virtual machine with all the requirements necessary to run the speech editor.
Now, to run the speech editor:
# ssh into your vagrant box
> vagrant ssh
# the /speecheditor directory is mounted on the vagrant box at /vagrant, so cd there
# (the $ means a prompt inside the vagrant box)
$ cd /vagrant
# install python requirements
$ pip install -r requirements.txt
# run the speech editor!
$ python app.py
To access the speech editor, go to http://localhost:8080 in Chrome only.
See the tutorial for how to use the speech editor at http://localhost:8080/static/tutorial/index.html.
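If the page does not load, it can help to check whether anything is answering at that address before debugging the browser. The following is a minimal sketch, not part of the speech editor repository. It assumes Python 3 on your computer and that port 8080 is forwarded from the vagrant box, as the URL above implies; the function name `editor_is_up` is made up for illustration.

```python
import urllib.error
import urllib.request


def editor_is_up(url="http://localhost:8080", timeout=3.0):
    """Return True if an HTTP server answers at `url` with a non-error status.

    Hypothetical helper: it only checks that something responds, not that
    the responder is actually the speech editor.
    """
    # An empty ProxyHandler bypasses any system proxy for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("speech editor reachable:", editor_is_up())
```

A result of `False` most likely means that `python app.py` is not running inside the vagrant box, or that the box itself is halted.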
You are free to edit the source code on your computer; everything will be shared between your main computer and the virtual vagrant box.
If you change the JavaScript or CoffeeScript, run
$ grunt dev
inside the vagrant box to regenerate the JavaScript.
You can exit the ssh session with
$ exit
Once you're done using the speech editor, free up the system resources taken up by the vagrant box by running:
# from the /speecheditor directory on your computer
> vagrant halt
Then run vagrant up, vagrant ssh, cd /vagrant, and python app.py to start it up again.
There is a bit more setup you need to do to add your own speech tracks to the speech editor.
You need to get HTK 3.4. First, register here: http://htk.eng.cam.ac.uk/register.shtml
Once you have a username and password, run this in the vagrant box:
$ sh alignment-setup.sh
This will prompt you for your HTK username and password. It will then download and install HTK 3.4 and p2fa-vislab (a wrapper for HTK's HVite).
Once you have successfully run alignment-setup.sh, you can analyze your own speech tracks.
Add your new speech track mp3 file at /speecheditor/static/speechtracks/{track-name}.mp3. Also add the text transcript of the speech track at /speecheditor/static/speechtracks/{track-name}.txt. Then, in the vagrant box, run
$ cd /vagrant
$ python analyze_speech.py {track-name}
Once this has finished (note: it may take a while if the speech is long), your track will show up in the new composition dialog in the speech editor.
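A misnamed or missing file is an easy mistake in the step above, so it may be worth confirming that both the mp3 and the transcript are in place before starting a long analysis. The sketch below is an illustration only and does not come from the repository. It assumes Python 3 and that it is run from the /speecheditor directory; `missing_track_files` and the track name `my-track` are hypothetical.

```python
import os
import sys


def missing_track_files(track_name, tracks_dir="static/speechtracks"):
    """Return the expected files for `track_name` that are not in `tracks_dir`."""
    expected = [track_name + ".mp3", track_name + ".txt"]
    return [
        os.path.join(tracks_dir, name)
        for name in expected
        if not os.path.isfile(os.path.join(tracks_dir, name))
    ]


if __name__ == "__main__":
    # "my-track" is a placeholder; pass your own track name as the first argument.
    track = sys.argv[1] if len(sys.argv) > 1 else "my-track"
    missing = missing_track_files(track)
    if missing:
        print("missing files:", ", ".join(missing))
    else:
        print("both files found for", track)
```

An empty result means both `{track-name}.mp3` and `{track-name}.txt` were found. It says nothing about whether the transcript matches the audio.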
If you have access to the music browser data files (private access only due to copyrights): put the musicbrowser folder inside of /speecheditor/static. Also put the music_browser_app folder inside of /speecheditor. Instead of using python app.py, run the speech editor with:
$ python app.py --music-browser
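The choice between the two launch commands depends only on whether both music browser folders are present, which can be sketched as a small check. This is a hypothetical helper, not something the repository provides. It assumes Python 3 and the folder locations described above; the name `app_command` is made up.

```python
import os


def app_command(repo_root="."):
    """Build the launch command, adding --music-browser only if both folders exist."""
    has_data = os.path.isdir(os.path.join(repo_root, "static", "musicbrowser"))
    has_app = os.path.isdir(os.path.join(repo_root, "music_browser_app"))
    command = ["python", "app.py"]
    if has_data and has_app:
        command.append("--music-browser")
    return command


if __name__ == "__main__":
    # Prints the command to run from the repository root (/vagrant in the box).
    print(" ".join(app_command()))
```

Requiring both folders is a deliberate choice in this sketch: if only one is present, it falls back to the plain `python app.py` command.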