
issue running examples #2

Closed
yyf opened this issue May 10, 2017 · 29 comments

@yyf

yyf commented May 10, 2017

Hi,
Thanks for the great library. I'm having some issues running the examples:

For pitch_morph_example.py

Traceback (most recent call last):
File "pitch_morph_example.py", line 63, in <module>
praatEXE=praatEXE)
File "build/bdist.macosx-10.12-x86_64/egg/promo/f0_morph.py", line 84, in f0Morph
promo.f0_morph.MissingPitchDataException:

No data points available in a region for morphing.
Two data points are needed in each region to do the morph
Regions with fewer than two samples are skipped, which should be fine for some cases (e.g. unvoiced segments).
If you need more data points, see promo.morph_utils.interpolation

Maybe I missed some steps?

@timmahrt
Owner

There was a bug in one of the dependencies, praatio, which I patched last night. When did you install promo/praatio?

Try doing a fresh reinstall:
pip install promo --upgrade

Or at least reinstall praatio:
https://github.com/timmahrt/praatIO

If you still get the error, let me know. Thanks!

@timmahrt
Owner

For clarification, it was the exact same error (same file and same line):
https://travis-ci.org/timmahrt/ProMo/jobs/230355605

@yyf
Author

yyf commented May 11, 2017

Thanks @timmahrt, upgrading both resolved the issue.

Still trying to get myself familiarized with praat. What steps are needed to create a TextGrid file for an arbitrary wav file, so as to put it in the files folder?

Can you provide an example using praatIO to create textgrids using data from other sources?

Thanks

@timmahrt
Owner

To create an empty textgrid for an arbitrary audio file, the only piece of information you need is the duration of the audio file. The textgrid also must have at least one tier. A tier is either a point tier or an interval tier. If you wanted to mark all of the places where there was audio clipping, you'd use a point tier. If you wanted to mark all of the words in a recording, you'd use an interval tier, for example.

import os
from praatio import tgio
from praatio import audioio

wavFN = "Full/path/to/file/myAudio.wav"
tgFN = os.path.splitext(wavFN)[0] + ".TextGrid"

duration = audioio.WavQueryObj(wavFN).getDuration()
tier = tgio.IntervalTier("words", [], 0, duration) # TierName, List of intervals, tier start time, tier end time

tg = tgio.TextGrid()
tg.addTier(tier)
tg.save(tgFN)

Later on you can access the list of intervals or points using:
tier.entryList

If you do alter it, it's generally best to work on a fresh copy of the list and create a new version of the tier and textgrid like so:
newEntryList = [(start, stop, "-") for start, stop, label in tier.entryList] # Replacing labels with '-'
tg.replaceTier(tier.name, newEntryList)
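
Putting those pieces together, a round trip might look something like this (an untested sketch; tgFN and the "words" tier come from the snippet above):

tg = tgio.openTextGrid(tgFN)   # parse the .TextGrid file
tier = tg.tierDict["words"]    # look a tier up by name

# Replace every label with '-' (assuming the tier now has some labeled intervals)
newEntryList = [(start, stop, "-") for start, stop, label in tier.entryList]
tg.replaceTier(tier.name, newEntryList)
tg.save(tgFN)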

Hmmm... I'm going to write up a formal tutorial for the library.

Let me know if you have any more questions!

@yyf
Author

yyf commented May 11, 2017

Thanks, this is helpful and a formal tutorial will be great too.

Tried to run the pitch morph example using two audio files with their associated TextGrid files, but ran into a KeyError:

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
in <module>()
15 # We'll use textgrids for this purpose.
16 tierName = "PhonAlign"
---> 17 fromPitch = f0_morph.getPitchForIntervals(fromPitch, fromTGFN, tierName)
18 toPitch = f0_morph.getPitchForIntervals(toPitch, toTGFN, tierName)

/usr/local/lib/python2.7/site-packages/promo-1.2.5-py2.7.egg/promo/f0_morph.pyc in getPitchForIntervals(data, tgFN, tierName)
37 '''
38 tg = tgio.openTextGrid(tgFN)
---> 39 data = tg.tierDict[tierName].getValuesInIntervals(data)
40 data = [dataList for _, dataList in data]
41

KeyError: 'PhonAlign'

Do I need to specify PhonAlign as a key when writing a TextGrid file?

@timmahrt
Owner

You are using your own wav and textgrid files?

With morphing, an important idea is that you can choose the regions to morph. At a base level, no textgrid is necessary. You can just morph the pitch contour of one file to that of another (I'll come back to this in a bit).

However, if we want to morph regions, we need to have the same number of regions in the target and source files. In pitch_morph_example.py, I use textgrids for this purpose. You can call your tiers whatever you want. In the example files, the target tier name is "PhonAlign".

For your data, you should use whatever tiers make sense for your data. "Word"? "Utterances"? Etc. "PhonAlign" is not a magic or reserved word. It's just what I picked in the example file.

Let's say in your source and target textgrids, there is a tier called "word". In that case, you should put
tierName = "word"
And the two tiers should have the same number of labeled segments. They can be labeled anything (except empty strings or pure whitespace) and the labeled intervals do not have to have the same duration. They just have to have the same number.

Let's say your tiers have three labeled intervals each. The command f0_morph.getPitchForIntervals() should return a list with three sublists. Each sublist contains f0 data for that segment. The f0 data are the raw pitch values recorded at regular intervals.
fromPitch = f0_morph.getPitchForIntervals(fromPitch, fromTGFN, tierName)

fromPitch --> [[110, 111, 109], [170, 165, 160], [98, 100, 105, 110, 115]]

Ok, so if you just want to morph one utterance to another without bothering with individual segments, you don't even need textgrids. You can just do this:
fromPitch = [fromPitch, ]

f0Morph expects a list of lists. audioToPI returns a list. So if you just want to morph across a whole utterance, the above trick will do what you need. For individual sentences or segments shorter than that, this may work ok. For longer segments, the results will be garbage.
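
For example, a whole-utterance version might look roughly like this (just a sketch; the variable names are illustrative, along the lines of pitch_morph_example.py):

fromPitch = pitch_and_intensity.audioToPI(root, fromWavFN, root, fromPitchFN,
                                          praatEXE, minPitch, maxPitch)
toPitch = pitch_and_intensity.audioToPI(root, toWavFN, root, toPitchFN,
                                        praatEXE, minPitch, maxPitch)

# audioToPI returns a flat list of pitch samples; f0Morph wants a list of
# regions, so wrap each whole-utterance contour in a single-element list
fromPitch = [fromPitch, ]
toPitch = [toPitch, ]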

@timmahrt
Owner

To be short and explicit: The error you received is saying that there is no tier in your textgrid called 'PhonAlign'. You should change tierName to match one of the interval tiers in your textgrid. That tier must have at least one labeled interval. The number of intervals in that tier must match between the two textgrids.
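
If you're not sure which tiers your textgrid contains, something like this should print them (a quick sketch; the path is a placeholder):

from praatio import tgio

tg = tgio.openTextGrid("Full/path/to/file/myTextGrid.TextGrid")
print(tg.tierDict.keys())  # the valid values for tierName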

@timmahrt
Owner

timmahrt commented May 12, 2017

Progress is coming along on the tutorial for praatio. I hope it will be useful (to the community at large). I'll post here once I upload something.

@yyf
Author

yyf commented May 15, 2017

Thanks for the detailed explanation, and yes, I'm trying to use my own wav files and their associated textgrid files. Still trying to figure out how to set tierName properly. How should I examine the interval tiers in my textgrids? I assume I just open the textgrid file in the praat app? Do I have to manually label the intervals, or are they supposed to be there already in the file generated by pitch_morph_example.py?

Would be helpful to have a simple morphing example and more advanced ones, where the simple one is just morphing one to another without textgrid/segment, fromPitch = [fromPitch, ]

@timmahrt
Owner

timmahrt commented May 15, 2017

Still trying to figure out how to set tierName properly.

I'm not sure I understand. Can you explain the nature of your data? You have a collection of audio files, I'm assuming. Do you have existing transcripts? Are the transcripts in .TextGrid format?

What is the task that you would like to do? For example, along the lines of:
"I have some sentence-long recordings and I would like to morph the pitch between different files, at the word level, but I haven't transcribed my data yet".

Do I have to manually label the interval or it's supposed to be there already in the file generated by pitch_morph_example.py?

Yes, unfortunately the intervals will have to be created by some other system.

You can always manually create the intervals in praat. If you have never used praat to annotate audio files, this is a good tutorial that covers the basics:
https://youtu.be/64cclyKVJZ4?t=100

I recommend opening up the examples provided in promo using praat. And then open up your own audio files in praat. I think it might make it clearer what the textgrid is.

If you have a lot of data, this might not be practical or possible. There are ways to automatically annotate your data. Depending on your data and the task you want to do, this could be easy or it could be difficult. For example, if you have clean recordings of English sentences where the speaker was reading out sentences from a script, you can use a forced aligner like SPPAS or EasyAlign (a plugin for praat), which will automatically transcribe your data with high accuracy for free.
http://www.sppas.org/
http://latlcui.unige.ch/phonetique/easyalign.php

Would be helpful to have a simple morphing example and more advanced ones, where the simple one is just morphing one to another without textgrid/segment, fromPitch = [fromPitch, ]

Here is an example that does not use textgrids. This will be added to a promo tutorial (which I'll work on after I finish the praatio tutorial):
https://www.dropbox.com/s/9bext4torjziexc/morph_examples_no_textgrids.py?dl=0

@yyf
Author

yyf commented May 15, 2017

I noticed the textgrid file for my own wav file doesn't have the info that's in your mary1.TextGrid file. What's the process to properly generate a textgrid file? In the standalone praat app first?

@timmahrt
Owner

If you want to create a TextGrid file manually in praat, this video shows how
https://youtu.be/64cclyKVJZ4?t=100

Earlier I gave an example of how to programmatically generate TextGrid files from audio. Did you have problems running this code or did you have questions about it?

import os
from praatio import tgio
from praatio import audioio

wavFN = "Full/path/to/file/myAudio.wav"
tgFN = os.path.splitext(wavFN)[0] + ".TextGrid"

duration = audioio.WavQueryObj(wavFN).getDuration()
tier = tgio.IntervalTier("words", [], 0, duration) # TierName, List of intervals, tier start time, tier end time

tg = tgio.TextGrid()
tg.addTier(tier)
tg.save(tgFN)

@yyf
Author

yyf commented May 15, 2017

Sorry for the confusion. I was able to programmatically generate the textgrid file. Diving into the video tutorials now and will see if I can get the example running using my own wav files.

A flow chart that illustrates how to use ProMo with other systems, e.g. annotation in praat, could be helpful, too.

BTW, the example without using textgrid works. Thanks.

@timmahrt
Owner

How goes transcribing your textgrids and using praat?

I've released a new version of praatio and ProMo. I updated lots of documentation and tried to streamline the interface. It's hopefully easier to use now.

pip install praatio --upgrade
pip install promo --upgrade

I've finished the first praatio tutorial:
https://nbviewer.jupyter.org/github/timmahrt/praatIO/blob/master/tutorials/tutorial1_intro_to_praatio.ipynb
or find it in the /tutorials/ folder of praatio:
https://github.com/timmahrt/praatIO

If you go through it, I'd appreciate any feedback you have.

I'll need to step away from this for a while. Maybe I can work on the promo tutorial over the weekend.

@yyf
Author

yyf commented May 17, 2017

Still a work in progress, but I went through the praatIO tutorial. It's super informative, thanks for writing it up. Will be interesting to see some tutorials on ProMo too. In ProMo, is there any fundamental limitation on speech resynthesis in terms of perceptual quality?

@timmahrt
Owner

timmahrt commented May 17, 2017

There are roughly three limitations (that I can think of at the moment).

  1. The more you manipulate the pitch contour, the more distorted the signal becomes. If you take a word that has a sharp fall and you turn it into a sharp rise, you can expect distortion. It may or may not affect what you are trying to do.

  2. There can be correlations between segmental and prosodic phenomena. For example, in English there is a phenomenon called focus, which is used, among other things, to introduce new information ("Who ate the cheese? [Tom] ate the cheese." 'Tom' is focused; 'cheese' would be given or 'unfocused'). Words with focus receive a pitch accent and greater articulation than words without focus. In the example I gave, 'Tom' will be produced with greater articulation and 'cheese' with less.

Let's say you reversed the contour. You map the pitch of "Tom ate the [cheese]" onto "[Tom] ate the cheese". It might sound ok. Or it might not sound ok because the pitch contour mismatches with the focus information in the consonants and vowels.

If you can carefully control how the sentences are produced, it's possible to get around this issue. And it might not be a problem at all, but it has been a problem before in my data.

  3. Are you familiar with voicing?
    https://en.wikipedia.org/wiki/Voice_(phonetics)

Pitch is conveyed through F0, which only exists for voiced segments. Vowels are voiced but many consonants are not voiced. If your pitch manipulations are fine grained and you have lots of voiceless consonants in your utterances, there may be no audible difference in the resynthesized recordings.

@yyf
Author

yyf commented May 25, 2017

Thanks @timmahrt

  1. Wondering if there are any quantitative or statistical metrics for these limitations, for example, the maximum pitch change for a given duration while maintaining the identity of the voice?
  2. To articulate focus, curious what some common control parameters are other than intensity, pitch, and duration, in general and in Praat? This might be a bit off the original topic, if you don't mind.
  3. In terms of the degree of voicing, what might be the closest to 'voice onset time' in Praat?

I guess there is not an automatic way to separate voiced and voiceless segments yet. This will still have to be done manually at the stage of annotation in TextGrid, correct? Really appreciate your detailed explanation and tutorials.

@timmahrt
Owner

  1. Identity is rather subjective and includes voice qualities other than just pitch. You can make someone's voice sound deeper or higher, but how much manipulation is necessary to 'trick' someone into thinking the speaker is someone else depends on the speaker and the particular listener, I guess.

I will cover this point a bit in my ProMo tutorial, with some examples.

If you're trying to change the speaker's identity, you might have fun playing with the changeGender function in praat. Select an audio file in praat. Then press Convert >> Change Gender.

or in praatio:

from praatio import praat_scripts
praat_scripts.changeGender()

  2. If you wanted to control for non-prosodic aspects of focus, you would have to use splicing. Splicing involves inserting a speech segment into a place where it wasn't said. So we could produce the utterances 'Bob' and 'Father'. Then, using splicing, we could create new words like 'Fob' or 'Bother'. If you've transcribed each phone, the splicing process can be done automatically using praatio. Splicing works best if the speaker is the same in both recording samples and if the replaced material was said in the same context as the new material--thanks to coarticulation effects.

Splicing is a general-use technique. If you are working on very specific sounds, you might be able to apply a sound-specific solution. For example, there has been a lot of work done on the manipulation of voice onset time.

Just this week I was working with a focused production of the word 'him' that I needed unfocused. Manipulating pitch was not enough, but I found that removing about half of the 'h' sound led to a more natural unfocused production of 'him' (I didn't even need to worry about the 'i' or 'm'). I determined this ahead of time by getting the duration of 'h' when 'him' is focused and when it is unfocused and found them to be very different (~0.1 seconds long compared to ~0.03 seconds long in my small lab-produced dataset). Unvoiced fricatives can generally be chopped up without much care because they're just noise. More care is needed with other sounds.

  3. Voice onset time is the time when voicing begins after a burst. For voiced stops like 'b' it can be negative (voicing begins before the burst--sometimes in American English 'bye' will be emphasized by starting the voicing earlier; the end result comes out like 'mbye'). Otherwise, VOT will be positive but very small; for unvoiced stops like 'p' it will be a larger, positive value.

What are you trying to do with VOT? I have a colleague who works on manipulating VOT if you have questions. From her I get the impression that it is not easy to get good-quality results.

I haven't used it before, but there is a tool for automatically measuring VOT:
https://github.com/mlml/autovot

I guess there is not an automatic way to separate voiced and voiceless segments yet.

This is actually trivial to do in praat. Select an audio file in praat and click "view". In the window that pops up, select the far right option "Pulses >> Show pulses". These pulses are "glottal pulses"--each is one movement of the vocal folds.

What do you want to do with that information?

This will still have to be done manually at the stage of annotation in TextGrid, correct?

What are you trying to annotate? If you want to annotate sub-phonemic information (like VOT), then yes, you'll likely have to do that by hand. If you want to annotate the words in an utterance, you will have to do that manually, unless you want to see if some speech recognition tools work for you. If you want to annotate the phones in an utterance, I recommend you use a forced aligner like SPPAS or EasyAlign (I linked to these earlier).

Really appreciate your detailed explanation and tutorials.

My pleasure! I've put quite a bit of work into the ProMo tutorial but it still needs some work. Maybe I can finish it this weekend.

@timmahrt
Owner

I've got the pitch manipulation tutorial online. Two parts are up with another two planned.
https://nbviewer.jupyter.org/github/timmahrt/ProMo/blob/master/tutorials/tutorial1_1_intro_to_promo.ipynb

If you have a chance to go through it and have any feedback, please let me know!

@yyf
Author

yyf commented Jun 29, 2017

Thanks for the tutorials, they are helpful.

Ran into this issue, AttributeError: 'module' object has no attribute 'audioToPI', in a virtual environment (flask) when I called the following:

fromPitch = pitch_and_intensity.audioToPI(root, fromWavFN, root, fromPitchFN, praatEXE, minPitch, maxPitch, forceRegenerate=False)

I double-checked that it works fine in my iPython environment and regular python environment. Any idea why the submodule is not there under a virtual environment?

Thanks.

@timmahrt
Owner

Glad to hear the tutorials are helpful. I'm pretty busy at the moment so it will likely be a few months before I can add more, but I do have plans for them eventually.

I believe you just need to update the praatio library in your virtual environment and change the instances of audioToPI() to extractPI().

If that doesn't work, let me know.
Thanks!

@yyf
Author

yyf commented Jul 5, 2017

Got a different error after changing it to extractPI(), any suggestions?
Maybe it's related to my use of os.path.abspath in a virtualenv?

File "/Users/...app.py", line 121, in up
    fromPitch = pitch_and_intensity.extractPI(root, fromWavFN, root, fromPitchFN, praatEXE, minPitch, maxPitch, forceRegenerate=False)
  
File "/Users/...venv/lib/python2.7/site-packages/praatio/pitch_and_intensity.py", line 284, in extractPI
    pitchQuadInterp=pitchQuadInterp)
  
File "/Users/...venv/lib/python2.7/site-packages/praatio/pitch_and_intensity.py", line 97, in _extractPIFile
    utils.makeDir(outputPath)
  
File "/Users/...venv/lib/python2.7/site-packages/praatio/utilities/utils.py", line 146, in makeDir
    os.mkdir(path)

OSError: [Errno 2] No such file or directory: ''

@timmahrt
Owner

timmahrt commented Jul 6, 2017

Sorry for the problems. The function extractPI() takes different arguments than the old audioToPI(). Here is the new function's argument list:

extractPI(inputFN, outputFN, praatEXE,
minPitch, maxPitch, sampleStep=0.01,
silenceThreshold=0.03, forceRegenerate=True,
tgFN=None, tierName=None, tmpOutputPath=None,
undefinedValue=None, medianFilterWindowSize=0,
pitchQuadInterp=False)

The older function took the file path and the file name as separate arguments, while all of my other functions take the full path to a file as a single argument. I made this change so that this function is more consistent with the rest of my code.
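
So, roughly speaking, a call would now look something like this (a sketch only; the folder and file names are made up):

from os.path import join
from praatio import pitch_and_intensity

root = "/Users/me/myFolder"  # use your own absolute path
praatEXE = "/Applications/Praat.app/Contents/MacOS/Praat"

# 50 and 350 are the min and max pitch values; adjust them for your speakers
fromPitch = pitch_and_intensity.extractPI(join(root, "from.wav"),
                                          join(root, "from.txt"),
                                          praatEXE, 50, 350,
                                          forceRegenerate=False)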

My last message stated that no further changes would be needed but that wasn't true. Sorry for the error.

@yyf
Author

yyf commented Jul 6, 2017

Thanks, it solved the argument issue, but led to a PraatExecutionFailed error in the same virtualenv. I'm still checking if all my paths/arguments are correct.

Traceback (most recent call last):
  ...
  File "...//app.py", line 143, in up
    pitchQuadInterp=False)
  File "...//venv/lib/python2.7/site-packages/praatio/pitch_and_intensity.py", line 284, in extractPI
    pitchQuadInterp=pitchQuadInterp)
  File "...//venv/lib/python2.7/site-packages/praatio/pitch_and_intensity.py", line 120, in _extractPIFile
    utils.runPraatScript(praatEXE, scriptFN, argList)
  File "...//venv/lib/python2.7/site-packages/praatio/utilities/utils.py", line 208, in runPraatScript
    raise PraatExecutionFailed(cmdList)
PraatExecutionFailed: 
Praat Execution Failed.  Please check the following:
- Praat exists in the location specified
- Praat script can execute ok outside of praat
- script arguments are correct

If you can't locate the problem, I recommend using absolute paths rather than relative paths and using paths without spaces in any folder or file names

Here is the command that python attempted to run:
/Applications/Praat.app/Contents/MacOS/Praat --run ...//venv/lib/python2.7/site-packages/praatio/praatScripts/get_pitch_and_intensity.praat .../myFolder/17-07-05_21-29-08.wav .../myFolder/17-07-05_21-29-08.txt 0.01 50 350 0.03 -1 -1 0 0

@timmahrt
Owner

timmahrt commented Jul 6, 2017 via email

@yyf
Author

yyf commented Jul 6, 2017

It was indeed a full path issue. extractPI() is working in my virtualenv : ]

Exploring f0Morph() now. What's the recommended range of file-length difference (in milliseconds or samples) for f0Morph() to work nicely, i.e. the duration difference between one wav file and the other? Is there a need to preprocess the files so they roughly align within a certain percentage in terms of silence and voiced sections?

Thanks

@timmahrt
Owner

timmahrt commented Jul 6, 2017 via email

@timmahrt
Owner

If it's more convenient, I've set up a gitter page which has public and private messaging:
https://gitter.im/pythonProMo/Lobby

Also, I don't think I answered your question:
"Is there need to preprocess the files so they roughly align within certain percentage in terms of silence and voiced sections?"

Absolutely not. However, silences and unvoiced sections do pose a problem: the pitch tracker will have no data for those areas. To get around this issue, the function praatio.pitch_and_intensity.extractPitch() has an optional argument 'pitchQuadInterp'. If it is true, the pitch contour will be interpolated.

This is good for very short silences and for unvoiced regions. It probably is not appropriate for long silences, for example, if someone is reading sentences and pauses after each one. In cases like that, you would need to preprocess the speech into chunks.
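
For example (a sketch along the lines of the earlier snippets; extractPI() exposes the same flag, per the argument list I posted earlier):

pitchData = pitch_and_intensity.extractPI(join(root, "from.wav"),
                                          join(root, "from.txt"),
                                          praatEXE, minPitch, maxPitch,
                                          pitchQuadInterp=True)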

@yyf
Author

yyf commented Jul 29, 2017

Thanks for setting gitter up, I was thinking about the same.

Also, I'm working primarily with English in normal wav files. Gonna try out the interp option.

Closing the issue now.

@yyf yyf closed this as completed Jul 29, 2017