issue running examples #2
There was a bug in one of the dependencies, praatio, which I patched last night. When did you install promo/praatio? Try doing a fresh reinstall, or at least reinstall praatio. If you still get the error, let me know. Thanks!
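The exact reinstall commands were not preserved above; presumably something along these lines (only the praatio upgrade line appears verbatim later in this thread, so treat the rest as an assumption):

```
pip install promo --upgrade
pip install praatio --upgrade
```
|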
For clarification, it was the exact same error (same file and same line): |
Thanks @timmahrt, upgrading both resolved the issue. Still trying to get familiar with praat; wondering what steps are needed to create a TextGrid file for an arbitrary wav file, so as to put it in the files folder. Can you provide an example of using praatIO to create TextGrids from data in other sources? Thanks |
To create an empty textgrid for an arbitrary audio file, the only piece of information you need is the duration of the audio file. The textgrid also must have at least one tier. A tier is either a point tier or an interval tier. If you wanted to mark all of the places where there was audio clipping, you'd use a point tier; if you wanted to mark all of the words in a recording, you'd use an interval tier, for example.

```python
import os
from praatio import audioio, tgio  # imports implied by the snippet below

wavFN = "Full/path/to/file/myAudio.wav"
duration = audioio.WavQueryObj(wavFN).getDuration()
tg = tgio.TextGrid()
```

Later on you can access the list of intervals or points in a tier. If you do alter it, it's generally best to work on a fresh copy of the list and create a new version of the tier and textgrid (sketched below).

Hmmm... I'm going to write up a formal tutorial for the library. Let me know if you have any more questions!
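The comment stops before adding a tier and saving the file. Below is only a hedged continuation of the snippet above: the tier name "word" and the output path are hypothetical, and the class and method names (IntervalTier, addTier, save, tierDict, entryList) are from memory of the praatio API of this era, so check them against your installed version:

```python
# Continuing from the snippet above (wavFN, duration, and tg already defined).
outputTgFN = "Full/path/to/file/myAudio.TextGrid"  # hypothetical output path

# A textgrid must have at least one tier; here, an empty interval tier.
wordTier = tgio.IntervalTier("word", [], 0, duration)
tg.addTier(wordTier)
tg.save(outputTgFN)

# Later on, the list of intervals (or points, for a point tier) can be read with:
entryList = tg.tierDict["word"].entryList

# If you alter it, work on a fresh copy of the list and build a new tier and
# textgrid rather than editing in place:
newEntryList = list(entryList)
newTier = tgio.IntervalTier("word", newEntryList, 0, duration)
newTg = tgio.TextGrid()  # spelled as in the comment above; may be Textgrid() in some versions
newTg.addTier(newTier)
newTg.save(outputTgFN)
```
|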
Thanks, this is helpful, and a formal tutorial will be great too. Tried to run the pitch morph example using two audio files with their associated TextGrid files, but ran into a KeyError: `/usr/local/lib/python2.7/site-packages/promo-1.2.5-py2.7.egg/promo/f0_morph.pyc in getPitchForIntervals(data, tgFN, tierName) KeyError: 'PhonAlign'` Do I need to specify PhonAlign as a key when writing a TextGrid file? |
You are using your own wav and textgrid files? With morphing, an important idea is that you can choose the regions to morph. At a base level, no textgrid is necessary. You can just morph the pitch contour of one file to that of another (I'll come back to this in a bit). However, if we want to morph regions, we need to have the same number of regions in the target and source files. In pitch_morph_example.py, I use textgrids for this purpose.

You can call your tiers whatever you want. In the example files, the target tier name is "PhonAlign". For your data, you should use whatever tier names make sense for your data. "Word"? "Utterances"? Etc. "PhonAlign" is not a magic or reserved word; it's just what I picked in the example file. Let's say in your source and target textgrids there is a tier called "word". In that case, you should set tierName to "word".

Let's say your tiers have three labeled intervals each. The command f0_morph.getPitchForIntervals() should then return a list with three sublists. Each sublist contains the f0 data for that segment. The f0 data are the raw pitch values recorded at regular intervals.
Ok, so if you just want to morph one utterance to another without bothering with individual segments, you don't even need textgrids. f0Morph expects a list of lists, while audioToPI returns a flat list, so you just wrap that list in another list (see the snippet below). If you only want to morph across a whole utterance, that trick will do what you need. For individual sentences or segments shorter than that, this may work ok. For longer segments, the results will be garbage.
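The trick itself, as a small self-contained snippet; the wrapping line is quoted later in this thread, while the dummy (time, f0) values only stand in for whatever audioToPI actually returns for your files:

```python
# f0Morph expects a list of regions (a list of lists of pitch samples); with
# no textgrid, treat the whole utterance as a single region by wrapping each
# flat pitch list. Dummy values stand in for real extracted pitch data here.
fromPitch = [(0.00, 110.0), (0.01, 112.5), (0.02, 115.0)]
toPitch = [(0.00, 210.0), (0.01, 208.0), (0.02, 205.0)]

fromPitch = [fromPitch, ]
toPitch = [toPitch, ]
```
|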
To be short and explicit: the error you received means there is no tier in your textgrid called 'PhonAlign'. You should change tierName to match one of the interval tiers in your textgrid. That tier must have at least one labeled interval, and the number of intervals in that tier must match between the two textgrids.
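A quick, hypothetical way to check those things from Python; openTextgrid, tierNameList, tierDict, and entryList are from memory of the praatio API and may differ in your installed version (you can also just open the textgrid in praat):

```python
from praatio import tgio

tgFN = "Full/path/to/my.TextGrid"  # hypothetical path
tg = tgio.openTextgrid(tgFN)

for tierName in tg.tierNameList:
    tier = tg.tierDict[tierName]
    # entries are (start, stop, label) for interval tiers, (time, label) for point tiers
    labeledEntries = [entry for entry in tier.entryList if entry[-1].strip() != ""]
    print(tierName, len(labeledEntries))
```
|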
Progress is coming along on the tutorial for praatio. I hope it will be useful (to the community at large). I'll post here once I upload something. |
Thanks for the detailed explanation, and yes, I am trying to use my own wav files and their associated textgrid files. Still trying to figure out how to set tierName properly. How should I examine the interval tiers in my textgrids? I assume I just open the textgrid file in the praat app? Do I have to manually label the intervals, or are they supposed to be there already in the file generated by pitch_morph_example.py? It would be helpful to have a simple morphing example and more advanced ones, where the simple one just morphs one file to another without a textgrid/segments, i.e. fromPitch = [fromPitch, ] |
What is the task that you would like to do? For example, along the lines of:
Yes, unfortunately the intervals will have to be created by some other system. You can always manually create the intervals in praat. If you have never used praat to annotate audio files, this is a good tutorial that covers the basics:

I recommend opening up the examples provided in promo using praat, and then opening up your own audio files in praat. I think it might make it clearer what the textgrid is.

If you have a lot of data, this might not be practical or possible. There are ways to automatically annotate your data. Depending on your data and the task you want to do, this could be easy or it could be difficult. For example, if you have clean recordings of English sentences where the speaker was reading out sentences from a script, you can use a forced aligner like SPPAS or EasyAlign (a plugin for praat), which will automatically annotate your data with high accuracy for free.
Here is an example that does not use textgrids. This will be added to a promo tutorial (which I'll work on after I finish the praatio tutorial):
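The example itself isn't preserved in this thread; below is only a hedged reconstruction of its shape from the surrounding comments. It uses the newer extractPI() name introduced later in the thread, the minPitch/maxPitch argument names and the file paths are assumptions, and the final f0Morph() call is left abbreviated because its keyword arguments aren't spelled out here (see pitch_morph_example.py for the real call):

```python
from praatio import pitch_and_intensity
from promo import f0_morph

praatEXE = "/Applications/Praat.app/Contents/MacOS/Praat"

# Extract a flat list of pitch samples for each recording (argument names
# beyond the first three are assumptions).
fromPitch = pitch_and_intensity.extractPI("from.wav", "from_pitch.txt", praatEXE,
                                          minPitch=50, maxPitch=350)
toPitch = pitch_and_intensity.extractPI("to.wav", "to_pitch.txt", praatEXE,
                                        minPitch=50, maxPitch=350)

# f0Morph works over regions (a list of lists); with no textgrid, wrap each
# flat list so the whole utterance is treated as a single region.
fromPitch = [fromPitch, ]
toPitch = [toPitch, ]

# The morph itself is done with f0_morph.f0Morph(...), passing fromPitch and
# toPitch; see pitch_morph_example.py for the full argument list.
```
|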
Noticed the textgrid file for my own wav file doesn't have the same info as your mary1.TextGrid file. Wondering what the process is to properly generate a textgrid file? In the standalone praat app first? |
If you want to create a TextGrid file manually in praat, this video shows how. Earlier I gave an example of how to programmatically generate TextGrid files from audio. Did you have problems running this code, or did you have questions about it?

```python
import os
from praatio import audioio, tgio

wavFN = "Full/path/to/file/myAudio.wav"
duration = audioio.WavQueryObj(wavFN).getDuration()
tg = tgio.TextGrid()
```
|
Sorry for the confusion. I was able to programmatically generate the textgrid file. Diving into the video tutorials now and will see if I can get the example running using my own wav files. A flow chart that illustrates how to use ProMo with other systems, i.e. annotation in praat, could be helpful too. BTW, the example without using a textgrid works. Thanks. |
How goes transcribing your textgrids and using praat?

I've released a new version of praatio and ProMo. I updated lots of documentation and tried to streamline the interface. It's hopefully easier to use now. To upgrade:

pip install praatio --upgrade

I've finished the first praatio tutorial: If you go through it, I'd appreciate any feedback you have. I'll need to step away from this for a while. Maybe I can work on the promo tutorial over the weekend. |
Still a work in progress, but I went through the praatio tutorial. It's super informative, thanks for writing it up. It will be interesting to see a tutorial on ProMo too. In ProMo, are there any fundamental limitations on speech resynthesis in terms of perceptual quality? |
There are roughly three limitations (that I can think of at the moment).
Let's say you reversed the contour. You map the pitch of "Tom ate the [cheese]" onto "[Tom] ate the cheese". It might sound ok. Or it might not sound ok because the pitch contour mismatches with the focus information in the consonants and vowels. If you can carefully control how the sentences are produced, it's possible to get around this issue. And it might not be a problem at all, but it has been a problem before in my data.
Pitch is conveyed through F0, which only exists for voiced segments. Vowels are voiced but many consonants are not voiced. If your pitch manipulations are fine grained and you have lots of voiceless consonants in your utterances, there may be no audible difference in the resynthesized recordings. |
Thanks @timmahrt
I guess there is no automatic way to separate voiced and voiceless segments yet. This will still have to be done manually at the annotation stage in the TextGrid, correct? Really appreciate your detailed explanations and tutorials. |
I will cover this point a bit in my ProMo tutorial, with some examples. If you're trying to change the speaker's identity, you might have fun playing with the changeGender function in praat. Select an audio file in praat, then press Convert >> Change Gender. Or do it through praatio's praat_scripts module:
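A hypothetical outline only: praat_scripts is the module named above, but the wrapper's exact name and argument order aren't given in this thread, so the call is sketched in a comment using the parameters of Praat's own Change gender... command (pitch floor, pitch ceiling, formant shift ratio, new pitch median, pitch range factor, duration factor):

```python
from praatio import praat_scripts

praatEXE = "/Applications/Praat.app/Contents/MacOS/Praat"

# Hypothetical call; check praat_scripts for the actual function name and signature.
# praat_scripts.changeGender(praatEXE, "input.wav", "output.wav",
#                            pitchFloor=75, pitchCeiling=600,
#                            formantShiftRatio=1.2, newPitchMedian=0,
#                            pitchRangeFactor=1.0, durationFactor=1.0)
```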
Splicing is a general-purpose technique. If you are working on very specific sounds, you might be able to apply a sound-specific solution. For example, there has been a lot of work done on the manipulation of voice onset time. Just this week I was working with a focused production of the word 'him' that I needed unfocused. Manipulating pitch was not enough, but I found that removing about half of the 'h' sound led to a more natural unfocused production of 'him' (I didn't even need to worry about the 'i' or 'm'). I determined this ahead of time by getting the duration of 'h' when 'him' is focused and when it is unfocused and finding them to be very different (~0.1 seconds compared to ~0.03 seconds in my small lab-produced dataset). Unvoiced fricatives can generally be chopped up without much care because they're just noise. More care is needed with other sounds.
What are you trying to do with VOT? I have a colleague who works on manipulating VOT if you have questions. From her I get the impression that it is not easy to get good quality results. I haven't used it before, but there is a tool for automatically measuring VOT:
This is actually trivial to do in praat. Select an audio file in praat and click "view". In the window that pops up, select the far right option "Pulses >> Show pulses". These pulses are "glottal pulses"--each is one movement of the vocal folds. What do you want to do with that information?
What are you trying to annotate? If you want to annotate sub-phonemic information (like VOT), then yes, you'll likely have to do that by hand. If you want to annotate the words in an utterance, you will have to do that manually, unless you want to see if some speech recognition tools work for you. If you want to annotate the phones in an utterance, I recommend you use a forced aligner like SPPAS or EasyAlign (I linked to these earlier).
My pleasure! I've put quite a bit of work into the ProMo tutorial but it still needs some work. Maybe I can finish it this weekend. |
I've got the pitch manipulation tutorial online. Two parts are up with another two planned. If you have a chance to go through it and have any feedback, please let me know! |
Thanks for the tutorials, they are helpful. Ran into this issue
Double checked it works fine in my iPython environment and regular python environment. Any idea why the submodule is not there under a virtual environment? Thanks. |
Glad to hear the tutorials are helpful. I'm pretty busy at the moment, so it will likely be a few months before I can add more, but I do have plans for them eventually. I believe you just need to update the praatio library in your virtual environment and change the instances of audioToPI() to extractPI(). If that doesn't work, let me know. |
Got a different error after changing it to extractPI(), any suggestion?
|
Sorry for the problems. The function extractPI() takes different arguments than the old audioToPI(). Here is the start of the new function argument list: `extractPI(inputFN, outputFN, praatEXE, ...)`

The older function used to take the file path and the file name as separate arguments, while all of my other functions took the full path to a file as an argument. I made this change so that this function is more consistent with the rest of my code. My last message stated that no further changes would be needed, but that wasn't true. Sorry for the error. |
Thanks, it solved the argument issue, but led to Praat execution failed in the same virtualenv. I'm still checking if all my paths/arguments are correct.

Traceback (most recent call last):
...
File "...//app.py", line 143, in up
pitchQuadInterp=False)
File "...//venv/lib/python2.7/site-packages/praatio/pitch_and_intensity.py", line 284, in extractPI
pitchQuadInterp=pitchQuadInterp)
File "...//venv/lib/python2.7/site-packages/praatio/pitch_and_intensity.py", line 120, in _extractPIFile
utils.runPraatScript(praatEXE, scriptFN, argList)
File "...//venv/lib/python2.7/site-packages/praatio/utilities/utils.py", line 208, in runPraatScript
raise PraatExecutionFailed(cmdList)
PraatExecutionFailed:
Praat Execution Failed. Please check the following:
- Praat exists in the location specified
- Praat script can execute ok outside of praat
- script arguments are correct
If you can't locate the problem, I recommend using absolute paths rather than relative paths and using paths without spaces in any folder or file names
Here is the command that python attempted to run:
/Applications/Praat.app/Contents/MacOS/Praat --run ...//venv/lib/python2.7/site-packages/praatio/praatScripts/get_pitch_and_intensity.praat .../myFolder/17-07-05_21-29-08.wav .../myFolder/17-07-05_21-29-08.txt 0.01 50 350 0.03 -1 -1 0 0
|
Are you using full paths? That output looks strange. You should be able to copy and paste the output (the bit under "here is the command that python tried to run") into a command window and have it run independently of python. If it can run ok in the command window, then python should be able to run it too. And if it can't run ok in the command window, then python won't be able to run it either.

Does that help?

Tim
|
It was indeed a full path issue. extractPI() is working in my virtualenv :] Exploring F0morph() now. Wondering what the recommended range of file length difference (in milliseconds or samples) is for F0morph() to work nicely, i.e. the duration difference between one wav file and the other. Is there a need to preprocess the files so they roughly align within a certain percentage in terms of silence and voiced sections? Thanks |
F0morph() does not require the two files to be the same length. F0morph() uses proportional time for the target pitch contours (it will map the start of contour A to the start of contour B and the end of contour A to the end of contour B, regardless of what times the starts and ends occur at).

The answer to your question depends on A) the language and B) the kinds of recordings you are morphing. Japanese, French, English, and Chinese all use word-level intonation very differently.

If you are working with recordings that are very similar, you might not need to change anything, even if the durations are different. e.g.

John kicked the ball to Mike

and

Bob lobbed the can at Todd.

You can probably morph between those with no problem. But for a structurally different sentence like

Tom praised Fred for winning

or even worse

For winning, Tom praised Fred

the output won't make sense.

In Chinese, words are differentiated by the shape of the f0 contour that falls over the word. It probably doesn't make sense to map the pitch between different sentences.

What language are you trying to work with and what kind of data do you have?
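Not promo code, just a toy illustration of the proportional-time idea described above: each contour is re-indexed by how far through its own utterance a sample falls, so contours of different durations can still be paired start-to-start and end-to-end.

```python
# Toy illustration (not promo's implementation) of proportional time.
def toProportionalTime(contour):
    """contour: list of (time, f0) samples; returns (fractionOfUtterance, f0)."""
    start, end = contour[0][0], contour[-1][0]
    return [((t - start) / (end - start), f0) for t, f0 in contour]

contourA = [(0.0, 120), (0.5, 140), (1.0, 110)]  # a 1.0 s utterance
contourB = [(0.0, 200), (1.0, 230), (2.0, 190)]  # a 2.0 s utterance

# After normalization, the start of A lines up with the start of B and the end
# of A with the end of B, regardless of their absolute durations.
print(toProportionalTime(contourA))
print(toProportionalTime(contourB))
```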
|
If it's more convenient, I've set up a gitter page which has public and private messaging:

Also, I don't think I answered your question about whether the files need to be preprocessed: absolutely not. However, silences and unvoiced sections do pose a problem; the pitch tracker will have no data for those areas. To get around this issue, the function praatio.pitch_and_intensity.extractPitch() has an optional argument 'pitchQuadInterp'. If True, the pitch contour will be interpolated. This is good for very short silences and for unvoiced regions. It probably is not appropriate for long silences, for example if someone is reading sentences and pauses after each sentence. In cases like that, you would need to preprocess the speech into chunks.
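A hedged sketch of extracting an interpolated pitch contour: only the first three arguments and the pitchQuadInterp keyword are confirmed in this thread (the keyword appears in the traceback quoted earlier); the minPitch/maxPitch names and values and the file paths are assumptions:

```python
from praatio import pitch_and_intensity

praatEXE = "/Applications/Praat.app/Contents/MacOS/Praat"

pitchData = pitch_and_intensity.extractPI(
    "/full/path/to/myAudio.wav",        # hypothetical input wav
    "/full/path/to/myAudio_pitch.txt",  # hypothetical output pitch/intensity file
    praatEXE,
    minPitch=50,
    maxPitch=350,
    pitchQuadInterp=True,  # interpolate over short silences / unvoiced regions
)
```
|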
Thanks for setting up gitter, I was thinking about the same. Also, I'm working primarily with English, as normal wav files. Gonna try out the interp option. Closing the issue now. |
Hi,
Thanks for the great library. Having some issues running the examples:
For pitch_morph_example.py
```
Traceback (most recent call last):
  File "pitch_morph_example.py", line 63, in <module>
    praatEXE=praatEXE)
  File "build/bdist.macosx-10.12-x86_64/egg/promo/f0_morph.py", line 84, in f0Morph
promo.f0_morph.MissingPitchDataException:
No data points available in a region for morphing.
Two data points are needed in each region to do the morph.
Regions with fewer than two samples are skipped, which should be fine for some cases (e.g. unvoiced segments).
If you need more data points, see promo.morph_utils.interpolation
```
Maybe I missed some steps?