Can I fill the blanks in the tier by extending the existing intervals? #25
Someone recently had a similar request when reading in textgrids in praat and I added a "readRaw" boolean parameter. I can do the same for the save function. If "writeRaw" is true, it won't insert blanks. In your case the automatic aligner is assuming the blank label is a phone? I used one a long time ago that required blank intervals to be labeled "sp" (small pause)--IIRC. I imagine the exact behaviour is vendor specific. When a segment is deleted in praat, praat inserts a space. I followed that behaviour when I first wrote praatio, but there are other use cases of course. I should be able to add this tomorrow.
I don't understand the use case. Without knowing your case more, I would assume that changing the duration of the intervals would invalidate the meaning of the labeled interval (phone, word, phrase). But, you can use praatio to do this if it would help (this is off the top of my head--I will double check tomorrow)
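A minimal sketch of the gap-filling idea in plain Python, not praatio's API: it assumes a tier can be represented as a list of `(start, end, label)` tuples, and the helper name `extend_to_fill_gaps` is hypothetical. Each interval's end is stretched to the start of the next one, so the number of intervals stays the same.

```python
def extend_to_fill_gaps(intervals, tier_start=None, tier_end=None):
    """Close gaps by extending each interval's end to the next interval's start.

    `intervals` is a list of (start, end, label) tuples. The interval count
    is preserved; only boundaries move. Overlaps are left untouched.
    """
    ordered = sorted(intervals)
    filled = []
    for i, (start, end, label) in enumerate(ordered):
        if i == 0 and tier_start is not None:
            # Pull the first interval back to the tier start, if one is given.
            start = min(start, tier_start)
        if i + 1 < len(ordered):
            # Stretch the end forward to meet the following interval.
            end = max(end, ordered[i + 1][0])
        elif tier_end is not None:
            # Stretch the last interval to the tier end, if one is given.
            end = max(end, tier_end)
        filled.append((start, end, label))
    return filled


phones = [(0.0, 0.5, "a"), (0.501, 1.0, "b"), (1.0, 1.499, "c")]
print(extend_to_fill_gaps(phones, tier_end=1.5))
```

Note that stretching a boundary changes the labeled duration, which is the concern raised above, so this only seems reasonable for very small gaps.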
When you say automatically align, you're talking about forced aligners or something else? What tool are you using? |
Thank you so much for the quick response!
Yes, and actually it usually contains a set of phones, in contrast to the silent parts. And the blank areas in the tier are often very short, 0.001 second for example. They exist because I am combining some tg files together, and due to the limited precision of floats some of the digits are lost. I am using SPPAS to do some forced aligning work (just learnt this term :). But I wrote my own scripts to process the output of the aligner. The tokens and phones originally correspond to each other, and silent parts are marked with "#". A blank interval would occur only when the SPPAS aligner failed, so any additional intervals (blank or not) produced by praatio would surprise my script. |
I have used SPPAS and praatio together before and did not encounter the problem you are having. But I have not used SPPAS for several years now and things may have changed. From the documentation, it seems that '#' is used to mark silence: I'm working on the two change requests now. |
If you suspect those are errors, it may be better to correct them. By default it's very short (0.00000001 second) and is used to fix floating point rounding errors. |
Thank you for your instructions! This helps a lot with the problem. |
For a textgrid produced by praat, praatio should be able to load and save the textgrid without modifying the original timestamps. If you have an example where save-and-load is distorting the textgrid, please share. That's a bug that would need to be fixed.
I think I misunderstood your original problem then. You first wrote:
By that, I thought you meant that praatio was making the use of SPPAS more difficult? Do you mean that it is more annoying to work with your textgrids in python? Or?
I'm not sure about the behaviour of |
The related files are attached below. There is a short blank interval at the end of the 'PhonAlign' tier in the textgrid whose filename does not contain 'seg'. files.zip
Kind of, but not annoying indeed. I just didn't expect this behavior. I thought that an interval would be there for only two reasons: my program added it, or it originally existed. This might be less of an issue if the saved textgrid is only for manual use.
The option of |
Thanks, I'll give it a try. I've got the code done for both change requests and should be pushing out a release in a bit. |
I've released praatio 4.3.0. Textgrid.save() has a new parameter I already had written this before our latest conversation. What is missing is the exception throwing behaviour and For the exception throwing behaviour, I can see it being useful for debugging but I don't quite understand when we would expect to use it. "I'd like to save this textgrid that I expect is full, but if there are any holes, throw an exception." Ideally praatio doesn't create holes. I haven't had a chance to look at the files you provided yet but they might illustrate the need for it.
There is a lot of complex behaviour in loading and saving; and lots of configurations. I've been thinking of how I can distill that down and make it simpler. I've been thinking about adjusting the interface for some time now. No concrete ideas or plans though. |
It's late here so I only took a cursory look at your data. Just to be clear, your concern is the very final entry (and not other blank entries)?
That does indeed look like some sort of truncation problem. I'll investigate further--tomorrow if I have time. |
The work is so fast and good!
Is it possible to add nonadjacent Intervals directly to a newly created IntervalTier? If so, holes might be manually created by such low-level operations. |
Holes cannot be selected in praat, so it is a problem. I think there are two kinds of holes: intentional holes (eg imagine a long pause between two sentences--I think it should be filled in with a blank) and unintentional holes (rounding errors etc).
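To make the two kinds of holes concrete, here is a rough sketch in plain Python (not praatio's API). The `(start, end, label)` tuple layout, the `find_holes` name, and the 0.01 second cutoff between "rounding" and "pause" are all assumptions for illustration.

```python
def find_holes(intervals, tier_start, tier_end, rounding_threshold=0.01):
    """Return (hole_start, hole_end, kind) for every unlabeled stretch.

    kind is "rounding" for holes shorter than `rounding_threshold` seconds
    (likely unintentional) and "pause" for anything longer (likely intentional).
    The threshold value is an assumption, not something praatio defines.
    """
    def classify(duration):
        return "rounding" if duration < rounding_threshold else "pause"

    holes = []
    cursor = tier_start
    for start, end, _label in sorted(intervals):
        if start > cursor:
            holes.append((cursor, start, classify(start - cursor)))
        cursor = max(cursor, end)
    if tier_end > cursor:
        # Unlabeled stretch between the last interval and the tier end.
        holes.append((cursor, tier_end, classify(tier_end - cursor)))
    return holes


tier = [(0.0, 1.0, "a"), (1.001, 2.0, "b"), (3.5, 4.0, "c")]
for hole in find_holes(tier, 0.0, 4.0):
    print(hole)
```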
That is absolutely possible.
Praatio's behavior is mostly based on my original use case 🙈 --I worked with long recordings which naturally have lots of holes and post-processed the textgrids manually in praat, so I wanted the holes filled. Of course, this won't apply to everyone/most people. I'm going to think on it a bit. 🤔
I found the problem. So Did those come from SPPAS directly? If you already have a lot of data beyond just these files, I would run a function that checks the length of each pair. For praatio, I think some sort of optional validation is needed but I'm not sure what it should look like. I'm going to open a new issue for validation. |
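The pair check could look something like this sketch. It assumes "pair" means two textgrids (or tiers) that are supposed to end at the same time, and that their end times have already been read into a dict; `mismatched_pairs` and the data layout are hypothetical, and the default tolerance simply mirrors the 0.00000001 second figure mentioned earlier.

```python
def mismatched_pairs(end_times, tolerance=1e-8):
    """Return the names whose two end times differ by more than `tolerance`.

    `end_times` maps a recording name to a (end_a, end_b) tuple, e.g. the
    final timestamps of the two textgrids produced for that recording.
    """
    bad = []
    for name, (end_a, end_b) in sorted(end_times.items()):
        if abs(end_a - end_b) > tolerance:
            bad.append(name)
    return bad


# Illustrative values only, not taken from the attached files.
end_times = {
    "rec1": (10.0, 10.0),
    "rec2": (12.5, 12.499),
}
print(mismatched_pairs(end_times))
```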
I think filling the holes is better for manual processes, while not filling them is better for programs to recognize [EDIT: because a program that treats praatio's save as a black box can hardly predict whether a hole will be filled]. The cause of the problem is totally different from what I thought 🤣 The files were produced by SPPAS directly. |
One solution could be to make the optional parameters non-optional [eg ignoreBlankSpaces]. This would raise awareness that
In that case, I think it's worth contacting the SPPAS author with the input and output files explaining the situation. It seems like a bug. |
That's a concise solution. If so, it's better that most of the potential modifications can be reflected by the parameters.
OK, I'll do it later. I've contacted her about other issues in SPPAS. She is also a very kind and helpful developer😉 |
This will break backwards compatibility. There are some other breaking changes I want to make, so I will bundle them together in a major change (praatio 5.0). I will try to push out a release this month if I can. I've made an issue: #27 |
That's great. Thanks for your efforts and patience! |
I've just released Praatio 5.0 which adds two new required parameters: format and includeBlankSpaces
Thank you for your help in guiding this new feature! |
I'm so glad to make some contributions! |
I noticed that when saving a textgrid file, praatio tries to fill up the tiers with new blank intervals, which does not seem very friendly to automatic alignment. What would happen if the tier were left as it is and not filled up? Or can I use praatio to fill the blanks by extending the existing intervals (which would not change the total number of intervals, making it much easier for machines to recognize)?