`intersection`: issue on consecutive duplicate words #45

scottmk · 2023-01-23T09:16:38Z

Today I encountered an issue with the behavior of intersection.

Say I have a WORD tier that looks like this:

UPON | A | A | TIME

And I have a PHONE tier that looks like this:

AH0 | P | AA1 | N | AH0 | EY1 | T | AY1 | M

Assuming these are time-aligned correctly, when I call intersection, I get a list that looks something like this:

['UPON-AH0', 'UPON-P', 'UPON-AA1', 'UPON-N', 'A-AH0', 'A-EY1', 'TIME-T', 'TIME-AY1', 'TIME-M']

Because I have two intervals in the WORD tier which have the same label, from this intersection I can't really tell if I have two distinct words "A" that have the respective transcriptions "AH0" and "EY1", or if I have one distinct word "A" transcribed as "AH0 EY1".
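To make the ambiguity concrete, here is a minimal sketch in plain Python. It is not praatIO's implementation, just a pairwise overlap over `(start, end, label)` tuples. The timestamps and the name `pairwise_intersection` are made up for illustration; only the labels come from my example.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]

# Hypothetical timestamps; only the labels are taken from the example.
words: List[Interval] = [
    (0.0, 0.4, "UPON"), (0.4, 0.5, "A"), (0.5, 0.6, "A"), (0.6, 0.9, "TIME"),
]
phones: List[Interval] = [
    (0.0, 0.1, "AH0"), (0.1, 0.2, "P"), (0.2, 0.3, "AA1"), (0.3, 0.4, "N"),
    (0.4, 0.5, "AH0"), (0.5, 0.6, "EY1"),
    (0.6, 0.7, "T"), (0.7, 0.8, "AY1"), (0.8, 0.9, "M"),
]


def pairwise_intersection(left: List[Interval], right: List[Interval]) -> List[Interval]:
    """Emit one interval per overlapping (left, right) pair."""
    out = []
    for l_start, l_end, l_label in left:
        for r_start, r_end, r_label in right:
            start, end = max(l_start, r_start), min(l_end, r_end)
            if start < end:  # keep only overlaps with positive duration
                out.append((start, end, f"{l_label}-{r_label}"))
    return out


labels = [label for _, _, label in pairwise_intersection(words, phones)]
print(labels)
```

The two entries `A-AH0` and `A-EY1` carry no trace of which of the two "A" intervals they came from.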

There is no single right way to solve this, but since we know the word entries are distinct, I would suggest that the label instead be the WORD label plus a tuple of all the PHONE labels that coincide with it. Something like this:

['UPON-(AH0, P, AA1, N)', 'A-(AH0)', 'A-(EY1)', 'TIME-(T, AY1, M)']

This would also mean that the interval boundaries would be the boundaries of the left-hand side tier. So my example would be for

word_tier.intersection(phone_tier)

If you instead did

phone_tier.intersection(word_tier)

you would get

['AH0-UPON', 'P-UPON', 'AA1-UPON', 'N-UPON', 'AH0-A', 'EY1-A', 'T-TIME', 'AY1-TIME', 'M-TIME']
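The grouping I am proposing could be sketched roughly like this in plain Python. The function name `group_by_left`, the label format, and the timestamps are all assumptions for illustration, not existing praatIO code.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]

# Hypothetical timestamps; only the labels are taken from the example.
words: List[Interval] = [
    (0.0, 0.4, "UPON"), (0.4, 0.5, "A"), (0.5, 0.6, "A"), (0.6, 0.9, "TIME"),
]
phones: List[Interval] = [
    (0.0, 0.1, "AH0"), (0.1, 0.2, "P"), (0.2, 0.3, "AA1"), (0.3, 0.4, "N"),
    (0.4, 0.5, "AH0"), (0.5, 0.6, "EY1"),
    (0.6, 0.7, "T"), (0.7, 0.8, "AY1"), (0.8, 0.9, "M"),
]


def group_by_left(left: List[Interval], right: List[Interval]) -> List[Interval]:
    """One output interval per left-hand interval that has any overlap.

    The output keeps the left-hand boundaries and collects every
    overlapping right-hand label into a parenthesised list.
    """
    out = []
    for l_start, l_end, l_label in left:
        hits = [
            r_label
            for r_start, r_end, r_label in right
            if max(l_start, r_start) < min(l_end, r_end)
        ]
        if hits:
            out.append((l_start, l_end, f"{l_label}-({', '.join(hits)})"))
    return out


grouped = group_by_left(words, phones)
print([label for _, _, label in grouped])
```

With the words on the left, each "A" gets its own entry. With the phones on the left, the same function would produce one entry per phone, since each phone overlaps exactly one word.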

What do you think?

scottmk · 2023-01-23T09:58:23Z

Alternatively, it could behave as it currently does, and instead you could add the entryList entry index from each tier to the intersection entries.
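A rough sketch of what I mean, again in plain Python rather than praatIO code. The `label[index]` format and the function name `indexed_intersection` are hypothetical; the indices are simply each entry's position in its tier's list.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]

# Hypothetical timestamps for the two consecutive "A" words only.
words: List[Interval] = [(0.4, 0.5, "A"), (0.5, 0.6, "A")]
phones: List[Interval] = [(0.4, 0.5, "AH0"), (0.5, 0.6, "EY1")]


def indexed_intersection(left: List[Interval], right: List[Interval]) -> List[Interval]:
    """Pairwise overlap, tagging each label with its entry index."""
    out = []
    for i, (l_start, l_end, l_label) in enumerate(left):
        for j, (r_start, r_end, r_label) in enumerate(right):
            start, end = max(l_start, r_start), min(l_end, r_end)
            if start < end:  # keep only overlaps with positive duration
                out.append((start, end, f"{l_label}[{i}]-{r_label}[{j}]"))
    return out


indexed = indexed_intersection(words, phones)
print([label for _, _, label in indexed])
```

The differing word indices are enough to tell the two "A" intervals apart, at the cost of a noisier label.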

timmahrt · 2023-01-26T13:15:09Z

I think your first solution sounds reasonable. I'll take a look and see what I can do.

timmahrt · 2023-02-04T08:49:13Z

When I went to implement the changes, I realized that my original intention with intersection() wouldn't be compatible with the changes you suggested.

For example, what would happen for a phone list, where only some of the phones are listed for each word, e.g.
[(0, 1, "hello")] and [(0.1, 0.2, 'e'), (0.7, 1.0, 'o')]
What would the expected output be? Under the existing intersection method, there would be two intervals output [(0.1, 0.2, "hello(e)"), (0.7, 1.0, "hello(o)")], but I think one could argue that in some cases only one interval is wanted (0, 1, "hello(e,o)")--which is more in line with your use case.
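The two readings can be put side by side in a small plain-Python sketch. The helper names `per_overlap` and `per_left_interval` are made up for illustration and are not the praatIO methods themselves.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]

word: List[Interval] = [(0.0, 1.0, "hello")]
phones: List[Interval] = [(0.1, 0.2, "e"), (0.7, 1.0, "o")]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two intervals share a span of positive duration."""
    return max(a[0], b[0]) < min(a[1], b[1])


def per_overlap(left: List[Interval], right: List[Interval]) -> List[Interval]:
    """One output interval per overlapping pair, clipped to the overlap."""
    return [
        (max(l[0], r[0]), min(l[1], r[1]), f"{l[2]}({r[2]})")
        for l in left
        for r in right
        if overlaps(l, r)
    ]


def per_left_interval(left: List[Interval], right: List[Interval], sep: str = ",") -> List[Interval]:
    """One output interval per left-hand interval, keeping its boundaries."""
    out = []
    for l in left:
        hits = [r[2] for r in right if overlaps(l, r)]
        if hits:
            out.append((l[0], l[1], f"{l[2]}({sep.join(hits)})"))
    return out


print(per_overlap(word, phones))
print(per_left_interval(word, phones))
```

The first function keeps the gaps between the listed phones; the second spans the whole word regardless of how much of it the phones cover.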

I wondered how I could accommodate these two scenarios--parameterize intersection()?

I decided that a simpler solution was to create a different method, mergeLabels(). I implemented that in #47 and also added some documentation to the existing intersection().

What do you think? Does mergeLabels() work for your use case?

timmahrt · 2023-02-04T08:55:29Z

Here is the method signature:
https://github.com/timmahrt/praatIO/pull/47/files#diff-35a03755d23b8e11ea1a0d22db05fa23181cc9dfc8a6675bb72e8781ca4b269eR572

Here is an example usage from the tests, using the example you provided:
https://github.com/timmahrt/praatIO/pull/47/files#diff-821de34f450931440c2ec4dcdea75ca2127eea10060b8674de5f20eeae4a303dR1225

timmahrt · 2023-02-05T10:01:08Z

I merged my PR and built a release. I've been sitting on a lot of code since November which I really shouldn't have done.

Reviews on the merged PR are still welcome--I can make a follow-up PR. 🙇

scottmk · 2023-02-05T19:48:09Z

Thanks for this! I think creating the new method is a great compromise and this helps my use case a lot.

I'll take a look at the PR and see if I have any comments to make.

Thanks for the quick response!

scottmk closed this as completed Feb 5, 2023
