Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TextGrid.tierDict can be modified, corrupting the TextGrid #39

Closed
scottmk opened this issue Aug 27, 2022 · 9 comments
Closed

TextGrid.tierDict can be modified, corrupting the TextGrid #39

scottmk opened this issue Aug 27, 2022 · 9 comments

Comments

@scottmk
Copy link

scottmk commented Aug 27, 2022

As far as I can tell from your examples, the canonical way to access a TextGridTier from a TextGrid is to access the internal tierDict (Example).

However, because tierDict is mutable, one can end up adding a new TextGridTier to this tierDict, e.g.,

new_textgrid_tier = new_textgrid.tierDict.setdefault(
    "new_tier",
    IntervalTier(
        "new_tier", [], textgrid_obj.minTimestamp, textgrid_obj.maxTimestamp
    )
)

If this happens, the TextGrid will essentially be in an Illegal State, because TextGrid.tierNameList will be missing the new tier. This obviously breaks functions like TextGridTier#replaceTier, among other things.

I think there are at least two possible solutions:

  1. Add a getTier method and rename tierDict to _tierDict to indicate that it is a protected member that shouldn't be altered
  2. Change tierDict to an OrderedDict and remove tierNameList entirely. You can then just use tierDict.keys() in place of tierNameList

I think it'd probably be best to implement both solutions, particularly because it's dangerous to have two parallel data structures that you need to keep in sync. However, simply implementing number 1 alone should solve the problem quickly and easily.

I'm also happy to submit a PR for this if you'd like.

Thanks again for all your work!

@timmahrt
Copy link
Owner

I think full immutability would be pretty cool but I'm not sure how easy it would be to do. I might look into that. If done, then a full set of setter and getter methods will be necessary.

Regardless, I like the idea of moving to OrderedDict just for the semantics.

It might be a while before I can focus my attention on this, but I will sleep on it!

@scottmk
Copy link
Author

scottmk commented Aug 28, 2022

I think full immutability would be pretty cool but I'm not sure how easy it would be to do.

In hindsight, not very easy at all in Python. You basically have to make a new data structure. I think just marking it as protected and adding a getter makes more sense. But I think just removing the list and relying on the keyset of the dict instead will protect from this problem. There will probably be some annoying finagling to get it to work tho, haha.

Regardless, I like the idea of moving to OrderedDict just for the semantics.

Totally! Actually, as of Python 3.6, all dicts preserve the order of keys, but that doesn't help if you want to be compatible with older versions.

One downside of OrderedDict is it always appends new entries to the end, and if you update an entry, it moves it to the end. So if you want to preserve the order of entries as you update them, you have to do an annoying O(n) solution using OrderedDict#move_to_end and/or OrderedDict#popitem. Same for arbitrary insertion, which is something you currently support. I'll see if I can think of more elegant solutions.

It might be a while before I can focus my attention on this, but I will sleep on it!

Of course, makes sense! I just wanted to post issues as I found them, so I didn't forget!

If you're open to it, I might send pull requests later down the road. I'm working on a thesis right now, so it may be some time.

@timmahrt
Copy link
Owner

timmahrt commented Jan 7, 2023

Sorry for taking so long.

I've got a PR up. I decided that for now I'd like to keep the same interface but I resolved the issue where tierNameList can diverge from tierDict.

I did this by making tierDict an OrderedDict and making tierNameList a property.

One downside of OrderedDict is...if you update an entry, it moves it to the end.

I tried to replicate this in a test but couldn't. Was the issue maybe resolved in Python 3.6?

https://docs.python.org/3/library/collections.html#collections.OrderedDict:
Changed in version 3.6: With the acceptance of [PEP 468](https://peps.python.org/pep-0468/), order is retained for keyword arguments passed to the [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict) constructor and its update() method.

Please let me know if you have any feedback.

@scottmk
Copy link
Author

scottmk commented Jan 10, 2023

No worries! Life happens, I know how it goes.

I tried to replicate this in a test but couldn't. Was the issue maybe resolved in Python 3.6?

Fascinating. I didn't know they had fixed this behavior. That's really nice!

Thanks for making the fixes. I'll take a look at the PR as soon as I can, but if I take too long, obviously don't wait on me :)

@timmahrt
Copy link
Owner

timmahrt commented Feb 4, 2023

Add a getTier method and rename tierDict to _tierDict to indicate that it is a protected member that shouldn't be altered

I was a bit on the fence at first but actually I think its a nice change so I added it to my PR as well.

@timmahrt
Copy link
Owner

timmahrt commented Feb 4, 2023

I merged my PR in prep for a release. But if you have any comments or concerns, please share! I can open a follow-up PR.

@timmahrt
Copy link
Owner

timmahrt commented Feb 5, 2023

I created a follow-up PR that also makes tier.entries (formally tier.entryList) read only. Both changes are now released!

If you have any feedback, please share!

@scottmk
Copy link
Author

scottmk commented Feb 5, 2023

Thanks for this! I'll check it out

@timmahrt
Copy link
Owner

timmahrt commented Nov 4, 2023

I'll close this for now but if there are any related issues, feel free to reopen!

@timmahrt timmahrt closed this as completed Nov 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants