Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeepDiff Fails to Detect Timezone Changes in Arrays with ignore_order=True #516

Closed
3schwartz opened this issue Jan 9, 2025 · 6 comments
Closed

Comments

@3schwartz
Copy link

3schwartz commented Jan 9, 2025

Describe the bug

The deepdiff library is unable to detect timezone changes in datetimes within arrays.

When ignore_order=True and a datetime is present within an array, deepdiff uses an execution path that makes a DeepHash. Due to the implementation of datetime_normalize, all datetimes have their timezone set to UTC:

obj = obj.replace(tzinfo=datetime.timezone.utc)

This behavior causes issues in our use case, as we rely on deepdiff to detect changes in datetime objects, including changes to their timezones.

To Reproduce

Here’s a minimal example demonstrating the issue. The data structure includes datetimes within an array. When the timezone of the datetimes is changed, deepdiff` does not recognize this change:

class Simple:
    def __init__(self, date: datetime):
        self.dates: List[datetime] = [date]

    @staticmethod
    def construct(add_timezone: bool = False) -> "Simple":
        if add_timezone:
            date = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)
        else:
            date = datetime(2020, 8, 31, 13, 14, 1)
        return Simple(date=date)

def test_simple():
    old = Simple.construct()
    new = Simple.construct(add_timezone=True)

    diff = DeepDiff(old, new, ignore_order=True)

    assert bool(diff) == True # <—— This assertion fails

def test_simple_array():
    old = [datetime(2020, 8, 31, 13, 14, 1)]
    new = [datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)]

    diff = DeepDiff(old, new, ignore_order=True)

    assert bool(diff) == True # <—— This assertion fails

Execution path

  • diff.py (725): _diff_iterable -> _diff_iterable_with_deephash
  • diff.py (1269): _diff_iterable_with_deephash -> _create_hashtable
  • diff.py (1085): _create_hashtable -> DeepHash
  • diff.py (1085): _create_hashtable -> DeepHash
  • deephash.py (216): DeepHash -> (DeepHash)self._hash
  • deephash.py (535): (DeepHash)self._hash -> (DeepHash)self._prep_datetime
  • deephash.py (478): (DeepHash)self._prep_datetime -> datetime_normalize
  • helper.py (627): datetime_normalize

Expected behavior

I expect deepdiff to detect changes to datetime objects, including differences in their timezones, and report the data structures as different.

Is there a specific reason why datetime_normalize replaces the timezone with UTC? Would it be reasonable to modify or remove this behavior?

Alternatively, could the logic be adjusted to conditionally apply the timezone replacement only when truncate_datetime is explicitly set? For example, the following indentation could be introduced:

def datetime_normalize(truncate_datetime, obj):
    if truncate_datetime:
        if truncate_datetime == 'second':
            obj = obj.replace(microsecond=0)
        elif truncate_datetime == 'minute':
            obj = obj.replace(second=0, microsecond=0)
        elif truncate_datetime == 'hour':
            obj = obj.replace(minute=0, second=0, microsecond=0)
        elif truncate_datetime == 'day':
            obj = obj.replace(hour=0, minute=0, second=0, microsecond=0)

        if isinstance(obj, datetime.datetime):
            obj = obj.replace(tzinfo=datetime.timezone.utc)
        elif isinstance(obj, datetime.time):
            obj = time_to_seconds(obj)
    return obj

This approach ensures that timezone replacement occurs only when truncate_datetime is explicitly set.

A solution with the proposed approach mentioned above has been submitted in PR #517.

OS, DeepDiff version and Python version (please complete the following information):

  • Python Version 3.10
  • DeepDiff Version 8.0.1 (Note: logic is present on main branch)
@3schwartz 3schwartz changed the title DeepDiff Fails to Detect Timezone Changes with ignore_order=True DeepDiff Fails to Detect Timezone Changes in Arrays with ignore_order=True Jan 9, 2025
@seperman
Copy link
Owner

seperman commented Mar 5, 2025

@3schwartz We only assume the UTC timezone if the datetime is not timezone aware. If the datetime is timezone aware, we don't overwrite it. Also you are using an older version of DeepDiff.
This is explained in the FAQ: Q: Why my datetimes are reported in UTC?

@chriswyatt
Copy link

@3schwartz We only assume the UTC timezone if the datetime is not timezone aware. If the datetime is timezone aware, we don't overwrite it. Also you are using an older version of DeepDiff. This is explained in the FAQ: Q: Why my datetimes are reported in UTC?

What are your reasons for such peculiar default behaviour? This would be a useful option, but having this as the default behaviour is just wrong.

@seperman
Copy link
Owner

@chriswyatt The reason was so you get deterministic diff regardless of your local timezone.

@seperman
Copy link
Owner

What we can do is to allow the user to pass a "default" timezone if the datetime values are not timezone native. And our default will be UTC unless the user overrides it.

@chriswyatt
Copy link

What we can do is to allow the user to pass a "default" timezone if the datetime values are not timezone native. And our default will be UTC unless the user overrides it.

UTC and local system time are the only ones that really make sense when it comes to naive datetimes. Python assumes local time by default, so defaulting to UTC is just not the expected behaviour. It's particularly confusing here in the UK, where we might write some code that works fine in the winter, and then it will suddenly not work correctly in the summer.

The problem with assuming other timezones, is that a lot of countries use 2 timezones depending on the time of year. That would be quite a bit of extra complication and test coverage needed. It doesn't seem like a feature that many would need, so I would personally avoid it, myself.

I do appreciate the feature though. The projects I'm working on right now do use naive UTC quite a lot, so this feature is really useful to me.

@seperman
Copy link
Owner

@chriswyatt Thanks. I added the Default Timezone parameter: https://zepworks.com/deepdiff/current/basics.html#default-time-zone
You can set your default timezone to be your local timezone now to avoid your native datetimes to be converted to UTC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants