Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend as_of to return a queryset instead of a list of instances (both are iterable). #866

Merged

Conversation

jeking3
Copy link
Contributor

@jeking3 jeking3 commented Aug 13, 2021

Description

The discussion in #397 is about using filtering and querysets with point-in-time queries. The current as_of implementation returns a instance generator so it can only be used to make a single query and obtain the result. The documentation states that it must return an iterable, so it does not have to be a generator. As a queryset, no tests needed to change to handle it.

HistoricalQuerySet is now returned from queries on history. This is used internally by as_of queries to prune results to the latest of each primary key, and to convert to instances.

The resulting queryset from as_of can be further filtered; any use of pk will be converted to the original model's primary key if the queryset is returning instances.

Example (from unit tests):

    def test_as_of(self):
        """Demonstrates how as_of works now that it returns a QuerySet."""
        t0 = datetime.now()
        document1 = RankedDocument.objects.create(rank=42)
        document2 = RankedDocument.objects.create(rank=84)
        t1 = datetime.now()
        document2.rank = 51
        document2.save()
        document1.delete()
        t2 = datetime.now()

        # nothing exists at t0
        queryset = RankedDocument.history.as_of(t0)
        self.assertEqual(queryset.count(), 0)

        # at t1, two records exist
        queryset = RankedDocument.history.as_of(t1)
        self.assertEqual(queryset.count(), 2)
        self.assertEqual(queryset.filter(rank__gte=75).count(), 1)
        ids = set([item["id"] for item in queryset.values("id")])
        self.assertEqual(ids, {document1.id, document2.id})

        # at t2 we have one record left
        queryset = RankedDocument.history.as_of(t2)
        self.assertEqual(queryset.count(), 1)
        self.assertEqual(queryset.filter(rank__gte=75).count(), 0)

Related Issues

#118
#229
#354
This closes #397
#715
#758
This closes #803

Motivation and Context

A long-standing issue that kept me from using the library effectively.

How Has This Been Tested?

Tests were added, all tests pass. Existing tests did not have to change. Additionally, my company has been using this in production for over a month.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have run the make format command to format my code
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have added my name and/or github handle to AUTHORS.rst
  • I have added my change to CHANGES.rst
  • All new and existing tests passed.

@codecov
Copy link

codecov bot commented Aug 13, 2021

Codecov Report

Merging #866 (83d2df2) into master (76691d9) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #866      +/-   ##
==========================================
+ Coverage   97.83%   97.87%   +0.04%     
==========================================
  Files          22       22              
  Lines        1062     1082      +20     
  Branches      206      209       +3     
==========================================
+ Hits         1039     1059      +20     
  Misses         10       10              
  Partials       13       13              
Impacted Files Coverage Δ
simple_history/manager.py 98.11% <100.00%> (+0.43%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76691d9...83d2df2. Read the comment docs.

@jeking3
Copy link
Contributor Author

jeking3 commented Sep 7, 2021

Waiting for #863 to merge before I rebase this one. Could someone please review it? Thanks.

@jeking3
Copy link
Contributor Author

jeking3 commented Sep 8, 2021

I believe I found an issue at https://github.com/jazzband/django-simple-history/pull/866/files#diff-4b2bcf1fafd0c47640f253c43a1cb48ac5b9a244a013d704192ec3a33e1dd146R174 whereby if instance=False then what we want is a queryset.filter(history_date__lte=as_of) instead of calling the HistoricalQuerySet._as_of which does additional instance based filtering to remove deletions.

@jeking3
Copy link
Contributor Author

jeking3 commented Sep 9, 2021

The next push resolves the issue I found (and the incorrect test assertions), and makes the code more understandable.

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch from 8e35d5c to 6a9e229 Compare September 9, 2021 01:54
@jeking3
Copy link
Contributor Author

jeking3 commented Sep 9, 2021

Hmm, I think perhaps as_of should always perform a latest_for_each() instead of what I pushed. If someone wants multiple historical records for the same object they can filter on history_date__lte directly. Fixing that up. The logic was actually correct before in that respect. :|

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch from 6a9e229 to 644526e Compare September 9, 2021 13:47
@jeking3
Copy link
Contributor Author

jeking3 commented Sep 9, 2021

Forgot to modify the documentation appropriately for the last push.

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch 2 times, most recently from f332213 to edef9f6 Compare September 9, 2021 14:12
@jeking3
Copy link
Contributor Author

jeking3 commented Sep 9, 2021

That last push was to cover an idempotence branch in HistoricalQuerySet.as_instances() - looks like it was successful.

@jeking3
Copy link
Contributor Author

jeking3 commented Sep 9, 2021

I changed as_instances() to use the original table's ordering, however that ordering can refer to other non-historical tables by relation, which will not have linkage, and fail. Removing the ordering; as_of instances will come back in reverse chronological order.

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch from edef9f6 to d6fc82e Compare September 9, 2021 19:09
@jeking3 jeking3 changed the title Extend as_of to get history or instances and return a queryset. [NEEDS REVIEW PLEASE] Extend as_of to get history or instances and return a queryset. Sep 16, 2021
@jeking3 jeking3 requested review from a team and removed request for treyhunner and rossmechanic September 19, 2021 13:25
@jeking3
Copy link
Contributor Author

jeking3 commented Sep 19, 2021

I have been using this code in production for a few weeks, it's been stable and effective at allowing for queries into history with additional filters.

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch from d6fc82e to 47c496b Compare September 29, 2021 14:09
@jeking3
Copy link
Contributor Author

jeking3 commented Sep 29, 2021

Rebased now that the history_date PR was merged.

@jeking3
Copy link
Contributor Author

jeking3 commented Sep 29, 2021

Well shoot, one of the recent test assertions on the number of queries for previous and next caught something; looking into it.

Root cause; there was a query where there should not have been one, this jacked up the query count. Fixed.

@jeking3
Copy link
Contributor Author

jeking3 commented Sep 29, 2021

I split the test runner changes into their own PR.

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch 3 times, most recently from 4529d34 to e9c600e Compare September 29, 2021 15:13
@jeking3
Copy link
Contributor Author

jeking3 commented Oct 3, 2021

@inexcode give it a try.

@inexcode
Copy link

@inexcode give it a try.

I've checked it some time ago and it worked, but now as master branch has conflicting changes with release, it's harder for me to apply this patch on production. Waiting for this to be released.

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch from 8b92951 to 3d1de5e Compare November 8, 2021 12:55
@jeking3
Copy link
Contributor Author

jeking3 commented Nov 8, 2021

Rebased to master.

@jeking3
Copy link
Contributor Author

jeking3 commented Dec 3, 2021

@jazzband/django-simple-history is this project no longer maintained? Nobody's looking at pull requests.

Copy link
Contributor

@tim-schilling tim-schilling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes here are minimal. My concerns are around the language of the documentation and comments. Please be aware that two of my comments can be ignored entirely as it's related to old code that's simply being moved, but I wanted to raise it incase you agree.

I could be misunderstanding some of the comments or code, so if you think that's the case please let me know and we can work through it.

simple_history/manager.py Show resolved Hide resolved
simple_history/manager.py Show resolved Hide resolved
class HistoryDescriptor:
def __init__(self, model):
self.model = model

def __get__(self, instance, owner):
if instance is None:
return HistoryManager(self.model)
return HistoryManager.from_queryset(HistoricalQuerySet)(self.model)
return HistoryManager(self.model, instance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we don't want to use the from_queryset when instantiating the manager with an instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I believe it would be useless to get a historical queryset when instantiating from an instance because HistoricalQuerySet exists to facilitate being able to query history or instances; if you are already at an instance then you do not need to query history. That said, the better answer here is "I don't know" as my implementation and tests haven't needed it. as_of queries in all my use cases are always on an model and not on an instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good thing to document to at least partially stave off future questions if someone expects it to work differently.

simple_history/manager.py Outdated Show resolved Hide resolved
simple_history/manager.py Outdated Show resolved Hide resolved
Comment on lines 102 to 104
records or instances. The ``as_of`` method will return instances from a specific
date and time that you specify, and return a queryset that you can use to further
filter the result.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't as_of either return a single instance or a queryset? If so, I think this should be reworded a little to make that clear.

The as_of method will return an instance or a queryset from a specified date and time.

Ideally we'd call out when it will return a instance vs queryset too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you explore the history of the related issues, you will see most people wanted to leave as_of alone and use something else to represent a proper timepoint query. I agree with you that as_of returning an instance or a queryset is confusing. Previously it returned either an instance or an array of instances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping for a little more documentation explaining that. Not a functional change.

@tim-schilling
Copy link
Contributor

@jeking3 We should do a release before this gets merged in. This change necessitates a major version change as it's changing the API.

@jeking3
Copy link
Contributor Author

jeking3 commented Dec 6, 2021

My intention was that the documented interface for as_of declares the return be an iterable. It used to be a list of instances. It is now a queryset; both of which are iterable. From a documentation perspective they are not different.

@jeking3
Copy link
Contributor Author

jeking3 commented Dec 15, 2021

I'm really starting to see this problem better now, in that "as of" is a global time point for the query and anything that follows it including chasing relations. I actually have a solution working against this now in my fork and in my product, for example when two tables are historic and one refers to the other, I added a HistoricForeignKey that is used in place of ForeignKey that is smart enough to honor the referrer's timepoint and can chase the relationship properly (the code is not in this PR; something I have to work on getting into this project). This behavior relies on the code in this PR adding more information to objects coming out of instanceize(), namely _history is a property with the historic record that made the instance, and _history._as_of is the timepoint of the query that generated the instance from which the relation is being chased.

In all what this PR plus that additional work provides is foreign keys that work naturally between two tables that capture history. So not only would we get queryset results that we can filter, but also the ability to chase relationships correctly from the as_of timepoint in the queryset.

@tim-schilling
Copy link
Contributor

@jeking3 I didn't approve the PR after your comments yet because I was waiting for your opinion/response regarding my desire for more documentation and commenting.

@jeking3
Copy link
Contributor Author

jeking3 commented Jan 2, 2022

Understood. It's probably best to address your concerns here and then open a separate pull request for issue-880.

@jeking3 jeking3 force-pushed the issue-397-as-of-queryset-filtering branch from 3d1de5e to d1ab704 Compare January 2, 2022 21:52
@jeking3 jeking3 changed the title [NEEDS REVIEW PLEASE] Extend as_of to get history or instances and return a queryset. Extend as_of to return a queryset instead of a list of instances (both are iterable). Jan 2, 2022
@jeking3
Copy link
Contributor Author

jeking3 commented Jan 2, 2022

CI success blocked on #938

tim-schilling
tim-schilling previously approved these changes Jan 17, 2022
Copy link
Contributor

@tim-schilling tim-schilling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this for as much as I'm familiar with the codebase.

@jeking3 jeking3 mentioned this pull request Jan 18, 2022
This changes `as_of` to return a QuerySet instead of a list of instances
so that further filtering can be applied to the history.
@jeking3
Copy link
Contributor Author

jeking3 commented Jan 24, 2022

@tim-schilling sorry to ask, I had to rebase and it resets the review; could you re-approve?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants