TP2000 360 - Refactor automated business rule checks. #633

Draft: wants to merge 12 commits into master

Commits on Aug 3, 2022

  1. e59fc8a

Commits on Aug 10, 2022

  1. This PR builds on the work in the initial PR to move business rules to Celery, along with what was learned from deploying it.
    
    Avoid filling the task queue with orchestration tasks and starving the workers.
    ===============================================================================
    
    In the previous system there were about three layers of tasks that orchestrated other tasks
    by calling `.replace()` in each task.
    
    Unfortunately, Celery workers could fill up with orchestration tasks,
    leaving no room for the business rule tasks at the bottom of the hierarchy to actually run.
    
    This PR attempts two mitigations:
    
    1. Use Celery workflows instead of `.replace()`
    
    This PR builds a Celery workflow in the `check_workbasket` task using Celery constructs such as `chain` and `group`.
    In theory, since most of the work is done ahead of time, the system should have more awareness of the task structure and avoid the starvation issue.
    
    2. Cancel existing workbasket checks when a new check is requested.
    
    When check_workbasket is started, it will attempt to revoke existing check_workbasket tasks for the same workbasket.
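The cancellation step can be pictured as a registry that remembers the latest check task for each workbasket and revokes the previous one when a new check starts. This is a minimal sketch, not the PR's code: `WorkbasketCheckRegistry` is hypothetical, the revoke function is injected so the sketch runs without a broker, and the in-memory dict only works inside one process (a real deployment would need shared state).

```python
from typing import Callable, Dict, Optional


class WorkbasketCheckRegistry:
    """Hypothetical helper: tracks the most recent check task per workbasket.

    ``revoke`` is injected so this runs without Celery; in a Celery app it could
    wrap ``app.control.revoke(task_id)``.
    """

    def __init__(self, revoke: Callable[[str], None]) -> None:
        self._revoke = revoke
        self._current: Dict[int, str] = {}

    def start_check(self, workbasket_id: int, task_id: str) -> None:
        # Revoke any earlier check for the same workbasket, then record the new one.
        previous = self._current.get(workbasket_id)
        if previous is not None and previous != task_id:
            self._revoke(previous)
        self._current[workbasket_id] = task_id

    def current_task(self, workbasket_id: int) -> Optional[str]:
        return self._current.get(workbasket_id)


revoked = []
registry = WorkbasketCheckRegistry(revoke=revoked.append)
registry.start_check(1, "task-a")
registry.start_check(2, "task-b")
registry.start_check(1, "task-c")  # supersedes task-a for workbasket 1
print(revoked)  # ['task-a']
```

Checks for other workbaskets (here workbasket 2) are left alone; only a check for the same workbasket is revoked.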
    
    Treat intermediate data structures as ephemeral
    ===============================================
    
    A Celery task may execute at any time: right now, or tomorrow when a system comes back up. Based on this assumption, models such as TrackedModelCheck (which stores the result of a business rule check on a TrackedModel) are no longer passed to Celery tasks by ID. Instead, all the information needed to recreate the data is passed to the task, so the system will still work even if developers delete these records while it is running.
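To illustrate the idea, a task payload can carry the inputs needed to recompute a check instead of a TrackedModelCheck ID. This sketch is hypothetical: `CheckRequest` and its fields (a rule name, a model pk and a content checksum) are assumptions about what "all the information needed" might include, and the `json` round trip stands in for Celery's default JSON serializer.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CheckRequest:
    """Hypothetical task payload holding everything needed to (re)compute one check.

    It deliberately contains no TrackedModelCheck ID, so deleting check rows
    between enqueueing and execution does not break the task.
    """

    rule_name: str     # which business rule to run
    model_pk: int      # which TrackedModel to check
    content_hash: str  # checksum of the model's content when the check was requested


def to_task_args(request: CheckRequest) -> str:
    # Plain JSON, so the payload survives Celery's default serializer.
    return json.dumps(asdict(request), sort_keys=True)


def from_task_args(payload: str) -> CheckRequest:
    return CheckRequest(**json.loads(payload))


request = CheckRequest(rule_name="RULE_A", model_pk=42, content_hash="Footnote:ab12")
assert from_task_args(to_task_args(request)) == request
```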
    
    Reduce layers in business rule checking
    =======================================
    
    BusinessRuleChecker and LinkedModelsBusinessRuleChecker are now the only checkers; they take BusinessRule instances instead of being subclassed for each business rule.
    Although more parameters are passed when rules are checked, a conceptual layer has been removed, and the simplification is reflected in around 20 lines of code being removed from checks.py.
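The shift from one checker subclass per rule to a single checker that is handed the rule can be sketched in plain Python. Everything here is a hypothetical stand-in: `BusinessRule` is just a name plus a validation function, models are plain dicts, and the project's real `BusinessRule` and `BusinessRuleChecker` classes are not reproduced.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Model = Dict[str, Any]  # hypothetical: a model represented as a plain dict


@dataclass(frozen=True)
class BusinessRule:
    """Hypothetical stand-in: a rule is a name plus a validation function."""

    name: str
    validate: Callable[[Model], Optional[str]]  # error message, or None if valid


class BusinessRuleChecker:
    """One generic checker, given the rule to apply, instead of one subclass per rule."""

    def __init__(self, rule: BusinessRule, model: Model) -> None:
        self.rule = rule
        self.model = model

    def run(self) -> Optional[str]:
        return self.rule.validate(self.model)


def has_description(model: Model) -> Optional[str]:
    return None if model.get("description") else "description is required"


def end_not_before_start(model: Model) -> Optional[str]:
    return "end precedes start" if model["end"] < model["start"] else None


rules = [
    BusinessRule("HAS_DESCRIPTION", has_description),
    BusinessRule("END_NOT_BEFORE_START", end_not_before_start),
]
model = {"description": "", "start": 5, "end": 3}
results = {rule.name: BusinessRuleChecker(rule, model).run() for rule in rules}
print(results)
```

Adding a rule means adding a `BusinessRule` value, not a new checker class; the cost is that the rule is passed as an extra argument on every check.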
    
    Celery Flower is now easier to read
    ===================================
    
    Because of the changes above, what Celery Flower displays should correspond more closely to a user's intentions, namely the IDs of models.
    
    Content Checksums
    =================
    
    Cached results are now validated using checksums of the content, which should reduce the amount of checking the system needs to do.
    
    When a workbasket has been published, its content could invalidate some content in other unpublished workbaskets. By associating business rule checks with checksums of a model's content, any models that do not clash can be skipped.
    
    Model checksums (generated by `.content_hash()`) are not currently stored in the database (though it may be desirable to store them on TrackedModels, as this would provide a mechanism to address any content in the system).
    The checksumming scheme is a combination of the type and a SHA-256 of the fields in `.copyable_fields` (which should represent the fields a user can edit, but not fields such as pk).
    BLAKE3 was also tested as a faster hashing algorithm, but in practice it did not provide much of a speedup over SHA-256.
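A minimal sketch of the checksum scheme described above, with a model reduced to a dict. The encoding details are assumptions, not taken from the PR: values are hashed as sorted-key JSON, and the type is joined to the digest as `"<type>:<hex digest>"`. The standalone `content_hash` function is illustrative, not the model method.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping


def content_hash(type_name: str, fields: Mapping[str, Any], copyable_fields: Iterable[str]) -> str:
    """Checksum a model's content: its type plus a SHA-256 of its user-editable fields.

    Assumptions: values are JSON-serialisable, fields are hashed as sorted-key JSON,
    and the type is combined with the digest as "<type>:<hex digest>".
    """
    editable = {name: fields[name] for name in copyable_fields}
    encoded = json.dumps(editable, sort_keys=True).encode("utf-8")
    return f"{type_name}:{hashlib.sha256(encoded).hexdigest()}"


copyable = ["sid", "description"]          # pk is deliberately excluded
v1 = {"pk": 101, "sid": 7, "description": "Live animals"}
v2 = {"pk": 202, "sid": 7, "description": "Live animals"}  # new row, same content
v3 = {"pk": 303, "sid": 7, "description": "Dead animals"}

# Different pks but identical editable content give the same checksum,
# so a cached check result for v1 could be reused for v2.
assert content_hash("Footnote", v1, copyable) == content_hash("Footnote", v2, copyable)
assert content_hash("Footnote", v1, copyable) != content_hash("Footnote", v3, copyable)
```

Including the type keeps two models of different kinds with identical field values from sharing a checksum.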
    
    PK ranges
    =========
    
    Occasionally, workbaskets with many items may need to be checked (the initial workbasket has 9 million items).
    Based on the observation that the ID column of the contained TrackedModels is mostly contiguous, the system allows passing sequences of contiguous TrackedModels specified as tuples of (first_pk, last_pk).
    This is relatively compact, suitable for passing over the network with Celery, and readable in Celery Flower.
    
    This also enables chunking of tasks, by specifying a maximum number of items in each tuple.
    
    TrackedModelQueryset provides `.as_pk_intervals` and `.from_pk_intervals` to convert to and from this format.
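The interval format can be sketched over plain integer keys. These are standalone functions, not the queryset methods, and the `max_items` parameter name is hypothetical. On a queryset, each tuple could map to a Django `pk__range` filter, which is inclusive at both ends like the tuples here.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Interval = Tuple[int, int]  # (first_pk, last_pk), both inclusive


def as_pk_intervals(pks: Iterable[int], max_items: Optional[int] = None) -> Iterator[Interval]:
    """Collapse primary keys into runs of contiguous values.

    A run is also cut once it holds ``max_items`` keys, so each tuple can become
    one chunk of work.
    """
    first = last = None
    count = 0
    for pk in sorted(set(pks)):
        extends_run = first is not None and pk == last + 1
        if extends_run and (max_items is None or count < max_items):
            last = pk
            count += 1
            continue
        if first is not None:
            yield (first, last)  # close the current run
        first = last = pk
        count = 1
    if first is not None:
        yield (first, last)


def from_pk_intervals(intervals: Iterable[Interval]) -> List[int]:
    """Expand (first_pk, last_pk) tuples back into individual keys."""
    return [pk for first, last in intervals for pk in range(first, last + 1)]


pks = [1, 2, 3, 4, 7, 8, 10]
print(list(as_pk_intervals(pks)))               # [(1, 4), (7, 8), (10, 10)]
print(list(as_pk_intervals(pks, max_items=2)))  # [(1, 2), (3, 4), (7, 8), (10, 10)]
```

For mostly contiguous IDs the number of tuples stays small regardless of how many rows they cover, which is what makes the format cheap to send through Celery.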
    stuaxo committed Aug 10, 2022 (c12f576)

Commits on Aug 26, 2022

  1. Migration:

    Migrate TrackedModelChecks to the new structure.
    Remove TransactionCheck.
    
    Start moving business rules into the database, and provide sync_business_rules to do that, along with a mechanism to do this in tests.
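Syncing rules into the database generally starts by comparing the rules defined in code with those already stored. The sketch below computes only that difference; `plan_rule_sync` and `RuleSyncPlan` are hypothetical, and whether `sync_business_rules` creates, updates or removes rows is not described in this commit.

```python
from typing import Iterable, NamedTuple, Set


class RuleSyncPlan(NamedTuple):
    to_create: Set[str]  # defined in code, missing from the database
    stale: Set[str]      # stored in the database, no longer defined in code


def plan_rule_sync(code_rules: Iterable[str], db_rules: Iterable[str]) -> RuleSyncPlan:
    """Compare rule names known to the code with rule names already stored."""
    code, stored = set(code_rules), set(db_rules)
    return RuleSyncPlan(to_create=code - stored, stale=stored - code)


plan = plan_rule_sync(code_rules=["RULE_A", "RULE_B", "RULE_C"], db_rules=["RULE_B", "RULE_OLD"])
print(sorted(plan.to_create), sorted(plan.stale))  # ['RULE_A', 'RULE_C'] ['RULE_OLD']
```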
    stuaxo committed Aug 26, 2022 (6eed927)
  2. 88624e0
  3. 4a6c1b9
  4. 1bc6b60
  5. 686f4c2
  6. ae779ad
  7. 672cbff
  8. ece55b7
  9. fdb4de9
  10. Fix rule parsing default (?)

    stuaxo committed Aug 26, 2022 (583070d)