TP2000 360 - Refactor automated business rule checks. #633

Draft
wants to merge 12 commits into
base: master


stuaxo
Contributor

@stuaxo stuaxo commented Jul 27, 2022

This PR is an(other) attempt at moving the business rules into celery.

This PR attempts to:

Build the celery workflow in terms of celery workflow constructs, so celery can correctly schedule tasks.
Reduce the amount of work the system has to do:

  • Cancellation of running tasks.
  • Cache results of some business rules when operations such as publishing workbaskets occur.

Much of the learning in this PR is based on the previous PR to use celery to run business rules, and input from most of the team.

@stuaxo stuaxo force-pushed the TP2000-360 branch 3 times, most recently from 2a77ba1 to 004d26b Compare August 3, 2022 17:51
checks/checks.py Outdated
"""Runs Checker-dependent logic and returns an indication of success."""
raise NotImplementedError()
@classmethod
def apply_rule(
Collaborator

On first glance, apply_rule doesn't seem so different from run_rule(). A bit more docstring to distinguish between them?

Contributor Author

I find the distinction confusing myself - this may need more thought, or at least more docs.

def apply_rule(
cls,
rule: BusinessRule,
transaction: Transaction,
Collaborator

It isn't clear why transaction is required and how it's used - docstring?

Contributor Author

We don't have atomicity, but we can try to get closer to it: one way is by passing in the current transaction when you check.

This may also allow us to do things like check the system as it was in the past.

Collaborator

Perhaps add that explanation into the docstring?

from typing import Collection
from typing import Dict
from typing import Iterator
import logging
Collaborator

Would it be right to think of this module as a main entry point for checks (besides the scheduling)?
A good, high-level description of how checks work in a module-level docstring would be good.

checks/checks.py Outdated
cls._checker_cache[checker_name] = BusinessRuleCheckerOf
return BusinessRuleCheckerOf
class LinkedModelsBusinessRuleChecker(Checker):
"""Apply BusinessRules specified in a TrackedModels indirect_business_rules
Collaborator

I don't see a good description of what indirect business rules are in our source code (how they differ to regular business rules, for instance). If you have a good one in mind from working in this area, then it would be useful to add it against common.models.trackedmodel.TrackedModel.indirect_business_rules.

Contributor Author

I'll make this closer to the original text in that case -

tamato/checks/checks.py

Lines 167 to 170 in 25debc2

"""
A ``Checker`` that runs a ``BusinessRule`` against a model that is linked to
the model being checked, and for which a change in the checked model could
result in a business rule failure against the linked model.

Contributor Author

And see if I can find examples :)

CELERY_RESULT_EXTENDED = True # Adds Task name, args, kwargs to results.

# The following settings are usually useful for development, but not for production.
CELERY_TASK_ALWAYS_EAGER = is_truthy(os.environ.get("CELERY_TASK_ALWAYS_EAGER", "N"))
Collaborator

We seem to use actual bools as the default param to is_truthy(os.environ.get()) combos. Worth sticking with that convention?

Contributor Author

Good point - this was actually a source of bugs, and I previously committed a change to master so that is_truthy returns False if the value passed to it is not truthy.

This is to solve a bug where I had a line like:

CELERY_TASK_ALWAYS_EAGER = is_truthy(os.environ.get("CELERY_TASK_ALWAYS_EAGER"))

This defaulted to True because is_truthy(None) was called, which then reaches the line:

return str(value).lower() not in ("n", "no", "off", "f", "false", "0")

Since str(value) was "None", it returned True.
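
The failure mode described above can be reproduced in a few lines. This is only a sketch: `is_truthy_old` wraps the line quoted above, while `is_truthy_strict` and its list of accepted values are assumptions made for illustration, not the change that was committed to master.

```python
def is_truthy_old(value):
    # The line quoted above: anything not in the "falsy" list counts as True.
    return str(value).lower() not in ("n", "no", "off", "f", "false", "0")


def is_truthy_strict(value):
    # Hypothetical stricter variant: only explicitly truthy strings count.
    # The accepted values are an assumption, not the project's actual list.
    return str(value).lower() in ("y", "yes", "on", "t", "true", "1")


# os.environ.get() returns None for a variable that is not set.
unset = None
print(is_truthy_old(unset))     # str(None) is "None", which is not in the falsy list
print(is_truthy_strict(unset))  # "none" is not in the truthy list
```

Passing a real bool as the default to `os.environ.get()`, as suggested in the review comment, avoids relying on how `None` is stringified at all.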

@stuaxo stuaxo force-pushed the TP2000-360 branch 4 times, most recently from 392667e to fca35a1 Compare August 10, 2022 17:02
…o celery along with info learned deploying this.

Avoid filling the task queue with orchestration tasks and starving the workers.
===============================================================================

In the previous system there were about 3 layers of tasks that orchestrated other tasks
by using the .replace() API in each task.

Unfortunately it was possible for celery workers to become full of orchestration tasks,
leaving no room for the business rule tasks at the bottom to actually run.

This PR attempts two mitigations:

1. Use celery workflows instead of .replace()

This PR builds a celery workflow in check_workbasket using celery constructs such as chain and group.
In theory, since most of the work is done ahead of time, the system should have more awareness of the task structure, avoiding the issue of starvation.

2. Cancel existing workbasket checks when a new check is requested.

When check_workbasket is started, it will attempt to revoke existing check_workbasket tasks for the same workbasket.

Treat intermediate data structures as ephemeral
===============================================

A celery task may execute at any time: right now, or when a system comes up tomorrow. Based on this assumption, models such as TrackedModelCheck (which stores the result of a business rule check on a TrackedModel) are no longer passed to celery tasks by ID. Instead, all the information needed to recreate the data is passed to the celery task, which means the system will still work even if developers delete these models while it is running.

Reduce layers in business rule checking
=======================================

BusinessRuleChecker and LinkedModelsBusinessRuleChecker are now the only checkers. They now take BusinessRule instances instead of being subclassed for each business rule.
While more parameters are passed when rules are checked, a conceptual layer has been removed, and the simplification is reflected in around 20 lines of code being removed from checks.py.

Celery flower is now much easier to read
========================================
Due to the changes above, the output in celery flower should correspond more closely to a user's intentions - ids of models.

Content Checksums
=================

Result caching now validates using checksums of the content, which should reduce the amount of checking the system needs to do.

When a workbasket has been published, its content could invalidate some content in other unpublished workbaskets. By associating business rule checks with checksums of a model's content, any models that do not clash can be skipped.

Model checksums (generated by `.content_hash()`) are not currently stored in the database (though it may be desirable to store them on TrackedModels, as it would provide a mechanism to address any content in the system).
The checksumming scheme is a combination of the type and a sha256 of the fields in `.copyable_fields` (which should represent the fields a user can edit, but not fields such as pk).
Blake3 was tested, as it provides a fast hashing algorithm, but in practice it didn't provide much of a speedup over sha256.

PK ranges
=========

Occasionally workbaskets with many items may need to be checked (the initial workbasket has 9 million items).
Based on the observation that the ID column of the contained TrackedModels is mostly contiguous, the system allows passing sequences of contiguous TrackedModels specified by tuples of (first_pk, last_pk).
This is relatively compact, suitable for passing over the network with celery and readable in Celery flower.

This also enables chunking of tasks - further enabled by specifying a maximum number of items in each tuple.

On TrackedModelQueryset `.as_pk_intervals` and `.from_pk_intervals` are provided to go to and from this format.
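
The interval format can be sketched in plain Python. The functions below are stand-ins that work on iterables of ints rather than querysets; the `chunk_size` parameter name and the exact behaviour are assumptions for illustration, not the signatures of the methods on TrackedModelQueryset.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Interval = Tuple[int, int]


def as_pk_intervals(
    pks: Iterable[int], chunk_size: Optional[int] = None
) -> List[Interval]:
    """Collapse pks into (first_pk, last_pk) tuples of contiguous runs.

    If chunk_size is given, no tuple covers more than chunk_size pks.
    """
    intervals: List[Interval] = []
    first = last = None
    for pk in sorted(set(pks)):
        if first is None:
            first = last = pk
        elif pk == last + 1 and (chunk_size is None or pk - first < chunk_size):
            # Still contiguous and within the chunk limit: extend the run.
            last = pk
        else:
            intervals.append((first, last))
            first = last = pk
    if first is not None:
        intervals.append((first, last))
    return intervals


def from_pk_intervals(intervals: Iterable[Interval]) -> Iterator[int]:
    """Expand (first_pk, last_pk) tuples back into individual pks."""
    for first, last in intervals:
        yield from range(first, last + 1)


pks = [1, 2, 3, 4, 5, 9, 10]
print(as_pk_intervals(pks))                # [(1, 5), (9, 10)]
print(as_pk_intervals(pks, chunk_size=3))  # [(1, 3), (4, 5), (9, 10)]
```

Each tuple is small enough to pass as a task argument, and with a chunk size set each tuple could become the input to a separate task.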
@stuaxo stuaxo changed the title TP2000 360 TP2000 360 - Refactor business rule celery workflow, use content hashing to cache business rule logic. Aug 11, 2022
@stuaxo stuaxo changed the title TP2000 360 - Refactor business rule celery workflow, use content hashing to cache business rule logic. TP2000 360 - Refactor automated business rule checks. Aug 12, 2022
@stuaxo stuaxo force-pushed the TP2000-360 branch 2 times, most recently from c0f5168 to 3106ac4 Compare August 26, 2022 12:26
Migrate TrackedModelChecks to new structure.
Remove TransactionCheck.

Start moving business rules into the database, and provide sync_business_rules to do that, along with a mechanism to do this in tests.
2 participants