New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add framework to write more robust action scheduler based migrations. #33233
Conversation
d03ec96
to
c2c2372
Compare
c2c2372
to
9a50012
Compare
Awesome work, however I'd like to suggest some structural changes that would make it even more awesome 🙂Additionally, I've left some smaller suggestions related to concrete pieces of code. Moving responsibilitiesI feel like the processors themselves are doing too much. Things like measuring the execution time and logging errors seem to be orthogonal to what pure "batch processing" means, and are best suited for the controller itself. Thus, I suggest to leave the base processor class as an interface class like this: abstract class BatchProcessor {
public function get_name() : string;
public function get_description() : string;
public function get_default_batch_size() : int;
public function get_total_pending_count() : int;
public function get_next_batch_to_process(int $size) : array;
public function process_batch(array $batch) : void;
} The responsibilities that are taken away from the processor go to the controller as follows:
Note also that with this structure:
An advantage of keeping the processors as (structurally) simple as possible is ease of maintainability: we can iterate on the controller as much as we need in successive pull requests without risking to break the already existing processors. Also we want to allow (and maybe even encourage) 3pd to develop their own solutions based on the WooCommerce batch processing engine (doesn't that name sound awesomely cool?), so the easiest it is to do that the better. Enqueueing vs registeringThere's a If that's the case it looks like instead of allowing to enqueue processors we could allow to register them. For example with a Even if we decide that performance wise this is not advisable, we could still require that any processor that is passed to If we go for the registration paradigm we would need an additional Additional hooksI think we could add a couple of hooks to complete the infrastructure:
RobustnessWe are catching errors thrown by a batch processing step, but not "higher level" errors such as a scheduled action timing out; I'm not sure if the intent is to implement the handling of these in this pull request or in a separate one. |
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
1732db7
to
bf13363
Compare
bf13363
to
bca535a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving responsibilities
Thanks for the suggestion, I have moved the methods around, BatchProcessor should be pretty lightweight now (and is stateless).
Enqueueing vs registering
Enqueuing works like registering as you suggest. BatchProcessingController
class is always aware of pending and scheduled processes. To maintain performance, the always-on
action only starts when a batch process is enqueued for the first time, and when everything is done it also goes away.
Since it's maintaining a state (and saving it in an option) we can show all pending actions in a nice UI later on if we want to.
Additional hooks
wc_batch_processing_batch_size
I am not sure about this, because the way I am imagining this, we will control the batch size ourselves depending upon how much time a process eventually. Default batch size is just a suggestion to start with, adding a filter complicates it as we won't have control over that algorithm in the future.
wc_batch_processing_finished
Let's add this one when we need it, right now we have a method that is called when a batch completes. I am hoping that we won't need this action because there is already a method which will be called in the BatchProcessor class when it's completed.
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Updates/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
Thanks for the changes, and what you say about my "Enqueueing vs registering" and "Additional hooks" comments makes sense. I still miss the "Robustness" part, though; what's going to happen if a batch times out or throws an exception? Won't that interrupt the entire process? |
* | ||
* @return array Batch of records. | ||
*/ | ||
abstract public function get_batch_data( int $size, $last_processed ) : array; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the method name: I still thing that get_next_batch_to_process
better conveys the purpose of the method, as get_batch_data
sounds too generic (batch data? Which batch data?)
About the $last_processed
argument: I get the intent, you want to shield the processor class from having to store any kind of intermediate state. However I think there's a better way to achieve this: a &$state
parameter passed by reference. Pass it as null
the first time, let the processor set/alter it as it wishes, and just store it treating it as an opaque value. Bonus points if you pass the same parameter to process_for_batch
and to get_total_pending_count
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the method name in https://github.com/woocommerce/woocommerce/pull/33233/files#diff-1e070cc170d3b65cd6ba61b60cbd0697c8431fda674b4720926c8fbc1afecce7R297. I agree about the state param, but then we would have to manage it's safe serialization and vice-versa, so let's add it when we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case maybe we can just remove the parameter, since:
- We don't know what the processors will return for processing (each item of a "batch" can be anything) so we'll have the same problem with (de)serialization.
- The processor itself can store any needed state data by itself (same reasoning as in the case of the cleanup after no more items are left to process).
- The only processor instance we have so far, the one for orders, doesn't use it at all (and this is a good example of point 2).
So I think that either we provide an opaque state object that we pass around, dealing with any (de)serialization that might be needed, or we don't provide anything at all. That "last processed" argument feels like we are assuming how the processor behaves (what if the batch data it returns, or the processing it performs, isn't sequential?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and lean more towards not having a state to pass at all. I have removed the last_processed
argument in 311f014
plugins/woocommerce/src/Internal/Utilities/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Utilities/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/Utilities/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
One more thing: maybe these new classes should go into a separate directory/namespace, e.g. |
Test Results SummaryCommit SHA: b2236a8
To view the full E2E test report, click here. To view all test reports, visit the WooCommerce Test Reports Dashboard. |
So to make things robust, we currently run two processes, one is the normal batch process |
hey @Konamiman this is ready for another review (subjected to tests passing) |
94c9e69
to
f92bb65
Compare
So if I understood correctly:
Even then, I think it would be convenient to detect timeouts (I guess AS has some dedicated API for that?) and maybe trigger a dedicated action passing the processor class name and the offending batch, so that controllers can throttle back the batch size. |
plugins/woocommerce/src/Internal/BatchProcessing/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/BatchProcessing/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
/** | ||
* Filters the instance of processor for current class name. | ||
* | ||
* @since 6.7.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to be more like 6.8.0 at this point.
* | ||
* @return array Batch of records. | ||
*/ | ||
abstract public function get_batch_data( int $size, $last_processed ) : array; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case maybe we can just remove the parameter, since:
- We don't know what the processors will return for processing (each item of a "batch" can be anything) so we'll have the same problem with (de)serialization.
- The processor itself can store any needed state data by itself (same reasoning as in the case of the cleanup after no more items are left to process).
- The only processor instance we have so far, the one for orders, doesn't use it at all (and this is a good example of point 2).
So I think that either we provide an opaque state object that we pass around, dealing with any (de)serialization that might be needed, or we don't provide anything at all. That "last processed" argument feels like we are assuming how the processor behaves (what if the batch data it returns, or the processing it performs, isn't sequential?)
plugins/woocommerce/src/Internal/BatchProcessing/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
plugins/woocommerce/src/Internal/BatchProcessing/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
$processor = new $processor_class_name(); | ||
} | ||
if ( ! is_a( $processor, BatchProcessorInterface::class ) ) { | ||
throw new \Exception( 'Unable to initialize batch processor instance.' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add the class name to the exception message to make troubleshooting easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Done in 311f014
plugins/woocommerce/src/Internal/BatchProcessing/BatchProcessingController.php
Outdated
Show resolved
Hide resolved
* @param array $batch Batch that finished processing. | ||
*/ | ||
protected function log_error( \Exception $error, BatchProcessorInterface $batch_processor, array $batch ) : void { | ||
$batch_detail_string = ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for composing this string:
- Check if all the batch items are numbers, and if so, use ArrayUtil::to_ranges_string
- If not, check if first and last items are numbers or strings, and include them.
- If not, revert to
print_r
or maybe even do not include nothing.
We can add some documentation recommending using numbers whenever possible, or strings if not, for the returned batches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if going through all data in batches and type checking will be a bit of a performance issue? I have added a filter on the final log message in efb29e5, maybe that's enough for this PR. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind that this code is going to run only in case of error, so it shouldn't run often and it it runs it means that something is broken and thus performance won't really matter. But sure, a filter will do for now.
Parent cron manager should be scheduled only after there is atleast one update needed to be scheduled. First cron acitons schedules immediately, afterwards it's only every hour.
efb29e5
to
4715fd2
Compare
I think this is ready to go as soon as the errors in the build are fixed (if these are really bugs in this PR) |
Hi @Konamiman, thanks for merging this pull request. Please take a look at these follow-up tasks you may need to perform:
|
All Submissions:
Changes proposed in this Pull Request:
From time to time, we have had the need to migrate a lot of data, either to power features or to fix bugs. However, we don’t have a shared mechanism to do this, so every developer working on implementing any kind of migration would do it from scratch. Since the developer implements it from scratch, we often also push bugs specific to the migration routine itself that we have previously solved before.
In this PR, I propose implementing a lightweight framework to write a migration that we can use most of the time. This framework will consider meta aspects of migration such as scheduling, re-entring, pausing, resuming, etc. We, anyway, need to work on making migrations more robust as part of the Custom order tables project, so it’s a good time to make it re-usable as we implement it.
We can create two classes –
BatchProcessingController
andBatchProcessor
, where:BatchProcessor
will be an abstract class that will provide a framework for implementing details specific to migration. Such as methods to process batches. All the meta details like starting the process, error handling, updating progress status, etc., will be already implemented.BatchProcessingController
class will take care of running the migration as needed, starting again if it gets stuck, scheduling in the action scheduler queue, etc. This also implements a periodic check for all pending migration and schedules action for error’ed migrations as well.In other words, we divide actual migration logic and logic for scheduling migration into two separate concerns. While the actual migration logic changes on case to case basis, the logic for scheduling the migrations can remain the same.
This PR also modifies custom order table migrations to use this common framework.
Closes #32922 .
How to test the changes in this Pull Request:
The easiest way to test is to run the COT migration from the interface -
Other information:
pnpm nx changelog <project>
?FOR PR REVIEWER ONLY: