
Add framework to write more robust action scheduler based migrations. #33233

Merged
merged 27 commits into trunk from enhancement/32922 on Jul 19, 2022

Conversation

vedanshujain
Contributor

@vedanshujain vedanshujain commented May 27, 2022

All Submissions:

Changes proposed in this Pull Request:

From time to time we need to migrate a lot of data, either to power features or to fix bugs. However, we don’t have a shared mechanism for doing this, so every developer implementing a migration has to write it from scratch. Because each migration is written from scratch, we often reintroduce bugs in the migration routine itself that we have already solved before.

In this PR, I propose implementing a lightweight framework for writing migrations that we can use most of the time. This framework takes care of the meta aspects of a migration, such as scheduling, re-entrancy, pausing, resuming, etc. We need to make migrations more robust as part of the Custom Order Tables project anyway, so it’s a good time to make that work reusable as we implement it.

We can create two classes, BatchProcessingController and BatchProcessor, where:

  1. BatchProcessor will be an abstract class providing the framework for implementing the details specific to a migration, such as the methods to process batches. The meta details (starting the process, error handling, updating progress status, etc.) will already be implemented.
  2. The BatchProcessingController class will take care of running the migration as needed: restarting it if it gets stuck, scheduling it in the Action Scheduler queue, etc. It also implements a periodic check for all pending migrations and schedules actions for errored migrations.

In other words, we split the actual migration logic and the logic for scheduling migrations into two separate concerns. While the migration logic changes on a case-by-case basis, the scheduling logic can stay the same.
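
As a rough sketch (the exact method names may differ in the final implementation, and the meta keys and queries below are made up purely for illustration), a concrete processor could look something like this:

// Illustrative sketch only: a hypothetical processor that backfills an order meta key.
class BackfillOrderMetaProcessor extends BatchProcessor {

	public function get_name() : string {
		return 'Backfill order meta';
	}

	public function get_description() : string {
		return 'Copies a legacy order meta key into a new meta key.';
	}

	public function get_default_batch_size() : int {
		return 100;
	}

	public function get_total_pending_count() : int {
		global $wpdb;
		// Count of orders still missing the new meta key (illustrative query).
		return (int) $wpdb->get_var(
			"SELECT COUNT(*) FROM {$wpdb->posts} p
			 WHERE p.post_type = 'shop_order'
			 AND NOT EXISTS ( SELECT 1 FROM {$wpdb->postmeta} pm WHERE pm.post_id = p.ID AND pm.meta_key = '_new_meta_key' )"
		);
	}

	public function get_next_batch_to_process( int $size ) : array {
		global $wpdb;
		// Next $size order IDs that still need processing.
		return array_map(
			'intval',
			$wpdb->get_col(
				$wpdb->prepare(
					"SELECT p.ID FROM {$wpdb->posts} p
					 WHERE p.post_type = 'shop_order'
					 AND NOT EXISTS ( SELECT 1 FROM {$wpdb->postmeta} pm WHERE pm.post_id = p.ID AND pm.meta_key = '_new_meta_key' )
					 ORDER BY p.ID ASC LIMIT %d",
					$size
				)
			)
		);
	}

	public function process_batch( array $batch ) : void {
		foreach ( $batch as $order_id ) {
			update_post_meta( $order_id, '_new_meta_key', get_post_meta( $order_id, '_legacy_meta_key', true ) );
		}
	}
}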

This PR also modifies custom order table migrations to use this common framework.

Closes #32922 .

How to test the changes in this Pull Request:

The easiest way to test is to run the COT migration from the admin interface:

  1. Go to WooCommerce > Status > Tools and delete and recreate custom order tables.
  2. Go to WooCommerce > Settings > Advanced > Custom Data Stores and start the migration process.
  3. Verify that migration starts and completes as expected.

Other information:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes, as applicable?
  • Have you successfully run tests with your changes locally?
  • Have you created a changelog file for each project being changed, ie pnpm nx changelog <project>?

FOR PR REVIEWER ONLY:

  • I have reviewed that everything is sanitized/escaped appropriately for any SQL or XSS injection possibilities. I made sure Linting is not ignored or disabled.

@github-actions github-actions bot added the plugin: woocommerce Issues related to the WooCommerce Core plugin. label May 27, 2022
@woocommercebot woocommercebot added the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jun 14, 2022
@vedanshujain vedanshujain marked this pull request as ready for review June 14, 2022 15:37
@vedanshujain vedanshujain requested review from a team and Konamiman and removed request for a team June 14, 2022 15:37
@Konamiman
Contributor

Konamiman commented Jun 21, 2022

Awesome work! However, I'd like to suggest some structural changes that would make it even more awesome 🙂 Additionally, I've left some smaller suggestions related to concrete pieces of code.

Moving responsibilities

I feel like the processors themselves are doing too much. Things like measuring the execution time and logging errors seem to be orthogonal to what pure "batch processing" means, and are best suited for the controller itself.

Thus, I suggest leaving the base processor class as a pure interface, like this:

abstract class BatchProcessor {
	abstract public function get_name() : string;
	abstract public function get_description() : string;
	abstract public function get_default_batch_size() : int;
	abstract public function get_total_pending_count() : int;
	abstract public function get_next_batch_to_process( int $size ) : array;
	abstract public function process_batch( array $batch ) : void;
}

The responsibilities that are taken away from the processor go to the controller as follows:

  • Measuring execution time, logging errors: right after executing process_batch.
  • Deciding whether further batches need to be processed and thus if rescheduling is needed or not: this can be decided by checking if get_total_pending_count returns zero after executing process_batch.
  • Deciding the size of the batch to process: although the processors can give a hint with get_default_batch_size, the controller itself should have the final say on this (the $size parameter in get_next_batch_to_process); this allows for easy customization via a setting or filter.
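
A rough sketch of what the controller side could look like under this split (the method names below are illustrative, not an actual implementation):

// Sketch only: how the controller could wrap a single batch run under this split.
protected function process_next_batch_for( BatchProcessor $processor ) : void {
	// The controller has the final say on the batch size; this value could be filtered or come from a setting.
	$batch_size = $processor->get_default_batch_size();
	$batch      = $processor->get_next_batch_to_process( $batch_size );

	$start_time = microtime( true );
	try {
		$processor->process_batch( $batch );
	} catch ( \Exception $exception ) {
		// Time measuring and error logging live in the controller, not in the processor.
		wc_get_logger()->error( $exception->getMessage(), array( 'source' => 'batch-processing' ) );
	}
	$elapsed_time = microtime( true ) - $start_time;
	// $elapsed_time could be recorded and used later to tune the batch size.

	// Reschedule only if the processor still reports pending items.
	if ( $processor->get_total_pending_count() > 0 ) {
		$this->schedule_next_batch( $processor ); // Hypothetical helper around as_schedule_single_action().
	}
}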

Note also that with this structure:

  • Processors can be stateless, but they can store state data by themselves if they need to.
  • Processors can still do any required cleanup if, at the end of process_batch, they find that there are no more pending items to process.

An advantage of keeping the processors as (structurally) simple as possible is maintainability: we can iterate on the controller as much as we need in successive pull requests without risking breaking the already existing processors. Also, we want to allow (and maybe even encourage) third-party developers to build their own solutions on top of the WooCommerce batch processing engine (doesn't that name sound awesomely cool?), so the easier that is to do, the better.

Enqueueing vs registering

There's an enqueue_processor method in the controller that is used to make the engine aware of a processor that has pending changes. However, the pull request description says that the controller "implements a periodic check for all pending migrations and schedules actions for errored migrations", so I guess there's an "always on" action that permanently cycles through any processors that were enqueued in the past to check whether there are new items ready to be processed.

If that's the case, it looks like instead of allowing processors to be enqueued we could allow them to be registered, for example with a wc_registered_batch_processors filter that receives and returns an array of processor class names. The "always on" scheduled action would then just run get_total_pending_count on all the registered processors and run process_batch on those for which it returns non-zero.

Even if we decide that performance-wise this is not advisable, we could still require that any processor passed to enqueue_processor must have been previously registered via said filter. This allows showing (via UI or code) a nice list of known processors, together with their names and descriptions, default batch sizes, status (count of pending items) and last/average batch processing time. I'm not saying that all of this should be implemented in this pull request, but laying a solid foundation for being able to do that later is important.

If we go for the registration paradigm we would need an additional get_registered_processors method in the controller, as a complement to get_pending.
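
Purely as an illustration of the registration idea (the filter name is the one I'm proposing here, not something that exists yet), a third party would do something like:

// A third-party plugin registering its own processor via the proposed filter.
add_filter(
	'wc_registered_batch_processors',
	function( array $processor_class_names ) : array {
		$processor_class_names[] = \MyPlugin\OrdersBackfillProcessor::class; // Hypothetical processor class.
		return $processor_class_names;
	}
);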

Additional hooks

I think we could add a couple of hooks to complete the infrastructure:

  • wc_batch_processing_batch_size: A filter that gets the processor instance and the default batch size given by the class, and returns the actual batch size to use. Alternatively this could be handled via settings, but then there should be one setting per processor class.
  • wc_batch_processing_finished: An action triggered after running process_batch on a processor. It would receive (as arguments or in a single associative array) the processor instance, the processed batch, the elapsed time, and the error if there was one.
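
For illustration, a consumer of these hooks could do something like the following (both hook names are my suggestion above, not necessarily something that will exist in core; see the reply below):

// Hypothetical usage of the proposed filter: halve the batch size for a known heavy processor.
add_filter(
	'wc_batch_processing_batch_size',
	function( int $default_size, $processor ) : int {
		return is_a( $processor, \MyPlugin\HeavyProcessor::class ) ? max( 1, (int) ( $default_size / 2 ) ) : $default_size;
	},
	10,
	2
);

// Hypothetical usage of the proposed action: log failed batches.
add_action(
	'wc_batch_processing_finished',
	function( $processor, array $batch, float $elapsed_time, $error ) {
		if ( null !== $error ) {
			wc_get_logger()->warning( sprintf( 'Batch of %d items failed after %.2fs.', count( $batch ), $elapsed_time ) );
		}
	},
	10,
	4
);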

Robustness

We are catching errors thrown by a batch processing step, but not "higher level" errors such as a scheduled action timing out; I'm not sure if the intent is to implement the handling of these in this pull request or in a separate one.

@woocommercebot woocommercebot removed the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jun 24, 2022
Contributor Author

@vedanshujain vedanshujain left a comment


Moving responsibilities

Thanks for the suggestion, I have moved the methods around, BatchProcessor should be pretty lightweight now (and is stateless).

Enqueueing vs registering

Enqueuing now works like the registering you suggest. The BatchProcessingController class is always aware of pending and scheduled processes. To maintain performance, the always-on action only starts when a batch process is enqueued for the first time, and it goes away again once everything is done.

Since it's maintaining a state (and saving it in an option) we can show all pending actions in a nice UI later on if we want to.
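
For illustration, the state keeping could look roughly like this (the option and method names are just a sketch, not necessarily the real ones):

// The list of pending processor class names is kept in a single option, so it survives across requests.
private function get_pending_processes() : array {
	return array_filter( (array) get_option( 'wc_pending_batch_processes', array() ) );
}

// Enqueuing a processor adds it to that list (and would also kick off the scheduled action).
private function mark_as_pending( string $processor_class_name ) : void {
	$pending = $this->get_pending_processes();
	if ( ! in_array( $processor_class_name, $pending, true ) ) {
		$pending[] = $processor_class_name;
		update_option( 'wc_pending_batch_processes', $pending, false );
	}
}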

Additional hooks

wc_batch_processing_batch_size

I am not sure about this, because the way I am imagining it, we will control the batch size ourselves depending upon how much time a batch eventually takes. The default batch size is just a suggestion to start with; adding a filter complicates things because we would lose control over that algorithm in the future.
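
Just to illustrate the kind of control I have in mind (the method name, the stored metric and the thresholds below are made up, not the actual algorithm):

// Sketch only: tune the batch size from observed batch timings.
protected function get_batch_size_for( BatchProcessorInterface $processor ) : int {
	$size         = $processor->get_default_batch_size();
	$average_time = $this->get_average_batch_time( $processor ); // Hypothetical stored metric, in seconds.

	if ( $average_time > 20 ) {
		$size = max( 1, (int) ( $size / 2 ) ); // Batches are taking too long, shrink them.
	} elseif ( $average_time > 0 && $average_time < 5 ) {
		$size = $size * 2; // Plenty of headroom, grow them.
	}

	return $size;
}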

wc_batch_processing_finished

Let's add this one when we need it. I am hoping that we won't need this action, because there is already a method in the BatchProcessor class that is called when a batch completes.

@Konamiman
Contributor

Thanks for the changes, and what you say about my "Enqueueing vs registering" and "Additional hooks" comments makes sense. I still miss the "Robustness" part, though; what's going to happen if a batch times out or throws an exception? Won't that interrupt the entire process?

*
* @return array Batch of records.
*/
abstract public function get_batch_data( int $size, $last_processed ) : array;
Contributor

@Konamiman Konamiman Jun 28, 2022


About the method name: I still think that get_next_batch_to_process better conveys the purpose of the method, as get_batch_data sounds too generic (batch data? Which batch data?).

About the $last_processed argument: I get the intent, namely to shield the processor class from having to store any kind of intermediate state. However, I think there's a better way to achieve this: a &$state parameter passed by reference. Pass it as null the first time, let the processor set/alter it as it wishes, and just store it, treating it as an opaque value. Bonus points if you pass the same parameter to process_for_batch and to get_total_pending_count too.
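
Something like this, as a signature sketch only (the controller would treat $state as opaque, passing null the first time and persisting whatever the processor stores in it between batches):

abstract public function get_next_batch_to_process( int $size, &$state ) : array;
abstract public function process_batch( array $batch, &$state ) : void;
abstract public function get_total_pending_count( &$state ) : int;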

Contributor Author


I have changed the method name in https://github.com/woocommerce/woocommerce/pull/33233/files#diff-1e070cc170d3b65cd6ba61b60cbd0697c8431fda674b4720926c8fbc1afecce7R297. I agree about the state param, but then we would have to manage its safe serialization and deserialization, so let's add it when we need it.

Contributor


In that case maybe we can just remove the parameter, since:

  1. We don't know what the processors will return for processing (each item of a "batch" can be anything) so we'll have the same problem with (de)serialization.
  2. The processor itself can store any needed state data by itself (same reasoning as in the case of the cleanup after no more items are left to process).
  3. The only processor instance we have so far, the one for orders, doesn't use it at all (and this is a good example of point 2).

So I think that either we provide an opaque state object that we pass around, dealing with any (de)serialization that might be needed, or we don't provide anything at all. That "last processed" argument feels like we are assuming how the processor behaves (what if the batch data it returns, or the processing it performs, isn't sequential?)

Contributor Author


I agree and lean more towards not having a state to pass at all. I have removed the last_processed argument in 311f014

@Konamiman
Contributor

Konamiman commented Jun 29, 2022

One more thing: maybe these new classes should go into a separate directory/namespace, e.g. Internal\BatchProcessing. Utilities is more suited for classes with small and often static utility methods to be used freely across the codebase.

@github-actions
Contributor

github-actions bot commented Jul 11, 2022

Test Results Summary

Commit SHA: b2236a8

Test      | Passed ✅ | Failed 🚨 | Broken 🚧 | Skipped ⏭️ | Unknown ❔ | Total 📊 | Duration ⏱️
API Tests | 115 | 0 | 0 | 2 | 0 | 117 | 1m 0s
E2E Tests | 185 | 0 | 0 | 1 | 0 | 186 | 14m 19s
To view the full API test report, click here.
To view the full E2E test report, click here.
To view all test reports, visit the WooCommerce Test Reports Dashboard.

@vedanshujain
Contributor Author

I still miss the "Robustness" part, though; what's going to happen if a batch times out or throws an exception?

So to make things robust, we currently run two processes. One is the normal batch process, SINGLE_BATCH_PROCESS_ACTION, which schedules itself again after processing a batch. Additionally, we run another process, CONTROLLER_CRON_NAME, which periodically checks whether there is a pending process that is not scheduled (and schedules it if not).
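
Roughly, the watchdog does something like this (a sketch with illustrative names, not the exact code):

// Runs on the CONTROLLER_CRON_NAME recurring action: reschedule any pending process whose
// self-rescheduling batch action died (timeout, fatal error, etc.).
public function handle_watchdog_action() : void {
	foreach ( $this->get_pending_processes() as $processor_class_name ) {
		if ( false === as_next_scheduled_action( self::SINGLE_BATCH_PROCESS_ACTION, array( $processor_class_name ) ) ) {
			as_schedule_single_action( time(), self::SINGLE_BATCH_PROCESS_ACTION, array( $processor_class_name ) );
		}
	}
}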

@vedanshujain
Contributor Author

Hey @Konamiman, this is ready for another review (subject to tests passing).

@Konamiman
Contributor

So to make things robust, we currently run two processes. One is the normal batch process, SINGLE_BATCH_PROCESS_ACTION, which schedules itself again after processing a batch. Additionally, we run another process, CONTROLLER_CRON_NAME, which periodically checks whether there is a pending process that is not scheduled (and schedules it if not).

So if I understood correctly:

  1. If processing a batch throws an exception, that's going to be caught and logged, and then the next batch will be scheduled as usual.
  2. If processing a batch causes the action to time out, CONTROLLER_CRON_NAME will detect that in at most one hour and will reschedule the action.

Even then, I think it would be convenient to detect timeouts (I guess AS has some dedicated API for that?) and maybe trigger a dedicated action passing the processor class name and the offending batch, so that controllers can throttle back the batch size.

/**
* Filters the instance of processor for current class name.
*
* @since 6.7.0.
Contributor


I think this is going to be more like 6.8.0 at this point.


$processor = new $processor_class_name();
}
if ( ! is_a( $processor, BatchProcessorInterface::class ) ) {
throw new \Exception( 'Unable to initialize batch processor instance.' );
Contributor


Let's add the class name to the exception message to make troubleshooting easier?
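
Something along these lines (just a sketch):

throw new \Exception( "Unable to initialize batch processor instance for class '$processor_class_name'." );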

Contributor Author


👍 Done in 311f014

* @param array $batch Batch that finished processing.
*/
protected function log_error( \Exception $error, BatchProcessorInterface $batch_processor, array $batch ) : void {
$batch_detail_string = '';
Contributor


Suggestion for composing this string:

  1. Check if all the batch items are numbers, and if so, use ArrayUtil::to_ranges_string.
  2. If not, check if the first and last items are numbers or strings, and include them.
  3. If not, fall back to print_r, or maybe don't include anything at all.

We can add some documentation recommending using numbers whenever possible, or strings if not, for the returned batches.
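
A sketch of what I mean (assuming ArrayUtil::to_ranges_string turns e.g. array( 1, 2, 3, 7 ) into something like '1-3, 7'):

private function get_batch_detail_string( array $batch ) : string {
	if ( empty( $batch ) ) {
		return '';
	}
	// 1. All items are numbers: compact them into ranges.
	if ( count( array_filter( $batch, 'is_numeric' ) ) === count( $batch ) ) {
		return ArrayUtil::to_ranges_string( $batch );
	}
	// 2. First and last items are numbers or strings: include just those.
	$first = reset( $batch );
	$last  = end( $batch );
	if ( ( is_numeric( $first ) || is_string( $first ) ) && ( is_numeric( $last ) || is_string( $last ) ) ) {
		return "first item: {$first}, last item: {$last}";
	}
	// 3. Fall back to a full dump.
	// phpcs:ignore WordPress.PHP.DevelopmentFunctions.error_log_print_r
	return print_r( $batch, true );
}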

Contributor Author


I wonder if going through all the data in a batch and type checking it would be a bit of a performance issue? I have added a filter on the final log message in efb29e5; maybe that's enough for this PR. wdyt?

Contributor


Keep in mind that this code is only going to run in case of error, so it shouldn't run often; and if it runs, it means that something is broken, so performance won't really matter. But sure, a filter will do for now.

@woocommercebot woocommercebot added the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jul 13, 2022
@Konamiman Konamiman self-requested a review July 18, 2022 06:47
@Konamiman
Contributor

I think this is ready to go as soon as the errors in the build are fixed (if these are really bugs in this PR)

@woocommercebot woocommercebot removed the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jul 19, 2022
@Konamiman Konamiman merged commit df1937d into trunk Jul 19, 2022
@Konamiman Konamiman deleted the enhancement/32922 branch July 19, 2022 12:42
@github-actions github-actions bot added this to the 6.9.0 milestone Jul 19, 2022
@github-actions
Contributor

Hi @Konamiman, thanks for merging this pull request. Please take a look at these follow-up tasks you may need to perform:

  • Add the release: add testing instructions label

@Konamiman Konamiman added the release: add testing instructions PRs that have not had testing instructions added to the wiki. [auto] label Jul 19, 2022