
Add framework to write more robust action scheduler based migrations. #33233

Merged
merged 27 commits into trunk from enhancement/32922 on Jul 19, 2022

Conversation

vedanshujain
Contributor

@vedanshujain vedanshujain commented May 27, 2022

All Submissions:

Changes proposed in this Pull Request:

From time to time we need to migrate a lot of data, either to power features or to fix bugs. However, we don’t have a shared mechanism for doing this, so every developer implementing a migration has to write it from scratch. Because each migration is written from scratch, we often reintroduce bugs in the migration routine itself that we have already solved before.

In this PR, I propose implementing a lightweight framework for writing migrations that we can use most of the time. This framework takes care of the meta aspects of a migration, such as scheduling, re-entrancy, pausing, resuming, etc. We need to make migrations more robust as part of the Custom Order Tables project anyway, so it’s a good time to make that work reusable as we implement it.

We can create two classes, BatchProcessingController and BatchProcessor, where:

  1. BatchProcessor will be an abstract class providing the framework for implementing the details specific to a migration, such as the methods to process batches. The meta details (starting the process, error handling, updating progress status, etc.) will already be implemented.
  2. The BatchProcessingController class will take care of running the migration as needed: restarting it if it gets stuck, scheduling it in the Action Scheduler queue, etc. It also implements a periodic check for all pending migrations and schedules actions for errored migrations.

In other words, we split the actual migration logic and the logic for scheduling migrations into two separate concerns. While the migration logic changes on a case-by-case basis, the scheduling logic can stay the same.
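
As a rough sketch (the exact method names may differ in the final implementation, and the meta keys and queries below are made up purely for illustration), a concrete processor could look something like this:

// Illustrative sketch only: a hypothetical processor that backfills an order meta key.
class BackfillOrderMetaProcessor extends BatchProcessor {

	public function get_name() : string {
		return 'Backfill order meta';
	}

	public function get_description() : string {
		return 'Copies a legacy order meta key into a new meta key.';
	}

	public function get_default_batch_size() : int {
		return 100;
	}

	public function get_total_pending_count() : int {
		global $wpdb;
		// Count of orders still missing the new meta key (illustrative query).
		return (int) $wpdb->get_var(
			"SELECT COUNT(*) FROM {$wpdb->posts} p
			 WHERE p.post_type = 'shop_order'
			 AND NOT EXISTS ( SELECT 1 FROM {$wpdb->postmeta} pm WHERE pm.post_id = p.ID AND pm.meta_key = '_new_meta_key' )"
		);
	}

	public function get_next_batch_to_process( int $size ) : array {
		global $wpdb;
		// Next $size order IDs that still need processing.
		return array_map(
			'intval',
			$wpdb->get_col(
				$wpdb->prepare(
					"SELECT p.ID FROM {$wpdb->posts} p
					 WHERE p.post_type = 'shop_order'
					 AND NOT EXISTS ( SELECT 1 FROM {$wpdb->postmeta} pm WHERE pm.post_id = p.ID AND pm.meta_key = '_new_meta_key' )
					 ORDER BY p.ID ASC LIMIT %d",
					$size
				)
			)
		);
	}

	public function process_batch( array $batch ) : void {
		foreach ( $batch as $order_id ) {
			update_post_meta( $order_id, '_new_meta_key', get_post_meta( $order_id, '_legacy_meta_key', true ) );
		}
	}
}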

This PR also modifies custom order table migrations to use this common framework.

Closes #32922 .

How to test the changes in this Pull Request:

The easiest way to test is to run the COT migration from the admin interface:

  1. Go to WooCommerce > Status > Tools and delete and recreate custom order tables.
  2. Go to WooCommerce > Settings > Advanced > Custom Data Stores and start the migration process.
  3. Verify that migration starts and completes as expected.

Other information:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes, as applicable?
  • Have you successfully run tests with your changes locally?
  • Have you created a changelog file for each project being changed, ie pnpm nx changelog <project>?

FOR PR REVIEWER ONLY:

  • I have reviewed that everything is sanitized/escaped appropriately for any SQL or XSS injection possibilities. I made sure Linting is not ignored or disabled.

@github-actions github-actions bot added the plugin: woocommerce Issues related to the WooCommerce Core plugin. label May 27, 2022
@woocommercebot woocommercebot added the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jun 14, 2022
@vedanshujain vedanshujain marked this pull request as ready for review June 14, 2022 15:37
@vedanshujain vedanshujain requested review from a team and Konamiman and removed request for a team June 14, 2022 15:37
@Konamiman
Contributor

Konamiman commented Jun 21, 2022

Awesome work! However, I'd like to suggest some structural changes that would make it even more awesome 🙂 Additionally, I've left some smaller suggestions related to concrete pieces of code.

Moving responsibilities

I feel like the processors themselves are doing too much. Things like measuring the execution time and logging errors seem to be orthogonal to what pure "batch processing" means, and are best suited for the controller itself.

Thus, I suggest leaving the base processor class as a pure interface, like this:

abstract class BatchProcessor {
	abstract public function get_name() : string;
	abstract public function get_description() : string;
	abstract public function get_default_batch_size() : int;
	abstract public function get_total_pending_count() : int;
	abstract public function get_next_batch_to_process( int $size ) : array;
	abstract public function process_batch( array $batch ) : void;
}

The responsibilities that are taken away from the processor go to the controller as follows:

  • Measuring execution time, logging errors: right after executing process_batch.
  • Deciding whether further batches need to be processed and thus if rescheduling is needed or not: this can be decided by checking if get_total_pending_count returns zero after executing process_batch.
  • Deciding the size of the batch to process: although the processors can give a hint with get_default_batch_size, the controller itself should have the final say on this (the $size parameter in get_next_batch_to_process); this allows for easy customization via a setting or filter.
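
A rough sketch of what the controller side could look like under this split (the method names below are illustrative, not an actual implementation):

// Sketch only: how the controller could wrap a single batch run under this split.
protected function process_next_batch_for( BatchProcessor $processor ) : void {
	// The controller has the final say on the batch size; this value could be filtered or come from a setting.
	$batch_size = $processor->get_default_batch_size();
	$batch      = $processor->get_next_batch_to_process( $batch_size );

	$start_time = microtime( true );
	try {
		$processor->process_batch( $batch );
	} catch ( \Exception $exception ) {
		// Time measuring and error logging live in the controller, not in the processor.
		wc_get_logger()->error( $exception->getMessage(), array( 'source' => 'batch-processing' ) );
	}
	$elapsed_time = microtime( true ) - $start_time;
	// $elapsed_time could be recorded and used later to tune the batch size.

	// Reschedule only if the processor still reports pending items.
	if ( $processor->get_total_pending_count() > 0 ) {
		$this->schedule_next_batch( $processor ); // Hypothetical helper around as_schedule_single_action().
	}
}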

Note also that with this structure:

  • Processors can be stateless, but they can store state data by themselves if they need to.
  • Processors can still do any required cleanup if, at the end of process_batch, they find that there are no more pending items to process.

An advantage of keeping the processors as (structurally) simple as possible is maintainability: we can iterate on the controller as much as we need in successive pull requests without risking breaking the already existing processors. Also, we want to allow (and maybe even encourage) third-party developers to build their own solutions on top of the WooCommerce batch processing engine (doesn't that name sound awesomely cool?), so the easier that is to do, the better.

Enqueueing vs registering

There's an enqueue_processor method in the controller that is used to make the engine aware of a processor that has pending changes. However, the pull request description says that the controller "implements a periodic check for all pending migrations and schedules actions for errored migrations", so I guess there's an "always on" action that permanently cycles through any processors that were enqueued in the past to check whether there are new items ready to be processed.

If that's the case, it looks like instead of allowing processors to be enqueued we could allow them to be registered, for example with a wc_registered_batch_processors filter that receives and returns an array of processor class names. The "always on" scheduled action would then just run get_total_pending_count on all the registered processors and run process_batch on those for which it returns non-zero.

Even if we decide that performance-wise this is not advisable, we could still require that any processor passed to enqueue_processor must have been previously registered via said filter. This allows showing (via UI or code) a nice list of known processors, together with their names and descriptions, default batch sizes, status (count of pending items) and last/average batch processing time. I'm not saying that all of this should be implemented in this pull request, but laying a solid foundation for being able to do that later is important.

If we go for the registration paradigm we would need an additional get_registered_processors method in the controller, as a complement to get_pending.
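
Purely as an illustration of the registration idea (the filter name is the one I'm proposing here, not something that exists yet), a third party would do something like:

// A third-party plugin registering its own processor via the proposed filter.
add_filter(
	'wc_registered_batch_processors',
	function( array $processor_class_names ) : array {
		$processor_class_names[] = \MyPlugin\OrdersBackfillProcessor::class; // Hypothetical processor class.
		return $processor_class_names;
	}
);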

Additional hooks

I think we could add a couple of hooks to complete the infrastructure:

  • wc_batch_processing_batch_size: A filter that gets the processor instance and the default batch size given by the class, and returns the actual batch size to use. Alternatively this could be handled via settings, but then there should be one setting per processor class.
  • wc_batch_processing_finished: An action triggered after running process_batch on a processor. It would receive (as arguments or in a single associative array) the processor instance, the processed batch, the elapsed time, and the error if there was one.
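
For illustration, a consumer of these hooks could do something like the following (both hook names are my suggestion above, not necessarily something that will exist in core; see the reply below):

// Hypothetical usage of the proposed filter: halve the batch size for a known heavy processor.
add_filter(
	'wc_batch_processing_batch_size',
	function( int $default_size, $processor ) : int {
		return is_a( $processor, \MyPlugin\HeavyProcessor::class ) ? max( 1, (int) ( $default_size / 2 ) ) : $default_size;
	},
	10,
	2
);

// Hypothetical usage of the proposed action: log failed batches.
add_action(
	'wc_batch_processing_finished',
	function( $processor, array $batch, float $elapsed_time, $error ) {
		if ( null !== $error ) {
			wc_get_logger()->warning( sprintf( 'Batch of %d items failed after %.2fs.', count( $batch ), $elapsed_time ) );
		}
	},
	10,
	4
);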

Robustness

We are catching errors thrown by a batch processing step, but not "higher level" errors such as a scheduled action timing out; I'm not sure if the intent is to implement the handling of these in this pull request or in a separate one.

@woocommercebot woocommercebot removed the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jun 24, 2022
Contributor Author

@vedanshujain vedanshujain left a comment


Moving responsibilities

Thanks for the suggestion, I have moved the methods around, BatchProcessor should be pretty lightweight now (and is stateless).

Enqueueing vs registering

Enqueuing now works like the registering you suggest. The BatchProcessingController class is always aware of pending and scheduled processes. To maintain performance, the always-on action only starts when a batch process is enqueued for the first time, and it goes away again once everything is done.

Since it's maintaining a state (and saving it in an option) we can show all pending actions in a nice UI later on if we want to.
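
For illustration, the state keeping could look roughly like this (the option and method names are just a sketch, not necessarily the real ones):

// The list of pending processor class names is kept in a single option, so it survives across requests.
private function get_pending_processes() : array {
	return array_filter( (array) get_option( 'wc_pending_batch_processes', array() ) );
}

// Enqueuing a processor adds it to that list (and would also kick off the scheduled action).
private function mark_as_pending( string $processor_class_name ) : void {
	$pending = $this->get_pending_processes();
	if ( ! in_array( $processor_class_name, $pending, true ) ) {
		$pending[] = $processor_class_name;
		update_option( 'wc_pending_batch_processes', $pending, false );
	}
}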

Additional hooks

wc_batch_processing_batch_size

I am not sure about this, because the way I am imagining it, we will control the batch size ourselves depending upon how much time a batch eventually takes. The default batch size is just a suggestion to start with; adding a filter complicates things because we would lose control over that algorithm in the future.
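
Just to illustrate the kind of control I have in mind (the method name, the stored metric and the thresholds below are made up, not the actual algorithm):

// Sketch only: tune the batch size from observed batch timings.
protected function get_batch_size_for( BatchProcessorInterface $processor ) : int {
	$size         = $processor->get_default_batch_size();
	$average_time = $this->get_average_batch_time( $processor ); // Hypothetical stored metric, in seconds.

	if ( $average_time > 20 ) {
		$size = max( 1, (int) ( $size / 2 ) ); // Batches are taking too long, shrink them.
	} elseif ( $average_time > 0 && $average_time < 5 ) {
		$size = $size * 2; // Plenty of headroom, grow them.
	}

	return $size;
}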

wc_batch_processing_finished

Let's add this one when we need it. I am hoping that we won't need this action, because there is already a method in the BatchProcessor class that is called when a batch completes.

@Konamiman
Contributor

Thanks for the changes, and what you say about my "Enqueueing vs registering" and "Additional hooks" comments makes sense. I still miss the "Robustness" part, though; what's going to happen if a batch times out or throws an exception? Won't that interrupt the entire process?

*
* @return array Batch of records.
*/
abstract public function get_batch_data( int $size, $last_processed ) : array;
Contributor

@Konamiman Konamiman Jun 28, 2022


About the method name: I still think that get_next_batch_to_process better conveys the purpose of the method, as get_batch_data sounds too generic (batch data? Which batch data?).

About the $last_processed argument: I get the intent, namely to shield the processor class from having to store any kind of intermediate state. However, I think there's a better way to achieve this: a &$state parameter passed by reference. Pass it as null the first time, let the processor set/alter it as it wishes, and just store it, treating it as an opaque value. Bonus points if you pass the same parameter to process_for_batch and to get_total_pending_count too.
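
Something like this, as a signature sketch only (the controller would treat $state as opaque, passing null the first time and persisting whatever the processor stores in it between batches):

abstract public function get_next_batch_to_process( int $size, &$state ) : array;
abstract public function process_batch( array $batch, &$state ) : void;
abstract public function get_total_pending_count( &$state ) : int;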

Contributor Author


I have changed the method name in https://github.com/woocommerce/woocommerce/pull/33233/files#diff-1e070cc170d3b65cd6ba61b60cbd0697c8431fda674b4720926c8fbc1afecce7R297. I agree about the state param, but then we would have to manage its safe serialization and deserialization, so let's add it when we need it.

Contributor


In that case maybe we can just remove the parameter, since:

  1. We don't know what the processors will return for processing (each item of a "batch" can be anything) so we'll have the same problem with (de)serialization.
  2. The processor itself can store any needed state data by itself (same reasoning as in the case of the cleanup after no more items are left to process).
  3. The only processor instance we have so far, the one for orders, doesn't use it at all (and this is a good example of point 2).

So I think that either we provide an opaque state object that we pass around, dealing with any (de)serialization that might be needed, or we don't provide anything at all. That "last processed" argument feels like we are assuming how the processor behaves (what if the batch data it returns, or the processing it performs, isn't sequential?)

Contributor Author


I agree and lean more towards not having a state to pass at all. I have removed the last_processed argument in 311f014

@Konamiman
Contributor

Konamiman commented Jun 29, 2022

One more thing: maybe these new classes should go into a separate directory/namespace, e.g. Internal\BatchProcessing. Utilities is more suited for classes with small and often static utility methods to be used freely across the codebase.

@github-actions
Contributor

github-actions bot commented Jul 11, 2022

Test Results Summary

Commit SHA: b2236a8

Test      | Passed ✅ | Failed 🚨 | Broken 🚧 | Skipped ⏭️ | Unknown ❔ | Total 📊 | Duration ⏱️
API Tests | 115 | 0 | 0 | 2 | 0 | 117 | 1m 0s
E2E Tests | 185 | 0 | 0 | 1 | 0 | 186 | 14m 19s
To view the full API test report, click here.
To view the full E2E test report, click here.
To view all test reports, visit the WooCommerce Test Reports Dashboard.

@vedanshujain
Contributor Author

I still miss the "Robustness" part, though; what's going to happen if a batch times out or throws an exception?

So to make things robust, we currently run two processes. One is the normal batch process, SINGLE_BATCH_PROCESS_ACTION, which schedules itself again after processing a batch. Additionally, we run another process, CONTROLLER_CRON_NAME, which periodically checks whether there is a pending process that is not scheduled (and schedules it if not).
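
Roughly, the watchdog does something like this (a sketch with illustrative names, not the exact code):

// Runs on the CONTROLLER_CRON_NAME recurring action: reschedule any pending process whose
// self-rescheduling batch action died (timeout, fatal error, etc.).
public function handle_watchdog_action() : void {
	foreach ( $this->get_pending_processes() as $processor_class_name ) {
		if ( false === as_next_scheduled_action( self::SINGLE_BATCH_PROCESS_ACTION, array( $processor_class_name ) ) ) {
			as_schedule_single_action( time(), self::SINGLE_BATCH_PROCESS_ACTION, array( $processor_class_name ) );
		}
	}
}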

@vedanshujain
Contributor Author

Hey @Konamiman, this is ready for another review (subject to tests passing).

@Konamiman
Contributor

So to make things robust, we currently run two processes. One is the normal batch process, SINGLE_BATCH_PROCESS_ACTION, which schedules itself again after processing a batch. Additionally, we run another process, CONTROLLER_CRON_NAME, which periodically checks whether there is a pending process that is not scheduled (and schedules it if not).

So if I understood correctly:

  1. If processing a batch throws an exception, that's going to be caught and logged, and then the next batch will be scheduled as usual.
  2. If processing a batch causes the action to time out, CONTROLLER_CRON_NAME will detect that in at most one hour and will reschedule the action.

Even then, I think it would be convenient to detect timeouts (I guess AS has some dedicated API for that?) and maybe trigger a dedicated action passing the processor class name and the offending batch, so that controllers can throttle back the batch size.

/**
* Filters the instance of processor for current class name.
*
* @since 6.7.0.
Contributor


I think this is going to be more like 6.8.0 at this point.


$processor = new $processor_class_name();
}
if ( ! is_a( $processor, BatchProcessorInterface::class ) ) {
throw new \Exception( 'Unable to initialize batch processor instance.' );
Contributor


Let's add the class name to the exception message to make troubleshooting easier?
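
Something along these lines (just a sketch):

throw new \Exception( "Unable to initialize batch processor instance for class '$processor_class_name'." );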

Contributor Author


👍 Done in 311f014

* @param array $batch Batch that finished processing.
*/
protected function log_error( \Exception $error, BatchProcessorInterface $batch_processor, array $batch ) : void {
$batch_detail_string = '';
Contributor


Suggestion for composing this string:

  1. Check if all the batch items are numbers, and if so, use ArrayUtil::to_ranges_string.
  2. If not, check if the first and last items are numbers or strings, and include them.
  3. If not, fall back to print_r, or maybe don't include anything at all.

We can add some documentation recommending using numbers whenever possible, or strings if not, for the returned batches.
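
A sketch of what I mean (assuming ArrayUtil::to_ranges_string turns e.g. array( 1, 2, 3, 7 ) into something like '1-3, 7'):

private function get_batch_detail_string( array $batch ) : string {
	if ( empty( $batch ) ) {
		return '';
	}
	// 1. All items are numbers: compact them into ranges.
	if ( count( array_filter( $batch, 'is_numeric' ) ) === count( $batch ) ) {
		return ArrayUtil::to_ranges_string( $batch );
	}
	// 2. First and last items are numbers or strings: include just those.
	$first = reset( $batch );
	$last  = end( $batch );
	if ( ( is_numeric( $first ) || is_string( $first ) ) && ( is_numeric( $last ) || is_string( $last ) ) ) {
		return "first item: {$first}, last item: {$last}";
	}
	// 3. Fall back to a full dump.
	// phpcs:ignore WordPress.PHP.DevelopmentFunctions.error_log_print_r
	return print_r( $batch, true );
}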

Contributor Author


I wonder if going through all the data in a batch and type checking it would be a bit of a performance issue? I have added a filter on the final log message in efb29e5; maybe that's enough for this PR. wdyt?

Contributor


Keep in mind that this code is only going to run in case of error, so it shouldn't run often; and if it runs, it means that something is broken, so performance won't really matter. But sure, a filter will do for now.

@woocommercebot woocommercebot added the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jul 13, 2022
@Konamiman Konamiman self-requested a review July 18, 2022 06:47
@Konamiman
Contributor

I think this is ready to go as soon as the errors in the build are fixed (if these are really bugs in this PR)

@woocommercebot woocommercebot removed the release: highlight Issues that have a high user impact and need to be discussed/paid attention to. label Jul 19, 2022
@Konamiman Konamiman merged commit df1937d into trunk Jul 19, 2022
@Konamiman Konamiman deleted the enhancement/32922 branch July 19, 2022 12:42
@github-actions github-actions bot added this to the 6.9.0 milestone Jul 19, 2022
@github-actions
Contributor

Hi @Konamiman, thanks for merging this pull request. Please take a look at these follow-up tasks you may need to perform:

  • Add the release: add testing instructions label

@Konamiman Konamiman added the release: add testing instructions PRs that have not had testing instructions added to the wiki. [auto] label Jul 19, 2022