New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regenerate report tables using Action Scheduler #1291

Merged
merged 15 commits into from Jan 16, 2019

Conversation

3 participants
@jeffstieler
Copy link
Contributor

jeffstieler commented Jan 11, 2019

Fixes #860.

This PR supplants work done in #1050.

This PR introduces (re)generating report tables in batches. The batch size can be customized based on the type (customer, order, etc). In order to avoid timeouts and memory issues when creating many batches (for large databases), the batch creation itself is batched (and that size can be customized).

This PR also introduces a mechanism that allows for job to be delayed until another one is finished. This is used to ensure that the orders and products lookup tables are (re)generated after the customers lookup is finished.

I tested this against a store with ~6500 customers and ~19000 orders.

To test:

  • Verify that on install, reports tables are generated (can check Tools > Scheduled Actions if it's taking a while)
  • Verify that WooCommerce > Status > Tools > Regenerate Reports works as it did before
  • Test against a large dataset (lots of customers, products, orders) - verify that actions get scheduled
@timmyc
Copy link
Collaborator

timmyc left a comment

This is rad. Couple of quick questions to share while i get a wc.com vagrant instance up and running and take it for a spin.

add_action( 'wp_loaded', array( 'WC_Admin_Api_Init', 'customers_report_data_store_init' ) );
// Initialize scheduled action handlers.
add_action( self::QUEUE_BATCH_ACTION, array( __CLASS__, 'queue_batches' ), 10, 3 );

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Should we comment why the increased priority on these two actions?

This comment has been minimized.

@jeffstieler

jeffstieler Jan 14, 2019

Author Contributor

I'm not sure what you mean - 10 is the default priority - this is really to make sure the 3 args are passed to the callback.

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

gah, mondays.

This comment has been minimized.

@peterfabian

peterfabian Jan 15, 2019

Contributor

Maybe the question was meant for the previous hook (WC_Admin_Api_Init::customers_report_data_store_init) which changed the firing action from wp prio 9 to wp_loaded prio 10?
Anyway, I think it should be fine as wp_loaded should run before wp.

'paginate' => true,
)
);
$result = $order_query->get_orders();

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Minor but should we return/break out of this logic if $result->total === 0?

This comment has been minimized.

@jeffstieler

jeffstieler Jan 14, 2019

Author Contributor

Done in 10a686b.

global $wpdb;
public static function get_batch_size( $action ) {
$batch_sizes = array(
self::QUEUE_BATCH_ACTION => 100,

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Since we impose a limit of page sizes on the REST API to 100, should we comment here around this number essentially equaling to page size / recommend care if increasing it?

This comment has been minimized.

@jeffstieler

jeffstieler Jan 14, 2019

Author Contributor

These were completely arbitrary choices for batch sizes - I didn't realize this was the same as REST API result sizes 😛. This number relates to the number of Scheduled Actions that will be queued in any given action run. Limiting this should avoid problems for sites that can't handle more than 100 posts being created at a time.

/**
* Process a batch of orders to update (stats and products).
*
* @param int $batch_number Batch number to process (essentially a query page number).

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Really love how you used the page number here, creative solution!

self::CUSTOMERS_BATCH_ACTION => 25,
self::ORDERS_BATCH_ACTION => 10,
);
$batch_size = isset( $batch_sizes[ $action ] ) ? $batch_sizes[ $action ] : 25;

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Curious how/when isset( $batch_sizes[ $action ] ) would be falsey here?

This comment has been minimized.

@jeffstieler

jeffstieler Jan 14, 2019

Author Contributor

If we try to get_batch_size() for a new action hook that hasn't been added to the above array. Probably a bit defensive, but seems harmless enough. 😄

* @param int $batch_size Batch size.
* @param string $action Batch action name.
*/
return apply_filters( 'wc_admin_report_regenerate_batch_size', $batch_size, $action );

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Nice.

'number' => 1,
)
);
$total_customers = $customer_query->get_total();

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Same comment here about bailing if there are zero customers / new install.

This comment has been minimized.

@jeffstieler

jeffstieler Jan 14, 2019

Author Contributor

Done in 10a686b.

* @param string $action Action.
* @return int
*/
public function filter_batch_size( $batch_size, $action ) {

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Cool trick right here.

/**
* Test that large batches get split properly.
*/
public function test_queue_batches_splits_into_batches_correctly() {

This comment has been minimized.

@timmyc

timmyc Jan 14, 2019

Collaborator

Love seeing these tests against the queue sizes.

@timmyc

timmyc approved these changes Jan 15, 2019

Copy link
Collaborator

timmyc left a comment

Code is looking good to me, and this tested out well against my local woocommerce.com snapshot. I believe it took around 40 minutes to run on the default action scheduler settings, and no changes to vvv from a memory standpoint.

@peterfabian
Copy link
Contributor

peterfabian left a comment

Great work, Jeff. I have a couple of concerns around dependencies and error handling as commented, but otherwise a solid one.
Really like i) that you added tests, ii) your clear style of coding.

I'm generating a bunch of orders to test it out now.

Show resolved Hide resolved includes/class-wc-admin-api-init.php Outdated
Show resolved Hide resolved includes/class-wc-admin-api-init.php
Show resolved Hide resolved includes/class-wc-admin-api-init.php
$order_ids = $order_query->get_orders();
foreach ( $order_ids as $order_id ) {
// TODO: schedule single order update if this fails?

This comment has been minimized.

@peterfabian

peterfabian Jan 15, 2019

Contributor

This is a valid concern, a bunch of orders might be missing if one fails. Catching the error and scheduling events for individual orders is a clever workaround, if that would be possible.

Show resolved Hide resolved includes/class-wc-admin-api-init.php

@jeffstieler jeffstieler force-pushed the update/860-regenerate-reports-action-scheduler branch from 19cc37d to 9c2bd9a Jan 15, 2019

@peterfabian
Copy link
Contributor

peterfabian left a comment

I've tested regenerating the data and it seems to work fine with the following exceptions:

  • admin with user_id 1 was not present in wp_wc_customer_lookup for some reason after regeneration
  • customer_id in wp_wc_order_stats was 0 for all orders made by admin
  • returning_customer seems to not work correctly

Have you encountered similar problems by any chance?

Please see below a couple of records from the beginning of the wp_wc_order_stats after regeneration:
screenshot 2019-01-16 at 11 39 20

@jeffstieler

This comment has been minimized.

Copy link
Contributor Author

jeffstieler commented Jan 16, 2019

I've tested regenerating the data and it seems to work fine with the following exceptions:

  • admin with user_id 1 was not present in wp_wc_customer_lookup for some reason after regeneration
  • customer_id in wp_wc_order_stats was 0 for all orders made by admin

That's due to 'role' => 'customer' in the WP_User_Query, which I thought was reasonable. Do you think we should create "customers" for all users of a store, or just customers?

  • returning_customer seems to not work correctly

The logic that generates this value wasn't changed - are you OK with opening a new issue for this separate from this PR?

@peterfabian

This comment has been minimized.

Copy link
Contributor

peterfabian commented Jan 16, 2019

That's due to 'role' => 'customer' in the WP_User_Query, which I thought was reasonable. Do you think we should create "customers" for all users of a store, or just customers?

Ah, you're right, of course. But then, when I update the order later, it created the entry for admin user in wp_wc_customer_lookup, so, it's inconsistent.

The logic that generates this value wasn't changed - are you OK with opening a new issue for this separate from this PR?

Sure, I can open a new issue. I just thought it seemed to work ok before and the logic seemed sound, so I wasn't sure if it wasn't related to the way how the data is created here or something else.

@jeffstieler

This comment has been minimized.

Copy link
Contributor Author

jeffstieler commented Jan 16, 2019

when I update the order later, it created the entry for admin user in wp_wc_customer_lookup, so, it's inconsistent.

Ah.. hmm that's a problem. Was the admin user the actual customer placing that order, or did you just update it? If the admin was the customer - what behavior would you expect? How likely of a scenario do you think it is that non-customer role users will place real orders on a store?

@jeffstieler jeffstieler force-pushed the update/860-regenerate-reports-action-scheduler branch from 9c2bd9a to 6c2e4bc Jan 16, 2019

@peterfabian

This comment has been minimized.

Copy link
Contributor

peterfabian commented Jan 16, 2019

The order was originally created by the admin, then updated by adding a private note after the report tables got regenerated, at which point a new entry in the wp_wc_customer_lookup appeared.
Probably not a big problem here, but I'm not sure, I would assume there won't be that many non-customer users on a typical e-shop, so why not include all users in the job? Then we just avoid these headaches.

I'm really not sure how often this would happen.

On a related note, we also need to keep in mind that some admins/store managers can create orders on behalf of customers. Not 100% sure if there is a recommended workflow for that scenario and if those store admins would create new users to submit order on their users' behalf? Again, something that should perhaps be solved in a separate issue.

@jeffstieler

This comment has been minimized.

Copy link
Contributor Author

jeffstieler commented Jan 16, 2019

Probably not a big problem here, but I'm not sure, I would assume there won't be that many non-customer users on a typical e-shop, so why not include all users in the job? Then we just avoid these headaches.

Fair point.. I'll remove the role restriction from the query. 👍

On a related note, we also need to keep in mind that some admins/store managers can create orders on behalf of customers. Not 100% sure if there is a recommended workflow for that scenario and if those store admins would create new users to submit order on their users' behalf? Again, something that should perhaps be solved in a separate issue.

I'll test this flow, but a guest customer should be created so long as the store admin specifies a billing email for the customer when they manually create the order - from the backend of course. If the store admin checks out through the front end the order will be associated with their user.

@jeffstieler

This comment has been minimized.

Copy link
Contributor Author

jeffstieler commented Jan 16, 2019

I'll test this flow

Confirmed - manual order created in the backend with a guest results in a new customer lookup row.

@jeffstieler

This comment has been minimized.

Copy link
Contributor Author

jeffstieler commented Jan 16, 2019

@peterfabian - removed the role restriction in 476c632.

@peterfabian
Copy link
Contributor

peterfabian left a comment

LGTM.

@jeffstieler jeffstieler merged commit d0bb271 into master Jan 16, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jeffstieler jeffstieler added this to 🥋Sprint 11 Done in Isotope via automation Jan 16, 2019

@jeffstieler jeffstieler deleted the update/860-regenerate-reports-action-scheduler branch Jan 16, 2019

@peterfabian peterfabian referenced this pull request Jan 22, 2019

Closed

Updated report regeneration to use Action Scheduler #1050

0 of 1 task complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment