Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling the failed preload items #6058

Closed
4 tasks
piotrbak opened this issue Jul 19, 2023 · 4 comments · Fixed by #6084
Closed
4 tasks

Improve handling the failed preload items #6058

piotrbak opened this issue Jul 19, 2023 · 4 comments · Fixed by #6084
Assignees
Labels
effort: [S] 1-2 days of estimated development time module: preload priority: medium Issues which are important, but no one will go out of business. type: enhancement Improvements that slightly enhance existing functionality and are fast to implement
Milestone

Comments

@piotrbak
Copy link
Contributor

piotrbak commented Jul 19, 2023

Before submitting an issue please check that you’ve completed the following steps:

  • Made sure you’re on the latest version
  • Used the search feature to ensure that the bug hasn’t been reported before

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Set cron to run each 10 minutes
  2. Set the batch to be bigger, let's say 100 to make problem more visible
  3. Run the preload on the website with at least 200 URLs
  4. 100 Actions are added to the AS, 100 URLs are set to in-progress
  5. Some URLs are preloaded, the others are waiting for the next CRON run
  6. Wait 10 minutes, so the cron can pick up the next batch
  7. The remaining URLs (in-progress older than 100/15 minutes) in the database are set to failed while corresponding async actions are still in the AS
  8. We're setting to in-progress another 100 URLs, and registering another 100 Actions

Expected behavior
We should avoid exceeding number of Preload Actions in the AS defined by the batch size filter.

I'm not sure what would be the best solution here, I can think about some options, but further R&D is needed:

1 Check the number of in-progress items that we have in the queue before setting old ones to the failed status

  • The problem here is that after more than 2 iterations, we might end up in the same situation

2 Set the failed status only for URLs that doesn't have corresponding Preload Action registered in the AS

  • This one most likely will be resource consuming, but in terms of quality it's the best I can imagine

3 Our check about in-progress items could also take into the consideration the number of registered AS actions related to Preload.

Additional context
This will help with issues related to the preload affecting the database size:
#6052
#6057

Backlog Grooming (for WP Media dev team use only)

  • Reproduce the problem
  • Identify the root cause
  • Scope a solution
  • Estimate the effort
@piotrbak piotrbak added type: enhancement Improvements that slightly enhance existing functionality and are fast to implement module: preload priority: medium Issues which are important, but no one will go out of business. needs: r&d Needs research and development (R&D) before a solution can be proposed and scoped. labels Jul 19, 2023
@piotrbak
Copy link
Contributor Author

piotrbak commented Jul 25, 2023

The main objective is:

  • Avoid as much as possible exceeding number of Preload Actions in the AS defined by the batch size filter.

Acceptance Criteria based on thread.

  1. When clearing the whole cache while having some in-progress items, the next cron iteration should not add more AS actions than specified by batch size fitler.
  2. If the item is in-progress for more than 100/15 minutes) but there's a Preload AS action registered related to this URL, it should not be set to failed
  3. If the item is in-progress for more than 100/15 minutes) and there's no Preload AS action registered related to this URL, it should be set to failed

@piotrbak piotrbak removed the needs: r&d Needs research and development (R&D) before a solution can be proposed and scoped. label Jul 27, 2023
@CrochetFeve0251
Copy link
Contributor

Reproduce the problem

The issue is reproducible by disabling the AS executions and loading multiple batches.

Identify the root cause

The root cause from that issue is that we are not checking the AS queue status.

Scope a solution

To solve this issue we will have to create a new method get_waiting_actions on the Queue class that returns all AS actions that are pending:

public function get_waiting_actions() {
 return $this->search(
			[
				'hook'   => 'rocket_preload_job_preload_url',
				'status' => ActionScheduler_Store::STATUS_PENDING,
			]
		);
]

Then using that method inside the method process_pending_jobs from PreloadUrl class to fetch pending actions.
Then from there we could filter the URLs from $stuck_rows which are inside pending actions to remove them which will solve the first issue.
Then we could count the number of actions inside the AS queue by checking the size from that array.
We could then subtract that number to the $count variable which will make sure we won't overpass the number of elements inside the AS queue.

Then we should add new tests for the new method and adapt tests from process_pending_jobs.

Estimate the effort

Effort S

@CrochetFeve0251 CrochetFeve0251 added the effort: [S] 1-2 days of estimated development time label Jul 31, 2023
@Tabrisrp
Copy link
Contributor

Looks good

@vmanthos
Copy link
Contributor

vmanthos commented Aug 3, 2023

@CrochetFeve0251

It seems that the first acceptance criterion isn't met.

  1. Clear the whole cache.
  2. Using WP Crontrol trigger the rocket_preload_process_pending and then the action_scheduler_run_queue event.
  3. Monitor the Action Scheduler (AS) table:
SELECT * FROM `wp_actionscheduler_actions` WHERE `hook` = 'rocket_preload_job_preload_url' AND `status` != 'complete' LIMIT 50
  1. After a minute or so, and while there are still items in the AS table, repeat step 2.

The whole batch of URLs will be added to the AS table. For example, if the batch size is 100 and there were 45 jobs there you'll have a total of 145.


About the second acceptance criterion, in the following screenshot you'll see that the status for the post-57 is set to failed (left half) despite the fact its status is pending in the AS table (right half).
image

Can you please look into these? 🙏

@piotrbak piotrbak added this to the 3.14.4 milestone Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort: [S] 1-2 days of estimated development time module: preload priority: medium Issues which are important, but no one will go out of business. type: enhancement Improvements that slightly enhance existing functionality and are fast to implement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants