Limit numbers of pending RUCSS jobs to avoid being late on fetching the result and hence getting a 400 Bad Json #6213

MathieuLamiot · 2023-10-10T10:47:08Z

Context
Several users with an high number of pages on their websites reported issues with RUCSS feature, resulting in "400: Bad JSON".
Some investigations showed that, when starting the process of RUCSS optimization on the plugin for the whole website, the first URLs were correctly optimized but at some point, all jobs started to return 400: Bad JSON. This seem linked to the fact that the website does not manage to fetch all jobs in the expected schedule: too many actions to fecth results are planned and a good part of them take place to late, when the job is stale on the server side.

Expected behavior
The plugin should not send new jobs if it is not capable of fetching them back in the appropriate time window. Therefore, we should keep the number of pending jobs under control:

If the website is fast enough, jobs start in pending and are quickly completed, allowing for more jobs to be created immediately.
If the website is slow, jobs start in pending and we wait for them to be completed. There is no use putting more jobs in the queue as the website won't be able to fetch them in time anyway.
The maximum number of pending + in progress job must be configurable with a filter for instance. With a special value (for instance 0?) the configuration means that the limit must be disable (come back to the current state).

Acceptance Criteria

No regression on the RUCSS feature. Especially in case of 408: the retry job must be created no matter what.
Emulate a website that never tries to fetch results. Start RUCSS on all website. The number of pending jobs must stop at the given threshold.
Then, complete and/or fail one job. A new job must be created for a new page.
The filter allows to change the number of pending jobs authorized, and also to remove the limitation.

MathieuLamiot · 2023-10-10T10:48:24Z

From @wordpressfan: (Slack)

Let's think about this, we need a way to stop sending the request of registering a new job when we have enough pending jobs (we can define the word enough later)
to do that we can easily create a new filter here:

wp-rocket/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php

Line 269 in 06381b8

public function add_url_to_the_queue( string $url, bool $is_mobile ) {

As a switch to enable/disable registering new jobs from the plugin, then we can add a callback to change this switch based on the number of pending+inprogress jobs so for example we get 100 pending rows to process every minute, then we need to stop registering new jobs once we reach 500 rows (pending+inprogress)

MathieuLamiot · 2023-10-10T10:56:28Z

@wordpressfan I agree with the approach. However, it should not impact the 408 retry mechanism here
What do you think of adding an optional argument force to add_url_to_the_queue, set to False by default.
And in the method add_url_to_the_queue:

if ( force == False ) {
   $max_pending_jobs = apply_filters( 'rocket_rucss_max_pending_jobs', MAGIC_CONSTANT )
   if ( 0 < $max_pending_jobs) {
      $nb_pending_jobs = count_pending_jobs()
      if ( nb_pending_jobs >= max_pending_jobs) {
         return false;
      }
   }
}

Where MAGIC_CONSTANT is to be defined (> 100 not to block the default batches, and not too big to avoid the problem we currently have. Maybe 300?)
And where count_pending_jobs is just a query on the RUCSS DB.

@piotrbak, what do you think in terms of product impact?

MathieuLamiot · 2023-10-10T14:16:13Z

@wordpressfan @vmanthos what was discussed during the daily is that we could replace MAGIC_CONSTANT in the above suggestion with a calculated value?
The one that could be relevant would be this one: n * apply_filters( 'rocket_rucss_pending_jobs_cron_rows_count', 100 )
What do you think?
This way, we would scale how many jobs we send based on how many jobs we are trying to collect.

I would suggest n to be ~3 (it has to be more than 2 and not to big...)
So something like:

$max_pending_jobs = apply_filters('rocket_rucss_max_pending_jobs', 3 * apply_filters( 'rocket_rucss_pending_jobs_cron_rows_count', 100 ) )

(And please, let's not hard code 3 and 100 but use (shared) constants instead? ; maybe put the inner apply_filter in a method as we use it at different places in the codebase)

MathieuLamiot · 2023-10-10T18:07:23Z

I'm just worried about hammering the DB with the COUNT query every time a job is added 🤔
It's not a costly query and the DB is small... I don't really know

wordpressfan · 2023-10-11T07:20:40Z

Yes it's a small query and the table is indexed well so I believe we are fine but this is a good concern, thanks for discussing it.

The big picture that I was thinking of last night is:
What if we made the process of registering the new job is happening through action scheduler and CRON too?
So the process will be as follows:

The not cached page is visited for the first time, we will create a row in RUCSS table with the status pending_create for example and in this step we will not send the RUCSS SaaS job creation request.
Will have a CRON to get some of those pending_create rows and create async job in action scheduler to create them.
Once the job is created, the status will be pending and get into the normal process.

This will give us the control of how many jobs are created at once and not to miss any job but also will add a layer of complexity.

wordpressfan · 2023-10-11T11:02:05Z

Reproduce the issue

Yes I can see the issue in one of our customer's site.

Root cause

Jobs are kept in plugin's database even when they are removed from SaaS side so every check status after that time will fail because SaaS at this point returns empty response.

Scope a solution

As We mentioned above we need to add a new status pending_create and stop registering jobs into SaaS directly.

Here

wp-rocket/inc/Engine/Optimization/RUCSS/Database/Queries/UsedCSS.php

Lines 164 to 179 in 8f94841

    
           public function create_new_job( string $url, string $job_id, string $queue_name, bool $is_mobile = false ) { 
        
           	if ( ! self::$table_exists && ! $this->table_exists() ) { 
        
           		return false; 
        
           	} 
        
           	$item = [ 
        
           		'url'           => untrailingslashit( $url ), 
        
           		'is_mobile'     => $is_mobile, 
        
           		'job_id'        => $job_id, 
        
           		'queue_name'    => $queue_name, 
        
           		'status'        => 'pending', 
        
           		'retries'       => 0, 
        
           		'last_accessed' => current_time( 'mysql', true ), 
        
           	]; 
        
           	return $this->add_item( $item ); 
        
           }

we need to make $job_id and $queue_name optional arguments, also change the status here from pending to pending_create

Make $job_id argument optional here with default value as empty string:

wp-rocket/inc/Engine/Optimization/RUCSS/Database/Queries/UsedCSS.php

Lines 209 to 225 in 8f94841

    
           public function reset_job( int $id, string $job_id ) { 
        
           	if ( ! self::$table_exists && ! $this->table_exists() ) { 
        
           		return false; 
        
           	} 
        
           	return $this->update_item( 
        
           		$id, 
        
           		[ 
        
           			'job_id'        => $job_id, 
        
           			'status'        => 'pending', 
        
           			'error_code'    => '', 
        
           			'error_message' => '', 
        
           			'retries'       => 0, 
        
           			'modified'      => current_time( 'mysql', true ), 
        
           		] 
        
           	); 
        
           }

And change the status to pending_create.

Remove this part of code from here

wp-rocket/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php

Lines 219 to 236 in 06381b8

    
           			if ( false === $add_to_queue_response || ! isset( $add_to_queue_response['contents'], $add_to_queue_response['contents']['jobId'], $add_to_queue_response['contents']['queueName'] ) ) { 
        
           				return $html; 
        
           			} 
        
           			/** 
        
           			 * Lock preload URL. 
        
           			 * 
        
           			 * @param string $url URL to lock 
        
           			 */ 
        
           			do_action( 'rocket_preload_lock_url', $url ); 
        
           			// We got jobid and queue name so save them into the DB and change status to be pending. 
        
           			$this->used_css_query->create_new_job( 
        
           				$url, 
        
           				$add_to_queue_response['contents']['jobId'], 
        
           				$add_to_queue_response['contents']['queueName'], 
        
           				$is_mobile 
        
           			);

Change the functionality of this method add_url_to_the_queue to only:

$used_css_row = $this->used_css_query->get_row( $url, $is_mobile );
if ( empty ( $used_css_row ) ) {
    $this->used_css_query->create_new_job( $url, '', '', $is_mobile );
} else {
    $this->used_css_query->reset_job( $used_css_row->id );
}

Remove this part:

wp-rocket/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php

Lines 684 to 687 in 06381b8

    
           if ( false !== $add_to_queue_response ) { 
        
           	$new_job_id = $add_to_queue_response['contents']['jobId']; 
        
           	$this->used_css_query->update_job_id( $id, $new_job_id ); 
        
           }

Remove this part:

wp-rocket/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php

Lines 866 to 869 in 06381b8

    
           if ( false !== $add_to_queue_response ) { 
        
           	$new_job_id = $add_to_queue_response['contents']['jobId']; 
        
           	$this->used_css_query->reset_job( $id, $new_job_id ); 
        
           }

For the CRON we can use the current CRON rocket_rucss_pending_jobs to process also pending_create jobs by adding a new callback here:

wp-rocket/inc/Engine/Optimization/RUCSS/Cron/Subscriber.php

Line 44 in e0c2e71

'rocket_rucss_pending_jobs' => 'process_pending_jobs',
Here: https://github.com/wp-media/wp-rocket/blob/06381b8860eb46e570911b2812652f54a676dd6d/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php#L871-L870
Create a new method process_pending_create_jobs that will be called exactly like process_pending_jobs
In this new method process_pending_create_jobs we will get rows with pending_create status and do the same functionality we already do inside process_pending_jobs with a filter to change the number of rows to handle, with a default value (we need to agree on that default value because we only need to keep the number of pending rows below a specific number and use this max_pending_count to decide the number of pending_create to process)
Here we are between two approaches of creating async action scheduler's task for each row to call SaaS to create the job.

Effort

[M] or [L]

@CrochetFeve0251 @wp-media/php to review plz.

MathieuLamiot · 2023-10-11T12:06:02Z

Looks good overall. Some comments:
On point 1:

we need to make $job_id and $queue_name optional arguments -> Let's just remove those arguments in the method, and pass an empty string to the DB, no?
To ease discussion, let's not re-use pending in the status name. Maybe 'to_submit'?

Point 2: Same here, let's just remove the job_id argument.

Point 3 & 4:
Here again, for naming clarity we should properly name methods to avoid confusion.
If I recap your suggestion, my understanding would be:

public function prepare_job( $url, $is_mobile ) {
    $used_css_row = $this->used_css_query->get_row( $url, $is_mobile );
    if ( empty ( $used_css_row ) ) {
        $this->used_css_query->create_new_job( $url, '', '', $is_mobile );
    } else {
        $this->used_css_query->reset_job( $used_css_row->id );
    }
}

public function send_job( $used_css_row ) {
$url = $used_css_row->url;
$is_mobile = $used_css_row->is_mobile; 
// CURRENT CONTENT OF add_url_to_the_queue
 /**
		 * Filters the RUCSS safelist
		 *
		 * @since 3.11
		 *
		 * @param array $safelist Array of safelist values.
		 */
		$safelist = apply_filters( 'rocket_rucss_safelist', $this->options->get( 'remove_unused_css_safelist', [] ) );

		/**
		 * Filters the styles attributes to be skipped (blocked) by RUCSS.
		 *
		 * @since 3.14
		 *
		 * @param array $skipped_attr Array of safelist values.
		 */
		$skipped_attr = apply_filters( 'rocket_rucss_skip_styles_with_attr', [] );
		$skipped_attr = ( is_array( $skipped_attr ) ) ? $skipped_attr : [];

		$config = [
			'treeshake'      => 1,
			'rucss_safelist' => $safelist,
			'skip_attr'      => $skipped_attr,
			'is_mobile'      => $is_mobile,
			'is_home'        => $this->is_home( $url ),
		];

		$add_to_queue_response = $this->api->add_to_queue( $url, $config );
		if ( 200 !== $add_to_queue_response['code'] ) {
			Logger::error(
				'Error when contacting the RUCSS API.',
				[
					'rucss error',
					'url'     => $url,
					'code'    => $add_to_queue_response['code'],
					'message' => $add_to_queue_response['message'],
				]
			);

			return false;
		}
// END OF CURRENT CONTENT OF add_url_to_the_queue

			/**
			 * Lock preload URL.
			 *
			 * @param string $url URL to lock
			 */
			do_action( 'rocket_preload_lock_url', $url );

			// We got jobid and queue name so save them into the DB and change status to be pending.
			$this->used_css_query->add_job_reference(
				$add_to_queue_response['contents']['jobId'],
				$add_to_queue_response['contents']['queueName'],
			);

add_job_reference has to be created.
We can delete add_url_to_the_queue and replace occurrences with prepare_job

in the if ( empty( $used_css ) ) { we would just call prepare_job.

Point 5: Make sure to add do_action( 'rocket_preload_unlock_url', $row_details->url ); after calling prepare_job: the job will not be immediately sent to the server, so we should unlock preload in the meantime.

Point 9: We can use this limit for the number of "to_submit" jobs:
$max_pending_jobs = apply_filters('rocket_rucss_max_pending_jobs', 3 * apply_filters( 'rocket_rucss_pending_jobs_cron_rows_count', 100 ) )

Point 10: Do you foresee an issue in using a for loop to go through the (up to) max_pending_jobs RUL and making the API calls? for fetching the results, we use async tasks for each API call. Is there a reason for this approach?

CrochetFeve0251 · 2023-10-11T12:26:15Z

@wordpressfan The grooming looks good but there is one point I am not sure about is the rocket_preload_lock_url filter disapearing. Why are we sure we do not need to lock the URLs for the preload anymore?

Another thing this issue is working on the exact same methods with different AC. I guess we need some communication to not end up with monster conflicts.

MathieuLamiot · 2023-10-11T12:34:03Z

rocket_preload_lock_url I think it should be kept and used in the CRON job part, when actually sending the job to the SaaS. that could be in the send_job method as suggested above

wordpressfan · 2023-10-11T14:16:58Z

After a discussion with @MathieuLamiot and @CrochetFeve0251

Yes we will keep the rocket_preload_lock_url action but will move it from its current place to be in the method that will run for each url after sending the job to SaaS.
For the action scheduler part we need to do the same functionality we do now for pending rows, so we will create Async job for every database row to send the request to SaaS.

MathieuLamiot · 2023-10-19T09:36:41Z

@CrochetFeve0251 Can you provide a documentation for the new submission system and the related filters, so that support can learn how to use them? Thanks!

piotrbak added type: enhancement Improvements that slightly enhance existing functionality and are fast to implement priority: high Issues which should be resolved as quickly as possible needs: grooming module: remove unused css labels Oct 10, 2023

CrochetFeve0251 added effort: [M] 3-5 days of estimated development time and removed needs: grooming labels Oct 11, 2023

CrochetFeve0251 self-assigned this Oct 12, 2023

CrochetFeve0251 mentioned this issue Oct 17, 2023

Enhancement/6213 prevent 400 #6230

Merged

8 tasks

MathieuLamiot added this to the 3.15.3 milestone Oct 18, 2023

vmanthos closed this as completed in #6230 Oct 23, 2023

piotrbak added the needs: documentation Issues which need to create or update a documentation label Oct 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limit numbers of pending RUCSS jobs to avoid being late on fetching the result and hence getting a 400 Bad Json #6213

Limit numbers of pending RUCSS jobs to avoid being late on fetching the result and hence getting a 400 Bad Json #6213

MathieuLamiot commented Oct 10, 2023

MathieuLamiot commented Oct 10, 2023

MathieuLamiot commented Oct 10, 2023 •

edited

Loading

MathieuLamiot commented Oct 10, 2023 •

edited

Loading

MathieuLamiot commented Oct 10, 2023 •

edited

Loading

wordpressfan commented Oct 11, 2023

wordpressfan commented Oct 11, 2023

MathieuLamiot commented Oct 11, 2023

CrochetFeve0251 commented Oct 11, 2023

MathieuLamiot commented Oct 11, 2023

wordpressfan commented Oct 11, 2023

MathieuLamiot commented Oct 19, 2023

Limit numbers of pending RUCSS jobs to avoid being late on fetching the result and hence getting a 400 Bad Json #6213

Limit numbers of pending RUCSS jobs to avoid being late on fetching the result and hence getting a 400 Bad Json #6213

Comments

MathieuLamiot commented Oct 10, 2023

MathieuLamiot commented Oct 10, 2023

MathieuLamiot commented Oct 10, 2023 • edited Loading

MathieuLamiot commented Oct 10, 2023 • edited Loading

MathieuLamiot commented Oct 10, 2023 • edited Loading

wordpressfan commented Oct 11, 2023

wordpressfan commented Oct 11, 2023

Reproduce the issue

Root cause

Scope a solution

Effort

MathieuLamiot commented Oct 11, 2023

CrochetFeve0251 commented Oct 11, 2023

MathieuLamiot commented Oct 11, 2023

wordpressfan commented Oct 11, 2023

MathieuLamiot commented Oct 19, 2023

MathieuLamiot commented Oct 10, 2023 •

edited

Loading

MathieuLamiot commented Oct 10, 2023 •

edited

Loading

MathieuLamiot commented Oct 10, 2023 •

edited

Loading