Add locking mechinism class #15967

mikejolley · 2017-07-04T17:15:43Z

Implements a class to handle locking, and includes tests.

This could in theory be used to lock something so other concurrent requests cannot update the same thing until it’s finished.

Getting feedback before moving to implementation.

claudiosanches

LGTM

Just suggested two small tweaks, nothing blocking to be merged.

claudiosanches · 2017-07-04T17:50:39Z

includes/class-wc-database-locks.php

+
+		$lock = $wpdb->get_row( $wpdb->prepare( "SELECT * FROM {$wpdb->prefix}woocommerce_locks WHERE name = %s;", $lock_name ) );
+
+		return $lock ? $lock : false;


Why not null so no need for mixed values?

Only thinking like PHP 7.1.

public static function get_lock( string $lock_name ): ?stdClass {}

justinshreve · 2017-07-04T22:55:13Z

I didn't have a chance to review this today, but I would like to before merge. I'll put it on my list for tomorrow :).

mikejolley · 2017-07-05T15:01:03Z

There is this too which should be merged to this branch first. #15978

justinshreve

A few comments and questions, but overall looking good.

justinshreve · 2017-07-05T20:12:12Z

includes/class-wc-database-locks.php

+	 * @return bool False if a lock couldn't be created or if the lock is still valid. True otherwise.
+	 */
+	public static function aquire_lock( $lock_name, $max_wait_timeout = 10 ) {
+		global $wpdb;


$wpdb is not used in this function

justinshreve · 2017-07-05T20:16:10Z

includes/class-wc-database-locks.php

+	public static function get_request_id() {
+		if ( ! self::$request_id ) {
+			self::$request_id = uniqid( wp_rand(), true );
+			add_action( 'shutdown', __CLASS__, 'release_all_locks' );


What is the purpose of tying a request id to these and releasing locks like this? Will we not have use cases where we would want a lock to possibly persist across requests? Is it because we don't trust people to call release_lock when they are done?

I don't think so, I think we should limit a lock to a certain request and do cleanup. This prevents it being permanently locked and scripts timing out?

To prevent something from permanently locking, we would/should have something that runs and looks for expired locks and releases based on that. i.e., on shutdown: release_expired_locks.

If we tie these to a specific request, a default $release_timeout of an hour seems really high.

justinshreve · 2017-07-05T20:18:49Z

includes/class-wc-database-locks.php

+	 *
+	 * @param  string $lock_name The name of this unique lock.
+	 * @param  int $max_wait_timeout The duration in seconds to wait for the lock to be freed.
+	 * @return bool False if a lock couldn't be created or if the lock is still valid. True otherwise.


This comment is incorrect.

justinshreve · 2017-07-05T20:29:06Z

includes/class-wc-database-locks.php

+	/**
+	 * Create a lock.
+	 *
+	 * @param  string $lock_name The name of this unique lock.


We should document that there is a length limit on lock names. They would have to look at the db schema to see that right now.

What would you limit this to? I made it up.

Update/paypal update locks

diablodale · 2017-07-28T12:11:06Z

The implementation of locks appears to me to be inadequate. This is alignment with no previous caution that locks need to be implemented by core OS level components to be successful.

For example, your new woocommerce_locks table has 'name' as the primary key and appears to be relying on the SQL engine to enforce key uniqueness. There exists a scenario that when two threads on two peer CPUs run at similar intervals that locks will not be accurately managed. It is the normal gotchas for implementing locks and best left to the volumes of doc on the internet.

Part of the core challenge is that to make adequate locks, one needs to have an atomic check/set. Looking at the current codebase, the set (implemented by the INSERT IGNORE) and related check (IF (! $lock_result)) are not together atomic. You might have more luck creating a more complex bundled transaction including the time/expire management. Still, these problems have long been discussed and solved -- google is the friend on this.

A trivial example is the two threads/processes above. They both run create_lock() at the same time with the same lock_name parameter. They both run the wpdb->query(prepare(INSERT IGNORE)) at exactly the same time. Then the SQL database chooses because it does have atomic OS support. Lets say that P1 got a "true" result from the database and P2 did not.

Next, the thread for P1 is paused for any number of reasons by apache, OS, CPU, etc. Yet, P2 continues and runs the next line of PHP code if ( ! $lock_result ) {.

P2 didn't get a 'true' from the db so it enters the big clause. P2 calls get_lock with the lock_name. That lock does exist because P1 was successful with the INSERT above. FYI, P1 thread is still paused. P2 runs the second if ( ! $lock_result ) {. $lock_result is non-false (because the lock with that name does exist) so it continues past that clause.

Next, P2 thread is paused by apache, os, cpu, etc. Both P1 and P2 are now paused. They are paused long enough that the expiration time passes.

Next, P2 is awakened from its pause (P1 is still paused). P2 runs if ( strtotime( $lock_result->expires ) > time() ) { and that is a false statement...it is expired. So P2 moves down and does the release_lock, then create_lock recursively. The boundless create_lock recursion worries me but lets ignore that for now and assume that in some deep recursion create_lock finally returns "true". And, at the top recursion this P2 thread returns TRUE as its result from create_lock(). This is bad, the lock architecture has failed. The result should be FALSE.

Yet its even worse...remember at the beginning it was P1 that was able to insert the row into the db. It should be P1 that returns TRUE. Our thread P1 is awakened from its slumber and does the first if ( ! $lock_result ) { test. Since it did get a row from the INSERT IGNORE, it skips the big clause and returns true. That means that both P1 and P2 have returned TRUE. The lock architecture has failed.

I do not have a solution for this. Only that the solution is non-trivial and best solved by core components who can guarantee atomic operation.

Next, feedback on the use of locks. I am unsure that this adequately addresses the pains people are experiencing. If the pain is PDT/IPN duplicates, then change the payment gateways so that duplicates are not possible. For example, changing the PayPal gateway to have a radio "either-or...not both" possibility for IPN and PDT. That will address the specific pain more quickly than lock architectures.

When using locks, I believe it important to consider what is being locked. For this scenario, is the lock on the order? On the stock? On the order and the stock? A new example is P1 is an admin in the admin UI. P2 is a customer making a paypal payment. At the same time:

the admin in P1 uses the admin UI to change the order to some non-payment complete state, e.g. cancelled.
the customer makes a payment, paypal sends an IPN payment complete, and P2 receives that payment complete.
Now we have a race condition for what the status will be, what the stock will be, etc.
In that scenario, we need a lock on the order and the stock so that the state and stock can be correctly coordinated for each independent thread. Otherwise, due to slight differences in thread execution, it is possible for P1 to make the state cancelled and P2 to reduce the stock.

mikejolley · 2017-07-28T15:41:38Z

@diablodale PDT is a fallback for IPN and vice versa. Thats why you'd enable both.

Yet its even worse...remember at the beginning it was P1 that was able to insert the row into the db. It should be P1 that returns TRUE. Our thread P1 is awakened from its slumber and does the first if ( ! $lock_result ) { test. Since it did get a row from the INSERT IGNORE, it skips the big clause and returns true. That means that both P1 and P2 have returned TRUE. The lock architecture has failed.

If both return true, but 1 is delayed until the lock is release, why exactly is this a problem?

I've modelled this class after the one in Drupal FWIW.

diablodale · 2017-07-31T11:23:04Z

Please keep in mind that I'm coming with good intentions. I have a monetary invested interest that Woo is reliable and performant.

Mmm, what you write is interesting in that you have a want/need of backups, yet that is not what IPN and PDT are to each other. IPN and PDT have significantly different feature sets and only one feature overlaps. Here is a nice overview by PayPal. (They've significantly improved their docs since I last deeply read them 4 years ago). https://developer.paypal.com/docs/classic/ipn/integration-guide/IPNPDTAnAlternativetoIPN/

PayPal strongly cautions that a site could implement both IPN and PDT, however it will cause problems that need to be managed. These are the known problems causing Woo pain.

Personally, I disabled PDN because I can wait 1 second. IPN is fast. How fast? Fast enough that it is overlapping with PDT. There is no benefit for locks or for PDT in these cases. Using only IPN solves the stock level problem and the duplicate email problem. IPN is designed to be more reliable than PDT (see the link above). In the cases where IPN is slower (1 minute) or 1 hour (needs 2nd level review by PayPal staff -or- different currencies without automatic convert), IPN will work while PDT will always fail. IPN wins repeatedly.

From what I read in the pain expressed by Woo customers and my understanding of IPN/PDT, there is a very small set of customers who need PDT. It is only those customers that... sell a virtual product + need stock management of that virtual product + must have immediate yes/no paid + don't care about the reliability of that yes/no paid. For example, it is a Woo customer that sells a PDF download + that PDF is limited edition of only 100 downloads + must provide the download link immediately (not in followup emails) + doesn't care when payment delayed/fails + puts the responsibility on the customer to retry delayed/failed payments. That last part is important...IPN does the retrying automatically making a nice customer experience. PDT does not retry and can result in lost sales if the customer doesn't want to go through the process again.

I'm not suggesting to have only IPN. Instead, the expectation of the two and what they actually are by PayPal's design may have been misunderstood in the past. A pause to consider this possibility and how that led to the pains some Woo customers are experiencing could lead to a easy to implement solution (e.g. turn off PDT) that is already well tested for years. It seems the most vocal pain that lead to this lock idea was stock management. Stock management usually isn't virtual; they aren't downloads. These customers don't need the only feature which is unique to PDT; that is, the immediately yes/no. I suggest these customers can wait 0.5 seconds for IPN to respond. Or a minute so a reliable payment can be made. If it doesn't, PayPal itself has problems which is a separate issue and PayPal is best enabled to resolve with their own support processes.

You ask "If both return true...". It is a problem because locks must be reliable and return accurate true/false results. That is the entire reason to exist for locks. Woo is at the user level and can not control thread execution; P1 and P2 of my example can enter the lock code and exit the lock code at the same time. Yet, inside the lock code run in uncontrollable sequence of staggered pauses as I described in my example. This is proven OS design for decades. From your questioning, you might have a knowledge gap in this specific area; no one can be everything everywhere :-). I can only point you to decades of docs on the Internet and my example demonstrating the failed code. Feel free to take my example to your local university professor. And just because Drupal did it, doesn't make it correct. ;-) See every bug everywhere to prove bugs and flawed design are copied between projects.

Have you also consider the cost? A high cost to deliver a feature that isn't reliable isn't a good direction for money generating systems like Woo. The high cost is in the number of roundtrip database accesses of this code implementation. It is very high:

a loop in a[c]quire_lock calling create_lock multiple times
a infinite recursion loop in create_lock with each level of recursion 3 roundtrips to db

This means it is an infinite loop in a loop with each loop in a loop hitting the database 3 times. This is always a red flag for design.

I still think using IPN and not PDT is the way to go for the majority of Woo customers experiencing the pain of stock mgmt w/ PayPal. Using only IPN is a solution tested for multiple years and works. The people that have a focused need for PDN are the non-vocal subset I described above. For them, I would recommend changing the Woo PayPal gateway itself to allow IPN or PDT but not both.

If you were to introduce this codebase, you will likely shift the problem. There will be some set of customers that still see the stock issues and duplicate emails; or maybe they see double failures. And it won't be all the time, only some times. All of this because the lock implementation isn't reliable, behaves differently due to timing bugs, and all the while increases database load by an infinite number of seconds.

There is still the separate issue of using locks. To discuss that, I have to pretend the lock implementation is correct and performant. In the WC_Order code, you a[c]quire a lock on line 114. Next, the real world occurs...on any line 115-135 this PHP thread halts. It halts because of some apache problem, OS problem, db hickup, network glitch, memory problem, action/filter/hook problem, stray gamma ray (it's a real thing), etc. That means the order is locked for one hour yet the status may not be marked complete/processing, payment_complete may not have been called, trans id and dates maybe/not marked, etc. There is no reasonable way a customer or an admin can fix this until the timeout expires in one hour. This is not a good scenario. And with an inadequate lock implementation it becomes a double-negative.

I thought about all this over the weekend to see if I could think of alternative suggestions. I keep coming back to the same conclusions. To remove the source of the problem (don't use dual IPN and PDT) rather than incorrectly use an inadequate lock implementation.

FYI, you have a typo throughout. It is acquire. Not aquire.

mikejolley · 2017-09-28T14:45:57Z

Closed in favour of a queue #16971

dyszczo · 2018-05-09T20:00:10Z

includes/class-wc-database-locks.php

+	 * @param  int $max_wait_timeout The duration in seconds to wait for the lock to be freed.
+	 * @return bool False if a lock couldn't be created or if the lock is still valid. True otherwise.
+	 */
+	public static function aquire_lock( $lock_name, $max_wait_timeout = 10 ) {


I think I see a typo here. It should be acquire

This PR was not merged.

mikejolley added 4 commits July 4, 2017 17:26

Basic WC_Database_Locks class

421ecbd

Installer and uninstaller

5289e2e

Tests

ad41098

passing tests

1378b41

mikejolley requested review from justinshreve and claudiosanches July 4, 2017 17:16

mikejolley added this to the 3.3.0 milestone Jul 4, 2017

mikejolley added the Status: Needs Review label Jul 4, 2017

claudiosanches approved these changes Jul 4, 2017

View reviewed changes

mikejolley added 3 commits July 5, 2017 14:12

address feedback

58da18e

Callback fixes

c0e3f17

Add lock to payment complete method

5f74af6

justinshreve reviewed Jul 5, 2017

View reviewed changes

woocommerce deleted a comment from claudiosanches Jul 6, 2017

mikejolley and others added 2 commits July 6, 2017 15:19

return false; if lock was not aquired

cb24197

Merge pull request #15978 from woocommerce/update/paypal-update-locks

afab2d2

Update/paypal update locks

mikejolley mentioned this pull request Jul 27, 2017

Implement locking mechanism for data updates to work around race conditions #15780

Closed

2 tasks

mikejolley added status: in progress This is being worked on. and removed Status: Needs Review labels Aug 9, 2017

mikejolley mentioned this pull request Sep 28, 2017

Create an event queue for important, must run events #16971

Closed

2 tasks

mikejolley closed this Sep 28, 2017

dyszczo reviewed May 9, 2018

View reviewed changes

claudiosanches deleted the update/15780-locking-mechinism branch August 2, 2019 21:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add locking mechinism class #15967

Add locking mechinism class #15967

mikejolley commented Jul 4, 2017

claudiosanches left a comment

claudiosanches Jul 4, 2017

justinshreve commented Jul 4, 2017

mikejolley commented Jul 5, 2017

justinshreve left a comment

justinshreve Jul 5, 2017

justinshreve Jul 5, 2017

mikejolley Jul 6, 2017

justinshreve Jul 7, 2017

justinshreve Jul 5, 2017

justinshreve Jul 5, 2017

mikejolley Jul 6, 2017

diablodale commented Jul 28, 2017

mikejolley commented Jul 28, 2017

diablodale commented Jul 31, 2017 •

edited

mikejolley commented Sep 28, 2017

dyszczo May 9, 2018

mikejolley May 10, 2018


		$lock = $wpdb->get_row( $wpdb->prepare( "SELECT * FROM {$wpdb->prefix}woocommerce_locks WHERE name = %s;", $lock_name ) );

		return $lock ? $lock : false;

Add locking mechinism class #15967

Add locking mechinism class #15967

Conversation

mikejolley commented Jul 4, 2017

claudiosanches left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

justinshreve commented Jul 4, 2017

mikejolley commented Jul 5, 2017

justinshreve left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

diablodale commented Jul 28, 2017

mikejolley commented Jul 28, 2017

diablodale commented Jul 31, 2017 • edited

mikejolley commented Sep 28, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

diablodale commented Jul 31, 2017 •

edited