Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add locking mechinism class #15967

Closed
wants to merge 9 commits into from
Closed

Conversation

mikejolley
Copy link
Member

Part of #15780

Implements a class to handle locking, and includes tests.

This could in theory be used to lock something so other concurrent requests cannot update the same thing until it’s finished.

Getting feedback before moving to implementation.

Copy link
Contributor

@claudiosanches claudiosanches left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Just suggested two small tweaks, nothing blocking to be merged.


$lock = $wpdb->get_row( $wpdb->prepare( "SELECT * FROM {$wpdb->prefix}woocommerce_locks WHERE name = %s;", $lock_name ) );

return $lock ? $lock : false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not null so no need for mixed values?

Only thinking like PHP 7.1.

public static function get_lock( string $lock_name ): ?stdClass {}

@justinshreve
Copy link
Contributor

I didn't have a chance to review this today, but I would like to before merge. I'll put it on my list for tomorrow :).

@mikejolley
Copy link
Member Author

There is this too which should be merged to this branch first. #15978

Copy link
Contributor

@justinshreve justinshreve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and questions, but overall looking good.

* @return bool False if a lock couldn't be created or if the lock is still valid. True otherwise.
*/
public static function aquire_lock( $lock_name, $max_wait_timeout = 10 ) {
global $wpdb;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$wpdb is not used in this function

public static function get_request_id() {
if ( ! self::$request_id ) {
self::$request_id = uniqid( wp_rand(), true );
add_action( 'shutdown', __CLASS__, 'release_all_locks' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of tying a request id to these and releasing locks like this? Will we not have use cases where we would want a lock to possibly persist across requests? Is it because we don't trust people to call release_lock when they are done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, I think we should limit a lock to a certain request and do cleanup. This prevents it being permanently locked and scripts timing out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent something from permanently locking, we would/should have something that runs and looks for expired locks and releases based on that. i.e., on shutdown: release_expired_locks.

If we tie these to a specific request, a default $release_timeout of an hour seems really high.

*
* @param string $lock_name The name of this unique lock.
* @param int $max_wait_timeout The duration in seconds to wait for the lock to be freed.
* @return bool False if a lock couldn't be created or if the lock is still valid. True otherwise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is incorrect.

/**
* Create a lock.
*
* @param string $lock_name The name of this unique lock.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document that there is a length limit on lock names. They would have to look at the db schema to see that right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you limit this to? I made it up.

@woocommerce woocommerce deleted a comment from claudiosanches Jul 6, 2017
@diablodale
Copy link
Contributor

The implementation of locks appears to me to be inadequate. This is alignment with no previous caution that locks need to be implemented by core OS level components to be successful.

For example, your new woocommerce_locks table has 'name' as the primary key and appears to be relying on the SQL engine to enforce key uniqueness. There exists a scenario that when two threads on two peer CPUs run at similar intervals that locks will not be accurately managed. It is the normal gotchas for implementing locks and best left to the volumes of doc on the internet.

Part of the core challenge is that to make adequate locks, one needs to have an atomic check/set. Looking at the current codebase, the set (implemented by the INSERT IGNORE) and related check (IF (! $lock_result)) are not together atomic. You might have more luck creating a more complex bundled transaction including the time/expire management. Still, these problems have long been discussed and solved -- google is the friend on this.

A trivial example is the two threads/processes above. They both run create_lock() at the same time with the same lock_name parameter. They both run the wpdb->query(prepare(INSERT IGNORE)) at exactly the same time. Then the SQL database chooses because it does have atomic OS support. Lets say that P1 got a "true" result from the database and P2 did not.

Next, the thread for P1 is paused for any number of reasons by apache, OS, CPU, etc. Yet, P2 continues and runs the next line of PHP code if ( ! $lock_result ) {.

P2 didn't get a 'true' from the db so it enters the big clause. P2 calls get_lock with the lock_name. That lock does exist because P1 was successful with the INSERT above. FYI, P1 thread is still paused. P2 runs the second if ( ! $lock_result ) {. $lock_result is non-false (because the lock with that name does exist) so it continues past that clause.

Next, P2 thread is paused by apache, os, cpu, etc. Both P1 and P2 are now paused. They are paused long enough that the expiration time passes.

Next, P2 is awakened from its pause (P1 is still paused). P2 runs if ( strtotime( $lock_result->expires ) > time() ) { and that is a false statement...it is expired. So P2 moves down and does the release_lock, then create_lock recursively. The boundless create_lock recursion worries me but lets ignore that for now and assume that in some deep recursion create_lock finally returns "true". And, at the top recursion this P2 thread returns TRUE as its result from create_lock(). This is bad, the lock architecture has failed. The result should be FALSE.

Yet its even worse...remember at the beginning it was P1 that was able to insert the row into the db. It should be P1 that returns TRUE. Our thread P1 is awakened from its slumber and does the first if ( ! $lock_result ) { test. Since it did get a row from the INSERT IGNORE, it skips the big clause and returns true. That means that both P1 and P2 have returned TRUE. The lock architecture has failed.

I do not have a solution for this. Only that the solution is non-trivial and best solved by core components who can guarantee atomic operation.

Next, feedback on the use of locks. I am unsure that this adequately addresses the pains people are experiencing. If the pain is PDT/IPN duplicates, then change the payment gateways so that duplicates are not possible. For example, changing the PayPal gateway to have a radio "either-or...not both" possibility for IPN and PDT. That will address the specific pain more quickly than lock architectures.

When using locks, I believe it important to consider what is being locked. For this scenario, is the lock on the order? On the stock? On the order and the stock? A new example is P1 is an admin in the admin UI. P2 is a customer making a paypal payment. At the same time:

  • the admin in P1 uses the admin UI to change the order to some non-payment complete state, e.g. cancelled.
  • the customer makes a payment, paypal sends an IPN payment complete, and P2 receives that payment complete.
    Now we have a race condition for what the status will be, what the stock will be, etc.
    In that scenario, we need a lock on the order and the stock so that the state and stock can be correctly coordinated for each independent thread. Otherwise, due to slight differences in thread execution, it is possible for P1 to make the state cancelled and P2 to reduce the stock.

@mikejolley
Copy link
Member Author

@diablodale PDT is a fallback for IPN and vice versa. Thats why you'd enable both.

Yet its even worse...remember at the beginning it was P1 that was able to insert the row into the db. It should be P1 that returns TRUE. Our thread P1 is awakened from its slumber and does the first if ( ! $lock_result ) { test. Since it did get a row from the INSERT IGNORE, it skips the big clause and returns true. That means that both P1 and P2 have returned TRUE. The lock architecture has failed.

If both return true, but 1 is delayed until the lock is release, why exactly is this a problem?

I've modelled this class after the one in Drupal FWIW.

@diablodale
Copy link
Contributor

diablodale commented Jul 31, 2017

Please keep in mind that I'm coming with good intentions. I have a monetary invested interest that Woo is reliable and performant.

Mmm, what you write is interesting in that you have a want/need of backups, yet that is not what IPN and PDT are to each other. IPN and PDT have significantly different feature sets and only one feature overlaps. Here is a nice overview by PayPal. (They've significantly improved their docs since I last deeply read them 4 years ago). https://developer.paypal.com/docs/classic/ipn/integration-guide/IPNPDTAnAlternativetoIPN/

PayPal strongly cautions that a site could implement both IPN and PDT, however it will cause problems that need to be managed. These are the known problems causing Woo pain.

Personally, I disabled PDN because I can wait 1 second. IPN is fast. How fast? Fast enough that it is overlapping with PDT. There is no benefit for locks or for PDT in these cases. Using only IPN solves the stock level problem and the duplicate email problem. IPN is designed to be more reliable than PDT (see the link above). In the cases where IPN is slower (1 minute) or 1 hour (needs 2nd level review by PayPal staff -or- different currencies without automatic convert), IPN will work while PDT will always fail. IPN wins repeatedly.

From what I read in the pain expressed by Woo customers and my understanding of IPN/PDT, there is a very small set of customers who need PDT. It is only those customers that... sell a virtual product + need stock management of that virtual product + must have immediate yes/no paid + don't care about the reliability of that yes/no paid. For example, it is a Woo customer that sells a PDF download + that PDF is limited edition of only 100 downloads + must provide the download link immediately (not in followup emails) + doesn't care when payment delayed/fails + puts the responsibility on the customer to retry delayed/failed payments. That last part is important...IPN does the retrying automatically making a nice customer experience. PDT does not retry and can result in lost sales if the customer doesn't want to go through the process again.

I'm not suggesting to have only IPN. Instead, the expectation of the two and what they actually are by PayPal's design may have been misunderstood in the past. A pause to consider this possibility and how that led to the pains some Woo customers are experiencing could lead to a easy to implement solution (e.g. turn off PDT) that is already well tested for years. It seems the most vocal pain that lead to this lock idea was stock management. Stock management usually isn't virtual; they aren't downloads. These customers don't need the only feature which is unique to PDT; that is, the immediately yes/no. I suggest these customers can wait 0.5 seconds for IPN to respond. Or a minute so a reliable payment can be made. If it doesn't, PayPal itself has problems which is a separate issue and PayPal is best enabled to resolve with their own support processes.

You ask "If both return true...". It is a problem because locks must be reliable and return accurate true/false results. That is the entire reason to exist for locks. Woo is at the user level and can not control thread execution; P1 and P2 of my example can enter the lock code and exit the lock code at the same time. Yet, inside the lock code run in uncontrollable sequence of staggered pauses as I described in my example. This is proven OS design for decades. From your questioning, you might have a knowledge gap in this specific area; no one can be everything everywhere :-). I can only point you to decades of docs on the Internet and my example demonstrating the failed code. Feel free to take my example to your local university professor. And just because Drupal did it, doesn't make it correct. ;-) See every bug everywhere to prove bugs and flawed design are copied between projects.

Have you also consider the cost? A high cost to deliver a feature that isn't reliable isn't a good direction for money generating systems like Woo. The high cost is in the number of roundtrip database accesses of this code implementation. It is very high:

  • a loop in a[c]quire_lock calling create_lock multiple times
  • a infinite recursion loop in create_lock with each level of recursion 3 roundtrips to db

This means it is an infinite loop in a loop with each loop in a loop hitting the database 3 times. This is always a red flag for design.

I still think using IPN and not PDT is the way to go for the majority of Woo customers experiencing the pain of stock mgmt w/ PayPal. Using only IPN is a solution tested for multiple years and works. The people that have a focused need for PDN are the non-vocal subset I described above. For them, I would recommend changing the Woo PayPal gateway itself to allow IPN or PDT but not both.

If you were to introduce this codebase, you will likely shift the problem. There will be some set of customers that still see the stock issues and duplicate emails; or maybe they see double failures. And it won't be all the time, only some times. All of this because the lock implementation isn't reliable, behaves differently due to timing bugs, and all the while increases database load by an infinite number of seconds.

There is still the separate issue of using locks. To discuss that, I have to pretend the lock implementation is correct and performant. In the WC_Order code, you a[c]quire a lock on line 114. Next, the real world occurs...on any line 115-135 this PHP thread halts. It halts because of some apache problem, OS problem, db hickup, network glitch, memory problem, action/filter/hook problem, stray gamma ray (it's a real thing), etc. That means the order is locked for one hour yet the status may not be marked complete/processing, payment_complete may not have been called, trans id and dates maybe/not marked, etc. There is no reasonable way a customer or an admin can fix this until the timeout expires in one hour. This is not a good scenario. And with an inadequate lock implementation it becomes a double-negative.

I thought about all this over the weekend to see if I could think of alternative suggestions. I keep coming back to the same conclusions. To remove the source of the problem (don't use dual IPN and PDT) rather than incorrectly use an inadequate lock implementation.

FYI, you have a typo throughout. It is acquire. Not aquire.

@mikejolley
Copy link
Member Author

Closed in favour of a queue #16971

@mikejolley mikejolley closed this Sep 28, 2017
* @param int $max_wait_timeout The duration in seconds to wait for the lock to be freed.
* @return bool False if a lock couldn't be created or if the lock is still valid. True otherwise.
*/
public static function aquire_lock( $lock_name, $max_wait_timeout = 10 ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see a typo here. It should be acquire

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was not merged.

@claudiosanches claudiosanches deleted the update/15780-locking-mechinism branch August 2, 2019 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: in progress This is being worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants