Cache invalidation for scripts in symlinked folders #126

Closed
silentroach opened this Issue Aug 23, 2013 · 47 comments

silentroach commented Aug 23, 2013

Hi!

We are trying to use your opcode cache with our project.
Our code versions are managed in production with symlinks. Sometimes the opcode cache's timestamp invalidation seems to fail after the project symlink is changed, and we need to clear the cache manually.

/somefolder
    /production -> 20130823_2207
    /20130823_2207
    /20130823_2115
    ...

Can you help us with any advice? Thank you.

Versions:

PHP 5.5.1-1
Zend Engine v2.5.0, Copyright (c) 1998-2013 Zend Technologies
    with Zend OPcache v7.0.2-dev, Copyright (c) 1999-2013, by Zend Technologies

Config:

opcache.blacklist_filename => no value
opcache.consistency_checks => 0
opcache.dups_fix => Off
opcache.enable => On
opcache.enable_cli => Off
opcache.enable_file_override => Off
opcache.error_log => no value
opcache.fast_shutdown => 0
opcache.force_restart_timeout => 180
opcache.inherited_hack => On
opcache.interned_strings_buffer => 4
opcache.load_comments => 1
opcache.log_verbosity_level => 1
opcache.max_accelerated_files => 2000
opcache.max_file_size => 0
opcache.max_wasted_percentage => 5
opcache.memory_consumption => 64
opcache.optimization_level => 0xFFFFFFFF
opcache.preferred_memory_model => no value
opcache.protect_memory => 0
opcache.revalidate_freq => 2
opcache.revalidate_path => Off
opcache.save_comments => 1
opcache.use_cwd => On
opcache.validate_timestamps => On

dstogov Aug 26, 2013

Member

Do you pass the real path names (all symlinks resolved) to opcache_invalidate()? Otherwise if you change the symlink, file names might be resolved to different real paths.

silentroach Aug 26, 2013

I mean automatic invalidation, not via opcache_invalidate.

dstogov Aug 26, 2013

Member

If you just change the symlink, the scripts lying in the old directory are still valid :)

silentroach Aug 26, 2013

And it would be great if Zend OPcache handled it.
I think it is a common way to deploy. Capistrano uses it, for example.

rlerdorf Aug 26, 2013

Contributor

The opcode cache uses the realpath of the files, so if a different symlink points to the same file you will get the same set of opcodes. You also have it configured to only check every 2 seconds, so for 2 seconds after pointing your "production" symlink at another target you are going to get the old target files. You can read about how to properly manage a symlinked docroot with opcache here: http://codeascraft.com/2013/07/01/atomic-deploys-at-etsy/

silentroach Aug 26, 2013

Too complicated to be the best solution :)

And realpath resolves symlinks.

rlerdorf Aug 26, 2013

Contributor

Well, then just set opcache.revalidate_freq to 0 so it will revalidate on every request. Your deploys won't be atomic, but it should never load the wrong file.

kayue Sep 8, 2013

+1 on this, we are using Capifony.org / Capistrano to deploy our project, and it uses symlinks...

rlerdorf Sep 8, 2013

Contributor

+1 on what? There is no bug here. What is most likely happening is that the failed requests are the ones that get screwed over when the symlink switch happens while they are executing, or at least within the revalidate_freq window. You can shrink this window by setting opcache.revalidate_freq to 0. It doesn't entirely eliminate the problem, but it comes very close, at least if your site isn't very busy. To completely eliminate the problem, read on:

opcache has no concept of the start of a request. It works on individual opcode arrays. An opcode array is what is generated and cached for each included file, and the key for each included file is the fully qualified path of the script that was compiled. How you got there (via a symlink, or via various relative path specifiers such as ../path/file.php or ../../other/path/file.php) is irrelevant: the fully qualified path (the realpath) to that file is the same, and the access mechanism is not maintained.
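To make the keying behaviour above concrete, here is a minimal sketch in Python (used only so it can be run standalone; it is not OPcache code). It assumes the cache key is simply the fully resolved path of the script, as described above. The helper names `cache_key` and `demo_cache_keys`, and the temporary directory layout, are hypothetical.

```python
import os
import tempfile


def cache_key(path: str) -> str:
    """Stand-in for the cache key: the fully resolved path of a script."""
    return os.path.realpath(path)


def demo_cache_keys() -> dict:
    """Create two releases behind a 'production' symlink and compare keys."""
    base = os.path.realpath(tempfile.mkdtemp())
    releases = {}
    for name in ("20130823_2115", "20130823_2207"):
        releases[name] = os.path.join(base, name)
        os.mkdir(releases[name])
        with open(os.path.join(releases[name], "index.php"), "w") as fh:
            fh.write("<?php // placeholder\n")

    link = os.path.join(base, "production")
    os.symlink(releases["20130823_2115"], link)

    # Two different spellings of the same file, both going through the symlink.
    via_link = os.path.join(link, "index.php")
    via_detour = os.path.join(
        base, "20130823_2207", "..", "production", "index.php"
    )
    before = cache_key(via_link)
    detour = cache_key(via_detour)

    # Repoint the symlink: create a temporary link, then rename it over the old one.
    tmp_link = link + ".tmp"
    os.symlink(releases["20130823_2207"], tmp_link)
    os.replace(tmp_link, link)
    after = cache_key(via_link)

    return {"before": before, "detour": detour, "after": after}
```

Calling `demo_cache_keys()` gives the same key for both spellings of the old path and a different key after the swap. Under this assumption the swap does not invalidate the old entry; new lookups simply resolve to a different key.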

So, when you deploy via something like Capistrano which does a symlink swap on the document root, you want all new requests to get the new files, but you don't want to screw over requests that are currently executing as the deploy is happening. What you really need to create a robust deploy environment is to have your web server be in charge of this. The web server is the piece of the stack that understands when a new request is starting. The opcode cache is too deep in the stack to know or care about that.

With nginx this is quite simple. Just add this to your config:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;

This tells nginx to realpath-resolve the docroot symlink, meaning that as far as your PHP application knows, the target of the symlink is the real document_root. Now, once a request starts, nginx will resolve the symlink as it stands at that point, and for the duration of the request it will use the same docroot directory, even if the symlink switch happens mid-request. This entirely eliminates the symptoms described here and it is the correct approach. This isn't something that can be solved at the opcache level.
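The effect of resolving the docroot once per request can be sketched as follows. This is a Python simulation under stated assumptions, not nginx or PHP code: a "request" is modelled as two file reads with a symlink swap in between, and the names `start_request`, `include`, `swap_symlink` and `simulate_deploy_mid_request` are all hypothetical.

```python
import os
import tempfile


def start_request(docroot_link: str) -> str:
    """Resolve the docroot symlink once, at the start of the request."""
    return os.path.realpath(docroot_link)


def include(root: str, relative: str) -> str:
    """Read a file relative to the given root (stands in for a PHP include)."""
    with open(os.path.join(root, relative)) as fh:
        return fh.read()


def swap_symlink(target: str, link: str) -> None:
    """Repoint `link` at `target` by renaming a temporary link over it."""
    tmp_link = link + ".tmp"
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)


def simulate_deploy_mid_request() -> tuple:
    """Return (versions seen with a pinned root, versions seen via the symlink)."""
    base = tempfile.mkdtemp()
    for version in ("A", "B"):
        release = os.path.join(base, version)
        os.mkdir(release)
        for name in ("first.php", "second.php"):
            with open(os.path.join(release, name), "w") as fh:
                fh.write(version)  # each file just records its release

    link = os.path.join(base, "current")
    os.symlink(os.path.join(base, "A"), link)

    pinned_root = start_request(link)
    pinned = include(pinned_root, "first.php")
    unpinned = include(link, "first.php")

    # The deploy lands while the request is still running.
    swap_symlink(os.path.join(base, "B"), link)

    pinned += include(pinned_root, "second.php")
    unpinned += include(link, "second.php")
    return pinned, unpinned
```

In this simulation the request that pinned its root sees only release A, while the one that keeps going through the symlink mixes A and B, which is the inconsistent state described above.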

Apache doesn't have this same mechanism to resolve a docroot symlink at the start of the request, but I have written an Apache module that does it. See https://github.com/etsy/mod_realdoc
It is slightly more efficient than the nginx approach because it includes a configurable realpath cache so you don't have to do this somewhat expensive realpath on every request.

kayue Sep 8, 2013

The $realpath_root is a great solution. Thanks.

FYI the symlink issue doesn't only happen in the 2 seconds opcache.revalidate_freq window; in my case it doesn't update until I restart php-fpm.

Thanks a lot, this should go into FAQ in my opinion. I would submit a pull request if my English is good enough.

silentroach Sep 8, 2013

Yes, only a service restart or a call to the invalidate function helps.
(closed by mistake, sorry)

$realpath_root isn't a great solution because it is undocumented.

@silentroach silentroach closed this Sep 8, 2013

@silentroach silentroach reopened this Sep 8, 2013

rlerdorf Sep 8, 2013

Contributor

Actually, invalidating the cache doesn't solve anything. If you deploy and invalidate the cache while a request that started on version A of the code is still running, that request may very well do another include, at which point it will be including files from version B, and you are back to an unknown state for that request. A web server restart, assuming it is a graceful restart that lets existing requests finish, can be made to work, but the timing is a bit tricky. You have to combine the graceful restart with a config change so that new requests see the new docroot while the requests that are finishing up continue on the previous one.

Or you need to get into load balancer tricks where you stop sending new requests to a subset of your servers. Then you wait a while to let existing requests finish, then deploy to that subset and repool them. You may not have enough servers to do this without seriously affecting site performance though, and it also slows down deploys significantly. So I still think having the web server realpath the docroot symlink and setting the effective docroot to the target of that symlink is the slickest and most complete solution to this problem.

TerryE Sep 8, 2013

+1 on Rasmus's points here. The issue that you face is that PHP's Zend VM maps in new sources at runtime as it executes the INCLUDE_OR_EVAL opcodes which reference the source. If you want to avoid scripts barfing in indeterminate ways during version cutover, you must maintain path integrity across all of these includes (and the requested script). If you defer symbolic resolution of paths to the runtime system then you will occasionally hit the asynchronous edge effects, unless you take the steps that Rasmus describes.

Of course, this is also easy to implement at an application level -- but only if you can set coding standards or modify code, e.g. using tricks like using complete install hierarchies and setting:

define('ROOT_DIR', dirname(__FILE__)); 

(or using a realpath(someSymlink) instead of __FILE__) on immediate entry to every request then using ROOT_DIR relative pathing for all includes.

__FILE__ of the requested script is pretty bomb-proof because this is fully resolved by the SAPI and the requested script is loaded from this resolved path, so any paths (which don't embed further symlinks) relative to this will always be consistent with the above. With the someSymlink variant, there is still a small execution window (less than a few milliseconds) where these might get out of step, but this is typically orders of magnitude less than the window that a JIT autoloader would experience.

@rasmus, would you agree with this analysis or have I missed something?

rlerdorf Sep 8, 2013

Contributor

Yes, you can do it entirely in userspace if you have a front controller that is always run first and you are very strict about always using includes relative to that initial path. It is almost exactly the same thing as my approach. I simply do it at the web server level instead of in PHP, which means you have a bit more flexibility at the PHP level and don't have to be quite as vigilant about how you write your code. You still can't refer to files via the symlink, of course. It has to be relative to DOCUMENT_ROOT at all times.

silentroach Sep 9, 2013

We use the realpath function to determine the project root, and all includes are made within it.
English is not my strong suit, so since you think it is not a bug, I will just close the ticket.

ebuildy Jul 8, 2014

I get a lot of strange behavior with symlinks. For instance, here is a piece of the opcache_get_status() result:

/var/www/production/20140708211450/www/index.php: {
full_path: "/var/www/production/20140708211450/www/index.php",
hits: 30742,
memory_consumption: 20944,
last_used: "Tue Jul 8 22:41:25 2014",
last_used_timestamp: 1404852085,
timestamp: 0
},
/var/www/production/20140708212446/www/system/core/Model.php: {
full_path: "/var/www/production/20140708212446/www/system/core/Model.php",
hits: 32168,
memory_consumption: 3440,
last_used: "Tue Jul 8 22:46:29 2014",
last_used_timestamp: 1404852389,
timestamp: 1404847503
},

I don't know if it's related to symlink or not, basically I use Nginx with

root = /var/www/production/current

This is a symlink that points to the latest version (here /var/www/production/20140708212446). After creating the latest version folder and changing the symlink destination, I call opcache_reset() (via an HTTP request with curl).

But my index.php always stays at the previous version, with a timestamp of 0. My web site has very high traffic (about 100 requests/second).

pmoust Sep 10, 2014

I experience the same as @ebuildy. The mtime of index.php is different, but somehow it is regarded as 0. This is the entry point, so everything resolves to the 'previous' cached opcodes.

pmoust Sep 10, 2014

I just re-read #126 (comment) by @rlerdorf .
Thanks for clearing it up Rasmus, once again you've been very helpful.
Cheers.

cirpo Sep 22, 2014

Reloading nginx using $realpath_root doesn't solve the issue for me: I still have to reload php5-fpm as well, otherwise php5-fpm is still pointing to the previous resolved path.
Even reloading both nginx and php5-fpm I still get some failing requests.
Am I missing anything?

kayue Sep 22, 2014

You have to reload php5-fpm. Reloading Nginx will not help.

cirpo Sep 22, 2014

After reloading fpm I still got some failing requests...
But from what @rlerdorf said, an nginx reload should suffice: the ongoing fpm requests will still have the current path, while new requests will get the new path after an nginx reload.
It might be that the resolved path in nginx is still cached.

What if I create a new nginx conf during the deployment with the real path and then do an nginx reload?

marcmillien Feb 20, 2015

The $realpath_root solution in nginx doesn't work if you run fpm on different nodes than your nginx nodes, because the nginx nodes don't have access to the fpm nodes' directory tree.

In this case, the solutions seem to be:

  • Using opcache_reset(), although it isn't atomic, as described above.
  • Reloading fpm, although a reload may cause some requests to fail.

jportoles Aug 11, 2015

Does anyone know whether nginx dropped support for $realpath_root? We tested a number of variations just today and it always seems to resolve to the same value as $document_root, that is, the unresolved symlink.

dmaicher Aug 11, 2015

@jportoles Which version of nginx are you using? I just recently migrated from apache2 to nginx and for me $realpath_root works perfectly. But as I'm still on debian 7 my nginx version is 1.2.1 and thus quite outdated...

The official changelog only mentions the feature addition in version 0.7.18:

http://nginx.org/en/CHANGES

jportoles Aug 13, 2015

@dmaicher 1.6.3 here, not quite the latest since we had some issues migrating. From what I understand the version shouldn't be an issue, but it's strange. We tried the following:

fastcgi_param REALPATHTEST $realpath_root;

Then on the actual PHP request, var_dump($_SERVER['REALPATHTEST']) suggests the variable is properly set, but its value is the same unresolved document root as usual, even though there is a symlink in the path. We tested with no aliases, rewrites or other nginx location shenanigans, just a plain location / block with fastcgi_pass inside. So from there we can only conclude that it's not a matter of misconfiguration on our part, but something to do with how or when nginx sets $realpath_root. Perhaps there is something in our stack that somehow conflicts with how nginx sets this variable, but since there is no documentation we have no idea where to look :/

@marcmillien could you perhaps elaborate the case where you saw it didn't work a while ago? You mentioned $realpath_root doesn't work if you run fpm on different nodes, but I don't follow, what did you mean by that? As far as I understand php5-fpm always runs on separate processes, with nginx only acting as the middleman.

dmaicher Aug 13, 2015

@jportoles I just tried exactly that and for me it works...

location ~ ^/app\.php(/|$) {
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    fastcgi_split_path_info ^(.+\.php)(/.*)$;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
    fastcgi_param DOCUMENT_ROOT $realpath_root;
    fastcgi_param REALPATHTEST $realpath_root;
    internal;
}

And the var_dump($_SERVER['REALPATHTEST']) gives me the resolved path without symlinks.

Is nginx running on the same host/filesystem as php-fpm?

marcmillien Aug 13, 2015

@jportoles
I have php-fpm listening on port 9000 on server A, server B and C.
I have nginx on 2 servers X and Y that have the following upstream used in my vhost config:

upstream fpm {
    server A:9000;
    server B:9000;
    server C:9000;
}

realpath_root can't work in this case, just because this path is on A, B and C, but there is no php files or directories related to the php on servers X and Y.

marcmillien commented Aug 13, 2015

@jportoles
I have php-fpm listening on port 9000 on server A, server B and C.
I have nginx on 2 servers X and Y that have the following upstream used in my vhost config:

upstream fpm {
    server A:9000;
    server B:9000;
    server C:9000;
}

$realpath_root can't work in this case, simply because the path only exists on A, B and C; there are no PHP files or related directories on servers X and Y for nginx to resolve.

jportoles commented Aug 13, 2015

Ok, I think I understand the "problem" now. It's a bit silly in retrospect, but I'm documenting it just in case someone else faces it in the future. $realpath_root, as the name suggests, only resolves the root path, so for it to work as intended your symlink must also be the root path as defined in nginx. So if your symlink is something like the following:

/var/www/app -> /var/www/app-34f2faf45-123456

Pointing nginx to /var/www/app and then redefining SCRIPT_FILENAME/DOCUMENT_ROOT using $realpath_root in nginx will work as intended. But if your symlink is this:

/var/www/app/tools -> /var/www/app/tools-34f2faf45-123456

And your root on nginx is /var/www/app/, then you are SOL, because in this scenario $realpath_root is obviously /var/www/app/ and $fastcgi_script_name will stay as tools/file.php. Redefining your root location directives in nginx could probably help, but ultimately I guess it's best to just restructure the paths so that the symlink matches the root.
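A minimal Python sketch of the two cases above, for anyone who wants to see the difference without touching an nginx config. It is not nginx itself: the helper `fastcgi_script_filename` and the directory names are made up for the demo, and it only assumes that `$realpath_root` is the configured root with symlinks resolved, with the script name appended verbatim.

```python
import os
import tempfile


def fastcgi_script_filename(root, script_name):
    """Mimic SCRIPT_FILENAME = $realpath_root$fastcgi_script_name.

    Only the root is resolved; the script name is appended as-is.
    (Hypothetical helper, not an nginx API.)
    """
    return os.path.realpath(root) + script_name


# Scratch directory; resolved up front so comparisons are stable.
base = os.path.realpath(tempfile.mkdtemp())

# Case 1: the root itself is the symlink (app -> app-r1).
os.mkdir(os.path.join(base, "app-r1"))
os.symlink(os.path.join(base, "app-r1"), os.path.join(base, "app"))
case1 = fastcgi_script_filename(os.path.join(base, "app"), "/index.php")

# Case 2: the symlink sits below the root (site/tools -> site/tools-r1).
os.makedirs(os.path.join(base, "site", "tools-r1"))
os.symlink(os.path.join(base, "site", "tools-r1"),
           os.path.join(base, "site", "tools"))
case2 = fastcgi_script_filename(os.path.join(base, "site"), "/tools/file.php")

print(case1)  # path goes through app-r1: the symlink was resolved
print(case2)  # path still goes through tools: the symlink was not resolved
```

In the second case PHP still receives a path containing the symlink, so it is back to PHP's own path resolution to decide which release gets served.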

@marcmillien I see, that makes sense. I suppose you could put nginx instances or some other sort of middle man in front of the nodes running php-fpm to solve that, and then forward the requests via proxy_pass rather than fastcgi_pass.

@dmaicher thanks for testing that, just wanted to let you know it helped us find the issue.

marcmillien commented Aug 14, 2015

@jportoles this is one of the solutions yes :).

ifeltsweet commented Apr 15, 2016

But honestly, how is this not a bug?

Look at what @jportoles described. Nginx doesn't help there.

If you have something like:

/www
    /public
        /symlink -> a
        /a
        /b

So the way it works seems to be the following:

  1. A request for /www/public/symlink/test.php comes in.
  2. OPcache then does a realpath() on this pretty symlink and finds that the real file is in /www/public/a/test.php. It looks at what has been cached for this path and finds that it doesn't have anything.
  3. It runs the file and caches it under its real path.
  4. Now let's switch symlink to b.
  5. A request for /www/public/symlink/test.php comes in again.
  6. OPcache then does a realpath on this pretty symlink and somehow ends up at /www/public/a/test.php again. It realises that it does have this file in cache and then returns you the old /www/public/a/test.php instead of /www/public/b/test.php.

WAIT A SECOND! How did it connect /www/public/symlink/test.php to the /www/public/a/test.php?

So did it actually cache the pretty symlink path as well? It seems so to me.

Step 6 should be the following:
OPcache runs a realpath() on requested file which is /www/public/symlink/test.php and then gets /www/public/b/test.php as an answer. It notices that no such file is in cache and gives you fresh opcode.

I have not dived into the source code, but the culprit is probably OPcache using PHP's internal realpath cache for symlinks. So in step 6 it doesn't see that the symlink now points to the new file, since it already cached the realpath for that location in step 2.

Does this sound right?
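To make the hypothesis concrete, here is a small Python sketch of the six steps. It is only a toy model of the suspected behaviour, not OPcache code: `MemoizedResolver` is a hypothetical class that remembers the first resolution of each requested path forever, and the directory layout is recreated in a temp folder.

```python
import os
import tempfile


class MemoizedResolver:
    """Toy model: remembers the first resolution of each requested path."""

    def __init__(self):
        self._seen = {}

    def resolve(self, path):
        if path not in self._seen:
            self._seen[path] = os.path.realpath(path)
        return self._seen[path]


base = os.path.realpath(tempfile.mkdtemp())
for release in ("a", "b"):
    os.mkdir(os.path.join(base, release))
    with open(os.path.join(base, release, "test.php"), "w") as fh:
        fh.write(release)

link = os.path.join(base, "symlink")
os.symlink(os.path.join(base, "a"), link)
requested = os.path.join(link, "test.php")

resolver = MemoizedResolver()
first = resolver.resolve(requested)   # steps 1-3: resolves into a/

# Step 4: switch the symlink to b by renaming a new link over the old one.
tmp_link = link + ".tmp"
os.symlink(os.path.join(base, "b"), tmp_link)
os.replace(tmp_link, link)

stale = resolver.resolve(requested)   # step 6 as observed: still a/
fresh = os.path.realpath(requested)   # step 6 as expected: now b/

print(first)
print(stale)
print(fresh)
```

If something between the request and the cache lookup behaves like `MemoizedResolver`, the old file keeps being served no matter how long you wait, which matches the symptom described.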

jportoles commented Apr 15, 2016

OPcache then does a realpath on this pretty symlink and somehow ends up at /www/public/a/test.php again.

Not OPcache's fault in this case: PHP has a built-in realpath cache with a default TTL of 2 minutes, see here: http://php.net/manual/en/ini.core.php#ini.realpath-cache-size

You can disable it (set the TTL to 0) but even if the entry point is correct, you will still have issues with includes being desynchronized while in the middle of a symlink change. This is why it's better to let the server handle it beforehand and use $realpath_root on nginx where possible.
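A rough Python sketch of what a TTL-based realpath cache does around a symlink switch. This is an illustration only, not PHP's implementation: `TtlRealpathCache`, the fake `resolve` function and the injected clock are all made up for the demo, and the 120 second TTL simply mirrors the default mentioned above.

```python
class TtlRealpathCache:
    """Toy realpath cache: entries expire ttl seconds after being stored."""

    def __init__(self, resolve, ttl, clock):
        self.resolve = resolve  # function doing the real (uncached) lookup
        self.ttl = ttl          # seconds; 0 disables caching
        self.clock = clock      # injectable time source, for the demo
        self._entries = {}      # path -> (resolved_path, expires_at)

    def realpath(self, path):
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]
        resolved = self.resolve(path)
        if self.ttl > 0:
            self._entries[path] = (resolved, now + self.ttl)
        return resolved


# Fake filesystem state: where the symlink currently points.
target = {"symlink": "/www/public/a"}
now = {"t": 0}


def resolve(path):
    return path.replace("/www/public/symlink", target["symlink"], 1)


cache = TtlRealpathCache(resolve, ttl=120, clock=lambda: now["t"])
request = "/www/public/symlink/test.php"

before = cache.realpath(request)      # resolved and cached at t=0
target["symlink"] = "/www/public/b"   # deploy: switch the symlink
now["t"] = 60
within_ttl = cache.realpath(request)  # entry still valid: old target
now["t"] = 121
after_ttl = cache.realpath(request)   # entry expired: new target

# With ttl=0 nothing is stored, so every call sees the current target.
no_cache = TtlRealpathCache(resolve, ttl=0, clock=lambda: now["t"])
target["symlink"] = "/www/public/a"
uncached_a = no_cache.realpath(request)
target["symlink"] = "/www/public/b"
uncached_b = no_cache.realpath(request)
```

Note that a per-process cache like this can leave different worker processes disagreeing about the target for up to one TTL after a deploy.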

ifeltsweet commented Apr 15, 2016

Just tested this and it seems that it is not connected to PHP's realpath cache. PHP's realpath gets updated after 2 minutes but OPcache still sees the old path.

jportoles commented Apr 15, 2016

I'm not too intimate with PHP internals so someone correct me if I'm wrong, but as far as I understand, when a request hits the interpreter, the opcode cache resolves the path first (which may or may not be cached in the realpath cache), and then proceeds to cache the resolved path. With the realpath cache off, the opcode cache should be hitting system calls to resolve the path before fetching a cached entry every time. So with the realpath cache off, the opcode cache shouldn't be the culprit for whatever is failing.

What could be happening is that your application crashes in the middle of a symlink change because you are referencing the unresolved symlink (e.g. /www/public/symlink/) within the code, which can cause the symlink to resolve to different end points within a single request, which is what I was trying to get at before. Turning off the realpath cache or the opcode cache won't help in that case, because it's not a cache issue.
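Here is a small Python sketch of that mid-request desynchronization, with no cache involved at all. The release names `r1`/`r2`, the `current` link and both request functions are hypothetical; the "deploy" is just a callback fired between two file reads.

```python
import os
import tempfile

base = os.path.realpath(tempfile.mkdtemp())
for release in ("r1", "r2"):
    os.mkdir(os.path.join(base, release))
    for name in ("index.php", "lib.php"):
        with open(os.path.join(base, release, name), "w") as fh:
            fh.write(release)

link = os.path.join(base, "current")
os.symlink(os.path.join(base, "r1"), link)


def switch_to(release):
    """Repoint the symlink by renaming a new link over the old one."""
    tmp_link = link + ".tmp"
    os.symlink(os.path.join(base, release), tmp_link)
    os.replace(tmp_link, link)


def read(path):
    with open(path) as fh:
        return fh.read()


def request_via_symlink(deploy_midway):
    """Every file is opened through the symlink, so each open re-resolves it."""
    versions = [read(os.path.join(link, "index.php"))]
    deploy_midway()
    versions.append(read(os.path.join(link, "lib.php")))
    return versions


def request_via_resolved_root(deploy_midway):
    """The root is resolved once, up front, and reused for the whole request."""
    root = os.path.realpath(link)
    versions = [read(os.path.join(root, "index.php"))]
    deploy_midway()
    versions.append(read(os.path.join(root, "lib.php")))
    return versions


mixed = request_via_symlink(lambda: switch_to("r2"))
switch_to("r1")
pinned = request_via_resolved_root(lambda: switch_to("r2"))

print(mixed)   # files from two different releases in one request
print(pinned)  # both files from the release the request started on
```

Resolving the root once per request is essentially what `$realpath_root` or mod_realdoc do before PHP ever sees the path.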

ifeltsweet commented Apr 15, 2016

  1. I've set realpath_cache_ttl = 0 and it doesn't change a thing. So the OPcache realpath cache does not use the internal PHP realpath cache, then.
  2. No, my application is not crashing in the middle of a symlink change. I am performing the test outlined above without any requests happening during the symlink change. I understand what you are describing about "includes being desynchronized while in the middle of a symlink change", but if you follow my steps then you will see that this is not what I'm talking about at all.
  3. Someone above suggested that you can fix this in userspace (application level) if you have a front controller. Guess what, you can, EXCEPT for the front controller file, which gets called first! OPcache will always see /www/current/public/index.php, and the same realpath cache (bug) will be there for that file. Yeah, sure, all the files that are included by index.php will be alright because you will include them relative to the realpath. But what if you change index.php itself? You'll still be getting the old copy of it.

Contributor

rlerdorf commented Apr 15, 2016

@ifeltsweet the real way to address this is at the web server level as I explained above. There isn't anything we can do at the PHP level. Opcache is working as expected when it comes to resolving symlinks.

jportoles commented Apr 15, 2016

@rlerdorf shouldn't disabling the realpath cache also work though? I was under the impression that you could solve this either at a web server level (e.g. $realpath_root on nginx) OR by disabling the realpath cache at the PHP level (albeit not recommended).

ifeltsweet commented Apr 15, 2016

@rlerdorf sure and I agree with you, but there is still something wrong with the way OPcache resolves symlinks, it seems to just cache resolved realpaths forever. What if Nginx also cached once resolved "$realpath_root" forever? You wouldn't be able to use "$realpath_root" then.

Contributor

rlerdorf commented Apr 15, 2016

But it doesn't cache resolved paths forever at all. If it did, then the deploy strategy I described wouldn't work and it has worked on a very large site with 40+ deploys per day with a ton of traffic for a couple of years now. I think you need to go back and look at your assumptions and perhaps create some test scenarios to figure out what you are doing wrong.

ifeltsweet commented Apr 15, 2016

The strategy that you are using at Etsy relies on mod_realdoc to resolve your realpath. It also caches those paths for 2 seconds (only 2 seconds). The strategy that I am describing relies on OPcache to resolve the realpath. The difference is that OPcache doesn't seem to see that the symlink is now pointing to the new location. Not after 2 seconds, 2 minutes or even an hour.

I will create a test and hopefully we can all take a look at it together.

jboffel commented May 23, 2016

@ifeltsweet

There is a blacklist where you can specify a list of files you don't want cached.

Based on my tests, this list works even with a symlinked path.

So if you put something like /my/current/workspace in this list, then even if "current" is a symlink, the file the symlink points to will never be included in the cache.

So those who want to control this from userspace via a front controller may be able to achieve it, if they are OK with the cost of one PHP file that is never cached...

See: opcache.blacklist_filename

webdevilopers commented May 23, 2016

TL;DR: I had issues with OPcache when using Capistrano for years. I tried a lot of suggested fixes and workarounds. In the end this article helped:
http://jpauli.github.io/2015/03/05/opcache.html#understanding-the-opcache-memory-consumption

The final fix was adding this to php.ini:

opcache.use_cwd = 1
opcache.revalidate_path = 1

This finally solved all my problems. Maybe this helps someone else too.

jboffel commented May 23, 2016

Also, about the way the cache stores and accesses its data: one of your assumptions is that it always performs a realpath on every cache lookup attempt, and that it should therefore spot that the same symlink has changed, invalidate the cached entry and cache the new one.

It turns out this is not true; it depends on the scenario and is rather complex.

The cache key is built this way (specifically to avoid having to perform a realpath every time):

/* Instead of resolving full real path name each time we need to identify file,
 * we create a key that consist from requested file name, current working
 * directory, current include_path, etc */

So if you have, say, a script in a symlinked folder that is included by another script in a non-symlinked folder, the key will look like:
"PARENT_SCRIPT_FOLDER:SYMLINKED_PATH(as in the include declaration):INCLUDE_PATH_CONTENT"

So technically, changing the symlink in that case is not going to change the key, and the original old script could end up staying in the cache if none of the other invalidation mechanisms is triggered. The realpath has little or nothing to do with that use case...

Now another interesting use case: if the front controller is already in a symlinked folder, the key generated to identify this very first file in the cache is just the path of the file, still including the symlink.

Ex: /my/root/folder/symlinktoV1/frontController.php => key=/my/root/folder/symlinktoV1/frontController.php not /my/root/folder/V1/frontController.php

So without a cache reset, it looks unlikely that the cache will get updated as quickly as possible.
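A tiny Python model of why such a key survives a symlink switch. The key layout in `make_key` is a simplification assumed for the demo (the real OPcache key format differs), and `resolve`, `include` and the `/srv/...` paths are hypothetical.

```python
def make_key(cwd, requested_path, include_path):
    """Hypothetical key: built only from strings known at include time."""
    return ":".join((cwd, requested_path, include_path))


# Fake symlink table standing in for the filesystem.
symlink_target = {"/srv/app/current": "/srv/app/V1"}


def resolve(path):
    for link, real in symlink_target.items():
        if path.startswith(link + "/"):
            return real + path[len(link):]
    return path


cache = {}


def include(cwd, requested_path, include_path):
    """Compile on a miss; on a hit, return whatever was stored under the key."""
    key = make_key(cwd, requested_path, include_path)
    if key not in cache:
        cache[key] = "compiled from " + resolve(requested_path)
    return cache[key]


args = ("/srv/parent", "/srv/app/current/lib.php", ".:/usr/share/php")

first = include(*args)                           # compiled from V1
symlink_target["/srv/app/current"] = "/srv/app/V2"
second = include(*args)                          # same key, so still V1

print(first)
print(second)
```

Nothing in the key depends on where the symlink points, so the lookup hits before any path resolution would get a chance to notice the change.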

Also, don't be fooled by the OPcache status function. It will display things like:

   [scripts] => Array
        (
            [/home/test/V2/index.php] => Array
                (
                    [full_path] => /home/test/V2/index.php
                    [hits] => 0
                    [memory_consumption] => 1272
                    [last_used] => Mon May 23 18:26:27 2016
                    [last_used_timestamp] => 1463995587
                    [timestamp] => 1463994097
                )

            [/home/test/V2/V3/test.php] => Array
                (
                    [full_path] => /home/test/V2/V3/test.php
                    [hits] => 0
                    [memory_consumption] => 816
                    [last_used] => Mon May 23 18:26:27 2016
                    [last_used_timestamp] => 1463995587
                    [timestamp] => 1463966264
                )

        )

So you might trust the output and assume that the array keys under scripts match the cache keys used to identify a file, since it always shows a realpath, but that is not always the key actually used internally.

Well, to be fair, there are actually several different ways to access a cached value, depending on how it was cached in the first place, but for a FastCGI request it will be retrieved later in this order:

  1. Test whether it is stored under the full path (so the full path is actually the key)
  2. Test whether it is stored under the key (a key generated differently from the full path or the real path, using the rule above, like working_directory:filename...)
  3. Test whether it is stored under the real path (so the key would have been the real path)

So if 1 or 2 matches before 3, then 3 is never reached.
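The lookup order can be sketched in a few lines of Python. This is a toy model of the three steps as described here, not the actual OPcache code; `lookup`, the fake `resolve` and the `/home/test/current` symlink are made up for the demo.

```python
def lookup(cache, full_path, key, resolve):
    """Return (entry, matched_by); resolve() only runs if steps 1 and 2 miss."""
    if full_path in cache:               # step 1: full path as given
        return cache[full_path], "full_path"
    if key in cache:                     # step 2: generated key
        return cache[key], "key"
    real_path = resolve(full_path)       # step 3: real path
    if real_path in cache:
        return cache[real_path], "real_path"
    return None, None


calls = []


def resolve(path):
    """Fake realpath: 'current' now points at V2; records each call."""
    calls.append(path)
    return path.replace("/home/test/current", "/home/test/V2", 1)


requested = "/home/test/current/index.php"
generated_key = "/home/test:index.php"

# Entry stored under the symlinked path back when 'current' pointed at V1.
stale_cache = {requested: "opcodes compiled from V1"}
stale_hit = lookup(stale_cache, requested, generated_key, resolve)
resolve_calls_after_first = len(calls)   # step 1 matched, so no resolution

# Entry stored under the real path only: steps 1 and 2 miss, step 3 hits.
real_cache = {"/home/test/V2/index.php": "opcodes compiled from V2"}
real_hit = lookup(real_cache, requested, generated_key, resolve)

print(stale_hit, resolve_calls_after_first)
print(real_hit, len(calls))
```

The first lookup returns the old opcodes without ever resolving the symlink, which is the situation described for an entry point cached under its symlinked path.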

Which means it all depends on how it was cached in the first place. FastCGI, require_once and include_once use a similar algorithm to cache or access a cached file. So I tried to reproduce the FastCGI situation using the CLI, by creating a test script like this (in the CLI, OPcache is not persistent; it just keeps compiled scripts in OPcache until the end of the script's execution):

opcache_compile_file(using_non_symlink_path);
include_once 'using_symlink_path';

And the include_once then hits the cache.

Now, a little closer to what happens on the first hit, when the file is not yet cached, with an include_once on a symlinked path.

First the include_once is triggered and tries to resolve the path (default PHP behavior). OPcache then takes its chance by hooking into that process, both to look for an already compiled script in the cache and its key, and to resolve the path.

As the script is not yet cached, this step simply remembers that the script will need to be cached and returns the resolved path to the PHP engine.

Then, surprisingly, persistent_zend_resolve_path gets hit again, still by the include_once, but this time in the context of php_stream_open_for_zend_ex. Interestingly, we can note here that the given filename is already the resolved path, no longer the symlink. But this does not matter that much.

Finally we hit persistent_compile_file in the context of the include_once. However, the symlink information is already gone and we are left with just the real path, which is what will actually be used as the key to store the value.

So, sorry for the very long comment. When an include_once, a require_once or any FastCGI request is the trigger, the resolution of the symlink is left to the original PHP engine function (phar_resolve_path=>phar_find_in_include_path=>phar_save_resolve_path=>php_resolve_path 🍡 ). It happens systematically because of the include_once/require_once behavior, and opcache always gets a resolved path, never a symlinked path, unless the script was previously cached some other way (like opcache_compile_file). So if php_resolve_path always returns the latest resolved path, opcache creates a new entry as soon as the link points to a new location, since that is a new key. (This has little to do with invalidation of the old keys, which may persist depending on the cache TTL, so several versions of the same file may coexist in the cache at the same time.) However, if for some reason the resolved path returned by the PHP engine is still the old one, you will get the old script.

The only thing I could not analyze clearly is whether, for the very first file loaded by PHP from the web server request as the entry point (the "front controller"), the internal key is the resolved path or the symlinked path... But for any subsequent include_once or require_once (not include), the scenario just above should apply.

Hope this helps to understand the inside of opcache a little better :)

I reproduced this with Apache prefork (the most commonly used configuration for Apache+PHP when PHP is loaded as a module), attached gdb to it, and confirmed that the very first file (that is, any file opened by Apache from the document root as the entry point of your request, outside the include_once/require_once mechanism, etc.) is indeed stored in the cache by full path, which means NOT the real path. So for the entry point you are left having to deal with PHP's realpath cache.

So it seems that if you really don't want to restart Apache but want to continue deploying based on a symlink, you'll have to disable the realpath_cache and accept the performance consequences.
Also, you will most likely want to avoid calls to include, as it caches by key. Then you should be able to deploy without needing to reset either opcache or the realpath_cache.
However, this will not prevent race conditions, such as a request already in execution trying to load a new file version in the context of the old version.
Of course you can use the module mentioned by @rlerdorf (https://github.com/etsy/mod_realdoc): as it resolves the symlink before PHP does, it can solve most of your issues. However, it seems people are having issues with this module in a virtual host configuration...

By the way, if you want to get the "same" results as with @rlerdorf's Apache extension but would rather patch the PHP side (if you can compile it) to avoid the VirtualHost issue, here you go:

File: ./sapi/apache2handler/sapi_apache2.c
Function: php_handler (on recent source code around line 600)

Change:
                zfd.type = ZEND_HANDLE_FILENAME;
                zfd.filename = (char *) r->filename;
                zfd.free_filename = 0;
                zfd.opened_path = NULL;
To:
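                zfd.type = ZEND_HANDLE_FILENAME;
                /* libc realpath() takes two arguments only (no TSRMLS_CC); it malloc()s the result. */
                char *resolved_path = realpath(r->filename, NULL);
                /* Fall back to the unresolved name if realpath() fails. */
                zfd.filename = resolved_path ? resolved_path : (char *) r->filename;
                /* Left at 0, so the malloc()ed buffer is not freed by the engine. */
                zfd.free_filename = 0;
                zfd.opened_path = NULL;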

The native realpath function does not go through PHP's realpath cache.
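
A minimal standalone sketch of that point, assuming a POSIX system with a writable /tmp: it creates two directories and a symlink, switches the symlink, and checks that libc's realpath() returns the new target right away. The helper names (repoint, resolves_to, demo_symlink_switch) and the directory names are hypothetical, picked only for this illustration; this is not code from PHP or opcache.

```c
#define _XOPEN_SOURCE 700  /* exposes realpath(), symlink(), mkdtemp() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: point link_path at target, replacing any old link. */
static int repoint(const char *target, const char *link_path)
{
    unlink(link_path);  /* failure ignored: the link may not exist yet */
    return symlink(target, link_path);
}

/* Resolve path with libc realpath() and report whether the result ends in suffix. */
static int resolves_to(const char *path, const char *suffix)
{
    assert(path != NULL && suffix != NULL);
    char *resolved = realpath(path, NULL);  /* buffer is malloc()ed by libc */
    if (resolved == NULL) {
        return 0;
    }
    size_t rl = strlen(resolved), sl = strlen(suffix);
    int ok = rl >= sl && strcmp(resolved + rl - sl, suffix) == 0;
    free(resolved);
    return ok;
}

/* Returns 1 if realpath() follows the symlink immediately after each switch. */
static int demo_symlink_switch(void)
{
    char dir[] = "/tmp/realpath_demo_XXXXXX";
    if (mkdtemp(dir) == NULL) {
        return 0;
    }

    char v1[64], v2[64], link_path[64];
    snprintf(v1, sizeof v1, "%s/release_1", dir);
    snprintf(v2, sizeof v2, "%s/release_2", dir);
    snprintf(link_path, sizeof link_path, "%s/production", dir);

    int ok = mkdir(v1, 0700) == 0 && mkdir(v2, 0700) == 0;
    ok = ok && repoint(v1, link_path) == 0 && resolves_to(link_path, "/release_1");
    ok = ok && repoint(v2, link_path) == 0 && resolves_to(link_path, "/release_2");

    /* Best-effort cleanup of everything created above. */
    unlink(link_path);
    rmdir(v1);
    rmdir(v2);
    rmdir(dir);
    return ok;
}
```

Calling demo_symlink_switch() from a small main() should return 1 on such a system. The comparison is done on the path suffix because /tmp itself may be a symlink on some platforms.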

@vingrad

vingrad Jan 2, 2017

Is this issue still present in PHP 7?

@AnatolyRugalev

AnatolyRugalev Apr 12, 2017

@vingrad this is not an issue. This behavior still persists in PHP 7 as well.

@sulate

sulate Jun 15, 2017

This IS a bug! Opcache remembers the symlink-to-realpath mapping internally forever, as already stated above, so this should be fixed! And it still persists in PHP 7.

@kayue

kayue Jun 15, 2017

@sulate See the problem explained here: http://jpauli.github.io/2014/06/30/realpath-cache.html

I don't think they are ever gonna fix this.

kayue commented Jun 15, 2017

@sulate See problem explain here: http://jpauli.github.io/2014/06/30/realpath-cache.html

I don't think they are ever gonna fix this.

@mediafigaro

mediafigaro commented Jan 18, 2018
