Skip to content

Commit

Permalink
[BUGFIX] Support concurrent requests without 503 responses
Browse files Browse the repository at this point in the history
Instead of offloading the work to wait for the final page content,
concurrent requests now wait for the real page content to be
rendered (and deliver the content from cache once ready) instead
of sending a 503 response code and the famous "Page is being
generated" message.

The logic to wait for the rendered page (lock and wait) is already
there thanks to the page locking, but was deliberately circumvented
by the temporary page cache content before to get rid of waiting
requests in high-load situations.

This approach to simply skip the temporary cache and wait for the
renderer to finish has been tested using the skip_page_is_being_generated
extension in wild for 3 years (for TYPO3 v6, v7, v8 and v9).

Note: In case the increased number of waiting requests has a negative
impact on sites with high server load, a additional proxy cache should be
considered in front of the server to make sure clients are served a valid
response without waiting until new content is ready.

The motivation for this change:
The 503 status code together with the "Page is being generated message"
does not only occur for slow or high traffic sites. It will be displayed
even for two concurrent requests, no matter how fast the page rendered
or how low the current traffic is.
The requests only need to (nearly) arrive at the same time. This can
easily be reproduced using two parallel curl requests:
for i in {1..2}; do curl -sv https://doma.in/foo |& grep '^< HTTP'& done
There would be one "503 Service unavailable" response when /foo has not
yet been rendered to the cache before.

An explanation for the releaseLock('pagesection') addition in
TSFE::getFromCache(): This has been added as the pagesection lock – which
is acquired in TemplateService::start() – was implicitly released in
setPageCacheContent() before. Now this would block concurrent rendering
for pages with $_GET-aware plugins, therefore we release the pagesection
lock early, after the pagesection cache has been generated.

Releases: master, 9.5, 8.7
Resolves: #87980
Change-Id: I034f410335b3035c5863b26e3e689ca29b5f3f80
Reviewed-on: https://review.typo3.org/c/Packages/TYPO3.CMS/+/60656
Tested-by: TYPO3com <noreply@typo3.com>
Tested-by: Jonas Eberle <flightvision@googlemail.com>
Tested-by: Andreas Fernandez <a.fernandez@scripting-base.de>
Tested-by: Benjamin Franzke <bfr@qbus.de>
Reviewed-by: Andreas Fernandez <a.fernandez@scripting-base.de>
Reviewed-by: Benjamin Franzke <bfr@qbus.de>
  • Loading branch information
bnf committed May 6, 2019
1 parent e420088 commit b8775ca
Show file tree
Hide file tree
Showing 4 changed files with 70 additions and 115 deletions.
@@ -0,0 +1,34 @@
.. include:: ../../Includes.txt

======================================================================
Important: #87980 - "Page is being generated" message has been removed
======================================================================

See :issue:`87980`

Description
===========

The "Page is being generated" message and the corresponding
temporary 503 response have been removed.

Instead of offloading the work to wait for the final page content,
concurrent requests now wait for the real page content to be
rendered (and deliver the content from cache once ready) instead
of sending a 503 response code and the famous "Page is being
generated" message.


The motivation for this change:
The 503 status code together with the "Page is being generated message"
does not only occur for slow or high traffic sites. It will be displayed
even for two concurrent requests, no matter how fast the page rendered
or how low the current traffic is.
The requests only need to (nearly) arrive at the same time.

Note: In case the increased number of waiting requests has a negative
impact on highly frequented servers, a additional proxy cache should be
considered in front of the server to make sure clients are served a valid
response without waiting until new content is ready.

.. index:: Frontend, ext:frontend
Expand Up @@ -356,8 +356,7 @@ class TypoScriptFrontendController implements LoggerAwareInterface
protected $cacheTimeOutDefault = false;

/**
* Set internally if cached content is fetched from the database. No matter if it is temporary
* content (tempContent) or already generated page content.
* Set internally if cached content is fetched from the database.
*
* @var bool
* @internal
Expand Down Expand Up @@ -433,6 +432,7 @@ class TypoScriptFrontendController implements LoggerAwareInterface
* that the current page request is fully cached, and needs no page generation.
* @var mixed
* @internal
* @deprecated this property is not in use anymore and will be removed in TYPO3 v10.0.
*/
protected $tempContent = false;

Expand Down Expand Up @@ -2299,9 +2299,6 @@ public function reqCHash()
return;
}
if ($GLOBALS['TYPO3_CONF_VARS']['FE']['pageNotFoundOnCHashError']) {
if ($this->tempContent) {
$this->clearPageCacheContent();
}
$response = GeneralUtility::makeInstance(ErrorController::class)->pageNotFoundAction(
$GLOBALS['TYPO3_REQUEST'],
'Request parameters could not be validated (&cHash empty)',
Expand Down Expand Up @@ -2372,9 +2369,8 @@ public function getFromCache()
// we have the content, nice that some other process did the work for us already
$this->releaseLock('pagesection');
}
// We keep the lock set, because we are the ones generating the page now
// and filling the cache.
// This indicates that we have to release the lock in the Registry later in releaseLocks()
// We keep the lock set, because we are the ones generating the page now and filling the cache.
// This indicates that we have to release the lock later in releaseLocks()
}

if (is_array($pageSectionCacheContent)) {
Expand Down Expand Up @@ -2409,9 +2405,8 @@ public function getFromCache()
// we have the content, nice that some other process did the work for us
$this->releaseLock('pages');
}
// We keep the lock set, because we are the ones generating the page now
// and filling the cache.
// This indicates that we have to release the lock in the Registry later in releaseLocks()
// We keep the lock set, because we are the ones generating the page now and filling the cache.
// This indicates that we have to release the lock later in releaseLocks()
}
if (is_array($row)) {
// we have data from cache
Expand All @@ -2425,8 +2420,6 @@ public function getFromCache()
$this->config = $row['cache_data'];
// Getting the content
$this->content = $row['content'];
// Flag for temp content
$this->tempContent = $row['temp_content'];
// Setting flag, so we know, that some cached content has been loaded
$this->cacheContentFlag = true;
$this->cacheExpires = $row['expires'];
Expand Down Expand Up @@ -2586,6 +2579,9 @@ public function getConfigArray()
// Start parsing the TS template. Might return cached version.
$this->tmpl->start($this->rootLine);
$timeTracker->pull();
// At this point we have a valid pagesection_cache (generated in $this->tmpl->start()),
// so let all other processes proceed now. (They are blocked at the pagessection_lock in getFromCache())
$this->releaseLock('pagesection');
if ($this->tmpl->loaded) {
$timeTracker->push('Setting the config-array');
// toplevel - objArrayName
Expand Down Expand Up @@ -3294,61 +3290,6 @@ public function isGeneratePage()
return !$this->cacheContentFlag && empty($this->activeUrlHandlers);
}

/**
* Temp cache content
* The temporary cache will expire after a few seconds (typ. 30) or will be cleared by the rendered page,
* which will also clear and rewrite the cache.
*/
protected function tempPageCacheContent()
{
$this->tempContent = false;
if (!$this->no_cache) {
$seconds = 30;
$title = htmlspecialchars($this->printTitle($this->page['title']));
$request_uri = htmlspecialchars(GeneralUtility::getIndpEnv('REQUEST_URI'));
$stdMsg = '
<strong>Page is being generated.</strong><br />
If this message does not disappear within ' . $seconds . ' seconds, please reload.';
$message = $this->config['config']['message_page_is_being_generated'];
if ((string)$message !== '') {
$message = str_replace(['###TITLE###', '###REQUEST_URI###'], [$title, $request_uri], $message);
} else {
$message = $stdMsg;
}
$temp_content = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>' . $title . '</title>
<meta http-equiv="refresh" content="10" />
</head>
<body style="background-color:white; font-family:Verdana,Arial,Helvetica,sans-serif; color:#cccccc; text-align:center;">' . $message . '
</body>
</html>';
// Fix 'nice errors' feature in modern browsers
$padSuffix = '<!--pad-->';
// prevent any trims
$padSize = 768 - strlen($padSuffix) - strlen($temp_content);
if ($padSize > 0) {
$temp_content = str_pad($temp_content, $padSize, LF) . $padSuffix;
}
if (!$this->headerNoCache() && ($cachedRow = $this->getFromCache_queryRow())) {
// We are here because between checking for cached content earlier and now some other HTTP-process managed to store something in cache AND it was not due to a shift-reload by-pass.
// This is either the "Page is being generated" screen or it can be the final result.
// In any case we should not begin another rendering process also, so we silently disable caching and render the page ourselves and that's it.
// Actually $cachedRow contains content that we could show instead of rendering. Maybe we should do that to gain more performance but then we should set all the stuff done in $this->getFromCache()... For now we stick to this...
$this->set_no_cache('Another process wrote into the cache since the beginning of the render process', true);

// Since the new Locking API this should never be the case
} else {
$this->tempContent = true;
// This flag shows that temporary content is put in the cache
$this->setPageCacheContent($temp_content, $this->config, $GLOBALS['EXEC_TIME'] + $seconds);
}
}
}

/**
* Set cache content to $this->content
*/
Expand All @@ -3357,7 +3298,6 @@ protected function realPageCacheContent()
// seconds until a cached page is too old
$cacheTimeout = $this->get_cache_timeout();
$timeOutTime = $GLOBALS['EXEC_TIME'] + $cacheTimeout;
$this->tempContent = false;
$usePageCache = true;
// Hook for deciding whether page cache should be written to the cache backend or not
// NOTE: as hooks are called in a loop, the last hook will have the final word (however each
Expand All @@ -3381,15 +3321,14 @@ protected function realPageCacheContent()
* @param string $content The content to store in the HTML field of the cache table
* @param mixed $data The additional cache_data array, fx. $this->config
* @param int $expirationTstamp Expiration timestamp
* @see realPageCacheContent(), tempPageCacheContent()
* @see realPageCacheContent()
*/
protected function setPageCacheContent($content, $data, $expirationTstamp)
{
$cacheData = [
'identifier' => $this->newHash,
'page_id' => $this->id,
'content' => $content,
'temp_content' => $this->tempContent,
'cache_data' => $data,
'expires' => $expirationTstamp,
'tstamp' => $GLOBALS['EXEC_TIME'],
Expand Down Expand Up @@ -3520,17 +3459,6 @@ public function generatePage_preProcessing()
// \TYPO3\CMS\Core\TypoScript\TemplateService::start() in the meantime, so this must be called again!
$this->newHash = $this->getHash();

// If the pages_lock is set, we are in charge of generating the page.
if (is_object($this->locks['pages']['accessLock'])) {
// Here we put some temporary stuff in the cache in order to let the first hit generate the page.
// The temporary cache will expire after a few seconds (typ. 30) or will be cleared by the rendered page,
// which will also clear and rewrite the cache.
$this->tempPageCacheContent();
}
// At this point we have a valid pagesection_cache and also some temporary page_cache content,
// so let all other processes proceed now. (They are blocked at the pagessection_lock in getFromCache())
$this->releaseLock('pagesection');

// Setting cache_timeout_default. May be overridden by PHP include scripts.
$this->cacheTimeOutDefault = (int)($this->config['config']['cache_period'] ?? 0);
// Page is generated
Expand Down Expand Up @@ -3692,10 +3620,6 @@ public function generatePage_postProcessing()
// Storing for cache:
if (!$this->no_cache) {
$this->realPageCacheContent();
} elseif ($this->tempContent) {
// If there happens to be temporary content in the cache and the cache was not cleared due to new content, put it in... ($this->no_cache=0)
$this->clearPageCacheContent();
$this->tempContent = false;
}
// Sets sys-last-change:
$this->setSysLastChanged();
Expand Down Expand Up @@ -3768,7 +3692,7 @@ public function generatePageTitle(): string
* @param bool $showTitleFirst If set, then "sitetitle" and $title is swapped
* @param string $pageTitleSeparator an alternative to the ": " as the separator between site title and page title
* @return string The page title on the form "[sitetitle]: [input-title]". Not htmlspecialchar()'ed.
* @see tempPageCacheContent(), generatePageTitle()
* @see generatePageTitle()
*/
protected function printTitle(string $pageTitle, bool $noTitle = false, bool $showTitleFirst = false, string $pageTitleSeparator = ''): string
{
Expand Down Expand Up @@ -4020,14 +3944,6 @@ public function sendHttpHeadersDirectly()
$headerConfig['statusCode']
);
}
// Send appropriate status code in case of temporary content
if ($this->tempContent) {
header('HTTP/1.0 503 Service unavailable');
$headers = $this->getHttpHeadersForTemporaryContent();
foreach ($headers as $header => $value) {
header($header . ': ' . $value);
}
}
}

/**
Expand Down Expand Up @@ -4085,14 +4001,6 @@ public function applyHttpHeadersToResponse(ResponseInterface $response): Respons
$response = $response->withAddedHeader($header, $value);
}
}
// Send appropriate status code in case of temporary content
if ($this->tempContent) {
$response = $response->withStatus(503, 'Service unavailable');
$headers = $this->getHttpHeadersForTemporaryContent();
foreach ($headers as $header => $value) {
$response = $response->withHeader($header, $value);
}
}
return $response;
}

Expand All @@ -4112,9 +4020,6 @@ public function sendCacheHeaders()

/**
* Get cache headers good for client/reverse proxy caching.
* This function should not be called if the page content is temporary (like for "Page is being generated..."
* message, but in that case it is ok because the config-variables are not yet available and so will not allow to
* send cache headers).
*
* @return array
*/
Expand Down Expand Up @@ -4992,6 +4897,31 @@ public function getRequestedId()
/**
* Acquire a page specific lock
*
*
* The schematics here is:
* - First acquire an access lock. This is using the type of the requested lock as key.
* Since the number of types is rather limited we can use the type as key as it will only
* eat up a limited number of lock resources on the system (files, semaphores)
* - Second, we acquire the actual lock (named page lock). We can be sure we are the only process at this
* very moment, hence we either get the lock for the given key or we get an error as we request a non-blocking mode.
*
* Interleaving two locks is extremely important, because the actual page lock uses a hash value as key (see callers
* of this function). If we would simply employ a normal blocking lock, we would get a potentially unlimited
* (number of pages at least) number of different locks. Depending on the available locking methods on the system
* we might run out of available resources. (e.g. maximum limit of semaphores is a system setting and applies
* to the whole system)
* We therefore must make sure that page locks are destroyed again if they are not used anymore, such that
* we never use more locking resources than parallel requests to different pages (hashes).
* In order to ensure this, we need to guarantee that no other process is waiting on a page lock when
* the process currently having the lock on the page lock is about to release the lock again.
* This can only be achieved by using a non-blocking mode, such that a process is never put into wait state
* by the kernel, but only checks the availability of the lock. The access lock is our guard to be sure
* that no two processes are at the same time releasing/destroying a page lock, whilst the other one tries to
* get a lock for this page lock.
* The only drawback of this implementation is that we basically have to poll the availability of the page lock.
*
* Note that the access lock resources are NEVER deleted/destroyed, otherwise the whole thing would be broken.
*
* @param string $type
* @param string $key
* @throws \InvalidArgumentException
Expand Down
8 changes: 0 additions & 8 deletions typo3/sysext/t3editor/Resources/Private/tsref.xml
Expand Up @@ -539,14 +539,6 @@ This will render dates in danish made with stdWrap/strftime:
locale_all = danish
locale_all = da_DK]]></description>
<default><![CDATA[
]]></default>
</property>
<property name="message_page_is_being_generated" type="string">
<description><![CDATA[Alternative HTML message that appears if a page is being generated.
Normally when a page is being generated a temporary copy is stored in the cache-table with an expire-time of 30 seconds.
It is possible to use some keywords that are replaced with the corresponding values. Possible keywords are: ###TITLE###, ###REQUEST_URI###]]></description>
<default><![CDATA[
]]></default>
</property>
<property name="message_preview" type="string">
Expand Down
Expand Up @@ -569,7 +569,6 @@
'menuName': kw('menuName'),
'menuOffset': kw('menuOffset'),
'menuWidth': kw('menuWidth'),
'message_page_is_being_generated': kw('message_page_is_being_generated'),
'message_preview': kw('message_preview'),
'META': kw('META'),
'meta': kw('meta'),
Expand Down

0 comments on commit b8775ca

Please sign in to comment.