TDB eviction and stale response processing #2074
base: master
Conversation
Force-pushed: 5d4eaf5 to d02eeaa, d02eeaa to 53cf2bc, 53cf2bc to d9d48f0, d9d48f0 to 5bd7df8, 00dffd7 to 67b01fb.
	cfg = kzalloc(sizeof(TfwCacheUseStale), GFP_KERNEL);
	if (!cfg)
		return -ENOMEM;
	loc->cache_use_stale = cfg;
It seems we have a memory leak here. For such a simple config we allocate cfg twice and overwrite the already set cache_use_stale.
listen 443 proto=h2;
srv_group default {
server 127.0.0.1:8000;
}
vhost default {
proxy_pass default;
tls_certificate /home/evgeny/workdir/certs/tempesta-tech.test.crt;
tls_certificate_key /home/evgeny/workdir/certs/tempesta-tech.test.key;
tls_match_any_server_name;
cache_use_stale 200;
}
cache 2;
cache_fulfill * *;
cache_use_stale 500;
Moreover, I am not sure about this implementation, where we add cfg only for the location. In the previous config we overwrite cache_use_stale 200 with cache_use_stale 500. We don't do that for any other option.
This overriding happens only for the default vhost, and I don't see a reason to fix it, because the default vhost will be removed soon. For now I simply forbade this directive for the default vhost.
About inheritance: it has the same logic as req_hdr_set, and we don't have inheritance for those directives either. I think we should fix them together in another task; we already have such a task for inheritance.
No, the error is still present.
vhost xxx {
proxy_pass default;
tls_certificate /home/evgeny/workdir/certs/tempesta-tech.test.crt;
tls_certificate_key /home/evgeny/workdir/certs/tempesta-tech.test.key;
cache_use_stale 400;
cache_use_stale 500;
}
And we have a memory leak, because we don't check for duplicate values.
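The leak above can be sketched in isolation. The snippet below is a hedged userspace model, not the real Tempesta code: `CacheUseStale` and `Location` are simplified stand-ins for `TfwCacheUseStale` and the location object, and `calloc()` stands in for `kzalloc()`. It shows the fix being discussed: detect an already-set value before allocating again, so a repeated `cache_use_stale` directive can't leak the first allocation.

```c
#include <stdlib.h>

/* Simplified stand-in for TfwCacheUseStale (hypothetical fields). */
typedef struct {
	unsigned int codes[8];
	size_t n;
} CacheUseStale;

/* Simplified stand-in for the location object. */
typedef struct {
	CacheUseStale *cache_use_stale;
} Location;

/*
 * Sketch of the fixed directive handler: a duplicate directive is
 * rejected as a config error instead of silently overwriting (and
 * leaking) the previous allocation. Freeing the old value before
 * replacing it would be the alternative fix.
 */
static int
set_cache_use_stale(Location *loc)
{
	CacheUseStale *cfg;

	if (loc->cache_use_stale)
		return -1; /* duplicate directive: config error */

	cfg = calloc(1, sizeof(*cfg)); /* kzalloc() in the kernel code */
	if (!cfg)
		return -12; /* -ENOMEM */
	loc->cache_use_stale = cfg;

	return 0;
}
```

With this shape, the second `cache_use_stale` line in the quoted config fails parsing instead of leaking the first `cfg`.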
For http_resp_code_block I get `duplicate entry: 'http_resp_code_block'` for such a config. Moreover:
if (cfg->http_resp_code_block) {
if (ce->dflt_value)
return 0;
kfree(cfg->http_resp_code_block);
cfg->http_resp_code_block = NULL;
}
We also need to check how we handle this directive when it is present both in a vhost and in a location inside that vhost.
It seems we should set `.allow_repeat = false` for this option.
Fixed
	loc->cache_use_stale = cfg;

	for (i = 0; i < ce->val_n; i++) {
		if (!mask4x && !strcasecmp(ce->vals[i], "4*")) {
Maybe we should check for duplicate values in the config and return an error?
We have TFW_CFG_CHECK_VAL_DUP on the master branch; please use it to avoid duplicates.
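The duplicate scan that TFW_CFG_CHECK_VAL_DUP performs can be sketched as below. This is a hedged illustration: the macro's real interface is not reproduced here, and `vals_have_dup()` is a hypothetical helper showing only the pairwise, case-insensitive comparison over a directive's value list.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() */

/*
 * Return 1 if the directive's value list contains a duplicate
 * (case-insensitive), 0 otherwise. Illustrative helper only; the
 * real check on master is done via TFW_CFG_CHECK_VAL_DUP.
 */
static int
vals_have_dup(const char *vals[], size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (!strcasecmp(vals[i], vals[j]))
				return 1;
	return 0;
}
```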
			  HTTP2_ECODE_PROTO);

	if (tfw_http_resp_should_fwd_stale(bad_req, 502)) {
		__tfw_http_resp_fwd_stale(hmresp);
I think we should set r = 0 here. Otherwise we drop the connection to this backend. Or is that the expected behavior?
I'm not sure; it's an error case, so I think we need to drop the connection, but without any effect on the client.
@krizhanovsky pay attention here. I think that if we fail to send the stale response, we should go to the else branch and call tfw_http_req_block or tfw_http_req_drop. Also, I am not sure that we can send a stale response if we are blocked by frang.
I think it's OK to drop the upstream connection if we receive a broken response; basically this means that something on the upstream went wrong, and this is the reason why Nginx and Apache by default terminate connections every 100 requests.
Also, I agree that we should not send a stale response on a security event, e.g. at least if r == T_BLOCK.
As I see it, we don't have a return code for __tfw_http_resp_fwd_stale(). @EvgeniiMekhanik, do you propose to make the function (and all the functions it calls) return a code? I didn't get your proposal.

> As I think if we fail to send stale response we should go to else directive and call tfw_http_req_block or tfw_http_req_drop
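The decision logic being debated can be sketched as a pure function. This is an assumption-laden illustration, not the real code path: `resolve_bad_response()` and its parameters are hypothetical, and it presumes stale forwarding reports success/failure (which, as noted above, `__tfw_http_resp_fwd_stale()` currently does not).

```c
#include <stdbool.h>

/* Possible outcomes for a bad upstream response (illustrative). */
enum action { FWD_STALE, REQ_BLOCK, REQ_DROP };

/*
 * Sketch of the proposed flow:
 *  - never serve stale content on a security event (e.g. r == T_BLOCK);
 *  - if the stale forwarding fails, fall back to block/drop, where
 *    block_mode mirrors the configured blocking policy.
 */
static enum action
resolve_bad_response(bool stale_fwd_ok, bool security_event, bool block_mode)
{
	if (security_event || !stale_fwd_ok)
		return block_mode ? REQ_BLOCK : REQ_DROP;
	return FWD_STALE;
}
```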
	if (resp && stale) {
		req->resp = NULL;
		req->stale_resp = resp;
I don't agree with this commit (Add ability to respond with a stale response). What is the purpose of sending a request to the server and, after receiving the response, sending a stale response? It is very strange to me! If we receive a fresh response, we should send it. If this directive is set, we should send the stale response to the client without accessing the backend: check the status of the response saved in the cache and, if it is listed in cache_use_stale, send it without accessing the backend. For error cases we can send stale successful responses after receiving an error from the backend.
For example, if I add
param(name="200", status=200, use_stale="200", r2_hdrs=[]),
to TestCacheUseStale in your branch, I receive a stale 200 response after receiving a successful response from the server!
You are wrong. You got the stale response because you specified cache_use_stale 200 in the config. It's the desired behavior; however, maybe we should deny status codes other than 4xx and 5xx.
This feature is needed for the case when you, for instance, get a 500 status code but have a stale response, and you reply with the stale response if it is configured.
It would be an error if you specified cache_use_stale 500 but received a stale 200 response.
After discussion we decided that the administrator can't set 2xx or 3xx codes in cache_use_stale.
To me the logic seems aligned with the proposed implementation in #522. I'm OK with allowing only 4xx and 5xx codes.
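The agreed restriction reduces to a one-line range check at config parse time. A minimal sketch, assuming a hypothetical `cache_use_stale_code_valid()` validator (the real check lives in `tfw_cfgop_cache_use_stale()`, whose exact shape isn't shown here):

```c
/*
 * cache_use_stale accepts only client/server error classes:
 * 4xx and 5xx (and, by extension, the 4* / 5* wildcards).
 * 2xx/3xx codes are rejected per the discussion above.
 */
static int
cache_use_stale_code_valid(unsigned int status)
{
	return status >= 400 && status < 600;
}
```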
			       Resp_I_CC_StaleIfErrorV);

	__FSM_STATE(Resp_I_CC_StaleIfErrorV) {
		__fsm_sz = __data_remain(p);
Shouldn't we do it like for the previous directives?
if (unlikely(resp->cache_ctl.flags & TFW_HTTP_CC_STALE_IF_ERROR))
	__FSM_EXIT(T_DROP);
Why? We can have repeated headers with different values.
Are you sure? I checked that without this check we can set Cache-Control: stale-if-error=20, stale-if-error=25. Is that OK?
For other Cache-Control directives it is prohibited.
	b_tmp = bckt;
	bckt = TDB_HTRIE_BUCKET_NEXT(dbh, bckt);
	write_lock_bh(&bckt->lock);
As I understand this function, we should lock the previous bucket and the current bucket, i.e. hold no more than two locks. But here we try to take a third lock. And why don't we unlock b_tmp? I think we should do:
write_unlock_bh(&b_tmp->lock);
write_lock_bh(&bckt->lock);
Please explain the logic here.
We do it in this sequence because we can't free the bucket whose lock we currently hold, so we rely on the lock of the previous bucket. However, maybe it's overengineering; for simplicity we could lock/unlock only the head bucket to lock the whole list.
To remove an entry from TDB we must be able to free allocated blocks. For this purpose the block allocation logic was reworked: now every block has its own refcounter, and when it reaches zero, the block is moved to a per-CPU freelist and will be reused on the next block allocation on the current CPU. It seems we can use plain atomic inc/dec for the refcounter, because blocks are allocated per-CPU and a race between get/put is not possible. The refcounter is not part of the block; all refcounters are stored in the last block of the current extent. It's not an elegant solution, but I guess it's enough for the current fix. Removal applies only to buckets in a collision chain; the first bucket in a chain can't be removed. Essentially, entry removal is just freeing blocks in the collision chain; an index node is never removed.
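The refcount-plus-freelist scheme described in this commit message can be modeled in userspace. Everything below is illustrative: `BlkAlloc`, the array sizes, and the function names are stand-ins for the real TDB structures, plain integer ops stand in for `atomic_inc()`/`atomic_dec_and_test()` (safe here under the same "per-CPU, no get/put race" assumption), and the separate `refcnt` table mirrors the reference table kept outside the blocks.

```c
/* Userspace model of the per-block refcounting (illustrative sizes). */
typedef struct {
	int refcnt[64];		/* reference table, one slot per block */
	int freelist[64];	/* per-CPU freelist of freed block ids */
	int free_top;
} BlkAlloc;

static void
blk_get(BlkAlloc *a, int blk)
{
	a->refcnt[blk]++;	/* atomic_inc() in the kernel version */
}

static void
blk_put(BlkAlloc *a, int blk)
{
	/* atomic_dec_and_test(): last reference frees the block. */
	if (--a->refcnt[blk] == 0)
		a->freelist[a->free_top++] = blk;
}

static int
blk_alloc(BlkAlloc *a)
{
	/* Reuse a freed block first, as the reworked allocator does. */
	if (a->free_top)
		return a->freelist[--a->free_top];
	return -1;	/* fall back to extent allocation (not modeled) */
}
```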
This directive has the following syntax: cache_use_stale 1*(status_code | status_code_wildcard). Examples: `cache_use_stale 404 502 500;` - respond with a stale response in case of 404, 502, or 500 statuses; `cache_use_stale 5*;` - respond with a stale response in case of any 5xx status code.
Depending on the response status code specified via the `cache_use_stale` directive, Tempesta can respond from the cache with a stale response. There are two paths: - Tempesta receives a response without errors. It checks the status code, and if it matches a code specified in `cache_use_stale`, it responds with the stale response from the cache instead of the received response. - Tempesta can't send the response due to a scheduling error and tries to send an error response to the client; at this stage the status code of Tempesta's prepared error response is matched against the codes specified in `cache_use_stale`, and if it matches, the stale response from the cache is sent instead of the prepared error response.
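Both paths above reduce to the same matching step: does this status code trigger stale serving? A minimal sketch of that matcher follows; `UseStale`, its fields, and `use_stale_match()` are hypothetical names (the real parser stores `4*`/`5*` via masks like the `mask4x` variable visible in the diff above).

```c
#include <stdbool.h>

/* Parsed form of the directive: exact codes plus class wildcards. */
typedef struct {
	unsigned int codes[8];
	unsigned int n;
	bool mask4x, mask5x;	/* `4*` / `5*` wildcards */
} UseStale;

/* Return true if `status` should be answered with a stale response. */
static bool
use_stale_match(const UseStale *us, unsigned int status)
{
	unsigned int i;

	if (us->mask4x && status / 100 == 4)
		return true;
	if (us->mask5x && status / 100 == 5)
		return true;
	for (i = 0; i < us->n; i++)
		if (us->codes[i] == status)
			return true;
	return false;
}
```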
Some htrie entries can't be allocated in one "shot": only the "head" of the record is added to the htrie, and afterwards the record is copied into the htrie part by part. However, after adding the "head" the entry is already accessible, so it may be read or removed, which leads to races. To fix this we mark such entries as incomplete and ignore them during removal or reading. Also the behaviour of tdb_htrie_lookup() changed: now it locks the bucket. Otherwise we could race between the completeness check and the removal of the record returned by tdb_htrie_lookup(). For instance, in tdb_rec_get() the record might be removed before it reaches tdb_htrie_bscan_for_rec().
Simply remove expired clients with the same key when obtaining a new client. Client removal is not fully completed by this commit; it's only a temporary fix. We can't remove a client right at disconnect time, because we need to keep the frang client info for a while even after disconnection; this info is used for the connection rate calculation when the client tries to reconnect.
This logic might be used and should be revised only after issue #934 is implemented.
For client identification we must calculate the hash over the address only, excluding the client's port, because on the next connection the port will be different.
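The point above can be shown with a small sketch: hash only the address bytes so that reconnects from different ephemeral ports map to the same client entry. The `struct addr` layout and the FNV-1a hash are stand-ins, not Tempesta's actual address type or hash function.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative client address: IPv6-sized address plus port. */
struct addr {
	uint8_t ip[16];
	uint16_t port;
};

/*
 * Key a client by address only. FNV-1a is used as a stand-in hash;
 * the port is deliberately left out of the hashed bytes.
 */
static uint64_t
client_key(const struct addr *a)
{
	uint64_t h = 1469598103934665603ULL;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < sizeof(a->ip); i++)
		h = (h ^ a->ip[i]) * 1099511628211ULL;	/* FNV prime */
	return h;
}
```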
Implemented the logic of using a stale response when the upstream server returns some type of error (500, 502, 503, 504) or when such an error code was generated on the Tempesta side, as defined by RFC 5861.
Fix the Warning header for stale responses. A response is stale when its age is greater than its lifetime; we must compare the cache entry lifetime with the age instead of the total calculated lifetime of the entry.
Added two counters to the cache statistics: `Cache objects` - the number of cached objects; `Cache bytes` - the number of bytes currently stored in the cache. `Cache bytes` calculation: length of key + length of status + length of reason phrase + length of headers + length of body + length of the cache entry descriptor (part of TfwCacheEntry). The size of `TdbRec` is not included. The total length of the record was added to the debug print as well.
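The `Cache bytes` formula above is a plain sum; a sketch under assumed names (`CacheEntryLens` and `cache_entry_bytes()` are hypothetical, standing in for the lengths taken from the real TfwCacheEntry):

```c
#include <stddef.h>

/* Component lengths of one cached response (illustrative struct). */
typedef struct {
	size_t key_len, status_len, reason_len, hdr_len, body_len;
} CacheEntryLens;

/*
 * `Cache bytes` contribution of one entry: key + status + reason
 * phrase + headers + body + cache entry descriptor. TdbRec overhead
 * is intentionally excluded, per the commit message above.
 */
static size_t
cache_entry_bytes(const CacheEntryLens *ce, size_t desc_len)
{
	return ce->key_len + ce->status_len + ce->reason_len
		+ ce->hdr_len + ce->body_len + desc_len;
}
```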
New logic was introduced, and this patch changes the removal logic to follow the new requirements. Now we no longer remove records with a different method at the cleaning stage, which is invoked before adding a new record to the cache. The main reason is that HEAD and GET with the same key may have two different records: one with a body for GET and a second without a body for HEAD.
- Fix style and comments.
- Forbid use of `cache_use_stale` to simplify inheritance.
- Rewrite validation in `tfw_cfgop_cache_use_stale()`.
- Add `cache_use_stale` to the config file.
- Forbid status codes below 400 for `cache_use_stale`.
Force-pushed: 4ae592e to 9c5de88.
I think we can do better in terms of performance and memory efficiency
{
	TdbExt *e = tdb_ext(dbh, TDB_PTR(dbh, ptr));
	atomic_t *reftbl = tdb_get_reftbl(dbh, e);
	unsigned int blkoff = TDB_BLK_ID(ptr & TDB_BLK_MASK) >> PAGE_SHIFT;
diff --git a/ktest/linux/slab.h b/ktest/linux/slab.h
index cc421400..8d8ac230 100644
--- a/ktest/linux/slab.h
+++ b/ktest/linux/slab.h
@@ -35,6 +35,7 @@
/* asm/page.h */
#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE-1))
+#define PAGE_SHIFT 12
/* arch/x86/include/asm/page_64_types.h */
#define THREAD_SIZE (PAGE_SIZE << 2)
or the unit test compilation fails with
~/tempesta/db/t# make
cc -O2 -msse4.2 -ggdb -Wall -Werror -fno-strict-aliasing -lpthread -DL1_CACHE_BYTES=64 -I../../ktest -I../.. -Wno-address-of-packed-member -c -o tdb_htrie.o tdb_htrie.c
In file included from tdb_htrie.c:39:
../core/htrie.c: In function ‘tdb_get_blk’:
../core/htrie.c:131:65: error: ‘PAGE_SHIFT’ undeclared (first use in this function); did you mean ‘PAGE_SIZE’?
131 | unsigned int blkoff = TDB_BLK_ID(ptr & TDB_BLK_MASK) >> PAGE_SHIFT;
| ^~~~~~~~~~
| PAGE_SIZE
../core/htrie.c:131:65: note: each undeclared identifier is reported only once for each function it appears in
../core/htrie.c: In function ‘tdb_put_blk’:
../core/htrie.c:143:65: error: ‘PAGE_SHIFT’ undeclared (first use in this function); did you mean ‘PAGE_SIZE’?
143 | unsigned int blkoff = TDB_BLK_ID(ptr & TDB_BLK_MASK) >> PAGE_SHIFT;
| ^~~~~~~~~~
| PAGE_SIZE
../core/htrie.c: In function ‘tdb_alloc_index’:
../core/htrie.c:455:33: error: variable ‘old_rptr’ set but not used [-Werror=unused-but-set-variable]
455 | unsigned long rptr = 0, old_rptr;
| ^~~~~~~~
cc1: all warnings being treated as errors
make: *** [<builtin>: tdb_htrie.o] Error 1
@@ -356,10 +462,17 @@ tdb_alloc_index(TdbHdr *dbh)
	    || TDB_BLK_O(rptr + sizeof(TdbHtrieNode) - 1)
	       > TDB_BLK_O(rptr)))
	{
		/* Use a new page and/or extent for local CPU. */
		rptr = tdb_alloc_blk(dbh);
		old_rptr = ((rptr - 1) & TDB_BLK_MASK);
We never free index blocks, so old_rptr isn't used.
	if (!r)
		TDB_ERR("Cannot allocate cache entry for key=%#lx\n", key);

	return r;
}
EXPORT_SYMBOL(tdb_entry_alloc);

void
Please write a comment what the function is required for, just an API doc
@@ -37,11 +37,17 @@ MODULE_LICENSE("GPL");
Let's advance the TDB version to 0.2.0 :)
	b->flags |= TDB_HTRIE_COMPLETE_BIT;
	write_unlock_bh(&b->lock);
}
EXPORT_SYMBOL(tdb_entry_mark_complete);
The function is used in t/tdb_htrie.c, so it seems HTrie isn't usable w/o the function and shouldn't we move it to htrie.c?
		r = 0;
	} else {
		has_alive = true;
	}
That was a poor design, writing polymorphic code for fixed- and variable-sized records through macros, and in the new implementation I got rid of that approach. You can get the record type from the dbh handler (a TDB table can handle only one type of record). You can write a function iterating over all the records using the record type from dbh. Both callbacks can be passed to that function, and it can return has_alive.
	    tfw_cache_entry_key_eq(db, req, ce))
	{
		if (!ce_best || tfw_cache_entry_age(ce)
		    < tfw_cache_entry_age(ce_best))
I think the code is wrong, as is the statement "Cache may store one or more responses to the effective Request URI." in the comment above. Why store so many entries? We recently discussed this with @EvgeniiMekhanik.
We can `#if 0` the code with a TODO for #508, but the point is that at the moment we are able to use, and do use, only one cache entry per key; the rest of the entries are just garbage and extra iterations. We need to store only one cache entry at any given point in time. If the entry is stale, we remove it on insertion of a new response.
The key is URI, so for one uri we can have HEAD and GET records which are both valid
	 * Remove each entry with the current key, even fresh entries. We
	 * assume that if we are at this point, no valid entries exist.
	 */
	tdb_entry_remove(db, key, &tfw_cache_rec_eq_req, resp->req,
The same as for tfw_client_obtain(): double index traversal, maybe in vain, and bucket locking.
@@ -602,6 +718,11 @@ tdb_htrie_create_rec(TdbHdr *dbh, unsigned long off, unsigned long key,
	if (data)
		memcpy_fast(ptr, data, len);

	bckt = (TdbBucket *)((unsigned long)r & TDB_HTRIE_DMASK);
	bckt->flags |= (TDB_HTRIE_COMPLETE_BIT * complete);
This is for variable-length records only, right? It should be under if (TDB_HTRIE_VARLENRECS(dbh)).
And why is this per bucket? Even with a fully resolved key we may have collisions in the bucket, so even for variable-length records there can be many TdbVRecs in one bucket. I'd expect the bit to be set per TdbVRec.
	TdbHtrieNode *root = TDB_HTRIE_ROOT(dbh);

	o = tdb_htrie_descend(dbh, &root, key, &bits);
	if (!o)
		return NULL;
	b = TDB_PTR(dbh, o);
	read_lock_bh(&(b)->lock);
	if (!tdb_bucket_is_complete(b)) {
I don't think we need to implement #500 right now, but there is a comment, #500 (comment), which is aligned with incomplete buckets (why not records?), so it would be good to write a good TODO #500 comment and align the current implementation with the upcoming task.
The wiki branch related to this PR is kt-use-stale-desc.