-
Notifications
You must be signed in to change notification settings - Fork 759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add protection on namespace cache in case namespace is not created #824
Add protection on namespace cache in case namespace is not created #824
Conversation
return nil | ||
} | ||
|
||
func (c *namespaceCache) checkNamespaceExists( | ||
name string, | ||
id string, | ||
) error { | ||
) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think logic of this func should be changed. It is great that it returns (bool, error)
now but bool
should mean exist/not exist and error should indicate the error: persistence or throttling. I would return service.ResourceExhausted
right from here and propagate it up to the caller. Caller should be able to differentiate if it is actual NotFound
or "you are pinging me to fast".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but bool should mean exist/not exist
not really, bool means continue to check the cache & refresh the cache or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and there is not guarantee the check DB see if the namespace exists
logic will be exercised within this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then don't add this bool
at all. Let it return serviceerror.NotFound
or serviceerror.ResourceExhausted
which will be propagated to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nop, that is not the purpose of this function
the purpose of this function is check whether to continue the path, i.e. check & potentially force refresh the cache.
the bool return value means whether to continue (either the domain exists, or somebody has already checked within short period of time and it is really likely cache can contain the target domain, but no guarantee)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the functionality (function in question) was originally introduced to solve the issue of
t = 0, cache got refreshed, to be refreshed again at t = 10
t = 1, new domain got registered
t = 2, caller just want to use the domain (from within cache)
the function in question (along with the following logic) will solve the above case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be the function name should be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, change the name. If existsSomething()
returns bool
it usually means exists/doesn't exist.
Also you return NotFound
at t=2 now, right? I think Unavailable
suites better here (client should wait and try again w/o performing extra actions): https://pkg.go.dev/google.golang.org/grpc/codes (comment to FailedPrecondition Code = 8
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function name is changed
Unavailable means server fault, while in this case (namespace missing in cache) is client fault.
the only guarantee we have with customer is: if you namespace is in the DB, it is guaranteed to be in the cache after 10s + creation time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think clients should not know about cache or any internal details. So if the statement is "namespace creation can take up to 10 seconds" then returning NotFound
within those 10 seconds makes perfect sense.
Co-authored-by: Alex Shtin <alex@shtin.com>
c.checkLock.Lock() | ||
defer c.checkLock.Unlock() | ||
|
||
now = c.timeSource.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need it one more time here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, consider 1000 caller with 10 frontend host with namespace not registered.
on average, there will be ~ 100 concurrent call to this cache, so the line 514 line now := c.timeSource.Now()
maybe called a while back.
common/cache/namespaceCache.go
Outdated
@@ -563,9 +590,13 @@ func (c *namespaceCache) getNamespace( | |||
return c.getNamespaceByID(id, true) | |||
} | |||
|
|||
if err := c.checkNamespaceExists(name, ""); err != nil { | |||
doContinue, err := c.checkAndRefresh(name, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably there is a better name for doContinue
?
What changed?
Add additional frequency check on namespace cache to reduce the impact if namespace is not registered.
Why?
Currently, if a namespace does not exists within cache, the logic will check if the namespace exists in DB and possibly perform a cache refreshment. If the namespace does not exists in DB, and caller keeps on calling server APIs, the DB will encounter huge API call volume.
How did you test it?
Existing tests
Potential risks
N/A