Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not cache large blob history event #4621

Merged
merged 7 commits into from
Jul 18, 2023
Merged

Conversation

yujieli-temporal
Copy link
Contributor

@yujieli-temporal yujieli-temporal commented Jul 12, 2023

What changed?

Allow lru cache size check against with CacheSize() method.

Why?
We want to limit the history event cache by BytesSize.

How did you test it?
unittest

Potential risks

Is hotfix candidate?

@yujieli-temporal yujieli-temporal marked this pull request as ready for review July 12, 2023 19:58
@yujieli-temporal yujieli-temporal requested a review from a team as a code owner July 12, 2023 19:58
@wxing1292
Copy link
Contributor

take a look at https://github.com/temporalio/temporal/blob/master/common/cache/lru.go

  1. create a duplicate of ^ and use cache size as cache eviction crateria
  2. modify the ^ implement, so ^ can support both size & count

common/cache/lru.go Outdated Show resolved Hide resolved
common/cache/lru.go Outdated Show resolved Hide resolved
common/cache/lru.go Outdated Show resolved Hide resolved
common/cache/lru.go Outdated Show resolved Hide resolved
common/cache/lru.go Outdated Show resolved Hide resolved
common/cache/lru.go Show resolved Hide resolved
common/cache/lru.go Show resolved Hide resolved
common/cache/lru.go Outdated Show resolved Hide resolved
service/history/events/cache.go Outdated Show resolved Hide resolved
service/history/events/cache.go Outdated Show resolved Hide resolved
common/cache/lru.go Outdated Show resolved Hide resolved
common/cache/size_getter.go Outdated Show resolved Hide resolved
common/cache/lru.go Outdated Show resolved Hide resolved
Copy link
Contributor

@wxing1292 wxing1292 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -290,6 +302,7 @@ func (c *lru) putInternal(key interface{}, value interface{}, allowUpdate bool)
entry := &entryImpl{
key: key,
value: value,
size: getSize(value),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we still do the check of size v.s. cache max size and return error early if item size is too big.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I move the size validation on the top before entry created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I mean we don't have to evict any entry if new item size is too big.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for clarify. I don't feel strong here, I will make the change to do the check before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, run the loop twice will make the logic looks duplicate and complex. do you feel strong about check before eviction?

@yujieli-temporal yujieli-temporal enabled auto-merge (squash) July 18, 2023 17:29
common/cache/lru.go Outdated Show resolved Hide resolved
@yujieli-temporal yujieli-temporal merged commit 964a600 into master Jul 18, 2023
10 checks passed
@yujieli-temporal yujieli-temporal deleted the yj/cache_event_limit branch July 18, 2023 18:47
yujieli-temporal added a commit that referenced this pull request Jul 19, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
change history cache key from history.eventsCache*Size to
history.eventsCache*SizeBytes


<!-- Tell your future self why have you made these changes -->
It will remove the key name confusion since cache size definition
changed in: #4621


<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
unittests


<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**


<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
dnr pushed a commit that referenced this pull request Jul 21, 2023
dnr pushed a commit that referenced this pull request Jul 21, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
change history cache key from history.eventsCache*Size to
history.eventsCache*SizeBytes


<!-- Tell your future self why have you made these changes -->
It will remove the key name confusion since cache size definition
changed in: #4621


<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
unittests


<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**


<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
dnr pushed a commit that referenced this pull request Jul 21, 2023
dnr pushed a commit that referenced this pull request Jul 21, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
change history cache key from history.eventsCache*Size to
history.eventsCache*SizeBytes


<!-- Tell your future self why have you made these changes -->
It will remove the key name confusion since cache size definition
changed in: #4621


<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
unittests


<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**


<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
dnr pushed a commit that referenced this pull request Jul 21, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
change history cache key from history.eventsCache*Size to
history.eventsCache*SizeBytes


<!-- Tell your future self why have you made these changes -->
It will remove the key name confusion since cache size definition
changed in: #4621


<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
unittests


<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**


<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
wxing1292 pushed a commit that referenced this pull request Jul 22, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
change history cache key from history.eventsCache*Size to
history.eventsCache*SizeBytes

<!-- Tell your future self why have you made these changes -->
It will remove the key name confusion since cache size definition
changed in: #4621

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
unittests

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
dnr added a commit that referenced this pull request Jul 28, 2023
wxing1292 added a commit that referenced this pull request Jul 29, 2023
yujieli-temporal added a commit that referenced this pull request Aug 8, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
remove the cache initial size from lru cache and related configs. also
refactor putInternal to better support different scenarios.

and added more unittests for cache size.

<!-- Tell your future self why have you made these changes -->
**Why?**
in #4621 I change the history
cache to use item bytes instead item count. However it accidentally
touched default number of key when cache initialized


<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
unittest

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**
Memory usage increase and pined mutable state will never leave the cache, which cause block workflow.

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants