-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow caching stats in JDBC but not metadata #19859
Conversation
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestBaseJdbcConfig.java
Show resolved
Hide resolved
@@ -44,6 +45,7 @@ public class BaseJdbcConfig | |||
private Duration metadataCacheTtl = new Duration(0, SECONDS); | |||
private Optional<Duration> schemaNamesCacheTtl = Optional.empty(); | |||
private Optional<Duration> tableNamesCacheTtl = Optional.empty(); | |||
private Optional<Duration> statisticsCacheTtl = Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not enable this by default with some value like 5m ? We enabled it by default in hive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backwards compatibility. enabling stats cache may harm some workloads. like staging tables that are sometimes empty and sometimes full of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's going to be better for the vast majority of workloads, then I would still enable it despite the harm to some less common cases. This can be explicitly disabled in the catalogs where it turns out to be harmful.
Alternatively, can we enable it for connectors where staging tables are rarely or not used at all ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the usage patterns, so I didn't want to enable it by default, at least just yet.
Do you see a problem with this PR if this is not enabled by default from the start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a hard blocker for the PR, but the whole point of this change in hive connector was to allow us to enable stats metadata caching by default rather than just allowing for greater manual tweaking of cache ttls. If we don't have a path way to enabling this by default ever in JDBC connectors, then we're missing out on the main benefit of this change.
Unless you have a different plan to find out the usage patterns, the only way I see of discovering the problems with enabling this by default is to just enable it and see what kind of problems are reported due to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not against enabling it by default and I think we agree that this is not required for this PR. In other words, can be enabled as a follow-up.
@@ -129,7 +131,7 @@ public CachingJdbcClient( | |||
tableHandlesByQueryCache = buildCache(cacheMaximumSize, metadataCachingTtl); | |||
procedureHandlesByQueryCache = buildCache(cacheMaximumSize, metadataCachingTtl); | |||
columnsCache = buildCache(cacheMaximumSize, metadataCachingTtl); | |||
statisticsCache = buildCache(cacheMaximumSize, metadataCachingTtl); | |||
statisticsCache = buildCache(cacheMaximumSize, statisticsCachingTtl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If metadataCachingTtl
is set higher than statisticsCachingTtl
, then that should be used as the value for statisticsCachingTtl
(we did that in hive metadata caching to avoid inadvertently using smaller ttl for stats cache than overall metadata cache).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want such logic, this should rather be a validation, not silently ignored config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to ever provide some default value for statisticsCachingTtl, then there's a chance that this default could be lower than the existing value of metadataCachingTtl setup by the user. In this case, using smaller value of statisticsCachingTtl would potentially regress perf and applying validation would force user to read docs and update the config values. Applying validation strictly is a correct but disruptive choice, we chose to do the right thing silently in the case of hive.
cc: @sopel39
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In hive we just use
Duration metastoreCacheTtl = config.getMetastoreCacheTtl();
Duration statsCacheTtl = config.getStatsCacheTtl();
if (metastoreCacheTtl.compareTo(statsCacheTtl) > 0) {
statsCacheTtl = metastoreCacheTtl;
}
it mitigates all bad configs and I think it's OK (less troubles for us)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a09b73d
to
34273d6
Compare
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestCachingJdbcClient.java
Outdated
Show resolved
Hide resolved
- static fields before instance fields - private where appropriate - remove special values - fix code indentation - use explicit `Duration` constructor instead of approximating `Duration.succinctDuration`
Inline in usages to simplify CachingJdbcClient construction.
Previously the test used cacheMissing=true by default, despite this being false by default in the config.
Using builder avoids maintaining overloads of various "create some CachingJdbcClient" methods. Using config avoids passing 3 Duration parameters positionally to the constructor. The config is equivalent of named parameters. This also moves helper methods below tests. This is code cleanup and preparatory change before adding yet another Duration parameter.
Many TestCachingJdbcClient's tests construct their own instance, so providing a default instance in the `@Before` creates duplicated state and confusion.
This logically reverts commit b25c90b. The class uses `EvictableCacheBuilder` and the `io.trino.cache.EvictableCacheBuilder#cacheDisabled` considers the cache to be disabled when TTL is 0 or cache size is 0, so no need to set cache size to 0 if TTL is known to be 0.
Apply `CachingHiveMetastore` capabilities in JDBC. Allow turning on statistics caching without having to use caching for all of the metadata. In common case caching statistics has much fewer side effects than caching other stuff.
The test used a masked Thread.sleep to await for cache expiration, and thus lasted at least 6 seconds. Now the whole test class runs about 1 second.
34273d6
to
a212756
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I don't have strong opinions about last commit.
I run |
I think only creates the potential for user facing changes and therefore no release notes are needed. Can you confirm @findepi ? |
Yes, there is a new configuration toggle added. |
Uff .. I overlooked that. I sent #19881 to fix this. Also we should add a release notes entry then .. right @findepi and @raunaqmorarka - any suggestions? |
I will add a release notes entry for all affected connectors.. done now |
Apply
CachingHiveMetastore
capabilities in JDBC. Allow turning onstatistics caching without having to use caching for all of the
metadata. In common case caching statistics has much fewer side effects
than caching other stuff.