Allow caching stats in JDBC but not metadata #19859

Merged (8 commits) on Nov 23, 2023
Original file line number Diff line number Diff line change
@@ -29,24 +29,25 @@

import static com.google.common.base.Strings.nullToEmpty;
import static jakarta.validation.constraints.Pattern.Flag.CASE_INSENSITIVE;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.SECONDS;

public class BaseJdbcConfig
{
public static final String METADATA_CACHE_TTL = "metadata.cache-ttl";
public static final String METADATA_SCHEMAS_CACHE_TTL = "metadata.schemas.cache-ttl";
public static final String METADATA_TABLES_CACHE_TTL = "metadata.tables.cache-ttl";
public static final String METADATA_CACHE_MAXIMUM_SIZE = "metadata.cache-maximum-size";
private static final String METADATA_CACHE_TTL = "metadata.cache-ttl";
private static final String METADATA_SCHEMAS_CACHE_TTL = "metadata.schemas.cache-ttl";
private static final String METADATA_TABLES_CACHE_TTL = "metadata.tables.cache-ttl";
private static final String METADATA_STATISTICS_CACHE_TTL = "metadata.statistics.cache-ttl";
private static final String METADATA_CACHE_MAXIMUM_SIZE = "metadata.cache-maximum-size";
private static final long DEFAULT_METADATA_CACHE_SIZE = 10000;

private String connectionUrl;
private Set<String> jdbcTypesMappedToVarchar = ImmutableSet.of();
public static final Duration CACHING_DISABLED = new Duration(0, MILLISECONDS);
private Duration metadataCacheTtl = CACHING_DISABLED;
private Duration metadataCacheTtl = new Duration(0, SECONDS);
private Optional<Duration> schemaNamesCacheTtl = Optional.empty();
private Optional<Duration> tableNamesCacheTtl = Optional.empty();
private Optional<Duration> statisticsCacheTtl = Optional.empty();
Member

Why not enable this by default with some value like 5m? We enabled it by default in Hive.

Member Author

Backwards compatibility. Enabling the stats cache may harm some workloads, such as staging tables that are sometimes empty and sometimes full of data.

Member

If it's going to be better for the vast majority of workloads, then I would still enable it despite the harm to some less common cases. It can be explicitly disabled in the catalogs where it turns out to be harmful.
Alternatively, can we enable it for connectors where staging tables are rarely or never used?

Member Author

I don't know the usage patterns, so I didn't want to enable it by default, at least just yet.
Do you see a problem with this PR if this is not enabled by default from the start?

Member

It's not a hard blocker for the PR, but the whole point of this change in the Hive connector was to let us enable stats metadata caching by default, rather than just allowing more manual tweaking of cache TTLs. If we never have a pathway to enabling this by default in JDBC connectors, then we're missing out on the main benefit of this change.
Unless you have a different plan for finding out the usage patterns, the only way I see to discover the problems with enabling this by default is to enable it and see what problems get reported.

Member Author

I am not against enabling it by default, and I think we agree that this is not required for this PR. In other words, it can be enabled as a follow-up.
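
The staging-table concern raised in this thread can be illustrated with a minimal sketch. This is not Trino's `CachingJdbcClient`: the class `TtlStatsCacheSketch`, its `rowCount` method and the manual millisecond clock are hypothetical names chosen for illustration, and a plain `HashMap` with a write timestamp stands in for the real cache. It assumes expire-after-write semantics and treats a TTL of zero as "caching disabled".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a statistics cache with expire-after-write semantics.
final class TtlStatsCacheSketch
{
    private static final class Entry
    {
        final long rowCount;
        final long writtenAtMillis;

        Entry(long rowCount, long writtenAtMillis)
        {
            this.rowCount = rowCount;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis; // injectable clock, so time can be advanced by hand

    TtlStatsCacheSketch(long ttlMillis, LongSupplier clockMillis)
    {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    long rowCount(String table, Function<String, Long> loader)
    {
        long now = clockMillis.getAsLong();
        Entry cached = entries.get(table);
        // With ttlMillis == 0 this condition is never true, so every call reloads.
        if (cached != null && now - cached.writtenAtMillis < ttlMillis) {
            return cached.rowCount;
        }
        long fresh = loader.apply(table);
        entries.put(table, new Entry(fresh, now));
        return fresh;
    }

    public static void main(String[] args)
    {
        long[] now = {0};
        long[] actualRows = {0}; // the staging table starts out empty
        TtlStatsCacheSketch cache = new TtlStatsCacheSketch(300_000, () -> now[0]); // 5 minute TTL

        System.out.println(cache.rowCount("staging", t -> actualRows[0])); // 0
        actualRows[0] = 1_000_000; // the table is filled
        now[0] = 60_000; // one minute later
        System.out.println(cache.rowCount("staging", t -> actualRows[0])); // still 0 (stale)
        now[0] = 300_000; // TTL elapsed
        System.out.println(cache.rowCount("staging", t -> actualRows[0])); // 1000000
    }
}
```

Within the TTL window the planner would see the row count recorded while the table was empty, which is the kind of workload the author wanted to avoid surprising by default.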

private boolean cacheMissing;
public static final long DEFAULT_METADATA_CACHE_SIZE = 10000;
private long cacheMaximumSize = DEFAULT_METADATA_CACHE_SIZE;
private Optional<Long> cacheMaximumSize = Optional.empty();

@NotNull
// Some drivers match case insensitive in Driver.acceptURL
@@ -118,6 +119,20 @@ public BaseJdbcConfig setTableNamesCacheTtl(Duration tableNamesCacheTtl)
return this;
}

@NotNull
public Duration getStatisticsCacheTtl()
{
return statisticsCacheTtl.orElse(metadataCacheTtl);
}

@Config(METADATA_STATISTICS_CACHE_TTL)
@ConfigDescription("Determines how long table statistics information will be cached")
public BaseJdbcConfig setStatisticsCacheTtl(Duration statisticsCacheTtl)
{
this.statisticsCacheTtl = Optional.ofNullable(statisticsCacheTtl);
return this;
}

public boolean isCacheMissing()
{
return cacheMissing;
@@ -134,32 +149,34 @@ public BaseJdbcConfig setCacheMissing(boolean cacheMissing)
@Min(1)
public long getCacheMaximumSize()
{
return cacheMaximumSize;
return cacheMaximumSize.orElse(DEFAULT_METADATA_CACHE_SIZE);
}

@Config(METADATA_CACHE_MAXIMUM_SIZE)
@ConfigDescription("Maximum number of objects stored in the metadata cache")
public BaseJdbcConfig setCacheMaximumSize(long cacheMaximumSize)
{
this.cacheMaximumSize = cacheMaximumSize;
this.cacheMaximumSize = Optional.of(cacheMaximumSize);
return this;
}

@AssertTrue(message = METADATA_CACHE_TTL + " must be set to a non-zero value when " + METADATA_CACHE_MAXIMUM_SIZE + " is set")
@AssertTrue(message = METADATA_CACHE_TTL + " or " + METADATA_STATISTICS_CACHE_TTL + " must be set to a non-zero value when " + METADATA_CACHE_MAXIMUM_SIZE + " is set")
public boolean isCacheMaximumSizeConsistent()
{
return !metadataCacheTtl.equals(CACHING_DISABLED) || cacheMaximumSize == BaseJdbcConfig.DEFAULT_METADATA_CACHE_SIZE;
return !metadataCacheTtl.isZero() ||
(statisticsCacheTtl.isPresent() && !statisticsCacheTtl.get().isZero()) ||
cacheMaximumSize.isEmpty();
}

@AssertTrue(message = METADATA_SCHEMAS_CACHE_TTL + " must not be set when " + METADATA_CACHE_TTL + " is not set")
public boolean isSchemaNamesCacheTtlConsistent()
{
return !metadataCacheTtl.equals(CACHING_DISABLED) || schemaNamesCacheTtl.isEmpty();
return !metadataCacheTtl.isZero() || schemaNamesCacheTtl.isEmpty();
}

@AssertTrue(message = METADATA_TABLES_CACHE_TTL + " must not be set when " + METADATA_CACHE_TTL + " is not set")
public boolean isTableNamesCacheTtlConsistent()
{
return !metadataCacheTtl.equals(CACHING_DISABLED) || tableNamesCacheTtl.isEmpty();
return !metadataCacheTtl.isZero() || tableNamesCacheTtl.isEmpty();
}
}
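
The fallback and validation rules in the config diff above can be modelled in isolation. The following is a simplified sketch, not the actual `BaseJdbcConfig`: the class name `JdbcCacheConfigSketch` and the `effective...` method names are hypothetical, the fields are set directly instead of through `@Config` setters, and `java.time.Duration` stands in for `io.airlift.units.Duration`.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical, simplified model of the TTL-related fields from the diff.
final class JdbcCacheConfigSketch
{
    static final long DEFAULT_CACHE_SIZE = 10_000;

    Duration metadataCacheTtl = Duration.ZERO;
    Optional<Duration> statisticsCacheTtl = Optional.empty();
    Optional<Long> cacheMaximumSize = Optional.empty();

    // The statistics TTL falls back to the general metadata TTL when it is not set.
    Duration effectiveStatisticsCacheTtl()
    {
        return statisticsCacheTtl.orElse(metadataCacheTtl);
    }

    long effectiveCacheMaximumSize()
    {
        return cacheMaximumSize.orElse(DEFAULT_CACHE_SIZE);
    }

    // Mirrors isCacheMaximumSizeConsistent: an explicitly set size requires
    // either the metadata TTL or the statistics TTL to be non-zero.
    boolean isCacheMaximumSizeConsistent()
    {
        return !metadataCacheTtl.isZero()
                || statisticsCacheTtl.map(ttl -> !ttl.isZero()).orElse(false)
                || cacheMaximumSize.isEmpty();
    }

    public static void main(String[] args)
    {
        // Statistics caching only: metadata caching stays disabled.
        JdbcCacheConfigSketch statsOnly = new JdbcCacheConfigSketch();
        statsOnly.statisticsCacheTtl = Optional.of(Duration.ofSeconds(7));
        statsOnly.cacheMaximumSize = Optional.of(5000L);
        System.out.println(statsOnly.effectiveStatisticsCacheTtl()); // PT7S
        System.out.println(statsOnly.metadataCacheTtl.isZero()); // true
        System.out.println(statsOnly.isCacheMaximumSizeConsistent()); // true

        // A cache size with no TTL at all is rejected.
        JdbcCacheConfigSketch sizeOnly = new JdbcCacheConfigSketch();
        sizeOnly.cacheMaximumSize = Optional.of(5000L);
        System.out.println(sizeOnly.isCacheMaximumSizeConsistent()); // false
    }
}
```

One consequence of tracking the size as an `Optional`, as far as the diff shows, is that the check now asks whether the size was set at all instead of whether it differs from the default, so explicitly setting the default size without any TTL would also fail validation.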
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@
package io.trino.plugin.jdbc;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Ticker;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheStats;
import com.google.common.collect.ImmutableMap;
@@ -66,7 +67,6 @@
import static com.google.common.collect.ImmutableList.toImmutableList;
import static com.google.common.collect.ImmutableMap.toImmutableMap;
import static io.trino.cache.CacheUtils.invalidateAllIf;
import static io.trino.plugin.jdbc.BaseJdbcConfig.CACHING_DISABLED;
import static java.util.Objects.requireNonNull;
import static java.util.concurrent.TimeUnit.MILLISECONDS;

@@ -97,41 +97,27 @@ public CachingJdbcClient(
BaseJdbcConfig config)
{
this(
Ticker.systemTicker(),
delegate,
sessionPropertiesProviders,
identityMapping,
config.getMetadataCacheTtl(),
config.getSchemaNamesCacheTtl(),
config.getTableNamesCacheTtl(),
config.getStatisticsCacheTtl(),
config.isCacheMissing(),
config.getCacheMaximumSize());
}

public CachingJdbcClient(
JdbcClient delegate,
Set<SessionPropertiesProvider> sessionPropertiesProviders,
IdentityCacheMapping identityMapping,
Duration metadataCachingTtl,
boolean cacheMissing,
long cacheMaximumSize)
{
this(delegate,
sessionPropertiesProviders,
identityMapping,
metadataCachingTtl,
metadataCachingTtl,
metadataCachingTtl,
cacheMissing,
cacheMaximumSize);
}

public CachingJdbcClient(
Ticker ticker,
JdbcClient delegate,
Set<SessionPropertiesProvider> sessionPropertiesProviders,
IdentityCacheMapping identityMapping,
Duration metadataCachingTtl,
Duration schemaNamesCachingTtl,
Duration tableNamesCachingTtl,
Duration statisticsCachingTtl,
boolean cacheMissing,
long cacheMaximumSize)
{
@@ -142,23 +128,19 @@ public CachingJdbcClient(
this.cacheMissing = cacheMissing;
this.identityMapping = requireNonNull(identityMapping, "identityMapping is null");

long cacheSize = metadataCachingTtl.equals(CACHING_DISABLED)
// Disables the cache entirely
? 0
: cacheMaximumSize;

schemaNamesCache = buildCache(cacheSize, schemaNamesCachingTtl);
tableNamesCache = buildCache(cacheSize, tableNamesCachingTtl);
tableHandlesByNameCache = buildCache(cacheSize, metadataCachingTtl);
tableHandlesByQueryCache = buildCache(cacheSize, metadataCachingTtl);
procedureHandlesByQueryCache = buildCache(cacheSize, metadataCachingTtl);
columnsCache = buildCache(cacheSize, metadataCachingTtl);
statisticsCache = buildCache(cacheSize, metadataCachingTtl);
schemaNamesCache = buildCache(ticker, cacheMaximumSize, schemaNamesCachingTtl);
tableNamesCache = buildCache(ticker, cacheMaximumSize, tableNamesCachingTtl);
tableHandlesByNameCache = buildCache(ticker, cacheMaximumSize, metadataCachingTtl);
tableHandlesByQueryCache = buildCache(ticker, cacheMaximumSize, metadataCachingTtl);
procedureHandlesByQueryCache = buildCache(ticker, cacheMaximumSize, metadataCachingTtl);
columnsCache = buildCache(ticker, cacheMaximumSize, metadataCachingTtl);
statisticsCache = buildCache(ticker, cacheMaximumSize, statisticsCachingTtl);
}

private static <K, V> Cache<K, V> buildCache(long cacheSize, Duration cachingTtl)
private static <K, V> Cache<K, V> buildCache(Ticker ticker, long cacheSize, Duration cachingTtl)
{
return EvictableCacheBuilder.newBuilder()
.ticker(ticker)
.maximumSize(cacheSize)
.expireAfterWrite(cachingTtl.toMillis(), MILLISECONDS)
.shareNothingWhenDisabled()
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@
*/
package io.trino.plugin.jdbc;

import com.google.common.base.Ticker;
import com.google.common.collect.ImmutableSet;
import com.google.inject.Inject;
import io.airlift.units.Duration;
@@ -41,12 +42,16 @@ public JdbcMetadata create(JdbcTransactionHandle transaction)
// Session stays the same per transaction, therefore session properties don't need to
// be a part of cache keys in CachingJdbcClient.
return create(new CachingJdbcClient(
jdbcClient,
Set.of(),
new SingletonIdentityCacheMapping(),
new Duration(1, DAYS),
true,
Integer.MAX_VALUE));
Ticker.systemTicker(),
jdbcClient,
Set.of(),
new SingletonIdentityCacheMapping(),
new Duration(1, DAYS),
new Duration(1, DAYS),
new Duration(1, DAYS),
new Duration(1, DAYS),
true,
Integer.MAX_VALUE));
}

protected JdbcMetadata create(JdbcClient transactionCachingJdbcClient)
Original file line number Diff line number Diff line change
@@ -26,15 +26,13 @@
import static io.airlift.configuration.testing.ConfigAssertions.recordDefaults;
import static io.airlift.testing.ValidationAssertions.assertFailsValidation;
import static io.airlift.testing.ValidationAssertions.assertValidates;
import static java.util.concurrent.TimeUnit.MINUTES;
import static io.airlift.units.Duration.ZERO;
import static java.util.concurrent.TimeUnit.SECONDS;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

public class TestBaseJdbcConfig
{
private static final Duration ZERO = Duration.succinctDuration(0, MINUTES);

@Test
public void testDefaults()
{
@@ -44,6 +42,7 @@ public void testDefaults()
.setMetadataCacheTtl(ZERO)
.setSchemaNamesCacheTtl(null)
.setTableNamesCacheTtl(null)
.setStatisticsCacheTtl(null)
.setCacheMissing(false)
.setCacheMaximumSize(10000));
}
@@ -57,6 +56,7 @@ public void testExplicitPropertyMappings()
.put("metadata.cache-ttl", "1s")
.put("metadata.schemas.cache-ttl", "2s")
.put("metadata.tables.cache-ttl", "3s")
.put("metadata.statistics.cache-ttl", "7s")
.put("metadata.cache-missing", "true")
.put("metadata.cache-maximum-size", "5000")
.buildOrThrow();
@@ -67,6 +67,7 @@ public void testExplicitPropertyMappings()
.setMetadataCacheTtl(new Duration(1, SECONDS))
.setSchemaNamesCacheTtl(new Duration(2, SECONDS))
.setTableNamesCacheTtl(new Duration(3, SECONDS))
.setStatisticsCacheTtl(new Duration(7, SECONDS))
.setCacheMissing(true)
.setCacheMaximumSize(5000);

@@ -96,24 +97,32 @@ public void testCacheConfigValidation()
.setTableNamesCacheTtl(new Duration(3, SECONDS))
.setCacheMaximumSize(5000));

assertValidates(new BaseJdbcConfig()
.setConnectionUrl("jdbc:h2:mem:config")
.setStatisticsCacheTtl(new Duration(7, SECONDS))
.setCacheMaximumSize(5000));

assertValidates(new BaseJdbcConfig()
.setConnectionUrl("jdbc:h2:mem:config")
.setMetadataCacheTtl(new Duration(1, SECONDS)));

assertFailsValidation(new BaseJdbcConfig()
.setCacheMaximumSize(5000),
assertFailsValidation(
new BaseJdbcConfig()
.setCacheMaximumSize(5000),
"cacheMaximumSizeConsistent",
"metadata.cache-ttl must be set to a non-zero value when metadata.cache-maximum-size is set",
"metadata.cache-ttl or metadata.statistics.cache-ttl must be set to a non-zero value when metadata.cache-maximum-size is set",
AssertTrue.class);

assertFailsValidation(new BaseJdbcConfig()
.setSchemaNamesCacheTtl(new Duration(1, SECONDS)),
assertFailsValidation(
new BaseJdbcConfig()
.setSchemaNamesCacheTtl(new Duration(1, SECONDS)),
"schemaNamesCacheTtlConsistent",
"metadata.schemas.cache-ttl must not be set when metadata.cache-ttl is not set",
AssertTrue.class);

assertFailsValidation(new BaseJdbcConfig()
.setTableNamesCacheTtl(new Duration(1, SECONDS)),
assertFailsValidation(
new BaseJdbcConfig()
.setTableNamesCacheTtl(new Duration(1, SECONDS)),
"tableNamesCacheTtlConsistent",
"metadata.tables.cache-ttl must not be set when metadata.cache-ttl is not set",
AssertTrue.class);