JSON based object mapping in JDBC connectors #7841

kokosing · 2021-05-05T12:32:13Z

The idea is to also allow JDBC connectors to configure:

case-insensitive-name-matching.config-file=mapping.json
case-insensitive-name-matching.config-file.refresh-period=30s

Where mapping.json would look like:

{
  "schemas": [
    {
      "remote": "CaseSensitiveName",
      "mapping": "case_sensitive_name_mapped_to_case_inseitive_1"
    },
    {
      "remote": "cASEsENSITIVEnAME",
      "mapping": "case_sensitive_name_mapped_to_case_inseitive_2"
    }],
  "tables": [
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "tablex",
      "mapping": "table_1"
    },
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "TABLEX",
      "mapping": "table_2"
    }]
}

Having the above, a remote schema CaseSensitiveName would be mapped to the case_sensitive_name_mapped_to_case_inseitive_1 schema in Trino. A remote table named tablex in the CaseSensitiveName schema would be mapped to the case_sensitive_name_mapped_to_case_inseitive_1 schema and the table_1 table in Trino.
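To make the lookup semantics of the example file concrete, here is a minimal Java sketch. It assumes the rules have already been parsed from JSON into simple records; SchemaMappingRule, TableMappingRule and RuleBasedMappingSketch are hypothetical names, not the classes added by this PR. Names without a rule fall back to lower-casing, which only stands in for the connector's default mapping.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical records mirroring the entries of the "schemas" and "tables" arrays.
record SchemaMappingRule(String remote, String mapping) {}

record TableMappingRule(String remoteSchema, String remoteTable, String mapping) {}

// Sketch of rule-based name mapping: an explicit rule wins; otherwise the remote
// name is lower-cased (a stand-in for the connector's default mapping).
final class RuleBasedMappingSketch
{
    private final Map<String, String> schemaRules;
    private final Map<List<String>, String> tableRules;

    RuleBasedMappingSketch(List<SchemaMappingRule> schemas, List<TableMappingRule> tables)
    {
        // toUnmodifiableMap throws IllegalStateException on duplicate remote names
        this.schemaRules = schemas.stream()
                .collect(Collectors.toUnmodifiableMap(SchemaMappingRule::remote, SchemaMappingRule::mapping));
        // Tables are keyed by the (remote schema, remote table) pair
        this.tableRules = tables.stream()
                .collect(Collectors.toUnmodifiableMap(
                        rule -> List.of(rule.remoteSchema(), rule.remoteTable()),
                        TableMappingRule::mapping));
    }

    String fromRemoteSchemaName(String remoteSchema)
    {
        return schemaRules.getOrDefault(remoteSchema, remoteSchema.toLowerCase(Locale.ENGLISH));
    }

    String fromRemoteTableName(String remoteSchema, String remoteTable)
    {
        return tableRules.getOrDefault(List.of(remoteSchema, remoteTable), remoteTable.toLowerCase(Locale.ENGLISH));
    }
}
```

Because the sketch uses Collectors.toUnmodifiableMap, a file listing the same remote schema (or schema/table pair) twice is rejected when the rules are loaded instead of one rule being picked silently.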

Notice that this mapping file is refreshable, so no restart is needed when it changes.
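The effect of case-insensitive-name-matching.config-file.refresh-period can be pictured as a supplier that re-reads the file once the period has elapsed. This is a hedged Java sketch, not the PR's implementation: RefreshingFileSupplier is a hypothetical name, it re-reads lazily on access, and JSON parsing is left to an injected parser function so the sketch does not depend on a JSON library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: lazily re-reads and re-parses a file at most once per refresh period.
final class RefreshingFileSupplier<T>
        implements Supplier<T>
{
    private final Path file;
    private final Function<String, T> parser;
    private final long refreshPeriodNanos;
    private boolean loaded;
    private long loadedAtNanos;
    private T value;

    RefreshingFileSupplier(Path file, Function<String, T> parser, Duration refreshPeriod)
    {
        this.file = file;
        this.parser = parser;
        this.refreshPeriodNanos = refreshPeriod.toNanos();
    }

    @Override
    public synchronized T get()
    {
        long now = System.nanoTime();
        if (!loaded || now - loadedAtNanos >= refreshPeriodNanos) {
            try {
                // Re-parse the current file contents; edits take effect without a restart
                value = parser.apply(Files.readString(file));
            }
            catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            loaded = true;
            loadedAtNanos = now;
        }
        return value;
    }
}
```

Reloading on access keeps the sketch small; a scheduled background refresh would be an equally valid design.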

losipiuk · 2021-05-06T11:14:02Z

Are two initial commits relevant for PR?

...ase-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CaseInsensitiveNameMatchingCacheTtl.java

.../trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CacheBasedIdentifierMapping.java

losipiuk · 2021-05-06T11:20:00Z

.../trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CacheBasedIdentifierMapping.java

+    @Inject
+    public CacheBasedIdentifierMapping(
+            @CaseInsensitiveNameMatchingCacheTtl Duration caseInsensitiveNameMatchingCacheTtl,
+            DefaultIdentifierMapping defaultIdentifierMapping,


Don't we want to be able to provide a different implementation here? It seems like you should use an annotated interface here.

I see it was resolved, but not answered or addressed.

Sorry about that. I changed the code a lot, so I thought that comment no longer applied. See my earlier comment (copy-pasted below):

No. DefaultIdentifierMapping has a specific implementation that I consciously wanted to use here. Hence the type is not the generic IdentifierMapping.

DefaultIdentifierMapping is part of the implementation of CacheBasedIdentifierMapping; that way there are no logical changes here.

losipiuk

Skimmed the last commit. Seems fine. A couple of questions.

kokosing · 2021-05-06T12:26:03Z

Are two initial commits relevant for PR?

Kind of. The first commit is a prerequisite; otherwise the Guice wiring in Phoenix would look very ugly. The second one is just some litter that I found.

kokosing · 2021-05-06T12:26:40Z

It is WIP. I will post an update soon.

losipiuk · 2021-05-06T13:10:26Z

It is WIP. I will post an update soon.

Sure :)

A hint - it is nice if WIP PRs are marked as WIP with a label, or, even better, sent out as a Draft.

kokosing · 2021-05-06T13:12:10Z

A hint - it is nice if WIP PRs are marked as WIP

Sure. I thought that not asking anyone for a review would be enough. Sorry about that.

kokosing · 2021-05-07T07:48:22Z

Extracted #7851, so review can be easier.

losipiuk · 2021-05-07T08:23:55Z

Extracted #7851, so review can be easier.

that one is good. Please merge and rebase.

losipiuk · 2021-05-07T09:17:25Z

plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPlugin.java


-        connectorFactory.create("test", TestingH2JdbcModule.createProperties(), new TestingConnectorContext());
+    private static ConnectorFactory getConnectorFactory()


nit: inline?

It will be used in the next commit, when a new test is added here.

...ino-sqlserver/src/test/java/io/trino/plugin/sqlserver/BaseSqlCaseInsensitiveMappingTest.java

losipiuk · 2021-05-07T10:47:11Z

.../trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CacheBasedIdentifierMapping.java

+{
+    private final Cache<JdbcIdentity, Map<String, String>> remoteSchemaNames;
+    private final Cache<RemoteTableNameCacheKey, Map<String, String>> remoteTableNames;
+    private final DefaultIdentifierMapping defaultIdentifierMapping;


name it delegate?

No. DefaultIdentifierMapping has a specific implementation that I consciously wanted to use here. Hence the type is not the generic IdentifierMapping.

losipiuk · 2021-05-07T10:58:46Z

.../trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CacheBasedIdentifierMapping.java

+    private final Provider<BaseJdbcClient> baseJdbcClient;
+
+    @Inject
+    public CacheBasedIdentifierMapping(


I still think CachingIdentifierMapping would be a better name.

CachingIdentifierMapping sounds like a layer on top of a generic IdentifierMapping that just caches the results of the mapping. Here, this implementation uses caching to provide the mapping, so the identifier mapping is based on caching. Notice that the next commit provides a mapping based on rules.

CachingIdentifierMapping sounds like a layer on top of a generic IdentifierMapping that just caches the results of the map

Well, it actually does just that. It just does not cache strictly the keys it is asked for, but uses extra listings to make caching more efficient.
Also, it seems like it should work fine with any implementation of IdentifierMapping (not only DefaultIdentifierMapping). Probably I am missing something; please add a code comment explaining the situation for less smart people like me :)

Or maybe I am too stubborn. I think what you are saying makes sense.
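The point about caching whole listings rather than individual keys can be illustrated with a simplified Java sketch. ListingCacheSketch is hypothetical and is not CacheBasedIdentifierMapping itself: per identity it lists all remote schema names once, caches a lower-cased to remote-name map for a TTL, and answers later lookups from that map. The identity is modeled as a plain String, and the listing function is an assumed stand-in for a metadata call over JDBC.

```java
import java.time.Duration;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: cache the full listing of remote schema names per identity,
// keyed by lower-cased name, instead of caching one entry per requested key.
final class ListingCacheSketch
{
    private record Entry(Map<String, String> remoteByLowerCase, long loadedAtNanos) {}

    private final Function<String, List<String>> listRemoteSchemas; // identity -> remote names (assumed)
    private final long ttlNanos;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    ListingCacheSketch(Function<String, List<String>> listRemoteSchemas, Duration ttl)
    {
        this.listRemoteSchemas = listRemoteSchemas;
        this.ttlNanos = ttl.toNanos();
    }

    String toRemoteSchemaName(String identity, String schemaName)
    {
        long now = System.nanoTime();
        // One listing call refreshes the answers for every schema of this identity
        Entry entry = cache.compute(identity, (key, old) ->
                old != null && now - old.loadedAtNanos() < ttlNanos ? old : new Entry(load(key), now));
        // No remote match: fall back to the name as given (stand-in for a default mapping)
        return entry.remoteByLowerCase().getOrDefault(schemaName, schemaName);
    }

    private Map<String, String> load(String identity)
    {
        // Collectors.toMap throws IllegalStateException if two remote names collide by case
        return listRemoteSchemas.apply(identity).stream()
                .collect(Collectors.toMap(name -> name.toLowerCase(Locale.ENGLISH), name -> name));
    }
}
```

In this sketch, because the cached value is the complete map, even a lookup for a schema that does not exist is answered without another remote call.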

losipiuk · 2021-05-07T10:59:30Z

.../trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CacheBasedIdentifierMapping.java

+import static java.util.Objects.requireNonNull;
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+
+public final class CacheBasedIdentifierMapping


Are there any logical changes vs. the code that was originally in JdbcClient? Anything to focus the review on?

No logical changes. The one thing is the fallback to DefaultIdentifierMapping, but it was there before as well, just expressed differently.

I changed this class later, so logical changes are in separate commit.

losipiuk · 2021-05-07T11:01:44Z

plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/IdentifierMapping.java

+
+    String toRemoteTableName(JdbcIdentity identity, Connection connection, String remoteSchema, String tableName);
+
+    String toRemoteColumnName(Connection connection, String columnName);


This looks asymmetric. Why does this one not get a JdbcIdentity?

It would not be used. Adding it would make the code symmetric, but the method would be more difficult to call.

Unless you think we should support tables whose column names differ only by case, and implement that now (or later, as a follow-up PR).

Whatever. I guess leave it until we really need that.

losipiuk · 2021-05-07T11:21:29Z

plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java

@@ -817,7 +790,7 @@ public PreparedStatement getPreparedStatement(Connection connection, String sql)
        return connection.prepareStatement(sql);
    }

-    protected ResultSet getTables(Connection connection, Optional<String> schemaName, Optional<String> tableName)
+    public ResultSet getTables(Connection connection, Optional<String> schemaName, Optional<String> tableName)


Rename schemaName and tableName to remoteSchemaName and remoteTableName.
Add a comment that this method is called by IdentifierMapping and must not call back into it, as this would result in a loop.

Btw, can we get rid of this cyclic dependency somehow?

I was thinking about extracting a third component that could be used by both the JDBC client and the identifier mapping. However, since this method is overridden by several vendor-specific JDBC clients, it is not that simple. Introducing identifier mapping is already quite invasive, so I would leave extracting such a component for later...
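Short of extracting a third component, the cycle (the client calls the mapping, and the mapping needs the client's raw listing) can be broken by giving the mapping a lazily resolved handle to the client, in the spirit of the Provider<BaseJdbcClient> field quoted earlier in this review. Below is a minimal Java sketch with hypothetical classes SketchClient, SketchMapping and SketchWiring, using java.util.function.Supplier in place of an injection provider; it is not how Guice wires the real classes.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical client: exposes a raw listing that must never call back into the mapping.
final class SketchClient
{
    private final SketchMapping mapping;

    SketchClient(SketchMapping mapping)
    {
        this.mapping = mapping;
    }

    // Raw metadata access; calling the mapping from here would make lookups loop forever
    List<String> listRemoteSchemas()
    {
        return List.of("Sales", "HR");
    }

    String resolveSchema(String schemaName)
    {
        return mapping.toRemoteSchemaName(schemaName);
    }
}

// Hypothetical mapping: depends on the client only through a Supplier resolved at call time.
final class SketchMapping
{
    private final Supplier<SketchClient> client;

    SketchMapping(Supplier<SketchClient> client)
    {
        this.client = client;
    }

    String toRemoteSchemaName(String schemaName)
    {
        return client.get().listRemoteSchemas().stream()
                .filter(name -> name.toLowerCase(Locale.ENGLISH).equals(schemaName))
                .findFirst()
                .orElse(schemaName);
    }
}

// Manual wiring: the holder is filled only after both objects exist,
// which is roughly what a DI container's provider does for us.
final class SketchWiring
{
    static SketchClient create()
    {
        SketchClient[] holder = new SketchClient[1];
        SketchMapping mapping = new SketchMapping(() -> holder[0]);
        holder[0] = new SketchClient(mapping);
        return holder[0];
    }
}
```

The indirection only defers the dependency; the constraint from the review still applies, namely that the raw listing method must not call back into the mapping.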

plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/IdentifierMappingModule.java

plugin/trino-clickhouse/src/main/java/io/trino/plugin/clickhouse/ClickHouseClient.java

kokosing · 2021-05-07T20:01:43Z

CI hit: #7216

kokosing · 2021-05-10T08:42:56Z

CI hit: #7872

kokosing · 2021-05-10T08:44:23Z

@losipiuk AC

plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/IdentifierMappingModule.java

...n/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/RuleBasedIdentifierMapping.java

losipiuk · 2021-05-10T12:26:15Z

...ver/src/test/java/io/trino/plugin/sqlserver/TestSqlServerCaseInsensitiveRuleBaseMapping.java

+// With case-insensitive-name-matching enabled colliding schema/table names are considered as errors.
+// Some tests here create colliding names which can cause any other concurrent test to fail.
+@Test(singleThreaded = true)
+public class TestSqlServerCaseInsensitiveRuleBaseMapping


nit: should the test be SQL Server specific? Can we share the test code, as we share the feature implementation?

I think this could be covered by #7864; we are hitting the same issue with other tests. Extracting a base class is not trivial, as there are many vendor-specific things. Hence I would like to handle it as a separate effort.

losipiuk

Looks great. Some editorials. Feel free to ignore.

kokosing

Looks great.

Thanks.

Some editorials. Feel free to ignore.

Applied them all except one, which I commented on.

kokosing · 2021-05-11T12:10:37Z

...ver/src/test/java/io/trino/plugin/sqlserver/TestSqlServerCaseInsensitiveRuleBaseMapping.java

+// With case-insensitive-name-matching enabled colliding schema/table names are considered as errors.
+// Some tests here create colliding names which can cause any other concurrent test to fail.
+@Test(singleThreaded = true)
+public class TestSqlServerCaseInsensitiveRuleBaseMapping


I think this could be covered by #7864; we are hitting the same issue with other tests. Extracting a base class is not trivial, as there are many vendor-specific things. Hence I would like to handle it as a separate effort.

kokosing · 2021-05-11T13:57:37Z

I hit #7872 again, which successfully hid the test results, so I based this PR on top of #7889.

Sometimes TestingSqlServer fails to run ALTER DATABASE database_xxx SET READ_COMMITTED_SNAPSHOT ON, with the error: Transaction (Process ID 51) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

cla-bot added the cla-signed label May 5, 2021

losipiuk reviewed May 6, 2021

...ase-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CaseInsensitiveNameMatchingCacheTtl.java (outdated, resolved)

losipiuk reviewed May 6, 2021

.../trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/CacheBasedIdentifierMapping.java (outdated, resolved)

losipiuk reviewed May 6, 2021

kokosing marked this pull request as draft May 6, 2021 13:12

kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from 1ccb0d2 to bf7e00d May 6, 2021 15:17

kokosing marked this pull request as ready for review May 6, 2021 15:18

kokosing requested review from Praveen2112, ksobolew and findepi May 6, 2021 15:19

kokosing changed the title from "Extract identifier mapping from BaseJdbcClient" to "JSON based object mapping in JDBC connectors" May 6, 2021

kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch 2 times, most recently from ec325a7 to 8deaaa3 May 7, 2021 07:45

losipiuk reviewed May 7, 2021

...ino-sqlserver/src/test/java/io/trino/plugin/sqlserver/BaseSqlCaseInsensitiveMappingTest.java (outdated, resolved)

losipiuk reviewed May 7, 2021

plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/IdentifierMappingModule.java (outdated, resolved)

losipiuk reviewed May 7, 2021

plugin/trino-clickhouse/src/main/java/io/trino/plugin/clickhouse/ClickHouseClient.java (outdated, resolved)

kokosing mentioned this pull request May 8, 2021

Migrate CaseInsensitiveMapping tests to BaseCaseInsensitiveMappingTest #7864

Closed

3 tasks

kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch 2 times, most recently from 156efba to f1db060 May 9, 2021 07:44

kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from f1db060 to afd5223 May 10, 2021 08:44

losipiuk reviewed May 10, 2021

plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/IdentifierMappingModule.java (outdated, resolved)

losipiuk reviewed May 10, 2021

plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/IdentifierMappingModule.java (outdated, resolved)

losipiuk reviewed May 10, 2021

...n/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/mapping/RuleBasedIdentifierMapping.java (outdated, resolved)

losipiuk reviewed May 10, 2021

losipiuk approved these changes May 10, 2021

kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from afd5223 to 3771832 May 11, 2021 12:14

kokosing commented May 11, 2021

kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from 3771832 to 26df5ec May 11, 2021 13:57

kokosing added 6 commits May 11, 2021 16:03

Convert TestJdbcConnectorFactory into TestJdbcPlugin

7d6f7d4

This way test is using higher level API that should be a default way to use the base JdbcPlugin.

Add TestSqlServerCaseInsensitiveMapping test for SQL Server

eaf289b

Extract identifier mapping from BaseJdbcClient

5c06dbb

It extracts the mapping responsibility out of the BaseJdbcClient. Thanks to that it is possible to make it customizable.

Do not fail all queries if there are duplicates

ef7003b

Only fail the query when an ambiguous object is accessed.
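The behavior described by this commit (fail only when an ambiguous name is actually accessed) can be sketched as an index that records colliding names instead of rejecting the whole listing. This is a hypothetical Java illustration; AmbiguityAwareIndex and its exception message are not taken from the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: build the lower-cased -> remote-name index without failing on
// collisions; remember ambiguous keys and only fail when one of them is looked up.
final class AmbiguityAwareIndex
{
    private final Map<String, String> unique = new HashMap<>();
    private final Map<String, Set<String>> ambiguous = new HashMap<>();

    AmbiguityAwareIndex(List<String> remoteNames)
    {
        // Group remote names by their case-insensitive key
        Map<String, Set<String>> grouped = new HashMap<>();
        for (String name : remoteNames) {
            grouped.computeIfAbsent(name.toLowerCase(Locale.ENGLISH), key -> new TreeSet<>()).add(name);
        }
        grouped.forEach((key, names) -> {
            if (names.size() == 1) {
                unique.put(key, names.iterator().next());
            }
            else {
                ambiguous.put(key, names);
            }
        });
    }

    Optional<String> lookup(String name)
    {
        Set<String> candidates = ambiguous.get(name);
        if (candidates != null) {
            // Only the query touching the ambiguous object fails
            throw new IllegalStateException("Ambiguous name '" + name + "', matches: " + candidates);
        }
        return Optional.ofNullable(unique.get(name));
    }
}
```

Compared with building the map via Collectors.toMap, which throws as soon as any two remote names collide, this keeps every unambiguous schema or table usable.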

kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from 26df5ec to 8323863 May 11, 2021 14:03

kokosing merged commit 7136d30 into trinodb:master May 11, 2021

kokosing deleted the origin/master/309_jdbc_identifier_mapping branch May 11, 2021 19:39

This was referenced May 11, 2021

Cache JDBC ConnectionMetadata.storesUpperCaseIdentifiers() result #7721

Closed

Release notes for 357 #7815

Closed

m57lyra mentioned this pull request Jun 22, 2021

Document case-insensitive-name-matching configuration file properties #8354

Closed

meneal mentioned this pull request Jul 23, 2021

Update to trino 359 IBM/trino-db2#77

Merged

jhlodin mentioned this pull request Oct 8, 2021

Document case-sensitive configuration properties #9567

Merged

