Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON based object mapping in JDBC connectors #7841

Merged

Conversation

kokosing
Copy link
Member

@kokosing kokosing commented May 5, 2021

The idea is to allow JDBC connectors too, to configure

case-insensitive-name-matching.config-file=mapping.json
case-insensitive-name-matching.config-file.refresh-period=30s

Where mapping.json would look like:

{
  "schemas": [
    {
      "remote": "CaseSensitiveName",
      "mapping": "case_sensitive_name_mapped_to_case_inseitive_1"
    },
    {
      "remote": "cASEsENSITIVEnAME",
      "mapping": "case_sensitive_name_mapped_to_case_inseitive_2"
    }],
  "tables": [
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "tablex",
      "mapping": "table_1"
    },
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "TABLEX",
      "mapping": "table_2"
    }]
}

Having the above, a remote schema CaseSensitiveName would be mapped to case_sensitive_name_mapped_to_case_inseitive_1 schema in Trino. Remote table in CaseSensitiveName schema and named tablex would be named named to case_sensitive_name_mapped_to_case_inseitive_1 schema and table_1 table name in Trino.

Notice that this mapping file is refreshable so no restart is needed when it changes.

@cla-bot cla-bot bot added the cla-signed label May 5, 2021
@losipiuk
Copy link
Member

losipiuk commented May 6, 2021

Are two initial commits relevant for PR?

@Inject
public CacheBasedIdentifierMapping(
@CaseInsensitiveNameMatchingCacheTtl Duration caseInsensitiveNameMatchingCacheTtl,
DefaultIdentifierMapping defaultIdentifierMapping,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want to be able to provide different implementation here? Seems like you should use annotated interface here.

Copy link
Member

@losipiuk losipiuk May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it resolved but not answered addressed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I changed the code a lot so I thought that comment no longer applies. See the above comment (copy pasted below):

No. DefaultIdentifierMapping has specific implementation that I wanted to consciously use here. Hence type is not generic IdentifierMapping.

DefaultIdentifierMapping is part of implementation of CacheBasedIdentifierMapping, that way there are no logical changes here.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skimmed last commit. Seems fine. A couple questions.

@kokosing
Copy link
Member Author

kokosing commented May 6, 2021

Are two initial commits relevant for PR?

Kind of. First commit is a prerequisite, otherwise guice in Phoenix could look very ugly. Second one is just a litter that I found.

@kokosing
Copy link
Member Author

kokosing commented May 6, 2021

It is WIP. I will post an update soon.

@losipiuk
Copy link
Member

losipiuk commented May 6, 2021

It is WIP. I will post an update soon.

Sure :)

A hint - it is nice if WIP PRs are marked as WIP label. Or even better send out as Draft

@kokosing
Copy link
Member Author

kokosing commented May 6, 2021

A hint - it is nice if WIP PRs are marked as WIP

Sure. I was thinking that if I not ask for a review anyone that will be enough. Sorry about that.

@kokosing kokosing marked this pull request as draft May 6, 2021 13:12
@kokosing kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from 1ccb0d2 to bf7e00d Compare May 6, 2021 15:17
@kokosing kokosing marked this pull request as ready for review May 6, 2021 15:18
@kokosing kokosing changed the title Extract identifier mapping from BaseJdbcClient JSON based object mapping in JDBC connectors May 6, 2021
@kokosing kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch 2 times, most recently from ec325a7 to 8deaaa3 Compare May 7, 2021 07:45
@kokosing
Copy link
Member Author

kokosing commented May 7, 2021

Extracted #7851, so review can be easier.

@losipiuk
Copy link
Member

losipiuk commented May 7, 2021

Extracted #7851, so review can be easier.

that one is good. Please merge and rebase.


connectorFactory.create("test", TestingH2JdbcModule.createProperties(), new TestingConnectorContext());
private static ConnectorFactory getConnectorFactory()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be used in next commit, when new test is added here.

{
private final Cache<JdbcIdentity, Map<String, String>> remoteSchemaNames;
private final Cache<RemoteTableNameCacheKey, Map<String, String>> remoteTableNames;
private final DefaultIdentifierMapping defaultIdentifierMapping;
Copy link
Member

@losipiuk losipiuk May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name it delegate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. DefaultIdentifierMapping has specific implementation that I wanted to consciously use here. Hence type is not generic IdentifierMapping.

private final Provider<BaseJdbcClient> baseJdbcClient;

@Inject
public CacheBasedIdentifierMapping(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think CachingIdentifierMapping would be a better name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CachingIdentifierMapping sounds like a layer on top of generic IdentifierMapping that just do the caching of results of mapping. Here this implementation is using caching to provide the mapping, so identifier mapping is based on caching. Notice that next commit provides mapping based on rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CachingIdentifierMapping sounds like a layer on top of generic IdentifierMapping that just do the caching of results of map

Well. It actually just does that. It just does not cache strictly for the keys it is asked for. But uses extra listers to make caching more efficient.
Also it seems like it should work fine with any implementation of IdentifierMapping (not only DefaultIdentifierMapping). Probably I am missing something, please add a code comment explaining the situation for less smart people like me :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or probably I am too stubborn. I think it make sense what you are saying.

import static java.util.Objects.requireNonNull;
import static java.util.concurrent.TimeUnit.MILLISECONDS;

public final class CacheBasedIdentifierMapping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are here any logical changes vs code which was originally in JdbcClient? Something to focus review on?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No logical changes. No thing is the fall back to DefaultIdentifierMapping but it was there as well but expressed differently.

I changed this class later, so logical changes are in separate commit.


String toRemoteTableName(JdbcIdentity identity, Connection connection, String remoteSchema, String tableName);

String toRemoteColumnName(Connection connection, String columnName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks asymetric. Why this one does not get JdbcIdentity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be not used. Adding this would make code the symetric, but it would be more difficult to call.

Unless you think that a case where one has a table with columns which names differ by casing and we should implement support for them now (or later as a follow up PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever. I guess leave it until we really need that.

@@ -817,7 +790,7 @@ public PreparedStatement getPreparedStatement(Connection connection, String sql)
return connection.prepareStatement(sql);
}

protected ResultSet getTables(Connection connection, Optional<String> schemaName, Optional<String> tableName)
public ResultSet getTables(Connection connection, Optional<String> schemaName, Optional<String> tableName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename schemaName and tableName to reamoteSchemaName and remoteTableName.
Add comment that this method is called by by IdentifierMapping and it can not call back to it, as this would result in a loop.

Btw. Can we get rid of this cyclic dependency somehow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about extract 3rd component which could be use by jdbc client and identifier mapping. However since this method is overridden by several vendor specific jdbc client it is not that simple. Identifier mapping introduction is quite invasive. So I would leave extracting such component for later...

@kokosing
Copy link
Member Author

kokosing commented May 7, 2021

CI hit: #7216

@kokosing kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch 2 times, most recently from 156efba to f1db060 Compare May 9, 2021 07:44
@kokosing
Copy link
Member Author

kokosing commented May 10, 2021

CI hit: #7872

@kokosing kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from f1db060 to afd5223 Compare May 10, 2021 08:44
@kokosing
Copy link
Member Author

@losipiuk AC

// With case-insensitive-name-matching enabled colliding schema/table names are considered as errors.
// Some tests here create colliding names which can cause any other concurrent test to fail.
@Test(singleThreaded = true)
public class TestSqlServerCaseInsensitiveRuleBaseMapping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should the test be SQLServer specific? Can we share test code as we share feature implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be covered with #7864 we are hitting same with other tests. Extracting base class is not trivial. There is many vendor specific things. Hence I would like to handle it as separate effort.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Some editorials. Feel free to ignore.

@kokosing kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from afd5223 to 3771832 Compare May 11, 2021 12:14
Copy link
Member Author

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

Thanks.

Some editorials. Feel free to ignore.

Applied them all % one was commented.

// With case-insensitive-name-matching enabled colliding schema/table names are considered as errors.
// Some tests here create colliding names which can cause any other concurrent test to fail.
@Test(singleThreaded = true)
public class TestSqlServerCaseInsensitiveRuleBaseMapping
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be covered with #7864 we are hitting same with other tests. Extracting base class is not trivial. There is many vendor specific things. Hence I would like to handle it as separate effort.

@kokosing kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from 3771832 to 26df5ec Compare May 11, 2021 13:57
@kokosing
Copy link
Member Author

I hit #7872 again, which successfully hide the tests results so I based this PR on top of #7889

Sometimes TestingSqlServer is fails to run:
ALTER DATABASE database_xxx SET READ_COMMITTED_SNAPSHOT ON
saying:
Transaction (Process ID 51) was deadlocked on lock resources with another process and
has been chosen as the deadlock victim. Rerun the transaction.
This way test is using higher level API that should be
a default way to use the base JdbcPlugin.
It extracts the mapping responsibility out of the BaseJdbcClient.
Thanks to that it is possible to make it customizable.
Only fail the query when an ambiguous object is accessed.
The idea is to allow JDBC connectors too, to configure

```
case-insensitive-name-matching.config-file=mapping.json
case-insensitive-name-matching.config-file.refresh-period=30s
```

Where mapping.json would look like:
```
{
  "schemas": [
    {
      "remote": "CaseSensitiveName",
      "mapping": "case_sensitive_name_mapped_to_case_inseitive_1"
    },
    {
      "remote": "cASEsENSITIVEnAME",
      "mapping": "case_sensitive_name_mapped_to_case_inseitive_2"
    }],
  "tables": [
    {
      "remote-schema": "CaseSensitiveName",
      "remote-table": "tablex",
      "mapping": "table_1"
    },
    {
      "remote-schema": "CaseSensitiveName",
      "remote-table": "TABLEX",
      "mapping": "table_2"
    }]
}
```

Having the above, a remote schema `CaseSensitiveName` would be mapped to
`case_sensitive_name_mapped_to_case_inseitive_1` schema in Trino.
Remote table in `CaseSensitiveName` schema and named `tablex` would be named named to
`case_sensitive_name_mapped_to_case_inseitive_1` schema and `table_1` table name in Trino.

Notice that this mapping file is refreshable so no restart is needed when it changes.
@kokosing kokosing force-pushed the origin/master/309_jdbc_identifier_mapping branch from 26df5ec to 8323863 Compare May 11, 2021 14:03
@kokosing kokosing merged commit 7136d30 into trinodb:master May 11, 2021
@kokosing kokosing deleted the origin/master/309_jdbc_identifier_mapping branch May 11, 2021 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants