
Add generic JDBC data source connector #3105

Closed
wants to merge 16 commits into from


tooptoop4
Contributor

@tooptoop4 tooptoop4 commented Mar 15, 2020

Fixes #2910

Tested successfully on SQLite, Sybase ASE, Oracle, Impala, Presto, Dremio, Spark SQL Thrift server, HiveServer2, PostgreSQL, MySQL, Db2, SQL Server, CockroachDB, Derby, H2, HSQLDB (i.e. HyperSQL), and Firebird.

Possible future enhancements:

  1. aggregate pushdown, as in 193872a#diff-e4b92558819e35928d8d20100000291dda1e15cf907c1d2f06ba4015b0afd6cc
  2. the ability to override driver-specific settings, e.g. the Oracle fetch size, because the default of 10 rows is slow
  3. a JSON mapping from source database data types to Presto data types (check whether "Add an optional JDBC connector type mapping to varchar" #186 covers this)

@cla-bot cla-bot bot added the cla-signed label Mar 15, 2020
@dprophet
Contributor

This is critical for my use cases. Thanks to the community. Amazing.

Member

@ebyhr ebyhr left a comment

Please fix the commit logs and add tests.

pom.xml (resolved)
return driverClass;
}

@Config("driver-class")
Member

This property is unrelated to the base JDBC connector. Please move it to the generic module.

Suggested change
@Config("driver-class")
@Config("generic-jdbc.driver-class")

presto-server/src/main/provisio/presto.xml (outdated, resolved)
throw new PrestoException(DRIVER_NOT_FOUND, config.getDriverClass() + " not found");
}
try {
try {
Member

One of the try blocks is redundant.
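
A minimal sketch of the driver-loading step written with a single try block, assuming the goal is to load the configured driver class by name and fail with a clear message. It throws a plain `IllegalStateException` in place of Presto's `PrestoException`/`DRIVER_NOT_FOUND` so that it compiles on its own; `DriverLoaderSketch` is a hypothetical name, not code from this PR.

```java
import java.sql.Driver;

// Hypothetical helper: loads a JDBC driver by class name using one try block.
final class DriverLoaderSketch {
    private DriverLoaderSketch() {}

    static Driver loadDriver(String driverClass) {
        try {
            // Resolve the class, then instantiate it through its no-arg constructor.
            Class<?> clazz = Class.forName(driverClass);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return (Driver) instance;
        }
        catch (ClassNotFoundException e) {
            // The driver JAR is not on the plugin's class path.
            throw new IllegalStateException(driverClass + " not found", e);
        }
        catch (ReflectiveOperationException | ClassCastException e) {
            // The class exists but cannot be instantiated or is not a java.sql.Driver.
            throw new IllegalStateException("Cannot create driver " + driverClass, e);
        }
    }
}
```

Catching `ClassNotFoundException` separately keeps the "not found" message from the original snippet, while every other reflective failure is reported under one handler.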

presto-genericjdbc/pom.xml (three outdated, resolved review threads)
@RugratsJ

@tooptoop4, when I tried to build it, I ran into the following error:

[ERROR] Failed to execute goal io.airlift.maven.plugins:sphinx-maven-plugin:2.1:generate (default) on project presto-docs: Failed to run the report: Sphinx report generation failed -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal io.airlift.maven.plugins:sphinx-maven-plugin:2.1:generate (default) on project presto-docs: Failed to run the report
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
....

Is there a problem with the genericjdbc .rst file, or with the files the genericjdbc connector adds to the presto-docs folder?

@RugratsJ

Found the problem: it's caused by the overline and underline in the .rst file.

@RugratsJ

I tested this generic JDBC connector in version 331 against Oracle. For basic operations it worked without any issues, but the performance is extremely slow. For a table of 150 million rows, the following count query returned in 244 minutes (10.2K rows/s).

presto> select o_custkey, count(*) from jdbcoracle.tpch100g.orders where to_char(o_orderdate, 'mm/yyyy') = '03/1997' group by o_custkey having count(*) > 4;
o_custkey | _col1
-----------+-------
1809103 | 5
12675628 | 5
12038404 | 5
13299838 | 5
10843207 | 5
(5 rows)

Query 20200325_052117_00005_7gt3u, FINISHED, 5 nodes
Splits: 177 total, 177 done (100.00%)
244:13 [150M rows, 0B] [10.2K rows/s, 0B/s]

However, the Oracle connector, which is also based on base JDBC, returned the same query in 6 minutes 33 seconds, at 382K rows/s.

presto> select o_custkey, count(*) from rdsoracle.tpch100g.orders where to_char(o_orderdate, 'mm/yyyy') = '03/1997' group by o_custkey having count(*) > 4;
o_custkey | _col1
-----------+-------
13299838 | 5
12675628 | 5
10843207 | 5
1809103 | 5
12038404 | 5
(5 rows)

Query 20200325_051232_00003_7gt3u, FINISHED, 5 nodes
Splits: 177 total, 177 done (100.00%)
6:33 [150M rows, 0B] [382K rows/s, 0B/s]

Could you tell me what the difference between these two is? As I understand it, both use the same ojdbc8.jar from Oracle, both connectors extend the base JDBC library, and neither has custom getSplits handling.

@tooptoop4
Contributor Author

tooptoop4 commented Mar 25, 2020

@RugratsJ this relates to point 2 of the "future enhancements" in the description ("ability to overwrite custom configs, e.g. Oracle fetch size, because the default of 10 is slow"). From https://docs.oracle.com/cd/E18283_01/java.112/e16548/resltset.htm: "By default, when Oracle JDBC runs a query, it retrieves a result set of 10 rows at a time from the database cursor. This is the default Oracle row fetch size value. You can change the number of rows retrieved with each trip to the database cursor by changing the row fetch size value." The Oracle connector PR explicitly sets the fetch size to 1000. Note that for both connectors, even if you run count/group by, all 150M rows are retrieved into Presto and the aggregation is done within Presto.
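
To make the fetch-size point concrete: with a fetch size of N, draining a result set takes roughly ceil(rows / N) trips to the database cursor. The sketch below computes that for the 150M-row table above and shows the standard JDBC `Statement.setFetchSize` call a connector could make before executing a query. The class name is hypothetical, and 1000 is simply the value the comment attributes to the Oracle connector PR.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch of how fetch size drives the number of cursor round trips.
final class FetchSizeSketch {
    private FetchSizeSketch() {}

    // Approximate number of round trips needed to fetch `rows` rows,
    // `fetchSize` rows at a time (ceiling division).
    static long roundTrips(long rows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rows + fetchSize - 1) / fetchSize;
    }

    // Sets the fetch size before execution. Per the JDBC spec this is only a hint,
    // so a driver may ignore it.
    static void applyFetchSize(PreparedStatement statement, int fetchSize) throws SQLException {
        statement.setFetchSize(fetchSize);
    }

    public static void main(String[] args) {
        long rows = 150_000_000L;
        System.out.println("fetch size 10:   " + roundTrips(rows, 10) + " round trips");
        System.out.println("fetch size 1000: " + roundTrips(rows, 1000) + " round trips");
    }
}
```

Running `main` prints 15000000 and 150000 round trips respectively. That is 100x fewer trips; the real speedup also depends on network latency and row width, so it will not necessarily match the timings reported above.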

@RugratsJ

@tooptoop4, thank you for the quick response. I have the following questions:

  1. How do I change the Oracle fetch size: in the catalog properties file or in the Presto session? If I want to set the fetch size to 10000 rows, can you give an example?
  2. Is there any way to let Oracle run the query and return just the 5 result rows to Presto?

@tooptoop4
Contributor Author

tooptoop4 commented Mar 25, 2020

  1. Not supported now; in the future it could perhaps be passed in the same way the driver class is.
  2. That is "Allow connectors to participate in query optimization" #18 (comment), which is also not supported yet.

@RugratsJ

@tooptoop4, can you give me an example of defining a session property in a JDBC connector, so that a value such as the fetch size can be changed per session?

@tooptoop4
Contributor Author

No example; the connector in its current form does not handle it. It's a future idea.

@RugratsJ

I added a configurable fetch size and a matching session property: a default fetch size is set in the catalog configuration file, and I can override it per session. It improved the speed.
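
RugratsJ's change is not shown in this thread, so the following is only a sketch of the pattern described: a default fetch size read from the catalog properties file, which a session value can override. The property name `fetch-size`, the fallback of 1000, and the class name are all hypothetical; in Presto the session side would be a connector session property rather than a plain `Optional`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch: catalog-level default fetch size with a per-session override.
final class FetchSizeConfigSketch {
    // Hypothetical catalog property name and built-in fallback value.
    static final String FETCH_SIZE_PROPERTY = "fetch-size";
    static final int FALLBACK_FETCH_SIZE = 1000;

    private FetchSizeConfigSketch() {}

    // Parses text in the format of a catalog .properties file.
    static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Default from the catalog file, or the fallback when the property is absent.
    static int catalogDefault(Properties catalog) {
        String value = catalog.getProperty(FETCH_SIZE_PROPERTY);
        return value == null ? FALLBACK_FETCH_SIZE : requirePositive(Integer.parseInt(value.trim()));
    }

    // A session value, when set, overrides the catalog default.
    static int effectiveFetchSize(Optional<Integer> sessionValue, Properties catalog) {
        if (sessionValue.isPresent()) {
            return requirePositive(sessionValue.get());
        }
        return catalogDefault(catalog);
    }

    private static int requirePositive(int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetch size must be positive: " + fetchSize);
        }
        return fetchSize;
    }
}
```

With `fetch-size=10000` in the catalog file (again, a hypothetical key), queries would use 10000 rows per fetch unless the session supplies a different value.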

@eskabetxe
Member

@RugratsJ have you tried the Oracle connector (#1959)?

@RugratsJ

@eskabetxe, it's the same, since the generic JDBC connector uses the Oracle JDBC library too, so the underlying driver in both is ojdbc8.jar. Speed-wise, there is no difference in how Presto executes the query.

@RugratsJ

I implemented the isLimitGuaranteed function based on a configuration value in the catalog properties file that selects LIMIT or TOP syntax, so the limit is now applied correctly in this generic JDBC connector; otherwise, the function always returns false. Please add this function.
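
A sketch of the idea described above, assuming a hypothetical catalog setting that declares whether the source database uses `LIMIT n` or `TOP n`. It is not Presto's `JdbcClient` API and not RugratsJ's code; it only shows how one setting can drive both the SQL rewrite and the "is the limit guaranteed" answer. The `TOP` rewrite is deliberately naive and assumes the query begins with `SELECT `.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical setting: how the source database expresses a row limit.
enum LimitSyntax {
    LIMIT, TOP, NONE;

    static LimitSyntax fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ENGLISH));
    }
}

final class LimitSketch {
    private LimitSketch() {}

    // Returns a function that adds a row limit to a SQL string, if the syntax is known.
    static Optional<BiFunction<String, Long, String>> limitFunction(LimitSyntax syntax) {
        switch (syntax) {
            case LIMIT:
                return Optional.of((sql, limit) -> sql + " LIMIT " + limit);
            case TOP:
                // Naive: only handles statements that start with "SELECT ".
                return Optional.of((sql, limit) -> {
                    if (!sql.regionMatches(true, 0, "SELECT ", 0, 7)) {
                        throw new IllegalArgumentException("expected a SELECT statement: " + sql);
                    }
                    return "SELECT TOP " + limit + " " + sql.substring(7);
                });
            default:
                return Optional.empty();
        }
    }

    // The limit is only guaranteed when the database actually applies it.
    static boolean isLimitGuaranteed(LimitSyntax syntax) {
        return syntax != LimitSyntax.NONE;
    }
}
```

When the setting is `NONE`, no rewrite is offered and the limit is reported as not guaranteed, which matches the "always returns false" behavior mentioned in the comment.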

@RugratsJ

RugratsJ commented May 4, 2020

@findepi, thanks. I will file a new issue for Netezza. This generic JDBC connector is a valuable connector that we could use for many databases. Because it's generic, there will be many things to take care of before it's complete and optimized.

@tooptoop4 tooptoop4 removed their assignment May 5, 2020
@dprophet
Contributor

dprophet commented Jun 3, 2020

@tooptoop4 Sorry, I'm only getting to test this today. I built the genericjdbc plugin and packaged the presto-server-rpm/target/classes/presto-server-332-SNAPSHOT/plugin/genericjdbc directory as a .tar.gz. I assume I can drop this into a vanilla Presto installation; please correct me if I'm wrong.

Do you have an example of a catalog file for this plugin?

@tooptoop4
Contributor Author

@dprophet there are examples in genericjdbc.rst

@dprophet
Contributor

dprophet commented Jun 3, 2020

@tooptoop4 Any possibility of adding

<dependency>
     <groupId>com.google.protobuf</groupId>
     <artifactId>protobuf-java</artifactId>
</dependency>

To your pom?

Our JDBC driver requires it. I hate monkey patching installs.

@tooptoop4
Contributor Author

Since this is not merged, you are patching anyway :)

@electrum
Member

electrum commented Jun 4, 2020

For the generic connector, you’ll need to provide the JDBC JAR and any required dependencies.

@tooptoop4 tooptoop4 changed the title Add generic jdbc data source connector Add generic JDBC data source connector Jun 18, 2020
@tooptoop4
Contributor Author

I've put 4 updated .java files with fixes for 348-SNAPSHOT in the root of https://github.com/tooptoop4/presto-1/tree/lightgenjdbc

It has been successfully tested against:
SQLite, Sybase, Oracle, Impala, Presto, Dremio, Spark SQL Thrift server, HiveServer2, PostgreSQL, MySQL, Db2, SQL Server, CockroachDB, Derby, H2, HSQLDB (i.e. HyperSQL), and Firebird.

Some recommended settings for the catalog .properties file:
case-insensitive-name-matching=true
unsupported-type-handling=CONVERT_TO_VARCHAR
jdbc-types-mapped-to-varchar=timestamp
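
The three settings above can be sanity-checked with a short sketch that reads them the way a catalog `.properties` file is parsed and splits `jdbc-types-mapped-to-varchar`, which the base JDBC connector treats as a comma-separated list of type names. The class and helper names are illustrative; only the property keys and values come from the comment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: parse the recommended catalog settings.
final class RecommendedSettingsSketch {
    // The settings recommended in the comment, as lines of a catalog file.
    static final String CATALOG_TEXT = String.join("\n",
            "case-insensitive-name-matching=true",
            "unsupported-type-handling=CONVERT_TO_VARCHAR",
            "jdbc-types-mapped-to-varchar=timestamp");

    private RecommendedSettingsSketch() {}

    static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Splits the comma-separated value into trimmed, non-empty type names.
    static Set<String> typesMappedToVarchar(Properties properties) {
        String value = properties.getProperty("jdbc-types-mapped-to-varchar", "");
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

Listing more types (for example `timestamp,date`) would map each of them to an unbounded VARCHAR.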

@amitds1997
Contributor

@tooptoop4 Are you planning to work on this in the near future?

@tooptoop4
Contributor Author

@amitds1997 It works, but I don't plan to add tests or address the review comments.

@amitds1997
Contributor

Hmm... so no plans to merge this into master 😄? @tooptoop4
Let me know, since this would be useful to a larger audience if it were available out of the box. If not, I would like to work on this next month and try to get it merged, unless someone more knowledgeable wants to take it on.

@tooptoop4
Contributor Author

@amitds1997 No plans from me; you can take it.

@WilliamDidier

Should we expect this connector to be released at some point, or not anymore? @tooptoop4 @amitds1997
I'd guess it would be of great interest to many people (including me).

@amitds1997
Contributor

I was not able to work on this, so a maintainer can probably better answer whether this is going to be released anytime soon.

@MichaelTiemannOSC

> Tested successfully on sqlite, sybase ase, oracle, impala, presto, dremio, sparksqlthrift, hiveserver2, postgres, mysql, db2, sqlserver, cockroachdb, derby, h2, hsqldb (ie hypersql), firebird

You had me at sqlite...

@abhishekkrbaliase

@tooptoop4 Can you please let me know if you are still working on this? Having a generic JDBC connector would be very helpful.

@tooptoop4
Contributor Author

@abhishekkrbaliase I'm not planning more work on this

@colebow
Member

colebow commented Oct 19, 2022

Closing out this PR as it's been established that it is no longer being worked on. If you'd like to continue work on this at any point in the future, feel free to re-open.

Successfully merging this pull request may close these issues.

Add generic jdbc datasource connector