Neo4j connector to support cypher queries using Trino table functions #15587
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
Could you please add docs and more tests?
- Documentation in `docs`
- A test that extends `BaseJdbcConnectorTest`
- Product tests: https://github.com/trinodb/trino/tree/master/testing/trino-product-tests
@ebyhr, thanks for your comment.
If the original test sql was -
The neo4j test would be -
So any references to simple table names in the original test query would be replaced by a pass-through table query that returns the same data.
Thoughts?
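As a hypothetical illustration of the rewrite described above (the actual test queries were elided from this thread, and the table and function names here are assumed, not taken from the PR):

```sql
-- original test query against a plain table
SELECT name FROM nation;

-- Neo4j variant: the table reference becomes a pass-through table
-- function returning the same data (function name illustrative)
SELECT name
FROM TABLE(neo4j.system.query(query => 'MATCH (n:Nation) RETURN n.name AS name'));
```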
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@ebyhr can you give this a look?
import io.trino.plugin.jdbc.BaseJdbcConfig;

public class Neo4jJdbcConfig
In this case, can we use `BaseJdbcConfig` directly?
Yes, that should work
So can we remove this class?
Removed
return query.transformQuery(sql -> {
    if (sql.toLowerCase(Locale.getDefault()).startsWith("select")) {
        int firstIndex = sql.toLowerCase(Locale.getDefault()).indexOf("from (");
        int lastIndex = sql.lastIndexOf(")");
        sql = sql.substring(firstIndex + 6, lastIndex);
    }
    log.debug("Neo4j Cypher sql query - " + sql);
    return sql;
Instead, can we capture this as a `Neo4jQueryBuilder`?
That is probably a better approach
Created Neo4jQueryBuilder
private static final Logger log = Logger.get(Neo4jClient.class);
private static final int MAX_RESULT_SET_INFO_CACHE_ENTRIES = 10000;
private final Type jsonType;
private final Cache<PreparedQuery, Neo4jResultSetInfo> cachedResultSetInfo;
Why do we need this cache?
The cache helps query performance by avoiding a round trip to fetch result-set metadata from Neo4j when we have already seen the same query in the past.
Will `CachingJdbcClient` help here? Or can we add this caching layer as a follow-up PR?
It could help, depending on the queries. It's not as helpful as in a full-fledged connector, since this connector only supports PTF queries, so the underlying Cypher query cannot be parameterized. Do you see any problem with the implementation?
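For context, the idea behind the result-set metadata cache can be sketched with a plain `LinkedHashMap`-based LRU map. The PR itself uses a Guava `Cache` keyed by `PreparedQuery`; the class and method names below are illustrative, not from the PR:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResultSetInfoCache
{
    private static final int MAX_ENTRIES = 10_000;

    // Access-ordered LinkedHashMap that evicts the least recently used entry
    private final Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true)
    {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest)
        {
            return size() > MAX_ENTRIES;
        }
    };

    public String columnsFor(String cypher)
    {
        // Fetch metadata only for queries we have not seen before
        return cache.computeIfAbsent(cypher, this::fetchMetadata);
    }

    private String fetchMetadata(String cypher)
    {
        // Stand-in for a round trip to Neo4j that inspects the result shape
        return "columns-of:" + cypher;
    }
}
```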
plugin/trino-neo4j/pom.xml
<parent>
    <groupId>io.trino</groupId>
    <artifactId>trino-root</artifactId>
    <version>406-SNAPSHOT</version>
Can we update the version?
You mean sync the fork to the latest version (419-SNAPSHOT)?
Updated to latest version
plugin/trino-neo4j/pom.xml
<artifactId>neo4j-jdbc-bolt</artifactId>
<version>4.0.6</version>
<exclusions>
    <exclusion>
It would be nice if we could mention why we need this exclusion.
Removed the exclusion
plugin/trino-neo4j/pom.xml
<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-jdbc-bolt</artifactId>
    <version>4.0.6</version>
Can we extract the version as a Maven property?
Updated
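The extraction might look roughly like this in the pom (the property name is illustrative, following the `dep.*.version` convention used elsewhere in Trino):

```xml
<properties>
    <!-- illustrative property name -->
    <dep.neo4j-jdbc.version>4.0.6</dep.neo4j-jdbc.version>
</properties>

<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-jdbc-bolt</artifactId>
    <version>${dep.neo4j-jdbc.version}</version>
</dependency>
```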
plugin/trino-neo4j/pom.xml
<dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>4.4.9</version>
Should this be compatible with the bolt dependency?
plugin/trino-neo4j/pom.xml
</dependency>

<!-- for testing -->
The new line can be removed.
{
    super(jdbcClient, jdbcQueryEventListeners);
    this.jdbcClient = jdbcClient;
    this.jdbcQueryEventListeners = jdbcQueryEventListeners;
Can we add a `requireNonNull` check for all the arguments, and for `jdbcQueryEventListeners` can we use `ImmutableSet.copyOf`, so that we have an immutable copy of the listeners?
Yes, these checks are already done in the superclass.
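The pattern the reviewer asks for can be sketched as follows. The class name is hypothetical, and the JDK's `Set.copyOf` stands in here for Guava's `ImmutableSet.copyOf`:

```java
import static java.util.Objects.requireNonNull;

import java.util.Set;

public class ListenerHolder
{
    private final Set<String> listeners;

    public ListenerHolder(Set<String> listeners)
    {
        // requireNonNull gives a descriptive message on null; Set.copyOf
        // then takes an immutable defensive copy of the argument
        this.listeners = Set.copyOf(requireNonNull(listeners, "listeners is null"));
    }

    public Set<String> getListeners()
    {
        return listeners;
    }
}
```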
import java.util.Optional;
import java.util.Set;

public class Neo4jMetadata
Why do we need this `Neo4jMetadata` extension? If the goal is to restrict the pushdown operations, the same can be achieved using `Neo4jClient` itself instead of handling it here.
It looks like `Neo4jMetadata` is called first, and then `Neo4jClient` is called for the actual pushdown execution. Since this connector doesn't do any pushdown, the sooner we short-circuit the better.
Let me know if my assumption is wrong.
extends DefaultQueryBuilder
{
    /**
     * This connector only supports table functions, so wrap the user-entered table function query into a Neo4j CALL subquery with projected columns
Does this mean we don't support normal `SELECT * FROM` kinds of queries in this Neo4j connector?
Yes, this connector is only for running Cypher queries in a PTF.
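As a rough sketch of the wrapping being discussed: a user's Cypher statement passed through the PTF might be embedded in a Neo4j CALL subquery with the projected columns, along these lines (the exact generated form is not shown in this thread; labels and columns here are illustrative):

```cypher
// user-entered Cypher wrapped in a CALL subquery that projects columns
CALL {
  MATCH (n:Person) RETURN n.name AS name
}
RETURN name
```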
import static java.util.Objects.requireNonNull;

public class Neo4jLoader
        extends AbstractTestingTrinoClient<Void>
Does this `Neo4jLoader` need to extend `AbstractTestingTrinoClient`? Can we do something like `DruidQueryRunner`: run `SELECT * FROM` a TPCH table and use the materialized result to load data into Neo4j?
We could do that too. I followed the approach taken in the Elasticsearch connector. In either approach we'd still have to take the result and insert the data into Neo4j, as done here in Neo4jLoadingSession. Is anything wrong with this approach?
}

@Test
public void testCreateNode()
Can we add these tests as part of `BaseNeo4jTest`?
Created BaseNeo4jTest
@sashi-gh Are you willing to continue to work on it? We are very interested in this connector so we can take it over if you don't mind. We will address all the remaining comments and merge it. Your authorship of the code will remain.
I'm also interested in a connector for Neo4j. I've had a look at the work done by @sashi-gh and have continued working on the branch here (diff: master...ragnard:trino:neo4j-connector). So far I've mainly updated the branch. @kokosing Would the project be interested in continued work on this?
FWIW I've started working on another approach for a Neo4j connector that is not based on JDBC. In short, I think the mismatch between JDBC and Neo4j, and some implementation decisions made in the Neo4j JDBC driver, bring more problems than they're worth. My branch is here; I will submit a PR for discussion when it's in a decent shape. As of now, I've got a POC of all the read operations implemented, where node labels/properties are exposed as tables/columns, and I'm working on a
Description
Neo4j uses a custom query syntax called Cypher query language that is optimized for interacting with a graph database. This is not compatible with SQL.
Neo4j supports SQL queries on a graph db using the "Neo4j connector for BI" JDBC driver.
https://neo4j.com/bi-connector/
This driver translates SQL queries into graph-optimized Cypher queries. However, this approach has limitations and is not suitable for all use cases: the SQL queries can get verbose, do not natively support graph semantics, and are not very performant.
Neo4j also has another driver called 'neo4j-jdbc' that supports native cypher queries over JDBC.
https://github.com/neo4j-contrib/neo4j-jdbc
However, this driver couldn't be configured using Trino's generic JDBC driver.
So, I created a bespoke connector for Neo4j to be able to run Neo4j cypher queries using Trino table functions. This could be a valuable tool for customers who would like to query Neo4j as part of their data mesh strategy.
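A hypothetical invocation might look like the following; the catalog, schema, and function names are assumed for illustration and may differ from the PR:

```sql
-- run a native Cypher query through a Trino table function
SELECT *
FROM TABLE(
    neo4j.system.query(
        query => 'MATCH (p:Person)-[:KNOWS]->(f) RETURN p.name AS person, f.name AS friend'));
```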
Additional context and related issues
This connector's functionality is limited to supporting Cypher queries using table functions. It derives most of its functionality from the trino-base-jdbc module. It doesn't support any DDL or DML statements, and doesn't support any pushdown features.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text: