
Neo4j connector to support cypher queries using Trino table functions #15587

Closed
wants to merge 6 commits into from

Conversation


@sashi-gh sashi-gh commented Jan 4, 2023

Description

Neo4j uses a custom query language called Cypher that is optimized for interacting with a graph database. It is not compatible with SQL.

Neo4j supports SQL queries on a graph database through the "Neo4j Connector for BI" JDBC driver:
https://neo4j.com/bi-connector/
This driver translates SQL queries into graph-optimized Cypher queries. However, this approach has limitations and is not suitable for all use cases: the SQL queries can get verbose, do not natively support graph semantics, and are not very performant.

Neo4j also has another driver, 'neo4j-jdbc', that supports native Cypher queries over JDBC.
https://github.com/neo4j-contrib/neo4j-jdbc
However, this driver couldn't be configured using Trino's generic JDBC driver.

So, I created a bespoke connector for Neo4j that can run Cypher queries through Trino table functions. This could be a valuable tool for users who want to query Neo4j as part of a data mesh strategy.
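For concreteness, a query against this connector would be composed roughly as in the sketch below, which builds the pass-through SQL around a raw Cypher string. The `neo4j.system.query` function name follows the examples later in this thread; the helper itself is illustrative, not part of the PR.

```java
// Illustrative sketch: compose the Trino SQL that passes a raw Cypher query
// through the proposed neo4j.system.query table function.
public class CypherPassThrough
{
    public static String wrapCypher(String cypher)
    {
        // Single quotes inside the Cypher text must be doubled for SQL.
        String escaped = cypher.replace("'", "''");
        return "SELECT * FROM TABLE(neo4j.system.query(query => '" + escaped + "'))";
    }

    public static void main(String[] args)
    {
        System.out.println(wrapCypher("MATCH (n:Person) RETURN n.name AS name"));
    }
}
```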

Additional context and related issues

This connector is limited to running Cypher queries through table functions. It derives most of its functionality from the trino-base-jdbc module. It does not support DDL or DML statements, and it does not support any pushdown features.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# Section
* New Neo4j connector to support cypher queries using Trino table functions  ({issue}`issuenumber`)


cla-bot bot commented Jan 4, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Member

@ebyhr ebyhr left a comment

Could you please add docs and more tests?

@sashi-gh
Author

@ebyhr, thanks for your comment.
I looked at the BaseJdbcConnectorTest and here are my thoughts -

  • The Neo4j connector I've created supports only pass-through queries in the Cypher language. It is not a full-fledged connector like others, where incoming SQL is mapped to Cypher. So maybe not all the tests in BaseJdbcConnectorTest are relevant to this Neo4j connector, even after customizing the test behavior using TestingConnectorBehavior
  • I plan to write tests like this -

If the original test sql was -

SELECT shippriority, clerk, totalprice FROM orders

The neo4j test would be -

SELECT shippriority, clerk, totalprice FROM TABLE(neo4j.system.query(query => 'MATCH (n:Orders) return n.orderkey as orderkey,n.custkey as custkey,n.orderstatus as orderstatus,n.totalprice as totalprice,n.orderdate as orderdate,n.orderpriority as orderpriority,n.clerk as clerk,n.shippriority as shippriority,n.comment as comment'))

So, any reference to a simple table name in the original test query would be replaced by a pass-through table function query that returns the same data

  • The problem with the above approach is that I'll have to rewrite all relevant tests from BaseConnectorTest in my test class, because the base queries are not directly executable with the Neo4j connector. This could pose problems as tests are added, updated, or removed in the base class
  • Instead of rewriting the tests, I was thinking a better approach would be to modify BaseConnectorTest (and BaseJdbcConnectorTest) to allow subclasses to define the table format of the query. The default behavior would be to use the table name as-is, but a subclass could overwrite the table name with a pass-through table function query. This would also let future pass-through-only connectors define their own test SQL behavior
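The proposed override hook could be sketched like this; the class and method names are hypothetical and not the actual Trino test API:

```java
// Hypothetical sketch of the proposed test hook: the base test routes every
// table reference through an overridable method. Names are illustrative only.
public class TableReferenceHook
{
    // Default behavior: use the table name as-is.
    public static String tableReference(String tableName)
    {
        return tableName;
    }

    // A pass-through-only connector test could substitute a PTF query instead.
    public static String neo4jTableReference(String tableName)
    {
        return "TABLE(neo4j.system.query(query => 'MATCH (n:" + tableName + ") RETURN n'))";
    }

    public static void main(String[] args)
    {
        // The base test's "FROM orders" becomes a pass-through table function.
        System.out.println("SELECT shippriority, clerk, totalprice FROM "
                + neo4jTableReference("Orders"));
    }
}
```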

Thoughts?

@github-actions

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Feb 22, 2023
@bitsondatadev
Member

@ebyhr can you give this a look?


import io.trino.plugin.jdbc.BaseJdbcConfig;

public class Neo4jJdbcConfig
Member

In this case, can we use BaseJdbcConfig directly?

Author

Yes, that should work

Member

So, can we remove this class?

Author

Removed

Comment on lines 467 to 474
return query.transformQuery(sql -> {
    if (sql.toLowerCase(Locale.getDefault()).startsWith("select")) {
        int firstIndex = sql.toLowerCase(Locale.getDefault()).indexOf("from (");
        int lastIndex = sql.lastIndexOf(")");
        sql = sql.substring(firstIndex + 6, lastIndex);
    }
    log.debug("Neo4j Cypher sql query - " + sql);
    return sql;
Member

Instead, can we capture this in a Neo4jQueryBuilder?

Author

That is probably a better approach

Author

Created Neo4jQueryBuilder
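For reference, the unwrapping logic from the snippet above can be restated as a standalone, testable helper. This is a sketch rather than the actual Neo4jQueryBuilder code, and it uses Locale.ROOT so case-folding does not depend on the JVM default locale:

```java
import java.util.Locale;

// Sketch of the query-unwrapping step: strip the outer
// "SELECT ... FROM ( ... )" wrapper to recover the inner Cypher text.
// Not the actual Neo4jQueryBuilder implementation.
public class CypherUnwrap
{
    public static String unwrap(String sql)
    {
        String lower = sql.toLowerCase(Locale.ROOT);
        if (lower.startsWith("select")) {
            int firstIndex = lower.indexOf("from (");
            int lastIndex = sql.lastIndexOf(')');
            // Only unwrap when both delimiters are actually present.
            if (firstIndex >= 0 && lastIndex > firstIndex) {
                return sql.substring(firstIndex + "from (".length(), lastIndex);
            }
        }
        return sql;
    }

    public static void main(String[] args)
    {
        System.out.println(unwrap("SELECT * FROM (MATCH (n) RETURN n)"));
    }
}
```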

private static final Logger log = Logger.get(Neo4jClient.class);
private static final int MAX_RESULT_SET_INFO_CACHE_ENTRIES = 10000;
private final Type jsonType;
private final Cache<PreparedQuery, Neo4jResultSetInfo> cachedResultSetInfo;
Member

Why do we need this cache?

Author

The cache helps query performance: we avoid fetching result-set metadata from Neo4j when we have already seen the same query in the past.

Member

Would CachingJdbcClient help here? Or can we add this caching layer as a follow-up PR?

Author

It could help, depending on the queries. It's not as helpful as in a full-fledged connector, because this connector only supports PTF queries, so the underlying Cypher query cannot be parameterized. Do you see any problem with the implementation?
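The caching idea discussed here can be sketched with JDK types alone. The PR itself uses a bounded Guava Cache keyed by the prepared query, so the names and the unbounded map below are simplifications:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the result-set-metadata cache, assuming identical query
// text always yields the same result shape. The real code uses a bounded
// Guava Cache; this stand-in uses an unbounded ConcurrentHashMap.
public class ResultSetInfoCache
{
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int fetches;

    // Stand-in for fetching column metadata from Neo4j (hypothetical).
    private String fetchResultSetInfo(String cypher)
    {
        fetches++;
        return "columns-for:" + cypher;
    }

    public String resultSetInfo(String cypher)
    {
        // Only the first occurrence of a query text triggers a metadata fetch.
        return cache.computeIfAbsent(cypher, this::fetchResultSetInfo);
    }

    public int fetchCount()
    {
        return fetches;
    }
}
```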

@cla-bot cla-bot bot added the cla-signed label May 18, 2023
<parent>
<groupId>io.trino</groupId>
<artifactId>trino-root</artifactId>
<version>406-SNAPSHOT</version>
Member

Can we update the version?

Author

You mean sync the fork to the latest version (419-snapshot)?

Author

Updated to latest version

<artifactId>neo4j-jdbc-bolt</artifactId>
<version>4.0.6</version>
<exclusions>
<exclusion>
Member

It would be nice to mention why we need this exclusion.

Author

Removed the exclusion

<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-jdbc-bolt</artifactId>
<version>4.0.6</version>
Member

Can we extract the version as a Maven property?

Author

Updated

<dependency>
<groupId>org.neo4j.driver</groupId>
<artifactId>neo4j-java-driver</artifactId>
<version>4.4.9</version>
Member

Should this be compatible with the bolt dependency?

</dependency>

<!-- for testing -->

Member

The empty line can be removed.

{
super(jdbcClient, jdbcQueryEventListeners);
this.jdbcClient = jdbcClient;
this.jdbcQueryEventListeners = jdbcQueryEventListeners;
Member

Can we add a requireNonNull check for all the arguments? And for jdbcQueryEventListeners, can we use ImmutableSet.copyOf so that we keep an immutable copy of the listeners?

Author

Yes, these checks are in the super class

import java.util.Optional;
import java.util.Set;

public class Neo4jMetadata
Member

Why do we need to extend Neo4jMetadata? If the goal is to restrict pushdown operations, the same can be achieved in Neo4jClient itself instead of handling it here.

Author

It looks like Neo4jMetadata is called first, and Neo4jClient is only called later for the actual pushdown execution. Since this connector doesn't do any pushdown, the sooner we short-circuit, the better.
Let me know if my assumption is wrong.

extends DefaultQueryBuilder
{
/**
* This connector only supports table functions, so wrap the user-entered table function query in a Neo4j CALL subquery with projected columns
Member

Does this mean we don't support normal SELECT * FROM kinds of queries in this Neo4j connector?

Author

Yes, this connector is only for running Cypher queries in a PTF

import static java.util.Objects.requireNonNull;

public class Neo4jLoader
extends AbstractTestingTrinoClient<Void>
Member

Does Neo4jLoader need to extend AbstractTestingTrinoClient? Can we do something like DruidQueryRunner: run SELECT * FROM a TPC-H table and use the materialized result to load data into Neo4j?

Author

We could do that too. I followed the approach taken in the Elasticsearch connector. With either approach, we'll still have to take the result and insert the data into Neo4j, as done here in Neo4jLoadingSession. Is anything wrong with this?

}

@Test
public void testCreateNode()
Member

Can we add these tests as part of a BaseNeo4jTest?

Author

Created BaseNeo4jTest

@kokosing
Member

kokosing commented Jun 7, 2023

@sashi-gh Are you willing to continue working on it? We are very interested in this connector, so we can take it over if you don't mind. We will address all the remaining comments and merge it. Your authorship of the code will remain.

@sashi-gh sashi-gh closed this Jun 7, 2023
@ragnard
Member

ragnard commented Sep 19, 2023

I'm also interested in a connector for Neo4j. I've had a look at the work done by @sashi-gh and have continued working on the branch here (diff: master...ragnard:trino:neo4j-connector)

So far I've mainly updated to 427-SNAPSHOT and done some testing/review.

@kokosing Would the project be interested in continued work on this?

@ragnard
Member

ragnard commented Sep 20, 2023

FWIW I've started working on another approach for a Neo4j connector that is not based on JDBC.

In short, I think the mismatch between JDBC and Neo4j, and some implementation decisions made in the Neo4j JDBC driver, bring more problems than they're worth.

My branch is here, will submit PR for discussion when it's in a decent shape:
master...ragnard:trino:neo4j-bolt-connector

As of now, I have a POC with all read operations implemented, where node labels and properties are exposed as tables and columns, and I'm working on a system.query PTF for executing raw Cypher.
