fix version validation (#88)
* fix the version validation info

* update version match

* fix assert info

* update Nebula to NebulaGraph
Nicole00 committed Mar 7, 2023
1 parent f904dee commit 06e8900
Showing 4 changed files with 97 additions and 52 deletions.
92 changes: 57 additions & 35 deletions README.md
@@ -1,15 +1,15 @@
# Nebula Spark Connector
# NebulaGraph Spark Connector
[中文版](https://github.com/vesoft-inc/nebula-spark-connector/blob/master/README_CN.md)

## Introduction

Nebula Spark Connector 2.0/3.0 only supports Nebula Graph 2.x/3.x. If you are using Nebula Graph v1.x, please use [Nebula Spark Connector v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/nebula-spark) .
NebulaGraph Spark Connector 2.0/3.0 only supports NebulaGraph 2.x/3.x. If you are using NebulaGraph v1.x, please use [NebulaGraph Spark Connector v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/nebula-spark).

Nebula Spark Connector support spark 2.2 and 2.4.
NebulaGraph Spark Connector supports Spark 2.2 and 2.4.

## How to Compile

1. Package Nebula Spark Connector.
1. Package NebulaGraph Spark Connector.

```bash
$ git clone https://github.com/vesoft-inc/nebula-spark-connector.git
```

@@ -24,27 +24,43 @@

After packaging, you can find the newly generated nebula-spark-connector-3.0-SNAPSHOT.jar under the nebula-spark-connector/nebula-spark-connector/target/ directory.

## New Features (Compared to Nebula Spark Connector 1.0)
## New Features (Compared to NebulaGraph Spark Connector 1.0)
* Supports more connection configurations, such as timeout, connectionRetry, and executionRetry.
* Supports more data configurations, such as whether vertexId can be written as vertex's property, whether srcId, dstId and rank can be written as edge's properties.
* The Spark Reader supports non-property, all-property, and specific-property reads.
* Spark Reader Supports reading data from Nebula Graph to Graphx as VertexRD and EdgeRDD, it also supports String type vertexId.
* Nebula Spark Connector 2.0 uniformly uses SparkSQL's DataSourceV2 for data source expansion.
* Nebula Spark Connector 2.1.0 support UPDATE write mode to NebulaGraph, see [Update Vertex](https://docs.nebula-graph.io/2.0.1/3.ngql-guide/12.vertex-statements/2.update-vertex/) .
* Nebula Spark Connector 2.5.0 support DELETE write mode to NebulaGraph, see [Delete Vertex](https://docs.nebula-graph.io/master/3.ngql-guide/12.vertex-statements/4.delete-vertex/)
* The Spark Reader supports reading data from NebulaGraph into GraphX as VertexRDD and EdgeRDD, including String-typed vertex IDs.
* NebulaGraph Spark Connector 2.0 uniformly uses Spark SQL's DataSourceV2 for data source extension.
* NebulaGraph Spark Connector 2.1.0 supports the UPDATE write mode to NebulaGraph, see [Update Vertex](https://docs.nebula-graph.io/2.0.1/3.ngql-guide/12.vertex-statements/2.update-vertex/).
* NebulaGraph Spark Connector 2.5.0 supports the DELETE write mode to NebulaGraph, see [Delete Vertex](https://docs.nebula-graph.io/master/3.ngql-guide/12.vertex-statements/4.delete-vertex/).

## How to Use

If you use Maven to manage your project, add the following dependency to your pom.xml:
If you use Maven to manage your project, add one of the following dependencies to your pom.xml:

```
<!-- connector for spark 2.4 -->
<dependency>
<groupId>com.vesoft</groupId>
<artifactId>nebula-spark-connector</artifactId>
<version>3.0-SNAPSHOT</version>
</dependency>
<!-- connector for spark 2.2 -->
<dependency>
<groupId>com.vesoft</groupId>
<artifactId>nebula-spark-connector_2.2</artifactId>
<version>3.0-SNAPSHOT</version>
</dependency>
<!-- connector for spark 3.0 -->
<dependency>
<groupId>com.vesoft</groupId>
<artifactId>nebula-spark-connector_3.0</artifactId>
<version>3.0-SNAPSHOT</version>
</dependency>
```
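
For sbt-based projects, the same coordinates can be declared as below. This is a sketch mirroring the Maven entries above; the snapshot version additionally assumes the Sonatype snapshot repository is configured in `resolvers`.

```
// Pick the artifact that matches your Spark version (same coordinates as the Maven block above).
libraryDependencies += "com.vesoft" % "nebula-spark-connector" % "3.0-SNAPSHOT"        // Spark 2.4
// libraryDependencies += "com.vesoft" % "nebula-spark-connector_2.2" % "3.0-SNAPSHOT" // Spark 2.2
// libraryDependencies += "com.vesoft" % "nebula-spark-connector_3.0" % "3.0-SNAPSHOT" // Spark 3.x
```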

Write DataFrame `INSERT` into Nebula Graph as Vertices:
Write DataFrame `INSERT` into NebulaGraph as Vertices:
```
val config = NebulaConnectionConfig
.builder()
  // ... (collapsed in diff: @@ -61,7 +77,7 @@)
.build()
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
```
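
The block above is collapsed in this diff, so the connection and write settings are not visible. Below is a minimal self-contained sketch of the same INSERT flow; the builder method names (`withMetaAddress`, `withGraphAddress`, `withSpace`, `withTag`, `withVidField`, `withBatch`) and the implicit-writer import follow the connector's example code and should be verified against the version you use, and all addresses, space, and tag names are placeholders.

```
import org.apache.spark.sql.SparkSession
import com.vesoft.nebula.connector.{NebulaConnectionConfig, WriteNebulaVertexConfig}
import com.vesoft.nebula.connector.connector.NebulaDataFrameWriter // enables df.write.nebula(...)

object WriteVertexSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("write-vertex-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder DataFrame: one column is the vertex id, the rest become tag properties.
    val df = Seq(("1", "Tom", 23), ("2", "Jina", 21)).toDF("id", "name", "age")

    val config = NebulaConnectionConfig
      .builder()
      .withMetaAddress("127.0.0.1:9559")  // metad address, host:port (placeholder)
      .withGraphAddress("127.0.0.1:9669") // graphd address, host:port (placeholder)
      .build()

    val writeVertexConfig = WriteNebulaVertexConfig
      .builder()
      .withSpace("test")   // target space (placeholder)
      .withTag("person")   // tag to write into (placeholder)
      .withVidField("id")  // DataFrame column holding the vertex id
      .withBatch(1000)     // rows per write batch
      .build()

    df.write.nebula(config, writeVertexConfig).writeVertices()
    spark.stop()
  }
}
```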
Write DataFrame `UPDATE` into Nebula Graph as Vertices:
Write DataFrame `UPDATE` into NebulaGraph as Vertices:
```
val config = NebulaConnectionConfig
.builder()
  // ... (collapsed in diff: @@ -79,7 +95,7 @@)
.build()
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
```
Write DataFrame `DELETE` into Nebula Graph as Vertices:
Write DataFrame `DELETE` into NebulaGraph as Vertices:
```
val config = NebulaConnectionConfig
.builder()
  // ... (collapsed in diff: @@ -96,7 +112,7 @@)
.build()
df.write.nebula(config, nebulaWriteVertexConfig).writeVertices()
```
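
The UPDATE and DELETE examples above differ from the INSERT flow only in the write mode set on the vertex config. A minimal sketch, assuming a `WriteMode` enumeration and a `withWriteMode` builder method as used in the connector's examples, and reusing `config` and `df` from the INSERT sketch:

```
import com.vesoft.nebula.connector.{WriteMode, WriteNebulaVertexConfig}

// Same connection config and DataFrame as in the INSERT sketch; only the write mode changes.
val updateVertexConfig = WriteNebulaVertexConfig
  .builder()
  .withSpace("test")
  .withTag("person")
  .withVidField("id")
  .withWriteMode(WriteMode.UPDATE) // or WriteMode.DELETE to delete the matching vertices
  .withBatch(1000)
  .build()

df.write.nebula(config, updateVertexConfig).writeVertices()
```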
Read vertices from Nebula Graph:
Read vertices from NebulaGraph:
```
val config = NebulaConnectionConfig
.builder()
  // ... (collapsed in diff: @@ -115,7 +131,7 @@)
val vertex = spark.read.nebula(config, nebulaReadVertexConfig).loadVerticesToDF()
```
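
The read example is also collapsed in this diff. A fuller sketch follows; the config fields match those validated in `ReadNebulaConfig` further down in this commit (`space`, `label`, `limit`, `partitionNum`, `noColumn`, `returnCols`), while the exact builder method spellings and the implicit-reader import are assumptions to verify.

```
import org.apache.spark.sql.SparkSession
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
import com.vesoft.nebula.connector.connector.NebulaDataFrameReader // enables spark.read.nebula(...)

val spark = SparkSession.builder().appName("read-vertex-sketch").getOrCreate()

val config = NebulaConnectionConfig
  .builder()
  .withMetaAddress("127.0.0.1:9559") // metad address, host:port (placeholder)
  .build()

val nebulaReadVertexConfig = ReadNebulaConfig
  .builder()
  .withSpace("test")                   // space to read from (placeholder)
  .withLabel("person")                 // tag name (placeholder)
  .withNoColumn(false)                 // false: return properties as columns
  .withReturnCols(List("name", "age")) // properties to return (placeholder column names)
  .withLimit(1000)                     // rows fetched per scan request
  .withPartitionNum(10)                // Spark partitions for the read
  .build()

val vertexDF = spark.read.nebula(config, nebulaReadVertexConfig).loadVerticesToDF()
vertexDF.show()
```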

Read vertices and edges from Nebula Graph to construct Graphx's graph:
Read vertices and edges from NebulaGraph to construct a GraphX graph:
```
val config = NebulaConnectionConfig
.builder()
  // ... (collapsed in diff: @@ -148,7 +164,7 @@)
```
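
The GraphX example above is largely collapsed, so here is a minimal sketch of the idea: load the vertices and edges as RDDs and stitch them into a GraphX `Graph`. The `loadVerticesToGraphx`/`loadEdgesToGraphx` method names follow the connector's examples and are assumptions to verify; `spark`, `config`, and `nebulaReadVertexConfig` are reused from the read sketch above.

```
import org.apache.spark.graphx.Graph

// The edge read config mirrors the vertex one; "friend" and "degree" are placeholders.
val nebulaReadEdgeConfig = ReadNebulaConfig
  .builder()
  .withSpace("test")
  .withLabel("friend")
  .withNoColumn(false)
  .withReturnCols(List("degree"))
  .withLimit(1000)
  .withPartitionNum(10)
  .build()

val vertexRDD = spark.read.nebula(config, nebulaReadVertexConfig).loadVerticesToGraphx()
val edgeRDD   = spark.read.nebula(config, nebulaReadEdgeConfig).loadEdgesToGraphx()
val graph     = Graph(vertexRDD, edgeRDD) // GraphX graph built from NebulaGraph data
println(graph.degrees.count())            // run any GraphX operator over it
```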

For more information on usage, please refer to [Example](https://github.com/vesoft-inc/nebula-spark-connector/tree/master/example/src/main/scala/com/vesoft/nebula/examples/connector).

## PySpark with Nebula Spark Connector
## PySpark with NebulaGraph Spark Connector

Below is an example of calling the nebula-spark-connector jar package from PySpark.

@@ -276,7 +292,7 @@ For more options, i.e. delete edge with vertex being deleted, refer to [nebula/c

```
val DELETE_EDGE: String = "deleteEdge"
```

### Call Nebula Spark Connector in PySpark shell and .py file
### Call NebulaGraph Spark Connector in PySpark shell and .py file

Also, below are examples of how to run the above code in the PySpark shell or in Python code files:

@@ -307,21 +323,27 @@

```
df = spark.read.format(
    # ... (options collapsed in diff)
"partitionNumber", 1).load()
```

## Version match

There are the version correspondence between Nebula Spark Connector and Nebula:

| Nebula Spark Connector Version | Nebula Version |
|:------------------------------:|:--------------:|
| 2.0.0 | 2.0.0, 2.0.1 |
| 2.0.1 | 2.0.0, 2.0.1 |
| 2.1.0 | 2.0.0, 2.0.1 |
| 2.5.0 | 2.5.0, 2.5.1 |
| 2.5.1 | 2.5.0, 2.5.1 |
| 2.6.0 | 2.6.0, 2.6.1 |
| 2.6.1 | 2.6.0, 2.6.1 |
| 3.0.0 | 3.0.x, 3.1.x |
| 3.0-SNAPSHOT | nightly |
## Compatibility matrix

The version correspondence between NebulaGraph Spark Connector, NebulaGraph, and Spark is as follows:

| NebulaGraph Spark Connector Version | NebulaGraph Version | Spark Version |
|:-----------------------------------------:|:--------------:|:-------------:|
|nebula-spark-connector-2.0.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.0.1.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.1.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.5.0.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.5.1.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.6.0.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-2.6.1.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-3.0.0.jar | 3.x | 2.4.* |
|nebula-spark-connector-3.3.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.3.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.4.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.4.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.0-SNAPSHOT.jar | nightly | 2.4.* |
|nebula-spark-connector_2.2-3.0-SNAPSHOT.jar| nightly | 2.2.* |
|nebula-spark-connector_3.0-3.0-SNAPSHOT.jar| nightly | 3.* |

## Performance
We use the LDBC dataset to test nebula-spark-connector's performance; here are the results.
@@ -332,7 +354,7 @@ We choose tag Comment and edge REPLY_OF for space sf30 and sf100 to test the con
And the application's resources are: standalone mode with three workers, 2G driver-memory,
3 num-executors, 30G executor-memory and 20 executor-cores.
The ReadNebulaConfig has a limit of 2000 and a partitionNum of 100,
the same partition number with nebula space parts.
the same partition number as the NebulaGraph space parts.


|data type|ldbc 67.12million with No Property| ldbc 220 million with No Property|ldbc 67.12million with All Property|ldbc 220 million with All Property|
@@ -360,8 +382,8 @@
The writeConfig has a batch size of 2000, and the DataFrame has 60 partitions.

## How to Contribute

Nebula Spark Connector is a completely opensource project, opensource enthusiasts are welcome to participate in the following ways:
NebulaGraph Spark Connector is a completely open source project. Open source enthusiasts are welcome to participate in the following ways:

- Go to [Nebula Graph Forum](https://discuss.nebula-graph.com.cn/ "go to“Nebula Graph Forum") to discuss with other users. You can raise your own questions, help others' problems, share your thoughts.
- Go to the [NebulaGraph Forum](https://discuss.nebula-graph.com.cn/ "go to NebulaGraph Forum") to discuss with other users. You can raise your own questions, help answer others' questions, and share your thoughts.
- Write or improve documents.
- Submit code to add new features or fix bugs.
50 changes: 36 additions & 14 deletions README_CN.md
@@ -29,13 +29,28 @@ Nebula Spark Connector supports Spark 2.2 and 2.4.
* Nebula Spark Connector 2.5.0 adds the DELETE write mode; see [Delete Vertex](https://docs.nebula-graph.com.cn/2.5.1/3.ngql-guide/12.vertex-statements/4.delete-vertex/)

## How to Use
If you use Maven to manage your project, add the dependency to your pom.xml:
If you use Maven to manage your project, add one of the following dependencies to your pom.xml:
```
<!-- connector for spark 2.4 -->
<dependency>
<groupId>com.vesoft</groupId>
<artifactId>nebula-spark-connector</artifactId>
<version>3.0-SNAPSHOT</version>
</dependency>
<!-- connector for spark 2.2 -->
<dependency>
<groupId>com.vesoft</groupId>
<artifactId>nebula-spark-connector_2.2</artifactId>
<version>3.0-SNAPSHOT</version>
</dependency>
<!-- connector for spark 3.0 -->
<dependency>
<groupId>com.vesoft</groupId>
<artifactId>nebula-spark-connector_3.0</artifactId>
<version>3.0-SNAPSHOT</version>
</dependency>
```

Write a DataFrame into Nebula Graph as vertices with `INSERT`:
@@ -305,19 +320,26 @@

```
df = spark.read.format(
    # ... (options collapsed in diff)
```

## Version match
The version correspondence between Nebula Spark Connector and Nebula is as follows:

| Nebula Spark Connector Version | Nebula Version |
|:------------------------------:|:--------------:|
| 2.0.0 | 2.0.0, 2.0.1 |
| 2.0.1 | 2.0.0, 2.0.1 |
| 2.1.0 | 2.0.0, 2.0.1 |
| 2.5.0 | 2.5.0, 2.5.1 |
| 2.5.1 | 2.5.0, 2.5.1 |
| 2.6.0 | 2.6.0, 2.6.1 |
| 2.6.1 | 2.6.0, 2.6.1 |
| 3.0.0 | 3.0.0 |
| 3.0-SNAPSHOT | nightly |
The version correspondence between Nebula Spark Connector, Nebula, and Spark is as follows:

| Nebula Spark Connector Version | Nebula Version | Spark Version |
|:-----------------------------------------:|:--------------:|:-------------:|
|nebula-spark-connector-2.0.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.0.1.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.1.0.jar | 2.0.0, 2.0.1 | 2.4.* |
|nebula-spark-connector-2.5.0.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.5.1.jar | 2.5.0, 2.5.1 | 2.4.* |
|nebula-spark-connector-2.6.0.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-2.6.1.jar | 2.6.0, 2.6.1 | 2.4.* |
|nebula-spark-connector-3.0.0.jar | 3.x | 2.4.* |
|nebula-spark-connector-3.3.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.3.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.4.0.jar | 3.x | 2.4.* |
|nebula-spark-connector_2.2-3.4.0.jar | 3.x | 2.2.* |
|nebula-spark-connector-3.0-SNAPSHOT.jar | nightly | 2.4.* |
|nebula-spark-connector_2.2-3.0-SNAPSHOT.jar| nightly | 2.2.* |
|nebula-spark-connector_3.0-3.0-SNAPSHOT.jar| nightly | 3.* |


## Performance
We use the LDBC dataset to test Nebula-Spark-Connector's performance; the results are as follows:
@@ -784,7 +784,8 @@ object ReadNebulaConfig {
assert(space != null && !space.isEmpty, s"config space is empty.")
assert(label != null && !label.isEmpty, s"config label is empty.")
assert(limit > 0, s"config limit must be positive, your limit is $limit")
assert(partitionNum > 0, s"config partitionNum must be positive, your partitionNum is $limit")
assert(partitionNum > 0,
s"config partitionNum must be positive, your partitionNum is $partitionNum")
if (noColumn && returnCols.nonEmpty) {
LOG.warn(
s"noColumn is true, returnCols will be invalidate "
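
The assertion fix above only changes which value the error message interpolates. A standalone illustration of the corrected behavior (plain Scala, not the connector code itself):

```
val limit = 1000
val partitionNum = 0
// Before the fix the message interpolated $limit, so a failure would read
// "... your partitionNum is 1000"; after the fix it reports the offending value:
assert(partitionNum > 0,
  s"config partitionNum must be positive, your partitionNum is $partitionNum")
// throws an AssertionError whose message ends with:
//   config partitionNum must be positive, your partitionNum is 0
```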
@@ -12,8 +12,8 @@ object SparkValidate {
val sparkVersion = SparkSession.getActiveSession.map(_.version).getOrElse("UNKNOWN")
if (sparkVersion != "UNKNOWN" && !supportedVersions.exists(sparkVersion.matches)) {
throw new RuntimeException(
s"""Your current spark version ${sparkVersion} is not supported by the current NebulaGraph Exchange.
| please visit https://github.com/vesoft-inc/nebula-exchange#version-match to know which Exchange you need.
s"""Your current spark version ${sparkVersion} is not supported by the current NebulaGraph Spark Connector.
| please visit https://github.com/vesoft-inc/nebula-spark-connector#version-match to know which Connector you need.
| """.stripMargin)
}
}
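
The check above is a plain regex match of the runtime Spark version against the supported patterns. A standalone illustration (ordinary Scala, independent of the connector):

```
// Each supported pattern (e.g. "2.4.*") is used as a regex against the full Spark version string.
val supportedVersions = Seq("2.4.*")
val sparkVersion      = "2.4.8" // what SparkSession.getActiveSession.map(_.version) would return
val supported = supportedVersions.exists(sparkVersion.matches)
assert(supported) // "2.4.*" matches "2.4.8"; a version like "3.0.1" would not match and would fail
```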
