add desc for multi spark (#48)
Nicole00 committed Jan 5, 2022
1 parent 9af7ace commit 70bcbf0
Showing 2 changed files with 39 additions and 22 deletions.
26 changes: 16 additions & 10 deletions README-CN.md
@@ -7,24 +7,30 @@ Exchange 2.0 supports only Nebula Graph 2.x.

If you are using Nebula Graph v1.x, use [Nebula Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange), or refer to the Exchange 1.0 documentation, the [Nebula Exchange User Manual](https://docs.nebula-graph.com.cn/nebula-exchange/about-exchange/ex-ug-what-is-exchange/ "Go to the Nebula Graph website").

Exchange currently supports Spark 2.2, Spark 2.4, and Spark 3.0; the corresponding packages are nebula-exchange_spark_2.2, nebula-exchange_spark_2.4, and nebula-exchange_spark_3.0.

## How to get

1. Compile and package the latest Exchange.

```bash
$ git clone https://github.com/vesoft-inc/nebula-exchange.git
$ cd nebula-exchange/nebula-exchange
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true
$ cd nebula-exchange
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_2.2 -am -Pscala-2.11 -Pspark-2.2
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_2.4 -am -Pscala-2.11 -Pspark-2.4
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_3.0 -am -Pscala-2.12 -Pspark-3.0
```

After compiling and packaging, you can find nebula-exchange-2.5-SNAPSHOT.jar in the nebula-exchange/nebula-exchange/target/ directory.
2. Download from the Maven remote repository
After compiling and packaging, you can find nebula-exchange_spark_2.2-2.5-SNAPSHOT.jar in the nebula-exchange/nebula-exchange_spark_2.2/target/ directory,
nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar in the nebula-exchange/nebula-exchange_spark_2.4/target/ directory,
and nebula-exchange_spark_3.0-2.5-SNAPSHOT.jar in the nebula-exchange/nebula-exchange_spark_3.0/target/ directory.
2. Download from the official website or GitHub

Release version:
https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/
https://github.com/vesoft-inc/nebula-exchange/releases or https://nebula-graph.com.cn/release/?exchange

Snapshot version:
https://oss.sonatype.org/content/repositories/snapshots/com/vesoft/nebula-exchange/
Snapshot version: (open the page and click any workflow run; the snapshot JAR packages are listed under Artifacts, so download the one you need)
https://github.com/vesoft-inc/nebula-exchange/actions/workflows/deploy_snapshot.yml

## Version matching

@@ -57,7 +63,7 @@ The version correspondence between Nebula Exchange and Nebula is as follows:

*7. Import command for Exchange 2.0:*
```
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange-2.5.0.jar -c /path/to/application.conf
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar -c /path/to/application.conf
```
If the data sources include Hive, append `-h` to the end of the import command to enable the Hive data source.

@@ -68,7 +74,7 @@ $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--files application.conf \
--conf spark.driver.extraClassPath=./ \
--conf spark.executor.extraClassPath=./ \
nebula-exchange-2.5.0.jar \
nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar \
-c application.conf
```

@@ -77,7 +83,7 @@ nebula-exchange-2.5.0.jar \
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--master local \
--conf spark.sql.shuffle.partitions=200 \
nebula-exchange-2.5.0.jar \
nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar \
-c application.conf
```

35 changes: 23 additions & 12 deletions README.md
@@ -5,33 +5,43 @@ Nebula Exchange (Exchange for short) is an Apache Spark application. It is used

Exchange 2.0 only supports Nebula Graph 2.0. If you want to import data into Nebula Graph v1.x, please use [Nebula Exchange v1.0](https://github.com/vesoft-inc/nebula-java/tree/v1.0/tools/exchange).

Exchange currently supports Spark 2.2, Spark 2.4 and Spark 3.0, and the corresponding toolkits are nebula-exchange_spark_2.2, nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0.

## How to get

1. Package latest Exchange

```bash
$ git clone https://github.com/vesoft-inc/nebula-exchange.git
$ cd nebula-exchange/nebula-exchange
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true
$ cd nebula-exchange
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_2.2 -am -Pscala-2.11 -Pspark-2.2
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_2.4 -am -Pscala-2.11 -Pspark-2.4
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_3.0 -am -Pscala-2.12 -Pspark-3.0
```

After the packaging, you can see the newly generated nebula-exchange-2.5-SNAPSHOT.jar under the nebula-exchange/nebula-exchange/target/ directory.
2. Download from Maven repository
After the packaging, you can see the newly generated nebula-exchange_spark_2.2-2.5-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_2.2/target/ directory,
nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_2.4/target/ directory,
nebula-exchange_spark_3.0-2.5-SNAPSHOT.jar under the nebula-exchange/nebula-exchange_spark_3.0/target/ directory.
2. Download from GitHub artifacts

**release version:**

`https://github.com/vesoft-inc/nebula-exchange/releases`
or
`https://nebula-graph.com.cn/release/?exchange`

release version:
https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/
**snapshot version:**

snapshot version:
https://oss.sonatype.org/content/repositories/snapshots/com/vesoft/nebula-exchange/
`https://github.com/vesoft-inc/nebula-exchange/actions/workflows/deploy_snapshot.yml`
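
The build commands and JAR locations above follow one pattern per Spark version. The sketch below restates that pattern as a small Bash helper. The function names (`scala_profile_for`, `build_command_for`, `jar_path_for`) are hypothetical and not part of the repository, and the `2.5-SNAPSHOT` version string is taken from the text above and may differ on other branches.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with nebula-exchange. It only prints
# strings; it does not invoke Maven.

# Map a Spark version to the Scala profile used in the build commands above.
scala_profile_for() {
  case "$1" in
    2.2|2.4) echo "scala-2.11" ;;
    3.0)     echo "scala-2.12" ;;
    *)       echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}

# Print the mvn command for the given Spark version.
build_command_for() {
  local spark="$1" scala
  scala="$(scala_profile_for "$spark")" || return 1
  echo "mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true" \
       "-pl nebula-exchange_spark_${spark} -am -P${scala} -Pspark-${spark}"
}

# Print the expected JAR path relative to the repository root.
# Assumption: the project version is 2.5-SNAPSHOT, as in the text above.
jar_path_for() {
  local spark="$1"
  scala_profile_for "$spark" >/dev/null || return 1
  echo "nebula-exchange_spark_${spark}/target/nebula-exchange_spark_${spark}-2.5-SNAPSHOT.jar"
}
```

For example, `build_command_for 2.4` prints the second `mvn` command from the build step, and `jar_path_for 2.4` prints where its JAR is expected after packaging.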
## How to use

Import command:
```
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange-2.5.0.jar -c /path/to/application.conf
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar -c /path/to/application.conf
```
If your data source is Hive, the import command is:
```
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange-2.5.0.jar -c /path/to/application.conf -h
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar -c /path/to/application.conf -h
```
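
The two commands above differ only in the trailing `-h` flag. As a rough sketch, a wrapper could assemble the command and append `-h` only for Hive sources. The function name `exchange_submit_cmd` and its `hive` argument are hypothetical; the class name, `--master local`, `-c` and `-h` come from the commands above. The function only prints the command line so it can be inspected before running.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of nebula-exchange.
# Usage: exchange_submit_cmd <jar> <conf> [hive]
exchange_submit_cmd() {
  local jar="$1" conf="$2" source_kind="${3:-}"
  # SPARK_HOME must be set, as in the commands above.
  local cmd=("${SPARK_HOME:?SPARK_HOME is not set}/bin/spark-submit"
             --class com.vesoft.nebula.exchange.Exchange
             --master local
             "$jar" -c "$conf")
  # Append -h only when the data source is Hive.
  if [ "$source_kind" = "hive" ]; then
    cmd+=(-h)
  fi
  # Join the arguments with spaces and print the resulting command line.
  printf '%s\n' "${cmd[*]}"
}
```

Calling `exchange_submit_cmd nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar /path/to/application.conf hive` prints the Hive variant of the import command; omitting the last argument prints the plain one.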

Note: To submit Exchange in Yarn-Cluster mode, use the following command:
@@ -41,7 +51,7 @@ $SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--files application.conf \
--conf spark.driver.extraClassPath=./ \
--conf spark.executor.extraClassPath=./ \
nebula-exchange-2.5.0.jar \
nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar \
-c application.conf
```

@@ -50,7 +60,7 @@ Note: When use Exchange to generate SST files, please add spark.sql.shuffle.part
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--master local \
--conf spark.sql.shuffle.partitions=200 \
nebula-exchange-2.5.0.jar \
nebula-exchange_spark_2.4-2.5-SNAPSHOT.jar \
-c application.conf
```

@@ -77,5 +87,6 @@ The version correspondence between Nebula Exchange and Nebula is as follows:
3. Supports importing data from other Hive sources besides Hive on Spark.
4. Supports recording and retrying the INSERT statement after failures during data import.
5. Supports SST import, but does not yet support default values for properties.
6. Supports Spark 2.2, Spark 2.4 and Spark 3.0.

Refer to [application.conf](https://github.com/vesoft-inc/nebula-exchange/tree/master/nebula-exchange/src/main/resources/application.conf) as an example to edit the configuration file.
