Spark 3.0 support #50

Closed · wey-gu opened this issue Jun 16, 2022 · 7 comments
Labels: help wanted (Community: does anyone want to work on it?), type/feature req (Type: feature request)

wey-gu (Contributor) commented Jun 16, 2022

Similar to vesoft-inc/nebula-exchange#41.

Update (April 2023): the Spark Connector now supports Spark 3.0.

@wey-gu added the help wanted label on Jun 16, 2022
Nicole00 (Contributor) commented Sep 6, 2022

It depends on the Spark Connector supporting Spark 3.0; that is not implemented yet, but it is on the schedule.

@Sophie-Xie added the type/feature req label on Nov 30, 2022
xiajingchun commented

@Nicole00 I noticed there is already a pull request for the connector to support Spark 3.0. Once that one is merged, is any further work needed here to run the algorithms on Spark 3?

porscheme commented

Any update on this?

We cannot use nebula-algorithm, since our Spark-Operator framework runs Spark 3.0.

Nicole00 (Contributor) commented

> Any update on this?
>
> We cannot use nebula-algorithm, since our Spark-Operator framework runs Spark 3.0.

For now you can work around it: pull the branch, run mvn install for nebula-spark-connector_3.0, and update the Spark Connector version referenced in nebula-algorithm.
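
A minimal sketch of the resulting pom.xml change in nebula-algorithm, assuming the locally installed connector keeps the com.vesoft groupId and a snapshot version (both coordinates are assumptions; check them against what mvn install actually put into your local repository):

    <!-- Hedged sketch: depend on the locally installed Spark 3.0 connector.
         The version below is a placeholder; use the version your local
         mvn install produced. -->
    <dependency>
        <groupId>com.vesoft</groupId>
        <artifactId>nebula-spark-connector_3.0</artifactId>
        <version>3.0-SNAPSHOT</version>
    </dependency>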

meet-alfie commented Nov 3, 2023

> Any update on this?
>
> We cannot use nebula-algorithm, since our Spark-Operator framework runs Spark 3.0.

I ran into the same problem. The Spark version used by our platform is 3.x (3.2.2).
My algorithm does not read Nebula data directly; it consumes the results of business queries against Nebula, which can be thought of as data containing only source vertices, destination vertices, and weights.
I extracted only the nebula client and nebula-spark-connector sources that the algorithm uses into my own project and depend on them directly, and algorithms such as PageRank run normally.

My main modifications are as follows:

  1. Source files
├── base
│   └── client
│       ├── meta_data
│       │   ├── FieldMetaData.java
│       │   └── FieldValueMetaData.java
│       ├── protocol
│       │   ├── ShortStack.java
│       │   ├── TCompactProtocol.java
│       │   ├── TException.java
│       │   ├── TField.java
│       │   ├── TList.java
│       │   ├── TMap.java
│       │   ├── TMessage.java
│       │   ├── TProtocol.java
│       │   ├── TProtocolException.java
│       │   ├── TProtocolFactory.java
│       │   ├── TSet.java
│       │   ├── TStruct.java
│       │   └── TTransportException.java
│       ├── schema
│       │   ├── IScheme.java
│       │   ├── SchemeFactory.java
│       │   └── StandardScheme.java
│       ├── thrift
│       │   └── TBase.java
│       └── transport
│           ├── TException.java
│           ├── TTransport.java
│           └── TTransportException.java
├── config
│   ├── AlgoConfig.scala
│   └── SparkConfigEntry.scala
├── examples
│   └── PageRankExample.scala
├── lib
│   └── PageRankAlgo.scala
├── reader
│   └── ReadData.scala
└── utils
    ├── DecodeUtil.scala
    └── NebulaUtil.scala

13 directories, 29 files
  2. pom.xml
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <scala.version>2.12</scala.version>
        <spark.version>3.2.2</spark.version>
        <lombok.version>1.18.28</lombok.version>
        <config.version>1.4.0</config.version>
        <scopt.version>3.7.1</scopt.version>
    </properties>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-graphx_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>com.typesafe</groupId>
        <artifactId>config</artifactId>
        <version>${config.version}</version>
    </dependency>
    <dependency>
        <groupId>com.github.scopt</groupId>
        <artifactId>scopt_${scala.version}</artifactId>
        <version>${scopt.version}</version>
    </dependency>

Just follow this approach: add the algorithm source code you need and update the dependencies for your own project. Hope it helps.
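
For concreteness, a minimal usage sketch in the spirit of the PageRankExample.scala listed above, assuming the extracted code keeps nebula-algorithm's PageRankAlgo.apply(spark, dataset, config, hasWeight) shape and its PageRankConfig(maxIter, resetProb). The import paths, column names, and parameter values are assumptions; adjust them against your own copy of the sources:

    // Hypothetical example; the import paths below match nebula-algorithm's
    // original packages and may differ in your relocated project.
    import com.vesoft.nebula.algorithm.config.PageRankConfig
    import com.vesoft.nebula.algorithm.lib.PageRankAlgo
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object PageRankExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PageRankExample")
          .master("local[*]") // for local testing; drop when submitting to a cluster
          .getOrCreate()

        // Edge list produced by business queries against Nebula:
        // source vertex, destination vertex, weight.
        val edges: DataFrame = spark
          .createDataFrame(Seq((1L, 2L, 1.0), (2L, 3L, 2.0), (3L, 1L, 1.0)))
          .toDF("src", "dst", "weight")

        // Run PageRank through the extracted lib/PageRankAlgo.scala.
        // maxIter = 10 iterations, resetProb = 0.15 reset probability;
        // verify the config signature against your copy of the sources.
        val ranks = PageRankAlgo(spark, edges, PageRankConfig(10, 0.15), hasWeight = true)
        ranks.show()

        spark.stop()
      }
    }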

xin-hao-awx commented

Can we take this as a higher priority?
