
Support pyspark #19

Closed
melin opened this issue Dec 7, 2021 · 6 comments · Fixed by #51
Assignees
Labels
help wanted Community: does anyone want to work on it?

Comments


melin commented Dec 7, 2021

Supporting PySpark would be best. Algorithm engineers are familiar with Python, and it is easy to use.

@harryprince

same need +1

@Nicole00 Nicole00 added the help wanted Community: does anyone want to work on it? label Jan 28, 2022

wey-gu commented Apr 19, 2022

@melin @harryprince
@melin @harryprince
I spent some time today and figured out that PySpark is supported out of the box; I will write more in a doc/blog later.

```shell
/spark/bin/pyspark --driver-class-path nebula-spark-connector-3.0.0.jar \
    --jars nebula-spark-connector-3.0.0.jar
```

```python
df = (spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
      .option("type", "vertex")
      .option("spaceName", "basketballplayer")
      .option("label", "player")
      .option("returnCols", "name,age")
      .option("metaAddress", "metad0:9559")
      .option("partitionNumber", 1)
      .load())
```

```
>>> df.show(n=2)
+---------+--------------+---+
|_vertexId|          name|age|
+---------+--------------+---+
|player105|   Danny Green| 31|
|player109|Tiago Splitter| 34|
+---------+--------------+---+
only showing top 2 rows
```
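To avoid repeating the long `.option(...)` chain by hand, the same read can be driven from a plain dict. This is a sketch under the assumption that an edge scan takes the analogous options with `type` set to `"edge"`; the helper name `nebula_read_options` and the `follow`/`degree` names are illustrative, taken from the basketballplayer example space.

```python
def nebula_read_options(type_, space, label, meta_address,
                        return_cols=None, partitions=1):
    """Build the option map for a NebulaDataSource read (hypothetical helper)."""
    opts = {
        "type": type_,                       # "vertex" or "edge"
        "spaceName": space,
        "label": label,
        "metaAddress": meta_address,
        "partitionNumber": str(partitions),
    }
    if return_cols:                          # omit to fetch all props by default
        opts["returnCols"] = ",".join(return_cols)
    return opts

# Assumed usage for an edge scan (not run here, needs a live cluster):
# reader = spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
# for k, v in nebula_read_options("edge", "basketballplayer", "follow",
#                                 "metad0:9559", ["degree"]).items():
#     reader = reader.option(k, v)
# df = reader.load()
```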


wey-gu commented Apr 19, 2022

@harryprince

Could the schema information be detected automatically, the way Hive does with its metastore? Specifying the schema via options does not seem ideal when the column list is long.


wey-gu commented Apr 20, 2022

> Could the schema information be detected automatically, the way Hive does with its metastore? Specifying the schema via options does not seem ideal when the column list is long.

I think you could use the nebula-python client to fetch the meta/schema more easily (it should also be possible via the Spark connector, since py4j is under the hood, though I haven't tried that yet). Also, note that `returnCols` isn't mandatory; if it's omitted, all props are fetched by default.
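One way to follow that advice is to fetch the property names once (e.g. by running `DESCRIBE TAG player` through the nebula-python client) and join them into the `returnCols` string. The helper below is a sketch; the commented usage assumes nebula-python's result rows expose the property name in their first column, which I have not verified here.

```python
def props_to_return_cols(prop_names):
    """Join property names into the comma-separated returnCols option value."""
    return ",".join(prop_names)

# Hypothetical usage with the nebula-python client (not run here):
# with pool.session_context("root", "nebula") as session:
#     session.execute("USE basketballplayer")
#     rs = session.execute("DESCRIBE TAG player")
#     names = [row.values[0].get_sVal().decode() for row in rs.rows()]
#     cols = props_to_return_cols(names)   # e.g. "name,age"
```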


wey-gu commented Aug 23, 2022

Both write and read examples are now provided in #55.
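For readers who don't want to click through, a write call can mirror the read above by packing the options into a dict. This is a hedged sketch only: the option names `vidField`, `user`, `passwd`, and `graphAddress` are assumptions from memory, not verified against the canonical examples in #55, and the `.mode()` choice depends on your use case.

```python
def nebula_write_options(space, tag, vid_field, meta_address,
                         graph_address, user="root", passwd="nebula"):
    """Assemble assumed NebulaDataSource write options (hypothetical helper)."""
    return {
        "type": "vertex",
        "spaceName": space,
        "label": tag,
        "vidField": vid_field,        # DataFrame column used as the vertex id
        "metaAddress": meta_address,
        "graphAddress": graph_address,
        "user": user,
        "passwd": passwd,
    }

# Assumed usage (not run here, needs a live cluster):
# (df.write.format("com.vesoft.nebula.connector.NebulaDataSource")
#    .options(**nebula_write_options("basketballplayer", "player", "id",
#                                    "metad0:9559", "graphd:9669"))
#    .mode("append").save())
```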
