Support custom partitioner for Nebula when generating SST files #49

Merged: 4 commits, Jan 5, 2022

@Nicole00 Nicole00 commented Jan 5, 2022

  1. Whether to use the custom partitioner is configurable.
  2. Using the custom partitioner ensures that the keys in different SST files do not overlap.
  3. When SST files generated with the custom partitioner are ingested, almost all of them land on L6 (the space was empty before the ingest).

Add a config option for each tag or edge in the config file:
repartitionWithNebula: true or false (default: false).

Nicole00 commented Jan 5, 2022

close #46

codecov-commenter commented Jan 5, 2022

Codecov Report

Merging #49 (ebec49c) into master (d31546b) will increase coverage by 4.40%.
The diff coverage is 69.44%.

@@             Coverage Diff              @@
##             master      #49      +/-   ##
============================================
+ Coverage     50.19%   54.60%   +4.40%     
- Complexity       74       76       +2     
============================================
  Files            16       17       +1     
  Lines          1291     1315      +24     
  Branches        246      249       +3     
============================================
+ Hits            648      718      +70     
+ Misses          525      472      -53     
- Partials        118      125       +7     
| Impacted Files | Coverage Δ |
|---|---|
| ...soft/exchange/common/utils/NebulaPartitioner.scala | 11.11% <11.11%> (ø) |
| ...om/vesoft/exchange/common/config/SinkConfigs.scala | 73.33% <66.66%> (-3.59%) ⬇️ |
| ...m/vesoft/exchange/common/processor/Processor.scala | 67.42% <71.42%> (+0.22%) ⬆️ |
| ...la/com/vesoft/exchange/common/config/Configs.scala | 66.15% <100.00%> (+0.45%) ⬆️ |
| .../vesoft/exchange/common/config/SchemaConfigs.scala | 71.87% <100.00%> (+0.90%) ⬆️ |
| ...vesoft/exchange/common/writer/FileBaseWriter.scala | 85.71% <100.00%> (+85.71%) ⬆️ |
| ...a/com/vesoft/exchange/common/utils/HDFSUtils.scala | 27.58% <0.00%> (+27.58%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@critical27 critical27 left a comment

Well done

@critical27 critical27 merged commit 78fb290 into vesoft-inc:master Jan 5, 2022
@Nicole00 Nicole00 added the doc affected PR: improvements or additions to documentation label Jan 5, 2022
data: Dataset[(Array[Byte], Array[Byte])],
partitionNum: Int): Dataset[(Array[Byte], Array[Byte])] = {
import spark.implicits._
data.rdd

Why not use repartition directly?

@Nicole00 Nicole00 Jan 18, 2022

> Why not use repartition directly?

DataFrame has no repartition method that accepts a custom partitioner; partitionBy with a custom Partitioner is an RDD method.
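
To illustrate the point about RDD versus DataFrame, here is a minimal sketch of routing key/value pairs through a custom `Partitioner` and converting back to a Dataset. It is not the code from this PR: `KeyLayout`, `HeaderPartitioner`, and `repartitionByHeader` are hypothetical names, and the key layout (a 4-byte little-endian header whose upper 24 bits hold a 1-based partition id) is an assumption made for the example.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import org.apache.spark.Partitioner
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical helper. Assumed key layout: the first 4 bytes are a
// little-endian int whose upper 24 bits hold a 1-based partition id.
object KeyLayout {
  def partitionIndex(key: Array[Byte], numPartitions: Int): Int = {
    val header = ByteBuffer.wrap(key, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt
    val partId = header >>> 8 // drop the low byte of the header
    // Map the 1-based id onto a 0-based Spark partition index.
    Math.floorMod(partId - 1, numPartitions)
  }
}

// Hypothetical partitioner: sends each key to the partition named in its header.
class HeaderPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int =
    KeyLayout.partitionIndex(key.asInstanceOf[Array[Byte]], numPartitions)
}

object RepartitionSketch {
  // Dataset.repartition accepts a partition count or column expressions,
  // so the custom Partitioner is applied on the underlying pair RDD.
  def repartitionByHeader(
      spark: SparkSession,
      data: Dataset[(Array[Byte], Array[Byte])],
      partitionNum: Int): Dataset[(Array[Byte], Array[Byte])] = {
    import spark.implicits._
    data.rdd
      .partitionBy(new HeaderPartitioner(partitionNum))
      .toDS()
  }
}
```

With this layout, all keys that carry the same partition id end up in the same Spark partition, which is the property needed for the key ranges of separately written files not to overlap.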


4 participants