
GSoC: High-performance import tool for TiKV #9778

Open
5 tasks done
andylokandy opened this issue Mar 10, 2021 · 6 comments
Labels: gsoc-program, type/feature-request (categorizes issue or PR as related to a new feature)

Comments

@andylokandy
Contributor

andylokandy commented Mar 10, 2021

Description

TiKV has two import tools. One is the importer, which has only been passively maintained since 4.0 and lacks some newer optimizations. The other is tidb-lightning, which was recently merged into BR; however, tidb-lightning only supports importing SQL-like data into TiKV.

This issue aims to split tidb-lightning into two parts: (1) translating SQL data into KV pairs, and (2) importing KV pairs into TiKV. Splitting the work into these two steps lets us also import KV-like data into TiKV, e.g. when migrating from Cassandra to TiKV. The data to be imported is about 50 TB.
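The two-stage split can be pictured as two components that only meet at the KV-pair boundary: an encoder that turns source records into pairs, and an importer that only ever sees pairs. The Go sketch below illustrates that idea under stated assumptions; `Pair`, `Encoder`, `Importer`, `rawKVEncoder`, and `memImporter` are hypothetical names, not the actual tidb-lightning or BR API, and the in-memory importer stands in for the real TiKV write path.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Pair is a single key-value entry ready to be imported (hypothetical type).
type Pair struct {
	Key   []byte
	Value []byte
}

// Encoder is stage (1): it turns source records (SQL rows, CSV lines,
// Cassandra rows, ...) into KV pairs. Hypothetical interface.
type Encoder interface {
	Encode(record []string) (Pair, error)
}

// Importer is stage (2): it only sees KV pairs and knows nothing about
// where they came from. Hypothetical interface.
type Importer interface {
	Import(pairs []Pair) error
}

// rawKVEncoder treats column 0 as the key and column 1 as the value,
// which is roughly how a KV-like source could be mapped.
type rawKVEncoder struct{}

func (rawKVEncoder) Encode(record []string) (Pair, error) {
	if len(record) != 2 {
		return Pair{}, fmt.Errorf("expected 2 columns, got %d", len(record))
	}
	return Pair{Key: []byte(record[0]), Value: []byte(record[1])}, nil
}

// memImporter is a stand-in for the real import stage; it just stores pairs.
type memImporter struct{ store map[string]string }

func (m *memImporter) Import(pairs []Pair) error {
	for _, p := range pairs {
		m.store[string(p.Key)] = string(p.Value)
	}
	return nil
}

// run wires the two independent stages together.
func run(enc Encoder, imp Importer, records [][]string) error {
	pairs := make([]Pair, 0, len(records))
	for _, r := range records {
		p, err := enc.Encode(r)
		if err != nil {
			return err
		}
		pairs = append(pairs, p)
	}
	return imp.Import(pairs)
}

func main() {
	imp := &memImporter{store: map[string]string{}}
	records := [][]string{{"user:2", "bob"}, {"user:1", "alice"}}
	if err := run(rawKVEncoder{}, imp, records); err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(imp.store))
	for k := range imp.store {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(strings.Join(keys, ","))
}
```

Supporting a new source (for example a SQL encoder) would then only require another `Encoder` implementation; the import stage stays unchanged.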

Related issue: pingcap/br#764

GSoC Program information

  • Mentors of this issue: @andylokandy @kennytm
  • Recommended skills: Golang / Rust
  • Estimated workload: 3 man-months

Milestones and action items

Milestone 1: Implement the import tool with simple batch_put.

  • Read a local CSV file and sort it using Pebble.
  • Call batch_put on the sorted data in parallel.
  • Benchmark batch_put.

Milestone 2: Implement the import tool with Ingest SST.

  • Set up a Java packaging environment for the TiKV Spark ingest executor.
  • Implement the ingest API in the Java client.
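Before SST files can be ingested, the sorted KV pairs typically have to be grouped so that each file covers one key range. The Go sketch below (Go for consistency with the other examples, although the milestone itself targets the Java client) shows one way to group already-sorted pairs by split keys. It assumes the region boundaries are already known; `splitByBoundaries`, `Chunk`, and the split keys are hypothetical and not part of any TiKV client API.

```go
package main

import (
	"fmt"
	"sort"
)

// Pair is one key-value entry; keys compare as byte strings.
type Pair struct{ Key, Value string }

// Chunk holds the pairs for one hypothetical SST covering [Start, End).
// An empty Start or End means the range is unbounded on that side.
type Chunk struct {
	Start, End string
	Pairs      []Pair
}

// splitByBoundaries groups pairs into chunks delimited by sorted split keys
// (assumed region boundaries). A key equal to a split key starts the next
// chunk. Empty chunks are dropped.
func splitByBoundaries(pairs []Pair, splitKeys []string) []Chunk {
	chunks := make([]Chunk, len(splitKeys)+1)
	for i := range chunks {
		if i > 0 {
			chunks[i].Start = splitKeys[i-1]
		}
		if i < len(splitKeys) {
			chunks[i].End = splitKeys[i]
		}
	}
	for _, p := range pairs {
		// The number of split keys <= p.Key is the index of the owning chunk.
		idx := sort.Search(len(splitKeys), func(i int) bool { return splitKeys[i] > p.Key })
		chunks[idx].Pairs = append(chunks[idx].Pairs, p)
	}
	// Filter out empty chunks in place.
	out := chunks[:0]
	for _, c := range chunks {
		if len(c.Pairs) > 0 {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	pairs := []Pair{{"a", "1"}, {"b", "2"}, {"m", "3"}, {"z", "4"}}
	for _, c := range splitByBoundaries(pairs, []string{"m"}) {
		fmt.Printf("[%q, %q): %d pairs\n", c.Start, c.End, len(c.Pairs))
	}
}
```

Because the input is assumed to be sorted, the pairs within each chunk stay sorted, so each chunk could be written out as one SST file for its range.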
@andylokandy andylokandy added the type/feature-request Categorizes issue or PR as related to a new feature. label Mar 10, 2021
@andylokandy andylokandy changed the title from "Implement import tool for TiKV" to "GSoC: High-performance import tool for TiKV" Jun 7, 2021
@andylokandy
Contributor Author

/assign @Abingcbc

@ti-chi-bot
Member

@andylokandy: GitHub didn't allow me to assign the following users: Abingcbc.

Note that only tikv members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @Abingcbc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Abingcbc

Abingcbc commented Jun 8, 2021

/assign @Abingcbc

@mikechengwei
Contributor

I want to export data from one TiKV cluster to another TiKV cluster. Do you have a good solution? @andylokandy

@andylokandy
Contributor Author

andylokandy commented Jun 9, 2021

I want to export data from one TiKV cluster to another TiKV cluster. Do you have a good solution?

@mikechengwei In this case, you may want to use BR.

@Abingcbc

Related PRs:
Abingcbc/bulk-loader#1
Abingcbc/bulk-loader#2
Abingcbc/bulk-loader#3
tikv/client-java#254


4 participants