Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add support for data write to MySQL storage table #2294

Merged
merged 8 commits into from
May 15, 2018

Conversation

sundy-li
Copy link
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

@sundy-li
Copy link
Contributor Author

Almost done, could someone help review this?

@tobad357
Copy link

@sundy-li as a potential user it would be nice to have a config option to allow ON DUPLICATE KEY

@zhang2014
Copy link
Contributor

I'm also interested in this task. Maybe we can work together to make it better?
I've listed several improvements that we might be able to do together : ) :

  • Build write to mysql using prepared_statement
  • Support INSERT INTO mysql() SELECT * FROM table
  • Add a new parameter at TableFunctionMySQL for ON DUPLICATE KEY

Look forward to your reply @sundy-li : )

@sundy-li
Copy link
Contributor Author

sundy-li commented May 6, 2018

@zhang2014 Great to hear that. Let's work it together!!!

query.execute();
}

Blocks splitBlocks(const Block & block, const size_t & max_rows) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excessive copy when block is small enough.

WriteBufferFromOwnString sqlbuf;
sqlbuf << "INSERT INTO " << remote_database_name << "." << remote_table_name << " ( " << block.dumpNames() << " ) VALUES ";

auto writer = std::make_shared<BlockOutputStreamFromRowOutputStream>(std::make_shared<ValuesRowOutputStream>(sqlbuf), sample_block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better way is to use FormatFactory (for looser coupling).

void writeBlockData(const Block & block)
{
WriteBufferFromOwnString sqlbuf;
sqlbuf << "INSERT INTO " << remote_database_name << "." << remote_table_name << " ( " << block.dumpNames() << " ) VALUES ";
Copy link
Member

@alexey-milovidov alexey-milovidov May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backQuote names

@alexey-milovidov
Copy link
Member

As I remember, the most efficient way to insert data is LOAD DATA LOCAL INFILE, but current solution seems good enough and I don't care.

@alexey-milovidov
Copy link
Member

Support INSERT INTO mysql() SELECT * FROM table

I see no reason why it shouldn't work automatically.

Add a new parameter at TableFunctionMySQL for ON DUPLICATE KEY

And REPLACE. It can be postponed for the next commit.

void write(const Block & block) override
{
auto blocks = splitBlocks(block, max_batch_rows);
mysqlxx::Transaction trans(entry);
Copy link
Member

@alexey-milovidov alexey-milovidov May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe inserting in a single transaction should be optional.
Sometimes user is going to export huge amount of data and would like to watch while it loads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But user did't know how much data it inserts, it is hard to control. So I make it in a single transaction.

@sundy-li
Copy link
Contributor Author

I see no reason why it shouldn't work automatically.

Because Table functions can only be specified in the FROM clause for select usage.

Add a new parameter at TableFunctionMySQL for ON DUPLICATE KEY

The ON DUPLICATE KEY and REPLACE INTO clauses are only used in the MySQL insert query. But from now on table functions can only be select usage. So will it be ok to add this two config option as the engine configs?

Such as Use ENGINE = mysql(host_port, database_name, table_name, user_name, password, on_duplicate_clause, replace_query ) . The user could create another table if he wants to have different on duplicate settings.

@zhang2014
Copy link
Contributor

I created a issues[#2346] to track subsequent changes.

@alexey-milovidov
Copy link
Member

Because Table functions can only be specified in the FROM clause for select usage.

We have INSERT INTO TABLE FUNCTION.

@alexey-milovidov
Copy link
Member

Could you please also add something into dbms/tests/integration?

@sundy-li
Copy link
Contributor Author

@alexey-milovidov thanks for your detail comment, I will do those improvement.

@alexey-milovidov alexey-milovidov merged commit 0eace5f into ClickHouse:master May 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants