DB::Exception: Too many parts (600). Merges are processing significantly slower than inserts #3174

Closed
Jack012a opened this issue Sep 19, 2018 · 22 comments
Labels: question, st-need-info (We need extra data to continue, waiting for response)

Comments

@Jack012a

ClickHouse client version 18.6.0.
Connected to ClickHouse server version 18.6.0 revision 54401.

Hello all,
I have been working on this issue for almost a week and the problem keeps coming back.

The following changes have been made in config.xml:

<merge_tree>
    <parts_to_delay_insert>300</parts_to_delay_insert>
    <parts_to_throw_insert>600</parts_to_throw_insert>
    <max_delay_to_insert>2</max_delay_to_insert>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</merge_tree>

The input data size: 177GB
The total number of input files: 2050
Average size of each file: 110MB
Command used to import the data:
cat *.clickhouse | clickhouse-client --query="INSERT INTO default.MyTable(....)" ...
Question 1> Based on the above command, how many insert statements are generated?
Is this a bulk insertion?

After the change, the system still reports the same error as below.
Question 2> What else should I try? Increase the limits again?

Thank you very much

2018.09.19 10:25:02.114885 [ 49 ] executeQuery: Code: 252, e.displayText() = DB::Exception: Too many parts (600). Merges are processing significantly slower than inserts., e.what() = DB::Exception (from 127.0.0.1:46902) (in query: INSERT INTO default.MyTable(timestamp, ...) FORMAT RowBinary), Stack trace:

  1. clickhouse-server(StackTrace::StackTrace()+0x16) [0x48abed6]
  2. clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0x2682dcf]
  3. clickhouse-server(DB::MergeTreeData::delayInsertOrThrowIfNeeded(Poco::Event*) const+0x3a7) [0x431ccc7]
  4. clickhouse-server(DB::MergeTreeBlockOutputStream::write(DB::Block const&)+0x2c) [0x4314cfc]
  5. clickhouse-server(DB::PushingToViewsBlockOutputStream::write(DB::Block const&)+0x31) [0x44b38a1]
  6. clickhouse-server(DB::AddingDefaultBlockOutputStream::write(DB::Block const&)+0x9a1) [0x44442c1]
  7. clickhouse-server(DB::SquashingBlockOutputStream::write(DB::Block const&)+0x363) [0x44b91a3]
  8. clickhouse-server(DB::CountingBlockOutputStream::write(DB::Block const&)+0x1d) [0x4468a3d]
  9. clickhouse-server(DB::TCPHandler::receiveData()+0x8d) [0x268b63d]
  10. clickhouse-server(DB::TCPHandler::receivePacket()+0x85) [0x268c435]
  11. clickhouse-server(DB::TCPHandler::readData(DB::Settings const&)+0x19b) [0x268c8eb]
  12. clickhouse-server(DB::TCPHandler::processInsertQuery(DB::Settings const&)+0x1d6) [0x268cc86]
  13. clickhouse-server(DB::TCPHandler::runImpl()+0x43c) [0x268d24c]
  14. clickhouse-server(DB::TCPHandler::run()+0x1c) [0x268e0ac]
  15. clickhouse-server(Poco::Net::TCPServerConnection::start()+0xf) [0x49eefaf]
  16. clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x110) [0x49ef610]
  17. clickhouse-server(Poco::PooledThread::run()+0x77) [0x4a4df27]
  18. clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x38) [0x4a4b488]
  19. clickhouse-server() [0x4d3fe1f]
  20. /lib64/libpthread.so.0(+0x7e25) [0x7f912655ae25]
  21. /lib64/libc.so.6(clone+0x6d) [0x7f9125d8234d]

2018.09.19 10:25:02.162124 [ 49 ] ServerErrorHandler: Code: 99, e.displayText() = DB::Exception: Unknown packet 44 from client, e.what() = DB::Exception, Stack trace:

  1. clickhouse-server(StackTrace::StackTrace()+0x16) [0x48abed6]
  2. clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0x2682dcf]
  3. clickhouse-server(DB::TCPHandler::receivePacket()+0x2e0) [0x268c690]
  4. clickhouse-server(DB::TCPHandler::runImpl()+0x2b0) [0x268d0c0]
  5. clickhouse-server(DB::TCPHandler::run()+0x1c) [0x268e0ac]
  6. clickhouse-server(Poco::Net::TCPServerConnection::start()+0xf) [0x49eefaf]
  7. clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x110) [0x49ef610]
  8. clickhouse-server(Poco::PooledThread::run()+0x77) [0x4a4df27]
  9. clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x38) [0x4a4b488]
  10. clickhouse-server() [0x4d3fe1f]
  11. /lib64/libpthread.so.0(+0x7e25) [0x7f912655ae25]
  12. /lib64/libc.so.6(clone+0x6d) [0x7f9125d8234d]
@filimonov
Contributor

filimonov commented Sep 20, 2018

The main requirement for inserting into ClickHouse: you should never send too many INSERT statements per second. Ideally, one insert per second / per few seconds.

So you can insert 100K rows per second, but only with one big bulk INSERT statement. When you send hundreds / thousands of insert statements per second to a *MergeTree table, you will always get some errors, and that cannot be changed by adjusting settings.

If you can't combine a lot of small inserts into one big bulk insert statement on the client side, then you should create a Buffer table in front of the *MergeTree table.
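
For illustration only (the buffer table name and all thresholds below are hypothetical, not taken from this issue), such a Buffer table could look roughly like this:

-- Buffer table with the same structure as the target MergeTree table.
-- Data is flushed to default.MyTable when any max threshold is reached
-- (here: 100 seconds, 1,000,000 rows or 100 MB), or when all min thresholds are reached.
CREATE TABLE default.MyTable_buffer AS default.MyTable
ENGINE = Buffer(default, MyTable, 16, 10, 100, 10000, 1000000, 10000000, 100000000);

-- Clients then insert into the buffer instead of the MergeTree table:
-- INSERT INTO default.MyTable_buffer (...) VALUES (...);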

@Jack012a
Author

Jack012a commented Sep 20, 2018

Hello filimonov,

I use the following command to insert the data into ClickHouse.

cat *.clickhouse | clickhouse-client --query="INSERT INTO default.MyTable(....)" ...

Since the number of files is large, I don't think many insert statements are triggered.

Question> Based on the above command, how many insert statements have been issued?
One? Or one per input file?

Thank you

@filimonov
Contributor

filimonov commented Sep 20, 2018

Sorry, I didn't follow your first post.
You have the opposite problem: you're trying to load 200 GB of data with one very huge insert.

Can't you just insert your files one by one with tiny delays?

for i in *.clickhouse; do
  [ -f "$i" ] || break;
  echo "Processing $i";
  cat "$i" | clickhouse-client --query="INSERT INTO default.MyTable(....)" ... ;
  sleep 0.1;
done

@Jack012a
Author

Jack012a commented Sep 20, 2018

Hello filimonov,

I have some trouble understanding the meaning of the following config.

<merge_tree>
    <!-- If table contains at least that many active parts, artificially slow down insert into table. -->
    <parts_to_delay_insert>300</parts_to_delay_insert>
    <!-- If more than this number of active parts, throw 'Too many parts ...' exception. -->
    <parts_to_throw_insert>600</parts_to_throw_insert>
    <max_delay_to_insert>2</max_delay_to_insert>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</merge_tree>

Question 1> Why does ClickHouse have to throw an exception when the parts_to_throw_insert limit has been hit?

Question 2> What is the real meaning of 'parts' here? Is it true that the parts are the individual files located in /var/lib/clickhouse/data/default/MyTable/...?
What if I increase parts_to_throw_insert to 100000? Does this cause any potential issues?

Question 3> Will decreasing the value of <max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool> cause a performance issue? My current workstation has 512GB of memory. Since the machine has a large memory capacity, I really want to improve the insert/merge performance if possible.

Question 4> I have increased the value of <max_delay_to_insert>60</max_delay_to_insert>.
Will this cause a performance issue?

Thank you

@filimonov
Contributor

filimonov commented Sep 21, 2018

  1. Each insert creates a folder in /var/lib/clickhouse/.../table_name/. Inside that folder there are 2 files per column - one with the data (compressed), the second with the index. Data is physically sorted by the primary key inside those files. Those folders are called 'parts'.
  2. ClickHouse merges those smaller parts into bigger parts in the background. It chooses parts to merge according to some rules. After merging two (or more) parts, one bigger part is created and the old parts are queued for removal. The settings you listed allow fine-tuning the rules for merging parts. The goal of the merging process is to leave one big part per partition (or a few big parts per partition which are not worth merging because they are too big). Please check also that comment.
  3. If you create new parts too fast (for example by doing lots of small inserts) and ClickHouse is not able to merge them fast enough (new parts arrive faster than ClickHouse can merge them), then you get the exception 'Merges are processing significantly slower than inserts'. You can try to increase the limit, but you can end up with filesystem problems caused by too many files / directories (like an inode limit). You can watch the number of active parts with the query sketched after this list.
  4. If you insert into a lot of partitions at once, the problem is multiplied by the number of partitions affected by the insert.
  5. You can try to adjust the behaviour of ClickHouse with one of the listed settings, or with max_insert_block_size / max_block_size / insert_format_max_block_size / max_client_network_bandwidth. But the better solution is just to insert data at the expected tempo. The expected tempo is: one insert per 1-2 sec, each insert containing 10K-500K rows of data.
  6. So the proper solution for "Merges are processing significantly slower than inserts" is to adjust the number of inserts per second and the number of rows in each insert. Use batch inserts to combine small inserts into one bigger insert if data comes row-by-row. Throttle huge inserts if you have too much data to insert at once. Don't change ClickHouse internals unless you really understand what they mean.
  7. If your data comes in faster than 500K rows per second, most probably you need more servers in the cluster to serve that traffic, not an adjustment of settings.
  8. The speed of background merges usually depends on storage speed, the compression settings used, and the MergeTree variant, i.e. the merge algorithm (plain merge / aggregating / summing / collapsing etc.) and the sorting key used.
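
As a rough illustration of point 3: the current number of active parts per partition can be checked in system.parts (MyTable is just the table name from this issue):

-- active = parts that currently belong to the table (not yet replaced by a merge);
-- hundreds of active parts per partition means merges are falling behind
SELECT database, table, partition, count() AS active_parts
FROM system.parts
WHERE active AND table = 'MyTable'
GROUP BY database, table, partition
ORDER BY active_parts DESC;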

@M1ha-Shvn

M1ha-Shvn commented Jan 12, 2019

Hi. And what can I do if inserts are rare and small? I want near-realtime processing, so I can't form bigger batches. I have up to 1k rows in a batch (3KB-2MB on disk), one insert every 30 seconds. And it seems ClickHouse doesn't merge the parts: it collects 300 of them on this table, but the minimal merge size hasn't been reached (even if I stop inserts entirely, the parts are not merged). So the insert process gets stuck: no more inserts are allowed due to the max parts limit, but the parts are not merged because the total size of the partition is small.
P.S. As I understand, the configuration parameters above are not per-table; I don't want to affect the whole cluster this way, only the tables I need.
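
As a side note on the P.S.: in newer ClickHouse releases, MergeTree settings such as parts_to_delay_insert and parts_to_throw_insert can also be set per table via the SETTINGS clause of CREATE TABLE, so a single table can be tuned without touching the cluster-wide config. A sketch with hypothetical table and column names (check whether your server version supports this):

CREATE TABLE default.MyEvents
(
    timestamp DateTime,
    value UInt64
)
ENGINE = MergeTree()
ORDER BY timestamp
-- per-table overrides instead of the global <merge_tree> section in config.xml
SETTINGS parts_to_delay_insert = 300, parts_to_throw_insert = 600;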

@ishirav

ishirav commented Jan 13, 2019

We also have small inserts every 10 seconds (into dozens of separate tables), so we put a Buffer table in front of every MergeTree table. Data accumulates in memory, and when the buffer gets full it is flushed to disk. This way the number of parts that get created remains small.
You can run your queries against the Buffer table, so you'll get both the old data (which resides on disk) and the new data (which is still in memory).

@simPod
Contributor

simPod commented Apr 13, 2019

Note to my future self and also to others struggling with this. In my case there were mutations that looked quite heavy / stale. Check for them as well.

@filimonov is there a way to, for example, list the parts that are waiting to be merged?

@filimonov
Contributor

filimonov commented Apr 16, 2019

@filimonov is there a way to, for example, list the parts that are waiting to be merged?

If there is more than one part per partition and their sizes are smaller than the maximum part size, that means they will be chosen for a merge someday. The smaller the part, the bigger the chance that it will be merged soon.

Check also system.mutations, system.merges and system.part_log (the last one has to be enabled in the config).
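
A quick sketch of such checks (column sets can differ slightly between ClickHouse versions):

-- merges currently running in the background
SELECT database, table, elapsed, progress, num_parts, result_part_name
FROM system.merges;

-- mutations that have not finished yet
SELECT database, table, mutation_id, command, is_done
FROM system.mutations
WHERE is_done = 0;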

@blinkov
Contributor

blinkov commented May 1, 2019

@q0987 do you have any further questions?

@blinkov added the st-need-info (We need extra data to continue, waiting for response) label on May 1, 2019
@WeiGangqiang

Hi. And what can I do if inserts are rare and small? I want near-realtime processing, so I can't form bigger batches. I have up to 1k rows in a batch (3KB-2MB on disk), one insert every 30 seconds. And it seems ClickHouse doesn't merge the parts: it collects 300 of them on this table, but the minimal merge size hasn't been reached (even if I stop inserts entirely, the parts are not merged). So the insert process gets stuck: no more inserts are allowed due to the max parts limit, but the parts are not merged because the total size of the partition is small.
P.S. As I understand, the configuration parameters above are not per-table; I don't want to affect the whole cluster this way, only the tables I need.

I have the same problem. Even after I stopped inserting for about 4 hours, no more parts were merged; most of them are small parts.

@simPod
Contributor

simPod commented May 24, 2019

@WeiGangqiang also check for mutations with is_done = 0

@WeiGangqiang

WeiGangqiang commented May 24, 2019

How do I check whether mutations are done?

@simPod
Contributor

simPod commented May 24, 2019

select * from system.mutations where is_done = 0;

@xmariachi

@filimonov If there is more than one part per partition and their sizes are smaller than the maximum part size, that means they will be chosen for a merge someday. The smaller the part, the bigger the chance that it will be merged soon.

Does that mean that if the sizes are bigger than the maximum part size, they will never be merged?
If so, and if the max value is increased after that has happened, will they eventually be merged then?

@den-crane
Contributor

Does that mean that if the sizes are bigger than the maximum part size, they will never be merged?

Yes.

If so, and if the max value is increased after that has happened, will they eventually be merged then?

Yes. But merging such overly large parts could be unreasonable, because it will take a lot of CPU and IO resources for a long time and will provide zero speed-up for future selects.
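
If, after raising the limit, you still want to force such a merge and accept that cost, a manual OPTIMIZE can be issued (a sketch; the partition expression depends on the table's partitioning key and is hypothetical here):

-- tries to merge all parts of the given partition into one, including parts
-- the background scheduler would otherwise consider too big to merge
OPTIMIZE TABLE default.MyTable PARTITION 201809 FINAL;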

@simper66

So, the same primary key could be in different parts forever, because it is in at least one part that has reached its max size?

@den-crane
Contributor

So, the same primary key could be in different parts forever, because it is in at least one part that has reached its max size?

Yes.

@simper66

So we have to avoid that at any cost, right? Because the summing, aggregating, and replacing trees and the sparse indexes would be somehow broken?

Perhaps by making the partitioning key more granular, adding shards, or something else?

@den-crane
Contributor

den-crane commented Aug 15, 2019

So we have to avoid that at any cost, right? Because the summing, aggregating, and replacing trees and the sparse indexes would be somehow broken?

Nothing is broken. The opposite.
Merges are eventual.
At any moment your queries should expect that the data in the table is not in its final state.
There is no difference between a final merge that has not happened yet and one that will never happen.
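
In practice this means queries should aggregate at read time instead of relying on the background merge having already collapsed the rows. A minimal sketch for a hypothetical SummingMergeTree table:

-- returns correct sums regardless of whether the parts have been merged yet
SELECT key, sum(value) AS value
FROM my_summing_table
GROUP BY key;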

@nzb15555196162

Hi. And what can I do if inserts are rare and small? I want near-realtime processing, so I can't form bigger batches. I have up to 1k rows in a batch (3KB-2MB on disk), one insert every 30 seconds. And it seems ClickHouse doesn't merge the parts: it collects 300 of them on this table, but the minimal merge size hasn't been reached (even if I stop inserts entirely, the parts are not merged). So the insert process gets stuck: no more inserts are allowed due to the max parts limit, but the parts are not merged because the total size of the partition is small.
P.S. As I understand, the configuration parameters above are not per-table; I don't want to affect the whole cluster this way, only the tables I need.

Hi, I've hit a similar problem that looks like yours. How did you solve it? Can you give me some suggestions?

@yasamprom

@filimonov Could you please explain how the recommended batch size was calculated?

one insert per 1-2 sec, each insert containing 10K-500K rows of data.

I tried using fewer inserts with around 20kk (20 million) rows each and it gave me a good boost.
