ClickHouse Kafka Performance Issue #2169

Closed
ennio1991 opened this issue Apr 4, 2018 · 8 comments
Labels
comp-kafka Kafka Engine

Comments

@ennio1991

Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/

I created a table with Kafka Engine and a materialized view that pushes data to a MergeTree table.

In the Kafka topic I am getting around 150 messages per second.

Everything works, except that the data in the table are updated with a big delay, definitely not in real time.

It seems that the data are sent from Kafka to the table only once 65536 new messages are ready to be consumed from Kafka.

Should I set some particular configuration?

I tried to change the configurations from the cli:

SET max_insert_block_size=1048
SET max_block_size=655
SET stream_flush_interval_ms=750

But there was no improvement.

Should I change any particular configuration?
Should I have changed the above configurations before creating the tables?
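
For reference, the values the server is currently applying can be checked from the CLI. A minimal sketch using the system.settings table and the setting names above:

SELECT name, value
FROM system.settings
WHERE name IN ('max_insert_block_size', 'max_block_size', 'stream_flush_interval_ms');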

@jonatasfreitasv
Contributor

jonatasfreitasv commented Apr 5, 2018

Hi @ennio1991.

I also use the Kafka Engine with a MergeTree engine on a topic with a lot of messages per second (10-20k), in real time.

But there are a lot of variables involved in getting real-time performance.

Some questions:

What hardware is the ClickHouse server running on?

What is the structure of the MergeTree and Kafka Engine tables?

How many partitions does your Kafka topic have?

@vavrusa
Contributor

vavrusa commented Apr 5, 2018

@ennio1991 in your case the problem is that the event rate is low, so you have to adjust the settings accordingly to get lower latency. The max_block_size and stream_flush_interval_ms settings should work, but you need to set them in the environment before you create/attach the tables. The CREATE TABLE ... SETTINGS x=y form might also work, I think.
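
A minimal sketch of the two approaches described above, using the values tried earlier in this thread (whether the SETTINGS clause is accepted for the Kafka engine is addressed further down):

-- Option 1: make the settings effective before the Kafka table is created/attached
SET max_block_size = 655;
SET stream_flush_interval_ms = 750;

-- Option 2 (may not be accepted by the parser for the Kafka engine, see below);
-- column list and engine arguments as in the full definitions later in this thread
CREATE TABLE default.games (...) ENGINE = Kafka(...)
SETTINGS max_block_size = 655;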

@ennio1991
Author

ennio1991 commented Apr 6, 2018

Hi @vavrusa, I didn't find any info about the SETTINGS syntax; can you give me more info?

CREATE TABLE tests.games_transactions (
day Date,
UserId UInt32,
Amount Float32,
CurrencyId UInt8,
timevalue DateTime,
ActivityType UInt8
)
ENGINE = MergeTree(day, (day, UserId), 8192)
SETTINGS max_block_size=10 
;

Something like this does not work.

@ennio1991
Author

@javisantana
I have a single m4.xlarge machine on AWS.

Here is the structure:

  CREATE TABLE games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
  ) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3');


CREATE TABLE tests.games_transactions (
day Date,
UserId UInt32,
Amount Float32,
CurrencyId UInt8,
timevalue DateTime,
ActivityType UInt8
)
ENGINE = MergeTree(day, (day, UserId), 8192);


  CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
    AS SELECT toDate(replaceRegexpOne(Date,'\\..*','')) as day, UserId, Amount, CurrencyId, toDateTime(replaceRegexpOne(Date,'\\..*','')) as timevalue, ActivityType
    FROM default.games;

I have a topic with 3 partitions.

@ennio1991
Author

@vavrusa any news?

@vavrusa
Contributor

vavrusa commented Apr 18, 2018

It's probably not supported by the SQL parser for anything other than MergeTree. You can configure it for the default user profile in users.xml like this:

<yandex>
    <profiles>
        <default>
           <max_block_size>100</max_block_size>
        </default>
    </profiles>
</yandex>

I'm not sure how the time limit is applied in the Union stream pulling data from RowInputStream, but I suspect it only checks the time limit after a full block is pulled. cc @proller
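
The stream_flush_interval_ms setting mentioned earlier can go into the same profile; a sketch of the combined default profile (values are the illustrative ones from above):

<yandex>
    <profiles>
        <default>
           <max_block_size>100</max_block_size>
           <stream_flush_interval_ms>750</stream_flush_interval_ms>
        </default>
    </profiles>
</yandex>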

@ennio1991
Author

ennio1991 commented Apr 19, 2018

Hi @vavrusa, thanks!! That way it works, but this configuration is applied to all the tables.

@vavrusa
Contributor

vavrusa commented Oct 16, 2018

Added per-table setting in #3396
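
For reference, a rough sketch of the named per-table SETTINGS syntax for the Kafka engine in later versions (the exact set of settings introduced by #3396 should be checked against that PR and the documentation for your version; kafka_max_block_size in particular may have arrived later):

CREATE TABLE default.games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'XXXX.eu-west-1.compute.amazonaws.com:9092',  -- comma-separated broker list
         kafka_topic_list = 'games',
         kafka_group_name = 'click-1',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 3,
         -- per-table block size (assumption: available only in later versions)
         kafka_max_block_size = 1000;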

@filimonov added the comp-kafka Kafka Engine label May 6, 2019