ClickHouse Kafka Performance Issue #2169

Closed
ennio1991 opened this issue Apr 4, 2018 · 8 comments
Labels
comp-kafka Kafka Engine

Comments

@ennio1991

Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/

I created a table with Kafka Engine and a materialized view that pushes data to a MergeTree table.

In the Kafka topic I am getting around 150 messages per second.

Everything works, except that the data in the table are updated with a big delay, definitely not in real time.

It seems that the data are sent from Kafka to the table only once 65536 new messages are ready to be consumed from Kafka.

Should I set some particular configuration?

I tried to change the configurations from the cli:

SET max_insert_block_size=1048
SET max_block_size=655
SET stream_flush_interval_ms=750

But there was no improvement.

Should I change any particular configuration?
Should I have changed the above configurations before creating the tables?
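
For reference, the values the server is currently applying can be checked from the CLI. A minimal sketch using the system.settings table and the setting names above:

SELECT name, value
FROM system.settings
WHERE name IN ('max_insert_block_size', 'max_block_size', 'stream_flush_interval_ms');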

@jonatasfreitasv
Contributor

jonatasfreitasv commented Apr 5, 2018

Hi @ennio1991.

I also use the Kafka Engine with a MergeTree engine on a topic with a lot of messages per second (10-20k), in real time.

But there are a lot of variables involved in getting real-time performance.

Some questions:

What hardware is the ClickHouse server running on?

What is the structure of the MergeTree and Kafka Engine tables?

How many partitions does your Kafka topic have?

@vavrusa
Contributor

vavrusa commented Apr 5, 2018

@ennio1991 in your case the problem is that the event rate is low, so you have to adjust the settings accordingly to get lower latency. The max_block_size and stream_flush_interval_ms settings should work, but you need to set them in the environment before you create/attach the tables. The CREATE TABLE ... SETTINGS x=y form might also work, I think.
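
A minimal sketch of the two approaches described above, using the values tried earlier in this thread (whether the SETTINGS clause is accepted for the Kafka engine is addressed further down):

-- Option 1: make the settings effective before the Kafka table is created/attached
SET max_block_size = 655;
SET stream_flush_interval_ms = 750;

-- Option 2 (may not be accepted by the parser for the Kafka engine, see below);
-- column list and engine arguments as in the full definitions later in this thread
CREATE TABLE default.games (...) ENGINE = Kafka(...)
SETTINGS max_block_size = 655;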

@ennio1991
Author

ennio1991 commented Apr 6, 2018

Hi @vavrusa, I didn't find any info about the SETTINGS syntax; can you give me more info?

CREATE TABLE tests.games_transactions (
day Date,
UserId UInt32,
Amount Float32,
CurrencyId UInt8,
timevalue DateTime,
ActivityType UInt8
)
ENGINE = MergeTree(day, (day, UserId), 8192)
SETTINGS max_block_size=10 
;

Something like this does not work.

@ennio1991
Author

@javisantana
I have a single m4.xlarge machine on AWS.

Here is the structure:

  CREATE TABLE games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
  ) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3');


CREATE TABLE tests.games_transactions (
day Date,
UserId UInt32,
Amount Float32,
CurrencyId UInt8,
timevalue DateTime,
ActivityType UInt8
)
ENGINE = MergeTree(day, (day, UserId), 8192);


  CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
    AS SELECT toDate(replaceRegexpOne(Date,'\\..*','')) as day, UserId, Amount, CurrencyId, toDateTime(replaceRegexpOne(Date,'\\..*','')) as timevalue, ActivityType
    FROM default.games;

I have a topic with 3 partitions.

@ennio1991
Author

@vavrusa any news?

@vavrusa
Contributor

vavrusa commented Apr 18, 2018

It's probably not supported by the SQL parser for anything other than MergeTree. You can configure it for the default user profile in users.xml like this:

<yandex>
    <profiles>
        <default>
           <max_block_size>100</max_block_size>
        </default>
    </profiles>
</yandex>

I'm not sure how the time limit is applied in the Union stream pulling data from RowInputStream, but I suspect it only checks the time limit after a full block is pulled. cc @proller
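
The stream_flush_interval_ms setting mentioned earlier can go into the same profile; a sketch of the combined default profile (values are the illustrative ones from above):

<yandex>
    <profiles>
        <default>
           <max_block_size>100</max_block_size>
           <stream_flush_interval_ms>750</stream_flush_interval_ms>
        </default>
    </profiles>
</yandex>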

@ennio1991
Author

ennio1991 commented Apr 19, 2018

Hi @vavrusa, thanks!! That way it works, but this configuration is applied to all the tables.

@vavrusa
Contributor

vavrusa commented Oct 16, 2018

Added per-table setting in #3396
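
For reference, a rough sketch of the named per-table SETTINGS syntax for the Kafka engine in later versions (the exact set of settings introduced by #3396 should be checked against that PR and the documentation for your version; kafka_max_block_size in particular may have arrived later):

CREATE TABLE default.games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'XXXX.eu-west-1.compute.amazonaws.com:9092',  -- comma-separated broker list
         kafka_topic_list = 'games',
         kafka_group_name = 'click-1',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 3,
         -- per-table block size (assumption: available only in later versions)
         kafka_max_block_size = 1000;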

@filimonov added the comp-kafka Kafka Engine label May 6, 2019