Kafka Hardlimit #102

Open: wants to merge 331 commits into base: twitter-master

xiaoyao1991

No description provided.

Bill Graham and others added 30 commits February 11, 2016 15:40
…scovery

Add a few logs in DiscoveryNodeManager to debug 'No worker nodes' error.
…_queries

Add query logging of all presto queries
…sitive_parquet_column_match

Look up parquet columns by name case-insensitive
…are_parquet_column_match

Handle hive keywords when doing a name-based parquet field lookup
[maven-release-plugin]  copy for tag 0.141

# Conflicts:
#	pom.xml
@xiaoyao1991 changed the title from init to Kafka Hardlimit on Aug 14, 2017
@xiaoyao1991
Author

@maosongfu

@@ -76,7 +76,17 @@
/**
* Fetch size

Can you elaborate on what the fetch size is?

private DataSize fetchSize = new DataSize(10, Unit.MEGABYTE);

/**
* Default Query Interval

This is a variable. So:

  1. Don't call it defaultXXXX.
  2. A default value should be a final value. Here, it would be good to have one final Duration constant called DEFAULT_QUERY_RANGE and a variable called queryInterval, and then initialize the variable with the constant.

Also, can you elaborate on what this variable means? I had to read through the code that uses this variable to understand it.

Collaborator

I think that, following the code style in Presto, it's better to just assign the default value here. I agree that the variable name is a little confusing, but "defaultQueryInterval" means that this value is the default query interval used when the query interval is not specified in SQL.

@maosongfu, Aug 15, 2017

Based on the code logic, this value can be reassigned, so it is not a default value. A default value is a constant.

Collaborator

defaultQueryInterval is the variable, and Duration.valueOf("10m") is its default value, which is a constant.

@maosongfu, Aug 15, 2017

As above:

This is a variable. So:

  1. Don't call it defaultXXXX.
  2. A default value should be a final value. Here, it would be good to have one final Duration constant called DEFAULT_QUERY_RANGE and a variable called queryInterval, and then initialize the variable with the constant.

Call it queryInterval if it is a variable (queryRange may be more descriptive). Otherwise it is confusing, and people need to go through the code logic to understand it.

Collaborator

As above:

But the "defaultQueryInterval" means that this value is the default query interval if the query interval is not specified in SQL.
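A minimal sketch of the naming scheme maosongfu proposes (a final DEFAULT_QUERY_RANGE constant plus a mutable queryRange field initialized from it) might look like the following. It uses java.time.Duration in place of the Airlift Duration the connector imports, and the class name KafkaQueryRangeConfig is hypothetical.

```java
import java.time.Duration;

// Hypothetical sketch: DEFAULT_QUERY_RANGE is the constant default, queryRange the
// mutable setting initialized from it. java.time.Duration replaces Airlift's Duration.
final class KafkaQueryRangeConfig
{
    // Constant: the range scanned when a query specifies no time bounds.
    static final Duration DEFAULT_QUERY_RANGE = Duration.ofMinutes(10);

    // Variable: starts at the default and may be overwritten when config is loaded.
    private Duration queryRange = DEFAULT_QUERY_RANGE;

    Duration getQueryRange()
    {
        return queryRange;
    }

    KafkaQueryRangeConfig setQueryRange(Duration queryRange)
    {
        this.queryRange = queryRange;
        return this;
    }

    public static void main(String[] args)
    {
        KafkaQueryRangeConfig config = new KafkaQueryRangeConfig();
        System.out.println(config.getQueryRange()); // PT10M
        config.setQueryRange(Duration.ofMinutes(30));
        System.out.println(config.getQueryRange()); // PT30M
        System.out.println(DEFAULT_QUERY_RANGE);    // PT10M, the constant is untouched
    }
}
```

With this split, the constant records the default in one place while config loading remains free to overwrite the field.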

private Duration defaultQueryInterval = Duration.valueOf("10m");

/**
* Hard limit on

Also, can you elaborate on what this variable means? I had to read through the code that uses this variable to understand it.

/**
* Hard limit on
*/
private boolean hardLimitOn = true;

isHardLimitOn

Collaborator

No. "isHardLimitOn" should be a method instead of a variable.

Then the variable name could be hardLimited.
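Following the JavaBeans convention the thread converges on, the boolean field carries the adjective (hardLimited) and the is prefix goes on the accessor. A sketch with a hypothetical class name; the Airlift @Config annotation that the real setter would carry is left out so the example compiles on its own, and any property key for it would be an assumption.

```java
// Hypothetical sketch of JavaBeans-style boolean naming: field "hardLimited",
// getter "isHardLimited()", setter "setHardLimited(boolean)".
final class HardLimitConfig
{
    // Enabled by default, matching the PR's "hardLimitOn = true".
    private boolean hardLimited = true;

    boolean isHardLimited()
    {
        return hardLimited;
    }

    // In the connector this setter would be annotated with @Config("...");
    // the key is not given in the PR, so it is omitted here.
    HardLimitConfig setHardLimited(boolean hardLimited)
    {
        this.hardLimited = hardLimited;
        return this;
    }

    public static void main(String[] args)
    {
        HardLimitConfig config = new HardLimitConfig();
        System.out.println(config.isHardLimited());                       // true
        System.out.println(config.setHardLimited(false).isHardLimited()); // false
    }
}
```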

{
return fetchSize;
}

@Config("kafka.fetch-size")
public KafkaConnectorConfig setFetchSize(int fetchSize)
public KafkaConnectorConfig setFetchSize(String fetchSize)

I have a feeling you want to use the Builder pattern to make the Config class immutable, but it seems self-finished; not that great.

Collaborator

No, this class reads the config files.

OK
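The reply that this class "reads the config files" can be illustrated with a sketch in which a loader calls annotated setters with raw property strings. The fluent return this only allows chaining calls on one mutable object; it is not a Builder. Everything here is hypothetical: ConfigKey stands in for Airlift's @Config, ConfigBinder is a toy reflection-based loader rather than Airlift's, and parseBytes is a minimal parser rather than Airlift's DataSize parsing.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical stand-in for Airlift's @Config: names the property a setter binds.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ConfigKey
{
    String value();
}

// Mutable config bean: the loader calls its setters with raw property strings.
final class FetchSizeConfig
{
    // Default of 10 MB, as in the PR, stored as bytes.
    private long fetchSizeBytes = 10L * 1024 * 1024;

    long getFetchSizeBytes()
    {
        return fetchSizeBytes;
    }

    @ConfigKey("kafka.fetch-size")
    FetchSizeConfig setFetchSize(String fetchSize)
    {
        this.fetchSizeBytes = parseBytes(fetchSize);
        return this; // fluent return, not a Builder: the same object is mutated
    }

    // Minimal parser for "10MB", "512kB", "100B" or a bare byte count (1 kB = 1024 B).
    static long parseBytes(String value)
    {
        String s = value.trim();
        if (s.endsWith("MB")) {
            return Long.parseLong(s.substring(0, s.length() - 2).trim()) * 1024 * 1024;
        }
        if (s.endsWith("kB")) {
            return Long.parseLong(s.substring(0, s.length() - 2).trim()) * 1024;
        }
        if (s.endsWith("B")) {
            return Long.parseLong(s.substring(0, s.length() - 1).trim());
        }
        return Long.parseLong(s);
    }
}

// Hypothetical loader: invokes every annotated setter whose key is present.
final class ConfigBinder
{
    static <T> T bind(T config, Map<String, String> properties)
    {
        for (Method method : config.getClass().getDeclaredMethods()) {
            ConfigKey key = method.getAnnotation(ConfigKey.class);
            if (key != null && properties.containsKey(key.value())) {
                try {
                    method.setAccessible(true);
                    method.invoke(config, properties.get(key.value()));
                }
                catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot bind " + key.value(), e);
                }
            }
        }
        return config;
    }
}
```

For example, binding a map containing "kafka.fetch-size" = "512kB" leaves getFetchSizeBytes() at 524288, while an empty map keeps the 10 MB default.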

@@ -225,6 +227,11 @@ public ColumnMetadata getColumnMetadata(ConnectorSession session, ConnectorTable
}

log.info("startTs: %s, endTs: %s", startTs, endTs);
if (config.isHardLimitOn() && startTs == null && endTs == null) {

I don't understand how the hard_limit is applied. Can you add more comments to make it clear, especially for the for (Map.Entry<ColumnHandle, Domain> entry : columnHandleDomainMap.entrySet()) loop above?

Also, could we check whether it meets the expectations:

  1. If users specify a time range exceeding the limitation, an exception should be thrown directly. Otherwise, the behavior is changed silently and unexpectedly.
  2. If users do not specify the range, we could either throw an exception directly, or trim to the limited time range with an explicit logging/print-out notifying users.
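The two expectations above could be sketched as follows, assuming the predicate loop has already reduced the query to optional start/end timestamps (null when absent). The class and method names are hypothetical, IllegalArgumentException stands in for whatever Presto error the connector would raise, and the handling of a query with only one bound is an assumption, since the thread does not cover that case.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.logging.Logger;

// Hypothetical sketch of the reviewer's proposal: fail loudly on over-wide explicit
// ranges, and trim (with a visible warning) when no range is given.
final class StrictTimeRangeLimit
{
    private static final Logger LOG = Logger.getLogger(StrictTimeRangeLimit.class.getName());

    // Resolved scan window.
    static final class TimeRange
    {
        final Instant start;
        final Instant end;

        TimeRange(Instant start, Instant end)
        {
            this.start = start;
            this.end = end;
        }
    }

    static TimeRange apply(Instant startTs, Instant endTs, Duration limit, Instant now)
    {
        // Expectation 2: no range given, so trim to the most recent `limit` and say so.
        if (startTs == null && endTs == null) {
            Instant start = now.minus(limit);
            LOG.warning("No time range specified; limiting scan to [" + start + ", " + now + "]");
            return new TimeRange(start, now);
        }
        // Assumption: an end bound without a start bound is unbounded, hence over the limit.
        if (startTs == null) {
            throw new IllegalArgumentException("Query has no start time; hard limit is " + limit);
        }
        // Assumption: a start bound without an end bound scans up to `now`.
        Instant end = (endTs != null) ? endTs : now;
        Duration requested = Duration.between(startTs, end);
        // Expectation 1: an explicit range wider than the limit is rejected, not silently trimmed.
        if (requested.compareTo(limit) > 0) {
            throw new IllegalArgumentException("Requested range " + requested + " exceeds hard limit " + limit);
        }
        return new TimeRange(startTs, end);
    }
}
```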

Collaborator

@Yaliang, Aug 15, 2017

Here are some thoughts about your expectations:

  1. If the user doesn't know how to specify a range, they fall back to the default query interval.
  2. If the user knows how to specify a range, they presumably accept the default query interval and must have a specific reason to go beyond it. If a hard limit blocked such requests, serving users who genuinely need a wider range would be costly: it would involve code review and a service deployment, which isn't an ideal way to serve our customers' urgent needs.
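For contrast, the behavior described in this reply, which mirrors the PR's condition config.isHardLimitOn() && startTs == null && endTs == null, could be sketched like this. Epoch-millisecond timestamps and the names DefaultIntervalPolicy and Bounds are assumptions for illustration.

```java
import java.time.Duration;

// Hypothetical sketch of the PR's behavior: the default interval applies only
// when the query gives no time bounds; any explicit bound is used as written.
final class DefaultIntervalPolicy
{
    // Resolved bounds in epoch milliseconds; null means "unbounded on that side".
    static final class Bounds
    {
        final Long startTs;
        final Long endTs;

        Bounds(Long startTs, Long endTs)
        {
            this.startTs = startTs;
            this.endTs = endTs;
        }
    }

    static Bounds resolve(Long startTs, Long endTs, boolean hardLimitOn, Duration defaultQueryInterval, long nowMillis)
    {
        if (hardLimitOn && startTs == null && endTs == null) {
            // No bounds in the SQL: fall back to the most recent defaultQueryInterval.
            return new Bounds(nowMillis - defaultQueryInterval.toMillis(), nowMillis);
        }
        // The user wrote a range, so it is treated as deliberate and passed through,
        // even if it is wider than defaultQueryInterval.
        return new Bounds(startTs, endTs);
    }
}
```

The difference from the reviewer's expectations is the second branch: here an explicit range is never checked against a limit.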

@maosongfu, Aug 15, 2017

  1. If the user doesn't know how to specify a range, Presto should either throw an exception directly, or trim to the limited time range with an explicit logging/print-out notifying users. Silently unexpected behavior is bad.

  2. A design should be robust to users' mistakes and errors, rather than relying on users always providing safe input. Also, this is what a limit means: exceeding it should not happen.

If users want to query beyond the limit, they need to notify all stakeholders (for instance, the messaging team), since such a query could potentially break the whole Kafka service, leading to a SEV0 or SEV1.

@@ -124,7 +124,7 @@ private static String kafkaTopicName(TpchTable<?> table)
public static Session createSession()
{
return testSessionBuilder()
.setCatalog("kafka")
.setCatalog("kafka07")

This is good. I like the version tag attached.

Labels: None yet
Projects: None yet
5 participants