-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29028: Iceberg: Implement auto compaction #5886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7c50524
to
b75da71
Compare
b75da71
to
05f666a
Compare
05f666a
to
c6cbb86
Compare
c6cbb86
to
dc06a7c
Compare
plenty of reactions, but no reviews :) |
@@ -2381,4 +2384,15 @@ private static List<FieldSchema> schema(List<VirtualColumn> exprs) { | |||
private static List<FieldSchema> orderBy(VirtualColumn... exprs) { | |||
return schema(Arrays.asList(exprs)); | |||
} | |||
|
|||
@Override | |||
public CompactionType determineCompactionType(org.apache.hadoop.hive.metastore.api.Table table, CompactionInfo ci) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't belong here. maybe CompactionEvaluator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed from HiveIcebergStorageHandler
* @throws IOException If an I/O error occurs during the compaction evaluation process. | ||
* @throws UnsupportedOperationException if the implementing storage handler does not support this operation. | ||
*/ | ||
default CompactionType determineCompactionType(Table table, CompactionInfo ci) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is not responsibility of StorageHandler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
* A class to initiate Iceberg compactions. This will run in a separate thread. | ||
* It's critical that there exactly 1 of these in a given warehouse. | ||
*/ | ||
public class IcebergInitiator extends InitiatorBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move it under iceberg-handler, org.apache.iceberg.mr.hive.compaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
*/ | ||
public abstract class InitiatorBase extends MetaStoreCompactorThread { | ||
|
||
protected ExecutorService compactionExecutor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to disable certain Initiators individually? I don't think we need 2 processes, but 1 running multiple tasks:
- AcidTableOptimizer
- IcebergTableOptimizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both should return a list of tasks the Initiator needs to schedule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a way to disable certain initiators individually by introducing a new Hive conf that contains list of active initiator classes and refactored it to be one process instead of 2.
0023436
to
2830bda
Compare
2541f2d
to
4af67bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly +1. I left some minor comments
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/TableOptimizer.java
Outdated
Show resolved
Hide resolved
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/TableOptimizer.java
Outdated
Show resolved
Hide resolved
4af67bc
to
c5290ea
Compare
c5290ea
to
77bc8d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the remaining issues!
|
Thanks, @deniskuzZ and @okumin for code reviews! |
What changes were proposed in this pull request?
Iceberg auto compaction. In this initial version,
IcebergInitiator
remembers the last checked snapshot id of all Iceberg tables and it compares the current snapshot id to the cached snapshot id to determine if further evaluation is needed.Why are the changes needed?
Currently, Iceberg compaction can only be triggered manually.
This PR introduces automatic Iceberg compaction which periodically checks all Iceberg tables and schedules compactions when necessary.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit tests, precommit tests.