Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add system.sync_partition_metadata procedure to sync Hive table partitions #223

Merged
merged 1 commit into from
Feb 26, 2019
Merged

Add system.sync_partition_metadata procedure to sync Hive table partitions #223

merged 1 commit into from
Feb 26, 2019

Conversation

luohao
Copy link
Member

@luohao luohao commented Feb 13, 2019

Fix #174

@cla-bot cla-bot bot added the cla-signed label Feb 13, 2019
@ebyhr
Copy link
Member

ebyhr commented Feb 13, 2019

@luohao
Copy link
Member Author

luohao commented Feb 13, 2019

How about adding smoke test like system.create_empty_partition or product test?

I initially had very comprehensive unit tests in TestHiveIntegrationSmokeTest. But then I realize those tests don't make sense because the way FileHiveMetastore::getPartitionNames works is that it scans the directory and return all partitions in file system. So in the end I did some manual testing which is equivilant to what I had in smoke tests.

product tests might be possible. I need some time to figure out how to add one. Do you know a good example that I can follow?

@ebyhr
Copy link
Member

ebyhr commented Feb 13, 2019

I think TestHivePartitionsTable.java
would be helpful. You can execute hive query using onHive().executeQuery("~~~"); if needed.

// partitions in metastore but not in file system
List<String> partitionsToDrop = difference(partitionsInMetastore, partitionsInFileSystem);
// partitions in file system but not in metastore
List<String> partitionsToAdd = difference(partitionsInFileSystem, partitionsInMetastore);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: How about moving these lines in if block?

@luohao
Copy link
Member Author

luohao commented Feb 13, 2019

I think TestHivePartitionsTable.java would be helpful. You can execute hive query using onHive().executeQuery("~~~"); if needed.

Thanks! I added a product test for msck procedure.

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

Thanks for adding product tests for this.

@@ -559,6 +559,10 @@ Procedures

Create an empty partition in the specified table.

* ``system.msck(schema_name, table_name, add, drop)``

Check partitions in metastore and in file system. Repair the partitions(``add`` and ``drop``) if requested.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are possible values of add and drop parameters? what's their type?

from the code i know they are booleans, but initially i thought they could be a list of partitions to add or something. I would consider renaming add to eg discover_new_partitions.

@electrum ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the name msck is not descriptive, and having two booleans is confusing. Also, there are only three valid states, not four, since having both be false would do nothing.

What about having three functions:

  • add_partitions
  • drop_partitions
  • sync_partitions

These are be similar to the MSCK options for Hive. I thought of some longer names, but it's hard to be descriptive without making the names too long, and it seems good to have add/drop be symmetric.

For the docs, rather than saying "repair", something more direct like

Add any partitions that exist on the file system but not in the metastore.

Drop any partitions that exist in the metastore but not on the file system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, how about let's make it something like

system.repair_partitions(schema, table, mode)

where mode (VARCHAR type) is a one of ADD, DROP or SYNC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the name “repair” since it implies we are fixing them (because they are damaged?), when in reality we are synchronizing the metadata with the file system.

Although if we want to stick with the “repair” name, or any single name, your proposal seems good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea, based on yours: sync_partition_metadata with modes ADD, DROP, FULL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync_partition_metadata looks good. I'll update the PR with the new name.

import static java.util.Objects.requireNonNull;
import static java.util.stream.Collectors.toList;

public class MsckProcedure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you positive this does the same job as MSCK in Hive?
Or, maybe, Hive's MSCK has some additional functionalities besides discovering partitions? (In the latter case, we could pick a different name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are almost equivalent.

  • Hive DDL command takes two parameters: repair and [ ADD | DROP | SYNC]. It will update metastore if repair is set to true, otherwise just print out information about information about stale/missing partitions. ADD option will add any partitions that exist on HDFS but not in metastore to the metastore; DROP option will remove the partition that is already removed from file system from metastore; SYNC option will do both ADD and DROP.

  • Presto msck procedure does the same thing in a more straightforward way. It takes two parameters for

    • add: add any partitions that exist on HDFS but not in metastore
    • drop: remove the partition that is already removed from file system from metastore

The combination of add and drop flag in Presto msck can achieve the same goal of [ ADD | DROP | SYNC] option in Hive.

The differences are:

  • Presto does not print partition infomation if both add and drop are set to false. I should probably fix this so Presto msck also print out info.
  • Hive support batching to avoid OOM when there's a large number of untracked partitions. I am not sure if this is necessary in Presto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a procedure return something to the client seems impossible. I looked at the io.prestosql.execution.CallTask::execute and it ignores the return value of invokeWithArguments, as shown at https://github.com/prestosql/presto/blob/master/presto-main/src/main/java/io/prestosql/execution/CallTask.java#L160.

@electrum

session,
table.getDatabaseName(),
table.getTableName(),
buildPartitionObject(session, table, name, new Path(location, name)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new Path(location, name)

maybe you shouldn't relativize( above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to collect relativized names (partition names) to figure out what needs to be added/dropped.

@Inject
private HdfsClient hdfsClient;

@Test(groups = {HIVE_PARTITIONING, SMOKE})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should run these tests in various hadoop auth setup context, like we do with "storage_formats".
maybe we add "smoke" group to wherever we have "storage_formats" in .travis.yml?

Also, i think we could make one test method {HIVE_PARTITIONING, SMOKE}, while others just HIVE_PARTITIONING.

@kokosing @electrum

hdfsClient.delete(tableLocation + "/x=b/y=2");

// add invalid partition path
hdfsClient.createDirectory(tableLocation + "/y=3/x=h");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add /x=f/y=10/y=10 (too deep nesting)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be treated as a valid partition? There could be valid data files in /x=f/y=10/.
@electrum

@luohao luohao changed the title Add system.msck procedure to repair Hive table partitions Add system.sync_partition_metadata procedure to repair Hive table partitions Feb 14, 2019
@luohao luohao changed the title Add system.sync_partition_metadata procedure to repair Hive table partitions Add system.sync_partition_metadata procedure to sync Hive table partitions Feb 14, 2019
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments

@luohao
Copy link
Member Author

luohao commented Feb 21, 2019

I have seen random failures in the travis-ci test due to ClassNotFound exception:

Socket Factory class not found: java.lang.ClassNotFoundException: Class org.apache.hadoop.net.SocksSocketFactory not found

After long triaging, I seem to find the fix to the issue, as shown in 8976e1e.

Honestly I don't understand the exact reason why this is happening if I have the metastore.commit() outside the try-catch block. I know Hadoop code likes to load classes at random times, but not sure if that's the cause here. My hunch is that for some reason mess up the class loader when going out the block. And when it calls getFilesystem() inside metastore.commit(), Hadoop will complains about ClassNotFound. The symptom I have seen is that after calling the hive procedure, the CALL query and all subsequent queries will fail(I have tested CREATE TABLE, SELECT and they all fail), which indicates permanent damage.

Would love to hear what you guys think about this issue, and what's the right way to fix it.
@dain @electrum @findepi

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented as to the cause of the file system errors. There were also a few nits, otherwise looks good.

The commit title is too long, how about

Add Hive procedure to sync table partitions

Please ping me when updated and I can merge this.

@electrum electrum added this to the 304 milestone Feb 26, 2019
@electrum electrum merged commit 78cde41 into trinodb:master Feb 26, 2019
@electrum electrum mentioned this pull request Feb 26, 2019
6 tasks
@findepi
Copy link
Member

findepi commented Feb 26, 2019

@luohao thanks for delivering this. This is really a useful feature!

@rherasm
Copy link

rherasm commented Sep 30, 2020

HI all,

CALL system.create_empty_partition works fine on my Presto (node_version 0.227), however

CALL system.sync_partition_metadata raises this error:

FAILED [CALL - 0 rows, 0.223 secs] [Code: 29, SQL State: ] Query failed (#20200930_101605_13561_nycb9): Procedure not registered: system.sync_partition_metadata

Does anyone know about this problem? Thx !

@ebyhr
Copy link
Member

ebyhr commented Sep 30, 2020

@rherasm The procedure was added in version 304. Please upgrade your Presto cluster.
https://prestosql.io/download.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Add Hive procedure to recover (discover) partitions
7 participants