Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Hive managed Transactional/ACID tables #576

Closed
anirbanch opened this issue Apr 3, 2019 · 13 comments
Closed

Support Hive managed Transactional/ACID tables #576

anirbanch opened this issue Apr 3, 2019 · 13 comments
Assignees
Milestone

Comments

@anirbanch
Copy link

anirbanch commented Apr 3, 2019

Presto fails to run queries against Hive managed tables. In Hive 3.x these tables are defaulted to transactional/insert-only as per https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/hive-overview/content/hive_upgrade_changes.html . Given that this is the default table type in Hive 3.x for all hive managed table, can Presto be enhanced to handle them?

Any one upgrading to Hive 3.x will have diminished use for Presto without this support, as Presto can only handle ‘external’ tables in Hive 3.x

Error Message, when trying to access a ‘transactions/insert-only’ table:

Query 20190402_173707_00215_qkyph failed: Your client does not appear to support insert-only tables.
               To skip capability checks, please set metastore.client.capability.check to false. This setting can be
               set globally, or on the client for the current metastore session. Note that this may lead to incorrect results,
               data loss, undefined behavior, etc. if your client is actually incompatible. You can also specify custom client
               capabilities via get_table_req API.
        at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getTable(ThriftHiveMetastore.java:252)
        at io.prestosql.plugin.hive.metastore.thrift.BridgingHiveMetastore.getTable(BridgingHiveMetastore.java:81)
        at io.prestosql.plugin.hive.metastore.CachingHiveMetastore.loadTable(CachingHiveMetastore.java:274)
        at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
        at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:188)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
@findepi findepi changed the title Need support for Hive managed Transactional/acid tables Support Hive managed Transactional/ACID tables Apr 3, 2019
@shubhamtagra
Copy link
Member

Proposal for supporting Insert Only Transactional Hive tables: https://docs.google.com/document/d/1GwSNc_6jSP8N1SyJIHnaMeWvh7JzhGPZ7QKm1DHJqEw/edit?usp=sharing

@electrum @martint @dain please review. I have a wip implementation of this which is able to read Insert Only Transactional tables from Hive3

@dain
Copy link
Member

dain commented Apr 24, 2019

@electrum what do you think about create a high level issue to track this stuff? I would say the proposal above is part of the Hive 3.0 support project, and a sub item of supporting new table organizations. The proposal only covers reading of insert only tables and not write or full acid tables.

@electrum
Copy link
Member

I read the proposal and it seems reasonable. One question is that it mentions deltas, which to my understanding are only for full ACID, not insert-only.

From an implementation perspective, table layouts are deprecated. I’m working on a PR to remove them from the Hive connector. I suggest waiting for that, which should be complete and merged sometime next week, then send a PR on top. I’m happy to review and work with you on it.

@shubhamtagra
Copy link
Member

@electrum I remembered there were discussion around removing table layouts but saw them in master hence went ahead with it, we can wait till your changes land and replace table layout dependency with whatever gets used to fetch partitions.
About the deltas for insert-only tables, they are created for every insert. Difference with full ACID tables is that for deletes in full ACID tables delete_deltas are additionally created.

@Jedda1314Jessie
Copy link

@electrum I have a question whether presto will provide the support for ACID of Hive in the future.Because when I used Hive to develop the data warehouse of my company, I found that some dim tables(scd) and some fact table(accumulator snapshot table) need to update.For that, I opened the transaction support of Hive but it made presto work wrong for the reason that Presto can't read HIVE's tables which support row update or delete.

@shubhamtagra
Copy link
Member

@Jedda1314Jessie we are adding the read support for ACID tables. Review has been opened for the first part to support reading INSERT ONLY ACID tables. I will be adding docs and reviews for full ACID support too soon.

@shubhamtagra
Copy link
Member

Proposal for Full ACID table reads: https://docs.google.com/document/d/1VrF48kqr_paTtF5iSwryRhZvMwL_ovtbjcAedQd7hyk/edit?usp=sharing

I have this implemented and it works fine. There are some performance optimizations possible over this which can be picked once we finalize this solution.

@dain @electrum @martint please have a look.

@heocaf
Copy link

heocaf commented Sep 3, 2019

@stagraqubole are there any latest update or schedule? thanks!

@shubhamtagra
Copy link
Member

@heocaf This is in progress. The PRs are out and under review. I cant comment on the timelines but atleast the PR for Insert Only Tables can go in fast once @electrum does the final review.
Full ACID Tables PR is just started.

@heocaf
Copy link

heocaf commented Sep 9, 2019

@stagraqubole got it, thanks for your reply!
Looks like I need to roll back to an earlier version of HDP :(

@findepi
Copy link
Member

findepi commented Mar 12, 2020

Presto 331 (pending release) can read from Hive transactional/ORC ACID tables.

The outstanding things are:

All the above is tracked by the Hive 3 umbrella issue #1218

@findepi findepi closed this as completed Mar 12, 2020
@findepi findepi added this to the 331 milestone Mar 12, 2020
@MohammedLayeeq
Copy link

I am using Presto version 0.232, and Hive version 3.1.0. Is this integration supports managed tables of Hive3.1.0 Transactional/ACID tables.
I am also facing same error
Query 20200320_145603_00018_rsdg9 failed: Your client does not appear to support insert-only tables. To skip capability checks, please set metastore.client.capa bility.check to false. This setting can be set globally, or on the client for th e current metastore session. Note that this may lead to incorrect results, data loss, undefined behavior, etc. if your client is actually incompatible. You can also specify custom client capabilities via get_table_req API.

@findepi
Copy link
Member

findepi commented Mar 20, 2020

@MohammedLayeeq please try with Presto 331 -- https://prestosql.io/download.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

9 participants