Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CTAS into Iceberg with timestamp(3) #6658

Closed
sshkvar opened this issue Jan 20, 2021 · 11 comments · Fixed by #13981
Closed

Support CTAS into Iceberg with timestamp(3) #6658

sshkvar opened this issue Jan 20, 2021 · 11 comments · Fixed by #13981
Assignees
Labels
enhancement New feature or request

Comments

@sshkvar
Copy link
Contributor

sshkvar commented Jan 20, 2021

Trino uses 3 digits of subsecond precision in Date and time functions, but Iceberg expects timestamp with 3 digits of subsecond precision.

As a result we can't create Iceberg table from Trino

create table iceberg.examples.test
as select now() as n;

Exception:
Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.

Is it possible to make default precision configurable or allow use of Timestamp with precision (3) for Iceberg tables?

@hashhar
Copy link
Member

hashhar commented Jan 20, 2021

I think this discussion will be more fruitful on the #iceberg channel on Trino Slack since there is a wide audience of other Iceberg users there who might have inputs/opinions.

I'd suggest closing this issue here and moving the discussion over there.

@sshkvar
Copy link
Contributor Author

sshkvar commented Jan 20, 2021

@hashhar thanks for you comment, but I think it is better to leave this issue open, because would be nice to have fix for it.
For me it looks as pretty common use case and would be great to have functionality which cast timestamp(3) to timestamp(6) automatically in Iceberg connector

@hashhar
Copy link
Member

hashhar commented Jan 20, 2021

That makes sense. Let's rename this to be more descriptive then.

cc: @electrum @lxynov

@sshkvar sshkvar changed the title No ability to create Iceberg table Iceberg table can't be created when use Date and time functions with default precision(3) Jan 20, 2021
@sshkvar
Copy link
Contributor Author

sshkvar commented Jan 20, 2021

I have renamed this issue and started discussion in slack

@MichaelTiemannOSC
Copy link

MichaelTiemannOSC commented Nov 26, 2021

I found this issue looking for guidance on how to resolve this for my use case. Help?

Answering my own question: use 'timestamp(6)' instead of 'timestamp' as the SQL type for the Trino table. I'm using a system that's converting Pandas datatypes to Trino datatypes, and that was invisible to me.

@findepi
Copy link
Member

findepi commented Nov 26, 2021

As a result we can't create Iceberg table from Trino

create table iceberg.examples.test
as select now() as n;

use current_timestamp(6) instead

@daniil-dubin
Copy link

@findepi we have thousands of scripts (as well as many other companies I believe do) that we are migrating to iceberg automatically. We can't just mechanically replace all the occurrences of now() or current_timestamp() with current_timestamp(6) as it may brake other places unrelated to iceberg and requires semantical analysis of the script. It will take hundreds of hours of work for the data developers to change the scripts and validate them.
We are hacking iceberg connector to add conversion there but we'd like to avoid doing it as much as possible.

@MichaelTiemannOSC
Copy link

This is my hack to the Trino python client:

@@ -145,7 +145,7 @@ class TrinoTypeCompiler(compiler.GenericTypeCompiler):
        return self.visit_VARBINARY(type_, **kw)

    def visit_DATETIME(self, type_, **kw):
-       return self.visit_TIMESTAMP(type_, **kw)
+       return self.visit_TIMESTAMP(type_, **kw).replace('TIMESTAMP','TIMESTAMP(6)')


class TrinoIdentifierPreparer(compiler.IdentifierPreparer):

@findepi findepi changed the title Iceberg table can't be created when use Date and time functions with default precision(3) Support CTAS into Iceberg with timestamp(3) May 27, 2022
@findepi findepi added the enhancement New feature or request label May 27, 2022
@findepi
Copy link
Member

findepi commented May 30, 2022

FWIW, all the JDBC connectors allow CTAS with timestamp precision other than supported by remote database's storage, including reducing the precision. This was a conscious choice, that we have debated about.

For JDBC connectors this was easy by the virtue of WriteFunction interface than can easily adapt what engine provides to what we store in the remote system, thus implementing the coercion. The unfortunate downsides are

  • the connector needs to implement coercion logic that engine would normally do in INSERT case
  • the CREATE TABLE also allows changing the type, which can be surprising for the users, because CT and CTAS do same logic for transforming from requested Trino type to storage type. Doing it differently would also be surprising to users anyway.

For Iceberg it's not as simple, because we don't have WriteFunction interface (or similar).
Also, we address the first drawback, and let engine coerce the data for CTAS same way as it does for INSERT.
We can do so by introducing intermediate step of planning CTAS query, where connector returns actual Trino types that will be persisted and engine applies coercions as necessary.

cc @losipiuk @martint @alexjo2144 @hashhar @wendigo

@rdblue
Copy link
Contributor

rdblue commented Mar 19, 2023

@findepi, @bitsondatadev, any update on this? We're getting more reports about it and would love to have it fixed somehow.

@bitsondatadev
Copy link
Member

@findepi, @bitsondatadev, any update on this? We're getting more reports about it and would love to have it fixed somehow.

#13981 (comment)

Looks like this was broken down into multiple PRs. Checking with @mdesmet on the status.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging a pull request may close this issue.

8 participants