Add support for Cassandra tuple type #416

@leon-gh leon-gh commented Mar 8, 2019

Added support for tuple Cassandra type, followed implementation for #147.

@cla-bot cla-bot bot added the cla-signed label Mar 9, 2019
@trinodb trinodb deleted a comment from cla-bot bot Mar 9, 2019
@trinodb trinodb deleted a comment from cla-bot bot Mar 9, 2019
row(1, "(1,1)"));

assertThat(() -> query(format("INSERT INTO %s.%s.%s (key, value) VALUES (2, (2,2))", CONNECTOR_NAME, KEY_SPACE, tableName)))
.failsWithMessage("Codec not found for requested operation: [frozen<test.type_tuple_insert> <-> java.lang.String]");
Member

FYI: This exception message comes from the Cassandra library. It would be better to throw a more explicit message in a separate PR (other existing types behave the same way).

Author

OK, I will leave it as it is; it can be addressed later in a separate PR.


query(format("INSERT INTO %s.%s.%s (key, value) VALUES (1, (1,1))", CONNECTOR_NAME, KEY_SPACE, tableName));
assertThat(query(format("SELECT * FROM %s.%s.%s", CONNECTOR_NAME, KEY_SPACE, tableName))).containsOnly(
row(1, "(1,1)"));
Member

@ebyhr ebyhr Mar 9, 2019

You mapped tuple to Presto VARCHAR, so the tuple value should be '(1,1)'. The current (1,1) is interpreted as an ARRAY.

What we want to check in this test method is that INSERT fails, as on L185. L180-182 are not necessary.

@ebyhr
Member

ebyhr commented Mar 9, 2019

Thank you for creating this PR! I left a few comments. If you want to run presto-product-tests in your local environment, the document below should be helpful.
https://github.com/prestosql/presto/tree/master/presto-product-tests

Ref: #415

@@ -211,6 +211,7 @@ TIMEUUID VARCHAR
TINYINT TINYINT
VARCHAR VARCHAR
VARIANT VARCHAR
TUPLE VARCHAR
Member

Let's also update L220.

Types not mentioned in the table above are not supported (e.g. tuple or UDT).

to

Types not mentioned in the table above are not supported (e.g. UDT).

@kokosing
Member

@leon-gh @ebyhr What is the difference between a Cassandra tuple and a UDT (#364)? Why are we mapping tuple to varchar (instead of something more structured, like row)? Isn't a Cassandra tuple type such as frozen<tuple<int, int>> a Presto row(int, int)?

@ebyhr
Member

ebyhr commented Mar 11, 2019

@kokosing The main difference is that a UDT includes the column names, like JSON, but a tuple doesn't. Therefore, I thought mapping tuple to varchar was fine.

tuple: (3, 'bar', 2.1)
udt: {id:3, name:'bar', value:2.1}

If we map tuple to the row type, we would perhaps need to assign dummy column names, like this:
{_col1:3, _col2:'bar', _col3:2.1}
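
To make the dummy-name idea concrete, here is a minimal Java sketch. The class `TupleNaming` and the helper `withDummyNames` are hypothetical names invented for illustration; this is not connector code, and it only shows how positional tuple values could be paired with placeholder names such as `_col1`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TupleNaming {
    // Hypothetical helper: attach placeholder names (_col1, _col2, ...)
    // to the positional values of a tuple, preserving their order.
    static Map<String, Object> withDummyNames(List<Object> tupleValues) {
        Map<String, Object> named = new LinkedHashMap<>();
        for (int i = 0; i < tupleValues.size(); i++) {
            named.put("_col" + (i + 1), tupleValues.get(i));
        }
        return named;
    }

    public static void main(String[] args) {
        // Stand-in for the Cassandra tuple (3, 'bar', 2.1)
        List<Object> tuple = Arrays.asList(3, "bar", 2.1);
        System.out.println(withDummyNames(tuple));
    }
}
```

The names carry no information beyond the position, which is the reason to weigh this against a row type without field names.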

@findepi
Member

findepi commented Mar 11, 2019

@ebyhr io.prestosql.spi.type.RowType#anonymous creates a row without explicitly naming the fields.

Note: currently, the fields will show up as fieldN in the CLI (as shown below), but this is not how you access the fields in such a row. Usually you would do cast( ... as row(x integer, ...)) or, for example, cast to json.

presto:tiny> select row(1,2,3,4);
                  _col0
------------------------------------------
 {field0=1, field1=2, field2=3, field3=4}

@ebyhr
Member

ebyhr commented Mar 12, 2019

@findepi Oh, I didn't know that. Which is better: RowType#anonymous or assigning dummy column names? If we use RowType#anonymous, we need to cast to access the fields, right?

@leon-gh leon-gh force-pushed the feature/presto-cassandra-add-tuple-type-support branch from c499f78 to 1a06156 Compare March 12, 2019 02:31
@kokosing
Member

we need to cast for accessing the fields, right?

Yes, still it is much better than parsing varchar.
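
As a rough illustration of why parsing the varchar form is unattractive, here is a small Java sketch. The class `TupleTextParsing` and the method `splitTupleText` are hypothetical, and the quoted-text rendering `(3,'a,b')` is an assumption about how a tuple with a text field might look as a string; the point is only that a naive split on commas miscounts fields as soon as a value contains a comma.

```java
import java.util.Arrays;

public class TupleTextParsing {
    // Naive approach: strip the surrounding parentheses and split on commas.
    // It has no notion of quoting, nesting, or field types.
    static String[] splitTupleText(String text) {
        String inner = text.substring(1, text.length() - 1);
        return inner.split(",");
    }

    public static void main(String[] args) {
        // Two integer fields: the split happens to give the right count.
        System.out.println(Arrays.toString(splitTupleText("(1,1)")));
        // Two fields, but the text value contains a comma,
        // so the naive split reports three pieces.
        System.out.println(Arrays.toString(splitTupleText("(3,'a,b')")));
    }
}
```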

@ebyhr
Member

ebyhr commented Mar 12, 2019

@kokosing Sorry, I meant: anonymous row {field0=1, field1=2, field2=3, field3=4} or dummy row {_col0=1, _col1=2, _col2=3, _col3=4}?

@martint
Member

martint commented Mar 13, 2019

@ebyhr, where do you see the "field0", "field1", etc. for anonymous rows? We removed all such usages a while ago, but we may have missed something.

If you don't have field names, it's more natural to use RowType#anonymous, which basically models a type such as row(bigint, varchar(10), boolean) vs a named row type which models a type like row(a bigint, b varchar(10), c boolean).

You're right that using anonymous row types requires casting them to one with names before accessing the fields because the dereference operator is undefined in the other case in Presto. For a long time I've been thinking we might want to add support for positional access for row types (as an extension to standard SQL). This could easily be done by allowing row types to support the subscript operator: e.g., x[1]. This needs further analysis and consideration, though.

@findepi
Member

findepi commented Mar 13, 2019

I've been thinking we might want to add support for positional access for row types (as an extension to standard SQL). This could easily be done by allowing row types to support the subscript operator: e.g., x[1].

@martint that would be convenient. The current approach (cast to a named row, enumerating all types) is quite verbose, so I sympathise with a temptation to assign dummy field names.
Alternatively, we could have a function to get a field from a row. We would need to extend the type system to make it work reasonably.

@ebyhr
Member

ebyhr commented Mar 13, 2019

@martint I got the "field0" naming for anonymous rows from @findepi's comment: #416 (comment).

@leon-gh Sorry for confusing you. Let's use RowType#anonymous.

This is a related issue about accessing a row field by index:
prestodb/presto#7640

@leon-gh
Author

leon-gh commented Mar 13, 2019

@leon-gh Sorry for confusing you. Let's use RowType#anonymous.

OK, I just updated the test and will add a change to convert to RowType#anonymous.

@martint
Member

martint commented Mar 14, 2019

Alternatively, we could have a function to get a field from a row. We would need to extend the type system to make it work reasonably.

That's the tough part. We would almost need to support dependent types and concepts that I'm not sure how we'd express in Presto today.

In particular, given x :: row(bigint, varchar(1)),

get_field(x, 0) :: bigint
get_field(x, 1) :: varchar(1)

What's the type of the get_field function?
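
The same difficulty shows up in a statically typed language without dependent types. The Java sketch below is only an analogy, not Presto code: `GetFieldSketch` and `getField` are hypothetical names, and a `List<Object>` stands in for `row(bigint, varchar(1))`. Because the return type cannot vary with the value of `index`, the accessor has to return `Object`, and each caller must cast based on out-of-band knowledge of the field at that position.

```java
import java.util.Arrays;
import java.util.List;

public class GetFieldSketch {
    // The declared return type cannot depend on the value of `index`,
    // so the only type that fits every position is Object.
    static Object getField(List<Object> row, int index) {
        return row.get(index);
    }

    public static void main(String[] args) {
        // Stand-in for x :: row(bigint, varchar(1))
        List<Object> x = Arrays.asList(42L, "a");

        // The caller has to know which type sits at which position.
        long first = (Long) getField(x, 0);
        String second = (String) getField(x, 1);
        System.out.println(first + " " + second);
    }
}
```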

@electrum
Member

electrum commented Aug 7, 2019

@leon-gh Apologies, it looks like this slipped through the cracks. Now that we have easy access to fields in anonymous row types using the [] operator, do we want to update this to return Cassandra tuples as anonymous rows?

@leon-gh
Author

leon-gh commented Sep 25, 2019

@leon-gh Apologies, it looks like this slipped through the cracks. Now that we have easy access to fields in anonymous row types using the [] operator, do we want to update this to return Cassandra tuples as anonymous rows?

Sorry, I've been busy with other things. Sure, all I wanted was for the C* tables to be supported; I didn't mind that much about the types. As long as I can export/import the data, that's all good.
Are there any examples of anonymous rows being returned, so I don't have to reinvent things?

@martint
Member

martint commented Oct 9, 2019

Are there any examples of anonymous rows being returned, so I don't have to reinvent things?

You need to create a block from the row type associated with that column and then append values to it. See this for an example:

https://github.com/prestosql/presto/blob/master/presto-rcfile/src/main/java/io/prestosql/rcfile/text/BlockEncoding.java#L63-L73
https://github.com/prestosql/presto/blob/master/presto-rcfile/src/main/java/io/prestosql/rcfile/text/StructEncoding.java#L64

@losipiuk
Member

Fixed via #8570

@losipiuk losipiuk closed this Sep 21, 2021