Add support for Cassandra tuple type #416

leon-gh · 2019-03-08T22:12:32Z

Added support for the Cassandra tuple type, following the implementation in #147.

presto-cassandra/src/main/java/io/prestosql/plugin/cassandra/CassandraType.java

presto-cassandra/src/test/java/io/prestosql/plugin/cassandra/CassandraTestingUtils.java

presto-cassandra/src/test/java/io/prestosql/plugin/cassandra/TestCassandraConnector.java

...o-product-tests/src/main/java/io/prestosql/tests/cassandra/TestInsertIntoCassandraTable.java

ebyhr · 2019-03-09T06:37:51Z

...o-product-tests/src/main/java/io/prestosql/tests/cassandra/TestInsertIntoCassandraTable.java

+                row(1, "(1,1)"));
+
+        assertThat(() -> query(format("INSERT INTO %s.%s.%s (key, value) VALUES (2, (2,2))", CONNECTOR_NAME, KEY_SPACE, tableName)))
+                .failsWithMessage("Codec not found for requested operation: [frozen<test.type_tuple_insert> <-> java.lang.String]");


FYI: this exception message comes from the Cassandra library. It would be better to throw a more explicit message in a separate PR (other existing types behave the same way).

OK, I will leave it as it is; it can be addressed later in a separate PR.

presto-product-tests/src/main/java/io/prestosql/tests/cassandra/TestSelect.java

ebyhr · 2019-03-09T06:42:14Z

...o-product-tests/src/main/java/io/prestosql/tests/cassandra/TestInsertIntoCassandraTable.java

+
+        query(format("INSERT INTO %s.%s.%s (key, value) VALUES (1, (1,1))", CONNECTOR_NAME, KEY_SPACE, tableName));
+        assertThat(query(format("SELECT * FROM %s.%s.%s", CONNECTOR_NAME, KEY_SPACE, tableName))).containsOnly(
+                row(1, "(1,1)"));


You mapped tuple to Presto VARCHAR, so the tuple value should be '(1,1)'. The current (1,1) is identified as an ARRAY.

What we want to check in this test method is that INSERT fails, as on L185. L180-182 are not necessary.

ebyhr · 2019-03-09T06:44:26Z

Thank you for creating this PR! I left a few comments. If you want to run presto-product-tests in your local environment, the document below should be helpful.
https://github.com/prestosql/presto/tree/master/presto-product-tests

Ref: #415

ebyhr · 2019-03-09T07:13:08Z

presto-docs/src/main/sphinx/connector/cassandra.rst

@@ -211,6 +211,7 @@ TIMEUUID          VARCHAR
 TINYINT           TINYINT
 VARCHAR           VARCHAR
 VARIANT           VARCHAR
+TUPLE             VARCHAR


Let's also update L220.

Types not mentioned in the table above are not supported (e.g. tuple or UDT).

to

Types not mentioned in the table above are not supported (e.g. UDT).

kokosing · 2019-03-11T10:58:24Z

@leon-gh @ebyhr What is the difference between a Cassandra tuple and a UDT (#364)? Why are we mapping tuple to varchar (instead of something more complex like row)? Isn't the Cassandra tuple type frozen<tuple<int, int>> a Presto row(int, int)?

ebyhr · 2019-03-11T12:52:40Z

@kokosing The main difference is that a UDT includes the column names, like JSON, but a tuple doesn't. Therefore, I thought mapping tuple to varchar was fine.

tuple: (3, 'bar', 2.1)
udt: {id:3, name:'bar', value:2.1}

If we map tuple to the row type, perhaps we need to assign dummy column names like below.
{_col1:3, _col2:'bar', _col3:2.1}
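
A minimal sketch of the dummy-name idea in plain Java, using only the standard library. The class and method names are hypothetical and do not come from the Presto or Cassandra code; the tuple is modelled as a plain list of positional values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of presto-cassandra.
final class TupleFieldNaming
{
    private TupleFieldNaming() {}

    // Pairs each positional tuple value with a generated name (_col1, _col2, ...).
    // A UDT carries real field names; a tuple only has positions.
    static Map<String, Object> withDummyNames(List<Object> tupleValues)
    {
        Map<String, Object> named = new LinkedHashMap<>();
        for (int i = 0; i < tupleValues.size(); i++) {
            named.put("_col" + (i + 1), tupleValues.get(i));
        }
        return named;
    }

    public static void main(String[] args)
    {
        List<Object> tuple = List.of(3, "bar", 2.1);
        // prints {_col1=3, _col2=bar, _col3=2.1}
        System.out.println(withDummyNames(tuple));
    }
}
```

The generated names carry no meaning beyond the position, which is the main argument against inventing them.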

findepi · 2019-03-11T14:53:01Z

@ebyhr io.prestosql.spi.type.RowType#anonymous creates a row without explicitly naming the fields.

Note: currently, the fields will show up as fieldN in the CLI (as shown below), but this is not how you access the fields in such a row. Usually you would do cast( ... as row(x integer, ...)) or, e.g., cast to json.

presto:tiny> select row(1,2,3,4);
                  _col0
------------------------------------------
 {field0=1, field1=2, field2=3, field3=4}

ebyhr · 2019-03-12T02:02:36Z

@findepi Oh, I didn't know that. Which is better: RowType#anonymous or assigning dummy column names? If we use RowType#anonymous, we need to cast for accessing the fields, right?

kokosing · 2019-03-12T09:13:28Z

we need to cast for accessing the fields, right?

Yes, still it is much better than parsing varchar.
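
To illustrate what "parsing varchar" would mean for a consumer, here is a rough sketch in plain Java. It assumes the tuple arrives as text shaped like the "(1,1)" value in the test above and that every field is an integer; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of presto-cassandra.
final class TupleTextParsing
{
    private TupleTextParsing() {}

    // Naive parser for text such as "(1,1)".
    // Assumes integer fields only: no nesting, quoting, or nulls.
    static List<Integer> parseIntTuple(String text)
    {
        String trimmed = text.trim();
        if (trimmed.length() < 2
                || trimmed.charAt(0) != '('
                || trimmed.charAt(trimmed.length() - 1) != ')') {
            throw new IllegalArgumentException("Not a tuple literal: " + text);
        }
        String body = trimmed.substring(1, trimmed.length() - 1);
        List<Integer> fields = new ArrayList<>();
        if (body.isEmpty()) {
            return fields;
        }
        for (String part : body.split(",")) {
            fields.add(Integer.parseInt(part.trim()));
        }
        return fields;
    }

    public static void main(String[] args)
    {
        // prints [1, 1]
        System.out.println(parseIntTuple("(1,1)"));
    }
}
```

Even this simple case has to guess the field types, and it would break on string fields that contain commas or parentheses. A row type keeps the field boundaries and types in the type system instead.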

ebyhr · 2019-03-12T10:24:45Z

@kokosing Sorry, I meant anonymous row {field0=1, field1=2, field2=3, field3=4} or dummy row {_col0=1, _col1=2, _col2=3, _col3=4}.

martint · 2019-03-13T06:16:16Z

@ebyhr, where do you see the "field0", "field1", etc. for anonymous rows? We removed all such usages a while ago, but we may have missed something.

If you don't have field names, it's more natural to use RowType#anonymous, which basically models a type such as row(bigint, varchar(10), boolean) vs a named row type which models a type like row(a bigint, b varchar(10), c boolean).

You're right that using anonymous row types requires casting them to one with names before accessing the fields because the dereference operator is undefined in the other case in Presto. For a long time I've been thinking we might want to add support for positional access for row types (as an extension to standard SQL). This could easily be done by allowing row types to support the subscript operator: e.g., x[1]. This needs further analysis and consideration, though.

findepi · 2019-03-13T08:20:56Z

I've been thinking we might want to add support for positional access for row types (as an extension to standard SQL). This could easily be done by allowing row types to support the subscript operator: e.g., x[1].

@martint that would be convenient. The current approach (cast to a named row, enumerating all types) is quite verbose, so I sympathise with a temptation to assign dummy field names.
Alternatively, we could have a function to get a field from a row. We would need to extend the type system to make it work reasonably.

ebyhr · 2019-03-13T08:26:16Z

@martint I heard about the "field0" for anonymous rows from @findepi in #416 (comment).

@leon-gh Sorry for confusing you. Let's use RowType#anonymous.

This is a related issue about accessing row fields by index.
prestodb/presto#7640

leon-gh · 2019-03-13T23:07:01Z

@leon-gh Sorry for confusing you. Let's use RowType#anonymous.

OK, I just updated the test and will add a change to convert to RowType#anonymous.

martint · 2019-03-14T20:52:15Z

Alternatively, we could have a function to get a field from a row. We would need to extend the type system to make it work reasonably.

That's the tough part. We would almost need to support dependent types and concepts that I'm not sure how we'd express in Presto today.

In particular, given x :: row(bigint, varchar(1)),

get_field(x, 0) :: bigint
get_field(x, 1) :: varchar(1)

What's the type of the get_field function?
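
A rough analogy in plain Java, not Presto code: the row is modelled as a list of values, and all names here are hypothetical. Because the static return type cannot depend on the runtime value of the index, the only signature available returns Object, and every caller has to restate the field type with a cast.

```java
import java.util.List;

// Hypothetical illustration only; a row is modelled as a list of values.
final class GetFieldSketch
{
    private GetFieldSketch() {}

    // The return type cannot vary with the value of `index`,
    // so the function can only promise Object.
    static Object getField(List<Object> row, int index)
    {
        return row.get(index);
    }

    public static void main(String[] args)
    {
        // Stands in for x :: row(bigint, varchar(1)).
        List<Object> x = List.of(42L, "a");

        // The caller, not the function signature, asserts each field type.
        Long first = (Long) getField(x, 0);
        String second = (String) getField(x, 1);

        // prints 42 a
        System.out.println(first + " " + second);
    }
}
```

A subscript operator resolved by the analyzer avoids this, since the analyzer sees a literal index and can look up the field type at planning time.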

electrum · 2019-08-07T18:35:05Z

@leon-gh Apologies, it looks like this slipped through the cracks. Now that we have easy access to fields in anonymous row types using the [] operator, do we want to update this to return Cassandra tuples as anonymous rows?

leon-gh · 2019-09-25T05:39:17Z

@leon-gh Apologies, it looks like this slipped through the cracks. Now that we have easy access to fields in anonymous row types using the [] operator, do we want to update this to return Cassandra tuples as anonymous rows?

Sorry, been busy with other things. Sure, all I wanted was the C* tables to be supported, didn't really mind that much about the types - as long as I can export/import the data, that's all good.
Are there any examples of anonymous rows being returned so I don't have to re-invent things?

martint · 2019-10-09T19:55:24Z

Are there any examples of anonymous rows being returned so I don't have to re-invent things?

You need to create a block from the row type associated with that column and then append values to it. See this for an example:

https://github.com/prestosql/presto/blob/master/presto-rcfile/src/main/java/io/prestosql/rcfile/text/BlockEncoding.java#L63-L73
https://github.com/prestosql/presto/blob/master/presto-rcfile/src/main/java/io/prestosql/rcfile/text/StructEncoding.java#L64

losipiuk · 2021-09-21T10:10:52Z

Fixed via #8570

Add support for Cassandra tuple type

3c354d0
