
Make filterColumns coherent with other filter methods #6084

Merged
merged 2 commits into trinodb:master on Nov 26, 2020

Conversation

kokosing
Member

@kokosing kokosing commented Nov 25, 2020

Make filterColumns coherent with other filter methods

We should not pass column metadata to filterColumns, just as we do not pass
TableMetadata to the filterTables method.

Requiring ColumnMetadata makes it cumbersome to call the filterColumns
method, as it requires having column metadata in hand, which is not always
trivial.

Motivation: I want to call filterColumns here: #6017

result.put(entry.getKey(), accessControl.filterColumns(
session.toSecurityContext(),
new CatalogSchemaTableName(prefix.getCatalogName(), entry.getKey()),
entry.getValue()));
Member Author


I think this was the motivation for filterColumns accepting a List<ColumnMetadata> parameter and returning List<ColumnMetadata>. It simply made this particular line easy to write, and so far this was the only usage of this call.

@kokosing
Member Author

Motivation: I want to call filterColumns here: #6017

Member

@skrzypo987 skrzypo987 left a comment


LGTM apart from the backward-compatibility issue, though that may be a non-existent problem.

@@ -178,7 +176,7 @@ public void checkCanShowColumns(ConnectorSecurityContext context, SchemaTableNam
}

@Override
-    public List<ColumnMetadata> filterColumns(ConnectorSecurityContext context, SchemaTableName tableName, List<ColumnMetadata> columns)
+    public Set<String> filterColumns(ConnectorSecurityContext context, SchemaTableName tableName, Set<String> columns)
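This SPI change breaks source compatibility for out-of-tree access controls: an implementation compiled against the old List<ColumnMetadata> signature no longer overrides anything. A minimal sketch of the before/after shape, using stand-in types rather than the real Trino SPI classes:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FilterColumnsMigration {
    // Stand-in for the real ColumnMetadata SPI class; only the name is relevant here.
    static class ColumnMetadata {
        final String name;
        final String type;
        ColumnMetadata(String name, String type) { this.name = name; this.type = type; }
        String getName() { return name; }
    }

    // Pre-348 shape: metadata in, metadata out, even though only names were consulted.
    static List<ColumnMetadata> filterColumnsOld(Set<String> allowed, List<ColumnMetadata> columns) {
        return columns.stream()
                .filter(column -> allowed.contains(column.getName()))
                .collect(Collectors.toList());
    }

    // Post-348 shape: plain column names in, allowed names out.
    static Set<String> filterColumnsNew(Set<String> allowed, Set<String> columnNames) {
        return columnNames.stream()
                .filter(allowed::contains)
                .collect(Collectors.toSet());
    }
}
```

An implementation that only ever consulted getName() migrates mechanically; anything that inspected the type, properties, or comment has no equivalent under the new signature.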
Member


What about custom implementations that use this information (if there are any)?
Is there any reasonable workaround?

Member Author


It would be very bizarre to control access to a column based on its type, column properties, or a column comment.

See the implementations in prestosql; they do not bother with such information. I also checked all the access control implementations I have access to, and they use just the column names.

@electrum Are you aware of such access controls?


@kokosing Are you sure that you have checked all the "...access control implementations you have access to..."?

You could have simply kept the old method with a deprecation annotation!
There could be other healthy workarounds as well.

BUT NO. BECAUSE OF THIS MESS, AFTER MIGRATING FROM 331 TO presto-348, OUR ENTIRE DATALAKE VANISHED.

Datalake schemas were populated based on "DESCRIBE ..." (aka information_schema.columns).
One of the most widely used access control systems is the Ranger access control.

Please have a look here:
https://github.com/apache/ranger/blob/5df512439c85199abacb84675b91d0b704fb61f0/plugin-presto/src/main/java/org/apache/ranger/authorization/presto/authorizer/RangerSystemAccessControl.java#L578

And here:
https://github.com/apache/ranger/blob/5df512439c85199abacb84675b91d0b704fb61f0/ranger-presto-plugin-shim/src/main/java/org/apache/ranger/authorization/presto/authorizer/RangerSystemAccessControl.java#L394

I'll file a detailed issue later, and create a JIRA card on Ranger.

Sorry for my anger; it came out of frustration, as we had to roll back just 3 hours after deploying.
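The "deprecation annotation" route suggested here could have looked roughly like the sketch below: keep the old signature as a deprecated default method that delegates to the new one. This uses hypothetical stand-in types, not the actual Trino SPI, and is not what the PR did:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DeprecatedBridge {
    // Stand-in column type; only the name is used.
    static class ColumnMetadata {
        final String name;
        ColumnMetadata(String name) { this.name = name; }
    }

    interface AccessControl {
        // New signature: plain column names in, allowed names out.
        Set<String> filterColumns(Set<String> columnNames);

        // Old signature kept for a transition window, delegating to the new one
        // so existing callers keep compiling (hypothetical bridge).
        @Deprecated
        default List<ColumnMetadata> filterColumns(List<ColumnMetadata> columns) {
            Set<String> allowed = filterColumns(columns.stream()
                    .map(column -> column.name)
                    .collect(Collectors.toSet()));
            return columns.stream()
                    .filter(column -> allowed.contains(column.name))
                    .collect(Collectors.toList());
        }
    }

    // Sample policy used purely for illustration: hide any column named "ssn".
    static final AccessControl SAMPLE = columnNames -> columnNames.stream()
            .filter(name -> !name.equals("ssn"))
            .collect(Collectors.toSet());
}
```

Note the limits of such a bridge: it keeps callers of the old method compiling, but a plugin that only overrides the old method would still silently stop being consulted, so it mitigates rather than eliminates the breakage.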

Member


Please be aware that the Presto plugin for Ranger was written some time ago and was never supported by PrestoSQL (now Trino). It will completely cease to work after Trino version 351 because of the package rename.
Both Trino and Ranger are open-source projects; you cannot expect them to always work with each other.
There are commercial Ranger integrations that are well supported.

Member


@pPanda-beta
I understand the problem you are facing. I guess it took some time to figure out the cause, and I am sorry you had to go through this.

Please see #3705 for a similar discussion in the past. Unfortunately, we cannot freeze the security APIs forever. They need to evolve as the project evolves.
This applies to all the APIs in the project, not only security. We try to avoid breaking changes.
Yes, for every single change you can think about how to support the previous version. Then you basically have two choices:

  • Support the "previous version" forever. The consequence is that these "previous versions" add up and significantly slow down development of the project. We believe this is not a sustainable model in the long run.
  • Remove the "previous version" eventually. I do not think this would be helpful, though. I would guess that the deprecation would go unnoticed until we eventually removed the "previous version", which brings us back to square one, except we spent more development cycles on it without producing any benefit to anyone.

Sadly, in a fast-paced project like this one, integrations need to be rigorously tested, and there is no exemption from that.
On the positive side, we make it easy to test against Trino (formerly Presto). For integration tests I recommend using the testcontainers library to spin up Trino in a Docker container.
testcontainers/testcontainers-java#3668 adds a pre-packaged container for Trino, and one already exists for Presto.

Hope this comment helps drive improvements for you in the long term.


@skrzypo987 At least I remember that last year, when I was part of the presto-ranger plugin development, it was done purely for prestosql (aka Trino).
We extended SystemAccessControl.
See this:
https://github.com/apache/ranger/blob/5df512439c85199abacb84675b91d0b704fb61f0/ranger-presto-plugin-shim/src/main/java/org/apache/ranger/authorization/presto/authorizer/RangerSystemAccessControl.java#L38-L43

Also, advanced features like row filtering and masking are only supported with prestosql <3
Prestodb, on the other hand, is still struggling to produce a production-ready Ranger plugin.


@findepi
"I would guess that the deprecation would go unnoticed only after we eventually remove the "previous version"" <- I would recommend a deprecation with a finite timeline (e.g. 6 months). This can be captured in build plugins: after 6 months, the build automatically shouts "Too-stale deprecated function found".

I do agree that Trino (fka prestosql) is a fast-paced project. But on the other hand, Ranger needs to move with a lot of other projects; they build plugins for many systems. So at least a couple of months of breathing room should have been given to us to fix this issue.

At Rapido, we already run integration tests using the presto-testing library.
We use io.prestosql.testing.DistributedQueryRunner and io.prestosql.testing.StandaloneQueryRunner for that; it takes around 8 seconds to boot the server.

Those are fast and a lot more configurable from a Java codebase, compared to the official Docker image, which involves lots of plugins and costs a boot-up time of over 40 seconds.

Member Author


Are you sure that you have checked all the

The wording here is a bit unfortunate. My point was to verify whether information from ColumnMetadata is used to determine access. The places you mention do not use that information either.

I would like to understand more about how you hit this issue. As I understand it, the change went in with 348. What version of Ranger did you use? What version of Trino (fka PrestoSQL) does it support? The SPI is a complicated contract: it not only requires a plugin to be compilable, but also to match Trino expectations that may vary from version to version in subtle ways. From what I understand, trying to compile that Ranger integration against Trino 348 would fail.

we had to roll back just 3hrs after deploying.

Please forgive me, I don't want to be impolite, but I guess you just found a place in your pipeline that is not well covered with tests. A place for improvement.


Okay, let me clear up some points here.

  1. Neither my team nor I maintain prestosql (aka Trino) or the presto-ranger plugin.
  2. As part of the community I contribute to Ranger, and here too I'm trying to help prestosql. The tests for the ranger-presto plugin are already written in the Ranger repo and maintained by the Ranger community. Neither I nor my organisation is involved in this.
  3. Since we are just integrating two already-tested products, i.e. prestosql-348 and the presto-ranger plugin, we don't write any tests to ensure their compatibility. We do write tests for the Presto-related products that we build and maintain, but not for others' products.
  4. I fully support the movement towards Trino, and my team is well aware that with Trino lots of things will break. But this change happened before Trino (27th Dec 2020) came into the picture. We decided to go with prestosql-348 instead of Trino, even though Trino releases were readily available, because we knew things would break with Trino. It would be better to keep the "Trino" renaming aside in this conversation.
  5. The mistake on my side is that we assumed presto-348 is fully backward compatible with prestosql SPI 331. We have other plugins that also use spi-331. We found most of the features were already working except a few, but for whichever feature was broken we got explicit errors. This one produced incorrect results without giving any error messages.

Some suggestions from my side:

  • If this had been done in a proper deprecation manner, we might have survived the mess.
  • If there is any breaking change (like an exposed public function/API change), it would be nice to mention it in the changelog / release notes. My team and I had read the changelogs; "...coherent with other filter methods.." never gave any hint of this big a break.
  • It is best to follow SemVer 2.0; frankly speaking, 331 -> 348 doesn't give a proper idea of what could be breaking.
  • For the SPI / any public REST API it is good to have a +/- 6-month compatibility and deprecation window.
    • It gives other community projects enough buffer to move at their own pace.
    • Consumers will have more confidence and fearlessly migrate to the next version. Thinking about the breaking changes, we dropped the plan to migrate to "Trino"; now it seems we have to drop the plan for presto-348 as well. I don't want to justify it, but maybe because of those fears Qubole R60 (latest) is stuck at Prestosql 317 (since 2020-Q1).

How much do we use Trino (fka prestosql)?

Well, we are one of the successful Presto-on-Kubernetes adopters with a "pay as you go" model. I can proudly say our Presto Helm chart can compete with enterprise prestosql distributions like Qubole.
Apart from load-based auto-scaling, we also have resource groups, GCS-based query logging, Ranger integration, and Atlas integration. We maintain multiple Presto clusters, and during releases we use presto-gateway to do a proper blue-green deployment. If progress continues like this, then I'm very sorry to say we may not be able to upgrade Trino (fka prestosql) more than once every 6 months. Frankly, we either need more confidence in backward compatibility or explicit mention of broken features in the changelog.

Member


we assumed presto-348 is fully backward compatible with prestosql spi 331. We have other plugins also which are using spi-331. We found most of the features were already working except a few. But whichever feature was broken we got explicit errors. This one produced incorrect results without giving any error messages.

Not strictly related, but there is a potential for silent incorrect results unless the plugin is updated; see https://trino.io/docs/current/release/release-341.html#spi-changes
Hope that helps.

If there is any breaking change (like exposed public function/api change), it would be nice to mention on the changelog / release notes. Me and my team had read the changelogs. "...coherent with other filter methods.." never gave any hint on this much big bug.

The good side is that we did not miss this in the release notes, but I see that was not helpful.
What kind of wording do you think would be helpful here?

@kokosing kokosing force-pushed the origin/master/266_pass_only_names branch from aa0de2c to 67f5233 Compare November 25, 2020 10:53
Comment on lines +151 to +153
columns.stream()
.map(ColumnMetadata::getName)
.collect(toImmutableSet()));
Member


you can use mappedCopy here :)
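The mappedCopy mentioned here is an internal Trino list utility; assuming it simply maps each element and returns an immutable copy, the three-line stream collapses to one call. A stdlib sketch of that shape:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MappedCopyExample {
    // Sketch of a mappedCopy-style helper (the real one lives in Trino's util package):
    // apply a function to each element and return an immutable copy.
    static <T, R> List<R> mappedCopy(List<T> list, Function<T, R> mapper) {
        return list.stream()
                .map(mapper)
                .collect(Collectors.toUnmodifiableList());
    }
}
```

With that helper, the snippet above becomes `mappedCopy(columns, ColumnMetadata::getName)` (collected into a set in the real code).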

Comment on lines +156 to +158
columns.stream()
.filter(column -> allowedColumns.contains(column.getName()))
.collect(toImmutableList()));
Member


you can use filteredCopy that @kokosing added for reuse :)
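Assuming filteredCopy (added in this PR, per the comment above) keeps the matching elements and returns an immutable copy, a stdlib sketch of the same shape:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilteredCopyExample {
    // Sketch of a filteredCopy-style helper: keep elements matching the
    // predicate and return an immutable copy, preserving order.
    static <T> List<T> filteredCopy(List<T> list, Predicate<T> keep) {
        return list.stream()
                .filter(keep)
                .collect(Collectors.toUnmodifiableList());
    }
}
```

The snippet above would then read `filteredCopy(columns, column -> allowedColumns.contains(column.getName()))`.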

Comment on lines +238 to +239
.filter(column -> !restrictedColumns.contains(column))
.collect(toImmutableSet());
Member


This can be further simplified with com.google.common.collect.Sets#difference

(separate commit, if you decide to do so)
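Guava's Sets.difference returns an unmodifiable live view of the first set minus the second, so the stream-and-collect becomes a one-liner (wrapped in ImmutableSet.copyOf if a detached copy is needed). A stdlib equivalent of the copying behavior:

```java
import java.util.HashSet;
import java.util.Set;

public class SetDifferenceExample {
    // Stdlib equivalent of copying Sets.difference(columns, restrictedColumns):
    // all elements of the first set that are not in the second, as an immutable set.
    static <T> Set<T> difference(Set<T> all, Set<T> excluded) {
        Set<T> result = new HashSet<>(all);
        result.removeAll(excluded);
        return Set.copyOf(result);
    }
}
```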

Comment on lines +588 to +589
.filter(column -> !restrictedColumns.contains(column))
.collect(toImmutableSet());
Member


This can be further simplified with com.google.common.collect.Sets#difference

(separate commit, if you decide to do so)

@kokosing kokosing force-pushed the origin/master/266_pass_only_names branch from 67f5233 to 03fbdaa Compare November 25, 2020 20:03
@kokosing kokosing merged commit 28bc79b into trinodb:master Nov 26, 2020
@kokosing kokosing deleted the origin/master/266_pass_only_names branch November 26, 2020 09:21
@kokosing kokosing mentioned this pull request Nov 26, 2020
9 tasks
@kokosing kokosing added this to the 348 milestone Nov 26, 2020