Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AccessControl functionality to filter columns from SELECT *… #9991

Merged
merged 1 commit into from
Dec 16, 2021

Conversation

rodolfototaro
Copy link

@rodolfototaro rodolfototaro commented Nov 18, 2021

I Implemented the feature without modify the access control and making it configurable (by default is off).
The implementation allow also to create a rowfilter on a column that is inaccessible. This could be useful when you need to partition data between user but you would not to show the discrimination column.

fixes #7461

@cla-bot
Copy link

cla-bot bot commented Nov 18, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@cla-bot cla-bot bot added the cla-signed label Nov 23, 2021
@findepi findepi changed the title #7461 Use AccessControl functionality to filter columns from SELECT *… Use AccessControl functionality to filter columns from SELECT *… Nov 25, 2021
@findepi
Copy link
Member

findepi commented Nov 25, 2021

@rtotarocuebiq how does this PR relate to @youngchen7's #7893?

@rodolfototaro
Copy link
Author

rodolfototaro commented Nov 25, 2021

@findepi It is an alternative implementation that does not need to change the access control signature, so there is not an alternative method to "filterColumns". Moreover it cover the usecase when you need to partition data between user but you would not to show the discrimination column.

@findepi
Copy link
Member

findepi commented Nov 25, 2021

Moreover it cover the usecase when you need to partition data between user but you would not to show the discrimination column.

can you elaborate on that use-case, or exemplify?

@rodolfototaro
Copy link
Author

rodolfototaro commented Nov 25, 2021

Moreover it cover the usecase when you need to partition data between user but you would not to show the discrimination column.

can you elaborate on that use-case, or exemplify?

The need is to have columns that contain technical information related

  • to the owner of the data (tenant-id or user-id)
  • to the goal for which data can be used (related to data license)

so you want give access to a user just to a subset of data but at the same time you would not to show those columns to users. Will exist also users that have right for accessing to the whole table.
So you need to apply the row filters depending by logged user on inaccessibile columns.

@findepi
Copy link
Member

findepi commented Nov 26, 2021

So you need to apply the row filters depending by logged user on inaccessibile columns.

That's a reasonable requirement, and should be possible today, regardless of SELECT * expansion with respect to inaccessible columns.

@rtotarocuebiq do we have existing tests for that?
cc @kokosing

@rodolfototaro
Copy link
Author

@findepi yes in core/trino-main/src/test/java/io/trino/sql/query/TestFilterInaccessibleColumns.java method testRowFilterOnNotAccessibleColumn

@@ -67,6 +67,7 @@
public static final String QUERY_MAX_EXECUTION_TIME = "query_max_execution_time";
public static final String QUERY_MAX_PLANNING_TIME = "query_max_planning_time";
public static final String QUERY_MAX_RUN_TIME = "query_max_run_time";
public static final String QUERY_HIDE_INACCESSIBLE_COLUMNS = "query_hide_inaccessible_columns";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be exposed as a session property

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any suggestions about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokosing i understand you're concerned about the case where inaccessible columns are hidden with a config, but session property overrides and makes them visible (although still inaccessible) again. That would be a security problem

OTOH, i am thinking about the case where server uses default configuration, and user or group of users would want to leverage this new functionality. Providing session toggle would be nice.

I propose we do provide session toggle, but with validation preventing query_hide_inaccessible_columns be set to false, when config is true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is about the query semantic. I don't think user should be allowed to change what given query means at will. I think it may cause other issues. For example auditors may no longer know what give query means.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokosing I got your point, you prefer to set it at server config level. Can you give me suggestion on where add the property? (i.e. ServerConfig or AccessControlConfig) and which is the better way for accessing to the value from StatementAnalyzer ?

return hideUnaccessibleColumns;
}

@Config("query.hide-inaccessible-columns")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConfigDescription

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Config("query.filter-restricted-columns")?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like "hide" more.

also "restricted" could be understood as referring to inaccessible & masked columns, whereas we don't want to hide masked columns

@@ -67,6 +67,7 @@
public static final String QUERY_MAX_EXECUTION_TIME = "query_max_execution_time";
public static final String QUERY_MAX_PLANNING_TIME = "query_max_planning_time";
public static final String QUERY_MAX_RUN_TIME = "query_max_run_time";
public static final String QUERY_HIDE_INACCESSIBLE_COLUMNS = "query_hide_inaccessible_columns";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is about the query semantic. I don't think user should be allowed to change what given query means at will. I think it may cause other issues. For example auditors may no longer know what give query means.

tableColumnMap.asMap()
.forEach((key, value) -> {
//Allow to create a rowfilter on a not accessibile column
if (analysis.hasRowFilter(key, identity.getUser())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about column masking?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saving columns accessed via row filter for excluding them from the access control if them are not accessed also directly in the statement. I didn't see a similar usecase for masking. Do you have a specific use case in mind so that I can add a test about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Consider a case where not accessible columns are used for column masking for example:

CASE not_accessible_column
    WHEN 'allowed' THEN actual_column
    WHEN 'masked' THEN mask(actual_column)
    ELSE null
END

Please add tests for cases like that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, to support it I moved the filtering logic on io.trino.sql.analyzer.StatementAnalyzer.Visitor#analyzeRowFilter and io.trino.sql.analyzer.StatementAnalyzer.Visitor#analyzeColumnMask. Added test cases

@@ -3188,6 +3189,26 @@ private void analyzeSelectAllColumns(
}
}

private List<Field> filterInaccessibleFields(List<Field> fields)
{
if (SystemSessionProperties.isHideInaccesibleColumns(session)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invert condition and remove else block (keyword)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@rodolfototaro
Copy link
Author

I moved the property hide-inaccessible-columns on AccessControlConfig and aligned the pull request with master

tableColumnMap.asMap()
.forEach((key, value) -> {
//Allow to create a rowfilter on a not accessibile column
if (analysis.hasRowFilter(key, identity.getUser())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Consider a case where not accessible columns are used for column masking for example:

CASE not_accessible_column
    WHEN 'allowed' THEN actual_column
    WHEN 'masked' THEN mask(actual_column)
    ELSE null
END

Please add tests for cases like that.

field.getOriginColumnName().isEmpty() ||
field.getOriginTable().isEmpty() ||
field.isHidden() ||
!accessControl.filterInaccessibleColumns(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's define filterInaccessibleColumns in StatementAnalyzer. It seem that it is more semantic related thing than access.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

.filter(field ->
field.getOriginColumnName().isEmpty() ||
field.getOriginTable().isEmpty() ||
field.isHidden() ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need above conditions? getVisibleFields already checks if field is hidden. Also I would prefer to avoid other conditions to as they may change semantic of how things work today.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I removed the condition

@@ -3188,6 +3188,21 @@ private void analyzeSelectAllColumns(
}
}

private List<Field> filterInaccessibleFields(List<Field> fields)
{
ImmutableList<Field> authorizedFields = fields.stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. inline authorizedFields

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

return hideInaccesibleColumns;
}

@Config("access-control.hide-inaccessible-columns")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move it to FeaturesConfig and add @ConfigDescription

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I moved it in FeaturesConfig and make accessibile in Metadata

accessControl.deny(privilege("columns.data_type", SELECT_COLUMN));
// SHOW COLUMNS and DESCRIBE fail since data_type isn't visible
assertThatThrownBy(() -> assertions.query("SHOW COLUMNS FROM nation")).hasMessage("Access Denied: Cannot select from columns [ordinal_position, extra_info, table_schema, column_name, data_type, comment, table_name] in table or view local.information_schema.columns");
assertThatThrownBy(() -> assertions.query("DESCRIBE nation")).hasMessage("Access Denied: Cannot select from columns [ordinal_position, extra_info, table_schema, column_name, data_type, comment, table_name] in table or view local.information_schema.columns");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query should not fail. If you are able to query table then surely you should be able to to DESCRIBE it. Columns that got hidden should be simply filtered here. Same comment for SHOW COLUMNS FROM nation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added specific test for Describe and SHOW COLUMNS, this test check that are not accessible if I deny privilege on Information_schema

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this test. You are revoking access to a metadata table which should be always accessible. I think it does not make sense. Please have tests where you only manipulate access on test tables like nation.

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't put enough attention to analysis or statement analyzer yet. I would like first to reach a state where tests are expressing the thing we would like to have.

Notice that row filter and column masking is like a view, where the user given with it is like owner. We should still do access control checks for them. Please take a look how view access control works.

core/trino-main/src/main/java/io/trino/FeaturesConfig.java Outdated Show resolved Hide resolved
return hideInaccesibleColumns;
}

@Config("experimental.hide-inaccessible-columns")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have experimental. here? I think this features is not going to affect a lot of users as it will be disabled by default so there should be no change for them.

/**
* Return true if columns must be hidden if inaccessible, default value is false
*/
default boolean isHideInaccesibleColumns()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this simply belongs to StatementAnalyzer. It is about the syntax not metadata. Metadata is not changed when the toggle is enabled or not. We just handle it differently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

accessControl.deny(privilege("columns.data_type", SELECT_COLUMN));
// SHOW COLUMNS and DESCRIBE fail since data_type isn't visible
assertThatThrownBy(() -> assertions.query("SHOW COLUMNS FROM nation")).hasMessage("Access Denied: Cannot select from columns [ordinal_position, extra_info, table_schema, column_name, data_type, comment, table_name] in table or view local.information_schema.columns");
assertThatThrownBy(() -> assertions.query("DESCRIBE nation")).hasMessage("Access Denied: Cannot select from columns [ordinal_position, extra_info, table_schema, column_name, data_type, comment, table_name] in table or view local.information_schema.columns");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this test. You are revoking access to a metadata table which should be always accessible. I think it does not make sense. Please have tests where you only manipulate access on test tables like nation.

@rodolfototaro
Copy link
Author

I haven't put enough attention to analysis or statement analyzer yet. I would like first to reach a state where tests are expressing the thing we would like to have.

Notice that row filter and column masking is like a view, where the user given with it is like owner. We should still do access control checks for them. Please take a look how view access control works.

Thx for the suggestion I was misunderstanding the Identity meaning in the View Expression so now I simplified my PR a lot.

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks much better now. Actually no changes in analyzer.

return fields;
}
//collect fields by table
ListMultimap<QualifiedObjectName, Field> tableFieldsMap = Multimaps.newListMultimap(new HashMap<QualifiedObjectName, Collection<Field>>(), () -> new LinkedList<>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this that complex?

As I understand these fields always come from the same relation (maybe you do checkArgument to assert that), so do we need to use multimap here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of join query like:
SELECT * FROM nation,customer WHERE customer.nationkey = nation.nationkey AND nation.name = 'FRANCE' AND customer.name='Customer#000001477'
I receive here fields from multiple tables

and in case of queries like:
SELECT * FROM (select 'test')
SELECT * FROM (SELECT concat(name,'-test') FROM nation WHERE name = 'FRANCE')

I receive fields without OriginaleTable/OriginaleColumn
so I must manage also those cases.

I'll add tests for that.

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, few minor comments. Thank you! Please rebase and squash commits. Make sure that commit message is well formatted and explains that change you are posting. Also please add this new property to the docs.

private static void validateHideInaccesibleColumns(boolean value, boolean defaultValue)
{
if (defaultValue == true && value == false) {
throw new TrinoException(INVALID_SESSION_PROPERTY, "hide_inaccessible_columns cannot be set to false if default value is true");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not much user friendly error message, it is very technical. Maybe:

hide_inaccessible_columns cannot be disabled with session property when it was enabled with configuration

Also use HIDE_INACCESSIBLE_COLUMNS constant

Please make sure you have tests for this.

new NodeMemoryConfig(),
new DynamicFilterConfig(),
new NodeSchedulerConfig()));
Assertions.assertThatNoException().isThrownBy(() -> sessionPropertyManager.validateSystemSessionProperty(SystemSessionProperties.HIDE_INACCESSIBLE_COLUMNS, "true"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import?

Set<String> accessibleColumns = accessControl.filterColumns(
session.toSecurityContext(),
table.asCatalogSchemaTableName(),
tableFields.stream().map(field -> field.getOriginColumnName().get()).collect(Collectors.toSet()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please extract variable for column names, also please use toImmutableSet()

.collect(toImmutableList()));
});

//filter at the end for preserving the global order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the comment

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set<String> accessibleColumns = accessControl.filterColumns(
session.toSecurityContext(),
table.asCatalogSchemaTableName(),
tableFields.stream().map(field -> field.getOriginColumnName().get()).collect(toImmutableSet()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

tableFields.stream()
     .map(field -> field.getOriginColumnName().get())
     .collect(toImmutableSet()));

List<Field> accessibleFields = new ArrayList<>();

//collect fields by table
ListMultimap<QualifiedObjectName, Field> tableFieldsMap = Multimaps.newListMultimap(new HashMap<QualifiedObjectName, Collection<Field>>(), () -> new LinkedList<>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ListMultimap<QualifiedObjectName, Field> tableFieldsMap = ArrayListMultimap.create();

core/trino-main/src/main/java/io/trino/FeaturesConfig.java Outdated Show resolved Hide resolved
@@ -124,6 +124,18 @@ In addition, Trino :doc:`provides an API </develop/system-access-control>` that
allows you to create a custom access control method, or to extend an existing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mosabua Can you please double check wording in the docs here?

Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed for docs and suggested changes. Also need confirmation if this is a global property or a catalog property.

With hide-inaccessible-columns configuration toggle now it is
possible to exclude non accessible from  SELECT * statement
@kokosing kokosing merged commit e8bcfe2 into trinodb:master Dec 16, 2021
@kokosing kokosing mentioned this pull request Dec 16, 2021
8 tasks
@kokosing
Copy link
Member

Merged, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Extend AccessControl functionality to filter columns from SELECT * instead of throwing
4 participants