Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update StatementAnalyzer and extend AccessControl interface and to filter columns from table schema instead of throwing #7893

Closed
wants to merge 1 commit into from

Conversation

youngchen7
Copy link
Contributor

@youngchen7 youngchen7 commented May 12, 2021

Initial PR for AccessControl interface changes, initial implementations, and StatementAnalyzer integration.

Will update specific AccessControl implementations (possibly FileBased) to support this feature.

Fixes #7461

@cla-bot
Copy link

cla-bot bot commented May 12, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Young Chen.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot
Copy link

cla-bot bot commented May 12, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Young Chen.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests.

@cla-bot
Copy link

cla-bot bot commented May 26, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Young Chen.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot
Copy link

cla-bot bot commented May 26, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@youngchen7
Copy link
Contributor Author

Cleaned up the git commit history a bit to remove non github emails - just submitted the CLA as well.

@youngchen7
Copy link
Contributor Author

Please add tests.

Added some unit tests similar to rowFilter and columnMask. However, one concern I have is the interaction with INSERT/UPDATE/DELETE.

We would typically expect the AccessControl implementation to restrict access to INSERT/UPDATE/DELETE if a user doesn't have full access to the schema. However, it's possible for this to not be implemented so we would probably want a defensive check in the StatementAnalyzer.

Perhaps this would be better implemented via getTableSchemaFilteredColumns or getFilteredColumns similar to how getRowFilter and getColumnMask are done. This would let us easily check access via .isEmpty() and add restrictions in the engine itself.

Or could there be a use case for inserting partial data into a table and having the inaccessible columns set to default? It seems like this would be unexpected/error prone behavior for the most part in my opinion.

What are your thoughts?

@cla-bot
Copy link

cla-bot bot commented Jun 2, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@martint martint self-requested a review June 2, 2021 14:59
@cla-bot
Copy link

cla-bot bot commented Jun 4, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

1 similar comment
@cla-bot
Copy link

cla-bot bot commented Jun 4, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@martint
Copy link
Member

martint commented Jun 11, 2021

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Jun 11, 2021
@cla-bot
Copy link

cla-bot bot commented Jun 11, 2021

The cla-bot has been summoned, and re-checked this pull request!

@@ -222,10 +222,15 @@
void checkCanShowColumns(SecurityContext context, CatalogSchemaTableName table);

/**
* Filter the list of columns to those visible to the identity.
* Filter the list of columns to those visible to the identity when querying metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a better explanation. What's the difference between "when querying metadata" vs "when querying the table"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filterColumns hides table metadata when running DESCRIBE or SHOW TABLE, as well as when querying the INFORMATION_SCHEMA tables. However, when actually selecting data from the table it doesn't hide the corresponding columns. I'm not really sure how else to describe this so any suggestions are welcome!

I also can't really think of a practical reason why we would want to control these separately. @kokosing mentioned that it would be beneficial to be able to hide column metadata & throw errors on access denied, vs hiding columns completely so that's why I'm leaving it here.

(Slack link: https://trinodb.slack.com/archives/CP1MUNEUX/p1620418412085600?thread_ts=1618261084.236200&cid=CP1MUNEUX)

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash commits.

This looks good to me. I would like to hear from others if they like it. CC: @martint @findepi

@findepi
Copy link
Member

findepi commented Jun 22, 2021

This looks good to me. I would like to hear from others if they like it. CC: @martint @findepi

if you ask about my opinion, I can only repeat that for me this shouldn't be in AccessControl at all.
#7461 (comment)
https://trinodb.slack.com/archives/CP1MUNEUX/p1620422082086900?thread_ts=1618261084.236200&cid=CP1MUNEUX

@youngchen7
Copy link
Contributor Author

Please squash commits.

This looks good to me. I would like to hear from others if they like it. CC: @martint @findepi

Ah does the merge process squash commits? I can rebase and squash all on my branch but I figured having history here would be useful.

@kokosing
Copy link
Member

Ah does the merge process squash commits? I

We usually do Rebase and merge. Sometimes we do Squash and merge in case where external contributors are not well familiar with git, so we could save communication round trips.

I figured having history here would be useful.

That depends on history kind. We care a lot about the history, so this is why how change is structured between commits and what are the commit messages is also a part of code review. However we prefer not merge anything where change goes back and forth between commits in the same PR. This makes entire git history much cleaner. Notice that writing code is creative process and it is natural to got back and forth when writing the code, however when visiting history such small steps are not helpful.

@youngchen7 Would you like to add the above to https://github.com/trinodb/trino/blob/master/DEVELOPMENT.md (in separate PR)? I see it is not clear to the community members sometimes and it is not the first time I explain this. That way it will be easier for others to contribute.

@youngchen7
Copy link
Contributor Author

Ah does the merge process squash commits? I

We usually do Rebase and merge. Sometimes we do Squash and merge in case where external contributors are not well familiar with git, so we could save communication round trips.

I figured having history here would be useful.

That depends on history kind. We care a lot about the history, so this is why how change is structured between commits and what are the commit messages is also a part of code review. However we prefer not merge anything where change goes back and forth between commits in the same PR. This makes entire git history much cleaner. Notice that writing code is creative process and it is natural to got back and forth when writing the code, however when visiting history such small steps are not helpful.

@youngchen7 Would you like to add the above to https://github.com/trinodb/trino/blob/master/DEVELOPMENT.md (in separate PR)? I see it is not clear to the community members sometimes and it is not the first time I explain this. That way it will be easier for others to contribute.

Sounds good - I'll open a new PR for the DEVELOPMENT.md change. I'll squash the commits in this PR.

@youngchen7 youngchen7 force-pushed the trino-7461 branch 2 times, most recently from be3a750 to 2e0e909 Compare June 28, 2021 17:15
@youngchen7
Copy link
Contributor Author

@youngchen7 Would you like to add the above to https://github.com/trinodb/trino/blob/master/DEVELOPMENT.md (in separate PR)? I see it is not clear to the community members sometimes and it is not the first time I explain this. That way it will be easier for others to contribute.

#8411

@martint martint self-requested a review June 28, 2021 20:36
/**
* Filter the list of columns to those visible to the identity when querying the table.
*/
Set<String> filterTableSchema(SecurityContext context, QualifiedObjectName tableName, Set<String> columns);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this method different from filterColumns above?

  • Why do we need two methods?
  • What's the occasion where SELECT * would want to give you different set of columns than SHOW COLUMNS FROM ...

@kokosing
Copy link
Member

I think we should continue with #9991, the simple feature toogle is easier to reason than changing the access control I think.

@findepi
Copy link
Member

findepi commented Nov 29, 2021

I think we should continue with #9991, the simple feature toogle is easier to reason than changing the access control I think.

@kokosing to me, it's not about simplicity, but what's the right thing to do.
per #7893 (comment) we shouldn't change AccessControl at all.

@bitsondatadev
Copy link
Member

👋 @youngchen7 - this PR is inactive and doesn't seem to be under development, and it might already be implemented in #7461. If you'd like to continue work on this at any point in the future, feel free to re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Extend AccessControl functionality to filter columns from SELECT * instead of throwing
5 participants