Unexpected behavior: Columns referenced by access control rules are always included in eventlistener.TableInfo.columns #21600

Open
eeshugerman opened this issue Apr 17, 2024 · 3 comments

eeshugerman commented Apr 17, 2024

Scenarios

1) SELECTing an unmasked column

Given

  • An event listener plugin is installed which implements queryCompleted(QueryCompletedEvent queryCompletedEvent)
  • A table (my_table) has a "mask" access control rule applied to a column (my_masked_column), where the rule expression references another column (my_rule_input_column) in the table
  • A SQL statement referencing a third column is executed: SELECT my_selected_column FROM my_table;

Expected behavior

In the event listener plugin, queryCompletedEvent.getMetadata().getTables().get(0).getColumns() includes only my_selected_column.

Actual behavior

getColumns() includes two columns: my_selected_column and my_rule_input_column.

2) SELECTing a masked column

Given

  • An event listener plugin is installed which implements queryCompleted(QueryCompletedEvent queryCompletedEvent)
  • A table (my_table) has a "mask" access control rule applied to a column (my_selected_column), where the rule expression references another column (my_rule_input_column) in the table
  • A SQL statement referencing the masked column is executed: SELECT my_selected_column FROM my_table;

Expected behavior

In the event listener plugin, queryCompletedEvent.getMetadata().getTables().get(0).getColumns() includes only my_selected_column.

Actual behavior

getColumns() includes two columns: my_selected_column and my_rule_input_column.

3) SELECTing from a filtered table

Given

  • An event listener plugin is installed which implements queryCompleted(QueryCompletedEvent queryCompletedEvent)
  • A table (my_table) has a "filter" access control rule applied to it, where the rule expression references a column (my_rule_input_column) in the table
  • A SQL statement referencing another column in the table is executed: SELECT my_selected_column FROM my_table;

Expected behavior

In the event listener plugin, queryCompletedEvent.getMetadata().getTables().get(0).getColumns() includes only my_selected_column.

Actual behavior

getColumns() includes two columns: my_selected_column and my_rule_input_column.
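
The check that all three scenarios describe could be sketched roughly as follows. This is only an illustration: the ColumnInfo and TableInfo records here are hypothetical stand-ins for the SPI classes of the same name (so the snippet compiles without Trino on the classpath), and unexpectedColumns is a helper name made up for this example. In a real plugin the same loop would presumably run inside queryCompleted over the tables returned by queryCompletedEvent.getMetadata().getTables().

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ColumnReport {
    // Hypothetical stand-ins for the SPI's ColumnInfo and TableInfo (assumption, not the real classes).
    record ColumnInfo(String column) {}
    record TableInfo(String table, List<ColumnInfo> columns) {}

    // Returns the reported column names that the SQL statement did not select.
    static Set<String> unexpectedColumns(TableInfo table, Set<String> selected) {
        Set<String> extra = new TreeSet<>();
        for (ColumnInfo c : table.columns()) {
            if (!selected.contains(c.column())) {
                extra.add(c.column());
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        // What the listener currently receives in each scenario above.
        TableInfo reported = new TableInfo("my_table", List.of(
                new ColumnInfo("my_selected_column"),
                new ColumnInfo("my_rule_input_column")));
        // The statement only selects my_selected_column.
        System.out.println(unexpectedColumns(reported, Set.of("my_selected_column")));
    }
}
```

With the behavior described above, the set is non-empty (it contains my_rule_input_column); with the expected behavior it would be empty.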

Discussion

To me, scenario (1) definitely looks like a bug. I see no argument for how the current behavior makes sense or is otherwise desirable.

Scenarios (2) and (3) are a bit murkier. On the one hand, the current behavior is not what I would expect, nor does it meet the requirements of the plugin I'm building. On the other hand, in these cases the result of the query depends on my_rule_input_column, so I can see an argument that the current behavior makes sense.

If it is decided that the current behavior is the desired behavior for scenarios (2) and (3), my next question would be: What if an event listener plugin wants to know exactly which columns are referenced explicitly (or via views) by a SQL statement? Can we add another field for that?

Full reproducible example

This example demonstrates scenarios (1) and (3).

With this rules.json:

{
  "catalogs": [
    {
      "catalog": ".*",
      "allow": "all"
    }
  ],
  "tables": [
    {
      "catalog": "tpch",
      "schema": "tiny",
      "table": "customer",
      "privileges": ["SELECT"],
      "columns": [
        {
          "name": "acctbal",
          "mask": "nationkey + 1.0"
        }
      ],
      "filter": "custkey > 800"
    },
    {
      "catalog": "tpch",
      "schema": "tiny",
      "table": ".*",
      "privileges": ["SELECT"]
    }
  ]
}

Then, for this query:

select name from tpch.tiny.customer limit 10;

I see this in the web UI at /ui/api/query/{queryId}?pretty:

  "referencedTables" : [
    {
      "catalog": "tpch",
      "schema": "tiny",
      "table": "customer",
      "authorization": "jeff@immuta.com",
      "filters": [
        "(custkey > 800)"
      ],
      "columns": [
        {
          "column": "nationkey"
        },
        {
          "column": "custkey"
        },
        {
          "column": "name"
        }
      ],
      "directlyReferenced": true
    }
  ],

Whereas I would expect to see only name in the columns array.
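
To make explicit where the two extra columns come from, here is a rough sketch that matches each reported column against the rule expressions from rules.json. The class name, the attribute helper, and the rule labels are made up for this example, and the whole-word text match is a simplification I'm assuming is good enough for these two expressions; it is not how Trino analyzes them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class ExtraColumnSources {
    // For each reported column, lists the labels of the rules whose expression mentions it as a whole word.
    static Map<String, List<String>> attribute(List<String> reported, Map<String, String> rules) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String column : reported) {
            Pattern word = Pattern.compile("\\b" + Pattern.quote(column) + "\\b");
            List<String> sources = rules.entrySet().stream()
                    .filter(e -> word.matcher(e.getValue()).find())
                    .map(Map.Entry::getKey)
                    .toList();
            result.put(column, sources);
        }
        return result;
    }

    public static void main(String[] args) {
        // Rule expressions copied from the rules.json above; the labels are illustrative.
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("mask on acctbal", "nationkey + 1.0");
        rules.put("filter", "custkey > 800");
        // Columns reported under referencedTables for the query above.
        System.out.println(attribute(List.of("nationkey", "custkey", "name"), rules));
    }
}
```

For the output above, nationkey is mentioned only by the mask on acctbal (a column the query never selects, i.e. scenario 1), custkey only by the filter (scenario 3), and name by neither rule.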

@kokosing
Member

@Praveen2112 Can you please take a look?

@Praveen2112
Member

@kokosing Okay

Praveen2112 self-assigned this Apr 18, 2024

@eeshugerman
Author

Hi @Praveen2112, any update on this?
