
bigquery dataset view access creates circular dependency #2686


@davehowell

davehowell commented Dec 17, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.11
provider.google v1.19.1

Affected Resource(s)

google_bigquery_dataset
google_bigquery_table

Terraform Configuration Files

resource "google_bigquery_dataset" "public" {
  dataset_id    = "public"
  access {
    role           = "READER"
    group_by_email = "${var.external_group_email}"
  }
}

resource "google_bigquery_dataset" "private" {
  dataset_id    = "private"
// This creates a circular dependency:
// the private dataset grants access to public-dataset views, which select from
// private-dataset tables, which in turn depend on the private dataset.
// It should be possible to do this separately, e.g. using an IAM binding:
  access {
    view {
      project_id = "${var.project}"
      dataset_id = "public"
      table_id   = "a_view_that_selects_from_private_tables"
    }
  }
}

resource "google_bigquery_table" "real_table" {
  dataset_id  = "${google_bigquery_dataset.private.dataset_id}"
  table_id    = "real_table"
  schema = "${file("schema_real_table.json")}"
}

resource "google_bigquery_table" "filtered_real_table" {
  dataset_id  = "${google_bigquery_dataset.public.dataset_id}"
  table_id    = "filtered_real_table"
  view {
    query = <<SQL
SELECT some columns
FROM `${var.project}`.private.real_table
WHERE user_email = SESSION_USER()
SQL
    use_legacy_sql = "false"
  }

 // As a side note, an explicit dependency seems to be required here because Terraform can't be expected to parse the inline BigQuery SQL view definition and infer references to other bigquery table resources.
 depends_on = ["google_bigquery_table.real_table"]
}

Debug Output

Error: Error applying plan:
2 error(s) occurred:

  • google_bigquery_table.filtered_real_table: googleapi: Error 404: Not found: Dataset project_example:private, notFound

  • google_bigquery_dataset.private: 1 error(s) occurred:

  • google_bigquery_dataset.private: googleapi: Error 400: View project_example:public.filtered_real_table to authorize not found., invalid

Panic Output

NA

Expected Behavior

Either the BigQuery dataset's access entries for view access should be deferred until after the datasets, tables, and views are created, or we need an explicit new resource for adding view access to datasets, called something like google_bigquery_dataset_access_entries.

Actual Behavior

The view can't be created because it depends on the other dataset, and that other dataset can't be created because it depends on the view.

A circular dependency exists.

Steps to Reproduce

Run the example provided above.

Note that the example is incomplete: it needs a var.project and an existing GCP project. I'm not defining projects in tf, although you could modify the example to also create a project.

It also uses var.external_group_email to illustrate the use case; however, that access entry on the public dataset can simply be deleted, as it's not related to the error.

Important Factoids

Is this a bug, or are the docs wrong and this just isn't supported for my use case? I think it is the former, based on what other GCP clients allow (see the Python example below).

In the documentation for google_bigquery_dataset it says:

(Optional) A view from a different dataset to grant access to. Queries executed against that view will have read access to tables in this dataset.

In the test defined for this, testAccBigQueryDatasetWithViewAccess, the view does not actually reference tables in the dataset it is granted access to; it references what appears to be another project: SELECT state FROM [lookerdata:cdc.project_tycho_reports]

I'm not sure it is possible, with the current setup, to create a dataset with view access for a view that "will have read access to tables in this dataset" without creating a circular dependency, as I have experienced.

I think the access entries need to be applied separately.

In my example this would result in a DAG with this dependency order (see the sketch after this list):

  1. private dataset
  2. private table(s)
  3. public dataset
  4. public view(s)
  5. private dataset's access entries for the public view(s)
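
As a sketch only (the resource name and schema below are hypothetical, not an existing provider resource), step 5 could then look something like:

resource "google_bigquery_dataset_access_entries" "private_view_access" {
  # Hypothetical resource, for illustration of the proposed DAG only.
  dataset_id = "${google_bigquery_dataset.private.dataset_id}"

  view {
    project_id = "${var.project}"
    dataset_id = "${google_bigquery_dataset.public.dataset_id}"
    table_id   = "${google_bigquery_table.filtered_real_table.table_id}"
  }
}

Because this resource references the view, rather than the dataset referencing it inline, Terraform's graph would naturally order it after everything else.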

References

#2051 is a request to add IAM bindings and other IAM things for bigquery, which is somewhat related. View access to a dataset is not part of IAM though.

Test for the bigquery_dataset view access : https://github.com/terraform-providers/terraform-provider-google/blob/2.0.0/google/resource_bigquery_dataset_test.go

#644 original issue requesting this access control functionality

Functionality added in this PR: #1931

Workaround

I am using this Python script to add the access entries after the datasets, views, and tables are created. Sorry it's not golang:

In your tf file, create a data "external" resource, and make sure it has explicit depends_on entries for all the views you want access entries created for, and also for the dataset the views need access to.

  • This will apply permissions to ALL views in the dataset passed in via the query.public attribute. If you want to grant access only to specific views, you'll have to modify the script to accept a list in the query parameter.

  • If you define access blocks in the dataset (e.g. for roles or special groups), then subsequent runs of this workaround will conflict: the dataset's resource definition removes the view access, which the Python script then re-adds. That is, it causes permission changes even though technically nothing has changed. One potential workaround is a lifecycle { ignore_changes = ["access"] } block in the dataset resource, as shown below. That ignore would need to be temporarily removed when you do want to make real access changes.
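
A minimal sketch of that lifecycle workaround, assuming the private dataset resource from the example above (other arguments elided):

resource "google_bigquery_dataset" "private" {
  dataset_id = "private"

  # Ignore access entries, which the external script manages out-of-band.
  # Remove this temporarily to make real access changes through Terraform.
  lifecycle {
    ignore_changes = ["access"]
  }
}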

data "external" "private_dataset_view_access" {
  program = ["python", "${path.module}/bq_private_dataset_access_entries.py"]

  query = {
    gcp_project = "${var.project}"
      private = "${google_bigquery_dataset.private.dataset_id}" # the dataset_id that views required access to
    public = "${google_bigquery_dataset.public.dataset_id}"  # the dataset where all the views exist
  }

  depends_on = ["google_bigquery_dataset.private", "google_bigquery_table.filtered_real_table"]
}

bq_private_dataset_access_entries.py

#!/usr/bin/env python
"""
Grant permission to all the SQL views in a public Bigquery dataset to a private dataset.
e.g. all views in dataset `public` can access underlying tables that the views use,
even though the underlying tables exist in another dataset called `private`.

Note: BigQuery calls these `datasets`, though really they are more like separate catalogs/databases.
Permissions will be overwritten!

See
https://cloud.google.com/bigquery/docs/access-control#overview
https://cloud.google.com/bigquery/docs/share-access-views

"""

import sys
import os
import json
from google.cloud import bigquery

# Global defaults for testing this script via direct invocation
DEV_PROJECT = 'dev_gcp_project_name_here_for_testing'
PRIVATE_DATASET = 'private'
PUBLIC_DATASET = 'public'

current_dir = os.path.dirname(__file__)

# Parse the query from terraform
env = None
if len(sys.argv) > 1:
    env = sys.argv[1]
    if env not in ('--dev',):
        raise SyntaxError("cli args expected --dev")
    private_dataset_id = PRIVATE_DATASET
    public_dataset_id = PUBLIC_DATASET
    if env == '--dev':
        gcp_project = DEV_PROJECT
else:
    query = json.load(sys.stdin)
    # Required args from terraform
    gcp_project = query['gcp_project']
    private_dataset_id = query['private']
    public_dataset_id = query['public']

client = bigquery.Client(project=gcp_project)

# Get the private dataset
private_dataset_ref = client.dataset(private_dataset_id)
private_dataset = bigquery.Dataset(private_dataset_ref)
private_dataset = client.get_dataset(private_dataset)

# Get the public dataset
public_dataset_ref = client.dataset(public_dataset_id)
public_dataset = bigquery.Dataset(public_dataset_ref)
public_dataset = client.get_dataset(public_dataset)

# Get all the public views
public_view_list_items = [
    view
    for view in client.list_tables(public_dataset_ref)
    if view.table_type == 'VIEW'
]

# Authorize all the views in the public dataset to access the private dataset
access_entries = private_dataset.access_entries
for public_view_item in public_view_list_items:
    public_view_ref = public_dataset.table(public_view_item.table_id)
    view = bigquery.Table(public_view_ref)
    view = client.get_table(view)
    access_entries.append(
        bigquery.AccessEntry(None, 'view', view.reference.to_api_repr())
    )

# Updating access entries overwrites the existing list, so de-duplicate before
# re-applying on subsequent runs. entity_id is a dict for view entries and a
# plain string for other entry types, so normalize it before hashing.
deduped_access_entries = list()
seen_access_entries = set()
for access_entry in access_entries:
    entity_id = access_entry.entity_id
    if isinstance(entity_id, dict):
        entity_key = frozenset(entity_id.items())
    else:
        entity_key = entity_id
    hashed_entry = hash((access_entry.role, access_entry.entity_type, entity_key))
    if hashed_entry not in seen_access_entries:
        seen_access_entries.add(hashed_entry)
        deduped_access_entries.append(access_entry)

private_dataset.access_entries = deduped_access_entries
private_dataset = client.update_dataset(
    private_dataset, ['access_entries'])  

# Output simplified result to Terraform.
dataset_json = private_dataset.to_api_repr()
return_value = dict()
return_value['projectId'] = dataset_json['datasetReference']['projectId']
return_value['datasetId'] = dataset_json['datasetReference']['datasetId']
return_value['access'] = json.dumps(dataset_json['access'])

json.dump(return_value, sys.stdout)
sys.stdout.write('\n')
@ghost ghost added the bug label Dec 17, 2018
@davehowell
Author

@chrismorgan-hb @danawillow I would really appreciate your comments, or close if this is not a bug, thanks.

@danawillow
Contributor

I think having it as a separate resource is going to be the cleanest solution to this. When Terraform is running, it creates the dependency graph and then creates/updates each resource individually in order. So, if you were to create everything without the view and then manually edit your config to add the view later, that should work just fine (as long as you aren't introducing a circular dependency via interpolation). However, I agree that users should be able to set up their infrastructure by running Terraform just once, so this should be fixed. Since this seems like something that can be worked around, I don't think we can realistically prioritize fixing it immediately, but if you open a PR for the fix someone will look at it.

@davehowell
Author

davehowell commented Dec 17, 2018

Thank you Dana for your comments. If I get around to learning golang then sure, I would love to submit a PR, but realistically that's unlikely in the short term. In the meantime, yes, I've worked around this using the Python script above, called from a data.external resource with explicit depends_on defined. I've added it to the issue description above in case others find it a useful workaround; perhaps a note should also be made in the docs that the access block currently doesn't work as intended for views.

I agree it's probably the least common use case for adding access, so not a high priority, especially given the workaround.

One important thing to note, if anyone does get around to this, is that "access entries" for BigQuery datasets are overwrite-only. For example, if you define a dataset with access for a bunch of things, and then later try to update it with view access, it will overwrite the access list with just the views. The script above accounts for this by pre-fetching the existing entries before appending the view access and overwriting.

@nat-henderson
Contributor

The "overwrite-only" nature won't be a problem - that's a common pattern in bigquery IAM. Added the help-wanted label to solicit PRs for this one. :)

@davehowell
Author

One side effect I found when I "create everything without the view and then manually edit your config to add the view later" is that Terraform ends up always detecting a change in the dataset resource.
I assume this is the order of things, e.g. in a dataset with 1 access entry defined in the google_bigquery_dataset and 1 defined in a data.external script:

  1. terraform apply modifies the real-world infrastructure to have 1 access entry as defined in google.bigquery_dataset
  2. The state file is updated to reflect that the dataset has 1 access entry
  3. The script re-adds the 2 access entries
  4. The state file still says it has 1 access entry

The next time I run terraform apply, it:

  1. does a refresh against the real-world infrastructure and sees 2 access entries, compared to the 1 in the state file
  2. changes the real-world infrastructure back to 1 access entry
  3. the script then re-adds the second access entry

It's not a huge deal, but it does result in a short window during which view access is revoked, which is not ideal. For now I've added a lifecycle { ignore_changes = ["access"] } block to the dataset resource, but it's pretty agricultural.

@danawillow
Contributor

That particular behavior is intended: one of Terraform's features is detecting drift. So if you create the resource with Terraform, change it outside of Terraform later, and then run Terraform again, it'll try to change it back to what you've specified in your config.

@Tony-Proum
Contributor

While trying to figure out how to solve this issue, I found hashicorp/terraform#4149.
It made me think the two could be related, and that this circular dependency may not be easily solvable without "Partial/Progressive Configuration Changes" in the Terraform core project.

@Tony-Proum
Contributor

Maybe this Terraform Cloud feature could also help: https://www.hashicorp.com/blog/creating-infrastructure-pipelines-with-terraform-cloud-run-triggers/ ?

@hden

hden commented Apr 3, 2020

Now that GoogleCloudPlatform/magic-modules#3312 is merged, can we move forward with fine-grained access control such as #5520?

@danawillow
Contributor

@hden, that PR added fine-grained access control.

@hden

hden commented Apr 3, 2020

I meant as a Terraform resource, or did I miss the documentation somewhere?

@danawillow
Contributor

Once it gets released (it should appear in version 3.17.0), it'll be usable as a google_bigquery_dataset_access resource. Here's a preview of what the docs page will look like: https://github.com/terraform-providers/terraform-provider-google/blob/master/website/docs/r/bigquery_dataset_access.html.markdown
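
Based on that docs preview, the authorized view from this issue could then be expressed roughly as follows (resource and variable names adapted from the example above; treat this as a sketch until the release ships):

resource "google_bigquery_dataset_access" "private_view_access" {
  # Grants the public view read access to the private dataset,
  # without declaring the access inline on the dataset itself.
  dataset_id = "${google_bigquery_dataset.private.dataset_id}"

  view {
    project_id = "${var.project}"
    dataset_id = "${google_bigquery_dataset.public.dataset_id}"
    table_id   = "${google_bigquery_table.filtered_real_table.table_id}"
  }
}

Since the grant lives in its own resource, Terraform can create the datasets, tables, and view first and apply the access entry last, breaking the cycle.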

@mbeken

mbeken commented Apr 10, 2020

Hi - just curious, is there an expected release date for 3.17.0?
Thanks

@danawillow
Contributor

Monday!

@mbeken

mbeken commented Apr 10, 2020

That is awesome! I could have guessed that based on your history of releasing weekly! Big impediment removed! Thank you.

@ghost

ghost commented May 3, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators May 3, 2020