**Notes on defining a custom Apache Beam template and running it**
- content: build a custom Apache Beam data pipeline Flex Template and run it with
- reference link: https://github.com/GoogleCloudPlatform/python-docs-samples/tree/473749daea6b1f1d5a6a8826093a970f03ce0517/dataflow/flex-templates/streaming_beam


In [None]:
# List the Google Cloud services whose name contains [app_name], using the --filter flag.
# An unfiltered `gcloud services list --available` takes too long.
!gcloud services list --available --filter="name~[app_name]"

In [None]:
# Enable the required API services, such as App Engine and Cloud Scheduler
gcloud services enable $service_name

In [None]:
# Create a Cloud Storage bucket with gsutil mb
export BUCKET="your-gcs-bucket"
gsutil mb gs://$BUCKET

In [None]:
# Create a Pub/Sub topic and subscription with gcloud pubsub
export TOPIC="messages"
export SUBSCRIPTION="$TOPIC"

gcloud pubsub topics create $TOPIC
gcloud pubsub subscriptions create --topic $TOPIC $SUBSCRIPTION

In [None]:
# Create a Cloud Scheduler job named positive-ratings-publisher that publishes a message body to the Pub/Sub topic every minute
gcloud scheduler jobs create pubsub positive-ratings-publisher \
  --schedule="* * * * *" \
  --topic="$TOPIC" \
  --message-body='{"url": "https://beam.apache.org/", "review": "positive"}'
# Trigger the job immediately with the run command
gcloud scheduler jobs run positive-ratings-publisher

# Create a second scheduled job that publishes a negative rating to the same topic every two minutes
gcloud scheduler jobs create pubsub negative-ratings-publisher \
  --schedule="*/2 * * * *" \
  --topic="$TOPIC" \
  --message-body='{"url": "https://beam.apache.org/", "review": "negative"}'
# Trigger the job immediately
gcloud scheduler jobs run negative-ratings-publisher

In [None]:
# Create a BigQuery dataset with bq mk
# bq CLI reference: https://cloud.google.com/bigquery/docs/reference/bq-cli-reference
export PROJECT="$(gcloud config get-value project)"
export DATASET="beam_samples"
export TABLE="streaming_beam"

bq mk --dataset "$PROJECT:$DATASET"

In [None]:
# Build the Apache Beam container image with Cloud Build

# Configure Cloud Build to use Kaniko, which caches image layers between builds
gcloud config set builds/use_kaniko True
# Build the Docker image and push it to Container Registry
gcloud builds submit --tag "$dataflow_image" .

In [None]:
# Build the Flex Template with `gcloud dataflow flex-template build`. The command creates a template spec from the Docker image and stores it as a JSON file in Cloud Storage.
# metadata.json reference link: https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#metadata
# metadata.json defines the template's metadata, such as its name, description, and parameters
# TEMPLATE_PATH is the gs:// path of the template JSON file; the run step reads the same variable
gcloud dataflow flex-template build "$TEMPLATE_PATH" \
  --image "$dataflow_image" \
  --sdk-language "PYTHON" \
  --metadata-file "metadata.json"
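
The structure of metadata.json described above (name, description, parameters) can be sketched in Python. This is a minimal sketch, not the file from the referenced sample: the parameter names `input_subscription` and `output_table` come from the run command in these notes, while the template name, description, labels, and help texts are placeholder assumptions.

```python
import json


def build_metadata(name, description, parameters):
    """Return a dict shaped like a Flex Template metadata.json file."""
    return {
        "name": name,
        "description": description,
        "parameters": [
            {
                "name": p["name"],
                "label": p["label"],
                "helpText": p["help_text"],
                "isOptional": p.get("optional", False),
            }
            for p in parameters
        ],
    }


# Placeholder values: only the two parameter names are taken from the notes.
metadata = build_metadata(
    name="Streaming Beam (example)",
    description="Example template metadata for a Pub/Sub to BigQuery pipeline.",
    parameters=[
        {
            "name": "input_subscription",
            "label": "Input Pub/Sub subscription",
            "help_text": "Subscription to read messages from.",
        },
        {
            "name": "output_table",
            "label": "Output BigQuery table",
            "help_text": "Table to write results to.",
        },
    ],
)

# Serialize the metadata; writing this string to metadata.json is left to the caller.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Both parameters are marked as required here (`isOptional` is false), which is an assumption about how the pipeline should be launched.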

In [None]:
# Run the Flex Template with `gcloud dataflow flex-template run`.
# In our case the template will ultimately be launched from a Cloud Function.
export REGION="us-central1"

# Run the Flex Template.
gcloud dataflow flex-template run "streaming-beam-$(date +%Y%m%d-%H%M%S)" \
    --template-file-gcs-location "$TEMPLATE_PATH" \
    --parameters input_subscription="projects/$PROJECT/subscriptions/$SUBSCRIPTION" \
    --parameters output_table="$PROJECT:$DATASET.$TABLE" \
    --region "$REGION"
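
Because the template is meant to be launched from a Cloud Function, the same run can be expressed as a request body for the Dataflow `flexTemplates.launch` API method. The sketch below only builds the body. The project, bucket, and template path values are placeholders, and the commented-out client call is an assumption about how the function might send it with `googleapiclient`.

```python
from datetime import datetime, timezone


def build_launch_body(job_name, template_path, parameters):
    """Build the request body for projects.locations.flexTemplates.launch."""
    return {
        "launchParameter": {
            "jobName": job_name,
            "containerSpecGcsPath": template_path,
            "parameters": dict(parameters),
        }
    }


# Placeholder identifiers; substitute your own project, bucket, and names.
project = "my-project"
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
launch_body = build_launch_body(
    job_name=f"streaming-beam-{timestamp}",
    template_path="gs://your-gcs-bucket/templates/streaming-beam.json",
    parameters={
        "input_subscription": f"projects/{project}/subscriptions/messages",
        "output_table": f"{project}:beam_samples.streaming_beam",
    },
)
print(launch_body)

# Inside the Cloud Function the body could then be sent like this (not run here):
#   from googleapiclient.discovery import build
#   dataflow = build("dataflow", "v1b3")
#   dataflow.projects().locations().flexTemplates().launch(
#       projectId=project, location="us-central1", body=launch_body
#   ).execute()
```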

In [None]:
# Copy test_data.csv to the trigger bucket
gsutil cp ./data/test_data.csv gs://fce2845e810918fb-gcf-trigger-bucket/

**Using an individual service account for your functions**
- A Cloud Function can be deployed with its own individual service account, which it then uses for authentication.
- The individual service account is specified at deployment time.
- reference link: https://cloud.google.com/functions/docs/securing/function-identity#individual

**Security and permissions for pipelines on Google Cloud**
- goal: understand security and permissions for a Dataflow pipeline on Google Cloud
- a Dataflow pipeline uses two types of service account
    - Dataflow service account:
        - The Dataflow service uses this account to manage resources for the pipeline, for example creating VMs and assigning work to worker VMs.
        - Google creates and manages this account. Do not modify it.
        - format of the service account email: service-<project-number>@dataflow-service-producer-prod.iam.gserviceaccount.com
        - reference link: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#df-service-account
    - worker service account
        - The worker service account is used by the workers (Compute Engine VMs) to access your pipeline files and other resources.
        - The worker service account must have the following two roles to create, run, and examine a job:
            - roles/dataflow.admin
            - roles/dataflow.worker
        - default worker service account
            - By default, Dataflow workers authenticate as the Compute Engine default service account.
            - The Compute Engine default service account is created automatically when you enable the Compute Engine API.
            - Compute Engine default service account format: <project-number>-compute@developer.gserviceaccount.com
            - The Compute Engine default service account comes with predefined permissions that make authentication easier, but a custom service account is recommended in production for finer-grained access control.
        - specify a user-managed worker service account
            - A custom worker service account can be specified when the job is deployed. In our case that is the API call.
            - For details on the roles and permissions the user-managed worker service account needs, see the reference link: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker-service-account
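
The two email formats above, and the option to pick a user-managed worker service account in the API call, can be sketched as follows. This is a sketch under assumptions: the helper names are made up for illustration, the job name and template path are placeholders, and `serviceAccountEmail` is the field of the launch request's `environment` object that is assumed to carry the worker service account.

```python
import copy


def dataflow_service_account(project_number):
    """Email of the Google-managed Dataflow service account."""
    return (
        f"service-{project_number}"
        "@dataflow-service-producer-prod.iam.gserviceaccount.com"
    )


def compute_default_service_account(project_number):
    """Email of the Compute Engine default service account (default worker)."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def with_worker_service_account(launch_body, email):
    """Return a copy of a flexTemplates.launch body that names the worker SA."""
    body = copy.deepcopy(launch_body)
    environment = body["launchParameter"].setdefault("environment", {})
    environment["serviceAccountEmail"] = email
    return body


# Placeholder launch body; the job name and template path are hypothetical.
base_body = {
    "launchParameter": {
        "jobName": "streaming-beam-example",
        "containerSpecGcsPath": "gs://your-gcs-bucket/templates/streaming-beam.json",
    }
}

default_worker = compute_default_service_account(149838564778)
body = with_worker_service_account(base_body, default_worker)
print(body["launchParameter"]["environment"])
```

The helper copies the body before changing it, so the same base request can be reused with different service accounts.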

In [None]:
# Goals for today:
    # Specify a temp bucket for the Dataflow Flex Template. This should resolve the "permission denied" error when creating the temp directory.
    # Grant the default worker service account access to the buckets and the job.
    # For granting bucket access, see the reference link: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#accessing_gcs

# Grant access to the Compute Engine default service account (the Dataflow worker service account) using gsutil acl:
    # acl (access control list): a gsutil command that grants a specific account access to a specific bucket
    # reference for the gsutil acl command: https://cloud.google.com/storage/docs/gsutil/commands/acl

# Grant the worker service account access to the temp, source, and destination buckets
gsutil acl ch -u "$worker_sa:OWNER" "$bucket_one"
gsutil acl ch -u "$worker_sa:OWNER" "$bucket_two"
gsutil acl ch -u "$worker_sa:OWNER" "$bucket_three"

**Understanding the staging and temp locations**
- reference link: https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#understand_staging_location_and_temp_location
- The staging location is where files are written during the staging step.
- The temp location is where files are written during the execution step.
- Both locations can be specified when creating a Flex Template job, either with the CLI or with an API call.
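
For the API call, both locations would go into the `environment` object of the launch request. The sketch below assumes the `tempLocation` and `stagingLocation` field names of that object. The helper name, bucket paths, and job name are placeholders.

```python
import copy


def with_gcs_locations(launch_body, temp_location, staging_location):
    """Return a copy of a flexTemplates.launch body with temp/staging set."""
    for location in (temp_location, staging_location):
        # Both locations must be Cloud Storage paths.
        if not location.startswith("gs://"):
            raise ValueError(f"expected a gs:// path, got {location!r}")
    body = copy.deepcopy(launch_body)
    environment = body["launchParameter"].setdefault("environment", {})
    environment["tempLocation"] = temp_location
    environment["stagingLocation"] = staging_location
    return body


# Placeholder launch body and bucket paths.
base_body = {
    "launchParameter": {
        "jobName": "streaming-beam-example",
        "containerSpecGcsPath": "gs://your-gcs-bucket/templates/streaming-beam.json",
    }
}
body = with_gcs_locations(
    base_body,
    temp_location="gs://your-gcs-bucket/temp",
    staging_location="gs://your-gcs-bucket/staging",
)
print(body["launchParameter"]["environment"])
```

Pointing both locations at a bucket the worker service account can write to is the point of the exercise, since the permission error occurs when the temp directory is created.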

In [None]:
# Copy test_data.csv to the trigger bucket
gsutil cp ./data/test_data.csv gs://fce2845e810918fb-gcf-trigger-bucket/

googleapiclient.errors.HttpError: <HttpError 403 when requesting https://dataflow.googleapis.com/v1b3/projects/airflow-gke-338120-352104/locations/asia-southeast1/flexTemplates:launch?alt=json returned "(c3d2125d4ad0a248): Current user cannot act as service account 149838564778-compute@developer.gserviceaccount.com. Enforced by Org Policy constraint constraints/dataflow.enforceComputeDefaultServiceAccountCheck. https://cloud.google.com/iam/docs/service-accounts-actas Causes: (c3d2125d4ad0ac16): Current user cannot act as service account 149838564778-compute@developer.gserviceaccount.com. Please grant your user account one of [Owner, Editor, Service Account Actor] roles, or any other role that includes the iam.serviceAccounts.actAs permission. See https://cloud.google.com/iam/docs/service-accounts-actas for additional details.". Details: "(c3d2125d4ad0a248): Current user cannot act as service account 149838564778-compute@developer.gserviceaccount.com. Enforced by Org Policy constraint constraints/dataflow.enforceComputeDefaultServiceAccountCheck. https://cloud.google.com/iam/docs/service-accounts-actas Causes: (c3d2125d4ad0ac16): Current user cannot act as service account 149838564778-compute@developer.gserviceaccount.com. Please grant your user account one of [Owner, Editor, Service Account Actor] roles, or any other role that includes the iam.serviceAccounts.actAs permission. See https://cloud.google.com/iam/docs/service-accounts-actas for additional details.">

In [None]:
# Export the IAM policy of the worker service account to policy.json
gcloud iam service-accounts get-iam-policy $worker_sa \
    --format=json > policy.json
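
Once policy.json exists, a few lines of Python can show which members already hold a role that includes `iam.serviceAccounts.actAs`, such as `roles/iam.serviceAccountUser`. This is a minimal sketch: the sample policy is made-up data in the standard IAM policy shape (`bindings` with `role` and `members`), and the member emails are hypothetical.

```python
def members_with_role(policy, role):
    """Return all members bound to `role` in an IAM policy dict."""
    members = []
    for binding in policy.get("bindings", []):
        if binding.get("role") == role:
            members.extend(binding.get("members", []))
    return members


# Hypothetical policy for illustration. A real one would be loaded with:
#   import json
#   with open("policy.json") as f:
#       policy = json.load(f)
sample_policy = {
    "bindings": [
        {
            "role": "roles/iam.serviceAccountUser",
            "members": ["serviceAccount:my-function@my-project.iam.gserviceaccount.com"],
        },
        {
            "role": "roles/iam.serviceAccountTokenCreator",
            "members": ["user:someone@example.com"],
        },
    ],
    "etag": "BwXhypothetical=",
    "version": 1,
}

users = members_with_role(sample_policy, "roles/iam.serviceAccountUser")
print(users)
```

A service account with no bindings yields a policy without a `bindings` key, which is why the helper falls back to an empty list.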
    

In [None]:
# What the error means:
    # The IAM policy for the resource cannot be set
    # because the caller lacks the iam.serviceAccounts.setIamPolicy permission
│ Error: Error setting IAM policy for service account 'projects/airflow-gke-338120-352104/serviceAccounts/149838564778-compute@developer.gserviceaccount.com': googleapi: Error 403: Permission iam.serviceAccounts.setIamPolicy is required to perform this operation on service account projects/airflow-gke-338120-352104/serviceAccounts/149838564778-compute@developer.gserviceaccount.com., forbidden

In [None]:
# Grant roles/iam.serviceAccountUser on worker_sa to worker_sa itself
!gcloud iam service-accounts add-iam-policy-binding $worker_sa \
    --member="serviceAccount:${worker_sa}" --role="roles/iam.serviceAccountUser"

In [None]:
# Grant roles/iam.serviceAccountUser on worker_sa to function_sa, so the Cloud Function can act as the worker service account
!gcloud iam service-accounts add-iam-policy-binding $worker_sa \
    --member="serviceAccount:${function_sa}" --role="roles/iam.serviceAccountUser"

In [None]:
# Copy the file from the local machine to the GCS trigger bucket to start the Dataflow job
gsutil cp ./data/test_data.csv gs://fce2845e810918fb-gcf-trigger-bucket/