The Medium article for this use case:

- Copy the file `gcs_input_file/input_teams_stats_raw.json` from the project to the input bucket.
- Create the `team_stat` BigQuery table. The script and the BigQuery schema are provided in the `bigquery_table_scripts` folder.
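The two preparation steps above can be scripted. The function below is a minimal sketch that assumes the `gcloud` and `bq` CLIs are installed and authenticated. It creates the table with `bq mk` from a JSON schema file instead of running the provided script. The schema file path you pass in is a placeholder, because the exact file name in `bigquery_table_scripts` is not given here.

```shell
#!/usr/bin/env bash
# Sketch: copy the raw input file and create the team_stat table.
# Assumes gcloud and bq are installed and authenticated.
prepare_team_league_inputs() {
  local input_folder="$1"  # Cloud Storage folder read by the job
  local table="$2"         # BigQuery table as DATASET.TABLE
  local schema_file="$3"   # placeholder: JSON schema file from bigquery_table_scripts

  # Copy the raw team stats file from the project to the input bucket.
  gcloud storage cp gcs_input_file/input_teams_stats_raw.json "${input_folder}/" || return 1

  # Create the table from the JSON schema file.
  bq mk --table "${table}" "${schema_file}"
}
```

Here is an example using the values from the commands in this README: `prepare_team_league_inputs gs://mazlum_dev/team_league/input/json mazlum_test.team_stat <schema-file>.json`.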
Run the job with the Dataflow runner from the local machine:

```shell
mvn compile exec:java \
-Dexec.mainClass=fr.groupbees.application.TeamLeagueApp \
-Dexec.args=" \
--project=gb-poc-373711 \
--runner=DataflowRunner \
--jobName=team-league-java-job-$(date +'%Y-%m-%d-%H-%M-%S') \
--region=europe-west1 \
--streaming=false \
--zone=europe-west1-d \
--tempLocation=gs://mazlum_dev/dataflow/temp \
--gcpTempLocation=gs://mazlum_dev/dataflow/temp \
--stagingLocation=gs://mazlum_dev/dataflow/staging \
--serviceAccount=sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com \
--inputJsonFile=gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json \
--inputFileSlogans=gs://mazlum_dev/team_league/input/json/input_team_slogans.json \
--teamLeagueDataset=mazlum_test \
--teamStatsTable=team_stat \
--jobType=team_league_java_ingestion_job \
--failureOutputDataset=mazlum_test \
--failureOutputTable=job_failure \
--failureFeatureName=team_league \
" \
-Pdataflow-runner
```
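With `--runner=DataflowRunner`, the pipeline runs on the Dataflow service. You can follow it from the console or from the CLI. The function below is a minimal sketch that lists active jobs whose name starts with the `--jobName` prefix used above. It relies on the standard `--status` flag of `gcloud dataflow jobs list` and on a regex `--filter`.

```shell
#!/usr/bin/env bash
# Sketch: list the active Dataflow jobs whose name starts with a prefix.
list_active_jobs() {
  local region="$1"   # e.g. europe-west1
  local prefix="$2"   # e.g. team-league-java-job
  gcloud dataflow jobs list \
    --region="${region}" \
    --status=active \
    --filter="name~^${prefix}"
}
```

Usage example: `list_active_jobs europe-west1 team-league-java-job`.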
Build the image from the Dockerfile with Cloud Build, with all the needed dependencies installed in the container:

```shell
gcloud builds submit --tag europe-west1-docker.pkg.dev/gb-poc-373711/internal-images/dataflow/team-league-java:latest .
```

Package the application, then build the image and create the Flex Template spec file with the `flex-template build` command:

```shell
mvn clean package
gcloud dataflow flex-template build gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json \
--image-gcr-path "europe-west1-docker.pkg.dev/gb-poc-373711/internal-images/dataflow/team-league-java:latest" \
--sdk-language "JAVA" \
--flex-template-base-image JAVA11 \
--metadata-file "config/metadata.json" \
--jar "target/teams-league-0.1.0.jar" \
--env FLEX_TEMPLATE_JAVA_MAIN_CLASS="fr.groupbees.application.TeamLeagueApp"
```
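With `--image-gcr-path`, `flex-template build` builds its own image from the jar and pushes it to that location. Here it uses the same tag as the image built from the Dockerfile. To create the spec file from an image that is already built, you can pass `--image` instead of `--image-gcr-path`, `--jar`, `--flex-template-base-image` and `--env`.

The function below is a sketch under two assumptions. First, the Dockerfile already sets the launcher environment, such as `FLEX_TEMPLATE_JAVA_MAIN_CLASS`. Second, this roughly matches what the Dockerfile-based Cloud Build config does, although that config is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: create the Flex Template spec file from an already built image.
# Assumes the image (e.g. the one built from the Dockerfile) contains the
# pipeline, its dependencies and the FLEX_TEMPLATE_JAVA_MAIN_CLASS setting.
build_spec_from_image() {
  local spec_path="$1"  # gs:// path of the spec file to create
  local image="$2"      # full Artifact Registry image path with tag
  gcloud dataflow flex-template build "${spec_path}" \
    --image="${image}" \
    --sdk-language=JAVA \
    --metadata-file=config/metadata.json
}
```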
Run the Flex Template, which launches the Dataflow job:

```shell
gcloud dataflow flex-template run "team-league-java-`date +%Y%m%d-%H%M%S`" \
--template-file-gcs-location "gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json" \
--project=gb-poc-373711 \
--region=europe-west1 \
--temp-location=gs://mazlum_dev/dataflow/temp \
--staging-location=gs://mazlum_dev/dataflow/staging \
--parameters serviceAccount=sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com \
--parameters inputJsonFile=gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json \
--parameters inputFileSlogans=gs://mazlum_dev/team_league/input/json/input_team_slogans.json \
--parameters teamLeagueDataset=mazlum_test \
--parameters teamStatsTable=team_stat \
--parameters jobType=team_league_java_ingestion_job \
--parameters failureOutputDataset=mazlum_test \
--parameters failureOutputTable=job_failure \
--parameters failureFeatureName=team_league
```
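`flex-template run` returns once the job is launched, and the batch job keeps running after that. The function below is a minimal polling sketch. It assumes you already have the job ID, for instance by adding `--format='value(job.id)'` to the run command. It treats `JOB_STATE_DONE` as the only successful terminal state.

```shell
#!/usr/bin/env bash
# Sketch: poll a Dataflow job until it reaches a terminal state.
# Returns 0 for JOB_STATE_DONE, 1 for any other terminal state or a gcloud error.
wait_for_job() {
  local job_id="$1"
  local region="$2"
  local interval="${3:-30}"   # seconds between two polls
  local state
  while true; do
    state="$(gcloud dataflow jobs describe "${job_id}" \
      --region="${region}" --format='value(currentState)')" || return 1
    echo "Job ${job_id}: ${state}"
    case "${state}" in
      JOB_STATE_DONE) return 0 ;;
      JOB_STATE_FAILED|JOB_STATE_CANCELLED|JOB_STATE_DRAINED|JOB_STATE_UPDATED) return 1 ;;
    esac
    sleep "${interval}"
  done
}
```

Usage example: `wait_for_job "${job_id}" europe-west1`, where `job_id` holds the output of the run command with the `--format` flag above.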
Set the project and the location used by the Cloud Build commands:

```shell
export PROJECT_ID={{your_project_id}}
export LOCATION={{your_location}}
```

### Deploy the Dataflow template with Cloud Build, building the image from the Dockerfile with all dependencies and creating the spec file

```shell
gcloud builds submit \
--project=$PROJECT_ID \
--region=$LOCATION \
--config dataflow-deploy-template-dockerfile-all-dependencies.yaml \
--substitutions _REPO_NAME="internal-images",_IMAGE_NAME="dataflow/team-league-java",_IMAGE_TAG="latest",_METADATA_TEMPLATE_FILE_PATH="gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json",_SDK_LANGUAGE="JAVA",_METADATA_FILE="config/metadata.json" \
--verbosity="debug" .
```
### Deploy the Dataflow template with Cloud Build, building the image and creating the spec file with the flex-template command

```shell
gcloud builds submit \
--project=$PROJECT_ID \
--region=$LOCATION \
--config dataflow-deploy-template.yaml \
--substitutions _REPO_NAME="internal-images",_IMAGE_NAME="dataflow/team-league-java",_IMAGE_TAG="latest",_METADATA_TEMPLATE_FILE_PATH="gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json",_SDK_LANGUAGE="JAVA",_FLEX_TEMPLATE_BASE_IMAGE="JAVA11",_METADATA_FILE="config/metadata.json",_JAR="target/teams-league-0.1.0.jar",_FLEX_TEMPLATE_JAVA_MAIN_CLASS="fr.groupbees.application.TeamLeagueApp" \
--verbosity="debug" .
```
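The `--substitutions` lists are long and are repeated in the triggers further down. The function below is a hypothetical helper, not part of the project. It joins `KEY=VALUE` pairs into the comma-separated form that the flag expects, so each pair can sit on its own line.

```shell
#!/usr/bin/env bash
# Sketch: join KEY=VALUE pairs into the comma-separated list expected by
# the --substitutions flag. Values must not contain commas: this helper
# does not escape them.
join_substitutions() {
  local result="" pair
  for pair in "$@"; do
    # Add a comma only when something was already appended.
    result="${result:+${result},}${pair}"
  done
  printf '%s\n' "${result}"
}
```

Usage example: `--substitutions "$(join_substitutions _REPO_NAME=internal-images _IMAGE_NAME=dataflow/team-league-java _IMAGE_TAG=latest)"`.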
### Run the Dataflow job with Cloud Build
```shell
gcloud builds submit \
--project=$PROJECT_ID \
--region=$LOCATION \
--config dataflow-run-template.yaml \
--substitutions _JOB_NAME="team-league-java",_METADATA_TEMPLATE_FILE_PATH="gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json",_TEMP_LOCATION="gs://mazlum_dev/dataflow/temp",_STAGING_LOCATION="gs://mazlum_dev/dataflow/staging",_SA_EMAIL="sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com",_INPUT_FILE="gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json",_SIDE_INPUT_FILE="gs://mazlum_dev/team_league/input/json/input_team_slogans.json",_TEAM_LEAGUE_DATASET="mazlum_test",_TEAM_STATS_TABLE="team_stat",_JOB_TYPE="team_league_java_ingestion_job",_FAILURE_OUTPUT_DATASET="mazlum_test",_FAILURE_OUTPUT_TABLE="job_failure",_FAILURE_FEATURE_NAME="team_league" \
--verbosity="debug" .
```
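### Run the unit tests with a Cloud Build trigger on the GitHub repository

The trigger runs `dataflow-run-tests.yaml` on every push to any branch:

```shell
gcloud beta builds triggers create github \
--project=$PROJECT_ID \
--region=$LOCATION \
--name="run-dataflow-unit-tests-java" \
--repo-name=dataflow-java-ci-cd \
--repo-owner=tosun-si \
--branch-pattern=".*" \
--build-config=dataflow-run-tests.yaml \
--include-logs-with-status \
--verbosity="debug"
```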
### Build the image from the Dockerfile with all dependencies and create the spec file using a manual trigger on the GitHub repository

```shell
gcloud beta builds triggers create manual \
--project=$PROJECT_ID \
--region=$LOCATION \
--name="deploy-dataflow-template-team-league-java-dockerfile" \
--repo="https://github.com/tosun-si/dataflow-java-ci-cd" \
--repo-type="GITHUB" \
--branch="main" \
--build-config="dataflow-deploy-template-dockerfile-all-dependencies.yaml" \
--substitutions _REPO_NAME="internal-images",_IMAGE_NAME="dataflow/team-league-java",_IMAGE_TAG="latest",_METADATA_TEMPLATE_FILE_PATH="gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json",_SDK_LANGUAGE="JAVA",_METADATA_FILE="config/metadata.json" \
--verbosity="debug"
```
### Build the image and create the spec file with the flex-template command using a manual trigger on the GitHub repository

```shell
gcloud beta builds triggers create manual \
--project=$PROJECT_ID \
--region=$LOCATION \
--name="deploy-dataflow-template-team-league-java" \
--repo="https://github.com/tosun-si/dataflow-java-ci-cd" \
--repo-type="GITHUB" \
--branch="main" \
--build-config="dataflow-deploy-template.yaml" \
--substitutions _REPO_NAME="internal-images",_IMAGE_NAME="dataflow/team-league-java",_IMAGE_TAG="latest",_METADATA_TEMPLATE_FILE_PATH="gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json",_SDK_LANGUAGE="JAVA",_FLEX_TEMPLATE_BASE_IMAGE="JAVA11",_METADATA_FILE="config/metadata.json",_JAR="target/teams-league-0.1.0.jar",_FLEX_TEMPLATE_JAVA_MAIN_CLASS="fr.groupbees.application.TeamLeagueApp" \
--verbosity="debug"
```
### Run the Flex Template and the Dataflow job using a manual trigger on the GitHub repository

```shell
gcloud beta builds triggers create manual \
--project=$PROJECT_ID \
--region=$LOCATION \
--name="run-dataflow-template-team-league-java" \
--repo="https://github.com/tosun-si/dataflow-java-ci-cd" \
--repo-type="GITHUB" \
--branch="main" \
--build-config="dataflow-run-template.yaml" \
--substitutions _JOB_NAME="team-league-java",_METADATA_TEMPLATE_FILE_PATH="gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java.json",_TEMP_LOCATION="gs://mazlum_dev/dataflow/temp",_STAGING_LOCATION="gs://mazlum_dev/dataflow/staging",_SA_EMAIL="sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com",_INPUT_FILE="gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json",_SIDE_INPUT_FILE="gs://mazlum_dev/team_league/input/json/input_team_slogans.json",_TEAM_LEAGUE_DATASET="mazlum_test",_TEAM_STATS_TABLE="team_stat",_JOB_TYPE="team_league_java_ingestion_job",_FAILURE_OUTPUT_DATASET="mazlum_test",_FAILURE_OUTPUT_TABLE="job_failure",_FAILURE_FEATURE_NAME="team_league" \
--verbosity="debug"
```
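Manual triggers only create the trigger definitions, so each one still has to be started. The function below is a minimal sketch that wraps `gcloud builds triggers run` on the `main` branch. It assumes `PROJECT_ID` and `LOCATION` are exported as shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: start a manual Cloud Build trigger on the main branch.
# Assumes PROJECT_ID and LOCATION are set in the current shell.
run_manual_trigger() {
  local trigger_name="$1"   # e.g. run-dataflow-template-team-league-java
  gcloud builds triggers run "${trigger_name}" \
    --project="${PROJECT_ID}" \
    --region="${LOCATION}" \
    --branch=main
}
```

Usage example: `run_manual_trigger deploy-dataflow-template-team-league-java`, then `run_manual_trigger run-dataflow-template-team-league-java` once the template is deployed.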
Source the `export_env_variables.sh` script so that the variables it exports are set in the current shell:

```shell
source ./scripts/export_env_variables.sh
```
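Running `./scripts/export_env_variables.sh` directly executes it in a child process, so the variables it exports are lost when that process exits. Sourcing it with `source` or `.` runs it in the current shell and keeps them. The sketch below is self-contained: a temporary file stands in for the real script.

```shell
#!/usr/bin/env bash
# Sketch: compare executing and sourcing a script that exports a variable.
# The temporary file stands in for scripts/export_env_variables.sh.
demo_script="$(mktemp)"
printf 'export DEMO_PROJECT_ID=my-project\n' > "${demo_script}"
unset DEMO_PROJECT_ID

# Executed in a child process: the export is lost when the child exits.
sh "${demo_script}"
echo "after executing: '${DEMO_PROJECT_ID:-}'"

# Sourced in the current shell: the export is kept.
. "${demo_script}"
echo "after sourcing: '${DEMO_PROJECT_ID:-}'"

rm -f "${demo_script}"
```

This prints `after executing: ''` followed by `after sourcing: 'my-project'`.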
Run the `build_image_and_spec_flex_template.go` script, which builds the image from the Dockerfile and creates the Flex Template spec file in the Cloud Storage bucket:

```shell
go run build_image_and_spec_flex_template.go
```
Run the `run_flex_template.go` script, which runs the Flex Template and launches the Dataflow job:

```shell
go run run_flex_template.go
```
Create the Workload Identity pool and the OIDC provider for GitHub Actions:

```shell
gcloud iam workload-identity-pools create "gb-github-actions-ci-cd-pool-gcloud" \
--project="gb-poc-373711" \
--location="global" \
--display-name="Pool for CI CD Github actions"
gcloud iam workload-identity-pools providers create-oidc "gb-github-actions-ci-cd-provider-gcloud" \
--project="gb-poc-373711" \
--location="global" \
--workload-identity-pool="gb-github-actions-ci-cd-pool-gcloud" \
--display-name="CI CD Github Actions provider" \
--attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
--issuer-uri="https://token.actions.githubusercontent.com"
```
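GitHub Actions workflows that authenticate through this provider need the provider's full resource name, which contains the project number. One example is the `google-github-actions/auth` action with its `workload_identity_provider` input. The function below is a small sketch that reads the name with `describe`.

```shell
#!/usr/bin/env bash
# Sketch: print the full resource name of a Workload Identity provider, i.e.
# projects/NUMBER/locations/global/workloadIdentityPools/POOL/providers/PROVIDER
provider_resource_name() {
  local provider_id="$1"
  local project_id="$2"
  local pool_id="$3"
  gcloud iam workload-identity-pools providers describe "${provider_id}" \
    --project="${project_id}" \
    --location=global \
    --workload-identity-pool="${pool_id}" \
    --format='value(name)'
}
```

Usage example: `provider_resource_name gb-github-actions-ci-cd-provider-gcloud gb-poc-373711 gb-github-actions-ci-cd-pool-gcloud`.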
Grant the `roles/iam.workloadIdentityUser` role on the Service Account to the principal set of the GitHub repository. Workflows from that repository can then authenticate to GCP through the provider and impersonate the SA:

```shell
gcloud iam service-accounts add-iam-policy-binding "sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com" \
--project="gb-poc-373711" \
--role="roles/iam.workloadIdentityUser" \
--member="principalSet://iam.googleapis.com/projects/975119474255/locations/global/workloadIdentityPools/github-actions-ci-cd-pool/attribute.repository/tosun-si/dataflow-java-ci-cd"
```
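The `--member` value is built from three parts: the project number (not the project ID), a Workload Identity pool ID, and the `attribute.repository` value mapped by the provider. The pool ID must be the pool that the provider was created in. The function below is a hypothetical helper that assembles the value. You can read the project number with `gcloud projects describe gb-poc-373711 --format='value(projectNumber)'`.

```shell
#!/usr/bin/env bash
# Sketch: build the principalSet member that matches every workflow of one
# GitHub repository authenticated through a Workload Identity pool.
principal_set_member() {
  local project_number="$1"  # numeric project number, not the project ID
  local pool_id="$2"         # Workload Identity pool ID
  local repository="$3"      # GitHub repository as OWNER/REPO
  printf 'principalSet://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/attribute.repository/%s\n' \
    "${project_number}" "${pool_id}" "${repository}"
}
```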
Create the Workload Identity resources for the GitHub Actions CI/CD with Terraform through Cloud Build. First run the plan:

```shell
gcloud builds submit \
--project=$PROJECT_ID \
--region=$LOCATION \
--config create-workload-identity-ci-cd-github-actions-plan.yaml \
--substitutions _ENV=dev,_TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
--verbosity="debug" .
```

Then run the apply:

```shell
gcloud builds submit \
--project=$PROJECT_ID \
--region=$LOCATION \
--config create-workload-identity-ci-cd-github-actions-apply.yaml \
--substitutions _ENV=dev,_TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
--verbosity="debug" .
```
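These commands read `$TF_STATE_BUCKET`, `$TF_STATE_PREFIX` and `$GOOGLE_PROVIDER_VERSION` from the shell, along with `$PROJECT_ID` and `$LOCATION`. If one of them is unset, the corresponding substitution ends up empty. The function below is a minimal guard to run first. It is a hypothetical helper written with portable shell constructs.

```shell
#!/usr/bin/env bash
# Sketch: fail early when a variable used by the Cloud Build commands is empty.
# Returns 1 if a variable is empty or unset, 2 for an invalid variable name.
require_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Only accept valid shell variable names before the indirect read below.
    case "${name}" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "Invalid variable name: ${name}" >&2
        return 2 ;;
    esac
    # Portable indirect expansion: value=${<name>:-}
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "Missing variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Usage example: `require_vars PROJECT_ID LOCATION TF_STATE_BUCKET TF_STATE_PREFIX GOOGLE_PROVIDER_VERSION && gcloud builds submit ...`.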