Is it possible to use the same pipeline as the tutorial where Airflow & AirflowScheduler are running locally but pull data from BigQuery?
Yes
How does GCP authentication work in this scenario?
It works pretty much the same as in the Kubeflow example: you need to have GOOGLE_APPLICATION_CREDENTIALS set up in your environment (for the Airflow scheduler and webserver processes), and set '--project=xxx' in your beam_pipeline_args.
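For reference, a minimal sketch of one way to make the credentials visible to whatever process loads the DAG; the key path below is a placeholder, and exporting the variable in the shell before starting `airflow scheduler` / `airflow webserver` works just as well:

```python
# Minimal sketch: point GOOGLE_APPLICATION_CREDENTIALS at a service-account key
# so the Airflow scheduler/webserver (and the Beam DirectRunner they launch)
# can authenticate to BigQuery. The key path is a placeholder.
import os

os.environ.setdefault('GOOGLE_APPLICATION_CREDENTIALS',
                      '/path/to/service-account-key.json')
```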
So basically you can use the taxi_pipeline_simple.py example, change the ExampleGen to BigQueryExampleGen, and set beam_pipeline_args in additional_pipeline_args:
'beam_pipeline_args': [
    '--runner=DirectRunner',
    '--project=xxx',
],
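To illustrate, here is a rough sketch of that change built on the shape of taxi_pipeline_simple.py. The query, project id, pipeline name, and paths are placeholders, and exact import paths and constructor arguments differ between TFX releases, so treat it as an outline rather than a drop-in replacement:

```python
# Sketch: swap the tutorial's CsvExampleGen for BigQueryExampleGen and pass
# beam_pipeline_args through additional_pipeline_args. Placeholders throughout;
# import paths and Pipeline() arguments vary by TFX version.
from tfx.components.example_gen.big_query_example_gen.component import BigQueryExampleGen
from tfx.orchestration import pipeline

_query = """
    SELECT pickup_community_area, fare, trip_miles
    FROM `my-project.my_dataset.my_table`  -- placeholder table
    LIMIT 10000
"""


def _create_pipeline():
  # BigQueryExampleGen replaces the CsvExampleGen used in the tutorial.
  example_gen = BigQueryExampleGen(query=_query)

  return pipeline.Pipeline(
      pipeline_name='bigquery_taxi_local',                   # placeholder name
      pipeline_root='/tmp/tfx/pipelines/bigquery_taxi_local',  # placeholder root
      components=[example_gen],  # add the remaining taxi components here as in the tutorial
      additional_pipeline_args={
          'beam_pipeline_args': [
              '--runner=DirectRunner',
              '--project=my-gcp-project',  # the GCP project billed for the BigQuery read
          ],
      },
  )
```

The DAG file would then hand the result of _create_pipeline() to the same Airflow runner used in taxi_pipeline_simple.py, with the metadata and cache settings from that example left unchanged.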
To debug, you can try the query in the BigQuery web UI console to see if it works, and then try the same query in code (the query in taxi_pipeline_kubeflow.py should work for you too).
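If the query works in the web UI but the pipeline still fails, a quick way to confirm that both the query and the credentials are usable from the same environment is to run it with the google-cloud-bigquery client (project id and query below are placeholders):

```python
# Sanity check: run the pipeline's query directly with the BigQuery client,
# using the same GOOGLE_APPLICATION_CREDENTIALS the Airflow processes see.
from google.cloud import bigquery

client = bigquery.Client(project='my-gcp-project')  # placeholder project id
rows = client.query('SELECT 1 AS ok').result()      # swap in your pipeline query
for row in rows:
    print(dict(row.items()))
```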
Hi there,
After going through the workshop tutorial, I am attempting to build my own pipeline ingesting from BigQuery rather than a CSV.
The only example using BigQuery is taxi_pipeline_kubeflow.py, which assumes execution on GCP. Is it possible to use the same pipeline as the tutorial, where Airflow & AirflowScheduler are running locally, but pull data from BigQuery?
How does GCP authentication work in this scenario?
I have tried this snippet along with editing the bigquery_default connection under Admin > Connections in the Airflow web app, with no luck.