## Setup

### Create Remote Connection

The CONNECTION resource is the secure, IAM-governed "handshake" between BigQuery and other Google Cloud services, most notably Vertex AI and Google Cloud Storage.

When a connection is created, it is associated with a unique service account. This service account acts as a proxy for BigQuery, and it must be granted the appropriate IAM roles to interact with external services. For instance, to call a Vertex AI model, the connection's service account needs the Vertex AI User (`roles/aiplatform.user`) role.

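Because every call runs under this dedicated identity, each interaction is authenticated, authorized, and auditable. The account can also be restricted to exactly the roles it needs, which follows the principle of least privilege and satisfies typical enterprise security requirements.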

```python
# Create a CLOUD_RESOURCE connection named `masterclass` in the US multi-region.
!bq mk --connection --location=US \
    --project_id=$GOOGLE_CLOUD_PROJECT \
    --connection_type=CLOUD_RESOURCE masterclass
```

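```python
# Read the connection's details and keep only the serviceAccountId value.
SERVICE_ACCT = !bq show --format=prettyjson --connection $GOOGLE_CLOUD_PROJECT.us.masterclass | grep "serviceAccountId" | cut -d '"' -f 4
# `!` returns a list of output lines; the email is on the last one.
SERVICE_ACCT_EMAIL = SERVICE_ACCT[-1]
print(SERVICE_ACCT_EMAIL)
```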
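The `grep | cut` pipeline in the previous cell depends on the exact line layout of the pretty-printed JSON. As a sketch of a sturdier alternative, the helper below parses that output with Python's `json` module instead. It assumes a `CLOUD_RESOURCE` connection, whose JSON nests the email under `cloudResource.serviceAccountId`, and it searches recursively for the key rather than hard-coding that path. The function name `extract_service_account` is hypothetical.

```python
import json


def extract_service_account(connection_json: str) -> str:
    """Return the serviceAccountId from `bq show --format=prettyjson --connection` output.

    Assumption: the JSON of a CLOUD_RESOURCE connection contains a
    "serviceAccountId" key (typically under "cloudResource").
    """
    data = json.loads(connection_json)

    def find(node):
        # Walk nested dicts and lists until a "serviceAccountId" key is found.
        if isinstance(node, dict):
            if "serviceAccountId" in node:
                return node["serviceAccountId"]
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            return None
        for child in children:
            found = find(child)
            if found is not None:
                return found
        return None

    account = find(data)
    if account is None:
        raise ValueError("no serviceAccountId found in connection JSON")
    return account
```

In the notebook, capture the output with `raw = !bq show --format=prettyjson --connection $GOOGLE_CLOUD_PROJECT.us.masterclass` and pass `"\n".join(raw)` to the helper. If `bq` prints any non-JSON lines before the object, strip them first.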

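```python
import os

# Fail fast with a KeyError if the variable is unset, instead of silently using the string "None".
PROJECT_ID = os.environ["GOOGLE_CLOUD_PROJECT"]
# Grant the connection's service account permission to call Vertex AI models.
!gcloud projects add-iam-policy-binding --format=none --condition=None $PROJECT_ID --member=serviceAccount:$SERVICE_ACCT_EMAIL --role=roles/aiplatform.user
```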

```python
# List the roles bound to the service account, one per line; expect roles/aiplatform.user.
!gcloud projects get-iam-policy $PROJECT_ID --flatten=bindings --filter=bindings.members:serviceAccount:$SERVICE_ACCT_EMAIL --format='value(bindings.role)'
```
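The verification cell prints one role per line. A minimal sketch of turning that listing into an explicit least-privilege check: the hypothetical helper `check_least_privilege` reports required roles that are missing and any granted roles from a deny-list of overly broad ones. Treating the basic roles (`roles/owner`, `roles/editor`, `roles/viewer`) as too broad for this connection is an assumption for illustration; adjust both sets to your own policy.

```python
# Roles the connection's service account must hold to call Vertex AI models.
REQUIRED_ROLES = frozenset({"roles/aiplatform.user"})
# Assumption for illustration: basic roles grant far more than this connection needs.
BROAD_ROLES = frozenset({"roles/owner", "roles/editor", "roles/viewer"})


def check_least_privilege(roles_output, required=REQUIRED_ROLES, too_broad=BROAD_ROLES):
    """Return (missing_roles, overly_broad_roles) from one-role-per-line text."""
    # Ignore blank lines and surrounding whitespace in the CLI output.
    granted = {line.strip() for line in roles_output.splitlines() if line.strip()}
    missing = sorted(required - granted)
    broad = sorted(granted & too_broad)
    return missing, broad
```

Capture the listing with `roles = !gcloud projects get-iam-policy ...` (same flags as the cell above) and call `check_least_privilege("\n".join(roles))`. Two empty lists mean the bindings match the intended policy.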

### Create Dataset & Register Models

The remote model is the primary abstraction layer that makes in-database AI possible. Using a standard CREATE MODEL statement with a REMOTE WITH CONNECTION clause, a data engineer can register an externally hosted model as a callable object within a BigQuery dataset.

This remote model can point to a powerful foundation model hosted on Vertex AI, such as Google's Gemini family or partner models from Anthropic and Mistral AI, or even a custom-trained model deployed on a Vertex AI Endpoint.
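A remote model is registered with a single DDL statement of the form `CREATE OR REPLACE MODEL ... REMOTE WITH CONNECTION ... OPTIONS (ENDPOINT = ...)`. The sketch below only builds that statement as a string so its shape is easy to see. The helper name, the model name `gemini_model`, the project `my-project`, and the endpoint `gemini-2.0-flash` are assumptions; check the Vertex AI documentation for the endpoints currently available in your location. The character check is a simple guard against input that would break the backtick or quote quoting, not a full identifier validator.

```python
import re

# Letters, digits, underscores, dots, and hyphens cover project IDs, dotted paths, and endpoint names.
_SAFE = re.compile(r"^[A-Za-z0-9_.-]+$")


def create_remote_model_ddl(project: str, dataset: str, model: str,
                            connection: str, endpoint: str) -> str:
    """Build a CREATE MODEL ... REMOTE WITH CONNECTION statement.

    `connection` uses the PROJECT.LOCATION.CONNECTION_ID form,
    e.g. "my-project.us.masterclass".
    """
    for part in (project, dataset, model, connection, endpoint):
        # Reject characters that could break out of the backtick or quote quoting.
        if not _SAFE.match(part):
            raise ValueError(f"unexpected characters in {part!r}")
    return (
        f"CREATE OR REPLACE MODEL `{project}.{dataset}.{model}`\n"
        f"  REMOTE WITH CONNECTION `{connection}`\n"
        f"  OPTIONS (ENDPOINT = '{endpoint}');"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(create_remote_model_ddl("my-project", "masterclass", "gemini_model",
                                  "my-project.us.masterclass", "gemini-2.0-flash"))
```

The returned text can be pasted into a `%%bigquery` cell or submitted with the `google-cloud-bigquery` client, for example `bigquery.Client().query(ddl).result()`.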

This abstraction matters because it replaces what would otherwise be a full programming task (setting up a development environment, using a client library, managing credentials, and handling HTTP requests) with a familiar SQL function call such as ML.GENERATE_TEXT or ML.PREDICT.

![GenAI Workflow](https://cloud.google.com/static/bigquery/images/gen-ai-workflow.png)

The data engineer works entirely within BigQuery, while the platform handles invoking the model, passing the data, and returning the results. This is the core mechanism that "brings the model to the data": the data does not have to be exported into a separate application first, which removes the need for a dedicated MLOps pipeline in a wide range of use cases.
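To make the "passing the data" step concrete, the hypothetical helper below builds an `ML.GENERATE_TEXT` query that sends one prompt per row of a table. The inner `SELECT` must expose the text as a column named `prompt`, and with `flatten_json_output` set to `TRUE` the generated text is returned in `ml_generate_text_llm_result`. The model `gemini_model`, table `reviews`, column `review_text`, and the default `temperature` and `max_output_tokens` values are illustrative assumptions.

```python
import re

# Backtick-quoted paths (project.dataset.name) and bare column names.
_PATH = re.compile(r"^[A-Za-z0-9_.-]+$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sql_string_literal(text: str) -> str:
    """Quote text as a GoogleSQL single-quoted string literal."""
    # Escape backslashes first, then quotes and newlines.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")
    return f"'{escaped}'"


def generate_text_query(model_path: str, table_path: str, prompt_prefix: str,
                        text_column: str, temperature: float = 0.2,
                        max_output_tokens: int = 256) -> str:
    """Build an ML.GENERATE_TEXT query that prompts the remote model once per row."""
    if not (_PATH.match(model_path) and _PATH.match(table_path)):
        raise ValueError("unexpected characters in model or table path")
    if not _COLUMN.match(text_column):
        raise ValueError(f"invalid column name {text_column!r}")
    prefix = sql_string_literal(prompt_prefix)
    return (
        "SELECT prompt, ml_generate_text_llm_result, ml_generate_text_status\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model_path}`,\n"
        # The input query must name the text to send to the model `prompt`.
        f"  (SELECT CONCAT({prefix}, {text_column}) AS prompt FROM `{table_path}`),\n"
        f"  STRUCT({float(temperature)} AS temperature, "
        f"{int(max_output_tokens)} AS max_output_tokens, "
        "TRUE AS flatten_json_output))"
    )


if __name__ == "__main__":
    # Hypothetical model and table names, for illustration only.
    print(generate_text_query("my-project.masterclass.gemini_model",
                              "my-project.masterclass.reviews",
                              "Summarize this review: ", "review_text"))
```

Each input row yields one output row, so the query's cost and latency scale with the number of rows sent to the model; filtering the inner `SELECT` first keeps experiments small.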

```sql
%%bigquery
CREATE SCHEMA IF NOT EXISTS masterclass
OPTIONS (
    description = 'Data Lakehouse Mastery masterclass at the Google Cloud Summit North 2025 in Germany',
    location = 'US');
```