# Overview
TBD

# Setup
Let's get started by preparing your environment. We'll begin with authentication and configuration, which are required for all subsequent API calls.

## Package installs
Install all required Python packages. Run this cell once per environment, then restart the kernel so that newly installed packages are picked up.

In [None]:
%pip install google-auth google-cloud-retail google-cloud-storage google-cloud-bigquery pandas
%pip install google-cloud-bigquery-storage pyarrow tqdm bigquery-magics
%pip install "google-cloud-bigquery[pandas]" jupyterlab
%pip install fsspec gcsfs
%pip install matplotlib seaborn plotly
%pip install --upgrade ipython-sql

## Authentication and GCP settings
Before we can interact with the Retail API, we need to authenticate with Google Cloud and set up our project context. This ensures all API calls are authorized and associated with the correct GCP project. If authentication fails, you'll be prompted to log in interactively. The `project_id` variable will be used throughout the notebook.

**About `project_id` and Application Default Credentials (ADC):**

- **`project_id`**: This uniquely identifies your Google Cloud project. All API requests, resource creation, and billing are tied to this project. Setting the correct `project_id` ensures your operations are performed in the intended environment and resources are properly tracked.

- **Application Default Credentials (ADC)**: ADC is a mechanism that allows your code to automatically find and use your Google Cloud credentials. Running the `gcloud auth application-default login` command sets up ADC by generating credentials that client libraries (like the Retail API) can use to authenticate API calls on your behalf.

**Why this matters:**  
Proper authentication and project selection are essential for secure, authorized access to Google Cloud resources. Without these, API calls will fail or may affect the wrong project. ADC simplifies credential management, especially in development and notebook environments.
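
To make the ADC lookup less abstract, here is a minimal sketch of where a credentials file is searched for, assuming the documented defaults: an explicit key file named by `GOOGLE_APPLICATION_CREDENTIALS` first, then the file written by `gcloud auth application-default login`. The helper name `adc_candidates` is hypothetical and not part of any Google library. The sketch ignores the final fallback (the metadata server on Google Cloud compute environments) and any custom gcloud config directory.

```python
import os
from pathlib import Path


def adc_candidates(environ=None, home=None):
    """Return (label, path) pairs for files ADC would check, in lookup order.

    Hypothetical helper for illustration only; the real lookup is done by
    google.auth.default().
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    candidates = []

    # 1. An explicit credentials file named by the environment variable.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(("GOOGLE_APPLICATION_CREDENTIALS", Path(explicit)))

    # 2. The file created by `gcloud auth application-default login`.
    if os.name == "nt":
        base = Path(environ.get("APPDATA", str(home / "AppData" / "Roaming")))
    else:
        base = home / ".config"
    candidates.append(
        ("gcloud ADC file", base / "gcloud" / "application_default_credentials.json")
    )
    return candidates


# Show which candidate files exist on this machine.
for label, path in adc_candidates():
    print(f"{label}: {path} (exists: {path.exists()})")
```

If neither file exists and the notebook is not running on Google Cloud, the login flow described above is what creates the second file.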

In [1]:
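import subprocess

try:
  # Ask gcloud for an ADC access token; this succeeds only if valid credentials exist
  subprocess.check_output(
    ['gcloud', 'auth', 'application-default', 'print-access-token'],
    stderr=subprocess.STDOUT
  )
  print("Already authenticated with Application Default Credentials.")
except FileNotFoundError:
  # The gcloud CLI itself is missing, so an interactive login cannot be started
  print("gcloud CLI not found. Install the Google Cloud SDK and re-run this cell.")
except subprocess.CalledProcessError:
  # No valid credentials were found: start the interactive login flow
  print("No valid ADC found. Running interactive login...")
  !gcloud auth application-default login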

Already authenticated with Application Default Credentials.


## Imports
Import all necessary libraries for API access, data analysis, and visualization.

In [None]:
from google.cloud.retail_v2 import SearchServiceClient, ProductServiceClient, PredictionServiceClient
from google.cloud.retail_v2.types import product, search_service, ListProductsRequest, SearchRequest, PredictRequest, UserEvent
from google.protobuf.field_mask_pb2 import FieldMask
from google.protobuf.json_format import MessageToDict
import pandas as pd
import http.client as http_client
import logging
import re
from IPython.display import display_html
from matplotlib import pyplot as plt
import seaborn as sns

# enabling BigQuery magics
%load_ext bigquery_magics

# configuring default options for pandas
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

## Global variables
With authentication complete, let's define some key variables that will be used in all our API calls. These include resource names and placements, which specify the context for search and recommendation requests.

**What is a 'placement'?**  
A placement is a configuration resource in the Retail API that determines how and where a model is used for serving search or recommendation results. Placements define the context (such as search, browse, or recommendation) and can be customized for different pages or user experiences.

**Why might you have multiple placements or branches?**  
- You may have different placements for various parts of your site or app, such as a homepage recommendation carousel, a category browse page, or a personalized search bar.
- Multiple branches allow you to manage different versions of your product catalog (e.g., staging vs. production, or A/B testing different product sets).

**Example scenarios:**
- Using a "default_search" placement for general product search, and a "recently_viewed_default" placement for showing users their recently viewed items.
- Having separate branches for testing new product data before rolling it out to all users, or for running experiments with different recommendation models.
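
The resource names for placements and branches share one path layout, so they can be built from a single function. The following is a minimal sketch, assuming the `global` location and `default_catalog` catalog used in this notebook. The helper `catalog_resource` and the project ID `my-project` are hypothetical and exist only for illustration.

```python
def catalog_resource(project_id, kind, name,
                     location="global", catalog="default_catalog"):
    """Build a full Retail resource name for a placement or a branch.

    Hypothetical helper; `kind` is "placements" or "branches".
    """
    if not project_id:
        # Fail early instead of producing a name such as "projects/None/..."
        raise ValueError("project_id is required to build a resource name")
    if kind not in ("placements", "branches"):
        raise ValueError(f"unsupported resource kind: {kind!r}")
    return (
        f"projects/{project_id}/locations/{location}/"
        f"catalogs/{catalog}/{kind}/{name}"
    )


# "my-project" is a placeholder project ID.
print(catalog_resource("my-project", "placements", "default_search"))
print(catalog_resource("my-project", "placements", "recently_viewed_default"))
print(catalog_resource("my-project", "branches", "0"))
```

Raising on a missing `project_id` is a design choice in this sketch: an invalid name surfaces immediately rather than as a failed API call later.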

In [3]:
import google.auth
import google.auth.exceptions

# Authenticate with Google Cloud and get the default project ID
try:
  credentials, project_id = google.auth.default()
  print(f"Using project ID: {project_id}")
except google.auth.exceptions.DefaultCredentialsError:
  print("Google Cloud Authentication failed. Please configure your credentials.")
  print("You might need to run 'gcloud auth application-default login'")
  project_id = None

# google.auth.default() can also succeed without resolving a project
if not project_id:
  print("Warning: project_id is not set; the resource names below will be invalid until you assign it.")

# Define the default placement for search and recommendations
DEFAULT_SEARCH = (
  f"projects/{project_id}/locations/global/catalogs/default_catalog/"
  "placements/default_search" # Use default_search unless you have a specific browse placement
)
RECENTLY_VIEWED_DEFAULT = (
  f"projects/{project_id}/locations/global/catalogs/default_catalog/"
  "placements/recently_viewed_default"
)
DEFAULT_BRANCH = f"projects/{project_id}/locations/global/catalogs/default_catalog/branches/0"

Using project ID: artilekt-vaisc-csb


## Utils
To make our analysis easier, we'll use some utility functions for data conversion and HTTP logging. These will help us convert API responses to Pandas DataFrames for analysis and enable detailed logging for troubleshooting.
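
As a rough illustration of the data conversion step, the sketch below shows how `pd.json_normalize` flattens nested dictionaries into dotted column names. The records are made-up sample data, not real API output; they use lowerCamelCase keys because that is what `MessageToDict` produces by default.

```python
import pandas as pd

# Made-up records shaped like dict-converted product messages.
records = [
    {"id": "sku-1", "title": "Mug",
     "priceInfo": {"price": 12.5, "currencyCode": "USD"}},
    {"id": "sku-2", "title": "Cap",
     "priceInfo": {"price": 20.0, "currencyCode": "USD"}},
]

# Nested keys become dotted column names such as "priceInfo.price".
flat = pd.json_normalize(records)
print(sorted(flat.columns))
print(flat[["id", "priceInfo.price"]])
```

Flat columns like `priceInfo.price` can then be filtered, sorted, and plotted like any other DataFrame column.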

In [None]:
import pandas as pd
from google.protobuf.json_format import MessageToDict

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

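def json2df(products_list):
  """Convert a list of proto-plus messages (e.g. Product) into a flat DataFrame."""
  if products_list:
    # Convert each message to a dict with alphabetically sorted top-level keys,
    # then flatten nested fields into dotted column names
    products_dicts = [dict(sorted(MessageToDict(p._pb).items())) for p in products_list]
    df = pd.json_normalize(products_dicts)
    return df
  else:
    print("No products returned or an error occurred.")
    return pd.DataFrame()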

from contextlib import contextmanager

@contextmanager
def http_logging(log_http: bool):
    """
    Context manager to enable/disable HTTP logging for Google API clients.

    This only raises log verbosity; it does not change a client's transport.
    http.client debug output appears only for clients that send requests over
    HTTP/1.1 (for example, clients created with a REST transport); gRPC traffic
    is not affected.

    Usage:
        with http_logging(log_http):
            # code that needs HTTP logging
    """
    import http.client as http_client
    import logging
    root_logger = logging.getLogger()
    # Remember the current settings so they can be restored on exit
    original_http_debuglevel = http_client.HTTPConnection.debuglevel
    original_log_level = root_logger.level
    try:
        if log_http:
            print("\n--- [INFO] Enabling HTTP Logging (visible for REST transport clients) ---")
            logging.basicConfig()
            root_logger.setLevel(logging.DEBUG)
            http_client.HTTPConnection.debuglevel = 1
        yield
    finally:
        if log_http:
            http_client.HTTPConnection.debuglevel = original_http_debuglevel
            root_logger.setLevel(original_log_level)
            print("--- [INFO] HTTP Logging & Root Log Level Restored ---")
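
The reason a single assignment to `HTTPConnection.debuglevel` is enough is that `debuglevel` is a class attribute, so connections created while it is set inherit the value, including connections of subclasses. The following stdlib-only sketch shows this; `LibraryConnection` is a hypothetical stand-in for the connection classes that HTTP libraries derive from `http.client.HTTPConnection`. No network traffic is sent, because constructing a connection object does not open a socket.

```python
import http.client


class LibraryConnection(http.client.HTTPConnection):
    """Hypothetical stand-in for a library's connection subclass."""


original = http.client.HTTPConnection.debuglevel
try:
    # Set the class attribute, as the context manager does while active.
    http.client.HTTPConnection.debuglevel = 1
    inside = LibraryConnection("localhost").debuglevel
finally:
    # Restore the previous value even if something above fails.
    http.client.HTTPConnection.debuglevel = original

outside = LibraryConnection("localhost").debuglevel
print(f"debuglevel inside: {inside}, after restore: {outside}")
```

Restoring the value in `finally` mirrors the context manager: verbose output stays limited to the wrapped calls.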