# Table of Contents
- [Setup](#setup)
- [Package installs](#package-installs)
- [Imports](#imports)
- [Authentication and Google Cloud settings](#authentication-and-gcp-settings)
- [Global variables](#global-variables)
- [Utils](#utils)
- [Retail API wrappers](#retail-api-wrappers)
  - [List Products](#list-products)
  - [Search / Browse](#search--browse)
  - [Recommendations](#recommendations)
- [Tests](#tests)

# Overview
This notebook demonstrates how to use the Google Cloud Retail API for product search, recommendations, and event analysis. Follow the steps below to set up your environment, authenticate, and explore the API.

# Setup
Start by preparing your environment: install the required packages, authenticate, and configure the project context that every subsequent API call depends on.

## Package installs
Install all required Python packages. Run this cell only once after starting a new kernel.

In [1]:
%pip install google-cloud-retail google-cloud-storage google-cloud-bigquery pandas
%pip install google-cloud-bigquery-storage pyarrow tqdm bigquery-magics
%pip install google-cloud-bigquery[pandas] jupyterlab
%pip install fsspec gcsfs
%pip install matplotlib seaborn plotly
%pip install --upgrade ipython-sql

Note: you may need to restart the kernel to use updated packages.


## Authentication and GCP settings
Before you can interact with the Retail API, you must authenticate with Google Cloud and set up your project context. This ensures that all API calls are authorized and associated with the correct Google Cloud project. If credentials are missing, run the `gcloud` command in the cell below to log in interactively. The `project_id` variable is used throughout the notebook.

**About `project_id` and Application Default Credentials (ADC):**

- **`project_id`**: This uniquely identifies your Google Cloud project. All API requests, resource creation, and billing are tied to this project. Setting the correct `project_id` ensures your operations are performed in the intended environment and resources are properly tracked.

- **Application Default Credentials (ADC)**: ADC is a mechanism that allows your code to automatically find and use your Google Cloud credentials. Running the `gcloud auth application-default login` command sets up ADC by generating credentials that client libraries (like the Retail API) can use to authenticate API calls on your behalf.

**Why this matters:**  
Proper authentication and project selection are essential for secure, authorized access to Google Cloud resources. Without these, API calls will fail or may affect the wrong project. ADC simplifies credential management, especially in development and notebook environments.
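
To make the lookup concrete, the sketch below mimics in plain Python where ADC typically finds credentials on a workstation: first the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, then the file written by `gcloud auth application-default login`. The helper name `describe_adc_source` is hypothetical, and the path is the usual Linux/macOS location (it differs on Windows or when the gcloud configuration directory is relocated). The real resolution is performed by `google.auth.default()`, which also handles service accounts attached to Google Cloud resources.

```python
import os
from pathlib import Path


def describe_adc_source(env=None, home=None):
    """Describe where ADC would likely find credentials (illustrative only)."""
    env = os.environ if env is None else env

    # 1. An explicit credentials file named by the environment variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"environment variable -> {explicit}"

    # 2. The user credentials file written by
    #    `gcloud auth application-default login` (assumed Linux/macOS path).
    home = Path.home() if home is None else Path(home)
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return f"gcloud user credentials -> {well_known}"

    # 3. Nothing local: only an attached service account could still work.
    return "no local credentials found"


# "/path/to/key.json" is a placeholder, not a real file.
print(describe_adc_source(env={"GOOGLE_APPLICATION_CREDENTIALS": "/path/to/key.json"}))
```

If the helper reports that no local credentials were found, running the `gcloud` command in the next cell is the usual remedy in a notebook environment.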

In [2]:
# If needed, run this command to authenticate interactively
!gcloud auth application-default login


## Imports
Import all necessary libraries for API access, data analysis, and visualization.

In [3]:
import google.auth
from google.cloud.retail_v2 import SearchServiceClient, ProductServiceClient, PredictionServiceClient
from google.cloud.retail_v2.types import product, search_service, ListProductsRequest, SearchRequest, PredictRequest, UserEvent
from google.protobuf.field_mask_pb2 import FieldMask
from google.protobuf.json_format import MessageToDict
import pandas as pd
import http.client as http_client
import logging
import re
from IPython.display import display_html
from matplotlib import pyplot as plt
import seaborn as sns

%load_ext bigquery_magics

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

## Authentication and Google Cloud settings
With the libraries imported, confirm that Application Default Credentials resolve and capture the default `project_id`, which the rest of the notebook relies on.

In [4]:
# Authenticate with Google Cloud and get the default project ID
try:
  credentials, project_id = google.auth.default()
  print(f"Using project ID: {project_id}")
except google.auth.exceptions.DefaultCredentialsError:
  print("Google Cloud Authentication failed. Please configure your credentials.")
  print("You might need to run 'gcloud auth application-default login'")
  project_id = None # Set to None or a default

# If authentication failed above, uncomment the next line to log in interactively
# !gcloud auth application-default login

Using project ID: artilekt-vaisc-csb

## Global variables
With authentication complete, let's define some key variables that will be used in all your API calls. These include resource names and placements, which specify the context for search and recommendation requests.

**What is a placement?**  
A placement is a configuration resource in the Retail API that determines how and where a model is used for serving search or recommendation results. Placements define the context (such as search, browse, or recommendation) and can be customized for different pages or user experiences.

**Why might you have multiple placements or branches?**  
- You may have different placements for various parts of your site or app, such as a homepage recommendation carousel, a category browse page, or a personalized search bar.
- Multiple branches allow you to manage different versions of your product catalog (e.g., staging vs. production, or A/B testing different product sets).

**Example scenarios:**
- Using a "default_search" placement for general product search, and a "recently_viewed_default" placement for showing users their recently viewed items.
- Having separate branches for testing new product data before rolling it out to all users, or for running experiments with different recommendation models.
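
As a minimal sketch of how these resource names are put together, the helpers below build placement and branch paths from their parts. The function names and the `my-project` ID are hypothetical; the path layout follows the resource names used in this notebook, with the `global` location and `default_catalog` assumed.

```python
CATALOG_TEMPLATE = "projects/{project_id}/locations/global/catalogs/default_catalog"


def _catalog(project_id: str) -> str:
    # Fail early instead of silently producing "projects/None/...".
    if not project_id:
        raise ValueError("project_id is required to build a resource name")
    return CATALOG_TEMPLATE.format(project_id=project_id)


def placement_name(project_id: str, placement_id: str) -> str:
    """Full resource name of a placement, e.g. for search or recommendations."""
    return f"{_catalog(project_id)}/placements/{placement_id}"


def branch_name(project_id: str, branch_id: str = "0") -> str:
    """Full resource name of a catalog branch."""
    return f"{_catalog(project_id)}/branches/{branch_id}"


# "my-project" is a placeholder project ID.
print(placement_name("my-project", "default_search"))
print(placement_name("my-project", "recently_viewed_default"))
print(branch_name("my-project", "1"))
```

Keeping the path construction in one place makes it easy to switch a request to a different placement or branch without retyping the full resource name.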

In [5]:
# Authenticate with Google Cloud and get the default project ID
try:
  credentials, project_id = google.auth.default()
  print(f"Using project ID: {project_id}")
except google.auth.exceptions.DefaultCredentialsError:
  print("Google Cloud Authentication failed. Please configure your credentials.")
  print("You might need to run 'gcloud auth application-default login'")
  project_id = None # Set to None or a default
  
  
# Define the default placements (search, recommendations) and the default catalog branch
DEFAULT_SEARCH = (
  f"projects/{project_id}/locations/global/catalogs/default_catalog/"
  "placements/default_search" # Use default_search unless you have a specific browse placement
)
RECENTLY_VIEWED_DEFAULT = (
  f"projects/{project_id}/locations/global/catalogs/default_catalog/"
  "placements/recently_viewed_default"
)
DEFAULT_BRANCH = f"projects/{project_id}/locations/global/catalogs/default_catalog/branches/0"

Using project ID: artilekt-vaisc-csb


## Utils
To make your analysis easier, you'll use two utility functions: `json2df` converts API responses to pandas DataFrames for analysis, and `http_logging` enables detailed HTTP logging for troubleshooting.

In [6]:
import pandas as pd
from google.protobuf.json_format import MessageToDict

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

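def json2df(products_list):
  """Convert a list of Retail API (proto-plus) messages to a flat DataFrame."""
  if products_list:
    # Sort the keys so that columns appear in a stable, alphabetical order
    products_dicts = [dict(sorted(MessageToDict(p._pb).items())) for p in products_list]
    # json_normalize flattens nested fields into dotted columns (e.g. priceInfo.price)
    return pd.json_normalize(products_dicts)
  print("No products returned or an error occurred.")
  return pd.DataFrame()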

from contextlib import contextmanager

@contextmanager
def http_logging(log_http: bool):
    """
    Context manager to enable/disable HTTP logging for Google API clients.

    HTTP traffic is only visible when the client uses the REST transport,
    so callers should create their client with transport="rest".
    Usage:
        with http_logging(log_http):
            # code that needs HTTP logging
    """
    import http.client as http_client
    import logging
    root_logger = logging.getLogger()
    # Remember the current settings so they can be restored afterwards
    original_http_debuglevel = http_client.HTTPConnection.debuglevel
    original_log_level = root_logger.level
    try:
        if log_http:
            print("\n--- [INFO] Enabling HTTP Logging (requires REST transport) ---")
            logging.basicConfig()
            root_logger.setLevel(logging.DEBUG)
            http_client.HTTPConnection.debuglevel = 1
        yield
    finally:
        if log_http:
            http_client.HTTPConnection.debuglevel = original_http_debuglevel
            root_logger.setLevel(original_log_level)
            print("--- [INFO] HTTP Logging & Root Log Level Restored ---")

# Retail API wrappers
The following functions wrap the Google Cloud Retail API for listing products, searching, and getting recommendations.

## List Products

### About Retail API product listing

The Retail API provides several core services for managing and interacting with your product catalog. Two key components referenced here are:

- **ProductServiceClient**: This is the main client for interacting with product resources in your catalog. It allows you to list, create, update, and delete products. When you want to retrieve a list of products (optionally filtered by category or other attributes), you use this client.

- **ListProductsRequest**: This is the request object you pass to the `list_products` method of `ProductServiceClient`. It specifies parameters such as which catalog branch to query, filters to apply, how many products to return per page, and which fields to include in the response.

#### Business use case

Calling the List Products API is essential for:
- **Catalog validation**: Ensuring your product data is correctly ingested and available.
- **Building product listings**: Powering category pages, admin dashboards, or inventory management tools.
- **Data preparation**: Exporting product data for analytics, reporting, or downstream processing.

This API is different from the **Search** and **Predict** APIs:
- **ListProducts** is a direct, unpersonalized retrieval of the catalog items in a branch. In this notebook, the results are narrowed by category on the client side.
- **Search** provides full-text, faceted, and personalized search results, optimized for end-user queries.
- **Predict** generates personalized recommendations based on user events and context.

#### Key parameters

When creating a `ListProductsRequest`, you can specify:

- **parent**: The resource name of the branch to list products from (e.g., `projects/{project_id}/locations/global/catalogs/default_catalog/branches/0`).
- **filter**: A filter string to restrict results. ListProducts accepts only a limited set of expressions (for example, `type = "PRIMARY"` or `primary_product_id = "some_product_id"`), which is why the wrapper in the next cell applies its category filter on the client side.
- **page_size**: The maximum number of products to return in one call.
- **read_mask**: Specifies which fields to include in the response (e.g., all fields or a subset).
- **page_token**: For pagination, to retrieve the next set of results.

When initializing the `ProductServiceClient`, you can also specify:
- **transport**: Choose between `"grpc"` (default) or `"rest"` for debugging/logging.
- **credentials**: Custom authentication credentials if needed.

These parameters give you fine-grained control over how you access and manage your product catalog, making the List Products API a foundational tool for catalog operations.
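
The interplay of `page_size` and `page_token` can be sketched without calling the API. In the example below, `fetch_page` is a hypothetical stand-in for a single ListProducts call over an in-memory list, and its tokens are simple offsets; real page tokens are opaque strings. The Python client library's pager follows the tokens for you when you iterate over the response, so this loop only illustrates what happens underneath.

```python
def fetch_page(items, page_size, page_token=""):
    """Stand-in for one list call: return (page, next_page_token)."""
    start = int(page_token) if page_token else 0
    end = start + page_size
    # An empty token signals that there are no further pages.
    next_token = str(end) if end < len(items) else ""
    return items[start:end], next_token


def list_all(items, page_size):
    """Follow next_page_token until it is empty, collecting every page."""
    collected, token = [], ""
    while True:
        page, token = fetch_page(items, page_size, token)
        collected.extend(page)
        if not token:
            return collected


fake_catalog = [f"product-{i}" for i in range(7)]  # hypothetical product IDs
print(fetch_page(fake_catalog, page_size=3))
print(len(list_all(fake_catalog, page_size=3)))
```

Note that `page_size` bounds the size of each page, not the total number of products retrieved once every page has been followed.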

In [7]:
def list_products(
  category: str,
  page_size: int = 10,
  log_http: bool = False
):
  """
  Lists products from the default branch using a ListProductsRequest object.
  Args:
    category (str): Category name to filter products (applied client-side). If empty, returns all products.
    page_size (int): Number of products to request per page; all pages are retrieved.
    log_http (bool): Enable HTTP logging for debugging.
  Returns:
    list: List of product objects, or None if the API call fails (the error is printed, not raised).
  """
  # Create a ProductServiceClient and build the request
  product_client = ProductServiceClient(transport="rest" if log_http else None)
  filter_string = ""  # No category filter by default
  read_mask = FieldMask(paths=["*"])
  try:
    with http_logging(log_http):
      list_products_request = ListProductsRequest(
        parent=DEFAULT_BRANCH,
        filter=filter_string,
        page_size=page_size,
        read_mask=read_mask
      )
      print("--- [INFO] Calling Retail API ListProducts... ---")
      products = []
      # Use `item` as the loop variable to avoid shadowing the imported `product` module
      for item in product_client.list_products(request=list_products_request):
        # Optionally filter by category
        if category:
          product_categories = getattr(item, "categories", [])
          if isinstance(product_categories, str):
            product_categories = [product_categories]
          if category not in product_categories:
            continue
        products.append(item)
      if not products:
        print("No products found.")
      return products
  except Exception as e:
    print(f"An error occurred during the list_products request: {e}")
    return None

# Example usage

# # List all products (no category filter)
# products = list_products(category="", page_size=50)
# print(f"Products (no category filter): {products}")

# # List products that have "Bags" in their categories
# products_bags = list_products(category="Bags", page_size=50)
# print(f"Products in 'Bags' category: {products_bags}")

# # Enable HTTP logging for debugging
# products_debug = list_products(category="Bags", page_size=50, log_http=True)
# print(f"Products with HTTP logging: {products_debug}")

## Search / Browse
With a list of products in hand, let's move on to searching and browsing. The Retail API supports full-text search and faceted browsing, allowing you to filter by category, use search queries, and personalize results with a visitor ID. This is the foundation for building search bars, category pages, and personalized shopping experiences. You'll use the results here for comparison and further analysis later in the lab.

Key options:
- `category`: Filter search results by category.
- `query`: Enter a search term (e.g., "shoes").
- `visitor_id`: Track user sessions for personalization.
- `page_size`: Number of results per page.

---

### About `SearchServiceClient` and `SearchRequest`

**`SearchServiceClient`** is the main client for interacting with the Retail API's search functionality. It is used to send search requests and receive results, supporting both full-text search and faceted browsing.

**`SearchRequest`** is the request object passed to the client. It supports a variety of parameters:

- **`placement`**: Specifies the serving configuration (e.g., default search placement).
- **`visitor_id`**: Unique identifier for the user/session, enabling personalization.
- **`query`**: The search term entered by the user (e.g., "shoes"). If omitted, the API performs a "browse" (category listing) instead of a search.
- **`filter`**: Restricts results by attributes such as category, price, or custom fields (e.g., `categories: ANY("Bags")`).
- **`page_size`**: Number of results to return per page.
- **`page_token`**: For pagination, to fetch additional results.
- **`order_by`**: Sort results by a specific field (e.g., price).
- **`facet_specs`**: Request facet counts for fields (e.g., brand, color).
- **`boost_spec`**: Boost or bury certain products in the results.
- **`params`**: Additional custom parameters for advanced use cases.
- **`page_categories`**: Used to provide context for the page (e.g., category pages).

#### When to use which parameters?

- **Search (with `query`)**:  
  Use when the user enters a search term in a search bar.  
  Example: `query="running shoes"`, `category="Shoes"`, `visitor_id="user123"`

- **Browse (without `query`)**:  
  Use for category or collection pages, where you want to list products by category or attribute, but not by a search term.  
  Example: `category="Bags"`, `query=""`, `visitor_id="user123"`

- **Personalization**:  
  Always provide a unique `visitor_id` to enable personalized ranking and recommendations.

- **Faceting and filtering**:  
  Use `filter` and `facet_specs` to enable users to refine results by attributes like brand, price, or color.

#### Use cases and scenarios

- **Search bar**:  
  User types "sneakers" in the search bar.  
  → Set `query="sneakers"`, optionally filter by category or price.

- **Category page (Browse)**:  
  User navigates to the "Bags" category.  
  → Set `category="Bags"`, leave `query=""` (empty string).

- **Personalized results**:  
  Always pass a consistent `visitor_id` for each user/session to get personalized search and browse results.

- **Faceted navigation**:  
  User selects filters like "brand: Nike" and "color: red".  
  → Add these as filters in the `filter` parameter.
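
As a rough sketch of how selected facets could be turned into a `filter` string, the helpers below compose `field: ANY(...)` clauses and join them with `AND`. The helper names are hypothetical, and the field names `brands` and `colorFamilies` are assumptions about how the catalog stores these attributes; check the filter syntax reference for the fields your catalog actually exposes. Values are not escaped here, so values containing double quotes would need extra handling.

```python
def any_of(field: str, *values: str) -> str:
    """Build a clause that matches products whose field has any of the values."""
    quoted = ", ".join(f'"{v}"' for v in values)
    return f"{field}: ANY({quoted})"


def all_of(*clauses: str) -> str:
    """Combine clauses so that every one of them must match."""
    return " AND ".join(f"({c})" for c in clauses if c)


facet_filter = all_of(any_of("brands", "Nike"), any_of("colorFamilies", "Red"))
print(facet_filter)
# (brands: ANY("Nike")) AND (colorFamilies: ANY("Red"))
```

The resulting string has the same shape as the `categories: ANY("Bags")` filter used by the search wrapper in this notebook.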

---

**Formal difference: Search vs. Browse**

- **Search**:  
  - `query` is set (non-empty string).
  - Returns results matching the search term, ranked by relevance.
  - Used for keyword-based product discovery.

- **Browse**:  
  - `query` is empty or omitted.
  - Returns products based on category or other filters, not keyword relevance.
  - Used for category pages, collection pages, or when users are exploring rather than searching.

*Although both use the same API endpoint and client, the presence or absence of the `query` parameter formally distinguishes a "search" from a "browse" operation.*
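
The distinction can be expressed in a few lines of plain Python. The hypothetical helper below assembles the keyword arguments that would go into a `SearchRequest` and reports whether the call is a search or a browse. It omits `placement` for brevity and does not call the API; it is a sketch of the decision logic only.

```python
def build_search_kwargs(category="", query="", visitor_id="noname", page_size=10):
    """Return (mode, kwargs) for a SearchRequest-like call."""
    kwargs = {"visitor_id": visitor_id, "page_size": page_size}
    if query:
        kwargs["query"] = query
    if category:
        # Restrict results to the category and give the page context.
        kwargs["filter"] = f'categories: ANY("{category}")'
        kwargs["page_categories"] = [category]
    # A non-empty query makes it a search; otherwise it is a browse.
    mode = "search" if query else "browse"
    return mode, kwargs


print(build_search_kwargs(query="sneakers", visitor_id="user123"))
print(build_search_kwargs(category="Bags", visitor_id="user123"))
```

The first call describes a search bar request, the second a category page.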

In [8]:
def search_products(
  category: str,
  page_size: int = 10,
  query: str = "",
  visitor_id: str = "noname",
  log_http: bool = False
):
  """
  Executes a search (or a browse, when query is empty) using a SearchRequest object.
  Args:
    category (str): Category name to filter products.
    page_size (int): Number of products to return per page.
    query (str): Search query string. Leave empty to browse by category.
    visitor_id (str): Visitor ID for the search event.
    log_http (bool): Enable HTTP logging for debugging.
  Returns:
    The pager returned by SearchServiceClient.search (iterable over results),
    or None if the API call fails (the error is printed, not raised).
  """
  try:
    with http_logging(log_http):
      search_client = SearchServiceClient(transport="rest" if log_http else None)
      search_request = SearchRequest(
        placement=DEFAULT_SEARCH,
        visitor_id=visitor_id,
        page_size=page_size,
        filter=f'categories: ANY("{category}")' if category else "",
        query=query,
        page_categories=[category] if category else []  # avoid sending [""] when no category is given
      )
      print("--- [INFO] Calling Retail API Search... ---")
      search_response = search_client.search(search_request)
      print("\n---search response---")
      if not search_response.results:
        print("The search returned no matching results.")
      else:
        print(f"Found {search_response.total_size} results. Showing first {len(search_response.results)}:")
      return search_response
  except Exception as e:
    print(f"An error occurred during the search request: {e}")
    return None

## Recommendations
Now that you've explored and searched the catalog, let's see how the Retail API can generate personalized product recommendations. By providing user event data (such as a home-page view or product click), you can get tailored suggestions for each visitor. This is the engine behind features such as "recently viewed" or "you may also like". The recommendations generated here can be used to enhance user experience and drive engagement.

**About `PredictionServiceClient` and `PredictRequest`:**

- **`PredictionServiceClient`** is the main client for interacting with the Retail API's recommendation (predict) service. It is used to request product recommendations based on user events and context.
- **`PredictRequest`** is the request object you pass to the client. It specifies the placement (type of recommendation), user event details, number of results, and additional parameters.

**Key parameters you can pass:**
- **`placement`**: Specifies the recommendation model and context (e.g., "recently_viewed_default", "similar_items").
- **`user_event`**: A dictionary or object describing the user's action (e.g., home-page view, product detail view, add-to-cart) and context (visitor ID, product IDs, etc.).
- **`visitor_id`**: Unique identifier for the user/session, enabling personalized recommendations. It is set inside `user_event` rather than directly on `PredictRequest`.
- **`page_size`**: Number of recommendations to return.
- **`params`**: Additional custom parameters (e.g., `{"returnProduct": True}` to include product details in the response).

**Use cases and scenarios:**
- **Recently Viewed**: Show products the user has recently interacted with by sending a "home-page-view" or "detail-page-view" event.
- **You May Also Like**: Recommend similar or complementary products based on a user's browsing or purchase history.
- **Personalized Homepage**: Generate a set of recommendations for a returning visitor based on their past behavior.
- **Cart Recommendations**: Suggest products to add to the cart based on current cart contents and user profile.

Key options:
- `user_event`: Describe the user's action and context.
- `visitor_id`: Identify the user/session.
- `page_size`: Number of recommendations to return.

By customizing the `user_event` and `placement`, you can power a variety of recommendation scenarios tailored to your users' needs.
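
As an illustration, the hypothetical helper below builds `user_event` dictionaries for two of the scenarios above. The keys mirror the `UserEvent` fields `event_type`, `visitor_id`, and `product_details`; the visitor ID is a placeholder and the product ID is taken from the sample catalog used in this notebook. Whether a given placement returns results for an event depends on the model behind it, so treat this as a sketch of the payload shape only.

```python
def make_user_event(event_type, visitor_id, product_ids=()):
    """Build a user_event dict; product_ids are attached as product_details."""
    event = {"event_type": event_type, "visitor_id": visitor_id}
    if product_ids:
        event["product_details"] = [{"product": {"id": pid}} for pid in product_ids]
    return event


# Home page view: no product context is needed.
home_view = make_user_event("home-page-view", "visitor-123")

# Product detail page view: the viewed product is part of the event.
detail_view = make_user_event("detail-page-view", "visitor-123", ["GGOEGBJC122399"])

print(home_view)
print(detail_view)
```

A dictionary of this shape can be passed as the `user_event` argument of the prediction wrapper defined in the next cell.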

In [9]:
def predict_products(
  user_event: dict = None,
  page_size: int = 10,
  visitor_id: str = "noname",
  log_http: bool = False
):
  """
  Calls the Retail API Predict service to get product recommendations.
  Args:
    user_event (dict): User event dict. Must include at least 'visitor_id' and 'event_type'.
      If None, a 'home-page-view' event is built from visitor_id.
    page_size (int): Number of results to return.
    visitor_id (str): Visitor ID used only when user_event is None.
    log_http (bool): Enable HTTP logging for debugging.
  Returns:
    PredictResponse, or None if the API call fails (the error is printed, not raised).
  """
  placement = RECENTLY_VIEWED_DEFAULT
  if user_event is None:
    user_event = {
      "visitor_id": visitor_id,
      "event_type": "home-page-view"
    }
  try:
    with http_logging(log_http):
      predict_client = PredictionServiceClient(transport="rest" if log_http else None)
      user_event_obj = UserEvent(**user_event)
      predict_request = PredictRequest(
        placement=placement,
        user_event=user_event_obj,
        page_size=page_size,
        params={
          "returnProduct": True
        }
      )
      print("--- [INFO] Calling Retail API Predict... ---")
      predict_response = predict_client.predict(request=predict_request)
      if not predict_response.results:
        print("The predict call returned no recommendations.")
      else:
        print(f"Found {len(predict_response.results)} recommendations.")
      return predict_response
  except Exception as e:
    print(f"An error occurred during the predict request: {e}")
    return None

# Tests

The previous sections focused on building convenience wrappers around the Retail APIs. In the following cells, you test these wrappers in practice. The first two tests compare the results of the ListProducts and Search APIs. The comparison shows that a "browse" (a search without a query) returns products in a different order than ListProducts, because Search ranks results with its underlying ML model whereas ListProducts simply enumerates the catalog.

## List products
Example: List products in the 'Bags' category and display the first 5 results.

In [10]:
# Invoke list_products with 'Bags' category
products_list = list_products(category="Bags", page_size=50)
print("Results from list_products (category='Bags'):")
products_list_df = json2df(products_list[:5])
display(products_list_df)

--- [INFO] Calling Retail API ListProducts... ---
Results from list_products (category='Bags'):


| | availability | categories | id | images | name | primaryProductId | title | type | uri | priceInfo.currencyCode | priceInfo.price | priceInfo.originalPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | IN_STOCK | [Bags] | GGCOGBJC100999 | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGCOGBJC100999.jpg'}]` | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGCOGBJC100999 | GGCOGBJC100999 | #IamRemarkable Tote | PRIMARY | https://shop.googlemerchandisestore.com/Google+Redesign/Bags/IamRemarkable+Tote | USD | 8.0 | 8.0 |
| 1 | OUT_OF_STOCK | [Bags] | GGCOGBJD157199 | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGCOGBJD157199.jpg'}]` | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGCOGBJD157199 | GGCOGBJD157199 | Google Land & Sea Tote Bag | PRIMARY | https://shop.googlemerchandisestore.com/Google+Redesign/Lifestyle/Google+Land+and+Sea+Tote+Bag | USD | 20.0 | 20.0 |
| 2 | IN_STOCK | [Bags] | GGOEABRB130499 | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEABRB130499.jpg'}]` | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEABRB130499 | GGOEABRB130499 | Android Iconic Backpack | PRIMARY | https://shop.googlemerchandisestore.com/Google+Redesign/Bags/Android+Iconic+Backpack | USD | 25.0 | 25.0 |
| 3 | IN_STOCK | [Bags] | GGOEGBBC122499 | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEGBBC122499.jpg'}]` | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEGBBC122499 | GGOEGBBC122499 | Google Campus Bike Carry Pouch | PRIMARY | https://shop.googlemerchandisestore.com/Google+Redesign/Bags/Google+Google+Campus+Bike+Carry+Pouch | USD | 8.0 | 8.0 |
| 4 | OUT_OF_STOCK | [Bags] | GGOEGBBD121999 | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEGBBD121999.jpg'}]` | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEGBBD121999 | GGOEGBBD121999 | Google Striped Penny Pouch | PRIMARY | https://shop.googlemerchandisestore.com/Google+Redesign/Bags/Google+Google+Striped+Penny+Pouch | USD | 9.0 | 9.0 |


## Browse products
Example: Search for products in the 'Bags' category and display the first 5 results.

In [11]:
# Invoke search_products with 'Bags' category
products_search = list(search_products(category="Bags", page_size=50))
print("Results from search_products (category='Bags'):")
products_search_df = json2df(products_search[:5])
display(products_search_df)

--- [INFO] Calling Retail API Search... ---

---search response---
Found 44 results. Showing first 44:
Results from search_products (category='Bags'):


| | id | product.name | product.categories | product.title | product.priceInfo.currencyCode | product.priceInfo.price | product.priceInfo.originalPrice | product.availability | product.uri | product.images |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GGOEGBJC122399 | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEGBJC122399 | [Bags] | Google Campus Bike Tote Navy | USD | 11.0 | 11.0 | IN_STOCK | https://shop.googlemerchandisestore.com/Google+Redesign/Bags/Google+Google+Campus+Bike+Tote+Navy | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEGBJC122399.jpg'}]` |
| 1 | GGOEGBBJ131999 | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEGBBJ131999 | [Bags] | Google Cork Pencil Pouch | USD | 8.0 | 8.0 | IN_STOCK | https://shop.googlemerchandisestore.com/Google+Redesign/Office/Google+Cork+Pencil+Pouch | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEGBBJ131999.jpg'}]` |
| 2 | GGOEGBJD143099 | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEGBJD143099 | [Bags] | Google Austin Campus Tote | USD | 11.0 | 11.0 | IN_STOCK | https://shop.googlemerchandisestore.com/Google+Redesign/Bags/Google+Austin+Campus+Tote | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEGBJD143099.jpg'}]` |
| 3 | GGOEGBJJ171099 | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEGBJJ171099 | [Bags] | Google Campus Tote | USD | 16.0 | 16.0 | OUT_OF_STOCK | https://shop.googlemerchandisestore.com/Google+Redesign/Google+Campus+Tote | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/noimage.jpg'}]` |
| 4 | GGCOGBJD157199 | projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGCOGBJD157199 | [Bags] | Google Land & Sea Tote Bag | USD | 20.0 | 20.0 | OUT_OF_STOCK | https://shop.googlemerchandisestore.com/Google+Redesign/Lifestyle/Google+Land+and+Sea+Tote+Bag | `[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGCOGBJD157199.jpg'}]` |


## Compare results
Compare the outputs of `list_products` and `search_products` side by side.

In [12]:
import pandas as pd
from IPython.display import display_html

max_rows = 5  # Set the maximum number of rows to display

# Create side-by-side comparison of products from list_products and search_products
products_list_df = json2df(list_products(category="Bags", page_size=50)[:max_rows])
products_search_df = json2df(list(search_products(category="Bags", page_size=50))[:max_rows])

# Keep the 'id' and 'title' columns from each DataFrame and rename them for clarity
df1 = products_list_df[['id', 'title']].rename(columns={'id': 'list_products_id', 'title': 'list_products_title'})
df2 = products_search_df[['id', 'product.title']].rename(columns={'id': 'search_products_id', 'product.title': 'search_products_title'})

# Reset index to align rows side by side
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Concatenate side by side
side_by_side = pd.concat([df1, df2], axis=1)

display(side_by_side)

--- [INFO] Calling Retail API ListProducts... ---
--- [INFO] Calling Retail API Search... ---

---search response---
Found 44 results. Showing first 44:


Unnamed: 0,list_products_id,list_products_title,search_products_id,search_products_title
0,GGCOGBJC100999,#IamRemarkable Tote,GGOEGBJC122399,Google Campus Bike Tote Navy
1,GGCOGBJD157199,Google Land & Sea Tote Bag,GGOEGBBJ131999,Google Cork Pencil Pouch
2,GGOEABRB130499,Android Iconic Backpack,GGOEGBJD143099,Google Austin Campus Tote
3,GGOEGBBC122499,Google Campus Bike Carry Pouch,GGOEGBJJ171099,Google Campus Tote
4,GGOEGBBD121999,Google Striped Penny Pouch,GGCOGBJD157199,Google Land & Sea Tote Bag
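
Because the table pairs rows by position, the two calls can show different products on the same row even when their results overlap. A minimal sketch of an ID-based comparison follows; the two lists are copied from the output above, and the variable names are illustrative only.

```python
# IDs copied from the side-by-side output above (first five rows of each call)
list_ids = ["GGCOGBJC100999", "GGCOGBJD157199", "GGOEABRB130499",
            "GGOEGBBC122499", "GGOEGBBD121999"]
search_ids = ["GGOEGBJC122399", "GGOEGBBJ131999", "GGOEGBJD143099",
              "GGOEGBJJ171099", "GGCOGBJD157199"]

# Set operations compare by product ID rather than by row position
in_both = sorted(set(list_ids) & set(search_ids))
only_in_list = sorted(set(list_ids) - set(search_ids))
only_in_search = sorted(set(search_ids) - set(list_ids))

print("In both:", in_both)
print("Only in list_products:", only_in_list)
print("Only in search_products:", only_in_search)
```

With only five rows per call, a small overlap is expected; comparing the full result sets would show whether both calls cover the same products.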


## Get product details from BigQuery
Example: Query product details for a specific product ID from BigQuery.

The `%%bigquery` command is a "cell magic" for IPython and Jupyter Notebook environments. It is provided by the `bigquery-magics` package (installed in the Package installs step) and becomes available once the extension is loaded, usually via `%load_ext bigquery_magics`.

**What it is:**
Cell magics are special commands that start with `%%` and apply to the entire cell. The `%%bigquery` magic allows you to write and execute Google BigQuery SQL queries directly within a code cell, just as if you were using the BigQuery console or another SQL tool.

**How it works:**
When you execute a cell starting with `%%bigquery`:
1.  The magic command captures the rest of the cell's content as a SQL query.
2.  It uses the currently authenticated Google Cloud credentials (often Application Default Credentials, as set up earlier) and the default project ID to connect to BigQuery.
3.  It sends the SQL query to your BigQuery project for execution.
4.  By default, the results of the query are returned as a Pandas DataFrame and displayed in the notebook. To keep the result in a Python variable instead, name the variable after the magic (e.g., `%%bigquery df`).

**Usefulness in data analysis:**
The `%%bigquery` magic suits notebook-based data analysis for several reasons:
*   **Simplicity:** It allows data scientists and analysts to query vast datasets stored in BigQuery using familiar SQL syntax without writing extensive Python boilerplate code for client library setup, query execution, and result parsing.
*   **Direct integration:** Results are typically loaded directly into Pandas DataFrames, making it seamless to transition from data retrieval to data manipulation, analysis, and visualization using Python's rich data science ecosystem.
*   **Iterative exploration:** It facilitates quick, iterative exploration of data. You can rapidly run queries, inspect results, refine your SQL, and re-run, all within the interactive notebook environment.
*   **Reproducibility:** SQL queries embedded in notebook cells are saved as part of the notebook, making the data retrieval step clear and reproducible.
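*   **Parameterization:** You can pass Python values into a query as named parameters. Supply them with the `--params` option (e.g., `%%bigquery --params {"product_id": "GGOEGBJD122199"}`, or `--params $my_dict` for a Python dictionary) and reference them in the SQL as `@product_id`. This lets the query vary without string concatenation.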
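
For comparison, the steps above can be sketched with the `google-cloud-bigquery` client library directly. This is a minimal sketch, not the notebook's own code: the helper name `fetch_product` is hypothetical, the table name is the one used in this section's query cell, and credentials are assumed to come from Application Default Credentials. The product ID is passed as a named query parameter (`@product_id`) rather than spliced into the SQL string.

```python
PRODUCT_SQL = """
SELECT *
FROM `artilekt-vaisc-csb.retail.products`
WHERE id = @product_id
"""


def fetch_product(product_id, project=None):
    """Run PRODUCT_SQL for one product ID and return the rows as a DataFrame.

    Hypothetical helper. With project=None the client falls back to the
    default project of the active credentials.
    """
    # Imported here so the helper can be defined before the client library is needed
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("product_id", "STRING", product_id)
        ]
    )
    # client.query() submits the job; to_dataframe() waits for it and loads the rows
    return client.query(PRODUCT_SQL, job_config=job_config).to_dataframe()
```

Calling `fetch_product("GGOEGBJD122199")` would issue the same lookup as this section's `%%bigquery` cell, provided the authenticated account can read the table.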

In the cell below, `%%bigquery` is used to fetch all columns (`SELECT *`) for a product with a specific `id` from the `retail.products` table in BigQuery. The result will be displayed as a table in the notebook output.

In [None]:
%%bigquery
SELECT * FROM `artilekt-vaisc-csb.retail.products` WHERE id = 'GGOEGBJD122199'


## Identify user events associated with the products

To test the prediction functionality effectively, you need relevant `visitorId` values. A good way to find them is to analyze historical user events and see which visitors interacted with the products of interest.

In this step, you identify visitors whose events involve products in the 'Bags' category. You previously listed and searched for products in this category, and will now use those product IDs to filter a dataset of historical user events. The result is a list of `visitorId` values that are likely to be relevant for generating personalized recommendations related to 'Bags'.

To retrieve this data, you read a dump of user events stored in a Google Cloud Storage bucket. The following Python cell loads the newline-delimited JSON file directly into a Pandas DataFrame (reading `gs://` paths relies on the `gcsfs` package installed earlier) and then filters it by product ID. Retrieval from Cloud Storage, manipulation with Pandas, and calls to the Retail API all happen within the same notebook.

In [15]:
# Read the newline-delimited JSON dump of user events from Cloud Storage
df_events = pd.read_json('gs://artilekt-vaisc-csb_retail/recent_retail_events.json', lines=True)

# Product IDs from the 'Bags' category to filter for
bag_products = ['GGCOGBJD157199', 'GGOEABRB130499', 'GGOEGBBJ131999']

def first_product_id(details):
    """Return the product ID of the first productDetails entry, or None if absent."""
    if isinstance(details, list) and details and isinstance(details[0], dict):
        return (details[0].get('product') or {}).get('id')
    return None

# Extract the product ID from the first productDetails entry of each event
df_events['product_id'] = df_events['productDetails'].apply(first_product_id)

# Keep only events whose product is one of the bag products
filtered = df_events[df_events['product_id'].isin(bag_products)][['visitorId', 'product_id']]

filtered.head()

Unnamed: 0,visitorId,product_id
11,GA1.3.494294221.1619828856,GGOEABRB130499
297,GA1.3.294040182.1619872768,GGOEABRB130499
468,GA1.3.847874442.1619893172,GGOEABRB130499
706,"""""",GGOEABRB130499
740,GA1.3.1623205614.1614051318,GGOEABRB130499
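
Two details of this result are worth noting: only the first `productDetails` entry of each event is inspected, and row 706 carries an empty `visitorId`. The sketch below shows one way to handle both. It assumes the nested layout used by the filter above, and the sample events are made up for illustration rather than taken from the events dump.

```python
import pandas as pd

# Hypothetical sample events mirroring the assumed layout of the events dump
sample_events = pd.DataFrame({
    "visitorId": ["GA1.3.111.222", '""', "GA1.3.333.444", "GA1.3.555.666"],
    "productDetails": [
        [{"product": {"id": "GGOEGBRJ114199"}}, {"product": {"id": "GGOEABRB130499"}}],
        [{"product": {"id": "GGOEABRB130499"}}],
        None,  # some events may carry no product details
        [{"product": {"id": "GGOEGBJD143099"}}],
    ],
})
bag_products = ["GGCOGBJD157199", "GGOEABRB130499", "GGOEGBBJ131999"]

# One row per (event, product) pair instead of only the first product
exploded = sample_events.explode("productDetails").reset_index(drop=True)
exploded["product_id"] = exploded["productDetails"].apply(
    lambda d: (d.get("product") or {}).get("id") if isinstance(d, dict) else None
)

# Treat visitor IDs that are empty after stripping quotes and spaces as missing
visitor = exploded["visitorId"].astype(str).str.strip('" ')
bag_visitors = exploded[exploded["product_id"].isin(bag_products) & (visitor != "")]

print(bag_visitors[["visitorId", "product_id"]])
```

In this sample, the first event is kept even though the bag product is its second entry, and the event with the empty visitor ID is dropped.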


## Recommend a product
Example: Get product recommendations for a specific visitor.

In [16]:
# Get the first visitorId from the filtered DataFrame
visitor_id = filtered['visitorId'].iloc[0]

# Invoke predict_products using the fetched visitor_id
predict_response = predict_products(visitor_id=visitor_id, page_size=5)

print("Results from predict_products:")
recommendations_df = json2df(predict_response.results)
display(recommendations_df)


--- [INFO] Calling Retail API Predict... ---
Found 2 recommendations.
Results from predict_products:


Unnamed: 0,id,metadata.product.primaryProductId,metadata.product.name,metadata.product.categories,metadata.product.title,metadata.product.id,metadata.product.@type,metadata.product.uri,metadata.product.images,metadata.product.type,metadata.product.availability,metadata.product.priceInfo.currencyCode,metadata.product.priceInfo.price,metadata.product.priceInfo.originalPrice
0,GGOEGBRJ114199,GGOEGBRJ114199,projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEGBRJ114199,[Bags],Google Utility BackPack,GGOEGBRJ114199,type.googleapis.com/google.cloud.retail.v2main.Product,https://shop.googlemerchandisestore.com/Google+Redesign/Bags/Google+Utility+BackPack,[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEGBRJ114199.jpg'}],PRIMARY,IN_STOCK,USD,120.0,120.0
1,GGOEABRB130499,GGOEABRB130499,projects/72145653914/locations/global/catalogs/default_catalog/branches/0/products/GGOEABRB130499,[Bags],Android Iconic Backpack,GGOEABRB130499,type.googleapis.com/google.cloud.retail.v2main.Product,https://shop.googlemerchandisestore.com/Google+Redesign/Bags/Android+Iconic+Backpack,[{'uri': 'https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEABRB130499.jpg'}],PRIMARY,IN_STOCK,USD,25.0,25.0
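
The flattened Predict output is wide, and most columns share the `metadata.product.` prefix. As a sketch, a few columns can be selected and the prefix stripped for easier reading. The DataFrame below is rebuilt by hand from the two rows shown above, with a reduced set of columns, and the name `summary` is illustrative.

```python
import pandas as pd

# Two rows copied from the Predict output above, reduced to a few columns
recommendations_df = pd.DataFrame({
    "id": ["GGOEGBRJ114199", "GGOEABRB130499"],
    "metadata.product.title": ["Google Utility BackPack", "Android Iconic Backpack"],
    "metadata.product.availability": ["IN_STOCK", "IN_STOCK"],
    "metadata.product.priceInfo.price": [120.0, 25.0],
    "metadata.product.type": ["PRIMARY", "PRIMARY"],
})

prefix = "metadata.product."
wanted = ["id", prefix + "title", prefix + "availability", prefix + "priceInfo.price"]

# Select the columns of interest, then drop the shared prefix from their names
summary = recommendations_df[wanted].rename(
    columns=lambda c: c[len(prefix):] if c.startswith(prefix) else c
)

print(summary)
```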
