# Abstract

Ad revenue has become the primary lifeline for many modern companies. To optimize this stream, platforms don't just show ads at random; they enlist hundreds of engineers and scientists to build algorithms that maximize payouts without compromising the user experience. These systems process millions of interactions to capture the predictive signals that drive business outcomes. Recently, the ad-tech industry has begun pivoting from traditional machine learning toward more sophisticated methods like graphs. In this post, we examine **Pinterest's** history, strategy, and recent innovations. We then present a prototype Graph Neural Network (GNN) designed to optimize ad relevance, using a Graph Attention Network (GAT) framework trained on synthetic user data.

# Introduction

![](https://cdn.britannica.com/34/267034-050-91CC3AFB/Pinterest-logo.jpg)

Pinterest was founded in 2010 by Ben Silbermann, Paul Sciarra, and Evan Sharp. The idea came from a "failed" app idea called Tote, which was supposed to help users shop for items via their mobile devices. From the project, Ben and Paul found that many of their users enjoyed saving the items (images) and exploring items via pictures. Essentially, they discovered that users enjoyed a **visual search** experience. This became the vision for Pinterest. That is, they brought in Evan to help design a digital search experience that felt like creating your own digital pinboard. 

After just a year of official operation, Pinterest emerged as one of the most popular platforms in the US, surpassing 10 million monthly active users by 2012. While initially viewed as a niche social media site where users curated and shared "boards," the company eventually pivoted. They moved away from the traditional "social network" label to brand themselves as **the internet’s premier visual search engine**.

Following the lead of search giants like Google, Pinterest introduced **promoted pins** in 2013. This allowed companies to promote content via **ad auctions**, turning user discovery into a scalable business model. By the time the platform went public in 2019, Pinterest was grossing nearly $1.2 billion in annual ad revenue.

Today, that number is nearing \$4.2 billion in revenue from ads. While Pinterest does have other streams of revenue, their primary revenue driver is their ad platform. Hence, Pinterest spends large amounts of money into the infrastructure and R&D of their largest moneymaker.

# Ad Auctions

In [1]:
#Import libraries
import polars as pl
import numpy as np 
import matplotlib.pyplot as plt

import torch
import torch.nn.functional as F
from torch_geometric.data import HeteroData
from torch_geometric import transforms as T
from torch_geometric.nn import HeteroConv, GATConv

from sklearn.metrics import roc_auc_score

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
#Generate user table
num_users = 10_000
age_groups = ["18-25", "26-34", "35-43", "44+"]
age_probs = [.5, .3, .1, .1]
interests = ["Fashion", "Food", "Exercise", "Home", "Animals", "Sports", "Movies"]

users = pl.DataFrame({
    "user_id": np.arange(num_users),
    "age_group": np.random.choice(age_groups, p=age_probs, size=num_users),
    "main_interest": np.random.choice(interests, size=num_users)
})

In [3]:
#Generate item table
num_items = 50_000

items = pl.DataFrame({
    "item_id": np.arange(num_items),
    "category": np.random.choice(interests, size=num_items),
    "is_ad": np.random.choice([True, False], size=num_items, p=[.1, .9])
})

In [4]:
def gen_edges(user_info, items_info, n_obs, r_seed=42):
    np.random.seed(r_seed)
    interests = items_info['category'].unique().to_list()
    
    user_activity_prob = np.random.beta(a=2, b=7, size=user_info.shape[0])
    user_activity_prob /= user_activity_prob.sum()
    user_ids = np.random.choice(user_info['user_id'], size=n_obs, p=user_activity_prob)
    
    user_interest_map = dict(zip(user_info['user_id'], user_info['main_interest']))
    user_main_interests = np.array([user_interest_map[uid] for uid in user_ids])
    
    boost_threshold = 0.4 / 1.4
    selector = np.random.rand(n_obs)
    random_cats = np.random.choice(interests, size=n_obs)
    selected_cats = np.where(selector < boost_threshold, user_main_interests, random_cats)
    
    cat_item_map = {}
    for cat in interests:
        cat_df = items_info.filter(pl.col("category") == cat)
        cat_item_map[cat] = {
            "ids": cat_df["item_id"].to_numpy(),
            "is_ad": cat_df["is_ad"].to_numpy()
        }
    
    target_ids = np.empty(n_obs, dtype=np.int64)
    is_ad_flags = np.empty(n_obs, dtype=bool)
    
    for cat in interests:
        mask = (selected_cats == cat)
        count = mask.sum()
        if count > 0:
            pool = cat_item_map[cat]
            rand_idx = np.random.randint(0, len(pool["ids"]), size=count)
            target_ids[mask] = pool["ids"][rand_idx]
            is_ad_flags[mask] = pool["is_ad"][rand_idx]

    return pl.DataFrame({
        "user_id": user_ids,
        "target_id": target_ids,
        "category": selected_cats,
        "is_ad": is_ad_flags
    })

In [5]:
#Generate edges
df = gen_edges(users, items, 200_000)

In [6]:
#Get new user ids
user_mapping = df.select("user_id").unique().with_row_index("user_gnn_id")

In [7]:
#Get new non ad ids
non_ad_mapping = df.filter(pl.col("is_ad") == False).select("target_id").unique().with_row_index("non_ad_ids")

In [8]:
#Get ad ids
ad_mapping = df.filter(pl.col("is_ad") == True).select("target_id").unique().with_row_index("ad_ids")

In [9]:
#Join user mapping
df = df.join(user_mapping, on='user_id', how='left')

In [10]:
#Create two new dfs for ads and non ads
df_ads = df.filter(pl.col("is_ad") == True).join(ad_mapping, on='target_id', how='left')
df_non_ads = df.filter(pl.col("is_ad") == False).join(non_ad_mapping, on='target_id', how='left')

In [11]:
#Convert non ads to tensor
non_ad_x = torch.from_numpy(df_non_ads.select("non_ad_ids", "category").to_dummies("category").sort("non_ad_ids").to_numpy().astype(np.float32))

In [12]:
#Convert ads to tensor
ad_x = torch.from_numpy(df_ads.select("ad_ids", "category").to_dummies("category").sort("ad_ids").to_numpy().astype(np.float32))

In [13]:
#Generate user one tensor
user_x = torch.from_numpy(
    users.select(["user_id", "age_group", "main_interest"])
    .to_dummies(["age_group", "main_interest"])
    .sort("user_id")
    .drop("user_id")
    .to_numpy()
    .astype(np.float32)
)

In [14]:
#Create user-non ad edge tensor
non_ad_edge = torch.from_numpy(df_non_ads.select("user_gnn_id", "non_ad_ids").to_numpy().astype(int).T).to(torch.long)

In [15]:
#Create user-ad edge tensor
ad_edge = torch.from_numpy(df_ads.select("user_gnn_id", "ad_ids").to_numpy().astype(int).T).to(torch.long)

In [16]:
#Build hetero object and nodes
hetero_data = HeteroData()
hetero_data['user'].x = user_x
hetero_data['non_ad'].x = non_ad_x
hetero_data['ad'].x = ad_x

In [17]:
#Add edges to hetero object
hetero_data['user', 'ad'].edge_index = ad_edge
hetero_data['user', 'non_ad'].edge_index = non_ad_edge
hetero_data = T.ToUndirected()(hetero_data)

In [18]:
#Prepare data for train test split
split_prep = T.RandomLinkSplit(
    num_val=0.1,
    num_test=0.1,
    is_undirected=True,
    add_negative_train_samples=True,
    edge_types=[('user', 'ad')],
    rev_edge_types=[('ad', 'rev_user')]
)

In [19]:
#Get train, val, test split
train, val, test = split_prep(hetero_data)

In [20]:
#Define GAT model
class AdRelevanceGAT(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels, num_heads=4):
        super().__init__()
        
        self.conv1 = HeteroConv({
            ('user', 'to', 'ad'): GATConv((-1, -1), hidden_channels, heads=num_heads, add_self_loops=False),
            ('user', 'to', 'non_ad'): GATConv((-1, -1), hidden_channels, heads=num_heads, add_self_loops=False),
            ('ad', 'rev_to', 'user'): GATConv((-1, -1), hidden_channels, heads=num_heads, add_self_loops=False),
            ('non_ad', 'rev_to', 'user'): GATConv((-1, -1), hidden_channels, heads=num_heads, add_self_loops=False),
        }, aggr='sum')

        self.conv2 = HeteroConv({
            ('user', 'to', 'ad'): GATConv((-1, -1), out_channels, heads=1, add_self_loops=False),
            ('user', 'to', 'non_ad'): GATConv((-1, -1), out_channels, heads=1, add_self_loops=False),
            ('ad', 'rev_to', 'user'): GATConv((-1, -1), out_channels, heads=1, add_self_loops=False),
            ('non_ad', 'rev_to', 'user'): GATConv((-1, -1), out_channels, heads=1, add_self_loops=False),
        }, aggr='sum')

    def forward(self, x_dict, edge_index_dict):
        x_dict = self.conv1(x_dict, edge_index_dict)
        x_dict = {key: F.elu(x) for key, x in x_dict.items()}
        
        x_dict = self.conv2(x_dict, edge_index_dict)
        return x_dict

In [21]:
#Define relevance class
class RelevanceLinkPredictor(torch.nn.Module):
    def forward(self, x_user, x_ad, edge_label_index):
        user_idx, ad_idx = edge_label_index
        # Dot product relevance score
        return (x_user[user_idx] * x_ad[ad_idx]).sum(dim=-1)

In [22]:
#Define full GNN
class FullGNNModel(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.encoder = AdRelevanceGAT(hidden_channels, out_channels)
        self.decoder = RelevanceLinkPredictor()

    def forward(self, x_dict, edge_index_dict, edge_label_index):
        z_dict = self.encoder(x_dict, edge_index_dict)
        # We only care about predicting 'user' to 'ad' links for the auction
        return self.decoder(z_dict['user'], z_dict['ad'], edge_label_index)

In [23]:
#Initialize training loop and training parameters
model = FullGNNModel(hidden_channels=32, out_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.BCEWithLogitsLoss()

def train_one_epoch(data):
    model.train()
    optimizer.zero_grad()
    
    pred = model(data.x_dict, data.edge_index_dict, data['user', 'ad'].edge_label_index.to(torch.long))
    target = data['user', 'ad'].edge_label.to(torch.float)
    
    loss = criterion(pred, target)
    loss.backward()
    optimizer.step()
    return loss.item()

In [24]:
#Perform training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
train_data = train.to(device)

for epoch in range(1, 51):
    loss = train_one_epoch(train_data)
    if epoch % 10 == 0:
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

Epoch: 010, Loss: 89.9706
Epoch: 020, Loss: 21.5330
Epoch: 030, Loss: 6.1087
Epoch: 040, Loss: 4.2366
Epoch: 050, Loss: 2.0209


In [25]:
#Perform test evalution
@torch.no_grad()
def evaluate_relevance(test_data):
    model.eval()
    
    pred = model(test_data.x_dict, test_data.edge_index_dict, test_data['user', 'ad'].edge_label_index)
    
    pred = torch.sigmoid(pred).cpu().numpy()
    target = test_data['user', 'ad'].edge_label.cpu().numpy()
    
    auc = roc_auc_score(target, pred)
    return auc

test_auc = evaluate_relevance(test.to(device))
print(f"Test AUC: {test_auc:.4f}")

Test AUC: 0.6428


In [26]:
#Define function for ranking ads
@torch.no_grad()
def rank_ads(user_gnn_id, ad_candidates, data):
    model.eval()
    
    edge_label_index = torch.tensor([
        [user_gnn_id] * len(ad_candidates),
        ad_candidates
    ], dtype=torch.long).to(device)
    
    logits = model(data.x_dict, data.edge_index_dict, edge_label_index)
    probs = torch.sigmoid(logits).cpu().numpy()
    
    ranked_ads = sorted(zip(ad_candidates, probs), key=lambda x: x[1], reverse=True)
    return ranked_ads

In [27]:
df.filter(pl.col("user_id") == 1000)

user_id,target_id,category,is_ad,user_gnn_id
i64,i64,str,bool,u32
1000,17607,"""Home""",false,8959
1000,10296,"""Food""",false,8959
1000,21024,"""Food""",false,8959
1000,100,"""Movies""",false,8959
1000,3575,"""Movies""",false,8959
…,…,…,…,…
1000,4291,"""Food""",false,8959
1000,22716,"""Movies""",true,8959
1000,19724,"""Animals""",false,8959
1000,5088,"""Movies""",true,8959


In [28]:
users.filter(pl.col("user_id") == 1000)

user_id,age_group,main_interest
i64,str,str
1000,"""18-25""","""Food"""


In [29]:
items.filter(pl.col("is_ad") == True).head(10)

item_id,category,is_ad
i64,str,bool
20,"""Home""",True
22,"""Home""",True
26,"""Animals""",True
32,"""Food""",True
35,"""Sports""",True
38,"""Sports""",True
61,"""Animals""",True
64,"""Exercise""",True
67,"""Fashion""",True
68,"""Animals""",True


In [30]:
#Test ad rank function
rank_ads(1000, np.arange(10), hetero_data)

  edge_label_index = torch.tensor([


[(2, 1.0),
 (1, 0.9999999),
 (0, 0.83670586),
 (8, 0.24470966),
 (6, 0.17266823),
 (4, 0.1658946),
 (5, 0.11218947),
 (3, 0.09960052),
 (9, 0.09643514),
 (7, 0.0514786)]

In [31]:
#TODO: Fix model overfit