# Your First Click Model: Click Through Rate

This section examines the session data and estimates the probability of relevance using click-through rate (CTR): roughly, the number of clicks on a document divided by the number of sessions in which it appeared. We then examine whether that data shows position bias, that is, whether some documents have a higher CTR only because they appear higher in the search results.

In [1]:
import sys
sys.path.append("..")
sys.path.append("../ltr")
from aips import *
from ltr.judgments import Judgment
from ltr.sdbn_functions import *
import pandas

# if using a Jupyter notebook, include:
%matplotlib inline

# Listing 11.01
Judgments with binary grades

In [2]:
# Judgment(grade, keywords, doc_id)
sample_judgments = [
  # for 'social network' query
  Judgment(1, "social network", 37799),  # The Social Network
  Judgment(0, "social network", 267752), # #chicagoGirl
  Judgment(0, "social network", 38408),  # Life As We Know It
  Judgment(0, "social network", 28303),  # The Cheyenne Social Club
  # for 'star wars' query
  Judgment(1, "star wars", 11),     # Star Wars
  Judgment(1, "star wars", 1892),   # Return of the Jedi
  Judgment(0, "star wars", 54138),  # Star Trek Into Darkness
  Judgment(0, "star wars", 85783),  # The Star
  Judgment(0, "star wars", 325553)  # Battlestar Galactica
]

# Listing 11.02
Judgments with probabilistic grades

In [3]:
sample_judgments = [
  Judgment(0.99, "social network", 37799),  # The Social Network
  Judgment(0.01, "social network", 267752), # #chicagoGirl
  Judgment(0.01, "social network", 38408),  # Life As We Know It
  Judgment(0.01, "social network", 28303),  # The Cheyenne Social Club
  Judgment(0.99, "star wars", 11),     # Star Wars
  Judgment(0.80, "star wars", 1892),   # Return of the Jedi
  Judgment(0.20, "star wars", 54138),  # Star Trek Into Darkness
  Judgment(0.01, "star wars", 85783),  # The Star
  Judgment(0.20, "star wars", 325553)  # Battlestar Galactica
]
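
The two listings above grade the same documents in two ways. As a rough illustration of how they relate, the sketch below collapses probabilistic grades into binary ones with a cutoff. It uses plain `(grade, keywords, doc_id)` tuples as a stand-in for `Judgment` objects, and both the `to_binary` helper and the 0.5 threshold are assumptions made for this example, not part of the `ltr` package.

```python
# Hypothetical stand-in for Judgment objects: (grade, keywords, doc_id) tuples.
probabilistic = [
    (0.99, "star wars", 11),     # Star Wars
    (0.80, "star wars", 1892),   # Return of the Jedi
    (0.20, "star wars", 54138),  # Star Trek Into Darkness
    (0.01, "star wars", 85783),  # The Star
]

def to_binary(judgments, threshold=0.5):
    """Map each probabilistic grade to 1 (relevant) or 0 (irrelevant).

    The threshold is an illustrative assumption; choose it to suit your data.
    """
    return [
        (1 if grade >= threshold else 0, keywords, doc_id)
        for grade, keywords, doc_id in judgments
    ]

binary = to_binary(probabilistic)
for grade, keywords, doc_id in binary:
    print(grade, keywords, doc_id)
```

Binary grades discard how confident we are in each label, which is the information the probabilistic form keeps.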

# Listing 11.03

Viewing a single session of the query `transformers dark of the moon` in retrotech. The cells below first list all sessions and then inspect one of them (`sess_id` 3). We encourage you to examine other sessions as well.

In [4]:
all_sessions()

Unnamed: 0,sess_id,query,rank,doc_id,clicked
0,50002,blue ray,0.0,600603141003,True
1,50002,blue ray,1.0,827396513927,False
2,50002,blue ray,2.0,24543672067,False
3,50002,blue ray,3.0,719192580374,False
4,50002,blue ray,4.0,885170033412,True
...,...,...,...,...,...
74995,5001,transformers dark of the moon,10.0,47875841369,False
74996,5001,transformers dark of the moon,11.0,97363560449,False
74997,5001,transformers dark of the moon,12.0,93624956037,False
74998,5001,transformers dark of the moon,13.0,97363532149,False


In [5]:
sessions = all_sessions()
products = fetch_products(doc_ids=sessions["doc_id"].unique())

def print_series_data(series_data, column):
    #pandas.set_option("display.width", 76)
    dataframe = series_data.to_frame(name=column).sort_values(column, ascending=False)
    merged = dataframe.merge(products, left_on='doc_id', right_on='upc', how='left')
    print(merged.rename(columns={"upc": "doc_id"})[["doc_id", column, "name"]].set_index("doc_id"))

In [6]:
query = "transformers dark of the moon"
sessions = get_sessions(query, index=False)
ctrs = calculate_ctr(sessions)
print_series_data(ctrs, column="CTR")

                 CTR                                               name
doc_id                                                                 
97360810042   0.0824      Transformers: Dark of the Moon - Blu-ray Disc
47875842328   0.0734  Transformers: Dark of the Moon Stealth Force E...
47875841420   0.0434  Transformers: Dark of the Moon Decepticons - N...
24543701538   0.0364  The A-Team - Widescreen Dubbed Subtitle AC3 - ...
25192107191   0.0352              Fast Five - Widescreen - Blu-ray Disc
786936817218  0.0236  Pirates Of The Caribbean: On Stranger Tides (3...
786936817218  0.0236  Pirates of the Caribbean: On Stranger Tides - ...
97363560449   0.0192  Transformers: Dark of the Moon - Widescreen Du...
47875841406   0.0160  Transformers: Dark of the Moon Autobots - Nint...
400192926087  0.0124  Transformers: Dark of the Moon - Original Soun...
47875842335   0.0106  Transformers: Dark of the Moon Stealth Force E...
97363532149   0.0084  Transformers: Revenge of the Fallen - Wide

In [7]:
query = "transformers dark of the moon"
sessions = get_sessions(query)
print(sessions.loc[3])

                                 query  rank        doc_id  clicked
sess_id                                                            
3        transformers dark of the moon   0.0   47875842328    False
3        transformers dark of the moon   1.0   24543701538    False
3        transformers dark of the moon   2.0   25192107191    False
3        transformers dark of the moon   3.0   47875841420    False
3        transformers dark of the moon   4.0  786936817218    False
3        transformers dark of the moon   5.0   47875842335    False
3        transformers dark of the moon   6.0   97363532149    False
3        transformers dark of the moon   7.0   97360810042     True
3        transformers dark of the moon   8.0   24543750949    False
3        transformers dark of the moon   9.0   36725235564    False
3        transformers dark of the moon  10.0   47875841369    False
3        transformers dark of the moon  11.0   97363560449    False
3        transformers dark of the moon  12.0   9

# Listing 11.04

Simple CTR-based judgments for our query. We compute the CTR as the number of clicks on a document divided by the number of unique sessions in which that document appears for the query.
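
To make the arithmetic concrete, here is a minimal sketch of the same calculation on a small, made-up click log. The column names mirror the session data shown earlier; the rows themselves are invented for illustration.

```python
import pandas as pd

# Toy click log (made-up data): one row per document shown in a session.
toy_sessions = pd.DataFrame({
    "sess_id": [1, 1, 2, 2, 3, 3],
    "doc_id":  ["a", "b", "a", "b", "a", "b"],
    "clicked": [True, False, True, False, False, True],
})

# Clicks per document, and the number of distinct sessions each document was shown in.
clicks = toy_sessions.groupby("doc_id")["clicked"].sum()
sessions_seen = toy_sessions.groupby("doc_id")["sess_id"].nunique()

# CTR = clicks / sessions, highest first.
toy_ctrs = (clicks / sessions_seen).sort_values(ascending=False)
print(toy_ctrs)
```

Document `a` is clicked in two of its three sessions and `b` in one of three, so `a` ranks first with a CTR of about 0.67 against 0.33 for `b`.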

In [8]:
#%load -s calculate_ctr ../ltr/sdbn_functions.py
def calculate_ctr(sessions):
    click_counts = sessions.groupby("doc_id")["clicked"].sum()
    sess_counts = sessions.groupby("doc_id")["sess_id"].nunique()
    ctrs = click_counts / sess_counts
    return ctrs.sort_values(ascending=False)

In [9]:
query = "transformers dark of the moon"
sessions = get_sessions(query, index=False)
ctrs = calculate_ctr(sessions)
print_series_data(ctrs, "CTR")

                 CTR                                               name
doc_id                                                                 
97360810042   0.0824      Transformers: Dark of the Moon - Blu-ray Disc
47875842328   0.0734  Transformers: Dark of the Moon Stealth Force E...
47875841420   0.0434  Transformers: Dark of the Moon Decepticons - N...
24543701538   0.0364  The A-Team - Widescreen Dubbed Subtitle AC3 - ...
25192107191   0.0352              Fast Five - Widescreen - Blu-ray Disc
786936817218  0.0236  Pirates Of The Caribbean: On Stranger Tides (3...
786936817218  0.0236  Pirates of the Caribbean: On Stranger Tides - ...
97363560449   0.0192  Transformers: Dark of the Moon - Widescreen Du...
47875841406   0.0160  Transformers: Dark of the Moon Autobots - Nint...
400192926087  0.0124  Transformers: Dark of the Moon - Original Soun...
47875842335   0.0106  Transformers: Dark of the Moon Stealth Force E...
97363532149   0.0084  Transformers: Revenge of the Fallen - Wide

# Figure 11.2

Source code to render the ideal relevance ranking implied by the CTR judgments for `transformers dark of the moon`. In other words, our search results ordered from highest CTR to lowest.



In [10]:
query = "transformers dark of the moon"
sessions = get_sessions(query, index=False)
ctrs = calculate_ctr(sessions)
df = ctrs.to_frame(name="ctr").round(4)
print(df)
render_judged(products,
              df.sort_values("ctr", ascending=False),
              grade_col="ctr",
              label=f"Click-Thru-Rate Judgments for q={query}")

                 ctr
doc_id              
97360810042   0.0824
47875842328   0.0734
47875841420   0.0434
24543701538   0.0364
25192107191   0.0352
786936817218  0.0236
97363560449   0.0192
47875841406   0.0160
400192926087  0.0124
47875842335   0.0106
97363532149   0.0084
36725235564   0.0082
93624956037   0.0082
47875841369   0.0074
24543750949   0.0062


Unnamed: 0,ctr,upc,image,name
0,0.0824,97360810042,,Transformers: Dark of the Moon - Blu-ray Disc
1,0.0734,47875842328,,Transformers: Dark of the Moon Stealth Force Edition - Nintendo Wii
2,0.0434,47875841420,,Transformers: Dark of the Moon Decepticons - Nintendo DS
3,0.0364,24543701538,,The A-Team - Widescreen Dubbed Subtitle AC3 - Blu-ray Disc
4,0.0352,25192107191,,Fast Five - Widescreen - Blu-ray Disc


# Figure 11.3

Source code to render the CTR-based ideal relevance ranking for `dryer`, with results ordered from highest CTR to lowest.

In [11]:
query = "dryer"
sessions = get_sessions(query, index=False)
ctrs = calculate_ctr(sessions)
render_judged(products,
              ctrs.to_frame(name="ctr").sort_values("ctr", ascending=False),
              grade_col="ctr",
              label=f"Click-Thru-Rate Judgments for q={query}")

Unnamed: 0,ctr,upc,image,name
0,0.1608,84691226727,,GE - 6.0 Cu. Ft. 3-Cycle Electric Dryer - White
1,0.0816,84691226703,,Hotpoint - 6.0 Cu. Ft. 3-Cycle Electric Dryer - White-on-White
2,0.071,12505451713,,Frigidaire - Semi-Rigid Dryer Vent Kit - Silver
3,0.0576,783722274422,,The Independent - Widescreen Subtitle - DVD
4,0.0572,883049066905,,Whirlpool - Affresh Washer Cleaner


# Listing 11.05

Computing the global CTR of each rank, across all queries, to check whether the click data is biased by position. We look over every search session to see how often a document is clicked when it is placed at a specific rank.
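
The cell in this listing divides the clicks at each rank by the total number of sessions. That matches a per-rank CTR when every session shows every rank. As a hedged variation, the sketch below also divides by the number of sessions that actually displayed each rank, which may be the safer denominator if some sessions return fewer results. The data is made up for illustration.

```python
import pandas as pd

# Toy click log (made-up data): session 3 shows only two results.
toy = pd.DataFrame({
    "sess_id": [1, 1, 1, 2, 2, 2, 3, 3],
    "rank":    [0, 1, 2, 0, 1, 2, 0, 1],
    "clicked": [True, False, True, True, True, False, False, False],
})

clicks_by_rank = toy.groupby("rank")["clicked"].sum()

# Denominator 1: every session, whether or not it displayed this rank.
num_sessions = toy["sess_id"].nunique()
ctr_over_all_sessions = clicks_by_rank / num_sessions

# Denominator 2: only the sessions that displayed this rank.
impressions_by_rank = toy.groupby("rank")["sess_id"].nunique()
ctr_over_impressions = clicks_by_rank / impressions_by_rank

print(pd.DataFrame({
    "all_sessions": ctr_over_all_sessions,
    "impressions": ctr_over_impressions,
}))
```

The two columns agree for ranks 0 and 1, which every session displays, and differ for rank 2, which only two of the three sessions reach.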

In [12]:
sessions = all_sessions()
num_sessions = len(sessions["sess_id"].unique())
ctr_by_rank = sessions.groupby("rank")["clicked"].sum() / num_sessions
print(ctr_by_rank)

rank
0.0     0.249727
1.0     0.142673
2.0     0.084218
3.0     0.063073
4.0     0.056255
5.0     0.042255
6.0     0.033236
7.0     0.038000
8.0     0.020964
9.0     0.017364
10.0    0.013982
11.0    0.018582
12.0    0.015982
13.0    0.014509
14.0    0.012327
15.0    0.010200
16.0    0.011782
17.0    0.007891
18.0    0.007273
19.0    0.008145
20.0    0.006236
21.0    0.004473
22.0    0.005455
23.0    0.004982
24.0    0.005309
25.0    0.004364
26.0    0.005055
27.0    0.004691
28.0    0.005000
29.0    0.005400
Name: clicked, dtype: float64


# Listing 11.06

We look at the documents for our query and notice that some tend to appear higher in the results while others tend to appear lower. If irrelevant documents dominate the top positions, position bias will dominate our training data.
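
One way to see this effect is to put each document's mean rank next to its CTR. The sketch below does that on made-up data. It takes the mean of `clicked` as the CTR, which equals clicks divided by sessions only under the assumption that a document appears at most once per session.

```python
import pandas as pd

# Toy click log (made-up data): "x" is always shown first.
toy = pd.DataFrame({
    "sess_id": [1, 1, 1, 2, 2, 2],
    "doc_id":  ["x", "y", "z", "x", "z", "y"],
    "rank":    [0, 1, 2, 0, 1, 2],
    "clicked": [True, False, False, False, False, True],
})

# Mean rank and CTR per document, best-positioned documents first.
summary = toy.groupby("doc_id").agg(
    mean_rank=("rank", "mean"),
    ctr=("clicked", "mean"),
).sort_values("mean_rank")

print(summary)
```

Here `x` and `y` have the same CTR even though `y` is shown lower on average, which hints that `y` may be more relevant than raw CTR suggests.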

In [13]:
# %load -s calculate_average_rank ../ltr/sdbn_functions.py
def calculate_average_rank(sessions):
    avg_rank = sessions.groupby("doc_id")["rank"].mean()
    return avg_rank.sort_values(ascending=True)

In [14]:
sessions = get_sessions("transformers dark of the moon")
average_rank = calculate_average_rank(sessions)
print_series_data(average_rank, "mean_rank")

              mean_rank                                               name
doc_id                                                                    
400192926087    13.0526  Transformers: Dark of the Moon - Original Soun...
97363532149     12.1494  Transformers: Revenge of the Fallen - Widescre...
93624956037     11.3298  Transformers: Dark of the Moon - Original Soun...
97363560449     10.4304  Transformers: Dark of the Moon - Widescreen Du...
47875841369      9.5796     Transformers: Dark of the Moon - PlayStation 3
36725235564      8.6854   Samsung - 40" Class - LCD - 1080p - 120Hz - HDTV
24543750949      7.8626  X-Men: First Class - Widescreen Dubbed Subtitl...
97360810042      7.0130      Transformers: Dark of the Moon - Blu-ray Disc
47875841406      6.1378  Transformers: Dark of the Moon Autobots - Nint...
47875842335      5.2776  Transformers: Dark of the Moon Stealth Force E...
786936817218     4.4444  Pirates Of The Caribbean: On Stranger Tides (3...
786936817218     4.4444  

# Figure 11.4

In [15]:
# Reset `query`: it was last set to "dryer" above, which would mislabel this figure.
query = "transformers dark of the moon"
sessions = get_sessions(query)
average_rank = calculate_average_rank(sessions)
render_judged(products, 
              average_rank.to_frame(name="mean_rank").sort_values("mean_rank", ascending=True),
              grade_col="mean_rank",
              label=f"Typical Search Session for q={query}")

Unnamed: 0,mean_rank,upc,image,name
0,0.9808,47875842328,,Transformers: Dark of the Moon Stealth Force Edition - Nintendo Wii
1,1.8626,24543701538,,The A-Team - Widescreen Dubbed Subtitle AC3 - Blu-ray Disc
2,2.6596,25192107191,,Fast Five - Widescreen - Blu-ray Disc
3,3.5344,47875841420,,Transformers: Dark of the Moon Decepticons - Nintendo DS
4,4.4444,786936817218,,Pirates Of The Caribbean: On Stranger Tides (3-D) - Blu-ray 3D


In [16]:
query = "dryer"
sessions = get_sessions(query)
average_rank = calculate_average_rank(sessions)
print_series_data(average_rank, "mean")

                 mean                                               name
doc_id                                                                  
856751002097  17.0208                   Practecol - Dryer Balls (2-Pack)
48231011396   16.1548  LG - 3.5 Cu. Ft. 7-Cycle High-Efficiency Washe...
12505527456   15.3526  Smart Choice - 1/2" Safety+PLUS Stainless-Stee...
36725578241   14.7286  Samsung - 7.3 Cu. Ft. 7-Cycle Electric Dryer -...
36725561977   13.8932  Samsung - 3.5 Cu. Ft. 6-Cycle High-Efficiency ...
883929085118  12.9996     A Charlie Brown Christmas - AC3 - Blu-ray Disc
74108007469   12.2940  Conair - 1875-Watt Folding Handle Hair Dryer -...
48231011402   11.4734    LG - 7.1 Cu. Ft. 7-Cycle Electric Dryer - White
12505525766   10.6500        Smart Choice - 6' 30 Amp 3-Prong Dryer Cord
36172950027    9.8758    Tools in the Dryer: A Rarities Compilation - CD
74108096487    9.1230  Conair - Infiniti Cord-Keeper Professional Tou...
14381196320    8.3308                           The

In [17]:
render_judged(products, 
              average_rank.reset_index().sort_values("rank"),
              grade_col="rank",
              label=f"Typical Search Session for q={query}")

Unnamed: 0,rank,upc,image,name
0,1.9124,12505451713,,Frigidaire - Semi-Rigid Dryer Vent Kit - Silver
1,2.829,84691226727,,GE - 6.0 Cu. Ft. 3-Cycle Electric Dryer - White
2,3.5726,883049066905,,Whirlpool - Affresh Washer Cleaner
3,4.4552,84691226703,,Hotpoint - 6.0 Cu. Ft. 3-Cycle Electric Dryer - White-on-White
4,5.1276,74108056764,,Conair - Infiniti Ionic Cord-Keeper Hair Dryer - Light Purple


Up next: [Using SDBN Click Model To Overcome Position Bias](2.sdbn-judgments-to-overcome-position-bias.ipynb)