# Using the SDBN Click Model to Overcome Position Bias

In this section, we use the _Simplified Dynamic Bayesian Network_ (SDBN) click model to overcome the position bias that we saw with direct click-through rate (CTR). We look at the SDBN judgments and how they compare to raw click-through rate.

In [1]:
import sys
sys.path.append('../..')
from ltr.sdbn_functions import all_sessions, get_sessions
from aips import fetch_products, render_judged
import pandas
# If using a Jupyter notebook, include:
%matplotlib inline

In [14]:
sessions = all_sessions()
products = fetch_products(doc_ids=sessions['doc_id'].unique())

def print_dataframe(dataframe):
    pandas.reset_option("all")
    merged = dataframe.merge(products[["upc", "name"]], left_on='doc_id', right_on='upc', how='left')
    print(merged.rename(columns={"upc": "doc_id"}).set_index("doc_id"))

# Listing 11.7

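Click models overcome position bias by estimating, for each ranked result, the probability that a user examined it. SDBN defines examines relative to the last click in a session: every result at or above the last clicked position is treated as examined, and everything below it is not. The code below marks the last click position per session so we can compute examine probabilities.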

In [3]:
#%load -s calculate_examine_probability ../ltr/sdbn_functions.py
def calculate_examine_probability(sessions):
    last_click_per_session = sessions.groupby(
        ["clicked", "sess_id"])["rank"].max()[True]
    sessions["last_click_rank"] = last_click_per_session
    sessions["examined"] = \
      sessions["rank"] <= sessions["last_click_rank"]
    return sessions

In [16]:
sessions = get_sessions("dryer")
probability = calculate_examine_probability(sessions).loc[3]
print(probability)

         query  rank        doc_id  clicked  last_click_rank  examined
sess_id                                                               
3        dryer   0.0   12505451713    False              9.0      True
3        dryer   1.0   84691226727    False              9.0      True
3        dryer   2.0  883049066905    False              9.0      True
3        dryer   3.0   48231011396    False              9.0      True
3        dryer   4.0   74108056764    False              9.0      True
3        dryer   5.0   77283045400    False              9.0      True
3        dryer   6.0  783722274422    False              9.0      True
3        dryer   7.0  665331101927    False              9.0      True
3        dryer   8.0   14381196320     True              9.0      True
3        dryer   9.0   74108096487     True              9.0      True
3        dryer  10.0   74108007469    False              9.0     False
3        dryer  11.0   12505525766    False              9.0     False
3     
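
To make the examine rule concrete, here is a minimal sketch on a tiny, made-up set of sessions. The data, the `toy` name, and the use of `transform` are assumptions for illustration only; this is not the `ltr.sdbn_functions` implementation, just one way to apply the same "at or above the last click" rule.

```python
import pandas as pd

# Hypothetical toy data: three sessions for a single query.
toy = pd.DataFrame({
    "sess_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "rank":    [0, 1, 2, 3, 0, 1, 2, 3, 0, 1],
    "clicked": [False, True, False, False,
                True, False, True, False,
                False, False],
})

# Keep the rank only where a click happened, then take the deepest
# clicked rank per session and broadcast it to every row of that session.
clicked_rank = toy["rank"].where(toy["clicked"])
toy["last_click_rank"] = clicked_rank.groupby(toy["sess_id"]).transform("max")

# A result is marked examined if it sits at or above the last click.
# Sessions without any click get NaN here, so none of their rows are examined.
toy["examined"] = toy["rank"] <= toy["last_click_rank"]
print(toy)
```

In this sketch, session 1 has its last click at rank 1, session 2 at rank 2, and session 3 has no clicks at all, so none of its rows count as examined.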

# Listing 11.8

Aggregate click and examine counts per document, counting only the rows that were marked as examined.

In [5]:
#%load -s calculate_clicked_examined ../ltr/sdbn_functions.py
def calculate_clicked_examined(sessions):
    sessions = calculate_examine_probability(sessions)
    return sessions[sessions["examined"]] \
        .groupby("doc_id")[["clicked", "examined"]].sum()

In [6]:
sessions = get_sessions("dryer")
clicked_examined_data = calculate_clicked_examined(sessions)
print_dataframe(clicked_examined_data)

              clicked  examined                                 name
doc_id                                                              
12505451713       355      2707  Frigidaire - Semi-Rigid Dryer Ve...
12505525766       268       974  Smart Choice - 6' 30 Amp 3-Prong...
12505527456       110       428  Smart Choice - 1/2" Safety+PLUS ...
14381196320       217      1202             The Mind Snatchers - DVD
36172950027        97       971  Tools in the Dryer: A Rarities C...
36725561977       119       572  Samsung - 3.5 Cu. Ft. 6-Cycle Hi...
36725578241       130       477  Samsung - 7.3 Cu. Ft. 7-Cycle El...
48231011396       166       423  LG - 3.5 Cu. Ft. 7-Cycle High-Ef...
48231011402       213       818  LG - 7.1 Cu. Ft. 7-Cycle Electri...
74108007469       208       708  Conair - 1875-Watt Folding Handl...
74108056764       273      1791  Conair - Infiniti Ionic Cord-Kee...
74108096487       235      1097  Conair - Infiniti Cord-Keeper Pr...
77283045400       276      1625   
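
The aggregation step can be sketched on a small, made-up frame that already carries the `examined` flag. The rows and the `toy` and `counts` names are hypothetical; the point is only to show that impressions which were never examined drop out of both counts.

```python
import pandas as pd

# Hypothetical rows: doc "c" is shown twice but examined only once.
toy = pd.DataFrame({
    "doc_id":   ["a", "b", "c", "a", "b", "c"],
    "clicked":  [False, True, False, True, False, True],
    "examined": [True, True, False, True, True, True],
})

# Drop unexamined rows, then sum the boolean columns per document.
# Summing booleans counts the True values.
counts = (
    toy[toy["examined"]]
    .groupby("doc_id")[["clicked", "examined"]]
    .sum()
)
print(counts)
```

Because unexamined rows are filtered out before grouping, the `examined` column ends up as a count of examined impressions rather than a count of all impressions.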

# Listing 11.9

We compute a grade, a probability of relevance, by dividing each document's clicks by its examines. This is SDBN's dynamic version of click-through rate: it accounts for whether users actually saw the result, not just whether it was shown on the screen.
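
Written as a formula, with $d$ standing for a document (the symbol names are introduced here only for illustration):

$$
\text{grade}(d) = \frac{\text{clicks}(d)}{\text{examines}(d)}
$$

As a worked example using counts from the "dryer" results in this section, a document with 133 clicks across 323 examines gets a grade of $133 / 323 \approx 0.4118$, and one with 166 clicks across 423 examines gets $166 / 423 \approx 0.3924$.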

In [7]:
#%load -s calculate_grade ../ltr/sdbn_functions.py
def calculate_grade(sessions):
    sessions = calculate_clicked_examined(sessions)
    sessions["grade"] = sessions["clicked"] / sessions["examined"]
    return sessions.sort_values("grade", ascending=False)

In [8]:
query = "dryer"
sessions = get_sessions(query)
grade_data = calculate_grade(sessions)
print_dataframe(grade_data)

              clicked  examined     grade                                 name
doc_id                                                                        
856751002097      133       323  0.411765     Practecol - Dryer Balls (2-Pack)
48231011396       166       423  0.392435  LG - 3.5 Cu. Ft. 7-Cycle High-Ef...
84691226727       804      2541  0.316411  GE - 6.0 Cu. Ft. 3-Cycle Electri...
74108007469       208       708  0.293785  Conair - 1875-Watt Folding Handl...
12505525766       268       974  0.275154  Smart Choice - 6' 30 Amp 3-Prong...
36725578241       130       477  0.272537  Samsung - 7.3 Cu. Ft. 7-Cycle El...
48231011402       213       818  0.260391  LG - 7.1 Cu. Ft. 7-Cycle Electri...
12505527456       110       428  0.257009  Smart Choice - 1/2" Safety+PLUS ...
74108096487       235      1097  0.214221  Conair - Infiniti Cord-Keeper Pr...
36725561977       119       572  0.208042  Samsung - 3.5 Cu. Ft. 6-Cycle Hi...
84691226703       408      2015  0.202481  Hotpoint 
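
To see how this grade differs from raw click-through rate, the following sketch computes both on a small, invented set of sessions. All data and names here (`toy`, `ctr`, `sdbn_grade`, `comparison`) are assumptions for illustration; the sketch mirrors the examine rule above rather than calling the notebook's functions.

```python
import pandas as pd

# Hypothetical sessions: doc "c" always sits at the bottom (rank 2),
# and users only reach it in session 3.
toy = pd.DataFrame({
    "sess_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "rank":    [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "doc_id":  ["a", "b", "c", "a", "b", "c", "a", "b", "c"],
    "clicked": [True, False, False,
                True, False, False,
                False, False, True],
})

# Mark rows at or above the last click in each session as examined.
last_click = (
    toy["rank"].where(toy["clicked"])
    .groupby(toy["sess_id"]).transform("max")
)
toy["examined"] = toy["rank"] <= last_click

# Raw CTR: clicks divided by all impressions.
ctr = toy.groupby("doc_id")["clicked"].mean()

# SDBN-style grade: clicks divided by examined impressions only.
seen = toy[toy["examined"]].groupby("doc_id")[["clicked", "examined"]].sum()
sdbn_grade = seen["clicked"] / seen["examined"]

comparison = pd.DataFrame({"ctr": ctr, "sdbn_grade": sdbn_grade})
print(comparison)
```

In this toy data, doc "c" is penalized by raw CTR for two impressions that users never scrolled down to, while the examine-based grade only counts the one session where it was actually reached.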

# Figure 11.6 Source Code

In [9]:
render_judged(products, calculate_grade(sessions), grade_col="grade", label=f"SDBN judgments for q={query}")

|   | grade | upc | image | name |
|---|-------|-----|-------|------|
| 0 | 0.4118 | 856751002097 | | Practecol - Dryer Balls (2-Pack) |
| 1 | 0.3924 | 48231011396 | | LG - 3.5 Cu. Ft. 7-Cycle High-Efficiency Washer - White |
| 2 | 0.3164 | 84691226727 | | GE - 6.0 Cu. Ft. 3-Cycle Electric Dryer - White |
| 3 | 0.2938 | 74108007469 | | Conair - 1875-Watt Folding Handle Hair Dryer - Blue |
| 4 | 0.2752 | 12505525766 | | Smart Choice - 6' 30 Amp 3-Prong Dryer Cord |


# Listing 11.10 Source Code

In [10]:
query = "transformers dark of the moon"
sessions = get_sessions(query)
grade_data = calculate_grade(sessions)
print_dataframe(grade_data)

              clicked  examined     grade                                 name
doc_id                                                                        
97360810042       412       642  0.641745  Transformers: Dark of the Moon -...
400192926087       62       129  0.480620  Transformers: Dark of the Moon -...
97363560449        96       243  0.395062  Transformers: Dark of the Moon -...
97363532149        42       130  0.323077  Transformers: Revenge of the Fal...
93624956037        41       154  0.266234  Transformers: Dark of the Moon -...
47875842328       367      1531  0.239713  Transformers: Dark of the Moon S...
47875841420       217       960  0.226042  Transformers: Dark of the Moon D...
25192107191       176      1082  0.162662  Fast Five - Widescreen - Blu-ray...
786936817218      118       777  0.151866  Pirates Of The Caribbean: On Str...
786936817218      118       777  0.151866  Pirates of the Caribbean: On Str...
36725235564        41       277  0.148014  Samsung -

# Figure 11.7 Source Code

In [11]:
render_judged(products, calculate_grade(sessions), grade_col="grade", label=f"SDBN judgments for q={query}")

|   | grade | upc | image | name |
|---|-------|-----|-------|------|
| 0 | 0.6417 | 97360810042 | | Transformers: Dark of the Moon - Blu-ray Disc |
| 1 | 0.4806 | 400192926087 | | Transformers: Dark of the Moon - Original Soundtrack - CD |
| 2 | 0.3951 | 97363560449 | | Transformers: Dark of the Moon - Widescreen Dubbed Subtitle - DVD |
| 3 | 0.3231 | 97363532149 | | Transformers: Revenge of the Fallen - Widescreen Dubbed Subtitle - DVD |
| 4 | 0.2662 | 93624956037 | | Transformers: Dark of the Moon - Original Soundtrack - CD |


Up next: [Dealing with Low Confidence Situations](3.SDBN-Confidence-Bias.ipynb)