# Using SDBN Click Model To Overcome Position Bias

In this section we use the _Simplified Dynamic Bayesian Network_ (SDBN) click model to overcome the position bias that we saw with raw click-through rate (CTR). We then look at the SDBN judgments and how they compare to judgments based on click-through rate alone.

In [1]:
! cd ../data/retrotech && head signals.csv

import random
import pandas as pd
import numpy as np
import sys
sys.path.append('..')
from aips import *
from session_gen import SessionGenerator
import os
from IPython.display import display, HTML

import matplotlib.pyplot as plt
# If using a Jupyter notebook, include:
%matplotlib inline

"query_id","user","type","target","signal_time"
"u2_0_1","u2","query","nook","2019-07-31 08:49:07.3116"
"u2_1_2","u2","query","rca","2020-05-04 08:28:21.1848"
"u3_0_1","u3","query","macbook","2019-12-22 00:07:07.0152"
"u4_0_1","u4","query","Tv antenna","2019-08-22 23:45:54.1030"
"u5_0_1","u5","query","AC power cord","2019-10-20 08:27:00.1600"
"u6_0_1","u6","query","Watch The Throne","2019-09-18 11:59:53.7470"
"u7_0_1","u7","query","Camcorder","2020-02-25 13:02:29.3089"
"u9_0_1","u9","query","wireless headphones","2020-04-26 04:26:09.7198"
"u10_0_1","u10","query","Xbox","2019-09-13 16:26:12.0132"


In [2]:
def all_sessions():
    import glob
    sessions = pd.concat([pd.read_csv(f, compression='gzip')
                          for f in glob.glob('*_sessions.gz')])
    return sessions.rename(columns={'clicked_doc_id': 'doc_id'})
    
sessions = all_sessions()
sessions

ValueError: No objects to concatenate

In [None]:
products = fetch_products(doc_ids=sessions['doc_id'].unique())

products

# Listing 11.7

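Click models overcome position bias by estimating the probability that users actually examined each position in a ranking. SDBN tracks examines relative to the last click: every result at or above the last clicked rank in a session counts as examined, and everything below it does not. This code marks the last click position per session so we can compute examine probabilities.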

In [10]:
# Select all sessions for query 'dryer'
QUERY='dryer'
sdbn_sessions = sessions[sessions['query'] == QUERY].copy().set_index('sess_id')

# Mapping of sess_id -> last_click_per_session
last_click_per_session = sdbn_sessions.groupby(['clicked', 'sess_id'])['rank'].max()[True]

# Mark the last click rank in each session
sdbn_sessions['last_click_rank'] = last_click_per_session

# Mark each position as examined (True) or not examined (False)
sdbn_sessions['examined'] = sdbn_sessions['rank'] <= sdbn_sessions['last_click_rank']

# Examine session 3
sdbn_sessions.loc[3]

sess_id,query,rank,doc_id,clicked,last_click_rank,examined
3,dryer,0.0,12505451713,False,9.0,True
3,dryer,1.0,84691226727,False,9.0,True
3,dryer,2.0,883049066905,False,9.0,True
3,dryer,3.0,48231011396,False,9.0,True
3,dryer,4.0,74108056764,False,9.0,True
3,dryer,5.0,77283045400,False,9.0,True
3,dryer,6.0,783722274422,False,9.0,True
3,dryer,7.0,665331101927,False,9.0,True
3,dryer,8.0,14381196320,True,9.0,True
3,dryer,9.0,74108096487,True,9.0,True
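
The examine-marking step above can be tried on a tiny hand-made table. This is only a sketch: the `toy` DataFrame and its values are hypothetical, and it uses `Series.map` to attach the last click rank rather than the index alignment used in the listing. It also shows what happens to a session with no clicks: its last click rank is `NaN`, so none of its rows count as examined.

```python
import pandas as pd

# Toy data (hypothetical): two sessions over the same three documents.
# Session 1 has one click at rank 1; session 2 has no clicks at all.
toy = pd.DataFrame({
    'sess_id': [1, 1, 1, 2, 2, 2],
    'rank':    [0, 1, 2, 0, 1, 2],
    'doc_id':  ['a', 'b', 'c', 'a', 'b', 'c'],
    'clicked': [False, True, False, False, False, False],
})

# Last clicked rank per session; sessions without clicks are absent here
last_click = toy[toy['clicked']].groupby('sess_id')['rank'].max()

# Attach the last click rank to every row; no-click sessions get NaN
toy['last_click_rank'] = toy['sess_id'].map(last_click)

# Comparisons against NaN are False, so no-click sessions are never examined
toy['examined'] = toy['rank'] <= toy['last_click_rank']

print(toy)
```

In session 1 the rows at ranks 0 and 1 are marked examined and rank 2 is not. This also explains why `rank` and `last_click_rank` can appear as floats in the real output: a column that may hold `NaN` is stored as floating point.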


# Listing 11.8

Aggregate click and examine counts per document, using only the rows marked as examined.

In [11]:
sdbn = sdbn_sessions[sdbn_sessions['examined']].groupby('doc_id')[['clicked', 'examined']].sum()
sdbn

doc_id,clicked,examined
12505451713,355.0,2707.0
12505525766,268.0,974.0
12505527456,110.0,428.0
14381196320,217.0,1202.0
36172950027,97.0,971.0
36725561977,119.0,572.0
36725578241,130.0,477.0
48231011396,166.0,423.0
48231011402,213.0,818.0
74108007469,208.0,708.0


# Listing 11.9

We compute a grade (a probability of relevance) by dividing each document's clicks by its examines. This is SDBN's dynamic version of click-through rate: it accounts for whether users actually saw the result, not just whether it was shown on the screen.

In [12]:
# Clicks over examines

sdbn['grade'] = sdbn['clicked'] / sdbn['examined']

sdbn = sdbn.sort_values('grade', ascending=False)
sdbn

doc_id,clicked,examined,grade
856751002097,133.0,323.0,0.411765
48231011396,166.0,423.0,0.392435
84691226727,804.0,2541.0,0.316411
74108007469,208.0,708.0,0.293785
12505525766,268.0,974.0,0.275154
36725578241,130.0,477.0,0.272537
48231011402,213.0,818.0,0.260391
12505527456,110.0,428.0,0.257009
74108096487,235.0,1097.0,0.214221
36725561977,119.0,572.0,0.208042


# Figure 11.8 source code

In [13]:
render_judged(products, sdbn, grade_col='grade', label=f"SDBN judgments for q={QUERY}")

Unnamed: 0,grade,image,upc,name,shortDescription
0,0.411765,,856751002097,Practecol - Dryer Balls (2-Pack),"Suitable for use on most dry cycles; reduces lint, static and wrinkles; improves heat circulation; 2-pack"
1,0.392435,,48231011396,LG - 3.5 Cu. Ft. 7-Cycle High-Efficiency Washer - White,ENERGY STAR QualifiedDigital controls; 7 cycles; SpeedWash cycle; 9 wash options; delay-wash; SenseClean system; 6Motion technology; TrueBalance antivibration system
2,0.316411,,84691226727,GE - 6.0 Cu. Ft. 3-Cycle Electric Dryer - White,Rotary electromechanical controls; 3 cycles; 3 heat selections; DuraDrum interior; Quiet-By-Design
3,0.293785,,74108007469,Conair - 1875-Watt Folding Handle Hair Dryer - Blue,2 heat/speed settings; cool shot button; dual voltage; professional-length line cord
4,0.275154,,12505525766,Smart Choice - 6' 30 Amp 3-Prong Dryer Cord,Heavy-duty PVC insulation; strain relief safety clamp
5,0.272537,,36725578241,Samsung - 7.3 Cu. Ft. 7-Cycle Electric Dryer - White,Soft-touch dial controls; 7 preset drying cycles; 4 temperature settings; powdercoat drum; noise reduction package
6,0.260391,,48231011402,LG - 7.1 Cu. Ft. 7-Cycle Electric Dryer - White,Electronic controls with LED display; 7 cycles; Dial-A-Cycle option; sensor dry system; 5 temperature levels; 5 drying levels; NeveRust drum; LoDecibel quiet operation
7,0.257009,,12505527456,"Smart Choice - 1/2"" Safety+PLUS Stainless-Steel Gas Dryer Connector","Safety+PLUS automatic shut-off valve; leak detection solution; pipe thread sealant; 60,500 BTU; CSA approved"
8,0.214221,,74108096487,Conair - Infiniti Cord-Keeper Professional Tourmaline Ionic Hair Dryer - Fuchsia,Tourmaline ceramic technology; ionic technology; 1875 watts; Cool Shot function; 3 heat settings; 2 speed settings; 5' retractable cord; includes diffuser
9,0.208042,,36725561977,Samsung - 3.5 Cu. Ft. 6-Cycle High-Efficiency Washer - White,ENERGY STAR QualifiedSoft dial touch pad controls; 6 cycles; delay-start; child lock; Vibration Reduction Technology


# Figure 11.9 Source Code

In [14]:
# Repeat the SDBN steps for another query: mark the last click in each session
QUERY='transformers dark of the moon'
sdbn_sessions = sessions[sessions['query'] == QUERY].copy().set_index('sess_id')

last_click_per_session = sdbn_sessions.groupby(['clicked', 'sess_id'])['rank'].max()[True]

sdbn_sessions['last_click_rank'] = last_click_per_session
sdbn_sessions['examined'] = sdbn_sessions['rank'] <= sdbn_sessions['last_click_rank']

sdbn = sdbn_sessions[sdbn_sessions['examined']].groupby('doc_id')[['clicked', 'examined']].sum()
sdbn['grade'] = sdbn['clicked'] / sdbn['examined']

sdbn = sdbn.sort_values('grade', ascending=False)
render_judged(products, sdbn, grade_col='grade', label=f"SDBN judgments for q={QUERY}")


Unnamed: 0,grade,image,upc,name,shortDescription
0,0.641745,,97360810042,Transformers: Dark of the Moon - Blu-ray Disc,\N
1,0.48062,,400192926087,Transformers: Dark of the Moon - Original Soundtrack - CD,\N
2,0.395062,,97363560449,Transformers: Dark of the Moon - Widescreen Dubbed Subtitle - DVD,\N
3,0.323077,,97363532149,Transformers: Revenge of the Fallen - Widescreen Dubbed Subtitle - DVD,\N
4,0.266234,,93624956037,Transformers: Dark of the Moon - Original Soundtrack - CD,\N
5,0.239713,,47875842328,Transformers: Dark of the Moon Stealth Force Edition - Nintendo Wii,Transform into an epic hero or a vehicular villain
6,0.226042,,47875841420,Transformers: Dark of the Moon Decepticons - Nintendo DS,Transform into an epic hero or a vehicular villain
7,0.162662,,25192107191,Fast Five - Widescreen - Blu-ray Disc,\N
8,0.151866,,786936817218,Pirates Of The Caribbean: On Stranger Tides (3-D) - Blu-ray 3D,\N
9,0.148014,,36725235564,"Samsung - 40"" Class - LCD - 1080p - 120Hz - HDTV",\N
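
The same steps were run for both `dryer` and `transformers dark of the moon`, so they could be wrapped in one function. The helper below is a hypothetical sketch, not part of the notebook's `aips` module: the name `sdbn_judgments` is invented here, and it assumes the session columns `query`, `sess_id`, `rank`, `doc_id` and `clicked` seen in the listings above. It is exercised on made-up data so that it runs on its own.

```python
import pandas as pd

def sdbn_judgments(sessions: pd.DataFrame, query: str) -> pd.DataFrame:
    """Hypothetical helper: SDBN click, examine and grade columns for one query."""
    s = sessions[sessions['query'] == query].copy()

    # Last clicked rank per session, mapped back onto each row
    last_click = s[s['clicked']].groupby('sess_id')['rank'].max()
    s['examined'] = s['rank'] <= s['sess_id'].map(last_click)

    # Clicks over examines, counted over examined rows only
    counts = s[s['examined']].groupby('doc_id')[['clicked', 'examined']].sum()
    counts['grade'] = counts['clicked'] / counts['examined']
    return counts.sort_values('grade', ascending=False)

# Toy sessions (hypothetical values) covering two queries
toy = pd.DataFrame({
    'query':   ['q'] * 4 + ['other'] * 2,
    'sess_id': [1, 1, 2, 2, 3, 3],
    'rank':    [0, 1, 0, 1, 0, 1],
    'doc_id':  ['a', 'b', 'a', 'b', 'x', 'y'],
    'clicked': [False, True, True, False, True, False],
})

judgments = sdbn_judgments(toy, 'q')
print(judgments)
```

With the real data, the result of `sdbn_judgments(sessions, QUERY)` could be passed to `render_judged` in the same way as `sdbn` above.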


In [831]:
sdbn

doc_id,clicked,examined,orig_grade,prior_a,prior_b,posterior_a,posterior_b,beta_grade
24543701538,182.0,1232.0,0.147727,16.0,24.0,198.0,1074.0,0.15566
24543750949,31.0,313.0,0.099042,16.0,24.0,47.0,306.0,0.133144
25192107191,176.0,1082.0,0.162662,16.0,24.0,192.0,930.0,0.171123
36725235564,41.0,277.0,0.148014,16.0,24.0,57.0,260.0,0.179811
47875841369,37.0,251.0,0.14741,16.0,24.0,53.0,238.0,0.182131
47875841406,80.0,626.0,0.127796,16.0,24.0,96.0,570.0,0.144144
47875841420,217.0,960.0,0.226042,16.0,24.0,233.0,767.0,0.233
47875842328,367.0,1531.0,0.239713,16.0,24.0,383.0,1188.0,0.243794
47875842335,53.0,681.0,0.077827,16.0,24.0,69.0,652.0,0.0957
93624956037,41.0,154.0,0.266234,16.0,24.0,57.0,137.0,0.293814
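
The prior and posterior columns in this last output are not produced by any cell shown in this section. Their values are, however, arithmetically consistent with a Beta prior update in which clicks are added to `prior_a` and examined-but-not-clicked counts are added to `prior_b`. The sketch below reproduces two rows under that assumption. The update rule is inferred from the numbers rather than taken from the notebook's code, and the prior values 16 and 24 are simply read off the table.

```python
import pandas as pd

# Two rows copied from the output above
sdbn = pd.DataFrame(
    {'clicked': [182.0, 31.0], 'examined': [1232.0, 313.0]},
    index=pd.Index([24543701538, 24543750949], name='doc_id'),
)

# Prior values as they appear in the table (assumed to be a Beta(a, b) prior)
PRIOR_A, PRIOR_B = 16.0, 24.0

sdbn['orig_grade'] = sdbn['clicked'] / sdbn['examined']

# Inferred update: clicks count toward a, examined-without-click toward b
sdbn['posterior_a'] = PRIOR_A + sdbn['clicked']
sdbn['posterior_b'] = PRIOR_B + (sdbn['examined'] - sdbn['clicked'])

# Mean of the Beta posterior
sdbn['beta_grade'] = sdbn['posterior_a'] / (sdbn['posterior_a'] + sdbn['posterior_b'])

print(sdbn.round(6))
```

Under this reading, the prior acts like 40 extra examines with 16 clicks, so documents with few examines are pulled toward a grade of 0.4 while documents with many examines barely move.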
