# Using SDBN Click Model To Overcome Position Bias

In this section we use the _Simplified Dynamic Bayesian Network_ (SDBN) click model to overcome the position bias we saw with raw click-through rate. We compute SDBN judgments and see how they compare to click-through rate alone.

In [1]:
import sys
sys.path.append('..')
from aips import *
import matplotlib.pyplot as plt
import numpy
import pandas 
#from ltr.sdbn_functions import *
import glob 

# if using a Jupyter notebook, include:
%matplotlib inline

In [78]:
sessions = all_sessions()
products = fetch_products(doc_ids=sessions['doc_id'].unique())
products

,image,upc,name,manufacturer,shortDescription,longDescription,id,_version_
0,"<img height=""100"" src=""../data/retrotech/image...",885909471812,Apple&#xAE; - iPad&#xAE; 2 with Wi-Fi - 16GB -...,Apple&#xAE;,"9.7"" widescreen display; 802.11a/b/g/n Wi-Fi; ...",The all-new thinner and lighter design makes i...,a33cc2d0-caa4-4c45-ab43-e567f7a16306,1798402974632902658
1,"<img height=""100"" src=""../data/retrotech/image...",885909394494,Apple&#xAE; - iPhone 4 with 16GB Memory - Whit...,Apple&#xAE;,"iPhone iOS 4 operating systemWi-Fi3.5"" Retina ...","This slim, powerful iPhone features a high-qua...",b47deb26-5c5c-491b-91a3-ec4117ffc70b,1798402974668554252
2,"<img height=""100"" src=""../data/retrotech/image...",75993997675,Metallica/Slayer/Megadeth/Anthrax: The Big 4 -...,\N,\N,\N,5e10595e-d5c3-4334-a35f-071f4e0ae0dc,1798402974685331461
3,"<img height=""100"" src=""../data/retrotech/image...",22265004302,"Toshiba - 55"" Class - LCD - 1080p - 120Hz - HDTV",Toshiba,\N,"This 55"" flat-panel LCD TV supports stunning h...",6d16a9b6-ee1e-4829-a73b-db62a9d2f295,1798402974714691589
4,"<img height=""100"" src=""../data/retrotech/image...",36725235564,"Samsung - 40"" Class - LCD - 1080p - 120Hz - HDTV",Samsung,\N,"Enjoy video games, movies and more with this S...",aeeba435-f102-4048-85e0-709740b4f924,1798402974723080198
...,...,...,...,...,...,...,...,...
306,"<img height=""100"" src=""../data/retrotech/image...",786936817218,Pirates of the Caribbean: On Stranger Tides - ...,\N,\N,\N,23194de2-d500-4874-aba1-6db47673ad93,1798402975721324549
307,"<img height=""100"" src=""../data/retrotech/image...",30206696622,Star Trek (Score) - Original Soundtrack - CD,Var&#xBF;se Sarabande (USA),\N,\N,da0dfdb1-64a7-4f29-bcde-aec7d2a779a8,1798402975861833759
308,"<img height=""100"" src=""../data/retrotech/image...",886971404722,Star Wars: The Corellian Edition (Snys) - CD,UNKNOWN,\N,\N,7d38f9fc-c967-409c-8afa-8296f57dc3f4,1798402974922309668
309,"<img height=""100"" src=""../data/retrotech/image...",97361301747,Star Trek: Fan Collectives - Fullscreen AC3 Do...,\N,\N,\N,f22c895d-9f74-4a72-ab4d-766e9513f61a,1798402975075401773


# Listing 11.7

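Click models overcome position bias by estimating, for each ranked result, whether the user actually examined it. SDBN tracks examines relative to the last click: every result at or above the last clicked position in a session counts as examined. This code records the last click position per session so we can flag which results were examined.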

In [79]:
#%load -s calculate_examine_probability ../ltr/sdbn_functions.py
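def calculate_examine_probability(sessions):
    # Highest clicked rank in each session; [True] selects the clicked=True group,
    # leaving a Series indexed by sess_id
    last_click_per_session = sessions.groupby(["clicked", "sess_id"])["rank"].max()[True]
    # Assignment aligns on the index (assumes sessions is indexed by sess_id)
    sessions["last_click_rank"] = last_click_per_session
    # SDBN treats every result at or above the last click as examined
    sessions["examined"] = sessions["rank"] <= sessions["last_click_rank"]
    return sessions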

In [80]:
sessions = get_sessions("dryer")
calculate_examine_probability(sessions).loc[3]

sess_id,query,rank,doc_id,clicked,last_click_rank,examined
3,dryer,0.0,12505451713,False,9.0,True
3,dryer,1.0,84691226727,False,9.0,True
3,dryer,2.0,883049066905,False,9.0,True
3,dryer,3.0,48231011396,False,9.0,True
3,dryer,4.0,74108056764,False,9.0,True
3,dryer,5.0,77283045400,False,9.0,True
3,dryer,6.0,783722274422,False,9.0,True
3,dryer,7.0,665331101927,False,9.0,True
3,dryer,8.0,14381196320,True,9.0,True
3,dryer,9.0,74108096487,True,9.0,True
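
To see the examine rule on something smaller than the full click log, here is a minimal sketch on hypothetical toy data (two sessions, three results each; the `toy` frame and its values are invented for illustration). It uses `groupby(...).transform("max")` rather than index alignment, which is an assumption of this sketch and not the notebook's own implementation.

```python
import pandas as pd

# Hypothetical toy click log: two sessions, three ranked results each.
toy = pd.DataFrame({
    "sess_id": [1, 1, 1, 2, 2, 2],
    "rank":    [0, 1, 2, 0, 1, 2],
    "doc_id":  ["a", "b", "c", "a", "b", "c"],
    "clicked": [False, True, False, True, False, True],
}).set_index("sess_id")

# Keep rank only where a click happened, then broadcast the per-session
# maximum (the last click position) back onto every row of that session.
clicked_rank = toy["rank"].where(toy["clicked"])
toy["last_click_rank"] = clicked_rank.groupby(level="sess_id").transform("max")

# A result counts as examined if it sits at or above the last click.
toy["examined"] = toy["rank"] <= toy["last_click_rank"]

print(toy)
```

In session 1 the last click is at rank 1, so the result at rank 2 is not counted as examined. In session 2 the last click is at rank 2, so all three results are.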


# Listing 11.8

Next we aggregate click and examine counts per document, keeping only the rows that were examined.

In [81]:
#%load -s calculate_clicked_examined ../ltr/sdbn_functions.py
def calculate_clicked_examined(sessions):
    sessions = calculate_examine_probability(sessions)
    return sessions[sessions["examined"]] \
        .groupby("doc_id")[["clicked", "examined"]].sum()

In [82]:
sessions = get_sessions("dryer")
calculate_clicked_examined(sessions)

doc_id,clicked,examined
12505451713,355,2707
12505525766,268,974
12505527456,110,428
14381196320,217,1202
36172950027,97,971
36725561977,119,572
36725578241,130,477
48231011396,166,423
48231011402,213,818
74108007469,208,708
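
The aggregation step can be checked by hand on a few rows. The sketch below assumes a hypothetical toy frame that already carries `clicked` and `examined` flags (the values are invented); it applies the same filter, group and sum pattern as `calculate_clicked_examined`.

```python
import pandas as pd

# Hypothetical toy rows with examine flags already computed.
toy = pd.DataFrame({
    "doc_id":   ["a", "b", "c", "a", "b", "c"],
    "clicked":  [False, True, False, True, False, True],
    "examined": [True, True, False, True, True, True],
})

# Drop unexamined rows, then count clicks and examines per document.
# Summing boolean columns counts the True values.
counts = toy[toy["examined"]].groupby("doc_id")[["clicked", "examined"]].sum()

print(counts)
```

Document `c` was shown twice but examined only once, so its examine count is 1 rather than 2.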


# Listing 11.9

We compute a grade, an estimate of the probability of relevance, by dividing each document's clicks by its examines. This is SDBN's version of click-through rate: it accounts for whether users actually examined the result, not just whether it was shown on the screen.

In [83]:
#%load -s calculate_grade ../ltr/sdbn_functions.py
def calculate_grade(sessions):
    sessions = calculate_clicked_examined(sessions)
    sessions["grade"] = sessions["clicked"] / sessions["examined"]
    return sessions.sort_values("grade", ascending=False)

In [84]:
query = "dryer"
sessions = get_sessions(query)
calculate_grade(sessions)

doc_id,clicked,examined,grade
856751002097,133,323,0.411765
48231011396,166,423,0.392435
84691226727,804,2541,0.316411
74108007469,208,708,0.293785
12505525766,268,974,0.275154
36725578241,130,477,0.272537
48231011402,213,818,0.260391
12505527456,110,428,0.257009
74108096487,235,1097,0.214221
36725561977,119,572,0.208042
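
Each grade above is just clicks over examines; for example, 133 / 323 ≈ 0.411765 for the top row. To make the contrast with plain click-through rate concrete, here is a minimal sketch on hypothetical toy data (the frame and its values are invented). It computes both numbers side by side; the column names `ctr` and `sdbn_grade` are chosen for this illustration only.

```python
import pandas as pd

# Hypothetical toy rows: each document was shown in two sessions.
toy = pd.DataFrame({
    "doc_id":   ["a", "b", "c", "a", "b", "c"],
    "clicked":  [False, True, False, True, False, True],
    "examined": [True, True, False, True, True, True],
})

# Plain click-through rate: clicks divided by every impression.
ctr = toy.groupby("doc_id")["clicked"].mean()

# SDBN-style grade: clicks divided by examined impressions only.
seen = toy[toy["examined"]].groupby("doc_id")[["clicked", "examined"]].sum()
sdbn_grade = seen["clicked"] / seen["examined"]

comparison = pd.DataFrame({"ctr": ctr, "sdbn_grade": sdbn_grade})
print(comparison)
```

Document `c` has a click-through rate of 0.5 but a grade of 1.0, because the one time it went unclicked it was also never examined.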


In [34]:
render_judged(products, calculate_grade(sessions), grade_col="grade", label=f"SDBN judgments for q={query}")

,grade,image,upc,name,shortDescription
0,0.411765,,856751002097,Practecol - Dryer Balls (2-Pack),"Suitable for use on most dry cycles; reduces lint, static and wrinkles; improves heat circulation; 2-pack"
1,0.392435,,48231011396,LG - 3.5 Cu. Ft. 7-Cycle High-Efficiency Washer - White,ENERGY STAR QualifiedDigital controls; 7 cycles; SpeedWash cycle; 9 wash options; delay-wash; SenseClean system; 6Motion technology; TrueBalance antivibration system
2,0.316411,,84691226727,GE - 6.0 Cu. Ft. 3-Cycle Electric Dryer - White,Rotary electromechanical controls; 3 cycles; 3 heat selections; DuraDrum interior; Quiet-By-Design
3,0.293785,,74108007469,Conair - 1875-Watt Folding Handle Hair Dryer - Blue,2 heat/speed settings; cool shot button; dual voltage; professional-length line cord
4,0.275154,,12505525766,Smart Choice - 6' 30 Amp 3-Prong Dryer Cord,Heavy-duty PVC insulation; strain relief safety clamp
5,0.272537,,36725578241,Samsung - 7.3 Cu. Ft. 7-Cycle Electric Dryer - White,Soft-touch dial controls; 7 preset drying cycles; 4 temperature settings; powdercoat drum; noise reduction package
6,0.260391,,48231011402,LG - 7.1 Cu. Ft. 7-Cycle Electric Dryer - White,Electronic controls with LED display; 7 cycles; Dial-A-Cycle option; sensor dry system; 5 temperature levels; 5 drying levels; NeveRust drum; LoDecibel quiet operation
7,0.257009,,12505527456,"Smart Choice - 1/2"" Safety+PLUS Stainless-Steel Gas Dryer Connector","Safety+PLUS automatic shut-off valve; leak detection solution; pipe thread sealant; 60,500 BTU; CSA approved"
8,0.214221,,74108096487,Conair - Infiniti Cord-Keeper Professional Tourmaline Ionic Hair Dryer - Fuchsia,Tourmaline ceramic technology; ionic technology; 1875 watts; Cool Shot function; 3 heat settings; 2 speed settings; 5' retractable cord; includes diffuser
9,0.208042,,36725561977,Samsung - 3.5 Cu. Ft. 6-Cycle High-Efficiency Washer - White,ENERGY STAR QualifiedSoft dial touch pad controls; 6 cycles; delay-start; child lock; Vibration Reduction Technology


# Figure 11.10 Source Code

In [35]:
sessions = get_sessions("transformers dark of the moon")
calculate_grade(sessions)

doc_id,clicked,examined,grade
97360810042,412,642,0.641745
400192926087,62,129,0.48062
97363560449,96,243,0.395062
97363532149,42,130,0.323077
93624956037,41,154,0.266234
47875842328,367,1531,0.239713
47875841420,217,960,0.226042
25192107191,176,1082,0.162662
786936817218,118,777,0.151866
36725235564,41,277,0.148014


Up next: [Dealing with Low Confidence Situations](3.SDBN-Confidence-Bias.ipynb)