# INFO 4271 - Exercise 3 - Indexing

Issued: April 30, 2024

Due: May 6, 2024

Please submit the completed sheet via Ilias by the due date.

---

# 1. Skip Pointers
Skip pointers accelerate posting list intersection: at each step, a list pointer can either advance to the next sequential position or, if a skip pointer is available at the current position, jump directly to the skip target.
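
The exercise does not say how the skip pointers in the sample lists were placed. A common heuristic is to space skips evenly, roughly √n positions apart for a list of length n. The sketch below assumes that heuristic and the entry layout `[docID, index to skip to, docID at that index]` used by the sample lists in this exercise; the helper name `add_skip_pointers` and its `stride` parameter are hypothetical.

```python
import math

def add_skip_pointers(doc_ids, stride=None):
    """Turn a sorted list of docIDs into entries [docID, skip index, docID at skip index].

    Assumption: skips are evenly spaced, with a default stride of about sqrt(n).
    """
    n = len(doc_ids)
    if stride is None:
        stride = max(1, round(math.sqrt(n)))
    entries = []
    for i, doc_id in enumerate(doc_ids):
        target = i + stride
        # Only every stride-th position gets a skip, and only if the target exists.
        if i % stride == 0 and target < n:
            entries.append([doc_id, target, doc_ids[target]])
        else:
            entries.append([doc_id, None, None])
    return entries

print(add_skip_pointers([3, 8, 12, 19, 23]))
# [[3, 2, 12], [8, None, None], [12, 4, 23], [19, None, None], [23, None, None]]
```

A larger stride means fewer, longer skips that succeed less often; a smaller stride means more skip comparisons that each save less work.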

a) Implement the `intersect_skip()` function sketched below. Each time you would ordinarily increment a pointer by one, you can instead follow a skip pointer, provided one is available at that position and its target docID does not exceed the current docID in the other list.

# Intersect two sorted posting lists that contain skip pointers
def intersect_skip(A, B):
    pointerA, pointerB, res = 0, 0, []
    while pointerA < len(A) and pointerB < len(B):
        # Trace of the current state, useful for counting pointer moves in part b)
        print("pointerA:", pointerA)
        print("pointerB:", pointerB)
        print("A[pointerA]:", A[pointerA])
        print("B[pointerB]:", B[pointerB])
        if A[pointerA][0] == B[pointerB][0]:
            # Match: report the docID and advance both pointers
            res.append(A[pointerA][0])
            pointerA += 1
            pointerB += 1
        elif A[pointerA][0] < B[pointerB][0]:
            # A is behind: follow its skip pointer only if the target does not overshoot B's docID
            if A[pointerA][2] is not None and A[pointerA][2] <= B[pointerB][0]:
                pointerA = A[pointerA][1]
            else:
                pointerA += 1
        else:
            # B is behind: symmetric case
            if B[pointerB][2] is not None and B[pointerB][2] <= A[pointerA][0]:
                pointerB = B[pointerB][1]
            else:
                pointerB += 1
    return res

#Posting lists with skip pointers. 
#Entries take the form [docID, index to skip to, docID at that index]
times_skip = [[2, 3, 16], [12, None, None], [15, None, None], [16, 6, 27], [17, None, None], [23, None, None], [27, None, None]]
square_skip = [[3, 2, 12], [8, None, None], [12, 4, 23], [19, None, None], [23, None, None]]

print(intersect_skip(times_skip, square_skip))

pointerA: 0
pointerB: 0
A[pointerA]: [2, 3, 16]
B[pointerB]: [3, 2, 12]
pointerA: 1
pointerB: 0
A[pointerA]: [12, None, None]
B[pointerB]: [3, 2, 12]
pointerA: 1
pointerB: 2
A[pointerA]: [12, None, None]
B[pointerB]: [12, 4, 23]
pointerA: 2
pointerB: 3
A[pointerA]: [15, None, None]
B[pointerB]: [19, None, None]
pointerA: 3
pointerB: 3
A[pointerA]: [16, 6, 27]
B[pointerB]: [19, None, None]
pointerA: 4
pointerB: 3
A[pointerA]: [17, None, None]
B[pointerB]: [19, None, None]
pointerA: 5
pointerB: 3
A[pointerA]: [23, None, None]
B[pointerB]: [19, None, None]
pointerA: 5
pointerB: 4
A[pointerA]: [23, None, None]
B[pointerB]: [23, None, None]
[12, 23]


b) How many pointer increment operations did you need to intersect the two posting lists with the given skip pointers? How many operations would it have been for the same lists without skip pointers?
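
One way to check a hand count for part b) is to instrument the intersection. The sketch below assumes that every individual pointer advance counts as one operation, whether it is a step of one or a skip, so a match costs two operations; other counting conventions give different totals. The function name `count_pointer_moves` and the `use_skips` flag are hypothetical.

```python
def count_pointer_moves(A, B, use_skips=True):
    """Intersect A and B and count pointer advances (assumed cost: 1 per advance)."""
    a, b, moves, res = 0, 0, 0, []
    while a < len(A) and b < len(B):
        doc_a, doc_b = A[a][0], B[b][0]
        if doc_a == doc_b:
            res.append(doc_a)
            a += 1
            b += 1
            moves += 2  # both pointers advance on a match
        elif doc_a < doc_b:
            skip_idx, skip_doc = A[a][1], A[a][2]
            if use_skips and skip_doc is not None and skip_doc <= doc_b:
                a = skip_idx  # a skip counts as a single advance
            else:
                a += 1
            moves += 1
        else:
            skip_idx, skip_doc = B[b][1], B[b][2]
            if use_skips and skip_doc is not None and skip_doc <= doc_a:
                b = skip_idx
            else:
                b += 1
            moves += 1
    return res, moves

# Sample lists from part a): [docID, index to skip to, docID at that index]
times_skip = [[2, 3, 16], [12, None, None], [15, None, None], [16, 6, 27],
              [17, None, None], [23, None, None], [27, None, None]]
square_skip = [[3, 2, 12], [8, None, None], [12, 4, 23], [19, None, None],
               [23, None, None]]

print(count_pointer_moves(times_skip, square_skip, use_skips=True))   # ([12, 23], 10)
print(count_pointer_moves(times_skip, square_skip, use_skips=False))  # ([12, 23], 11)
```

Under this convention only one skip is ever taken on the sample lists (from docID 3 to docID 12 in the second list), which is why the two totals differ by one.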

# 2. Positional Indices
Positional indices store, for each posting, the exact positions at which the term occurs in the document. This information allows us to satisfy two previously impossible types of queries. 1) Phrase queries require terms to occur adjacent to one another in a specific order. 2) Range queries allow for more leeway between term positions, merely requiring the two terms to appear within a specified number of tokens of each other.
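
To make the two query types concrete, here is a minimal sketch of a positional index and the two position checks. It assumes lowercase whitespace tokenization and postings of the form `[docID, [pos1, pos2, ...]]`; the names `build_positional_index`, `phrase_match`, and `within_range`, as well as the two toy documents, are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to sorted postings of the form [docID, [pos1, pos2, ...]]."""
    index = defaultdict(dict)
    for doc_id in sorted(docs):
        # Assumption: lowercase whitespace tokenization is good enough for a sketch.
        for pos, token in enumerate(docs[doc_id].lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return {term: [[doc_id, positions] for doc_id, positions in postings.items()]
            for term, postings in index.items()}

def phrase_match(pos_first, pos_second):
    """True if some occurrence of the second term directly follows the first."""
    following = set(pos_second)
    return any(p + 1 in following for p in pos_first)

def within_range(pos_first, pos_second, k):
    """True if the two terms occur within k tokens of each other, in either order."""
    return any(abs(p - q) <= k for p in pos_first for q in pos_second)

docs = {1: "times square is in new york", 2: "square the result three times"}
index = build_positional_index(docs)
print(index["times"])   # [[1, [0]], [2, [4]]]
print(index["square"])  # [[1, [1]], [2, [0]]]

print(phrase_match([0], [1]))     # True: "times square" in document 1
print(phrase_match([4], [0]))     # False: wrong order and not adjacent in document 2
print(within_range([4], [0], 4))  # True: the terms are 4 tokens apart in document 2
```

The phrase check is order-sensitive, while the range check here is symmetric; whether a range query should also respect term order is a design choice the exercise leaves open.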

Implement the `intersect_range()` function sketched below. Each time you would ordinarily have reported a match, you now need to check whether the range requirement is satisfied. As an optional addition, think about making this range check efficient using some of the techniques discussed for general posting list intersection.

# Intersect two sorted posting lists with document-internal proximity requirements.
# Naive variant: compares every pair of positions within a matching document.
def intersect_range(A, B, max_distance):
    pointerA, pointerB, res = 0, 0, []
    while pointerA < len(A) and pointerB < len(B):
        if A[pointerA][0] == B[pointerB][0]:
            posA, posB = A[pointerA][1], B[pointerB][1]
            # Report the document once if any pair of positions is close enough
            if any(abs(i - j) <= max_distance for i in posA for j in posB):
                res.append(A[pointerA][0])
            pointerA += 1
            pointerB += 1
        elif A[pointerA][0] < B[pointerB][0]:
            pointerA += 1
        else:
            pointerB += 1
    return res

# Efficient variant: the position lists are sorted, so the range check
# can be done with a linear two-pointer merge instead of a nested loop.
def intersect_range_efficient(A, B, max_distance):
    pointerA, pointerB, res = 0, 0, []
    while pointerA < len(A) and pointerB < len(B):
        if A[pointerA][0] == B[pointerB][0]:
            posA, posB = A[pointerA][1], B[pointerB][1]
            i, j = 0, 0
            while i < len(posA) and j < len(posB):
                if abs(posA[i] - posB[j]) <= max_distance:
                    # One qualifying pair is enough to report the document
                    res.append(A[pointerA][0])
                    break
                elif posA[i] < posB[j]:
                    i += 1
                else:
                    j += 1
            pointerA += 1
            pointerB += 1
        elif A[pointerA][0] < B[pointerB][0]:
            pointerA += 1
        else:
            pointerB += 1
    return res

    
#Posting lists with document-internal positional information.
#Entries take the form [docID, [pos1, pos2, ...]]
times_range = [[2, [15, 128]], [12, [6, 45, 89, 942]], [15, [13]], [16, [1276, 1500]], [17, [13, 89, 90]], [23, [17, 64]], [27, [456, 629]]]
square_range = [[3, [65, 90]], [8, [67, 94]], [12, [3]], [19, [18, 81, 1881]], [23, [63]]]

print(intersect_range_efficient(times_range, square_range, 1))

[23]


# 3. Paper Pick
Don't forget to submit your paper pick at https://forms.gle/SFYUKxiMXZKbs5XCA.