## Bit Manipulation Tricks

- All integers are stored as a sequence of bits (binary digits)

- There are some useful tricks that you can do with this bit representation, and this page serves as a record of them

### Right shift == Floor division by 2

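- Right shifting an integer by 1 bit is equivalent to floor division by 2
    - In Python this holds for negative integers too, because `>>` is an arithmetic shift and `//` rounds towards negative infinity
    - More generally, `val >> k` equals `val // 2**k` for any `k >= 0`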

In [5]:
val = 123456789
(val >> 1) == (val // 2)

True
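
As a quick check that the identity is not specific to one positive value, the sketch below compares `>>` against `//` for a handful of integers (including negative ones) and several shift widths. The helper name `shift_matches_floor_div` is made up for this example.

```python
def shift_matches_floor_div(val: int, k: int = 1) -> bool:
    """Return True if shifting right by k bits equals floor division by 2**k."""
    return (val >> k) == (val // (2 ** k))

# Positive, zero, and negative inputs, with several shift widths.
samples = [123456789, 0, 7, -7, -1, -123456789]
results = [shift_matches_floor_div(v, k) for v in samples for k in (1, 2, 5)]
print(all(results))  # True
```

Note that this relies on Python's floor semantics; languages whose integer division truncates towards zero give a different answer for negative odd values.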

### Hamming Weight

- The Hamming weight of an integer is the number of 1s in its binary representation
- One interesting way to compute the Hamming weight is to use this trick
    - Replace `n` with `n & (n-1)` repeatedly until it reaches 0
    - Each `&` clears the lowest set bit of `n`, so the number of `&`s you can do is equal to the number of 1s in the binary string

- Let `n=10110`
    - `n=10110` & `n-1=10101` = `10100`
    - `n=10100` & `n-1=10011` = `10000`
    - `n=10000` & `n-1=01111` = `00000`
    - Hence 3 1s


In [8]:
def hamming_weight(n):
    count=0
    while n != 0:
        n &= n-1
        count += 1
    return count

print(hamming_weight(12345), bin(12345)[2:])

6 11000000111001
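
To see the individual steps rather than just the final count, here is a small tracing variant for the example value `n = 10110`. The function `hamming_weight_trace` is a hypothetical helper, and it assumes `n >= 0` (for a negative Python integer the loop would never reach 0).

```python
def hamming_weight_trace(n: int) -> list[str]:
    """Return the binary form of n after each `n &= n - 1` step (requires n >= 0)."""
    width = max(n.bit_length(), 1)
    steps = []
    while n != 0:
        n &= n - 1  # clears the lowest set bit
        steps.append(format(n, f"0{width}b"))
    return steps

steps = hamming_weight_trace(0b10110)
print(steps)       # ['10100', '10000', '00000']
print(len(steps))  # 3
print(len(steps) == bin(0b10110).count("1"))  # True
```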


### XOR Tricks

- XOR is symbolically represented by $\oplus$

- Some XOR rules
    - $x \oplus 0 = x$
    - $x \oplus x = 0$
    - XOR is commutative and associative, so terms can be reordered and regrouped freely
        - $x \oplus y \oplus x = x \oplus x \oplus y = 0 \oplus y = y$

- Because of these properties, XOR actually has a few very interesting uses
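
The rules above can be spot-checked numerically. This is only a sanity check on random 32-bit values, not a proof, and `xor_rules_hold` is a name invented for this sketch.

```python
import random

def xor_rules_hold(x: int, y: int, z: int) -> bool:
    """Check the XOR identities for one triple of integers."""
    return (
        x ^ 0 == x                         # identity
        and x ^ x == 0                     # self-inverse
        and x ^ y == y ^ x                 # commutative
        and (x ^ y) ^ z == x ^ (y ^ z)     # associative
        and x ^ y ^ x == y                 # the combined rule
    )

triples = [tuple(random.getrandbits(32) for _ in range(3)) for _ in range(1000)]
print(all(xor_rules_hold(x, y, z) for x, y, z in triples))  # True
```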

#### In-place Swapping

- Imagine I have values $x,y$
- I want to swap values $x,y$, **without** using a temporary variable
    - That is, I don't want to do this: `_tmp = x`, `x = y`, `y = _tmp`

- But this can be done quite simply with **XOR**
$$\begin{aligned}
    x_2 &= x \oplus y \\ \\
    y_2 &= y \oplus x_2 \\
    &= y \oplus x \oplus y \\
    &= x \\ \\
    x_3 &= x_2 \oplus y_2 \\
    &= x \oplus y \oplus x \\
    &= y
\end{aligned}$$

- For clarity, the subscripted names ($x_2$, $y_2$, $x_3$) denote new variables, to demonstrate how the values actually change at each step
    - But notice that this swapping will still work even if we overwrite `x` and `y` directly!
    - Each step only reads the current values of `x` and `y`, and the original values are never needed again once they have been overwritten

In [1]:
x = 1293192
y = 5068432

x ^= y
y ^= x
x ^= y

(y == 1293192) and (x == 5068432)

True
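
One caveat that the cell above does not exercise: the trick relies on `x` and `y` being two separate storage locations. The sketch below uses list slots to show what happens when both refer to the same slot. The function `xor_swap` and its index guard are illustrative and not part of the original notebook.

```python
def xor_swap(values: list[int], i: int, j: int) -> None:
    """Swap values[i] and values[j] in place using XOR."""
    if i == j:
        # Without this guard, values[i] ^= values[i] would zero the slot.
        return
    values[i] ^= values[j]
    values[j] ^= values[i]
    values[i] ^= values[j]

data = [1293192, 5068432]
xor_swap(data, 0, 1)
print(data)  # [5068432, 1293192]

# The unguarded three steps applied to a single slot destroy the value.
slot = [42]
slot[0] ^= slot[0]
slot[0] ^= slot[0]
slot[0] ^= slot[0]
print(slot)  # [0]
```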

#### Find Missing Object 

- Problem: You are given an array $A$ of $N$ objects. You are given a second array $A'$ of $N-1$ objects that is almost identical to $A$, except with 1 object removed. Find the missing object.

- This is another example of just applying the **XOR** rules

- Let's XOR all objects in $A$, then XOR every object in $A'$

- By the XOR rules, everything will cancel out except for 1 object, which will be our missing object

In [8]:
import random

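N = 100
A = list(range(N))
# A[i] == i, so the deleted index is also the deleted value
delete_index = random.randrange(N)
del A[delete_index]

ans = 0
# XOR every value of the complete array...
for val in range(N):
    ans ^= val

# ...then every value of the array with one element removed
for val in A:
    ans ^= val

# Both numbers should match
print(ans, delete_index)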

66 66
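
The same idea can be packaged as a function that does not depend on the values being `0..N-1` or on their order. This is a minimal sketch assuming both inputs are lists of integers and exactly one element is missing; `find_missing` is a hypothetical name.

```python
from functools import reduce
from operator import xor

def find_missing(full: list[int], partial: list[int]) -> int:
    """Return the one integer present in `full` but absent from `partial`."""
    return reduce(xor, full + partial, 0)

full = [3, 8, 15, 23, 42]
partial = [42, 3, 23, 8]  # order does not matter
print(find_missing(full, partial))  # 15
```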


- This applies to more than numbers; you can XOR anything that can be represented in binary.

- Here, I show an example with strings
    - To convert a string to an integer representation:
        - Use an MD5 hash to produce a 128-bit digest
        - Write the digest out as a string of 128 binary digits and parse it with `int(..., 2)`, because Python's built-in XOR operator (`^`) does not work on `str` or `bytes`
    - Build a hash map from each integer back to its string. The XOR result is the integer representation of the missing string, and an MD5 hash can't be "reversed", so the map is how you get from the integer back to the string
    - XOR the integer values of every string in both lists together; what remains is the integer value of the missing string
    - Look up the missing string in the hash map

In [57]:
import hashlib

def convert_str_to_binary(inputstr: str):
    md5hash = hashlib.md5(inputstr.encode()).digest()
    binarystr_128bit = ''.join([format(c, '08b') for c in md5hash])
    return int(binarystr_128bit, 2)

def find_missing_str(set1: list, set2: list):
    lookup: dict[int, str] = {
        convert_str_to_binary(x): x 
        for x in set1
    }

    ans = 0
    for string in set1+set2:
        ans ^= convert_str_to_binary(string)

    return lookup.get(ans)

set1 = ['this', 'is', 'a', 'set', 'of', 'strings', 'with', 'a', 'missing', 'value']
set2 = [x for x in set1 if x != 'of']
find_missing_str(set1, set2)

'of'
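
As an aside, the digest-to-integer step can probably be shortened with `int.from_bytes`, which should give the same value as building the 128-character bit string. The sketch below checks that equivalence for one input; `str_to_int` is a hypothetical replacement name, not the function used above.

```python
import hashlib

def str_to_int(text: str) -> int:
    """Map a string to the integer value of its 128-bit MD5 digest."""
    digest = hashlib.md5(text.encode()).digest()
    return int.from_bytes(digest, byteorder="big")

# Compare against the bit-string route used in the cell above.
digest = hashlib.md5(b"of").digest()
via_bit_string = int("".join(format(c, "08b") for c in digest), 2)
print(str_to_int("of") == via_bit_string)  # True
```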

#### Finding Duplicate Number

- I have an array `A` of size `N+1` that contains every number from 1 to `N`, with exactly one number appearing twice. How do I tell which number is duplicated?

- Do the same thing as above: XOR every element of `A` together with every number from 1 to `N`. All numbers cancel out except the duplicated one (call it `M`), which appears 3 times in total, and $M \oplus M \oplus M = M$
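
A minimal sketch of the duplicate finder, assuming the input holds every integer from 1 to `n` once plus one repeated value. The name `find_duplicate` is made up for this example.

```python
def find_duplicate(values: list[int], n: int) -> int:
    """Return the repeated value in a list holding 1..n plus one duplicate."""
    ans = 0
    for val in values:
        ans ^= val
    for val in range(1, n + 1):
        ans ^= val
    return ans

print(find_duplicate([4, 1, 3, 2, 5, 3], n=5))  # 3
```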

#### Finding Two Missing/Duplicate Numbers

- Problem: You are given an array $A$ of $N$ objects. You are given a second array $A'$ of $N-2$ objects that is almost identical to $A$, except with 2 objects removed. Find the 2 missing objects

- Let's imagine using the same algorithm, where we **XOR** $A$ and $A'$

- All elements cancel out, except for the 2 missing ones. Let's call these $u$ and $v$

- But we don't end up with separate elements! We end up with $u \oplus v$. How can we decompose this?

- Let's examine $u \oplus v$ more closely
    - At each bit position, if the bit of $u$ matches the bit of $v$, the corresponding bit of $u \oplus v$ must be 0
    - If every bit were 0, we would have $u = v$, which cannot happen as long as the 2 missing objects are distinct
    - So there must be at least one index $i$ where the bit of $u \oplus v$ is 1, i.e. where $u$ and $v$ differ

- Based on this, let's break the arrays $A$ and $A'$ into 2 groups
    - Group 1 comprises all the values in $A$ and $A'$ whose binary representation has a 1 at index $i$
    - Group 2 comprises all the values in $A$ and $A'$ whose binary representation has a 0 at index $i$

- We know for a fact that `u` and `v` MUST be in different groups, because we split the groups based on the position $i$ where $u$ and $v$ differ!

- So if I run the same **XOR** algorithm within groups 1 and 2, I am guaranteed to get $u$ and $v$: every other value appears twice within its group and cancels out, leaving exactly one missing value per group

In [103]:
import random
import math

N = 100
max_binary_length = math.ceil(math.log2(N))

def int_to_bin(integer: int, max_binary_length: int) -> str:
    return format(integer, f'0{max_binary_length}b')

complete = list(range(N))
missing = list(range(N))
delete_index, delete_index2 = random.sample(population=complete, k=2)
missing = [x for x in missing if x not in [delete_index, delete_index2]]

ans = 0
for val in complete+missing:
    ans ^= val

u_xor_v = ans
print(u_xor_v, int_to_bin(u_xor_v, max_binary_length))

# Find the first (most significant) bit position where u and v differ
partition_index = 0
for i, x in enumerate(int_to_bin(u_xor_v, max_binary_length)):
    if x != '1':
        continue
    partition_index = i
    break

print(partition_index)

group1_complete = [x for x in complete if int_to_bin(x, max_binary_length)[partition_index] == '1']
group1_missing = [x for x in missing if int_to_bin(x, max_binary_length)[partition_index] == '1']
group2_complete = [x for x in complete if int_to_bin(x, max_binary_length)[partition_index] == '0']
group2_missing = [x for x in missing if int_to_bin(x, max_binary_length)[partition_index] == '0']

u = 0
for val in group1_complete+group1_missing:
    u ^= val

v = 0
for val in group2_complete+group2_missing:
    v ^= val

print(delete_index, delete_index2)
print(u, v)

102 1100110
0
60 90
90 60
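
The partition step can also be done without converting to binary strings. The sketch below swaps in a different way of choosing the bit: `x & -x` isolates the lowest set bit of `x`, which is one position where `u` and `v` are known to differ. It assumes integer inputs and two distinct missing values; `find_two_missing` is a hypothetical name.

```python
def find_two_missing(full: list[int], partial: list[int]) -> tuple[int, int]:
    """Return the two distinct integers in `full` that are absent from `partial`."""
    u_xor_v = 0
    for val in full + partial:
        u_xor_v ^= val

    # Isolate the lowest set bit: a position where u and v are known to differ.
    mask = u_xor_v & -u_xor_v

    u = 0
    v = 0
    for val in full + partial:
        if val & mask:
            u ^= val  # group whose bit at the mask position is 1
        else:
            v ^= val  # group whose bit at the mask position is 0
    return (min(u, v), max(u, v))

full = list(range(100))
partial = [x for x in full if x not in (60, 90)]
print(find_two_missing(full, partial))  # (60, 90)
```

Using a mask avoids having to fix a maximum binary length up front, at the cost of picking the least significant differing bit instead of the most significant one.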


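#### Finding N Missing/Duplicate Numbers

- The single-bit partition no longer works here. With 3 or more missing values, the XOR of everything is not enough to separate them (missing values 1, 2 and 3 even XOR to 0), so you need to find another way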
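
One possible "other way", which is not an XOR trick at all, is to count occurrences and subtract. This is only a sketch of a fallback, assuming the elements are hashable and sortable and that extra memory is acceptable; `find_missing_many` is a made-up name.

```python
from collections import Counter

def find_missing_many(full: list, partial: list) -> list:
    """Return the elements of `full` that are absent from `partial`, with multiplicity."""
    remaining = Counter(full) - Counter(partial)  # keeps only positive counts
    return sorted(remaining.elements())

full = list(range(10))
partial = [0, 1, 3, 4, 6, 7, 9]
print(find_missing_many(full, partial))  # [2, 5, 8]
```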