# Message integrity

### Message Authentication Codes

The standard tool for message integrity is a MAC (Message Authentication Code).

```raw
a MAC is a pair of algorithms (S, V) over a shared key:
S(key, message) => tag
V(key, message, tag) => true | false
consistency requirement: V(k, m, S(k, m)) = true

how it works:
- Alice and Bob share the same secret key k
- Alice wants to make sure that Bob receives exactly the message m that she sent
- so she appends to m a tag t = S(k, m)
- Bob receives (m, t) and checks it with V(k, m, t)
```
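A minimal sketch of the (S, V) interface in Python. It assumes HMAC-SHA256 from the standard library `hmac` module as a stand-in for S; any secure MAC could take its place. The names `S`, `V`, `k`, `m`, `t` just mirror the notation above.

```python
import hashlib
import hmac
import secrets

# Stand-in S and V built on HMAC-SHA256 (an assumption for this sketch).
def S(key: bytes, message: bytes) -> bytes:
    """Signing algorithm: produce a tag for `message` under `key`."""
    return hmac.new(key, message, hashlib.sha256).digest()

def V(key: bytes, message: bytes, tag: bytes) -> bool:
    """Verification algorithm: recompute the tag and compare in constant time."""
    return hmac.compare_digest(S(key, message), tag)

# Alice and Bob share the secret key k.
k = secrets.token_bytes(32)
m = b"transfer 100 to Bob"
t = S(k, m)                      # Alice sends (m, t)
assert V(k, m, t)                # Bob accepts the untouched message
assert not V(k, m + b"0", t)     # a modified message is rejected
```

Verification simply recomputes S, which is why both sides need the same key.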

##### The main attack on MAC
- the attacker mounts a chosen message attack: for messages m_1, ..., m_q of his choice he is given the corresponding tags t_i <- S(k, m_i)
- the attacker's goal is existential forgery: produce a new valid message/tag pair (m, t), i.e. V(k, m, t) = true, where (m, t) is not one of the pairs obtained in the chosen message attack

So the MAC is broken if the attacker can produce any valid pair (m, t) that he was not given, even if m itself is a meaningless message.
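The attack game can be written down as a small challenger object. This is a sketch under assumptions: the class `MacChallenger` is hypothetical, and HMAC-SHA256 plays the role of S(k, m). The challenger answers tag queries and records every pair it hands out; the attacker wins only with a valid pair that is not in that record.

```python
import hashlib
import hmac
import secrets

class MacChallenger:
    """Hypothetical challenger for the chosen message attack game."""

    def __init__(self):
        self._key = secrets.token_bytes(32)
        self._queried = set()

    def _sign(self, m: bytes) -> bytes:
        # HMAC-SHA256 stands in for S(k, m)
        return hmac.new(self._key, m, hashlib.sha256).digest()

    def tag(self, m: bytes) -> bytes:
        """Oracle: return t_i <- S(k, m_i) and remember the pair."""
        t = self._sign(m)
        self._queried.add((m, t))
        return t

    def is_forgery(self, m: bytes, t: bytes) -> bool:
        """The attacker wins iff (m, t) is valid AND was never handed out."""
        valid = hmac.compare_digest(self._sign(m), t)
        return valid and (m, t) not in self._queried

game = MacChallenger()
t0 = game.tag(b"hello")
print(game.is_forgery(b"hello", t0))    # False: replaying a queried pair does not count
print(game.is_forgery(b"hello!", t0))   # False (with overwhelming probability)
```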

### PRF 2 MAC
PRF can be used as S for MAC:
```raw
S(k, m) = PRF(k, m)
V(k, m, t) = PRF(k, m) == t
```

If the PRF is secure, the resulting MAC is secure too, but only if the output space Y of the PRF is large. An attacker can always guess a tag at random, and the guess is correct with probability 1/|Y| = 1/2^n for an n-bit tag. With 32-bit tags this is 1/2^32, which is far too large, so 1/|Y| must be negligible and tags should be 64, 96, 128 bits or more.
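A sketch of the PRF-to-MAC construction with a configurable tag length. It assumes HMAC-SHA256 as the PRF (truncated to `tag_bits`); the function names are illustrative, not from any library. `guess_probability` computes the 1/2^n chance of a blind guess.

```python
import hashlib
import hmac
import secrets

def prf_mac_sign(k: bytes, m: bytes, tag_bits: int = 128) -> bytes:
    """S(k, m) = PRF(k, m), truncated to tag_bits (a multiple of 8).

    HMAC-SHA256 is used as the PRF purely as an assumption for the sketch.
    """
    assert tag_bits % 8 == 0 and 0 < tag_bits <= 256
    return hmac.new(k, m, hashlib.sha256).digest()[: tag_bits // 8]

def prf_mac_verify(k: bytes, m: bytes, t: bytes, tag_bits: int = 128) -> bool:
    """V(k, m, t) = (PRF(k, m) == t), compared in constant time."""
    return hmac.compare_digest(prf_mac_sign(k, m, tag_bits), t)

def guess_probability(tag_bits: int) -> float:
    """Chance that one random guess of an n-bit tag is correct: 1 / 2^n."""
    return 2.0 ** -tag_bits

k = secrets.token_bytes(32)
t = prf_mac_sign(k, b"msg")
assert prf_mac_verify(k, b"msg", t)
print(guess_probability(32))    # about 2.3e-10: not negligible
print(guess_probability(128))   # about 2.9e-39
```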


AES can be used as a MAC:
- it takes a 16-byte message m
- produces a 16-byte tag t
- using a 128-bit key

But this is a small MAC: it only handles messages of exactly 16 bytes (one block).

##### How to get BIG MAC from SMALL MAC?

There are 2 main solutions to this problem:
- CBC-MAC (banking)
- HMAC (Internet, SSL, SSH)

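```python
# ECBC-MAC: F is a PRF on blocks (blocks are ints here, combined with XOR)
def CBC_MAC(F, k0, k1, ms):
    x = 0                      # zero IV
    for m in ms:
        x = F(k0, m ^ x)       # plain CBC chaining under k0
    return F(k1, x)            # extra PRF call under an independent key k1
```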


As shown above, CBC_MAC is just the CBC mode of operation (keeping only the last value), but with an extra F(k1, ·) applied at the end.
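Why the final F(k1, ·) matters: without it ("raw" CBC), a tag for a one-block message m immediately yields a tag for the two-block message (m, t ^ m). The sketch below assumes a toy PRF on 128-bit integers (HMAC-SHA256 truncated to 16 bytes); `raw_CBC` is a hypothetical name for CBC chaining without the last step.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # bytes; blocks are Python ints, as in CBC_MAC

def F(k: bytes, x: int) -> int:
    """Toy PRF on 128-bit ints (assumption: HMAC-SHA256 truncated to 16 bytes)."""
    d = hmac.new(k, x.to_bytes(BLOCK, "big"), hashlib.sha256).digest()
    return int.from_bytes(d[:BLOCK], "big")

def raw_CBC(F, k, ms):
    """CBC chaining WITHOUT the final F(k1, .) step."""
    x = 0
    for m in ms:
        x = F(k, m ^ x)
    return x

def CBC_MAC(F, k0, k1, ms):
    """Same as CBC_MAC above: raw CBC followed by F under k1."""
    return F(k1, raw_CBC(F, k0, ms))

k, k1 = secrets.token_bytes(16), secrets.token_bytes(16)
m = 0x1234                           # one-block message
t = raw_CBC(F, k, [m])               # attacker asks for the tag of [m]
forged = [m, t ^ m]                  # second block cancels the chaining value
assert raw_CBC(F, k, forged) == t    # raw CBC: (forged, t) verifies

t_e = CBC_MAC(F, k, k1, [m])         # with the final F(k1, .) the trick fails
assert CBC_MAC(F, k, k1, [m, t_e ^ m]) != t_e
```

The second block computes F(k, (t ^ m) ^ t) = F(k, m) = t, so the chain repeats itself; the outer F(k1, ·) hides the chaining value from the attacker.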

There is also NMAC (Nested MAC):

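```python
def NMAC_PAD(t):
    # placeholder: real padding maps a key-sized value t into the block space X
    return t

# F : K -> X -> K (output is in KEY space)
def NMAC(F, k0, k1, ms):
    k = k0
    for m in ms:
        k = F(k, m)            # cascade: each output becomes the next key
    # t is from K and in most use cases |X| > |K|
    # so we need padding from K to X
    t = NMAC_PAD(k)
    return F(k1, t)
```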
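The outer F(k1, ·) in NMAC plays the same role as in CBC_MAC. Without it, the bare cascade is extendable: whoever knows the cascade output t for m can compute the tag of m followed by any block w as F(t, w), with no key at all. The sketch assumes HMAC-SHA256 as a toy F whose output (32 bytes) is directly usable as the next key, so no padding is needed; `cascade` is a hypothetical name.

```python
import hashlib
import hmac
import secrets

def F(k: bytes, m: bytes) -> bytes:
    """Toy PRF whose output lies in the key space K (assumption: HMAC-SHA256)."""
    return hmac.new(k, m, hashlib.sha256).digest()

def cascade(F, k, ms):
    """NMAC without the final F(k1, .): chain the key through the blocks."""
    for m in ms:
        k = F(k, m)
    return k

k = secrets.token_bytes(32)
t = cascade(F, k, [b"block-1", b"block-2"])   # tag obtained legitimately
w = b"attacker block"
forged_tag = F(t, w)                          # computed without knowing k
assert cascade(F, k, [b"block-1", b"block-2", w]) == forged_tag
```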


There is also a parallelizable MAC -- PMAC:

```python
def PMAC(F, P, k, k0, ms):
    # P(k, i) masks block i with its position; blocks can be processed in parallel
    xs = [P(k, i) ^ m if i == (len(ms) - 1) else F(k0, P(k, i) ^ m)
          for i, m in enumerate(ms)]   # the last block is not passed through F
    xor_acc = 0
    for x in xs:
        xor_acc ^= x
    return F(k0, xor_acc)
```

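PMAC is also incremental: if only ms[i] changes (for a block other than the last one), you don't need to recompute the whole PMAC. Provided F is a PRP (so it can be inverted), you can just compute:
```raw
tag_new = F(k0, inv_F(k0, tag) ^ F(k0, P(k, i) ^ ms[i].old) ^ F(k0, P(k, i) ^ ms[i].new))
```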
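A runnable check of the incremental update. Loud assumptions: F is a toy 64-bit permutation x -> (x XOR a) * b mod 2^64 with b odd (invertible but NOT secure), and the position mask `P` is a hypothetical SHA-256-based function. The point is only the algebra: update one block, then compare with a full recomputation.

```python
import hashlib
import secrets

MASK = 2**64 - 1

def F(k0, x):
    """Toy 64-bit permutation (NOT secure): x -> (x XOR a) * b mod 2^64, b odd."""
    a, b = k0
    return ((x ^ a) * b) & MASK

def inv_F(k0, y):
    """Inverse of the toy permutation above."""
    a, b = k0
    return ((y * pow(b, -1, 2**64)) & MASK) ^ a

def P(k, i):
    """Hypothetical position mask: SHA-256 of key and index, cut to 64 bits."""
    d = hashlib.sha256(k + i.to_bytes(8, "big")).digest()
    return int.from_bytes(d[:8], "big")

def PMAC(F, P, k, k0, ms):
    """Same structure as PMAC above: the last block skips F."""
    last = len(ms) - 1
    acc = 0
    for i, m in enumerate(ms):
        x = P(k, i) ^ m
        acc ^= x if i == last else F(k0, x)
    return F(k0, acc)

k = secrets.token_bytes(16)
k0 = (secrets.randbits(64), secrets.randbits(64) | 1)   # b forced odd
ms = [11, 22, 33, 44]
tag = PMAC(F, P, k, k0, ms)

# change block i = 1 (not the last one) and update the tag incrementally
i, new = 1, 99
acc = inv_F(k0, tag) ^ F(k0, P(k, i) ^ ms[i]) ^ F(k0, P(k, i) ^ new)
new_tag = F(k0, acc)

ms[i] = new
assert new_tag == PMAC(F, P, k, k0, ms)
```

`pow(b, -1, 2**64)` (Python 3.8+) gives the modular inverse of the odd multiplier, which is what makes this toy F invertible.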

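There is also the One Time MAC (the MAC analogue of the One Time Pad), where:
- the key must be used only once
- it is secure against all attackers, even computationally unbounded ones (the MAC counterpart of perfect secrecy)
- it is fast to compute (it just evaluates a polynomial modulo a prime q)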

```python
from math import ceil
import secrets

def pack2long(s, N=64):
    """Pack the first N // 8 characters of s into an int (little-endian)."""
    assert N % 8 == 0
    codes = [ord(c) for c in s[:N // 8]]
    l = 0
    for i, b in enumerate(codes):
        l = l | (b << i * 8)
    return l

def unpack2str(l, N=8):
    """Inverse of pack2long: unpack N characters (N counts bytes here)."""
    s = []
    for i in range(N):
        s.append(chr((l >> i * 8) & 0xFF))
    return ''.join(s)

# len(bin(q)) ~ len(bin(m)), i.e. q should be close to the block size
# (2^64 for 8-byte blocks) AND q must be prime
def ONE_TIME_MAC_S(key, m, q):
    k, r = key
    blocks = [pack2long(m[ib * 8:(ib + 1) * 8]) for ib in range(ceil(len(m) / 8))]
    mac = r
    power = k
    for block in blocks:
        # mac = r + m_1*k + m_2*k^2 + ... (mod q)
        mac = (mac + (block * power) % q) % q
        power = (power * k) % q
    return mac

q = 2 ** 64 - 59  # largest prime below 2^64; our block size is 64 bits
# key material must come from a CSPRNG (secrets), not from `random`
key = (secrets.randbelow(q), secrets.randbelow(q))
ONE_TIME_MAC_S(key, 'hello world from Russia, Omsk cit', q)
# one run returned 898815276966951074 (the key is random)
```
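The cell above only implements S. A matching V is a recomputation and comparison; the sketch below restates S in a compact form over pre-packed integer blocks (each assumed to be < q) so that it is self-contained. The names `one_time_mac_s` and `one_time_mac_v` are hypothetical.

```python
import secrets

q = 2**64 - 59   # prime modulus, as in the cell above

def one_time_mac_s(key, blocks, q=q):
    """S((k, r), m) = r + m_1*k + m_2*k^2 + ... + m_L*k^L  (mod q)."""
    k, r = key
    mac, power = r, k
    for m in blocks:
        mac = (mac + m * power) % q
        power = (power * k) % q
    return mac

def one_time_mac_v(key, blocks, tag, q=q):
    """V recomputes S and compares; the key (k, r) must never be reused."""
    return one_time_mac_s(key, blocks, q) == tag

key = (secrets.randbelow(q), secrets.randbelow(q))
blocks = [int.from_bytes(b"hello wo", "little"), int.from_bytes(b"rld", "little")]
t = one_time_mac_s(key, blocks)
assert one_time_mac_v(key, blocks, t)
assert not one_time_mac_v(key, blocks, (t + 1) % q)
```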

### Hash Functions

When thinking about attacks on MACs, recall the chosen message attack:
- the attacker can obtain many distinct (m, t) pairs
- but even so, if the MAC is secure, he still cannot produce a new valid pair (m, t)
- assume we have 2 almost identical messages m0 and m1
- the attacker knows the pair (m0, t0) and wants to produce a valid (m1, t1)
- so a MAC must not output "near" tags for "near" messages: t1 must look unrelated to t0, even if m1 differs from m0 in a single bit

Q: is a hash function basically a PRF? (Not a PRP!)

##### Hash Function definition
```raw
H : M -> T where |M| >> |T|

Hash collision is (m0, m1) from M such that:
H(m0) = H(m1) and m0 != m1

H is collision resistant if for every efficient algorithm A the probability that A outputs a collision is negligible

One of the major factors for collision resistance is the size of the output space: if the output space is {0,1}^8, then a random message hits a given hash value with probability 1/256, and any 257 distinct messages are guaranteed to contain a collision

For a good H there is no known efficient algorithm that finds a collision,
or at least no algorithm that is better than generic brute force

Collisions always exist, because |M| >> |T| (pigeonhole principle), so a good H only makes them hard to find, not impossible
```
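The pigeonhole argument can be checked directly on a toy hash with an 8-bit output space. The sketch assumes `H8` (the first byte of SHA-256) as a deliberately tiny hash; among 257 distinct messages a collision is guaranteed, not just likely.

```python
import hashlib

def H8(m: bytes) -> int:
    """Toy hash with an 8-bit output space: first byte of SHA-256 (assumption)."""
    return hashlib.sha256(m).digest()[0]

def find_collision(H, candidates):
    """Return the first pair m0 != m1 with H(m0) == H(m1), or None."""
    seen = {}
    for m in candidates:
        h = H(m)
        if h in seen and seen[h] != m:
            return seen[h], m
        seen[h] = m
    return None

# 257 distinct messages but only 256 possible outputs: a collision must exist.
msgs = [str(i).encode() for i in range(257)]
m0, m1 = find_collision(H8, msgs)
assert m0 != m1 and H8(m0) == H8(m1)
```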

##### MACs from Hash functions
We still have the problem of getting a big MAC from a small MAC. One solution is the PRF-based constructions above (CBC-MAC, NMAC, PMAC). But with a hash function, a small MAC can also be converted into a big MAC:
```raw
SM_I, SM_V = SMALL MAC (for example AES128)
SM_I :: K -> M -> T

H :: M_big -> M

BM_I :: K -> M_big -> T
BM_I(key, mbig) = SM_I(key, H(mbig))
```
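A sketch of the hash-then-MAC idea. Assumptions: `SM_I` is a hypothetical small MAC that only accepts 32-byte inputs (implemented with HMAC-SHA256 just to keep it runnable; the fixed input length is what matters), and H is SHA-256, which maps any message to exactly one 32-byte block.

```python
import hashlib
import hmac
import secrets

def SM_I(key: bytes, m: bytes) -> bytes:
    """Hypothetical SMALL MAC that only accepts 32-byte inputs."""
    assert len(m) == 32, "small MAC takes exactly one 32-byte block"
    return hmac.new(key, m, hashlib.sha256).digest()[:16]

def H(mbig: bytes) -> bytes:
    """H :: M_big -> M, here SHA-256 (32-byte output)."""
    return hashlib.sha256(mbig).digest()

def BM_I(key: bytes, mbig: bytes) -> bytes:
    """BIG MAC: hash the long message down to one block, then apply the small MAC."""
    return SM_I(key, H(mbig))

key = secrets.token_bytes(32)
t = BM_I(key, b"x" * 10_000)   # far longer than the small MAC's 32-byte input
```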

If H is NOT collision resistant, then BM_I is NOT a secure MAC, because the attacker can:
- query the tag t for some message m
- find a collision m1 for this m: H(m) = H(m1) with m1 != m, so BM_I(m) = BM_I(m1)
- output (m1, t), which is a valid pair that was never queried, i.e. a forgery
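The three steps of this attack, run end to end. Assumptions: `weak_H` is a deliberately weak hash (SHA-256 truncated to 2 bytes, so a collision for a given m takes about 2^16 tries), and HMAC-SHA256 plays the small MAC. The collision search is done offline, without the key.

```python
import hashlib
import hmac
import itertools
import secrets

def weak_H(m: bytes) -> bytes:
    """Deliberately weak hash (assumption): SHA-256 truncated to 2 bytes."""
    return hashlib.sha256(m).digest()[:2]

def BM_I(key: bytes, m: bytes) -> bytes:
    """Hash-then-MAC with the weak hash; HMAC-SHA256 plays the small MAC."""
    return hmac.new(key, weak_H(m), hashlib.sha256).digest()

key = secrets.token_bytes(32)

# 1. the attacker obtains the tag for a message of its choice
m = b"pay Eve 1$"
t = BM_I(key, m)

# 2. offline (no key needed) it searches for m1 != m with weak_H(m1) == weak_H(m)
target = weak_H(m)
m1 = next(
    c for c in (b"pay Eve %d$" % i for i in itertools.count(2))
    if weak_H(c) == target
)

# 3. (m1, t) is a valid pair that was never queried: the MAC is broken
assert m1 != m and BM_I(key, m1) == t
```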

##### Attack on Hash Functions
The main attack on hash functions uses the birthday paradox (take a look at bday-paradox.py).

If the hash function's output space is n bits, then after about 1.2 * sqrt(2^n) = 1.2 * 2^(n/2) random hashes the probability of a collision is greater than 1/2.
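A generic birthday attack on a truncated hash. The sketch assumes `H_n` (the top n bits of SHA-256) as a stand-in n-bit hash. With n = 32 it typically finishes after a number of hashes on the order of 1.2 * 2^16, which is far below the 2^32 needed to hit one fixed target value.

```python
import hashlib
import math
import os

def H_n(m: bytes, n_bits: int) -> int:
    """Toy n-bit hash (assumption): the top n bits of SHA-256."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") >> (256 - n_bits)

def birthday_attack(n_bits: int):
    """Hash random messages until two distinct ones collide.

    Returns (m0, m1, number_of_hashes_computed).
    """
    seen = {}
    count = 0
    while True:
        m = os.urandom(16)
        count += 1
        h = H_n(m, n_bits)
        if h in seen and seen[h] != m:
            return seen[h], m, count
        seen[h] = m

n = 32
m0, m1, count = birthday_attack(n)
print(count, "hashes; 1.2 * sqrt(2^n) =", round(1.2 * math.sqrt(2**n)))
```

The count varies from run to run; the 1.2 * 2^(n/2) figure is where the success probability crosses 1/2, not a fixed stopping point.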

### Constructing Hash Functions
The main construction is called the Merkle-Damgård construction, and it is the way to get a big hash function from a small one (a compression function h):
```raw
h :: T -> Xs -> T
H :: Xb -> T
H = MDC(h)
```

This construction guarantees that if h is collision resistant, then the whole H is collision resistant. In other words, if H has a collision, then tracing back through the chain:
- either h has a collision at some step
- or the two (padded) messages are identical, which contradicts m0 != m1

There is a longer proof of this (by contradiction, walking backwards through the chain), but I'll skip it for now.
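A minimal Merkle-Damgård sketch. Assumptions, all illustrative: 16-byte blocks, an 8-byte all-zero IV, and a toy compression function `h` built from truncated SHA-256 just so the code runs. The padding block (0x80, zeros, then the message length in bits) follows the usual MD-strengthening layout also used by SHA-256; the length field is part of what the collision-resistance proof relies on.

```python
import hashlib
import struct

BLOCK = 16          # bytes per message block (assumption)
IV = bytes(8)       # fixed initial chaining value (assumption)

def h(chain: bytes, block: bytes) -> bytes:
    """Toy compression function h :: T -> Xs -> T with an 8-byte chaining value."""
    return hashlib.sha256(chain + block).digest()[:8]

def md_pad(msg: bytes) -> bytes:
    """Padding block: 0x80, zeros, then the message length in bits (8 bytes)."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    return padded + struct.pack(">Q", 8 * len(msg))

def MD(msg: bytes) -> bytes:
    """H = MDC(h): chain h over the padded message, starting from IV."""
    data = md_pad(msg)
    chain = IV
    for i in range(0, len(data), BLOCK):
        chain = h(chain, data[i:i + BLOCK])
    return chain

digest = MD(b"hello world")
```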

##### How to construct h (compression function)?
Compression functions are mainly constructed from block ciphers E(key, input), for example:
1. h(H, m) = E(m, H) ^ H (Davies-Meyer)
2. h(H, m) = E(m, H) ^ H ^ m
3. h(H, m) = E(H ^ m, H) ^ m

and so on. Some naive constructions, such as `h(H, m) = E(m, H)` (without the final XOR), are not collision resistant.
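Why the plain construction h(H, m) = E(m, H), with no feed-forward XOR, is not collision resistant: since E can be decrypted, for any (H, m) and any other block m2 one can solve for H2 = D(m2, E(m, H)). The sketch assumes a toy 4-round 64-bit Feistel cipher as E (NOT secure, illustration only). The same trick fails for Davies-Meyer because the extra XOR with H2 changes the output.

```python
import hashlib

def _round(key: bytes, i: int, half: int) -> int:
    d = hashlib.sha256(key + bytes([i]) + half.to_bytes(4, "big")).digest()
    return int.from_bytes(d[:4], "big")

def E(key: bytes, x: int) -> int:
    """Toy 64-bit Feistel block cipher (4 rounds, NOT secure)."""
    L, R = x >> 32, x & 0xFFFFFFFF
    for i in range(4):
        L, R = R, L ^ _round(key, i, R)
    return (L << 32) | R

def D(key: bytes, y: int) -> int:
    """Inverse of E: undo the rounds in reverse order."""
    L, R = y >> 32, y & 0xFFFFFFFF
    for i in reversed(range(4)):
        L, R = R ^ _round(key, i, L), L
    return (L << 32) | R

def davies_meyer(H: int, m: bytes) -> int:
    """h(H, m) = E(m, H) XOR H: the message block is the cipher key."""
    return E(m, H) ^ H

def naive(H: int, m: bytes) -> int:
    """h(H, m) = E(m, H): no feed-forward XOR, NOT collision resistant."""
    return E(m, H)

# Collision for the naive h: pick any (H, m) and any other block m2,
# then solve for the chaining value H2 by decrypting.
H, m = 0x0123456789ABCDEF, b"block A"
m2 = b"block B"
H2 = D(m2, naive(H, m))
assert (H, m) != (H2, m2) and naive(H, m) == naive(H2, m2)
```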

##### HMAC
HMAC is a way to construct a MAC from a hash function.
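A from-scratch sketch of HMAC over SHA-256, following the RFC 2104 definition HMAC(k, m) = H((k XOR opad) || H((k XOR ipad) || m)), where ipad is the byte 0x36 repeated and opad is 0x5C repeated over one hash block (64 bytes for SHA-256). It is checked against the standard library `hmac` module; in real code, use the library.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC(k, m) = H((k XOR opad) || H((k XOR ipad) || m)) with H = SHA-256."""
    block_size = 64                       # SHA-256 block size in bytes
    if len(key) > block_size:             # long keys are hashed first
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")  # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

# agrees with the standard library implementation
assert hmac_sha256(b"key", b"msg") == hmac.new(b"key", b"msg", hashlib.sha256).digest()
```

The inner hash is a Merkle-Damgård hash keyed through its first block, and the outer hash under a second derived key plays the same role as the final F(k1, ·) in NMAC.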