# Chapter 8: Capacity of Neural Networks

## 4. Upper Bounds

### (a) Do Exercise 40.8 in MacKay’s book (MacKay 2003). It is quoted here as follows:
    
    `Estimate in bits the total sensory experience that you have had in your life – visual information, auditory information, etc. Estimate how much information you have memorized. Estimate the information content of the works of Shakespeare.
    Compare these with the capacity of your brain assuming you have 10^11 neurons each making 1000 synaptic connections and that the (information) capacity result for one neuron (two bits per connection) applies. Is your brain full yet?`

Note that MacKay is right to suggest using information capacity for this estimate, since image and acoustic data are relatively high-dimensional. He also suggests 1000 connections per neuron.

#### Ans. 

We first estimate the total sensory experience.
Information transmission rates of the senses are as follows.
$$
\begin{array}{lr} \hline
\text{sensory system} & \text{bits per second} \\ \hline
\text{eyes} & 10{,}000{,}000 \\
\text{skin} & 1{,}000{,}000 \\
\text{ears} & 100{,}000 \\
\text{smell} & 100{,}000 \\
\text{taste} & 1{,}000 \\ \hline
\end{array}
$$

I have lived for about 23 years and assume that I am awake 16 hours a day on average. Hence, the total sensory experience is $$(10 + 1 + 0.1 + 0.1 + 0.001)\times 10^{6}\ \text{bits/s}\times 60\ \text{s/min}\times 60\ \text{min/h}\times 16\ \text{h/day}\times 365\ \text{days/year}\times 23\ \text{years}=5.42\times 10^{15}\ \text{bits (3 s.f.)}.$$
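The arithmetic above can be checked with a short script. This is a minimal sketch: the rates are copied from the table, while the names `RATES_BPS`, `waking_seconds`, and `total_sensory_bits` are illustrative and not part of the exercise.

```python
# Sensory bandwidths in bits per second, copied from the table above.
RATES_BPS = {
    "eyes": 10_000_000,
    "skin": 1_000_000,
    "ears": 100_000,
    "smell": 100_000,
    "taste": 1_000,
}


def waking_seconds(years, waking_hours_per_day=16, days_per_year=365):
    """Seconds spent awake over `years` years (leap days are ignored)."""
    return years * days_per_year * waking_hours_per_day * 60 * 60


def total_sensory_bits(years):
    """Total bits received by all senses while awake."""
    return sum(RATES_BPS.values()) * waking_seconds(years)


print(f"{total_sensory_bits(23):.3g}")  # 5.42e+15
```

Changing the age or the assumed waking hours only rescales the result linearly, so the order of magnitude is robust to those assumptions.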

We then estimate how much information is memorized. The intelligent information processing rate is about 50 bits per second. Hence, the total information processed is about $$50\ \text{bits/s}\times 60\ \text{s/min}\times 60\ \text{min/h}\times 16\ \text{h/day}\times 365\ \text{days/year}\times 23\ \text{years}=2.42\times 10^{10}\ \text{bits (3 s.f.)}.$$
If we assume that information must first be processed before it can be memorized, this is an upper bound on the information memorized, ignoring the constant forgetting of information.

We then estimate the information content of the works of Shakespeare. According to Marvin Spevack's concordances, Shakespeare's complete works consist of 884,647 words. With an average English word length of 4.7 characters and 5 bits per character, the information content is $$884{,}647\times 4.7\times 5=2.08\times 10^{7}\ \text{bits}.$$

Assuming my brain has $10^{11}$ neurons, each making 1000 synaptic connections, and that the (information) capacity result for one neuron (two bits per connection) applies, my brain capacity is $$10^{11}\times 1000\times 2=2\times 10^{14}\ \text{bits}.$$
By this estimate, my brain is not full with the information memorized, the works of Shakespeare, or both combined. This is expected, as I can still think, process, and memorize. The total sensory experience, however, is roughly 27 times this capacity, so the raw sensory stream could not all have been stored.
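To make the comparison explicit, each estimate can be expressed as a fraction of the brain capacity. The sketch below simply reuses the rounded figures from this answer. The names `BRAIN_CAPACITY_BITS`, `estimates`, and `fraction_of_capacity` are illustrative.

```python
# Capacity: neurons x synapses per neuron x bits per synapse.
BRAIN_CAPACITY_BITS = 10**11 * 1000 * 2

# Rounded estimates from the text, in bits.
estimates = {
    "sensory experience": 5.42e15,
    "information processed": 2.42e10,
    "works of Shakespeare": 2.08e7,
}


def fraction_of_capacity(bits, capacity=BRAIN_CAPACITY_BITS):
    """Return how many brain capacities `bits` would occupy."""
    return bits / capacity


for name, bits in estimates.items():
    print(f"{name}: {fraction_of_capacity(bits):.3g} of capacity")
```

A fraction above 1 means the quantity would not fit. Under these assumptions, only the raw sensory stream is in that situation.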

### (b) Expand Algorithm 8 to work with more than one binary classification.

#### Ans. 


```
Require: data: array of length n containing d-dimensional vectors x, labels: a column of length n with k possible integer values
    function memorize((data, labels))
        thresholds ← 0
        for all rows do
            table[row] ← (sum(data[row]), labels[row])
        end for
        sortedtable ← sort (table, key = column 0)
        class ← sortedtable[0][1]
        for all rows do
            if not sortedtable[row][1] == class then
                class ← sortedtable[row][1]
                thresholds ← thresholds + 1
            end if
        end for
        minthreshs ← log2(thresholds + 1)
        mec ← (minthreshs ∗ (d + 1)) + minthreshs ∗ k
    end function: mec
```

For reference, the original binary version:

```python
import math
import numpy as np

def memorize(data, labels):
    thresholds = 0
    table = []
    # Project each sample onto one value: the sum of its features.
    for i in range(len(data)):
        table.append((sum(data[i]), labels[i]))

    # Sort by the projected value, then count label changes along the order.
    sorted_table = sorted(table, key=lambda x: x[0])
    class_label = sorted_table[0][1]

    for i in range(1, len(sorted_table)):
        if sorted_table[i][1] != class_label:
            class_label = sorted_table[i][1]
            thresholds += 1

    min_threshold = math.log2(thresholds + 1)
    mec = min_threshold * (len(data[0]) + 1) + min_threshold

    return mec

n = 20
d = 4
X = np.random.randint(2, size=(n, d))
y = np.random.randint(2, size=n)  # binary labels
mec = memorize(X, y)
mec
```

Output (the data are drawn at random without a seed, so the value changes between runs):

```
22.202638308846552
```

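The multiclass version differs only in the final term, which scales with the number of distinct classes:

```python
import math
import numpy as np

def memorize_multiclass(data, labels):
    thresholds = 0
    table = []
    # Project each sample onto one value: the sum of its features.
    for i in range(len(data)):
        table.append((sum(data[i]), labels[i]))

    # Sort by the projected value, then count label changes along the order.
    sorted_table = sorted(table, key=lambda x: x[0])
    class_label = sorted_table[0][1]

    for i in range(1, len(sorted_table)):
        if sorted_table[i][1] != class_label:
            class_label = sorted_table[i][1]
            thresholds += 1

    min_threshold = math.log2(thresholds + 1)
    # k is taken as the number of distinct labels present in the data.
    mec = min_threshold * (len(data[0]) + 1) + min_threshold * len(set(labels))

    return mec

n = 20
d = 4
X = np.random.randint(2, size=(n, d))
y = np.random.randint(3, size=n)  # three classes: 0, 1, 2
mec = memorize_multiclass(X, y)
mec
```

Output (varies between runs):

```
27.67545294909838
```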
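Because the runs above use random data, a small hand-checkable case may help show what the threshold count measures. The sketch below uses made-up toy data, and `count_thresholds` is a hypothetical helper. It is not part of Algorithm 8, but it applies the same sort-then-count-label-changes step.

```python
import math


def count_thresholds(projections, labels):
    """Count label changes after sorting samples by their projection."""
    pairs = sorted(zip(projections, labels), key=lambda p: p[0])
    ordered = [label for _, label in pairs]
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if cur != prev)


# Hypothetical toy data: four 2-dimensional points and three classes.
data = [[0, 0], [0, 1], [1, 1], [2, 1]]
labels = [0, 0, 1, 2]

projections = [sum(x) for x in data]           # [0, 1, 2, 3]
thresholds = count_thresholds(projections, labels)  # 0 0 | 1 | 2 -> 2 changes
k = len(set(labels))                           # 3 classes
d = len(data[0])                               # 2 features
min_threshs = math.log2(thresholds + 1)        # log2(3)
mec = min_threshs * (d + 1) + min_threshs * k  # 6 * log2(3)

print(thresholds, round(mec, 3))  # 2 9.51
```

With the labels already grouped along the projection, only two thresholds are needed. Shuffling the labels so that neighbours differ would raise the count to at most n - 1.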

### (c) Expand Algorithm 8 to work with regression.

#### Ans.
```
Require: data: array of length n containing d-dimensional vectors x, labels: a column of continuous numbers of length n, b: tolerance below which two labels count as equal
    function memorize((data, labels, b))
        thresholds ← 0
        for all rows do
            table[row] ← (sum(data[row]), labels[row])
        end for
        sortedtable ← sort (table, key = column 0)
        class ← sortedtable[0][1]
        for all rows do
            if not abs(sortedtable[row][1] - class) < b then
                class ← sortedtable[row][1]
                thresholds ← thresholds + 1
            end if
        end for
        minthreshs ← log2(thresholds + 1)
        mec ← (minthreshs ∗ (d + 1)) + minthreshs
    end function: mec
```

```python
import math
import numpy as np

def memorize_regression(data, labels, b=0.01):
    thresholds = 0
    table = []
    # Project each sample onto one value: the sum of its features.
    for i in range(len(data)):
        table.append((sum(data[i]), labels[i]))

    sorted_table = sorted(table, key=lambda x: x[0])
    class_label = sorted_table[0][1]

    # A new threshold is needed whenever the target moves at least b away
    # from the target at which the previous threshold was placed.
    for i in range(1, len(sorted_table)):
        if not abs(sorted_table[i][1] - class_label) < b:
            class_label = sorted_table[i][1]
            thresholds += 1

    min_threshold = math.log2(thresholds + 1)
    mec = min_threshold * (len(data[0]) + 1) + min_threshold

    return mec

n = 20
d = 4
X = np.random.randint(2, size=(n, d))
y = np.random.rand(n)  # continuous targets in [0, 1)
mec = memorize_regression(X, y)
mec
```

Output (varies between runs):

```
25.931568569324174
```
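The tolerance `b` controls how many distinct target levels the regression variant has to memorize. The sketch below illustrates this on made-up toy targets. `regression_thresholds` is a hypothetical helper that isolates the counting step of the function above.

```python
import math


def regression_thresholds(projections, targets, b):
    """Count thresholds: a target at least b away from the reference opens a new one."""
    pairs = sorted(zip(projections, targets), key=lambda p: p[0])
    ordered = [t for _, t in pairs]
    reference = ordered[0]
    thresholds = 0
    for target in ordered[1:]:
        if abs(target - reference) >= b:
            reference = target  # the reference moves only when a threshold is placed
            thresholds += 1
    return thresholds


# Hypothetical toy data: two clusters of targets, near 0.1 and near 0.5.
projections = [0, 1, 2, 3]
targets = [0.10, 0.12, 0.50, 0.53]

for b in (0.01, 0.05, 1.0):
    t = regression_thresholds(projections, targets, b)
    print(b, t, math.log2(t + 1))
```

With `b = 0.01` every target counts as distinct (3 thresholds), with `b = 0.05` only the jump between the two clusters remains (1 threshold), and with `b = 1.0` no threshold is needed. A tight tolerance on unseeded random targets therefore tends to yield close to the maximum of n - 1 thresholds.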