In [1]:
import numpy as np
import math
import pandas as pd
from hdmm import workload, fairtemplates, error, fairmechanism, matrix, mechanism, templates

# Understanding vectorized databases
This section helps me better understand and visualize linear queries and query workloads as vectors and matrices. I'll use the toy dataset and a few queries from Figure 2.1 of the [Li et al. 2014 paper](https://people.cs.umass.edu/~mcgregor/papers/15-vldbj.pdf) to get familiar with working with these objects in numpy.

Load the toy dataset of student information (from the paper's figure) as `toy`:

In [2]:
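toy = pd.read_csv('toydataset.csv', index_col=0) # the first CSV column is a saved row index
toy

|   | Name | Gradyear | Gender | Gpa |
|---|------|----------|--------|-----|
| 0 | Alice | 2012 | F | 3.8 |
| 1 | Bob | 2011 | M | 3.1 |
| 2 | Charlie | 2014 | M | 3.6 |
| 3 | Dave | 2014 | M | 3.3 |
| 4 | Evelyn | 2013 | F | 3.9 |
| 5 | Frank | 2011 | M | 3.2 |
| 6 | Gary | 2015 | M | 3.5 |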


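You can represent the above `toy` database as a vector __x__, where $x_i$ is the number of students satisfying the predicate $\phi_i$, for $i = 1, \dots, 8$. There are 8 entries because the domain uses 4 levels for Gradyear (2011 through 2014) and 2 levels for Gender, and $4 \times 2 = 8$. Gary's gradyear (2015) falls outside this domain, so he is not counted in __x__; this is why the total in $w_1$ below is 6 rather than 7.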

Let R = (name, gradyear, gender, gpa) be the schema for the database. Each $\phi_i$ is a predicate on R that selects the students of one gender in one gradyear, with males at the odd indices and females at the even indices. For instance, $\phi_1$ selects all males graduating in 2011, i.e. R(\*, 2011, M, \*), and $\phi_2$ selects all females graduating in 2011, i.e. R(\*, 2011, F, \*).

__x__ is the vector representation of the above database:

In [3]:
x_toy = np.array([2, 0, 0, 1, 0, 1, 2, 0]) # bins: (2011, M), (2011, F), (2012, M), (2012, F), (2013, M), (2013, F), (2014, M), (2014, F)

Each entry holds the number of students matching the corresponding predicate (i.e., the gradyear and gender pair).
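Rather than typing `x_toy` by hand, it can be derived from the rows of the table. A minimal sketch, assuming the bin order described above (for each gradyear from 2011 to 2014, males first, then females). The DataFrame is rebuilt inline from the displayed table so the block does not depend on `toydataset.csv`:

```python
import numpy as np
import pandas as pd

# Rows copied from the toy table displayed above
toy = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Evelyn', 'Frank', 'Gary'],
    'Gradyear': [2012, 2011, 2014, 2014, 2013, 2011, 2015],
    'Gender': ['F', 'M', 'M', 'M', 'F', 'M', 'M'],
    'Gpa': [3.8, 3.1, 3.6, 3.3, 3.9, 3.2, 3.5],
})

years = [2011, 2012, 2013, 2014]  # Gradyear domain covered by x
genders = ['M', 'F']              # males at phi_1, phi_3, ... (1-based), females at phi_2, phi_4, ...

# x_i = number of rows satisfying phi_i, in the order (2011, M), (2011, F), ..., (2014, F)
x = np.array([int(((toy['Gradyear'] == yr) & (toy['Gender'] == g)).sum())
              for yr in years for g in genders])
print(x)
```

Gary (2015) matches none of the eight predicates, so he does not contribute to any entry of `x`.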

Here is a workload of 5 queries, $w_1$ through $w_5$:

- $w_1$: Students of any gender with gradyear ∈ [2011, 2014]
- $w_2$: Students with gradyear ∈ [2011, 2012]
- $w_3$: Female students with gradyear ∈ [2011, 2012]
- $w_4$: Male students with gradyear ∈ [2011, 2012]
- $w_5$: Difference between 2013 grads and 2014 grads

Combine these queries into a workload matrix called `W_toy`:

In [4]:
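W_toy = np.array([[1, 1, 1, 1, 1, 1, 1, 1],     # w1: all students, gradyear in [2011, 2014]
                  [1, 1, 1, 1, 0, 0, 0, 0],     # w2: gradyear in [2011, 2012]
                  [0, 1, 0, 1, 0, 0, 0, 0],     # w3: females, gradyear in [2011, 2012]
                  [1, 0, 1, 0, 0, 0, 0, 0],     # w4: males, gradyear in [2011, 2012]
                  [0, 0, 0, 0, 1, 1, -1, -1]])  # w5: 2013 grads minus 2014 grads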
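Each row of `W_toy` is just the query's weight on each of the eight bins. As an illustration, a hypothetical helper `query_row` can build those rows from per-bin weight functions instead of typing them out, assuming the same bin order as `x_toy`:

```python
import numpy as np

# Bins in the order used by x_toy: (2011, M), (2011, F), ..., (2014, F)
bins = [(yr, g) for yr in [2011, 2012, 2013, 2014] for g in ['M', 'F']]

def query_row(weight):
    """Hypothetical helper: evaluate a per-bin weight function to get one row of W."""
    return [weight(yr, g) for yr, g in bins]

W_from_predicates = np.array([
    query_row(lambda yr, g: 1),                               # w1: every bin in the domain
    query_row(lambda yr, g: int(yr <= 2012)),                 # w2: gradyear in [2011, 2012]
    query_row(lambda yr, g: int(yr <= 2012 and g == 'F')),    # w3: females, 2011-2012
    query_row(lambda yr, g: int(yr <= 2012 and g == 'M')),    # w4: males, 2011-2012
    query_row(lambda yr, g: {2013: 1, 2014: -1}.get(yr, 0)),  # w5: 2013 minus 2014
])
print(W_from_predicates)
```

Writing queries as predicates makes it harder to mis-order a row by hand, which matters once workloads grow beyond a few queries.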

Now you can evaluate each query $w_i$ by taking its dot product with __x__, or evaluate the whole workload at once by multiplying the workload matrix by __x__:

In [5]:
np.dot(W_toy[0], x_toy) # evaluating the first query in the workload

6

In [6]:
np.matmul(W_toy, x_toy) # evaluating the entire workload with x

array([ 6,  3,  1,  2, -1])
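As a sanity check, the same five answers can be computed directly from the rows of the table with pandas boolean masks, without the vector encoding. This sketch rebuilds the table inline and restricts $w_1$ to the 2011 through 2014 domain that __x__ covers:

```python
import pandas as pd

# Rows copied from the toy table displayed above
toy = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Evelyn', 'Frank', 'Gary'],
    'Gradyear': [2012, 2011, 2014, 2014, 2013, 2011, 2015],
    'Gender': ['F', 'M', 'M', 'M', 'F', 'M', 'M'],
    'Gpa': [3.8, 3.1, 3.6, 3.3, 3.9, 3.2, 3.5],
})

in_domain = toy['Gradyear'].between(2011, 2014)  # x excludes Gary (2015)
early = toy['Gradyear'].between(2011, 2012)      # between() is inclusive on both ends

answers = [
    int(in_domain.sum()),                                    # w1
    int(early.sum()),                                        # w2
    int((early & (toy['Gender'] == 'F')).sum()),             # w3
    int((early & (toy['Gender'] == 'M')).sum()),             # w4
    int((toy['Gradyear'] == 2013).sum()
        - (toy['Gradyear'] == 2014).sum()),                  # w5
]
print(answers)
```

These should agree with `np.matmul(W_toy, x_toy)` above.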

# Private Multiplicative Weights

In this section, I'll attempt to implement the private multiplicative weights (PMW) algorithm from the [Hardt and Rothblum 2010 paper](https://guyrothblum.files.wordpress.com/2014/11/hr10.pdf) and apply it to a series of example workloads and databases.
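For reference, the parameters that `pmw` below computes can be written out as follows, with all logarithms base 2, $M$ the number of bins, $n$ the number of records (the sum of __x__), $k$ the number of queries, and $\varepsilon$, $\beta$ the `eps` and `beta` arguments. This is transcribed from the code, not checked constant by constant against the paper:

$$
\begin{aligned}
\delta &= \frac{1}{n \log n}, \qquad \eta = \frac{(\log M)^{1/4}}{\sqrt{n}}, \\
\sigma &= \frac{10 \log(1/\delta)\,(\log M)^{1/4}}{\sqrt{n}\,\varepsilon}, \qquad
T = 4\sigma\left(\log k + \log \tfrac{1}{\beta}\right).
\end{aligned}
$$

In round $t$ the noisy answer is $\hat a_t = w_t \cdot x + A_t$ with $A_t \sim \mathrm{Lap}(\sigma)$. The round is lazy (answered from the maintained histogram $x_{t-1}$) when $|w_t \cdot x_{t-1} - \hat a_t| \le T$, and is an update round otherwise.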

In [7]:
def pmw(W, x, eps = 0.1, beta = 0.1):
    '''
    Implement Private Multiplicative Weights Mechanism (PMW) on a workload of linear queries.
    
    - W = workload of queries (k x M numpy array, one query per row)
    - x = true database (length-M numpy array of counts)
    '''
    print(f'original database: {x}')
    print(f'workload: \n{W}')
    
    M = x.size # len of database (number of bins)
    n = x.sum() # sum of database (number of records)
    k = len(W) # num of queries
    delta = 1/(n*math.log(n, 2))
    
    # NOTE: x_norm is never used below. The noisy answers a_t_hat are computed on the raw
    # counts in x, while the maintained histograms in x_list sum to 1.
    x_norm = x / np.sum(x)
    eta = math.log(M, 2)**(1/4)/math.sqrt(n)
    sigma = 10 * math.log(1/delta, 2) * (math.log(M, 2))**(1/4)/(math.sqrt(n)*eps) 
    T = 4 * sigma * (math.log(k, 2) + math.log(1/beta, 2)) # threshold
    
    # initialize synthetic database at time 0 (uniform, prior to any queries)
    y_t = np.ones(M)/M
    x_t = np.ones(M)/M # fractional histogram computed in round t
    
    # lists of the histograms y_t and x_t from every round
    y_list = [y_t]
    x_list = [x_t]
    
    update_count = 0
    query_answers = []
    
    # iterate through time t = 1, ..., k
    for t in range(1, k + 1): 
        
        # compute noisy answer by adding Laplace noise
        A_t = np.random.laplace(0, sigma)
        a_t_hat = np.dot(W[t - 1], x) + A_t
        
        # difference between the maintained histogram's answer and the noisy answer
        d_t_hat = np.dot(W[t - 1], x_list[t - 1]) - a_t_hat
        
        # lazy round: use already maintained histogram to answer the query
        if abs(d_t_hat) <= T: 
            query_answers.append(np.dot(W[t - 1], x_list[t - 1]))
            x_list.append(x_list[t - 1])
            continue
            
        # update round (abs(d_t_hat) > T): update the histogram and return the noisy answer
        update_count += 1
        # step a: if the histogram over-estimates the query (d_t_hat > 0), down-weight the
        # bins the query counts; otherwise down-weight the complementary bins
        if d_t_hat > 0:
            r_t = W[t - 1]
        else: 
            r_t = np.ones(M) - W[t - 1]
        y_t = x_list[t - 1] * np.exp(-eta * r_t) # new array, so earlier y_list entries are not overwritten
        y_list.append(y_t)
        
        # step b: normalize to get the new fractional histogram
        x_t = y_t / np.sum(y_t)
        x_list.append(x_t)
        
        # NOTE: n is hard-coded to 1 in the failure bound; the message uses the natural log,
        # while the check below uses log base 2
        print(f'the algo updated {update_count} times, the update threshold for failure was 1 * math.log(M)**(1/2): {1 * math.log(M)**(1/2)}. n is {1}, and M is {M}')
        
        if update_count > 1 * math.log(M, 2)**(1/2): #n * math.log(M, 2)**(1/2):
            return "failure"
        query_answers.append(a_t_hat / np.sum(x))
    
    print(f'query_answers (using pmw): {query_answers}')
    print(f'Update Count = {update_count}')  
    
    return query_answers
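In `pmw` as written, each noisy answer is computed on the raw counts in `x` (on the order of n for the total query), while the maintained histograms in `x_list` sum to 1, so the difference `d_t_hat` mixes two scales; `x_norm` is computed but never used. A minimal sketch of a variant that answers on the normalized histogram `x / n` instead, assuming that is the intended setting (as I understand the paper, the database is a normalized histogram and each query maps bins to [0, 1], so $w_5$'s -1 entries fall outside that assumption). `mw_update` and `pmw_normalized` are hypothetical names; the parameter formulas are copied from `pmw`, and the failure bound uses the n·sqrt(log2 M) expression that `pmw` leaves commented out:

```python
import math
import numpy as np

def mw_update(x_prev, f, d_hat, eta):
    """One multiplicative-weights step on a normalized histogram.

    d_hat = f(x_prev) - noisy answer. If d_hat > 0 the histogram over-estimates
    the query, so the bins the query counts are down-weighted (r = f);
    otherwise the complementary bins are down-weighted (r = 1 - f).
    """
    r = f if d_hat > 0 else 1 - f
    y = x_prev * np.exp(-eta * r)
    return y / y.sum()

def pmw_normalized(W, x, eps=0.1, beta=0.1, seed=None):
    """Hypothetical PMW variant that answers queries on the normalized histogram x / n."""
    rng = np.random.default_rng(seed)
    M, n, k = x.size, x.sum(), len(W)
    x_true = x / n  # true answers now live on the same scale as x_t
    # parameter formulas copied from pmw (log base 2)
    delta = 1 / (n * math.log(n, 2))
    eta = math.log(M, 2) ** (1 / 4) / math.sqrt(n)
    sigma = 10 * math.log(1 / delta, 2) * math.log(M, 2) ** (1 / 4) / (math.sqrt(n) * eps)
    T = 4 * sigma * (math.log(k, 2) + math.log(1 / beta, 2))
    max_updates = n * math.log(M, 2) ** (1 / 2)  # the bound pmw leaves commented out

    x_t = np.ones(M) / M  # uniform starting histogram
    answers, updates = [], 0
    for f in W:
        a_hat = f @ x_true + rng.laplace(0, sigma)
        d_hat = f @ x_t - a_hat
        if abs(d_hat) <= T:  # lazy round: answer from the maintained histogram
            answers.append(float(f @ x_t))
            continue
        updates += 1  # update round: move x_t toward the noisy answer
        x_t = mw_update(x_t, f, d_hat, eta)
        if updates > max_updates:
            return "failure"
        answers.append(float(a_hat))
    return answers, updates, x_t

W_toy = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
                  [1, 1, 1, 1, 0, 0, 0, 0],
                  [0, 1, 0, 1, 0, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 1, 1, -1, -1]])
x_toy = np.array([2, 0, 0, 1, 0, 1, 2, 0])
answers, updates, x_final = pmw_normalized(W_toy, x_toy, seed=0)
print(answers, updates)
```

With these constants, T comes out around 4,800 for `x_toy` and around 480 for a database the size of `x_large`, far above the [0, 1] range of normalized answers, so on databases of this size the variant will almost always answer every query lazily from the uniform histogram.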

# 5/30 - Trying different ektelo workloads

### Ex. 1. Toy Workload from Li et al. 2014 paper

In [8]:
pmw(W_toy, x_toy) # run pmw using the toy datasets/queries from above

original database: [2 0 0 1 0 1 2 0]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
query_answers (using pmw): [1.0, 0.5, 0.25, 0.25, 0.0]
Update Count = 0



### Ex. 2. Identity Total Workload

In [9]:
W1 = workload.IdentityTotal(8).dense_matrix()
pmw(W1, x_toy)

original database: [2 0 0 1 0 1 2 0]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]]
query_answers (using pmw): [0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 1.0]
Update Count = 0



### Ex. 3. All Range Workload

In [10]:
W2 = workload.AllRange(8).dense_matrix()
pmw(W2, x_toy)

original database: [2 0 0 1 0 1 2 0]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0.]
 [0. 1. 1. 1. 1. 1. 1. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 1. 1. 0.]
 [0. 0. 1. 1. 1. 1. 1. 1.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 1. 1. 1. 1. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 0. 0.]
 [0. 0. 0. 0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 1. 1. 1. 1.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1. 1.]
 [0. 0. ... (output truncated)

[0.125,
 0.25,
 0.375,
 0.5,
 0.625,
 0.75,
 0.875,
 1.0,
 0.125,
 0.25,
 0.375,
 0.5,
 0.625,
 0.75,
 0.875,
 0.125,
 0.25,
 0.375,
 0.5,
 0.625,
 0.75,
 0.125,
 0.25,
 0.375,
 0.5,
 0.625,
 0.125,
 0.25,
 0.375,
 0.5,
 0.125,
 0.25,
 0.375,
 0.125,
 0.25,
 0.125]

### Ex. 4. H2 Workload

In [11]:
W3 = workload.H2(8).dense_matrix()
pmw(W3, x_toy)

original database: [2 0 0 1 0 1 2 0]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 1.]
 [1. 1. 1. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]]
query_answers (using pmw): [0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 1.0]
Update Count = 0



### Ex. 5. Males in only one gradyear (2011)

In [12]:
x_ex_5 = np.array([1000, 1210, 0, 1250, 0, 1450, 0, 1720])
pmw(W_toy, x_ex_5)

original database: [1000 1210    0 1250    0 1450    0 1720]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

# 6/22

## 1. Variation in database size

In [13]:
x_very_small = np.array([10, 12, 13, 12, 15, 14, 17, 17])
pmw(W_toy, x_very_small) 

original database: [10 12 13 12 15 14 17 17]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
query_answers (using pmw): [1.0, 0.5, 0.25, 0.25, 0.0]
Update Count = 0



In [14]:
x_small = np.array([100, 121, 130, 125, 150, 145, 170, 172])
pmw(W_toy, x_small) 

original database: [100 121 130 125 150 145 170 172]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
query_answers (using pmw): [1.0, 0.5, 0.25, 0.25, 0.0]
Update Count = 0



In [15]:
x_large = np.array([1000, 1210, 1300, 1250, 1500, 1450, 1700, 1720])
pmw(W_toy, x_large) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

In [16]:
x_really_large = np.array([10000, 12100, 13000, 12500, 15000, 14500, 17000, 17200])
pmw(W_toy, x_really_large) 

original database: [10000 12100 13000 12500 15000 14500 17000 17200]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

In [17]:
x_really_large_variant = np.array([5000, 6100, 13000, 9500, 8100, 9000, 12000, 14000])
pmw(W_toy, x_really_large_variant) 

original database: [ 5000  6100 13000  9500  8100  9000 12000 14000]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

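Conclusion: Varying the database size (and, in the last run, the spread of the counts) didn't change much. The two smaller databases never trigger an update, so every answer comes from the uniform starting histogram, while the three larger ones update on the first two queries and immediately exceed the failure threshold. Either way, the synthetic histogram x_t never gets to evolve over the workload.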
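One way to see why: `pmw`'s threshold T shrinks as n grows, while the first query compares a raw count (n itself, for the total query $w_1$) with the uniform histogram's answer (1.0). A sketch that recomputes T with the same formulas as `pmw`, ignoring the Laplace noise; `pmw_threshold` is a hypothetical helper, not part of hdmm:

```python
import math
import numpy as np

def pmw_threshold(n, M, k, eps=0.1, beta=0.1):
    """Hypothetical helper: the threshold T exactly as pmw computes it (log base 2)."""
    delta = 1 / (n * math.log(n, 2))
    sigma = 10 * math.log(1 / delta, 2) * math.log(M, 2) ** (1 / 4) / (math.sqrt(n) * eps)
    return 4 * sigma * (math.log(k, 2) + math.log(1 / beta, 2))

databases = {
    'x_very_small': np.array([10, 12, 13, 12, 15, 14, 17, 17]),
    'x_small': np.array([100, 121, 130, 125, 150, 145, 170, 172]),
    'x_large': np.array([1000, 1210, 1300, 1250, 1500, 1450, 1700, 1720]),
    'x_really_large': np.array([10000, 12100, 13000, 12500, 15000, 14500, 17000, 17200]),
    'x_really_large_variant': np.array([5000, 6100, 13000, 9500, 8100, 9000, 12000, 14000]),
}

results = {}
for name, x in databases.items():
    n = x.sum()
    T = pmw_threshold(n, x.size, k=5)  # W_toy has 5 queries
    gap = n - 1.0  # w1: raw count n versus the uniform histogram's answer of 1.0
    results[name] = (T, gap)
    print(f'{name}: n = {n}, T = {T:.0f}, gap on w1 = {gap:.0f}')
```

Under these assumptions, `x_very_small`'s gap (109) is far below its T (roughly 2,700), `x_small`'s gap (1,112) sits just under its T (roughly 1,200, less than two noise scales away, so an unlucky draw could push it over), and the three larger databases exceed T by orders of magnitude.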

## 2. Variation in epsilon

In [18]:
pmw(W_toy, x_large, eps=10000) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

In [19]:
pmw(W_toy, x_large, eps=12.2) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

In [20]:
pmw(W_toy, x_large, eps=4.5) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

In [21]:
pmw(W_toy, x_large, eps=0.1) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
the algo updated 2 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8


'failure'

In [22]:
pmw(W_toy, x_large, eps=0.01) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[ 1  1  1  1  1  1  1  1]
 [ 1  1  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0  0]
 [ 1  0  1  0  0  0  0  0]
 [ 0  0  0  0  1  1 -1 -1]]
the algo updated 1 times, the update threshold for failure was 1 * math.log(M)**(1/2): 1.442026886600883. n is 1, and M is 8
query_answers (using pmw): [1.0033444344420555, 0.5, 0.25, 0.25, 0.0]
Update Count = 1


[1.0033444344420555, 0.5, 0.25, 0.25, 0.0]
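The epsilon runs fit the same picture. T scales as 1/eps, so only at small eps is T large enough to exceed most of the raw-count gaps. A sketch that recomputes T for `x_large` at each eps tried above, ignoring the Laplace noise and measuring each query's gap against the uniform starting histogram; `pmw_threshold` is a hypothetical helper that copies `pmw`'s formulas:

```python
import math
import numpy as np

def pmw_threshold(n, M, k, eps=0.1, beta=0.1):
    """Hypothetical helper: the threshold T as computed in pmw (log base 2)."""
    delta = 1 / (n * math.log(n, 2))
    sigma = 10 * math.log(1 / delta, 2) * math.log(M, 2) ** (1 / 4) / (math.sqrt(n) * eps)
    return 4 * sigma * (math.log(k, 2) + math.log(1 / beta, 2))

W_toy = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
                  [1, 1, 1, 1, 0, 0, 0, 0],
                  [0, 1, 0, 1, 0, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 1, 1, -1, -1]])
x_large = np.array([1000, 1210, 1300, 1250, 1500, 1450, 1700, 1720])
uniform = np.ones(x_large.size) / x_large.size

# |raw answer - uniform-histogram answer| for each query, ignoring the noise
gaps = np.abs(W_toy @ x_large - W_toy @ uniform)

predicted = {}
for eps in [10000, 12.2, 4.5, 0.1, 0.01]:
    T = pmw_threshold(x_large.sum(), x_large.size, len(W_toy), eps=eps)
    predicted[eps] = int((gaps > T).sum())  # queries expected to trigger an update
    print(f'eps = {eps}: T = {T:.1f}, queries with gap > T: {predicted[eps]}')
```

Ignoring noise, only $w_1$ (gap of about 11,129) exceeds T at eps = 0.01, which matches the single update recorded for that run; at eps = 0.1 and above, at least the first two queries exceed T, which is enough to trip `pmw`'s failure check.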

## 3. Sparse-Dense

In [23]:
W_identity = workload.Identity(8).dense_matrix()
W_allrange = workload.AllRange(8).dense_matrix()
W_total = workload.Total(8).dense_matrix()
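If hdmm is not installed, the same dense matrices can be rebuilt with plain numpy. A minimal sketch, assuming (from the matrices printed in this notebook) that `AllRange(8)` lists every contiguous range ordered by start bin and then end bin, that `Identity(8)` is the 8 x 8 identity, and that `Total(8)` is a single row of ones; the helper names are hypothetical, not part of hdmm:

```python
import numpy as np

def identity_workload(n):
    """One counting query per bin."""
    return np.eye(n)

def total_workload(n):
    """A single query summing every bin."""
    return np.ones((1, n))

def all_range_workload(n):
    """Every contiguous range [i, j] of bins, ordered by start i, then end j."""
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i:j + 1] = 1.0
            rows.append(row)
    return np.array(rows)

W_identity_np = identity_workload(8)
W_allrange_np = all_range_workload(8)
W_total_np = total_workload(8)
W_sd_np = np.vstack((W_identity_np, W_allrange_np))  # sparse -> dense ordering
print(W_allrange_np.shape, W_sd_np.shape)
```

An 8-bin domain has 8 * 9 / 2 = 36 ranges, so stacking the identity on top gives 44 queries.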

### Sparse -> Dense

In [24]:
W_sd = np.vstack((W_identity, W_allrange))
pmw(W_sd, x_large) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0.]
 [0. 1. 1. 1. 1. 1. 1. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 1. 1. 0.]
 [0. 0. 1. 1. 1. 1. 1. 1.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 1. 1. 1. 1. 1.]
 [0. 0. 0. ... (output truncated)

'failure'

### Dense -> Sparse

In [25]:
W_ds = np.vstack((W_allrange, W_identity))
pmw(W_ds, x_large) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0.]
 [0. 1. 1. 1. 1. 1. 1. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 1. 1. 0.]
 [0. 0. 1. 1. 1. 1. 1. 1.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 1. 1. 1. 1. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 0. 0.]
 [0. 0. 0. 0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 1. 1. 1. 1.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. ... (output truncated)

'failure'

### Sparse -> Dense -> Sparse

In [26]:
W_sds = np.vstack((W_identity, W_allrange, W_identity))
pmw(W_sds, x_large) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0.]
 [0. 1. 1. 1. 1. 1. 1. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 1. 1. 0.]
 [0. 0. 1. 1. 1. 1. 1. 1.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 1. 1. 1. 1. 1.]
 [0. 0. 0. ... (output truncated)

'failure'

### Dense -> Sparse -> Dense

In [27]:
W_dsd = np.vstack((W_allrange, W_identity, W_allrange))
pmw(W_dsd, x_large) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0.]
 [0. 1. 1. 1. 1. 1. 1. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 1. 1. 0.]
 [0. 0. 1. 1. 1. 1. 1. 1.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 1. 1. 1. 1. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 0. 0.]
 [0. 0. 0. 0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 1. 1. 1. 1.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. ... (output truncated)

'failure'

### Randomized

In [28]:
W_random = np.random.permutation(W_sd)
pmw(W_random, x_large) 

original database: [1000 1210 1300 1250 1500 1450 1700 1720]
workload: 
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [0. 0. 1. 1. 1. 1. 1. 1.]
 [0. 0. 1. 1. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 1. 1.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 1.]
 [0. 0. 0. 0. 1. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0.]
 [0. 0. 0. ... (output truncated)

'failure'

## 4. Try Non-Smooth Databases

## 5. Assess effects on Analysts

- Fix the update count to test extreme cases: should n be 1?
- Should I begin graphing things?
- Can we talk about smooth vs. non-smooth databases?