#Ans 1:

$f_i(x) = \frac{\lambda}{2N}x^Tx + \frac{1}{2}||A_ix - y_i||_2^2$

where $A_i$ denotes the ith row in the data matrix A and $y_i$ denotes the ith element in vector y.

#Ans 2:

$g_i(x) = \frac{\lambda}{N}x_i + (A_ix - y_i)A_i$

In [158]:
import numpy as np
import timeit
np.random.seed(1000)

In [159]:
N = 200
d = 10000
lambda_reg = 0.001
eps = np.random.randn(N,1)

In [160]:
A = np.random.randn(N,int(d))

In [161]:
for j in range(A.shape[1]):
  A[:,j] = A[:,j]/np.linalg.norm(A[:,j])

In [162]:
xorig = np.ones((int(d),1))
y = np.dot(A,xorig) + eps

In [163]:
x = np.zeros((int(d),1))
epochs = 1e4
t = 1
arr = np.arange(N)

In [164]:
def evalg(x, lambd):
  arr = np.zeros(64)
  err = np.subtract(np.matmul(A,x),y)
  for i in range(64):
    arr[i] = lambd*x[i] + np.matmul(err.transpose(),A[:,i])
  return arr.reshape(64,1)

In [165]:
def evalg_l(x, i, lambd):
  err = np.subtract(np.matmul(A[i],x),y[i])
  ar = np.add((lambd/N)*x[i],np.multiply(err[0],A[i]))
  return ar.reshape((x.shape[0],1))

In [166]:
start = timeit.default_timer() #start the timer
for epoch in range(int(epochs)):
  #print(epoch)
  np.random.shuffle(arr) #shuffle every epoch
  for i in np.nditer(arr):
    #print(evalg_l(x, i, lambda_reg))
    x = x - (1/t * evalg_l(x, i, lambda_reg))
    #print(x.shape)
    t = t+1
    if t>1e4:
      t = 1
alglab7time = timeit.default_timer() - start #time is in seconds
x_alglab7 = x

print('time:',alglab7time)
print('optimal gradient norm:',np.linalg.norm(evalg(x_alglab7, lambda_reg)))
print((np.linalg.norm(np.subtract(np.matmul(A,x_alglab7),y)))**2)
print((np.linalg.norm(np.subtract(x_alglab7,xorig)))**2)

time: 130.52291859400066
optimal gradient norm: 0.0010363844642636807
3.898103464183208e-08
9838.847456261916


#Ans 3:

For dimension 10000,

Time taken : 130.52 seconds

$||f_\lambda(x^*)||$ = 0.0010363844642636807

$||Ax^*-y||_2^2$ = 3.898103464183208e-08

$||x^*-x_{orig}||^2_2$ = 9838.847456261916

For dimension 20000,

Time taken : 317.23 seconds

$||f_\lambda(x^*)||$ = 0.000741488790987189

$||Ax^*-y||_2^2$ = 7.846033738594737e-09

$||x^*-x_{orig}||^2_2$ = 19824.247821136378 

In [167]:
epochs = [1e4, 5*1e4, 1e5]

for k in epochs:
  start = timeit.default_timer() #start the timer
  for epoch in range(int(k)):
    #print(epoch)
    np.random.shuffle(arr) #shuffle every epoch
    for i in np.nditer(arr):
      #print(evalg_l(x, i, lambda_reg))
      x = x - (1/t * evalg_l(x, i, lambda_reg))
      #print(x.shape)
      t = t+1
      if t>1e4:
        t = 1
  alglab7time = timeit.default_timer() - start #time is in seconds
  x_alglab7 = x

  print('epoch =',k)
  print('time:',alglab7time)
  print('optimal gradient norm:',np.linalg.norm(evalg(x_alglab7, lambda_reg)))
  print('||Ax* - y||^2 =',(np.linalg.norm(np.subtract(np.matmul(A,x_alglab7),y)))**2)
  print('||x*-xorig||^2 =',(np.linalg.norm(np.subtract(x_alglab7,xorig)))**2,'\n')

epoch = 10000.0
time: 128.2377099839996
optimal gradient norm: 0.0010498801077334615
||Ax* - y||^2 = 7.974511564562011e-09
||x*-xorig||^2 = 9845.010811491862 

epoch = 50000.0
time: 652.5266416569993
optimal gradient norm: 0.0010536545622100123
||Ax* - y||^2 = 4.3321540850761296e-08
||x*-xorig||^2 = 9872.291738488982 

epoch = 100000.0
time: 1324.3427896840003
optimal gradient norm: 0.0010515805241716992
||Ax* - y||^2 = 1.4337299772194933e-08
||x*-xorig||^2 = 9924.49763942884 



In [168]:
lambdas = [1000, 100, 10, 1, 0.1, 1e-2, 1e-3]

for k in lambdas:
  start = timeit.default_timer() #start the timer
  for epoch in range(int(1e4)):
    #print(epoch)
    np.random.shuffle(arr) #shuffle every epoch
    for i in np.nditer(arr):
      #print(evalg_l(x, i, lambda_reg))
      x = x - (1/t * evalg_l(x, i, k))
      #print(x.shape)
      t = t+1
      if t>1e4:
        t = 1
  alglab7time = timeit.default_timer() - start #time is in seconds
  x_alglab7 = x

  print('lambda =',k)
  print('time:',alglab7time)
  print('optimal gradient norm:',np.linalg.norm(evalg(x_alglab7, lambda_reg)))
  print('||Ax* - y||^2 =',(np.linalg.norm(np.subtract(np.matmul(A,x_alglab7),y)))**2)
  print('||x*-xorig||^2 =',(np.linalg.norm(np.subtract(x_alglab7,xorig)))**2,'\n')

lambda = 1000
time: 132.2270335000012
optimal gradient norm: 18.972286344711037
||Ax* - y||^2 = 1804.1710838259148
||x*-xorig||^2 = 10404.138641022742 

lambda = 100
time: 130.13892594000026
optimal gradient norm: 2.558747927157985
||Ax* - y||^2 = 24.149105813951127
||x*-xorig||^2 = 10470.577258493173 

lambda = 10
time: 133.58179831500092
optimal gradient norm: 0.359052566475746
||Ax* - y||^2 = 0.4902254969565436
||x*-xorig||^2 = 10528.236331047126 

lambda = 1
time: 133.8488884569997
optimal gradient norm: 0.05737482995387158
||Ax* - y||^2 = 0.012416071171366441
||x*-xorig||^2 = 10472.512922389884 

lambda = 0.1
time: 127.85367920899989
optimal gradient norm: 0.002530193292257295
||Ax* - y||^2 = 1.5112054428170323e-05
||x*-xorig||^2 = 10431.114418388244 

lambda = 0.01
time: 129.58886959200026
optimal gradient norm: 0.0010410672951335497
||Ax* - y||^2 = 3.016938553517417e-07
||x*-xorig||^2 = 10433.507293673983 

lambda = 0.001
time: 129.25343775799956
optimal gradient norm: 0.0010186

In [170]:
N = 200
d = 20000
lambda_reg = 0.001
eps = np.random.randn(N,1)
A = np.random.randn(N,int(d))
for j in range(A.shape[1]):
  A[:,j] = A[:,j]/np.linalg.norm(A[:,j])
xorig = np.ones((int(d),1))
y = np.dot(A,xorig) + eps
x = np.zeros((int(d),1))
epochs = 1e4
t = 1
arr = np.arange(N)

In [173]:
epochs = [1e4, 5*1e4, 1e5]

for k in epochs:
  start = timeit.default_timer() #start the timer
  for epoch in range(int(k)):
    #print(epoch)
    np.random.shuffle(arr) #shuffle every epoch
    for i in np.nditer(arr):
      #print(evalg_l(x, i, lambda_reg))
      x = x - (1/t * evalg_l(x, i, lambda_reg))
      #print(x.shape)
      t = t+1
      if t>1e4:
        t = 1
  alglab7time = timeit.default_timer() - start #time is in seconds
  x_alglab7 = x

  print('d=',d)
  print('epoch =',k)
  print('time:',alglab7time)
  print('optimal gradient norm:',np.linalg.norm(evalg(x_alglab7, lambda_reg)))
  print('||Ax* - y||^2 =',(np.linalg.norm(np.subtract(np.matmul(A,x_alglab7),y)))**2)
  print('||x*-xorig||^2 =',(np.linalg.norm(np.subtract(x_alglab7,xorig)))**2,'\n')

d= 20000
epoch = 10000.0
time: 317.2296431270006
optimal gradient norm: 0.000741488790987189
||Ax* - y||^2 = 7.846033738594737e-09
||x*-xorig||^2 = 19824.247821136378 

d= 20000
epoch = 50000.0
time: 1591.8587656910004
optimal gradient norm: 0.0007160889791013773
||Ax* - y||^2 = 2.7992275183683695e-09
||x*-xorig||^2 = 19840.12558161412 

d= 20000
epoch = 100000.0
time: 3193.5619911930007
optimal gradient norm: 0.0007244234380072702
||Ax* - y||^2 = 1.4640371870989295e-09
||x*-xorig||^2 = 19869.84936113091 



In [174]:
lambdas = [1000, 100, 10, 1, 0.1, 1e-2, 1e-3]

for k in lambdas:
  start = timeit.default_timer() #start the timer
  for epoch in range(int(1e4)):
    #print(epoch)
    np.random.shuffle(arr) #shuffle every epoch
    for i in np.nditer(arr):
      #print(evalg_l(x, i, lambda_reg))
      x = x - (1/t * evalg_l(x, i, k))
      #print(x.shape)
      t = t+1
      if t>1e4:
        t = 1
  alglab7time = timeit.default_timer() - start #time is in seconds
  x_alglab7 = x

  print('lambda =',k)
  print('time:',alglab7time)
  print('optimal gradient norm:',np.linalg.norm(evalg(x_alglab7, lambda_reg)))
  print('||Ax* - y||^2 =',(np.linalg.norm(np.subtract(np.matmul(A,x_alglab7),y)))**2)
  print('||x*-xorig||^2 =',(np.linalg.norm(np.subtract(x_alglab7,xorig)))**2,'\n')

lambda = 1000
time: 321.937267502999
optimal gradient norm: 52.078550955443156
||Ax* - y||^2 = 9028.541478315108
||x*-xorig||^2 = 20148.841727397616 

lambda = 100
time: 317.88425542700134
optimal gradient norm: 5.050087299012066
||Ax* - y||^2 = 72.1379895014913
||x*-xorig||^2 = 20092.575147368454 

lambda = 10
time: 317.07445605200337
optimal gradient norm: 0.12814282514193304
||Ax* - y||^2 = 0.05055126874355229
||x*-xorig||^2 = 20009.276019769448 

lambda = 1
time: 319.341281240002
optimal gradient norm: 0.019772601057733194
||Ax* - y||^2 = 0.0010423501051387723
||x*-xorig||^2 = 20160.33069420957 

lambda = 0.1
time: 319.2264705790003
optimal gradient norm: 0.0018407413913838946
||Ax* - y||^2 = 1.131264503635859e-05
||x*-xorig||^2 = 20134.919054877875 

lambda = 0.01
time: 318.2600023399973
optimal gradient norm: 0.0007365408265413915
||Ax* - y||^2 = 4.323728293206125e-08
||x*-xorig||^2 = 20137.28394519603 

lambda = 0.001
time: 323.295242725002
optimal gradient norm: 0.0007651265188

#Ans 4:

At $\lambda$ = 0.001

For dimension 10000,

For 10000 epochs,

Time taken : 128.2377099839996

$||f_\lambda(x^*)||$ = 0.0010498801077334615

$||Ax^*-y||_2^2$ = 7.974511564562011e-09

$||x^*-x_{orig}||^2_2$ = 9845.010811491862 

For 50000 epochs,

Time taken : 652.5266416569993

$||f_\lambda(x^*)||$ = 0.0010536545622100123

$||Ax^*-y||_2^2$ = 4.3321540850761296e-08

$||x^*-x_{orig}||^2_2$ = 9872.291738488982

For 100000 epochs,

Time taken : 1324.3427896840003

$||f_\lambda(x^*)||$ = 0.0010515805241716992

$||Ax^*-y||_2^2$ = 1.4337299772194933e-08

$||x^*-x_{orig}||^2_2$ = 9924.49763942884

--------------------------------------------------------------------------------

For dimension 20000,

For 10000 epochs,

Time taken : 317.2296431270006

$||f_\lambda(x^*)||$ = 0.000741488790987189

$||Ax^*-y||_2^2$ = 7.846033738594737e-09

$||x^*-x_{orig}||^2_2$ = 19824.247821136378 

For 50000 epochs,

Time taken : 1591.8587656910004

$||f_\lambda(x^*)||$ = 0.0007160889791013773

$||Ax^*-y||_2^2$ = 2.7992275183683695e-09

$||x^*-x_{orig}||^2_2$ = 19840.12558161412

For 100000 epochs,

Time taken : 3193.5619911930007

$||f_\lambda(x^*)||$ = 0.0007244234380072702

$||Ax^*-y||_2^2$ = 1.4640371870989295e-09

$||x^*-x_{orig}||^2_2$ = 19869.84936113091


We can observe that for more no. of epochs, the time required increases rapidly, but no improvement is seen in the case of both failure dimensions.

#Ans 5:

For 100000 epochs,

For dimension 10000,

For $\lambda$ = 1000,

Time taken : 132.2270335000012

$||f_\lambda(x^*)||$ = 18.972286344711037

$||Ax^*-y||_2^2$ = 1804.1710838259148

$||x^*-x_{orig}||^2_2$ = 10404.138641022742 

For $\lambda$ = 100,

Time taken : 130.13892594000026

$||f_\lambda(x^*)||$ = 2.558747927157985

$||Ax^*-y||_2^2$ = 24.149105813951127

$||x^*-x_{orig}||^2_2$ = 10470.577258493173 

For $\lambda = 10$,

Time taken : 133.58179831500092

$||f_\lambda(x^*)||$ = 0.359052566475746

$||Ax^*-y||_2^2$ = 0.4902254969565436

$||x^*-x_{orig}||^2_2$ = 10528.236331047126

For $\lambda = 1$,

Time taken : 133.8488884569997

$||f_\lambda(x^*)||$ = 0.05737482995387158

$||Ax^*-y||_2^2$ = 0.012416071171366441

$||x^*-x_{orig}||^2_2$ = 10472.512922389884 

For $\lambda = 0.1$,

Time taken : 127.85367920899989

$||f_\lambda(x^*)||$ = 0.002530193292257295

$||Ax^*-y||_2^2$ = 1.5112054428170323e-05

$||x^*-x_{orig}||^2_2$ = 10431.114418388244

For $\lambda = 0.01$,

Time taken : 129.58886959200026

$||f_\lambda(x^*)||$ = 0.0010410672951335497

$||Ax^*-y||_2^2$ = 3.016938553517417e-07

$||x^*-x_{orig}||^2_2$ = 10433.507293673983

For $\lambda = 0.001$,

Time taken : 129.25343775799956

$||f_\lambda(x^*)||$ = 0.0010186131887792121

$||Ax^*-y||_2^2$ = 2.654744623840759e-09

$||x^*-x_{orig}||^2_2$ = 10433.536835010902

--------------------------------------------------------------------------------

For dimension 20000,

For $\lambda$ = 1000,

Time taken : 321.937267502999

$||f_\lambda(x^*)||$ = 52.078550955443156

$||Ax^*-y||_2^2$ = 9028.541478315108

$||x^*-x_{orig}||^2_2$ = 20148.841727397616

For $\lambda$ = 100,

Time taken : 317.88425542700134

$||f_\lambda(x^*)||$ = 5.050087299012066

$||Ax^*-y||_2^2$ = 72.1379895014913

$||x^*-x_{orig}||^2_2$ = 20092.575147368454 

For $\lambda = 10$,

Time taken : 317.07445605200337

$||f_\lambda(x^*)||$ = 0.12814282514193304

$||Ax^*-y||_2^2$ = 0.05055126874355229

$||x^*-x_{orig}||^2_2$ = 20009.276019769448

For $\lambda = 1$,

Time taken : 319.341281240002

$||f_\lambda(x^*)||$ = 0.019772601057733194

$||Ax^*-y||_2^2$ = 0.0010423501051387723

$||x^*-x_{orig}||^2_2$ = 20160.33069420957

For $\lambda = 0.1$,

Time taken : 319.2264705790003

$||f_\lambda(x^*)||$ = 0.0018407413913838946

$||Ax^*-y||_2^2$ = 1.131264503635859e-05

$||x^*-x_{orig}||^2_2$ = 20134.919054877875

For $\lambda = 0.01$,

Time taken : 318.2600023399973

$||f_\lambda(x^*)||$ = 0.0007365408265413915

$||Ax^*-y||_2^2$ = 4.323728293206125e-08

$||x^*-x_{orig}||^2_2$ = 20137.28394519603 

For $\lambda = 0.001$,

Time taken : 323.295242725002

$||f_\lambda(x^*)||$ = 0.0007651265188281965

$||Ax^*-y||_2^2$ = 1.5224959819785493e-07

$||x^*-x_{orig}||^2_2$ = 20137.171045621908

We can observe that for both the failure dimensions for larger values of lambda, the value of $||Ax^*-y||^2_2$ is quite large and reduces quickly for smaller values of lambda. The time taken for all values of lambda is similar.

#Ans 6:

Yes, Alg-Lab 7 works for the failure dimensions as it gives the solution in way less time than in the previous exercise.

#Ans 7:

The algorithm runs for the no. of epochs defined, and shuffles the rows of the data matrix for every epoch.

1/t acts as the step length and this algorithm performs well because no matrix inversion or matrix multiplication operation is needed to be performed and they can be computationally very expensive, only vector multiplications are required for the matrix in the gradient calculation.