#### Weighted Regression in Python



The fact that $T$ and $u$ are "independent" (or at least
orthogonal) variables means that if we want to compute a
"classical" regression we'd do it something like this:



##### Define independent random variables



In [1]:
```python
%matplotlib inline
import numpy as np
from scipy.stats import multivariate_normal
np.random.seed(seed=90210)
k = 3  # Number of observables in T

mu = [0]*k
Sigma = [[1, 0.5, 0],
         [0.5, 2, 0],
         [0, 0, 3]]

T = multivariate_normal(mu, Sigma)  # k-dimensional normal distribution for T

U = multivariate_normal(cov=0.2)  # Scalar normal error with variance 0.2
```
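A quick sanity check of these distributions, as a sketch: it redefines `T` and `U` exactly as above, but draws from a separate `np.random.default_rng` generator (an assumption, chosen so the seeded global state used below is left untouched). It then compares sample moments with `Sigma`, with zero (for the covariance between $T$ and $u$), and with 0.2.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Same distributions as above: k = 3 observables in T, scalar error with variance 0.2
Sigma = np.array([[1, 0.5, 0],
                  [0.5, 2, 0],
                  [0, 0, 3]])
T = multivariate_normal(mean=np.zeros(3), cov=Sigma)
U = multivariate_normal(cov=0.2)

# Separate generator, so this check does not consume draws from the seeded global state
rng = np.random.default_rng(0)
t_check = T.rvs(size=100_000, random_state=rng)   # shape (100000, 3)
u_check = U.rvs(size=100_000, random_state=rng)   # shape (100000,)

S = np.cov(t_check, rowvar=False)                  # should be close to Sigma
# Sample covariance between each column of T and u; should be close to 0
cross = np.array([np.cov(t_check[:, j], u_check)[0, 1] for j in range(3)])

print(np.round(S, 2))
print(np.round(cross, 3), np.round(u_check.var(), 3))
```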

##### Define $X$



Recall that $X$ can depend on $T$ and $u$.  This dependence needn't be
linear!  For example, suppose $X=T^3D + u$, where $T^3$ is the
elementwise cube of $T$ and $D$ is a $k\times\ell$ matrix.  (The
simulation below drops the $u$ term and uses $X = T^3D$.)
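A small shape check of this construction. It is a sketch that assumes hypothetical sizes $N=5$, $k=3$, $\ell=2$ and uses random stand-ins for $T$ and $D$. The `**` operator cubes each entry, and `@` then requires $D$ to have $k$ rows.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, ell = 5, 3, 2                 # hypothetical sizes; ell = number of columns of X
t = rng.standard_normal((N, k))     # stand-in for draws of T, one row per observation
D = rng.random((k, ell))            # D must be k x ell for (T^3) @ D to conform

T_cubed = t**3                      # ** is elementwise: each entry of t is cubed
X = T_cubed @ D                     # @ is matrix multiplication: (N x k) @ (k x ell)

print(X.shape)                      # (5, 2)
```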



##### Construct Sample



To construct a sample of observables $(y,X,T)$ we just use the regression
equation $y = X\beta + u$, plus an assumption about the value of $\beta$:



In [32]:
```python
beta1 = [1/2, 1]
beta2 = [-2, 4]
beta = np.array([beta1, beta2])  # True 2x2 coefficient matrix
m = 2  # Number of columns of y
C = np.eye(m)  # Mixing matrix for the errors; their covariance is 0.2*C@C.T

D = np.random.random(size=(k, m))  # Random k x m (here 3x2) matrix

N = 1000  # Sample size

# Now: transform rvs into a sample
t = T.rvs(N)  # N x k sample of T

u = U.rvs(N)  # Draw a sample of u
v = U.rvs(N)  # Ensures the first N entries match those from weighted_regression.py

X = (t**3)@D  # ** is elementwise exponentiation

y = X@beta + (C@[u,v]).T  # @ is matrix multiplication; C@[u,v] is 2 x N, so transpose to N x 2
```

In [33]:
```python
np.shape((C@[u,v]).T)
```

```
(1000, 2)
```
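The matrix `C` in the sample-construction cell mixes the two independent error draws `u` and `v`, each with variance 0.2. The covariance of the errors in $y$ is therefore $0.2\,CC'$ rather than $C$ itself, and with $C=I$ this is $0.2I$. The sketch below checks this with a hypothetical non-identity `C` and an independent generator (both assumptions, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
sigma2 = 0.2                                   # variance of each scalar error draw, as for U
C = np.array([[1.0, 0.0],
              [0.5, 1.0]])                     # hypothetical non-identity mixing matrix
u = rng.normal(scale=np.sqrt(sigma2), size=N)
v = rng.normal(scale=np.sqrt(sigma2), size=N)

errors = (C @ np.vstack([u, v])).T             # shape (N, 2): one row per observation
print(np.cov(errors, rowvar=False))            # sample covariance of the two error columns
print(sigma2 * C @ C.T)                        # theoretical covariance 0.2 * C C'
```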

##### Turn to estimation



So, we now have data on *realizations* $(y,X,T)$.  Now forget that we
know $\beta$ and let's estimate it, using weighted least squares.  As a
numerical matter it's better to avoid explicitly inverting the $(T'X)$
matrix; instead we work with the "normal" equations obtained by
premultiplying the regression equation by $T'$:

\begin{align*}
   T'y &= T'X\beta + T'u\\
   \mbox{E}(T'u) &= 0
\end{align*}

Dropping the $T'u$ term, whose expectation is zero, leaves the sample
analogue $T'y = T'Xb$ to be solved for the estimate $b$.
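To spell out how $b$ is pinned down when there are more equations than unknowns: the sample analogue $T'y = T'Xb$ consists of $k$ equations in $\ell$ unknowns, and these generally cannot all hold exactly when $k>\ell$. A least-squares solution instead minimizes the squared discrepancy. Assuming $T'X$ has full column rank, it has the closed form

\begin{align*}
   b &= \arg\min_{\beta}\,(T'y - T'X\beta)'(T'y - T'X\beta)\\
     &= (X'TT'X)^{-1}X'TT'y.
\end{align*}

With a two-column $y$, as in the simulation, the same formula applies column by column.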



##### Numerical solution



In the classical case we were trying to solve a linear system of the
form $Ab=c$, with $A$ a square matrix.  In the present case we're also
trying to solve a linear system, but the matrix $A = T'X$ may have more
rows than columns (here $k=3$ rows and $\ell=2$ columns).  Provided the
columns of $A$ are linearly independent, this means we have an
**overidentified** system: more equations than unknowns.  We'll return
to the implications of this later, but for now it also calls for a
different numerical approach, using `np.linalg.lstsq` instead of
`np.linalg.solve`.
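The sketch below illustrates the difference using a hypothetical random $3\times 2$ matrix `A` and vector `c`, which stand in for $T'X$ and $T'y$. `np.linalg.solve` rejects a non-square matrix. `np.linalg.lstsq` instead returns the least-squares solution, which for $A$ with full column rank agrees with solving $A'Ab = A'c$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2))    # 3 equations, 2 unknowns, shaped like T'X
c = rng.standard_normal(3)         # hypothetical right-hand side, playing the role of T'y

# np.linalg.solve requires a square coefficient matrix, so it rejects A
solve_failed = False
try:
    np.linalg.solve(A, c)
except np.linalg.LinAlgError as err:
    solve_failed = True
    print("solve failed:", err)

# lstsq returns the b minimizing ||A b - c||^2, plus residuals, rank and singular values
b, residuals, rank, sv = np.linalg.lstsq(A, c, rcond=None)

# With full column rank, this equals the solution of the normal equations A'A b = A'c
b_normal = np.linalg.solve(A.T @ A, A.T @ c)
print(rank, np.allclose(b, b_normal))   # 2 True
```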



In [34]:
```python
b = np.linalg.lstsq(t.T@X, t.T@y, rcond=None)[0]  # lstsq returns several results; we pick the first (the solution)

e = y - X@b

TXplus = np.linalg.pinv(t.T@X)  # Moore-Penrose pseudo-inverse

# Empirical covariance matrix of e.
# Caution: np.cov treats each *row* as a variable, so np.cov(e) is N x N (1000 x 1000),
# estimated from only 2 values per row; it has rank at most 1.
Omega_hat = np.cov(e)
Om_inv = np.linalg.pinv(Omega_hat)
# Because Omega_hat is singular, np.linalg.pinv(Omega_hat) @ Omega_hat is a projection,
# not the identity: pinv works correctly, but Om_inv is not a usable weighting matrix.

TXplus2 = np.linalg.pinv(t.T@Om_inv@X)

b2 = np.linalg.lstsq(t.T@Om_inv@X, t.T@Om_inv@y, rcond=None)[0]  # Weighted estimate; again pick the first result
print(b2)

e2 = y - X@b2
print(np.cov(e.T))  # 2 x 2 covariance of the unweighted residuals e (not e2)

# vb = e2.var()*TXplus2@t.T@Omega_hat@t@TXplus2.T  # This code should work if the above works properly
# print(vb)
```

```
[[-0.87417906  2.1688006 ]
 [-3.48727804  6.05128036]]
[[ 0.20060124 -0.02247382]
 [-0.02247382  0.2163604 ]]
```
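In the simulation the errors are independent across observations, each with variance 0.2. The true $N\times N$ covariance of each error column is therefore $\Omega = 0.2I$, and weighting by $\Omega^{-1}$ only rescales both sides of the system. A minimal sketch on independently simulated data (hypothetical seed and sizes; $\beta$ and the construction of $X$ follow the text) shows that this weighting reproduces the unweighted estimate. Any difference between `b` and `b2` above must therefore come from the estimated `Omega_hat` rather than from weighting as such.

```python
import numpy as np

rng = np.random.default_rng(6)
N, k, ell = 500, 3, 2
t = rng.standard_normal((N, k))                      # stand-in sample of T
X = (t**3) @ rng.random((k, ell))                    # X = T^3 D, as in the text
beta = np.array([[0.5, 1.0], [-2.0, 4.0]])           # same beta as above
y = X @ beta + rng.normal(scale=np.sqrt(0.2), size=(N, ell))  # iid errors, variance 0.2

# Unweighted: least-squares solution of T'X b = T'y
b = np.linalg.lstsq(t.T @ X, t.T @ y, rcond=None)[0]

# Weighted by the true Omega^{-1} = I / 0.2: both sides are scaled by 5, so b is unchanged
Om_inv = np.eye(N) / 0.2
b_w = np.linalg.lstsq(t.T @ Om_inv @ X, t.T @ Om_inv @ y, rcond=None)[0]

print(np.allclose(b, b_w))   # True
```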




In [35]:
```python
np.cov(e.T)
```

```
array([[ 0.20060124, -0.02247382],
       [-0.02247382,  0.2163604 ]])
```
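This $2\times 2$ matrix comes from `np.cov(e.T)`. By default, NumPy's `np.cov` treats each *row* of its argument as a variable, unless `rowvar=False` is passed. The cell defining `Omega_hat` calls `np.cov(e)` without transposing, which yields a $1000\times 1000$ matrix built from only two values per row. A small sketch with simulated residuals of the same shape (hypothetical data, independent of the sample above) makes the difference visible:

```python
import numpy as np

rng = np.random.default_rng(4)
e = rng.standard_normal((1000, 2))          # residual-like array: N rows, 2 columns

print(np.cov(e).shape)                      # (1000, 1000): rows treated as variables
print(np.cov(e, rowvar=False).shape)        # (2, 2): columns treated as variables
print(np.allclose(np.cov(e.T), np.cov(e, rowvar=False)))  # True

# Each of the 1000 "variables" has only 2 observations, so the N x N matrix has rank at most 1
print(np.linalg.matrix_rank(np.cov(e)))
```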

In [36]:
```python
np.linalg.pinv(Omega_hat) @ Omega_hat
```

```
array([[ 4.10947343e-04,  1.23018210e-03,  2.64998000e-05, ...,
        -8.67545461e-04, -5.35605863e-04, -1.99754835e-05],
       [ 1.23018210e-03,  3.68258373e-03,  7.93278753e-05, ...,
        -2.59702104e-03, -1.60335079e-03, -5.97971557e-05],
       [ 2.64998000e-05,  7.93278753e-05,  1.70883061e-06, ...,
        -5.59433746e-05, -3.45383623e-05, -1.28811228e-06],
       ...,
       [-8.67545461e-04, -2.59702104e-03, -5.59433746e-05, ...,
         1.83146366e-03,  1.13071040e-03,  4.21699771e-05],
       [-5.35605863e-04, -1.60335079e-03, -3.45383623e-05, ...,
         1.13071040e-03,  6.98078831e-04,  2.60349319e-05],
       [-1.99754835e-05, -5.97971557e-05, -1.28811228e-06, ...,
         4.21699771e-05,  2.60349319e-05,  9.70975841e-07]])
```
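A rank-deficient matrix has no inverse. For such a matrix $A$, `np.linalg.pinv(A) @ A` is the orthogonal projection onto the row space of $A$, not the identity. The output above has that structure: it is symmetric, and its leading entries satisfy $P_{11}P_{22}\approx P_{12}^2$, as for a rank-one projection. A minimal sketch with a hypothetical rank-one matrix, using an explicit `rcond` cutoff (an assumption) well above rounding noise:

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.standard_normal(6)
A = np.outer(a, a)                        # symmetric 6 x 6 matrix of rank 1, like a singular Omega_hat
P = np.linalg.pinv(A, rcond=1e-10) @ A    # cutoff discards singular values that are pure rounding noise

print(np.allclose(P, np.eye(6)))                    # False: A is singular, so no inverse exists
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # True True: P is an orthogonal projection
print(np.allclose(P, np.outer(a, a) / (a @ a)))     # True: it projects onto span(a)

B = rng.standard_normal((6, 6))           # generic full-rank matrix, for contrast
print(np.allclose(np.linalg.pinv(B) @ B, np.eye(6)))  # True
```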