## 15. Multivariate Models

Review of notation from linear algebra:

- If $x$ and $y$ are vectors, then $x^T y = \sum_j x_j y_j$.
- If $A$ is a matrix, then $\text{det}(A)$ denotes the determinant of $A$, $A^T$ denotes the transpose of $A$, and $A^{-1}$ denotes the inverse of $A$ (if the inverse exists).
- The trace of a square matrix $A$, denoted by $\text{tr}(A)$, is the sum of its diagonal elements.
- The trace satisfies $\text{tr}(AB) = \text{tr}(BA)$ and $\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)$.
- The trace satisfies $\text{tr}(a) = a$ if $a$ is a scalar.
- A matrix $\Sigma$ is positive definite if $x^T \Sigma x > 0$ for all non-zero vectors $x$.
- If a matrix $\Sigma$ is symmetric and positive definite, there exists a matrix $\Sigma^{1/2}$, called the square root of $\Sigma$, with the following properties:
    - $\Sigma^{1/2}$ is symmetric
    - $\Sigma = \Sigma^{1/2} \Sigma^{1/2}$
    - $\Sigma^{1/2} \Sigma^{-1/2} = \Sigma^{-1/2} \Sigma^{1/2} = I$ where $\Sigma^{-1/2} = (\Sigma^{1/2})^{-1}$.
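
The square-root properties can be checked numerically. The sketch below is one possible construction, assuming NumPy is available: it builds $\Sigma^{1/2}$ from the eigendecomposition $\Sigma = Q \,\text{diag}(w)\, Q^T$. The helper name `sqrt_spd` and the example matrix are illustrative choices, not part of the text.

```python
import numpy as np

def sqrt_spd(sigma):
    """Symmetric square root of a symmetric positive definite matrix."""
    # eigh returns eigenvalues w and orthonormal eigenvectors (columns of q).
    w, q = np.linalg.eigh(sigma)
    if np.any(w <= 0):
        raise ValueError("matrix is not positive definite")
    # Q diag(sqrt(w)) Q^T: scale column j of q by sqrt(w[j]), then multiply by Q^T.
    return (q * np.sqrt(w)) @ q.T

# Illustrative 2 x 2 symmetric positive definite matrix.
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
root = sqrt_spd(sigma)
inv_root = np.linalg.inv(root)

is_symmetric = np.allclose(root, root.T)            # Sigma^{1/2} is symmetric
squares_back = np.allclose(root @ root, sigma)      # Sigma^{1/2} Sigma^{1/2} = Sigma
inverts = np.allclose(root @ inv_root, np.eye(2))   # Sigma^{1/2} Sigma^{-1/2} = I
print(is_symmetric, squares_back, inverts)
```

The eigendecomposition route is used here because it makes the symmetry of the result explicit; other constructions of a matrix square root exist.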

### 15.1 Random Vectors

Multivariate models involve a random vector $X$ of the form

$$X = \begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix}$$

The mean of a random vector $X$ is defined by

$$\mu 
= \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix} 
= \begin{pmatrix} \mathbb{E}(X_1) \\ \vdots \\ \mathbb{E}(X_k) \end{pmatrix}
$$

The covariance matrix $\Sigma$ is defined to be

$$\Sigma = \mathbb{V}(X) = \begin{pmatrix}
\mathbb{V}(X_1) & \text{Cov}(X_1, X_2) & \cdots & \text{Cov}(X_1, X_k) \\
\text{Cov}(X_2, X_1) & \mathbb{V}(X_2) & \cdots & \text{Cov}(X_2, X_k) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov}(X_k, X_1) & \text{Cov}(X_k, X_2) & \cdots & \mathbb{V}(X_k)
\end{pmatrix}$$

This is also called the variance matrix or the variance-covariance matrix.

**Theorem 15.1**.  Let $a$ be a vector of length $k$ and let $X$ be a random vector of the same length with mean $\mu$ and variance $\Sigma$.  Then $\mathbb{E}(a^T X) = a^T\mu$ and $\mathbb{V}(a^T X) = a^T \Sigma a$.  If $A$ is a matrix with $k$ columns then $\mathbb{E}(AX) = A\mu$ and $\mathbb{V}(AX) = A \Sigma A^T$.
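
The theorem has an exact sample analogue that is easy to check: replacing $\mu$ and $\Sigma$ by the sample mean and sample variance matrix of a data set, the same identities hold up to floating-point error. The sketch below assumes NumPy; the simulated data, the matrix `A`, and the vector `a` are arbitrary illustrative choices. It is a sanity check, not a proof of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 500
X = rng.normal(size=(k, n))      # each column is one observation of the k-vector
X[1] += 0.5 * X[0]               # make two coordinates correlated

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])  # a matrix with k columns
a = np.array([1.0, -2.0, 0.5])    # a vector of length k

mean_X = X.mean(axis=1)
S = np.cov(X)                    # rows are variables; divisor is n - 1

Y = A @ X                        # column j is A x_j
z = a @ X                        # entry j is a^T x_j

mean_ok = np.allclose(Y.mean(axis=1), A @ mean_X)   # analogue of E(AX) = A mu
cov_ok = np.allclose(np.cov(Y), A @ S @ A.T)        # analogue of V(AX) = A Sigma A^T
var_ok = np.isclose(z.var(ddof=1), a @ S @ a)       # analogue of V(a^T X) = a^T Sigma a
print(mean_ok, cov_ok, var_ok)
```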

Now suppose we have a random sample of $n$ vectors:

$$
\begin{pmatrix}X_{11} \\ X_{21} \\ \vdots \\ X_{k1} \end{pmatrix}, \;
\begin{pmatrix}X_{12} \\ X_{22} \\ \vdots \\ X_{k2} \end{pmatrix}, \;
\cdots , \;
\begin{pmatrix}X_{1n} \\ X_{2n} \\ \vdots \\ X_{kn} \end{pmatrix}
$$

The sample mean $\overline{X}$ is a vector defined by

$$\overline{X} = \begin{pmatrix} \overline{X}_1 \\ \vdots \\ \overline{X}_k \end{pmatrix}$$

where $\overline{X}_i = n^{-1} \sum_{j = 1}^n X_{ij}$.  The sample variance matrix is

$$ S = \begin{pmatrix} 
s_{11} & s_{12} & \cdots & s_{1k} \\
s_{12} & s_{22} & \cdots & s_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
s_{1k} & s_{2k} & \cdots & s_{kk}
\end{pmatrix} $$

where

$$s_{ab} = \frac{1}{n - 1} \sum_{j = 1}^n (X_{aj} - \overline{X}_a) (X_{bj} - \overline{X}_b)$$

It follows that $\mathbb{E}(\overline{X}) = \mu$ and $\mathbb{E}(S) = \Sigma$.
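
As an illustration, the sample mean and sample variance matrix can be computed directly from their definitions. This is a minimal sketch assuming NumPy and a data layout in which row $a$ of the array holds $X_{a1}, \ldots, X_{an}$; the function name `sample_mean_and_cov` and the toy data are hypothetical.

```python
import numpy as np

def sample_mean_and_cov(X):
    """Return (xbar, S) for data X of shape (k, n), one observation per column."""
    k, n = X.shape
    xbar = X.sum(axis=1) / n              # xbar_i = n^{-1} sum_j X_ij
    centered = X - xbar[:, None]          # subtract each row's mean
    S = centered @ centered.T / (n - 1)   # s_ab with the n - 1 divisor
    return xbar, S

# Toy data: k = 2 variables, n = 4 observations.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 4.0, 3.0]])
xbar, S = sample_mean_and_cov(X)
print(xbar)
print(S)
```

With this layout, `np.cov(X)` uses the same $n - 1$ divisor by default, so it should agree with `S`.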

### 15.2 Estimating the Correlation

Consider $n$ data points from a bivariate distribution

$$
\begin{pmatrix} X_{11} \\ X_{21}\end{pmatrix}, \;
\begin{pmatrix} X_{12} \\ X_{22}\end{pmatrix}, \;
\cdots , \;
\begin{pmatrix} X_{1n} \\ X_{2n}\end{pmatrix}
$$

Recall that the correlation between $X_1$ and $X_2$ is

$$\rho = \frac{\mathbb{E}((X_1 - \mu_1) (X_2 - \mu_2))}{\sigma_1 \sigma_2}$$

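The sample correlation (the plug-in estimator) is

$$\hat{\rho} = \frac{\sum_{i=1}^n (X_{1i} - \overline{X}_1)(X_{2i} - \overline{X}_2)}{(n - 1)\, s_1 s_2}$$

where $s_1 = \sqrt{s_{11}}$ and $s_2 = \sqrt{s_{22}}$ are the sample standard deviations of the two coordinates, computed with the $n - 1$ divisor of Section 15.1.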
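
A minimal sketch of the sample correlation, assuming NumPy and assuming that the covariance in the numerator and the standard deviations $s_1$, $s_2$ all use the same $n - 1$ divisor (any common divisor cancels in the ratio). The function name `sample_correlation` and the toy data are illustrative.

```python
import numpy as np

def sample_correlation(x1, x2):
    """Sample correlation of two equal-length sequences."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = x1.size
    d1 = x1 - x1.mean()
    d2 = x2 - x2.mean()
    s12 = (d1 * d2).sum() / (n - 1)          # sample covariance
    s1 = np.sqrt((d1 ** 2).sum() / (n - 1))  # sample standard deviations
    s2 = np.sqrt((d2 ** 2).sum() / (n - 1))
    return s12 / (s1 * s2)

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
rho_hat = sample_correlation(x1, x2)
print(rho_hat)
```

The off-diagonal entry of `np.corrcoef(x1, x2)` provides an independent cross-check of the value.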

We can construct a confidence interval for $\rho$ by applying the delta method as usual.  However, it turns out that we get a more accurate confidence interval by first constructing a confidence interval for a function $\theta = f(\rho)$ and then applying the inverse function $f^{-1}$.  The method, due to Fisher, is as follows.  Define

$$f(r) = \frac{1}{2} \left( \log(1 + r) - \log(1 - r)\right) $$

and let $\theta = f(\rho)$.  The inverse of $f$ is

$$g(z) \equiv f^{-1}(z) = \frac{e^{2z} - 1}{e^{2z} + 1}$$
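
As a short check of the inverse, one way to derive $g$ is to write $z = f(r)$ for $-1 < r < 1$ and solve for $r$:

$$
\begin{aligned}
z = \frac{1}{2} \log \frac{1 + r}{1 - r}
&\iff e^{2z} = \frac{1 + r}{1 - r} \\
&\iff e^{2z} (1 - r) = 1 + r \\
&\iff r = \frac{e^{2z} - 1}{e^{2z} + 1}
\end{aligned}
$$

Equivalently, $f(r) = \tanh^{-1}(r)$ and $g(z) = \tanh(z)$, so $g$ maps every real $z$ back into $(-1, 1)$.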

Now do the following steps:

**Approximate Confidence Interval for the Correlation**

1. Compute

$$\hat{\theta} = f(\hat{\rho}) = \frac{1}{2} \left( \log(1 + \hat{\rho}) - \log(1 - \hat{\rho})\right) $$

2. Compute the approximate standard error of $\hat{\theta}$, which can be shown to be

$$\hat{\text{se}}(\hat{\theta}) = \frac{1}{\sqrt{n - 3}} $$

3. An approximate $1 - \alpha$ confidence interval for $\theta = f(\rho)$ is

$$(a, b) \equiv \left(\hat{\theta} - \frac{z_{\alpha/2}}{\sqrt{n - 3}}, \; \hat{\theta} + \frac{z_{\alpha/2}}{\sqrt{n - 3}} \right)$$

4. Apply the inverse transformation $f^{-1}(z)$ to get a confidence interval for $\rho$:

$$ \left( \frac{e^{2a} - 1}{e^{2a} + 1}, \frac{e^{2b} - 1}{e^{2b} + 1} \right) $$
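
The four steps above can be sketched in a few lines using only the Python standard library. The sketch assumes that $z_{\alpha/2}$ denotes the upper $\alpha/2$ quantile of the standard normal, that is $\Phi^{-1}(1 - \alpha/2)$; the function name `fisher_ci` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(rho_hat, n, alpha=0.05):
    """Approximate 1 - alpha confidence interval for rho via Fisher's transformation."""
    if n <= 3:
        raise ValueError("n must exceed 3 so that sqrt(n - 3) is positive")
    if not -1.0 < rho_hat < 1.0:
        raise ValueError("rho_hat must lie strictly between -1 and 1")

    # Step 1: theta_hat = f(rho_hat).
    theta_hat = 0.5 * (math.log(1 + rho_hat) - math.log(1 - rho_hat))
    # Step 2: approximate standard error of theta_hat.
    se = 1.0 / math.sqrt(n - 3)
    # Step 3: interval (a, b) for theta, with z_{alpha/2} = Phi^{-1}(1 - alpha/2).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a = theta_hat - z * se
    b = theta_hat + z * se

    # Step 4: map back with g(z) = (e^{2z} - 1) / (e^{2z} + 1).
    def g(t):
        return (math.exp(2 * t) - 1) / (math.exp(2 * t) + 1)

    return g(a), g(b)

# Illustrative inputs: rho_hat = 0.6 from n = 28 observations.
lo, hi = fisher_ci(0.6, 28)
print(lo, hi)
```

Because $g$ is increasing and maps into $(-1, 1)$, the resulting interval always stays inside the valid range for a correlation, which an interval built directly on $\hat{\rho}$ does not guarantee.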