# Lecture 13

- Vector
- Vector Stats  

- Covariance, and covariance matrix.

In [None]:
!pip install plotvec

In [None]:
import numpy as np
import numpy.random as npr
import random
import itertools

from plotvec import plotvec, plotvecR

import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline
# plt.style.use('ggplot')

import scipy.stats as stats

### Properties of Vector Addition

Because vector addition is component-wise scalar addition, it inherits many of its properties from scalar addition:
 
* *Commutative*:  $\mathbf{a}+\mathbf{b} = \mathbf{b} + \mathbf{a}$
* *Associative*: $(\mathbf{a}+\mathbf{b}) +\mathbf{c} = \mathbf{a}+(\mathbf{b} +\mathbf{c})$
* *Identity*: The zero vector is the identity for vector addition: $\mathbf{a} + \mathbf{0}  = \mathbf{a}$

## Scalar-Vector Multiplication (Scaling)

**In scalar-vector multiplication, every component of the vector is multiplied  by the scalar.**

**multiplication; scalar-vector**
>   Given a vector $\mathbf{x}$ and a scalar $\alpha$, the $i$th component of the product $\alpha \mathbf{x}$ is given by $(\alpha \mathbf{x})_i = \alpha x_i$. Thus,
>   \begin{equation*}
    \alpha \mathbf{x} = \alpha   \left[ x_0, ~ x_1,~ \ldots, ~ x_{n-1} \right]^T = \left[ \alpha x_0, ~ \alpha x_1, ~\ldots, ~ \alpha x_{n-1} \right]^T.
    \end{equation*}

In NumPy, we can multiply a scalar by a vector using the usual `*` multiplication symbol:

In [None]:
a = np.array([2,3])
b = np.array([1,-2])
a1 = 0.5 * a  # scalar-vector multiplication
plotvecR(a, a1, labels=[r'$\mathbf{a}$', r'$0.5 \mathbf{a}$'])

In [None]:
b1 = 3 * b  # scalar-vector multiplication
plotvecR(b1, b, labels=[r'3$\mathbf{b}$', r' $\mathbf{b}$'])

In [None]:
a2 = -0.5 * a  # scalar-vector multiplication
plotvecR(a, a2, labels=['$\mathbf{a}$', '$-0.5 \mathbf{a}$'])

* Multiplying by a positive scalar yields a vector that is in the **same direction** as the original vector.
* Multiplying by a negative scalar yields a vector that is in the **opposite direction** to the original vector.
* The length of the new vector is controlled by the scalar's magnitude (i.e., absolute value).

### Properties of Scalar Multiplication

Since scalar multiplication is component-wise multiplication by a scalar, it inherits the  properties below from normal multiplication of real scalars.

* *Commutative*: $\alpha \mathbf{x} = \mathbf{x} \alpha.$ It does not matter whether the multiplying scalar is on the right or left of the vector.
* *Associative*: If $\alpha$ and $\beta$ are scalars, then $(\alpha  \beta) \mathbf{x} = \alpha (\beta \mathbf{x})$. If multiplying by two scalars, we will get the same result if we do the scalar-scalar multiplication first or the scalar-vector multiplication first.
* *Distributive over scalar addition*: $(\alpha+\beta) \mathbf{x} = \alpha \mathbf{x} + \beta \mathbf{x}$ and $\mathbf{x} (\alpha+\beta)  = \mathbf{x}\alpha  + \mathbf{x} \beta $ 
* *Distributive over vector addition*: $\alpha ( \mathbf{x} +\mathbf{y}) = \alpha \mathbf{x} + \alpha \mathbf{y}$


## Vector Subtraction

We can combine vector addition and scaling to define vector subtraction.

We define $\mathbf{y} - \mathbf{x}$ as $\mathbf{y} + (-\mathbf{x})$, which yields

$$
\mathbf{y} - \mathbf{x} =   \left[ y_0 - x_0, ~~ y_1- x_1, ~~ \ldots,~~ y_{n-1} - x_{n-1}  \right]
$$

If we let $\mathbf{z} = \mathbf{y} - \mathbf{x}$, then we can also write $\mathbf{y} = \mathbf{x} + \mathbf{z}$, so $\mathbf{z}$ is the vector that needs to be added to $\mathbf{x}$ for the result to be $\mathbf{y}$.  

For example, the figure below shows the relation between $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{b-a}$:


![image.png](attachment:b2a2c849-fb46-45b4-a6f2-17d716e21fc6.png)
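
As a quick numerical check of this relation, here is a minimal sketch. The vectors `x` and `y` are illustrative values chosen for this example, not data from the lecture:

```python
import numpy as np

# Illustrative vectors (values chosen only for this sketch)
x = np.array([2, 3])
y = np.array([1, -2])

# Subtraction is component-wise: z_i = y_i - x_i
z = y - x

# Adding z back to x should recover y
print(np.array_equal(x + z, y))  # True
```

The same check works for vectors of any length, as long as `x` and `y` have the same number of components.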

## Component-wise Vector Multiplication: The Hadamard Product

**component-wise multiplication (vectors)**

**Hadamard product**

**Schur product**
>   Given $n$-vectors $\mathbf{x}$ and $\mathbf{y}$, the *Hadamard product* or *Schur product* is denoted $\mathbf{x} \odot \mathbf{y}$ and is the $n$-vector given by component-wise multiplication of $\mathbf{x}$ and $\mathbf{y}$,
>   \begin{equation*}
    \mathbf{x} \odot \mathbf{y} = \left[ x_0 y_0, ~~ x_1 y_1, ~~ \ldots,~~ x_{n-1} y_{n-1}  \right].
    \end{equation*}


The standard multiplication operator `*` performs component-wise multiplication:

In [None]:
g = np.array( [1, 2] )
h = np.array( [-3, 4] )
g * h  # Hadamard (component-wise) product

### Properties of Hadamard Product

Because the Hadamard product is just a collection of pairwise scalar multiplications across all the elements in two vectors, it takes on properties of scalar multiplication, such as being commutative and distributive across addition:
* *Commutative*: $\mathbf{a} \odot \mathbf{b} = \mathbf{b} \odot \mathbf{a}$
* *Associative with scalar multiplication*: $(\gamma \mathbf{a}) \odot \mathbf{b} = \gamma (\mathbf{a} \odot \mathbf{b} )$
* *Distributive across vector addition*: $(\mathbf{a} +\mathbf{b}) \odot \mathbf{c} = \mathbf{a} \odot \mathbf{c} +\mathbf{b} \odot \mathbf{c}$

## Vector-Vector Multiplication: Inner Product

The most common form of multiplication between vectors is called the *inner product* or *dot product*.

The input is two vectors of the same length, and the output is a scalar:


**dot product**

**inner product**
>   Given $n$-vectors $\mathbf{x}$ and $\mathbf{y}$, the *dot product* or *inner product* is denoted $\mathbf{x} \cdot \mathbf{y}$ or $\mathbf{x}^T \mathbf{y}$ and is the **scalar value** given by multiplying corresponding components and summing them up:
>   \begin{equation*}
    \mathbf{x} \cdot \mathbf{y} = \sum_{i=0}^{n-1} x_i y_i.
    \end{equation*}

Inner product is a concept that can be applied more broadly than to just vectors and can also be denoted using other notation, such as $\langle \mathbf{x}, \mathbf{y} \rangle$.
 

The dot product combines two of the operations we previously discussed: component-wise multiplication, followed by summing up the elements. The example below shows the computation of the dot product using these two operations:

In [None]:
g = np.array( [1, 2] )
h = np.array( [-3, 4] )

gh = g * h  # Hadamard product
np.sum(gh)  # sum the component-wise products

We can perform the dot product directly using Python's matrix multiplication operator, which uses the `@` (read "at") symbol. Here is the dot product computation using this operator:

In [None]:
g @ h  # inner product

Since component-wise multiplication is commutative, the dot product is also commutative:

In [None]:
g @ h, h @ g  # check commutative

### Properties of Dot Product

* *Commutative*: $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$
* *Associative with scalar multiplication*: $(\gamma \mathbf{a}) \cdot \mathbf{b} = \gamma (\mathbf{a} \cdot \mathbf{b} )$
* *Distributive across vector addition*: $(\mathbf{a} +\mathbf{b}) \cdot \mathbf{c} = \mathbf{a} \cdot \mathbf{c} +\mathbf{b} \cdot \mathbf{c}$
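
These three properties can be spot-checked numerically. The sketch below uses arbitrary example vectors and an arbitrary scalar `gamma` (all values are assumptions made for illustration); a numerical check on one example is not a proof, but it is a useful sanity test:

```python
import numpy as np

# Arbitrary example values (assumptions for this sketch)
a = np.array([2.0, 3.0])
b = np.array([1.0, -2.0])
c = np.array([-3.0, 4.0])
gamma = 0.5

# Commutative: a . b == b . a
print(np.isclose(a @ b, b @ a))  # True

# Associative with scalar multiplication: (gamma a) . b == gamma (a . b)
print(np.isclose((gamma * a) @ b, gamma * (a @ b)))  # True

# Distributive across vector addition: (a + b) . c == a . c + b . c
print(np.isclose((a + b) @ c, a @ c + b @ c))  # True
```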

**Dot product of a vector with itself: Sum of squares**

Recall that the Hadamard product of a vector $\mathbf{x}$ with itself is a vector of the squares of the elements in $\mathbf{x}$. Then the dot product of a vector with itself is the sum of the squares of the elements in the vector:


$$
\mathbf{x} \cdot \mathbf{x} = \sum_{i=0}^{n-1} x_{i}^{2}.
$$

Let's try this out using our example vector, $\mathbf{c}$:
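
A minimal sketch of this computation follows. The values assigned to `c` here are a stand-in chosen for illustration, since the example vector itself is defined elsewhere in the course materials:

```python
import numpy as np

# Stand-in values for the example vector c (an assumption for this sketch)
c = np.array([1, 2, 3])

# Hadamard product of c with itself gives the squares; summing gives the sum of squares
print(np.sum(c * c))  # 14

# The dot product of c with itself gives the same value directly
print(c @ c)  # 14
```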

Taking the inner product of a mathematical object with itself is common enough that mathematicians have introduced a special name and notation for it:

**norm squared (of a vector)**
>   For a mathematical object $\mathbf{x}$ with an inner product operator $\langle , \rangle$, the norm squared is denoted by $\| x \|^2$ and defined as
>   \begin{equation*}
    \| x \|^2 = \langle x, x \rangle .
    \end{equation*}

For vectors, the inner product operation is the dot product, and the norm squared of a vector $\mathbf{x}$ is $\|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x}$.



## Length or Magnitude of a Vector

Consider again the vector $\mathbf{a} = \left[2, 3 \right]^T$, shown below:

Then $\mathbf{a}$ is the hypotenuse of a right triangle with sides 2 and 3, as shown below:

![image.png](attachment:e153e955-563d-4242-99dc-07f5e37517f7.png)

Let $\ell_a$ denote the length of $\mathbf{a}$. By the Pythagorean theorem, 

$$
\ell_{a}^{2} = 2^2 + 3^2,
$$
or 

$$
\ell_{a} = \sqrt{2^2 + 3^2}.
$$

For any 2-vector $\mathbf{b} = \left[ b_0, b_1\right]$, the same mathematical approach will give the length $\ell_b$ as

$$
\ell_{b} = \sqrt{b_{0}^{2} + b_{1}^{2}}.
$$

The argument inside the square root is simply the norm-squared of $\mathbf{b}$, so we can write

$$
\ell_{b} = \sqrt{\|\mathbf{b} \|^2},
$$
which we can simplify to write

$$
\ell_{b} =\|\mathbf{b} \|.
$$

The length of the vector $\mathbf{b}$ is the norm of $\mathbf{b}$:

**norm**
>   For a mathematical object $\mathbf{x}$ with an inner product operator $\langle , \rangle$, the norm is denoted by $\| x \|$ and defined as
>   \begin{equation*}
    \left \Vert x \right \Vert = \sqrt{\langle x, x \rangle }.
    \end{equation*}

The norm of a vector $\mathbf{b}$ is its length, even if $\mathbf{b}$ has more than two dimensions. 

Let's start by computing the length of $\mathbf{a}$ by working with the individual elements of $\mathbf{a}:$

Now, let's use the dot product to find the norm-squared of $\mathbf{a}$ (the part inside the square root):
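
Both computations can be sketched as follows, using $\mathbf{a} = \left[2, 3 \right]^T$ from above. The variable names `length_from_elements` and `norm_sq` are chosen for this example only:

```python
import numpy as np

a = np.array([2, 3])

# Length from the individual elements (Pythagorean theorem)
length_from_elements = np.sqrt(a[0]**2 + a[1]**2)

# Norm squared from the dot product of a with itself
norm_sq = a @ a
print(norm_sq)  # 13

# Taking the square root of the norm squared gives the same length
print(np.isclose(np.sqrt(norm_sq), length_from_elements))  # True
```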

Finding the norm of a vector is a relatively common operation, so NumPy has a norm operator in the `np.linalg` module:

In [None]:
import numpy.linalg as la

la.norm(a)  # norm (length) of a

Recall our examples of scaling $\mathbf{a}$ by multiplying it by a constant.

Let $\mathbf{w} = \gamma \mathbf{a}$, where $\gamma$ is some constant: 

In [None]:
a1 = 0.5 * a
plotvec(a, a1, labels=['$\mathbf{a}$', '$0.5 \mathbf{a}$'])

For an arbitrary vector $\mathbf{a}$, we can calculate the length of $\gamma \mathbf{a}$ as 

\begin{align*}
\| \gamma \mathbf{a} \| &= \sqrt{ \gamma \mathbf{a} \cdot \gamma \mathbf{a} } \\
&= \sqrt{ \gamma^2 \mathbf{a} \cdot \mathbf{a} } \\
&= \lvert \gamma \rvert \sqrt{  \mathbf{a} \cdot \mathbf{a} }\\
&= \lvert \gamma \rvert \|\mathbf{a} \| .
\end{align*}

For our example, the length of $0.5 \mathbf{a}$ is $0.5 \|\mathbf{a}\|$. Let's check:

In [None]:
a1 = 0.5 * a

la.norm(a1), la.norm(a)

## Distance Between Vectors

We define the distance between two $n$-vectors as follows:

**distance (vectors)**
>  The *distance* between two $n$-vectors $\mathbf{a}$ and $\mathbf{b}$ is the norm of the difference between the vectors,  
> \begin{align*}
 d(\mathbf{a}, \mathbf{b}) = \| \mathbf{a} - \mathbf{b} \| = \|\mathbf{b} - \mathbf{a} \|.
 \end{align*}

In [None]:
g = np.array( [1, 2] )
h = np.array( [-3, 4] )

# compute d(g,h) as the norm of the difference
la.norm(g - h)

# Vector Statistics

Consider again the Covid data set from Chapter 3:

In [None]:
df = pd.read_csv( 'https://www.fdsp.net/data/covid-merged.csv' )

df.head()

We will set the 'state' column to be the index and add columns for the cases per 1000 residents and GDP per 1000 residents:

In [None]:
df.set_index('state', inplace=True)
df["gdp_norm"] = df["gdp"] / (df["population"] / 1000)
df["cases_norm"] = df["cases"] / (df["population"] / 1000)

df.head()

We can consider each column of this dataframe to be a vector of data.

In fact, it is easy to convert any column to a vector using the dataframe's `to_numpy()` method:

In [None]:
#DEMO
cases = df["cases"].to_numpy()
print(cases)
len(cases)

This offers us flexibility in working with data because it makes it easy to work with all of the tools that NumPy offers.

**WARNING!**

`cases` may be a *view* into the dataframe's data rather than a copy, in which case changes to `cases` affect the original dataframe. If you need a separate copy, pass the keyword argument `copy=True`.
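
Here is a minimal sketch of the `copy=True` option. It uses a small stand-in dataframe named `demo` with made-up values (both are assumptions for illustration, not the Covid data):

```python
import pandas as pd

# Small stand-in dataframe (hypothetical values, not the Covid data)
demo = pd.DataFrame({"cases": [10, 20, 30]})

# copy=True guarantees the returned array does not share memory with the dataframe
cases_copy = demo["cases"].to_numpy(copy=True)

# Modify only the copy
cases_copy[0] = -1

# The dataframe is unchanged
print(demo["cases"].iloc[0])  # 10
```

Working on an explicit copy is the safer choice whenever the array will be modified in place.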

#   Matrices

Recall that a matrix is a two-dimensional table of numbers.


In [None]:
import numpy as np

# Creating a sample matrix
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print("Matrix A:")
print(A)


In [None]:

# 1. Basic Indexing (Accessing a Single Element)
print("\nAccessing element at row 0, column 1 (0-based index):", A[0, 1])  # Output: 2




In [None]:
# 2. Slicing (Extracting Submatrices)
print("\nExtracting first two rows:")
print(A[:2, :])  # Rows 0 and 1, all columns


In [None]:

print("\nExtracting second column:")
print(A[:, 1])  # All rows, column 1

In [None]:

# 3. Advanced Indexing (Using Lists/Arrays)
row_indices = [0, 2]  # Selecting rows 0 and 2
col_indices = [1, 2]  # Selecting columns 1 and 2
print("\nSelecting elements at (0,1) and (2,2):")
print(A[row_indices, col_indices])  # Output: [2 9]


In [None]:

# 4. Boolean Indexing (Conditional Selection)
print("\nElements greater than 4:")
print(A[A > 4])  # Output: [5 6 7 8 9]
 

In [None]:
covid_array = df.to_numpy()

This array has a lot of rows, so let's print the first five. We can do this using indexing. By using the index range `:5`, we will get the first five rows:

Compare the values in `covid_array` with the values in the dataframe.
* Each column in `df` has been converted into a column of the NumPy array `covid_array`.
* All of the variables have been converted to floating-point values because a NumPy array can only have one data type, and the percent urban data requires a floating-point representation.

Each data feature (i.e., number of cases, population, GDP, percent urban) occupies one of the columns of the NumPy array.

The entries in this matrix can be indexed by row and then column.
* For instance, since row 4 corresponds to California and column 1 corresponds to population, we can retrieve the population of California as follows: 

In [None]:
covid_array[4,1]

Note that some of the libraries that we will use expect each data feature to be in a different row, while the Pandas dataframe `to_numpy()` method puts each data feature into a different column.

We can *transpose* the matrix to interchange the rows and columns:


**transpose**
>   Interchange the rows and columns in a matrix. For a matrix $\mathbf{M}$, the transpose is denoted by $\mathbf{M}^{T}$ and satisfies  
>   \begin{align*}
 \mathbf{M}^T [i,j] = \mathbf{M}[j,i] ~~~ \forall i,j.
 \end{align*}

We can get the transpose of a NumPy matrix by appending `.T`:

In [None]:

A=   np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(A)

In [None]:
# transpose of A
A.T

In [None]:
covid_array.T

## Measuring Dependence through Moments: Covariances and Correlations

To measure dependence between two features, we generalize the concept of variance.

Let $X$ and $Y$ be random variables. Then:
\begin{align*}
\operatorname{Var}[X] &= E \left[ \left(X - E[X]\right)^2 \right], \mbox{ and} \\
\operatorname{Var}[Y] &= E \left[ \left(Y - E[Y]\right)^2 \right]. \\
\end{align*}

We create a new *joint moment* called *covariance* that combines these two:

**covariance (random variables)**
>   For random variables $X$ and $Y$, the *covariance* is the joint moment given by
>   \begin{equation*}
    \operatorname{Cov}(X, Y) = E \left[ \left( X - E[X]\right) \left( Y- E[Y]\right) \right].
    \end{equation*}

If $X$ and $Y$ are independent random variables, then $\operatorname{Cov}(X, Y) =0$.
* However, **the converse is not true**:  $\operatorname{Cov}(X, Y) =0$ does not mean that $X$ and $Y$ are independent.

* Roughly speaking, a positive covariance indicates that the values of $X-E[X]$ and $Y-E[Y]$ obtained in a single experiment "tend" to have the same sign.

The data below are drawn from a Normal distribution with positive covariance:

![image.png](attachment:e9d9fbce-c122-4642-bc3f-7b58242ca0f3.png)

* Similarly, a negative covariance indicates that the values of $X-E[X]$ and $Y-E[Y]$ obtained in a single experiment "tend" to have the *opposite* sign. The data below are drawn from a Normal distribution with negative covariance:

![image.png](attachment:12d4de68-b1f2-4145-a99c-4606c542ac99.png)

Computing covariance for random variables requires understanding *joint probability distributions*,
which are outside the scope of this book.

We will compute the covariance for vectors of data.

If $\mathbf{x}$ and $\mathbf{y}$ are equal-length samples from some random variables $X$ and $Y$, then the unbiased (sample) covariance is:


**covariance (data vectors)**
>   For $n$-vectors  $\mathbf{x}$ and $\mathbf{y}$, the unbiased sample *covariance* is  given by
>   \begin{equation*}
    \operatorname{Cov}( \mathbf{x}, \mathbf{y}) = \frac{1}{n-1} \sum_{i=0}^{n-1} \left(x_i - \overline{x}\right) \left(y_i - \overline{y}\right) .
    \end{equation*}

**Note:** The covariance of a feature with itself is the variance of that feature. 

The covariance can be calculated efficiently using the dot product:

\begin{equation*} \operatorname{Cov}( \mathbf{x}, \mathbf{y}) = \frac{1}{n-1}  \left(\mathbf{x} - \overline{\mathbf{x}}\right)  \cdot \left(\mathbf{y}- \overline{\mathbf{y}}\right) . 
\end{equation*}
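
The dot-product form can be checked against NumPy's built-in function. This is a minimal sketch with made-up data vectors `x` and `y` (assumptions for illustration):

```python
import numpy as np

# Made-up data vectors (assumptions for this sketch)
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 8.0])
n = len(x)

# Unbiased sample covariance via the dot product of the mean-centered vectors
cov_dot = (x - x.mean()) @ (y - y.mean()) / (n - 1)

# np.cov returns a 2x2 matrix; the off-diagonal entry is Cov(x, y)
cov_np = np.cov(x, y)[0, 1]
print(np.isclose(cov_dot, cov_np))  # True

# The covariance of a vector with itself is its (unbiased) sample variance
var_dot = (x - x.mean()) @ (x - x.mean()) / (n - 1)
print(np.isclose(var_dot, np.var(x, ddof=1)))  # True
```

Note that `np.var()` uses the biased estimator by default, so `ddof=1` is needed to match the $1/(n-1)$ normalization used here.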



Pandas dataframes have a `cov()` method that returns all the pairwise covariances:

In [None]:
df.cov()

This is called a **Covariance Matrix**. It is a table of the variances and covariances of the data in the following form 
\begin{align}
\mathbf{K_X} &= 
\begin{bmatrix}
\operatorname{Cov}(\mathbf{X}_1, \mathbf{X}_1) & \operatorname{Cov}(\mathbf{X}_1, \mathbf{X}_2)  \\
\operatorname{Cov}(\mathbf{X}_2, \mathbf{X}_1) & \operatorname{Cov}(\mathbf{X}_2, \mathbf{X}_2)  \\
\end{bmatrix} \\
&\\
&=
\begin{bmatrix}
\operatorname{Var}(\mathbf{X}_1) & \operatorname{Cov}(\mathbf{X}_1, \mathbf{X}_2)  \\
\operatorname{Cov}(\mathbf{X}_1, \mathbf{X}_2) & \operatorname{Var}(\mathbf{X}_2)  \\
\end{bmatrix} 
\end{align}
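
As a small sketch of this structure, the block below builds a $2 \times 2$ covariance matrix with `np.cov()` for two features. The data values in `x1` and `x2` are made up for illustration, and the features are stacked as rows because that is the layout `np.cov()` expects by default:

```python
import numpy as np

# Two made-up features (assumptions for this sketch)
x1 = np.array([1.0, 2.0, 4.0, 7.0])
x2 = np.array([2.0, 1.0, 5.0, 8.0])

# Stack the features as rows and compute the covariance matrix
K = np.cov(np.vstack([x1, x2]))
print(K.shape)  # (2, 2)

# The matrix is symmetric because Cov(x1, x2) = Cov(x2, x1)
print(np.allclose(K, K.T))  # True

# The diagonal holds the unbiased sample variances of the features
print(np.allclose(np.diag(K), [np.var(x1, ddof=1), np.var(x2, ddof=1)]))  # True
```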

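NumPy arrays do not have a covariance **method**. Instead, NumPy provides the function `np.cov()`, which by default expects each data feature to be in a row of the array, so an array with features in columns, such as `covid_array`, must be transposed before it is passed in.

`np.cov()` can also calculate the covariance for two separate vectors: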


In [None]:
avec = covid_array[:,3] 
bvec = covid_array[:,4] 


np.cov(avec, bvec)

A problem with covariances is that they are hard to interpret because they can take on very large or very small values, depending on the variances of the features.

To get around this, we often use a normalized version of the covariance called the *correlation coefficient*.

As with covariance, we start by defining it in terms of random variables:

**correlation coefficient (random variables)**
>   For random variables $X$ and $Y$, the *correlation coefficient* is given by
>   \begin{equation*}
    \rho = \frac{ \operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y},
    \end{equation*}
>   where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.

Then $|\rho| \le 1$. Correlation coefficients with magnitudes closer to 1  generally indicate greater dependence among the variables. 

The correlation coefficient for data vectors is usually denoted by $r$ or $R$ and is given by:

**correlation coefficient (data vectors)**
>   For $n$-vectors  $\mathbf{x}$ and $\mathbf{y}$, the *correlation coefficient* or *Pearson's correlation coefficient* is  given by
>   \begin{equation*}
    r = \frac{ \operatorname{Cov}(\mathbf{x}, \mathbf{y})}{\sigma_x \sigma_y},
    \end{equation*}
>   where $\sigma_x$ and $\sigma_y$ are the standard deviations of $\mathbf{x}$ and $\mathbf{y}$, respectively.

Pandas dataframes have a `corr()` method for computing the pairwise correlation coefficients:

## Cauchy-Schwarz Inequality

<div class="alert alert-info">
  <strong>Cauchy-Schwarz Inequality</strong>

The **Cauchy-Schwarz Inequality** provides a bound on the maximum absolute value of an inner product in terms of the norms of the vectors:
    
$$\left| \langle \mathbf{a}, \mathbf{b} \rangle \right| \le \left\|\mathbf{a} \right\| \left\| \mathbf{b}\right\|$$
    
with equality if and only if one of the vectors is a scalar multiple of the other (for nonzero $\mathbf{b}$, $\mathbf{a}= c\mathbf{b}$ for some constant $c$). 
    
Note that $ \langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}^T\mathbf{b}$ is the inner product of $\mathbf{a}$ with $\mathbf{b}$.

*(See Boyd book, section 3.4, for proof)*
</div>
    
Here, I purposefully used the general inner product notation $\langle \rangle$ because the Cauchy-Schwarz Inequality applies to all inner products, not just those involving vectors (e.g. inner product of matrices).
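
The inequality can be spot-checked numerically for the dot product. The sketch below uses randomly generated vectors with a fixed seed (the seed, vector length, and scale factor are arbitrary assumptions); it illustrates the bound and the equality case but is not a proof:

```python
import numpy as np

# Randomly generated example vectors (seed and length are arbitrary)
rng = np.random.default_rng(0)
a = rng.normal(size=5)
b = rng.normal(size=5)

# Left and right sides of the Cauchy-Schwarz Inequality
lhs = abs(a @ b)
rhs = np.linalg.norm(a) * np.linalg.norm(b)
print(lhs <= rhs)  # True

# Equality holds when one vector is a scalar multiple of the other
c = -2.0 * b
print(np.isclose(abs(c @ b), np.linalg.norm(c) * np.linalg.norm(b)))  # True
```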

Noting our computation of covariance using inner product above, we can get

\begin{align*}
\left|\operatorname{cov}(\mathbf{x}, \mathbf{y})  \right|
&= \frac{1}{n-1} \left| \big\langle \left(\mathbf{x} - \boldsymbol \mu_x \right), 
    \left(\mathbf{y} - \boldsymbol \mu_y \right) \big\rangle \right| \\
&\le \frac{1}{n-1} \left\| \mathbf{x} - \boldsymbol \mu_x  \right\|
    \left\| \mathbf{y} - \boldsymbol \mu_y  \right\| \\
&= \sqrt{\frac{1}{n-1} \big\langle \left(\mathbf{x} - \boldsymbol \mu_x \right),
    \left(\mathbf{x} - \boldsymbol \mu_x \right) \big\rangle}
    \sqrt{\frac{1}{n-1} \big\langle \left(\mathbf{y} - \boldsymbol \mu_y \right),
    \left(\mathbf{y} - \boldsymbol \mu_y \right) \big\rangle} \\
&= \sqrt{\operatorname{cov}(\mathbf{x}, \mathbf{x}) } \sqrt{\operatorname{cov}(\mathbf{y}, \mathbf{y}) }\\
&= \sigma_x \sigma_y
\end{align*}

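So 
$$\left|\operatorname{cov}(\mathbf{x}, \mathbf{y})  \right| \leq \sigma_x \sigma_y$$

Dividing both sides by $\sigma_x \sigma_y$ (when both are nonzero) shows that the correlation coefficient satisfies $|r| \le 1$.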

In [None]:
# let's compute the correlation coefficient matrix
df.corr()

The correlation coefficient is much easier to interpret than the covariance:
* If we look at the normalized cases, we can see that they are most correlated with the non-normalized number of cases, followed by the normalized GDP.
* The correlations between COVID rates and either urban index or population are lower.
* The correlation coefficients give us an easy way to look for dependence during exploratory data analysis.

The equivalent function in NumPy is `np.corrcoef()`.
* The data features are expected to be in the rows of the array, so we have to transpose the array before passing it to `np.corrcoef()`.

In [None]:
np.corrcoef(covid_array.T)

As with `np.cov()`, we can use `np.corrcoef()` to calculate the correlation coefficient between two vectors:

In [None]:
np.corrcoef(avec, bvec)
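
The value returned by `np.corrcoef()` can be cross-checked against the definition $r = \operatorname{Cov}(\mathbf{x}, \mathbf{y}) / (\sigma_x \sigma_y)$. This is a minimal sketch with made-up data vectors `x` and `y` (assumptions for illustration, not the Covid data):

```python
import numpy as np

# Made-up data vectors (assumptions for this sketch)
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 8.0])

# Correlation coefficient from the definition, using unbiased estimates throughout
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True

# The magnitude never exceeds 1
print(abs(r) <= 1)  # True
```

The normalization cancels in the ratio, so using `ddof=0` consistently for both the covariance and the standard deviations would give the same value of `r`.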

#### Example of Independent Data


Finally, let's look at what happens for some independent data:

Below I generate completely separate samples of Normal random variables with different variances and show a scatter plot of the data:

In [None]:
Y=stats.norm()
Z=stats.norm(scale=3)
y = Y.rvs(size=1000)  # 1000 samples; the sample size is an arbitrary choice
z = Z.rvs(size=1000)
plt.scatter(y,z, 1);
#plt.gca().set_aspect('equal')

Since the shape of the scatter plot is aligned with the $x$- and $y$-axes, there is no clear direction of dependence.

Let's check the numerical value of the covariance and correlation coefficient:

In [None]:
np.cov(y,z), np.corrcoef(y,z)

Note the small sample covariance and very small correlation coefficient. When random variables are independent, their covariance is zero; however, the sample covariance will generally not be exactly zero.

More examples of datasets with different correlations are shown in this image from the Wikipedia page for correlation (https://en.wikipedia.org/wiki/Correlation):

![image.png](attachment:d2d90fac-5702-437f-ab95-85e88588429e.png)


Note that data can be uncorrelated and have a distribution that looks nothing like the circular distribution of data from independent Normal random variables.

In fact, data can be uncorrelated and still be highly dependent.
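
A minimal sketch of such a case follows. The choice of a symmetric grid for `x` and the relation `y = x**2` are assumptions made for illustration: `y` is completely determined by `x`, yet the sample correlation coefficient is essentially zero because positive and negative values of `x` cancel:

```python
import numpy as np

# x is symmetric about zero; y is a deterministic function of x
x = np.linspace(-1, 1, 201)
y = x**2

# Sample correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

# r is zero up to floating-point rounding, even though y depends entirely on x
print(abs(r) < 1e-10)  # True
```

The correlation coefficient only measures *linear* dependence, which is why it misses the quadratic relationship here.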