# Linear Algebra

Now that you can store and manipulate data, let's briefly review the subset of basic linear algebra that you'll need to understand most of the models. We'll introduce all the basic concepts, the corresponding mathematical notation, and their realization in code all in one place. If you're already confident in your basic linear algebra, feel free to skim or skip this chapter.

In [35]:
import mxnet as mx
import mxnet.ndarray as nd

## Scalars

If you never studied linear algebra or machine learning, you're probably used to working with single numbers, like $42.0$, and know how to do basic things like adding and multiplying them. In mathematical notation, we'll represent scalars with ordinary lowercase letters ($x$, $y$, $z$). In MXNet, we can work with scalars by creating NDArrays with just one element.

In [36]:
x = nd.array([3.0]) 
y = nd.array([2.0])
print(x + y)
print(x * y)
print(x / y)
print(nd.power(x,y))


[ 5.]
<NDArray 1 @cpu(0)>

[ 6.]
<NDArray 1 @cpu(0)>

[ 1.5]
<NDArray 1 @cpu(0)>

[ 9.]
<NDArray 1 @cpu(0)>


We can convert a single-element NDArray to a plain scalar value by calling its ``asscalar()`` method.

In [37]:
x.asscalar()

3.0

## Vectors 
You can think of a vector as simply a list of numbers, such as $[1.0, 3.0, 4.0, 2.0]$. A vector could represent numerical features of some real-world person or object, like the most recently recorded measurements across various vital signs for a patient in the hospital. In math notation, we'll always denote vectors as bold-faced lowercase letters ($\boldsymbol{u}$, $\boldsymbol{v}$, $\boldsymbol{w}$). In MXNet, we work with vectors via 1D NDArrays with an arbitrary number of components.

In [38]:
u = nd.zeros(shape=10)
v = nd.ones(shape=10)
print(u)
print(v)


[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
<NDArray 10 @cpu(0)>

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
<NDArray 10 @cpu(0)>


We can refer to any element of a vector by using a subscript. For example, we can refer to the $4$th element of $\boldsymbol{u}$ by $u_4$. Note that the element $u_4$ is a scalar, so we don't bold-face the font when referring to it.
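
One caveat when moving from notation to code: the subscript above counts from 1, while Python indexing counts from 0. The following is a minimal sketch of that correspondence, using a plain Python list as a stand-in for a 1D NDArray; the values are simply the example vector from the start of this section.

```python
# A plain Python list standing in for a vector (illustrative values).
u = [1.0, 3.0, 4.0, 2.0]

# In 1-based math notation, u_4 is the fourth element.
# With Python's 0-based indexing, that element lives at position 3.
u_4 = u[3]

print(u_4)     # 2.0
print(len(u))  # 4 components in total
```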

## Matrices

Just as vectors generalize scalars from zero dimensions to one, matrices generalize vectors to two dimensions. Matrices, which we'll denote with capital letters ($A$, $B$, $C$), are represented in code as 2D arrays.

In [39]:
A = nd.random_normal(shape=(5,4))
B = nd.random_normal(shape=(5,4))
print(A)
print(B)


[[ 0.96975976  1.81402123 -0.52853745 -1.52274311]
 [-1.88908994 -2.51524496  0.65479124 -1.35493255]
 [-0.45481315 -0.95748407  0.32510808 -0.72485566]
 [-1.30023408  1.11196363  0.3679345  -0.47827247]
 [ 1.45342624 -1.17394924  0.24154152 -0.79218465]]
<NDArray 5x4 @cpu(0)>

[[ 0.47898006  0.93210429  0.96885103 -3.15577412]
 [-1.02182448  2.19352984 -0.06812762 -0.5385921 ]
 [-0.31868345 -0.8611334  -0.17634277 -1.8815192 ]
 [ 0.35655284 -0.72057074  0.74419165 -0.35601574]
 [ 0.77874237 -0.15963985  0.60878229  1.79744768]]
<NDArray 5x4 @cpu(0)>


Matrices are useful data structures: they allow us to organize data that has different modalities of variation. For example, returning to the example of medical data, rows in our matrix might correspond to different patients, while columns might correspond to different attributes.

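We can access the scalar elements $a_{ij}$ of a matrix $A$ by specifying the indices for the row ($i$) and column ($j$) respectively. Let's grab the element at indices $(2, 3)$ from the random matrix we initialized above. Note that indices in code start at zero, so ``A[2,3]`` is the entry in the third row and fourth column.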

In [42]:
A[2,3]


[-0.72485566]
<NDArray 1 @cpu(0)>

We can also grab the vectors corresponding to entire rows $\boldsymbol{a}_{i,:}$ or columns $\boldsymbol{a}_{:,j}$.

In [45]:
print(A[2,:])
print(A[:,3])


[-0.45481315 -0.95748407  0.32510808 -0.72485566]
<NDArray 4 @cpu(0)>

[-1.52274311 -1.35493255 -0.72485566 -0.47827247 -0.79218465]
<NDArray 5 @cpu(0)>
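
To make the row and column slicing concrete without depending on random values, here is a small sketch that uses a list of rows as a stand-in for a 2D NDArray. The matrix entries are made up for illustration, and the list comprehension plays the role of the ``A[:,3]`` slice.

```python
# A small 3x4 matrix stored as a list of rows (illustrative values).
A = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

row_2 = A[2]                   # the entire row with index 2
col_3 = [row[3] for row in A]  # the entry with index 3 from every row
element = A[2][3]              # the single scalar where they intersect

print(row_2)    # [9.0, 10.0, 11.0, 12.0]
print(col_3)    # [4.0, 8.0, 12.0]
print(element)  # 12.0
```

A row has as many entries as the matrix has columns, and a column has as many entries as the matrix has rows, which matches the lengths 4 and 5 in the NDArray output above.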


## Tensors 

Just as vectors generalize scalars, and matrices generalize vectors, we can build data structures with even more axes. Tensors give us a generic way of discussing arrays with an arbitrary number of axes. Vectors, for example, are first-order tensors, and matrices are second-order tensors.

Tensors will become more important when we start working with images, which arrive as 3D data structures, with axes corresponding to the height, width, and the three (RGB) color channels. But in this chapter, we're going to skip past them and make sure you know the basics.
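
As a rough illustration of what "number of axes" means, the sketch below builds nested Python lists of increasing depth and reports their shapes. The helper ``shape`` and the tiny image dimensions are hypothetical, chosen only for this example; it assumes the nested lists are regular and non-empty.

```python
def shape(t):
    """Return the length of each axis of a regular, non-empty nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]  # descend one axis deeper
    return tuple(dims)

vector = [1.0, 2.0, 3.0]            # first-order tensor: one axis
matrix = [[1.0, 2.0], [3.0, 4.0]]   # second-order tensor: two axes

# A tiny "image" with axes (height, width, color channels).
height, width, channels = 2, 3, 3
image = [[[0.0 for _ in range(channels)]
          for _ in range(width)]
         for _ in range(height)]

print(shape(vector))  # (3,)
print(shape(matrix))  # (2, 2)
print(shape(image))   # (2, 3, 3)
```

The order of each tensor is just the number of entries in its shape, so ``len(shape(image))`` is 3.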

## Element-wise operations

Oftentimes, we want to perform element-wise operations. This means that we perform a scalar operation on the corresponding elements of two vectors. So given any two vectors $\boldsymbol{u}$ and $\boldsymbol{v}$ *of the same shape*, and a scalar function $f$, we produce the vector $\boldsymbol{c} = f(\boldsymbol{u},\boldsymbol{v})$ by setting $c_i \gets f(u_i, v_i)$ for every index $i$.

In [33]:
print(u)
print(v) 
print(u + v)
print(u - v)


[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
<NDArray 10 @cpu(0)>

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
<NDArray 10 @cpu(0)>

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
<NDArray 10 @cpu(0)>

[-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
<NDArray 10 @cpu(0)>


We can call element-wise operations on any two tensors of the same shape, including matrices.

In [47]:
print(A + B)
print(A[0,0] + B[0,0])


[[ 1.44873977  2.74612546  0.44031358 -4.67851734]
 [-2.91091442 -0.32171512  0.5866636  -1.89352465]
 [-0.77349663 -1.81861746  0.14876531 -2.60637474]
 [-0.94368124  0.39139289  1.11212611 -0.83428824]
 [ 2.23216867 -1.33358908  0.8503238   1.00526309]]
<NDArray 5x4 @cpu(0)>

[ 1.44873977]
<NDArray 1 @cpu(0)>
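
The rule $c_i \gets f(u_i, v_i)$ can be spelled out directly. The sketch below is not how NDArray implements it; ``elementwise`` is a hypothetical helper written for this example, operating on plain Python lists with made-up values.

```python
import operator

def elementwise(f, u, v):
    """Apply the scalar function f to corresponding elements of u and v."""
    if len(u) != len(v):
        raise ValueError("element-wise operations need vectors of the same shape")
    return [f(u_i, v_i) for u_i, v_i in zip(u, v)]

u = [0.0, 1.0, 2.0]
v = [1.0, 1.0, 1.0]

added = elementwise(operator.add, u, v)
subtracted = elementwise(operator.sub, u, v)
larger = elementwise(max, u, v)  # any two-argument scalar function works

print(added)       # [1.0, 2.0, 3.0]
print(subtracted)  # [-1.0, 0.0, 1.0]
print(larger)      # [1.0, 1.0, 2.0]
```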


## Sums and means 

A slightly more sophisticated thing we can do with arbitrary tensors is to calculate the sum of their elements. In mathematical notation, we express sums using the $\sum$ symbol. To express the sum of the elements in a vector $\boldsymbol{u}$ of length $d$, we can write $\sum_{i=1}^d u_i$. In code, we can just call ``nd.sum()``.

In [58]:
u = nd.arange(0,5,1.)  # redefine u as [0, 1, 2, 3, 4] so the outputs below are reproducible
print(nd.sum(u))


[ 10.]
<NDArray 1 @cpu(0)>


We can similarly express sums over the elements of tensors of arbitrary shape. For example, the sum of the elements of an $m \times n$ matrix $A$ could be written $\sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j}$.

In [59]:
print(nd.sum(A))


[-6.75379515]
<NDArray 1 @cpu(0)>


A related quantity to the sum is the *mean*, also commonly called the *average*. We calculate the mean by dividing the sum by the total number of elements. With mathematical notation, we could write the average over a vector $\boldsymbol{u}$ as $\frac{1}{d} \sum_{i=1}^{d} u_i$ and the average over a matrix $A$ as $\frac{1}{n \cdot m} \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j}$. In code, we could just call ``nd.mean()`` on tensors of arbitrary shape:

In [62]:
print(nd.mean(u))
print(nd.mean(A))


[ 2.]
<NDArray 1 @cpu(0)>

[-0.33768976]
<NDArray 1 @cpu(0)>
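
To connect the double-sum notation to code, here is a sketch that computes the sum and mean of a small matrix with explicit loops over $i$ and $j$. It uses a list of rows with made-up values rather than an NDArray, and in practice one would call ``nd.sum()`` and ``nd.mean()`` instead.

```python
# A 2x3 matrix stored as a list of rows (illustrative values).
A = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
m = len(A)     # number of rows
n = len(A[0])  # number of columns

# Double sum over rows i and columns j.
total = 0.0
for i in range(m):
    for j in range(n):
        total += A[i][j]

# The mean divides the sum by the total number of elements, n * m.
mean = total / (n * m)

print(total)  # 21.0
print(mean)   # 3.5
```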


## Dot products

<!-- So far, we've only performed element-wise operations, sums and averages. And if this was we could do, linear algebra probably wouldn't deserve it's own chapter. However, -->

One of the most fundamental operations is the dot product. Given two vectors $\boldsymbol{u}$ and $\boldsymbol{v}$ of the same length $d$, the dot product $\boldsymbol{u}^T \cdot \boldsymbol{v}$ is a sum over the products of the corresponding elements: $\boldsymbol{u}^T \cdot \boldsymbol{v} = \sum_{i=1}^{d} u_i \cdot v_i$.

In [55]:
u = nd.arange(0,5,1.)
v = nd.flip(nd.arange(0,5,1.), 0)
print(u)
print(v)
print(nd.dot(u,v))


[ 0.  1.  2.  3.  4.]
<NDArray 5 @cpu(0)>

[ 4.  3.  2.  1.  0.]
<NDArray 5 @cpu(0)>

[ 10.]
<NDArray 1 @cpu(0)>


Note that the dot product of two vectors, ``nd.dot(u, v)``, can be computed equivalently by performing an element-wise multiplication and then a sum:

In [63]:
nd.sum(u * v)


[ 10.]
<NDArray 1 @cpu(0)>

Dot products are useful in a wide range of contexts. For example, given a set of weights $\boldsymbol{w}$, the weighted sum of some values $\boldsymbol{u}$ could be expressed as the dot product $\boldsymbol{u}^T \boldsymbol{w}$. When the weights are non-negative and sum to one ($\sum_{i=1}^{d} w_i = 1$), the dot product expresses a *weighted average*. When two vectors each have length one (we'll discuss what *length* means below in the section on norms), their dot product equals the cosine of the angle between them.
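
Both uses can be checked on small numbers. The sketch below uses plain Python lists and a hypothetical ``dot`` helper written for this example; the values, weights, and the two unit vectors are made up for illustration.

```python
import math

def dot(u, v):
    """Sum of the products of corresponding elements."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

# Weighted average: the weights are non-negative and sum to one.
values = [1.0, 2.0, 4.0]
weights = [0.5, 0.25, 0.25]
weighted_avg = dot(values, weights)
print(weighted_avg)  # 2.0

# Cosine of the angle: both vectors are chosen to have length one.
a = [1.0, 0.0]
b = [math.sqrt(0.5), math.sqrt(0.5)]  # points 45 degrees away from a
cos_angle = dot(a, b)
angle_deg = math.degrees(math.acos(cos_angle))
print(round(angle_deg, 6))  # 45.0
```

Because the first value carries half of the total weight, the weighted average (2.0) sits closer to it than the plain average of the three values would.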

## Matrix-vector multiplication

## Matrix-matrix multiplication

## Norms