- Linear algebra: the branch of mathematics that deals with vector spaces.

# 1. Vector
- How are vectors useful for data science?
  1. For us, vectors are points in some finite-dimensional space.
  2. They can represent any numeric data as a list of numbers.
     
- In Python, lists are often used to represent vectors for the sake of simplicity and ease of understanding.
- Using lists as vectors is great for learning but performs poorly on large data.
- In production code, we use the NumPy library, which includes a high-performance array class with built-in arithmetic operations.

In [1]:
# List as vector
from typing import List  # Type annotation for list

Vector = List[float]  # Vector defined as a list of float values


# Examples

height_weight_age = [160, # Vector containing height, weight, age
                     65,
                     40]

grades = [95, # Vector with grades in four tests
          78,
          92,
          65]

## 1. add(a: Vector, b: Vector) -> Vector

- Adding two lists with `+` concatenates them into a new list instead of adding element by element.
- Because lists are not vectors, we implement vector operations ourselves by pairing up corresponding elements with `zip()`.
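
A quick check of the point above. The variable names are only illustrative:

```python
a = [1, 2]
b = [5, 2]

# The + operator on lists concatenates; it does not add element by element.
concatenated = a + b

# zip() pairs up corresponding elements, which is what vector addition needs.
elementwise = [a_i + b_i for a_i, b_i in zip(a, b)]

print(concatenated)  # [1, 2, 5, 2]
print(elementwise)   # [6, 4]
```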

In [45]:
# Adding two vectors a and b

from typing import List 

Vector= List[float]     # Define type annotation for vector

def add_longcode(a: Vector, b: Vector) -> Vector: 
    """
    Adds corresponding elements
    """
    assert len(a)==len(b) # Input vectors must be of same length
    
    sum_vector = [] 
    zip_vector = zip(a, b)  # Iterator of (a_i, b_i) tuples
    
    for a_i, b_i in zip_vector:
        sum_i = a_i + b_i
        sum_vector.append(sum_i)
    
    return sum_vector

assert add_longcode([1,2], [5,2]) == [6,4]

- The function above can be shortened with a list comprehension:

In [46]:
# See how to make our code compact

from typing import List  # For lists' type annotation

Vector = List[float]  

def add(a: Vector, b: Vector) -> Vector:
    """Adds corresponding elements"""
    
    assert len(a)==len(b), "Input vectors must be of same length"
    
    return [(a_i + b_i) for a_i, b_i in zip(a, b)]

assert add([3,4],[2,1]) == [5,5]

# assert add([2,3],[3,5,3]) == [5,8]   # Raises AssertionError: the vectors have different lengths

## 2. subtract(a: Vector, b: Vector) -> Vector

In [47]:
# Subtracting two vectors a and b

from typing import List 

Vector = List[float]

def subtract(a: Vector, b: Vector) -> Vector:
    """Subtracts corresponding elements"""

    assert len(a) == len(b)

    return [(a_i - b_i) for a_i, b_i in zip(a,b)]

assert subtract([1,2],[3,4]) == [-2,-2]

## 3. vector_sum(list_of_vectors: List[Vector]) -> Vector

**Component-wise operation**
- Now we add more than two vectors, given as a list of lists (i.e. a list of vectors).

In [48]:
from typing import List 
Vector = List[float]

def vector_sum_longcode(list_of_vectors: List[Vector]) -> Vector:
   
    # Check if list_of_vectors is empty
    assert list_of_vectors, "no vectors provided"

    # Check if vectors are of same size
    l = len(list_of_vectors[0])  # Get length of first vector
    assert all(len(v)==l for v in list_of_vectors), "vectors are of different sizes!"  # all() returns True, 
                                                                                       # if all elements in given iterable are true
    sum_vector = [0] * l  # Start from a zero vector of length l

    for i in range(l):
        for v in list_of_vectors:
            sum_vector[i] += v[i]  # Accumulate the i-th component
    return sum_vector

assert vector_sum_longcode([[1,2],[3,4],[5,6]]) == [9,12]

- The code above can be written in one line, because `sum()` accepts any iterable, including a generator expression.

In [49]:
from typing import List 
Vector = List[float]

def vector_sum(list_of_vectors: List[Vector]) -> Vector:
    """
    sum of all corresponding elements
    """
   
    # Check if list_of_vectors is empty
    assert list_of_vectors, "no vectors provided"

    # Check if vectors are of same size
    l = len(list_of_vectors[0])  #length of first vector

    assert all(len(v)==l for v in list_of_vectors), "vectors are of different sizes!" # all() returns True
                                                                                      # if every element of the given iterable is true

    return [sum(v[i] for v in list_of_vectors) for i in range(l)]

assert vector_sum([[1,2],[3,4],[5,6]]) == [9,12]

- The code above can be made more compact using `zip()`. Note that this version skips the emptiness and length checks.

In [50]:
from typing import List
Vector = List[float]

def vector_sum_compact(list_of_vectors: List[Vector]) -> Vector:
    return [sum(t) for t in zip(*list_of_vectors)]

assert vector_sum_compact([[1,2],[3,4],[5,6]]) == [9,12]
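
To see why unpacking into `zip()` works, it may help to look at what it produces before summing. This is a small sketch with illustrative variable names:

```python
list_of_vectors = [[1, 2], [3, 4], [5, 6]]

# zip(*list_of_vectors) is the same as zip([1, 2], [3, 4], [5, 6]):
# it yields one tuple per component position.
columns = list(zip(*list_of_vectors))
print(columns)  # [(1, 3, 5), (2, 4, 6)]

# Caveat: zip() stops at the shortest input, so vectors of different
# lengths are silently truncated instead of raising an error.
truncated = [sum(t) for t in zip(*[[1, 2, 3], [4, 5]])]
print(truncated)  # [5, 7]
```

Because of that truncation, the versions with explicit length assertions are safer when the input is not known to be well formed.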

## 4. scalar_multiply(v:Vector, c:float) -> Vector

In [51]:
from typing import List
Vector = List[float]

def scalar_multiply(v:Vector, c:float) -> Vector:
    """
    multiplies every element by c
    """
    return [c * v_i for v_i in v]

assert scalar_multiply([1,2,3],2) == [2,4,6]

## 5. vector_mean(v: List[Vector]) -> Vector
- Example list of vectors: `[[1,2],[3,4],[5,6]]`.
- We want the mean of 1, 3, 5 and the mean of 2, 4, 6 as a vector of 2 elements.
- First find the component-wise sum vector, then divide it by the number of vectors.

In [54]:
def vector_mean(v: List[Vector]) -> Vector:
    """
    Computes the element-wise average
    """
    a = vector_sum(v)
    return scalar_multiply(a, 1/len(v))
    

assert vector_mean([[1,2],[3,4],[5,6]]) == [3, 4]

## 6. dot(a: Vector, b: Vector) -> float
- $a \cdot b = \sum_{i} a_i b_i$, i.e. `sum(a[i]*b[i])` over all indices `i`.

In [55]:
def dot(a: Vector, b: Vector) -> float:
    """
    Computes a_1 * b_1 + ... + a_n * b_n
    """
    assert len(a)==len(b), "different sizes"
    
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

assert dot([1,2,3],[4,5,6]) == 32

- If vector `a` has magnitude 1, the dot product measures how far the vector `b` extends in the `a` direction.
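
A small illustration of this projection idea, using the coordinate axes as unit vectors. `dot` is redefined here so the snippet runs on its own, and the example vectors are arbitrary:

```python
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    """Computes a_1 * b_1 + ... + a_n * b_n"""
    assert len(a) == len(b), "different sizes"
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

x_axis = [1, 0]  # unit vector (magnitude 1) along x
y_axis = [0, 1]  # unit vector (magnitude 1) along y
b = [3, 4]

# With a unit vector as the first argument, the dot product gives
# how far b extends in that direction.
extent_along_x = dot(x_axis, b)
extent_along_y = dot(y_axis, b)

print(extent_along_x)  # 3
print(extent_along_y)  # 4
```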

## 7. sum_of_squares(a: Vector) -> float

In [56]:
from typing import List
import math

Vector = List[float]

def sum_of_squares(a: Vector) -> float:
    """
    Returns a_1 * a_1 + ... + a_n * a_n
    """
    l = len(a)
    sum_a = sum(math.pow(a[i],2) for i in range(l))
    
    return sum_a

assert sum_of_squares([1,2])==5

## 8. magnitude(a: Vector) -> float
- Magnitude of a vector = sqrt(sum_of_squares)

In [57]:
def magnitude(a: Vector) -> float:
    """
    Returns the magnitude (or length) of a
    """
    return math.sqrt(sum_of_squares(a))
    
assert magnitude([3,4]) == 5 

## 9. squared_distance(a: Vector, b: Vector) ->float

- It will be used in k-means clustering ([20_clustering.ipynb](20_clustering.ipynb))


In [58]:
def squared_distance(a: Vector, b: Vector) -> float:
    """
    Computes (a_1 - b_1) ** 2 + ... + (a_n - b_n) ** 2
    """
    return sum_of_squares(subtract(a,b))

assert squared_distance([1,2],[4,6]) == 25
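
As a rough sketch of how a clustering step might use this, the snippet below picks the nearest of two made-up centers for a point. The names `centers`, `point`, and `closest` are hypothetical and not taken from the clustering notebook; `squared_distance` is redefined so the snippet runs on its own:

```python
from typing import List

Vector = List[float]

def squared_distance(a: Vector, b: Vector) -> float:
    """Computes (a_1 - b_1) ** 2 + ... + (a_n - b_n) ** 2"""
    assert len(a) == len(b)
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))

centers = [[0, 0], [10, 10]]  # hypothetical cluster centers
point = [1, 2]

# Index of the center with the smallest squared distance to the point.
closest = min(range(len(centers)),
              key=lambda i: squared_distance(point, centers[i]))

print(closest)  # 0
```

Comparing squared distances gives the same ordering as comparing distances, so the square root can be skipped when only the nearest point matters.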


## 10. distance(a: Vector, b: Vector) -> float

In [60]:
import math
from typing import List

Vector = List[float]

def distance(a: Vector, b: Vector) -> float:
    """
    Computes the distance between a and b
    """
    return math.sqrt(squared_distance(a,b))

assert distance([1,2],[4,6])==5


# Can also be written as:
def distance(v: Vector, w: Vector) -> float:
    return magnitude(subtract(v, w))

# 2. Matrices
- A matrix is a list of lists, where every inner list (row) has the same size.
- `A[i][j]` means the element in the $i^{th}$ row and $j^{th}$ column of matrix A.

In [None]:
Matrix = List[List[float]]   # Type alias/annotation 

A = [[1,2,3], [4,5,6]]  # rows=2, col=3
B = [[1,2], [3,4], [5,6]] # rows=3, col=2


- Indexing in a matrix starts at 0, because it is a list of lists.
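
A short illustration of zero-based indexing on a 2 x 3 matrix. The variable names are only for illustration:

```python
A = [[1, 2, 3],
     [4, 5, 6]]

first_row = A[0]        # row index 0 is the first row
top_left = A[0][0]      # first row, first column
bottom_right = A[1][2]  # second row, third column

print(first_row)     # [1, 2, 3]
print(top_left)      # 1
print(bottom_right)  # 6
```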

## 1. shape(A: Matrix) -> Tuple[int, int]

In [64]:
# Shape of matrix
from typing import List, Tuple

Matrix = List[List[float]]

def shape(A: Matrix) -> Tuple[int, int]:
    """
    Returns (# of rows of A, # of columns of A)
    """
    n_rows = len(A)
    n_col = len(A[0]) if A else 0  
    return (n_rows, n_col)

assert shape([[1,2,3], [4,5,6]]) == (2,3)
assert shape([]) == (0,0)

## 2. get_row(A: Matrix, i: int) -> List

In [32]:
from typing import List

Matrix = List[List[float]]

def get_row(A: Matrix, i: int) -> List:
    """
    Returns the i-th row of A (as a Vector)
    """
    return A[i]

  
assert get_row([[1,2,3], [4,5,6]], 1) == [4,5,6]

## 3. get_column(A: Matrix, j: int) -> List

In [65]:
from typing import List

Matrix = List[List[float]]

def get_column(A: Matrix, j: int) -> List:
    """
    Returns the j-th column of A (as a Vector)
    """
    return [r[j] for r in A]

assert get_column([[1,2,3], [4,5,6]], 1) == [2,5]

## 4. make_matrix(num_rows: int, num_cols: int, entry_fn: Callable[[int, int], float]) -> Matrix

In [67]:
# make a matrix with values defined by entry_fn
from typing import Callable, List

def make_matrix(num_rows: int,
                num_cols: int, 
                entry_fn: Callable[[int, int],float]) -> Matrix:  # Callable[[argument types], return type]
    """
    Returns a num_rows x num_cols matrix
    whose (i,j)-th entry is entry_fn(i, j)
    """
    return [[entry_fn(i,j) for j in range(num_cols)] for i in range(num_rows)]
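
A possible usage example, with `make_matrix` redefined so the snippet runs on its own. The entry function `i + j` is an arbitrary choice for illustration:

```python
from typing import Callable, List

Matrix = List[List[float]]

def make_matrix(num_rows: int,
                num_cols: int,
                entry_fn: Callable[[int, int], float]) -> Matrix:
    """Returns a num_rows x num_cols matrix whose (i,j)-th entry is entry_fn(i, j)"""
    return [[entry_fn(i, j) for j in range(num_cols)] for i in range(num_rows)]

# Each entry is the sum of its row index and column index.
index_sums = make_matrix(2, 3, lambda i, j: i + j)

print(index_sums)  # [[0, 1, 2], [1, 2, 3]]
```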


- Create an identity matrix using make_matrix().

## 5. identity_matrix(size: int) -> Matrix

In [68]:
# 5x5 identity matrix

def identity_matrix(size: int) -> Matrix:
    """Returns the n x n identity matrix"""
    return make_matrix(size, size, lambda i, j: 1 if i==j else 0)

assert identity_matrix(5) == [[1, 0, 0, 0, 0], 
                              [0, 1, 0, 0, 0],
                              [0, 0, 1, 0, 0],
                              [0, 0, 0, 1, 0],
                              [0, 0, 0, 0, 1]]

## How are matrices used in data science?

1. To represent a dataset consisting of multiple vectors, e.g. the age, height, and weight of 1000 people as a 1000 × 3 matrix.
2. We use an n × k matrix to represent a linear function that maps k-dimensional vectors to n-dimensional vectors.
3. To represent binary relationships.
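
The second use can be sketched as a matrix-vector product, where each output component is the dot product of one row with the input vector. The helper name `matrix_times_vector` is hypothetical and does not appear elsewhere in this notebook; `dot` is redefined so the snippet runs on its own:

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def dot(a: Vector, b: Vector) -> float:
    assert len(a) == len(b), "different sizes"
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def matrix_times_vector(A: Matrix, v: Vector) -> Vector:
    """Hypothetical helper: applies the n x k matrix A to a k-dimensional vector"""
    assert all(len(row) == len(v) for row in A), "row length must equal len(v)"
    return [dot(row, v) for row in A]

A = [[1, 0, 2],
     [0, 1, 3]]  # 2 x 3 matrix: maps 3-dimensional vectors to 2-dimensional ones
v = [1, 2, 3]

result = matrix_times_vector(A, v)
print(result)  # [7, 11]
```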

- Example of a binary relationship: users 0-9 are connected to each other in the following way:  
   friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),(4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

This can be represented as a matrix that is easier to read and query. Create 10 rows and 10 columns (one per user); if the tuple (i, j) is present in `friendships`, set both entry (i, j) and entry (j, i) to 1.

In [80]:
friend_matrix = [[0, 1, 1, 0, 0, 0, 0, 0, 0, 0], # user 0
                 [1, 0, 1, 1, 0, 0, 0, 0, 0, 0], # user 1
                 [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], # user 2
                 [0, 1, 1, 0, 1, 0, 0, 0, 0, 0], # user 3
                 [0, 0, 0, 1, 0, 1, 0, 0, 0, 0], # user 4
                 [0, 0, 0, 0, 1, 0, 1, 1, 0, 0], # user 5
                 [0, 0, 0, 0, 0, 1, 0, 0, 1, 0], # user 6
                 [0, 0, 0, 0, 0, 1, 0, 0, 1, 0], # user 7
                 [0, 0, 0, 0, 0, 0, 1, 1, 0, 1], # user 8
                 [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]] # user 9

assert friend_matrix[0][2] == 1, "0 and 2 are not friends"
# assert friend_matrix[1][1] == 1, "1 and 1 are not friends"
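
Rather than typing the matrix by hand, it could also be built from the edge list. This is one possible sketch; the names `num_users`, `edges`, and `built_matrix` are illustrative assumptions:

```python
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

num_users = 10            # users are numbered 0 through 9
edges = set(friendships)  # set membership tests are fast

# Entry (i, j) is 1 if users i and j are friends in either order, else 0.
built_matrix = [[1 if (i, j) in edges or (j, i) in edges else 0
                 for j in range(num_users)]
                for i in range(num_users)]

print(built_matrix[0])  # [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(built_matrix[8])  # [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
```

Checking both `(i, j)` and `(j, i)` keeps the matrix symmetric, since friendship here goes both ways.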

In [79]:
# Find the connections of any node (e.g. user 2)

def connections_of(u: int) -> List[int]:
    return [user_id 
               for user_id, connection in enumerate(friend_matrix[u]) 
               if connection == 1]

assert connections_of(0) == [1,2]
assert connections_of(8) == [6,7,9]