# Reading Assignment 1
**<center>Logdanidis Pavlos 2071, Vasilopoulos Vasileios 2024</center>**

## 3. Linear algebra in machine learning


### Dataset and Data Files

The ***datasets*** that we use in machine learning (tables of numbers where each row represents an observation and each column represents a feature of that observation) are **matrices**, a key data structure in linear algebra.

Moreover, when we split our data into inputs and outputs, we get a **matrix** ($X$) and a **vector** ($y$); the vector is another key data structure from linear algebra.

*********

### Images and Photographs
***Images*** are, at their core, matrices of pixel values. Operations that we perform on images, such as scaling and cropping, are all carried out with linear algebra operations and described in its notation.
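
As a rough illustration of the point above, the sketch below treats a small made-up grayscale image as a NumPy matrix. The pixel values and the `image`, `cropped`, `brighter` and `flipped` names are assumptions chosen for the example, not part of the reading.

```python
import numpy as np

# A hypothetical 4x6 grayscale "image": each entry is a pixel intensity (0-255).
image = np.arange(24, dtype=np.uint8).reshape(4, 6) * 10

# Cropping is slicing out a sub-matrix: rows 1-2 and columns 2-4.
cropped = image[1:3, 2:5]

# Brightening is scalar multiplication (done in float, then clipped to 0-255).
brighter = np.clip(image.astype(float) * 1.5, 0, 255).astype(np.uint8)

# Mirroring the image left-to-right reverses the column order.
flipped = image[:, ::-1]

print(cropped.shape)  # (2, 3)
```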

*********

### One Hot Encoding
***One hot encoding***, a popular encoding for categorical variables, is an example of a **sparse representation**, a sub-field of linear algebra. When working with classification problems, it is very common to encode categorical variables to make them easier to work with.
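
A minimal sketch of one hot encoding with NumPy follows. The colour labels are invented for the example, and indexing an identity matrix is only one of several possible ways to build the encoding.

```python
import numpy as np

# Hypothetical categorical variable with three distinct values.
labels = np.array(["red", "green", "blue", "green"])

# Map each label to an integer index (categories come back sorted).
categories, indices = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one hot vector for category i.
one_hot = np.eye(len(categories), dtype=int)[indices]

print(categories)
print(one_hot)
```

Each row contains a single 1 and zeros elsewhere, which is why the result is called sparse.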

*********

### Linear Regression
A very common method used in machine learning for predicting numerical values in simpler regression problems is ***linear regression***. The most common way of solving a linear regression problem is via a least squares optimization that uses **matrix factorization** methods (e.g. LU decomposition) from linear algebra.

The linear regression equation is summarized using linear algebra notation:
<center>$y = A \cdot b$</center>

where $y$ is the output variable, $A$ is the dataset and $b$ is the vector of model coefficients.
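
The sketch below solves a tiny least squares problem of the form $y = A \cdot b$ with `numpy.linalg.lstsq`. The data is made up (generated from $y = 1 + 2x$), and the column of ones in `A` is an assumption added so that the model has an intercept. Note that, to our understanding, `lstsq` relies on an SVD-based solver rather than the LU decomposition.

```python
import numpy as np

# Hypothetical dataset: a column of ones (intercept) plus one feature x.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # generated from y = 1 + 2x

# Find the coefficients b that minimize the squared error ||A.b - y||^2.
b, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

print(b)      # approximately [1. 2.]
print(A @ b)  # predictions, approximately equal to y
```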

*********

### Regularization
A technique that is often used in applied machine learning to minimize the size of a model's coefficients is ***regularization***. Implementations include the $L^2$ and $L^1$ forms of regularization, both of which measure the magnitude of the coefficient vector with a vector norm and are lifted directly from linear algebra.
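
As a small sketch, the $L^1$ and $L^2$ norms of a coefficient vector can be computed with `numpy.linalg.norm`. The coefficient values and the `alpha` penalty weight are hypothetical, and the penalty shown is only the term that would be added to a loss function, not a full regularized model.

```python
import numpy as np

# Hypothetical model coefficients.
b = np.array([3.0, -4.0])

# L1 norm: sum of absolute values. L2 norm: Euclidean length.
l1 = np.linalg.norm(b, 1)
l2 = np.linalg.norm(b, 2)

# A ridge-style penalty adds the squared L2 norm, scaled by alpha (assumed).
alpha = 0.1
l2_penalty = alpha * l2 ** 2

print(l1, l2)  # 7.0 5.0
```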

*********

### Principal Component Analysis
In machine learning we often want to reduce the number of columns of a dataset as much as we can, for various reasons. A popular dimensionality reduction method is ***principal component analysis***, or PCA for short. PCA uses a matrix factorization method from linear algebra.
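
One possible way to sketch PCA is through the eigendecomposition of the covariance matrix, as below. The data is invented, and using `numpy.linalg.eigh` is our choice for the illustration; other factorizations (for example the SVD) can be used as well.

```python
import numpy as np

# Hypothetical dataset: 3 observations, 2 features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Center each column, then compute the covariance matrix of the features.
centered = X - X.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Factorize the (symmetric) covariance matrix into eigenvalues and eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Keep the direction with the largest eigenvalue and project the data onto it.
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order[:1]]
projected = centered @ components

print(projected.shape)  # (3, 1): two columns reduced to one
```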

*********

### Singular-Value Decomposition
Another dimensionality reduction method is the ***singular-value decomposition***, or SVD for short. Its applications include feature selection, visualization, noise reduction and more.
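
The following sketch factorizes a small made-up matrix with `numpy.linalg.svd`, rebuilds it from its factors, and then keeps only the largest singular value to obtain a rank-1 approximation. The matrix values and the choice `k = 1` are assumptions for the example.

```python
import numpy as np

# Hypothetical 3x2 data matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Factorize A into U, the singular values s, and V transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the factors back together recovers A.
reconstructed = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives a lower-rank approximation.
k = 1
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.allclose(reconstructed, A))  # True
```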

*******

### Latent Semantic Analysis
In natural language processing, documents are often represented as large matrices of word occurrences. For example, the columns may be the known words in the vocabulary and the rows may be sentences, paragraphs, pages or documents of text, with each cell holding the count or frequency of the word in that text. Applying SVD to this matrix distills the representation down to its most compact and relevant essence. This form of data preparation is called ***Latent Semantic Analysis***, or LSA for short.
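
A toy sketch of this idea is given below. The vocabulary, the counts and the choice of `k = 2` latent dimensions are all invented for illustration; real document-term matrices are far larger and sparser.

```python
import numpy as np

# Hypothetical document-term matrix: rows are documents, columns are words.
vocabulary = ["cat", "dog", "stock", "market"]
counts = np.array([[2.0, 1.0, 0.0, 0.0],
                   [1.0, 2.0, 0.0, 0.0],
                   [0.0, 0.0, 3.0, 1.0],
                   [0.0, 0.0, 1.0, 3.0]])

# Factorize the matrix with SVD.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)

# Keep the k strongest latent dimensions as a compact document representation.
k = 2
doc_topics = U[:, :k] * s[:k]

print(doc_topics.shape)  # (4, 2): four documents described by two numbers each
```

In this toy case the two "pet" documents end up with the same reduced representation, and likewise the two "finance" documents.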

*******

### Recommender Systems
***Recommender systems*** are predictive modeling problems that involve the recommendation of products (they are used heavily by companies like Amazon and Netflix). Linear algebra plays a major role here. For example, we can calculate the similarity of two customers using measures such as Euclidean distance or dot products. SVD is also used to speed up querying, searching and comparison.
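
The sketch below compares three hypothetical customers by their rating vectors. The names, ratings and helper functions are made up for the example, and cosine similarity (a normalized dot product) is our choice of dot-product-based measure.

```python
import numpy as np

# Hypothetical ratings of four products by three customers (0 = not rated).
alice = np.array([5.0, 3.0, 0.0, 1.0])
bob = np.array([4.0, 3.0, 0.0, 1.0])
carol = np.array([0.0, 1.0, 5.0, 4.0])


def euclidean(u, v):
    """Euclidean distance: smaller means more similar."""
    return np.linalg.norm(u - v)


def cosine(u, v):
    """Dot product normalized by vector lengths: larger means more similar."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


print(euclidean(alice, bob), euclidean(alice, carol))
print(cosine(alice, bob), cosine(alice, carol))
```

Both measures agree here that Alice's tastes are closer to Bob's than to Carol's.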

*******

### Deep Learning

Artificial ***neural networks*** are used on a range of challenging problems such as machine translation, photo captioning, speech recognition and much more.

Linear algebra data structures like vectors, matrices and tensors (arrays with more than two dimensions) are used heavily in the execution of neural networks as well as in the description of deep learning methods.
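
To make this concrete, here is a minimal sketch of the forward pass of a single fully connected layer written as a matrix product. The weights, bias, input values and the ReLU activation are all assumptions for the example, not taken from the reading.

```python
import numpy as np

# Hypothetical input batch: 2 samples with 3 features each (a matrix).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Hypothetical layer parameters: 3 inputs -> 2 hidden units.
W = np.full((3, 2), 0.1)
bias = np.zeros(2)

# Forward pass: matrix product, plus bias, followed by a ReLU activation.
hidden = np.maximum(0.0, X @ W + bias)

# A batch of 8 RGB images of 28x28 pixels would be a 4-dimensional tensor.
images = np.zeros((8, 28, 28, 3))

print(hidden.shape, images.ndim)  # (2, 2) 4
```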

## 4. Introduction to NumPy Arrays

A key data structure in NumPy (a Python library for scientific and numerical applications) is the ***ndarray***, short for N-dimensional array. We will refer to it simply as an array: a fixed-size array in memory that contains data of the same type.

In the demo below, we import the array function from the numpy module and create an ndarray from a list of integers. Then we print the array, its dimensions and its data type.

In [1]:
from numpy import array

# named "values" so that the built-in list type is not shadowed
values = [1, 2, 3, 4, 5]
myArray = array(values)

print(myArray)
print(myArray.shape)
print(myArray.dtype)

[1 2 3 4 5]
(5,)
int32


Now let's look at some NumPy functions that create specific arrays, given their dimensions as a tuple.

In [2]:
from numpy import empty

# creates an uninitialized array: its contents are arbitrary leftover values
# (named "data" so that the imported array function is not shadowed)
data = empty((4, 2))
print(data)

[[0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 5.90902512e-321]
 [7.00622146e+247 6.01346930e-154]]


In [3]:
from numpy import zeros

# creates an array filled with zeros
# (named "data" so that the imported array function is not shadowed)
data = zeros((2, 5))
print(data)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [4]:
from numpy import ones

# creates an array filled with ones
# (named "data" so that the imported array function is not shadowed)
data = ones((3, 1))
print(data)

[[1.]
 [1.]
 [1.]]


Next, let's look at some NumPy functions that create arrays from other arrays.
The ***vstack()*** function stacks two or more arrays on top of each other.

In [5]:
from numpy import array
from numpy import vstack

array1=array([1,2,3,4])
array2=array([5,6,7,8])
array3=array([9,10,11,12])

bigArray=vstack((array1,array2,array3))
print(bigArray)
print(bigArray.shape)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
(3, 4)


Similarly, we can stack arrays horizontally using the ***hstack()*** function.


In [6]:
from numpy import array
from numpy import hstack

array1=array([1,2,3,4])
array2=array([5,6,7,8])
array3=array([9,10,11,12])

bigArray=hstack((array1,array2,array3))
print(bigArray)
print(bigArray.shape)


[ 1  2  3  4  5  6  7  8  9 10 11 12]
(12,)
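
The demos above stack one-dimensional arrays. As a small additional sketch (the arrays `a` and `b` are made up), the difference between the two functions is easier to see with two-dimensional inputs: `vstack()` adds rows, while `hstack()` adds columns.

```python
import numpy as np

# Two hypothetical 2x2 arrays.
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

stacked_rows = np.vstack((a, b))  # b is placed below a
stacked_cols = np.hstack((a, b))  # b is placed to the right of a

print(stacked_rows.shape)  # (4, 2)
print(stacked_cols.shape)  # (2, 4)
```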


## 5. Index, Slice and Reshape NumPy Arrays


We can create an array from a list of data using the ***array()*** function.

In [7]:
from numpy import array

# named "values" so that the built-in list type is not shadowed
values = [1, 2, 3, 4]

data = array(values)

print(data)
print(type(data))

[1 2 3 4]
<class 'numpy.ndarray'>


We can convert a list of lists (which represents two-dimensional data) to an array in the same manner.

In [8]:
data = [[1, 2],
[3, 4],
[5, 6]]

data = array(data)
print(data)
print(type(data))

[[1 2]
 [3 4]
 [5 6]]
<class 'numpy.ndarray'>


Indexing works with the [] operator, specifying the index of the value we want. We can use negative indexes to retrieve values offset from the end of the array.

In [9]:
print(data[0])
print(data[2])
print(data[-1])

[1 2]
[5 6]
[5 6]


Two dimensional indexing is shown in the demo below.

In [10]:
print(data[0,0])
print(data[0,1])
print(data[-1,-1])

1
2
6


If we want all the items from a row, we specify the row number and omit the column index. We do the opposite if we want a specific column.

In [11]:
#ROW
print(data[0,])
#or
print(data[0])
#or
print(data[0,:])

#COLUMN
print(data[:,0])


[1 2]
[1 2]
[1 2]
[1 3 5]


We can index a sub-sequence of an array by specifying the *from* and *to* indexes. The slice starts at the *from* index and ends one item before the *to* index.

In [12]:
#first two rows
data[0:2]

array([[1, 2],
       [3, 4]])

In [13]:
#all elements
data[:]

array([[1, 2],
       [3, 4],
       [5, 6]])

A common use of slicing is splitting a dataset into input variables (X) and an output variable (Y).

In [14]:
data = array([
[11, 22, 33],
[44, 55, 66],
[77, 88, 99]])
# X (input) is an array with all rows of data and all columns except the last one
X =data[:,:-1]
print(X)

[[11 22]
 [44 55]
 [77 88]]


In [15]:
# Y (output) is an array with all rows of data and only the last column
Y = data[:, -1]
print(Y)

[33 66 99]


It is common practice to split our dataset into a *train* set and a *test* set. Some rows of data are used to train our model and the rest are used to test its skill at predicting values. The training dataset would be all rows from the beginning up to the split point. The test dataset would be all rows from the split point to the end of the dimension. The following demo illustrates this split.

In [16]:
from numpy import array

data = array([
[11, 22, 33],
[44, 55, 66],
[77, 88, 99],
[12, 23, 34]])

split = 3
train=data[:split,:]
test=data[split:,:]
print(train)
print(test)



[[11 22 33]
 [44 55 66]
 [77 88 99]]
[[12 23 34]]
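
In practice the split point is often derived from a proportion rather than hard-coded. The sketch below assumes a hypothetical 75/25 split on made-up data; the `train_fraction` name and value are illustrative choices, not something prescribed by the reading.

```python
import numpy as np

# Hypothetical dataset with 8 rows and 3 columns.
data = np.arange(24).reshape(8, 3)

# Assumed proportion of rows to use for training.
train_fraction = 0.75
split = int(len(data) * train_fraction)

# Rows before the split point train the model, the rest test it.
train = data[:split, :]
test = data[split:, :]

print(train.shape, test.shape)  # (6, 3) (2, 3)
```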


In machine learning we often need the dimensions of our data to meet the expectations of specific libraries. That requires knowing how to reshape arrays. Using the ***shape*** attribute of our array, we can access the number of rows and columns as follows:

In [17]:
print('Rows: %d' % data.shape[0])
print('Cols: %d' % data.shape[1])

Rows: 4
Cols: 3


In the following example we reshape a one-dimensional array to a two-dimensional one using the ***reshape()*** function.

In [21]:
from numpy import array

data = array([1,2,3,4])
print(data.shape)
#reshape array to a 4 by 1 array
data =data.reshape((data.shape[0],1))
print(data.shape)

(4,)
(4, 1)
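
As a side note, `reshape()` also accepts `-1` for one dimension, in which case NumPy infers that dimension from the array's size. A minimal sketch, using made-up variable names:

```python
import numpy as np

data = np.array([1, 2, 3, 4])

# -1 asks NumPy to infer the number of rows from the array's size.
column = data.reshape((-1, 1))

# The same idea gives a single row with an inferred number of columns.
row = data.reshape((1, -1))

print(column.shape)  # (4, 1)
print(row.shape)     # (1, 4)
```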


Similarly, we can reshape 2D data to 3D, which is required by the LSTM recurrent neural network model in the Keras deep learning library; it expects input shaped as samples, time steps and features.

In [22]:
from numpy import array

data = array([
[11, 22, 33],
[44, 55, 66],
[77, 88, 99],
[12, 23, 34]])

print(data.shape)

(4, 3)


In [23]:
data = data.reshape((data.shape[0], data.shape[1], 1))
print(data.shape)


(4, 3, 1)
