# Assignment 1
## Linear Algebra and Probability
Machine Learning

2. Let $D = \{d_1,\cdots,d_n\}$ be a set of documents and $T = \{t_1,\cdots,t_m\}$ a set of terms (words). Let
$TD = (TD_{i,j})_{i=1\cdots m,j=1\cdots n}$ be a matrix such that $(TD)_{i,j}$ corresponds to the number of times
the term $t_i$ appears in the document $d_j$. Also, let $l_i$ be the length (number of characters) of term $t_i$, and let $L = (l_1,\cdots,l_m)$ be a column vector. Finally, assume a process where a document $d_j$ is randomly chosen with uniform probability and then a term $t_i$, present in $d_j$, is randomly chosen with a probability proportional to the frequency of $t_i$ in $d_j$.

For all the following expressions you must provide:
- a mathematical expression to calculate it that includes TD, L, constants (scalars, vectors or matrices) and linear algebra operations
- an expression in NumPy ([scipy](http://www.scipy.org)) that, when evaluated, generates the requested matrix, vector or scalar (the expression must be a linear algebra expression that does not involve control structures such as `for`, `while`, etc.)
- the result of evaluating the expression, assuming:

$$
TD=\begin{pmatrix}
2&3&0&3&7\\
0&5&5&0&3\\
5&0&7&3&3\\
3&1&0&9&9\\ 
0&0&7&1&3\\
6&9&4&6&0\\
\end{pmatrix}
,\qquad L=\begin{pmatrix}
5\\
2\\
3\\
6\\
4\\
3\\
\end{pmatrix}
$$


1. Matrix $P(T,D)$ (each position of the matrix, $P(T,D)_{i,j}$, corresponds to the joint probability of term $t_i$ and document $d_j$, $P(t_i, d_j)$)
1. Matrix $P(T|D)$
1. Matrix $P(D|T)$
1. Vector $P(D)$
1. Vector $P(T)$
1. $E[l]$ (the expected value of the random variable $l$ corresponding to the length of a randomly chosen term)
1. $\text{var}(l)$ (the variance of $l$)

## Solution

First, we define the matrix $TD$ and the vector $L$ 

In [59]:
import numpy as np

In [60]:
TD=np.matrix([
    [2,3,0,3,7],
    [0,5,5,0,3],
    [5,0,7,3,3],
    [3,1,0,9,9],
    [0,0,7,1,3],
    [6,9,4,6,0],
    ])
print(TD)

[[2 3 0 3 7]
 [0 5 5 0 3]
 [5 0 7 3 3]
 [3 1 0 9 9]
 [0 0 7 1 3]
 [6 9 4 6 0]]


We use `np.matrix([]).T` so that the result is a column vector, as required; without the `.T` the result would be a row vector.

In [61]:
L=np.matrix([5,2,3,6,4,3]).T
print(L)

[[5]
 [2]
 [3]
 [6]
 [4]
 [3]]


1. The joint probability $P(T,D)$ gives the probability that term $t_i$ and document $d_j$ are selected together. The process first chooses a document and then a term from that document, so $T$ and $D$ are not independent. By the product rule, the joint is the conditional probability times the marginal:
$$
P(T=t_i,D=d_j)=P(T=t_i|D=d_j)\cdot P(D=d_j)
$$
It reads: the probability of having $T=t_i$ and $D=d_j$ together is the probability of choosing $d_j$ times the probability of choosing $t_i$ within $d_j$.

We have to consider the following:

- Since the process chooses the document $d_j$ with __uniform__ probability, all documents are equally likely:
$$
P(D=d_j)=\frac{1}{\# \text{ of Documents}}
$$
The number of documents equals the number of columns of $TD$.
- Since the probability of choosing $t_i$ within $d_j$ is proportional to its frequency, we normalize the counts: we sum each column of $TD$ and divide the column by that sum, which gives $P(T=t_i|D=d_j)$.
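
The two points above can be combined into a single matrix expression. This is a sketch under notation that the original does not define: $\mathbf{1}_m$ is assumed to be the all-ones column vector of length $m$, $n$ the number of documents, and $\operatorname{diag}(v)$ the diagonal matrix with the entries of $v$ on its diagonal. It also assumes every document contains at least one term, so that no column sum is zero.
$$
P(T,D)=\frac{1}{n}\,TD\,\operatorname{diag}\!\left(\mathbf{1}_m^{T}\,TD\right)^{-1}
$$
Here $\mathbf{1}_m^{T}\,TD$ is the row vector of column sums, so right-multiplying by the inverse diagonal matrix divides column $j$ of $TD$ by the total term count of $d_j$, and the factor $1/n$ is $P(D=d_j)$.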

In [62]:
n_rows,n_cols=np.shape(TD)
print(n_rows,n_cols)

6 5


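The total number of term occurrences in each document is obtained by summing each column of $TD$ (`axis=0` adds up the rows within every column):

In [63]:
FTD=np.sum(TD,axis=0) # column sums: total term count of each document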
print(FTD) 

[[16 18 23 22 25]]


`NumPy` can divide a matrix (or ndarray) by another object of one of these shapes:
- The same size: divides element by element.
- Scalar: divides every element by the number.
- Row vector (1-D array or $1\times n$ matrix) with as many entries as the matrix has __columns__: divides each column by the corresponding entry (this is our case, since `FTD` has shape `(1, 5)`).
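
The broadcasting cases listed above can be illustrated on a small toy array. This is a minimal sketch with made-up values (not the assignment data) and hypothetical names `A`, `col_div` and `row_div`; it also shows the complementary case of a column vector, which divides each row instead of each column.

```python
import numpy as np

# Toy 2x3 array, chosen only to make the results easy to read.
A = np.array([[2.0, 4.0, 6.0],
              [1.0, 2.0, 3.0]])

col_div = np.array([2.0, 4.0, 6.0])   # shape (3,): one divisor per column
row_div = np.array([[2.0], [1.0]])    # shape (2, 1): one divisor per row

by_scalar = A / 2.0       # every element divided by 2
by_column = A / col_div   # column j divided by col_div[j]
by_row = A / row_div      # row i divided by row_div[i, 0]

print(by_column)
print(by_row)
```

A divisor of shape `(3,)` or `(1, 3)` is matched against the last axis (the columns), while a divisor of shape `(2, 1)` is matched against the rows.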


In [65]:
Aux=TD/FTD
print(Aux,np.sum(Aux))

[[0.125      0.16666667 0.         0.13636364 0.28      ]
 [0.         0.27777778 0.2173913  0.         0.12      ]
 [0.3125     0.         0.30434783 0.13636364 0.12      ]
 [0.1875     0.05555556 0.         0.40909091 0.36      ]
 [0.         0.         0.30434783 0.04545455 0.12      ]
 [0.375      0.5        0.17391304 0.27272727 0.        ]] 5.0


Then we multiply by the probability of choosing the $j$th document, $1/n$, which is the same as dividing by the number of columns:

In [67]:
PTD=Aux/n_cols
print(PTD,"check that the total is 1:",np.sum(PTD))

[[0.025      0.03333333 0.         0.02727273 0.056     ]
 [0.         0.05555556 0.04347826 0.         0.024     ]
 [0.0625     0.         0.06086957 0.02727273 0.024     ]
 [0.0375     0.01111111 0.         0.08181818 0.072     ]
 [0.         0.         0.06086957 0.00909091 0.024     ]
 [0.075      0.1        0.03478261 0.05454545 0.        ]] check that the total is 1: 1.0


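The conditional probabilities follow from the joint and the marginals:
$$
P(T|D)=\frac{P(T,D)}{P(D)}
$$
and
$$
P(D|T)=\frac{P(T,D)}{P(T)}
$$
<font color='red'>Note</font> $T$ and $D$ are not independent in this process, so in general $P(T|D)\neq P(T)$ and $P(D|T)\neq P(D)$.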
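
As a self-contained cross-check, the conditional matrices can be derived from the joint with plain `ndarray`s. This is a minimal sketch rather than the notebook's own cells: the names `P_joint`, `P_T_given_D`, `P_D_given_T` and `factorizes` are illustrative, and it uses `np.array` instead of `np.matrix`. It also tests whether the joint factorizes as $P(T)\cdot P(D)$, which is what independence would require.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]], dtype=float)

n_docs = TD.shape[1]

# Joint: column-normalized counts times the uniform document probability.
P_joint = TD / TD.sum(axis=0) / n_docs

# Marginals as 1-D arrays.
P_D = P_joint.sum(axis=0)   # shape (5,), one entry per document
P_T = P_joint.sum(axis=1)   # shape (6,), one entry per term

# Conditionals: divide columns by P(D), rows by P(T).
P_T_given_D = P_joint / P_D
P_D_given_T = P_joint / P_T[:, None]

# Independence would mean P_joint equals the outer product of the marginals.
factorizes = np.allclose(P_joint, np.outer(P_T, P_D))
print(factorizes)
```

Each column of `P_T_given_D` and each row of `P_D_given_T` is a probability distribution, so summing along the corresponding axis gives a quick sanity check.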

The marginal probability vectors are obtained by summing the joint matrix. For the documents, $P(D)$, we sum each column (`axis=0`):

In [68]:
Prob_d=np.sum(PTD,axis=0)
print(Prob_d)

[[0.2 0.2 0.2 0.2 0.2]]


The conditional $P(T|D)$ is the joint with each column divided by $P(d_j)$, which recovers the column-normalized frequencies:

In [32]:
PTgivenD=PTD/Prob_d # divide column j by P(d_j)
print(PTgivenD)

[[0.125      0.16666667 0.         0.13636364 0.28      ]
 [0.         0.27777778 0.2173913  0.         0.12      ]
 [0.3125     0.         0.30434783 0.13636364 0.12      ]
 [0.1875     0.05555556 0.         0.40909091 0.36      ]
 [0.         0.         0.30434783 0.04545455 0.12      ]
 [0.375      0.5        0.17391304 0.27272727 0.        ]]


The conditional $P(D|T)$ is the joint with each row divided by $P(t_i)$, the sum of that row:

In [None]:
PDgivenT=PTD/np.sum(PTD,axis=1) # divide row i by P(t_i)
print(PDgivenT)

For the terms, $P(T)$, we sum each row (`axis=1`):

In [28]:
Prob_t=np.sum(PTD,axis=1)
print(Prob_t)

[[0.14160606]
 [0.12303382]
 [0.17464229]
 [0.20242929]
 [0.09396047]
 [0.26432806]]
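
The two marginals can also be written as matrix products. This is a sketch under assumed notation not used in the original: $\mathbf{1}_m$ and $\mathbf{1}_n$ denote all-ones column vectors of length $m$ (terms) and $n$ (documents).
$$
P(D)=\mathbf{1}_m^{T}\,P(T,D),\qquad P(T)=P(T,D)\,\mathbf{1}_n
$$
With this convention $P(D)$ is a row vector of length $n$ and $P(T)$ is a column vector of length $m$, matching the shapes printed by `Prob_d` and `Prob_t`.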


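The expected value of a discrete distribution is calculated as
$$
E[X]=\sum_i p_ix_i
$$
Under the sampling process described in the problem the weights would be $p_i=P(T=t_i)$. The cell below makes the simpler assumption that all terms are equally probable, $p_i=1/m$, so it returns the plain average of $L$.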

In [11]:
def esp(X):
    return np.sum(X)/len(X)
Esp_l=esp(L)
print(Esp_l)

3.8333333333333335


The variance can be calculated as
$$
\text{Var}(X)=E[(X-E[X])^2]
$$
so, under the same equal-probability assumption,

In [12]:
Var_l=esp(np.square(L-Esp_l))
Var_l

1.8055555555555556

The function `np.square` computes the square element-wise.
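
For comparison, the expectation and variance can also be weighted by $P(T)$, which is what the sampling process in the problem statement suggests. This is a hedged sketch, not the notebook's method: the names `E_l_weighted` and `Var_l_weighted` are illustrative, and it uses `np.array` instead of `np.matrix` so that `@` yields scalars.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]], dtype=float)
L = np.array([5, 2, 3, 6, 4, 3], dtype=float)

# P(T): row sums of the joint matrix P(T, D).
P_joint = TD / TD.sum(axis=0) / TD.shape[1]
P_T = P_joint.sum(axis=1)

# Probability-weighted moments: E[l] = L^T P(T), Var(l) = E[(l - E[l])^2].
E_l_weighted = L @ P_T
Var_l_weighted = (L - E_l_weighted) ** 2 @ P_T

# Equal-probability versions, as computed by esp() in the cells above.
E_l_uniform = L.mean()
Var_l_uniform = L.var()   # population variance (ddof=0)

print(E_l_weighted, Var_l_weighted)
print(E_l_uniform, Var_l_uniform)
```

The two versions differ because longer and shorter terms are not drawn equally often under $P(T)$.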