# Matrix Factorization

This module introduces matrix factorization, which is a powerful technique for big data, text mining, and pre-processing data.

Learning Objectives
- Become familiar with the scikit-learn syntax for Non-Negative Matrix Factorization
- Explain non-negative matrix factorization and how it differs from PCA

## Non-Negative Matrix Factorization

![](./images/27_NonNegativeMatrixFactorization.png)

### Why Only Non-Negative Values?
Because NMF's factors contain no negative entries, it can only add latent features together;
it can never subtract one to undo its effect, so it is much more careful about what it adds at each step.

In some applications, this can make for more human interpretable latent features.

Because NMF has the extra constraint of non-negative values,
it will tend to lose more information than an unconstrained method such as PCA when truncated to the same number of components.

Also, unlike PCA, NMF does not have to give orthogonal latent vectors.
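
One quick way to see the orthogonality difference is to compare the Gram matrix (all pairwise dot products) of the component vectors learned by scikit-learn's `PCA` and `NMF`. This is a minimal sketch on random non-negative data, which is an assumption for illustration; `init="nndsvda"`, `max_iter` and `random_state` are choices made here, not values from the notes. Note that two non-negative vectors can only be orthogonal if they never have non-zero entries in the same position, so NMF components usually overlap.

```python
import numpy as np
from sklearn.decomposition import NMF, PCA

# Hypothetical data: 50 samples x 8 features, all values in [0, 1)
rng = np.random.default_rng(0)
X = rng.random((50, 8))

pca = PCA(n_components=3).fit(X)
nmf = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=0).fit(X)

# Gram matrices: entry (i, j) is the dot product of component i and component j
pca_gram = pca.components_ @ pca.components_.T
nmf_gram = nmf.components_ @ nmf.components_.T

print(np.round(pca_gram, 3))  # identity: PCA components are orthonormal
print(np.round(nmf_gram, 3))  # off-diagonal entries are >= 0 and usually non-zero
print((nmf.components_ >= 0).all())  # True: NMF components have no negative entries
```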


### NMF Summary
Input:
- Count Vectorizer or TF-IDF Vectorizer

Parameters to Tune:
- Number of Topics
- Text Preprocessing (stop words, min / max doc freq, parts of speech...)

Output:
- W Matrix (terms -> topics) and H Matrix (documents -> topics)
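
The summary above maps onto a scikit-learn pipeline roughly as follows. This is a sketch under assumptions: the six-document corpus is hypothetical, and `init="nndsvda"` and `random_state=0` are choices made here for reproducibility. scikit-learn expects documents as rows, so `fit_transform` returns a documents-by-topics array (which the scikit-learn documentation calls W) and `components_` is topics-by-terms (which it calls H). Which factor is labelled W therefore depends on whether documents are stored as rows or columns.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus with two rough themes (pets and the stock market)
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "my dog chased the cat",
    "the stock market fell today",
    "investors worry about the stock market",
    "market prices and stock trading",
]

pipe = make_pipeline(
    TfidfVectorizer(stop_words="english"),                # input: TF-IDF matrix
    NMF(n_components=2, init="nndsvda", random_state=0),  # parameter: number of topics
)
doc_topics = pipe.fit_transform(docs)  # output: documents x topics
term_topics = pipe[-1].components_     # output: topics x terms

print(doc_topics.shape)   # (6, 2)
print(term_topics.shape)  # (2, number of terms kept by the vectorizer)
```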

### NMF: the Syntax
Import the class containing the decomposition method.
```python
from sklearn.decomposition import NMF 
```
Create an instance of the class.
```python
nmf = NMF(n_components=3, init="random")
```
Fit the instance and create a transformed version of the data:

```python
X_nmf = nmf.fit_transform(X)  # fit() alone returns the fitted estimator, not the transformed data
```
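
Putting the three steps together, a minimal runnable sketch looks like the following. The small matrix `X` is made-up non-negative data, and `random_state=0` and `max_iter=1000` are added here (they are not in the snippets above) so the random initialisation is reproducible and the solver has room to converge. `components_` holds the second factor, so multiplying the transformed data by it gives the low-rank reconstruction.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data: 6 samples x 4 features
X = np.array([
    [1.0, 0.0, 2.0, 0.5],
    [0.0, 3.0, 0.0, 1.0],
    [2.0, 0.0, 4.0, 1.0],
    [0.0, 1.5, 0.5, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [3.0, 0.0, 5.0, 2.0],
])

nmf = NMF(n_components=3, init="random", random_state=0, max_iter=1000)
X_nmf = nmf.fit_transform(X)  # shape (6, 3): each sample in terms of 3 latent features
H = nmf.components_           # shape (3, 4): each latent feature in terms of the original features

X_approx = X_nmf @ H          # low-rank reconstruction of X
print(X_nmf.shape, H.shape)   # (6, 3) (3, 4)
print(nmf.reconstruction_err_)  # Frobenius norm of X - X_approx
```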

## Dimensionality Reduction: Approaches

Dimensionality reduction is common across a wide range of applications.
Some rules of thumb for selecting an approach:

| Method                              | Use case                                                                                                                             |
|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| Principal Component Analysis (PCA)  | Identify a small number of transformed variables with different effects, preserving variance                                         |
| Kernel PCA                          | Useful for situations with nonlinear relationships, but requires more computation than PCA                                           |
| Multidimensional Scaling            | Like PCA, but the new (transformed) features are determined by preserving distances between points rather than explaining variance   |
| Non-negative Matrix Factorization   | Useful when the data contain only non-negative values (word matrices, images)                                                        |
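
The methods in the table are all available in scikit-learn (`PCA`, `KernelPCA` and `NMF` in `sklearn.decomposition`, `MDS` in `sklearn.manifold`), so one hedged way to compare them is to reduce the same data to two dimensions with each. The random non-negative data and the parameter values (such as `kernel="rbf"`) are assumptions for illustration. Of these four, only NMF guarantees that the transformed values are non-negative.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, NMF
from sklearn.manifold import MDS

# Hypothetical data: non-negative, so every method below accepts it
rng = np.random.default_rng(0)
X = rng.random((40, 10))

reducers = {
    "PCA": PCA(n_components=2),
    "Kernel PCA": KernelPCA(n_components=2, kernel="rbf"),
    "MDS": MDS(n_components=2, random_state=0),
    "NMF": NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0),
}

# Reduce the same 10 features to 2 with each method
embeddings = {name: reducer.fit_transform(X) for name, reducer in reducers.items()}
for name, Z in embeddings.items():
    print(f"{name:10s} shape={Z.shape} min={Z.min():.3f}")
```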


# Summary
## Non-Negative Matrix Decomposition
Non-Negative Matrix Decomposition is another way of reducing the number of dimensions. Like PCA, it is a matrix decomposition method: it approximates the data matrix V as the product of two smaller matrices, V ≈ W × H.
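
One common way to make V ≈ W × H precise, and (up to a constant factor) the default objective of scikit-learn's `NMF` with `beta_loss="frobenius"` and no regularization, is a least-squares problem with non-negativity constraints. Here V is m × n, W is m × r, H is r × n, the number of components r is usually chosen much smaller than m and n, and W ≥ 0 means every entry of W is non-negative:

$$
\min_{W \ge 0,\ H \ge 0} \ \lVert V - WH \rVert_F^2
= \sum_{i=1}^{m} \sum_{j=1}^{n} \Big( V_{ij} - \sum_{k=1}^{r} W_{ik} H_{kj} \Big)^2
$$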

The main difference is that it can only be applied to matrices whose entries are non-negative (zero or greater), for example:

- pixel intensities in an image

- attributes that are always zero or higher
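
In scikit-learn this requirement is enforced rather than silently worked around: fitting `NMF` on a matrix with a negative entry raises a `ValueError`. A minimal sketch with a made-up 2 × 2 matrix; shifting the data so its minimum becomes zero is shown only as one possible preprocessing choice (an assumption, not something NMF does for you), and it changes what the factors mean.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical data containing one negative entry
X_bad = np.array([[1.0, -0.5], [0.2, 3.0]])

raised = False
try:
    NMF(n_components=1).fit(X_bad)
except ValueError as err:
    raised = True
    print("Rejected:", err)  # scikit-learn refuses negative input instead of clipping it

# One possible workaround: shift all values so the smallest becomes 0
X_shifted = X_bad - X_bad.min()
W = NMF(n_components=1, init="nndsvda", max_iter=1000, random_state=0).fit_transform(X_shifted)
```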

In the case of word and vocabulary recognition, each row of the input matrix can be considered a document and each column a term; after factorization, each column of the document-topic matrix can be considered a topic.

NMF has proven to be powerful for:

- word and vocabulary recognition

- image processing

- text mining

- transcribing

- encoding and decoding

- decomposition of video, music, or images

There are advantages and disadvantages to working only with non-negative values.

One advantage is that NMF leads to features that tend to be more interpretable. For example, in facial recognition, the decomposed components tend to correspond to recognizable parts of a face, such as the nose, the eyebrows, or the mouth.

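A disadvantage is that the added constraint of non-negative values restricts which factorizations NMF can find, so when the data are truncated to a small number of components, NMF tends to lose more information than unconstrained decomposition methods such as PCA. NMF does not remove negative values for you: the input must already be non-negative.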
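
The information lost through truncation can be measured directly. This sketch (random non-negative data, assumed for illustration) compares NMF against the truncated SVD computed with NumPy, which by the Eckart-Young theorem gives the smallest possible Frobenius reconstruction error of any rank-k approximation; PCA performs the same computation on mean-centred data. NMF's error at the same k therefore can never be lower, and is typically higher.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data: 60 samples x 12 features
rng = np.random.default_rng(0)
X = rng.random((60, 12))

# Singular value decomposition, used for the best possible rank-k approximation
U, s, Vt = np.linalg.svd(X, full_matrices=False)

errors = {}
for k in (2, 4, 6):
    X_svd = (U[:, :k] * s[:k]) @ Vt[:k]  # optimal rank-k reconstruction
    nmf = NMF(n_components=k, init="nndsvda", max_iter=2000, random_state=0)
    X_nmf = nmf.fit_transform(X) @ nmf.components_  # non-negative rank-k reconstruction
    errors[k] = (np.linalg.norm(X - X_svd), np.linalg.norm(X - X_nmf))
    print(f"k={k}: SVD error={errors[k][0]:.3f}  NMF error={errors[k][1]:.3f}")
```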

Unlike PCA, NMF does not have to use orthogonal latent vectors, so it can end up with latent vectors that point in similar directions.

## NMF for NLP
In Natural Language Processing, NMF is used with the following inputs, parameters to tune, and outputs:

### Inputs

NMF takes vectorized text as input, usually produced with a count vectorizer or a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer.
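
Both kinds of input can be produced with scikit-learn's `CountVectorizer` and `TfidfVectorizer`; either one yields a sparse documents-by-terms matrix with no negative entries, which `NMF` accepts directly. The corpus below is hypothetical, and `n_components=2` is an illustrative choice.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "my dog chased the cat",
    "the stock market fell today",
    "investors worry about the stock market",
    "market prices and stock trading",
]

counts = CountVectorizer().fit_transform(docs)  # sparse matrix of raw term counts
tfidf = TfidfVectorizer().fit_transform(docs)   # sparse matrix of TF-IDF weights

# Same tokenization, so both are documents x terms with identical shapes
print(counts.shape, tfidf.shape)
print(counts.min() >= 0, tfidf.min() >= 0)  # True True: valid NMF input

# NMF works on either representation, including sparse matrices
nmf_counts = NMF(n_components=2, init="nndsvda", random_state=0).fit(counts)
```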

### Parameters to tune

The two main parameters are:

- Number of Topics

- Text preprocessing (stop words, min/max document frequency, parts of speech, etc.)
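
In scikit-learn, the preprocessing choices map onto `TfidfVectorizer` arguments (`stop_words`, `min_df`, `max_df`; part-of-speech filtering would need a separate tagger and is not shown), and the number of topics is `NMF`'s `n_components`. The sketch below, on a hypothetical corpus, compares `reconstruction_err_` for a few topic counts. Error usually drops as topics are added, but a lower error does not by itself mean the topics are more interpretable, so the specific values tried here are only an illustration.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "my dog chased the cat",
    "the stock market fell today",
    "investors worry about the stock market",
    "market prices and stock trading",
]

# Preprocessing: drop English stop words and any term that appears in more than
# 90% of the documents (max_df); min_df=1 (the default) keeps rare terms
vectorizer = TfidfVectorizer(stop_words="english", min_df=1, max_df=0.9)
X = vectorizer.fit_transform(docs)

# Number of topics: compare the reconstruction error for a few candidate values
errors = {}
for n_topics in (1, 2, 3):
    nmf = NMF(n_components=n_topics, init="nndsvda", max_iter=1000, random_state=0)
    errors[n_topics] = nmf.fit(X).reconstruction_err_
    print(f"{n_topics} topic(s): reconstruction error = {errors[n_topics]:.3f}")
```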

### Output

The output of NMF will be two matrices:

The W matrix tells us how the terms relate to the different topics.

The H matrix tells us how to use those topics to reconstruct our original documents.

### Syntax
The syntax consists of importing the class containing the decomposition method:

```python
from sklearn.decomposition import NMF
```

creating an instance of the class:

```python
nmf = NMF(n_components=3, init="random")
```

and fitting the instance to create a transformed version of the data:

```python
X_nmf = nmf.fit_transform(X)  # call fit_transform on the instance nmf, not on the NMF class
```
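
Once fitted, the same instance can place new, unseen documents in the learned topic space with `transform`, which keeps the learned topics fixed and only solves for the new documents' topic weights. This is a sketch with a hypothetical corpus; the vectorizer must be the one fitted on the training documents so that the term columns line up.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus and new documents
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "my dog chased the cat",
    "the stock market fell today",
    "investors worry about the stock market",
    "market prices and stock trading",
]
new_docs = ["a cat and a dog", "stock prices fell"]

vectorizer = TfidfVectorizer(stop_words="english")
nmf = NMF(n_components=2, init="nndsvda", random_state=0)

# fit_transform learns the topics and returns the training documents' topic weights
train_topics = nmf.fit_transform(vectorizer.fit_transform(train_docs))

# transform reuses the learned topics (components_) for the unseen documents
new_topics = nmf.transform(vectorizer.transform(new_docs))
print(new_topics.shape)  # (2, 2)
```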