# Matrix Factorization

This module introduces matrix factorization, which is a powerful technique for big data, text mining, and pre-processing data.

Learning Objectives
- Become familiar with the scikit-learn syntax for non-negative matrix factorization (NMF)
- Explain non-negative matrix factorization and how it differs from PCA

## Non-negative Matrix Factorization

![](./images/27_NonNegativeMatrixFactorization.png)
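For reference, the standard formulation: NMF approximates a non-negative data matrix $X$ by the product of two smaller non-negative matrices $W$ and $H$. The number of latent features (topics) $k$ is chosen much smaller than $m$ and $n$. The objective shown is the squared Frobenius loss that scikit-learn's `NMF` minimizes by default (`beta_loss="frobenius"`). Other losses exist, so treat this as one common choice rather than the only one.

$$
X \approx WH, \qquad X \in \mathbb{R}_{\ge 0}^{m \times n},\quad W \in \mathbb{R}_{\ge 0}^{m \times k},\quad H \in \mathbb{R}_{\ge 0}^{k \times n}
$$

$$
\min_{W \ge 0,\; H \ge 0} \ \tfrac{1}{2}\,\lVert X - WH \rVert_F^2,
\qquad
X_{ij} \approx \sum_{l=1}^{k} W_{il}\, H_{lj}
$$

Every term $W_{il} H_{lj}$ in the sum is non-negative, so each latent feature can only add to the reconstruction of an entry, never subtract from it.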

### Why Only Non-negative Values?
Because every entry of W and H is non-negative, NMF can only add latent features together.
It can never apply a latent feature and then undo it by subtracting another, so it is much more careful about what it adds at each step.

In some applications, this can make for more human-interpretable latent features.

Because NMF has the extra non-negativity constraint,
it will tend to lose more information than PCA when truncating to the same number of components.

Also, unlike PCA, NMF does not require its latent vectors to be orthogonal.
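The last two claims can be checked numerically. The sketch below assumes small random non-negative data, which is hypothetical and not taken from the course. It compares NMF's reconstruction error with that of a truncated SVD, which gives the best possible rank-k approximation with no sign constraint. It then inspects whether the latent vectors are orthogonal.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((20, 10))  # hypothetical non-negative data: 20 samples x 10 features
k = 3                     # number of latent features kept after truncation

# NMF: X ~ W @ H with W, H >= 0
nmf = NMF(n_components=k, init="random", random_state=0, max_iter=1000)
W = nmf.fit_transform(X)
H = nmf.components_
nmf_err = np.linalg.norm(X - W @ H)  # Frobenius norm of the residual

# Truncated SVD: the best rank-k approximation, with no sign constraint
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
svd_err = np.linalg.norm(X - X_svd)

print(f"NMF error: {nmf_err:.4f}  SVD error: {svd_err:.4f}")
print("NMF factors non-negative:", bool((W >= 0).all() and (H >= 0).all()))

# Gram matrices of unit-normalized latent vectors:
# identity for SVD (orthonormal rows), typically non-zero off-diagonals for NMF
V_svd = Vt[:k]
H_unit = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)
print(np.round(V_svd @ V_svd.T, 3))
print(np.round(H_unit @ H_unit.T, 3))
```

Any product `W @ H` with `k` columns has rank at most `k`, so NMF's error can never be lower than the truncated SVD's. Non-negative vectors are orthogonal only if they never share a non-zero entry. Overlapping NMF components therefore show up as positive off-diagonal values.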


### NMF Summary
Input:
- Count Vectorizer or TF-IDF Vectorizer

Parameters to Tune:
- Number of Topics
- Text Preprocessing (stop words, min / max doc freq, parts of speech...)

Output:
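- W matrix (documents → topics) and H matrix (topics → terms), when the input is a documents × terms matrix as produced by `CountVectorizer` or `TfidfVectorizer`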

### NMF: the Syntax
Import the class containing the factorization method.
```python
from sklearn.decomposition import NMF 
```
Create an instance of the class.
```python
nmf = NMF(n_components=3, init="random")
```
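Fit the instance and create a transformed version of the data:

```python
# fit() returns the estimator itself; fit_transform() returns the transformed data (W).
# X must contain only non-negative values.
X_nmf = nmf.fit_transform(X)
```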
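The following sketch shows the full fit/transform cycle in one self-contained block. It assumes small random non-negative arrays in place of `X`, and `X_new` stands in for hypothetical unseen samples. It also shows two API details that are easy to trip over: `fit` returns the fitted estimator rather than the data, and NMF rejects input containing negative values.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)
X = rng.random((6, 5))      # hypothetical non-negative data: 6 samples, 5 features
X_new = rng.random((2, 5))  # hypothetical unseen samples with the same features

nmf = NMF(n_components=3, init="random", random_state=0, max_iter=500)

fitted = nmf.fit(X)               # returns the estimator itself, not the data
X_nmf = nmf.transform(X)          # (6, 3) representation of the training data
X_new_nmf = nmf.transform(X_new)  # project new samples onto the learned components

print(type(fitted).__name__, X_nmf.shape, X_new_nmf.shape)
print("components_ (H) shape:", nmf.components_.shape)
print("reconstruction_err_:", nmf.reconstruction_err_)

# NMF raises a ValueError when the input contains negative values
try:
    NMF(n_components=2).fit(X - 0.5)
    negative_rejected = False
except ValueError:
    negative_rejected = True
print("negative input rejected:", negative_rejected)
```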

## Dimensionality Reduction: Approaches

Dimensionality reduction is common across a wide range of applications.
Some rules of thumb for selecting an approach:

| Method | Use case |
|---|---|
| Principal Components Analysis (PCA) | Identify a small number of uncorrelated transformed variables that preserve as much variance as possible |
| Kernel PCA | Useful for situations with nonlinear relationships, but requires more computation than PCA |
| Multidimensional Scaling (MDS) | Like PCA, but the new (transformed) features are determined by preserving distances between points rather than by explaining variance |
| Non-negative Matrix Factorization (NMF) | Useful when the data contain only non-negative values (e.g., word-count matrices, images) |

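All four methods in the table are available in scikit-learn with the same `fit_transform` interface. This sketch uses the bundled Iris dataset as stand-in data and reduces it to two dimensions. Hyperparameters such as the RBF `gamma` are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, KernelPCA, NMF
from sklearn.manifold import MDS

X = load_iris().data                    # 150 x 4, all measurements non-negative
X_std = StandardScaler().fit_transform(X)

reducers = {
    "PCA": PCA(n_components=2),
    "Kernel PCA (RBF)": KernelPCA(n_components=2, kernel="rbf", gamma=0.5),
    "MDS": MDS(n_components=2, random_state=0),
    "NMF": NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000),
}

embeddings = {}
for name, model in reducers.items():
    # NMF needs non-negative input, so it gets the raw measurements;
    # the other methods get standardized features
    data = X if name == "NMF" else X_std
    embeddings[name] = model.fit_transform(data)
    print(f"{name:>17}: {embeddings[name].shape}")
```

Standardizing before NMF would introduce negative values, which NMF rejects. That is why NMF is the one method here that receives the unscaled data.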