# Metrics
<div style="position: absolute; right:0;top:0"><a href="../evaluation.py.ipynb" style="text-decoration: none"> <font size="5">↑</font></a></div>

This module computes various metrics to evaluate the topic models.

--- 

#### [Metric Viewer](./metric_viewer.app.ipynb)

Show all metric results.

---

## General

The metrics module differs from the previous modules in two ways: all outputs are written to a single file, and only results that are already available are evaluated, i.e.
it will not run any previous module automatically.
As in the previous modules, a metric is not recomputed if its entry already exists in the file.
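
The skip-if-present behaviour can be illustrated with a minimal sketch. It assumes a JSON results file keyed by a metric identifier; the file format, the helper name `compute_if_missing` and its parameters are hypothetical and not taken from this module.

```python
import json
from pathlib import Path


def compute_if_missing(results_path, key, compute):
    """Return the stored value for `key`; compute and save it only if absent.

    Hypothetical helper: the real module may use a different file format.
    """
    path = Path(results_path)
    # Load existing results, or start with an empty dict if there is no file yet.
    results = json.loads(path.read_text()) if path.exists() else {}
    if key not in results:
        # Only missing entries trigger a computation and a write.
        results[key] = compute()
        path.write_text(json.dumps(results, indent=2))
    return results[key]
```

Calling the helper twice with the same key runs `compute` only once; the second call reads the stored entry.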

### Helper

- `load_ground_truth_classes`: loads ground truth classes from file

## Model Metrics

These metrics work solely on the results of the Model module.
They are all evaluated with respect to ground truth classification information and cannot be applied if those labels are missing.
Each model has to return at least an H matrix, which is a nonnegative topics-by-documents matrix.
Each entry represents the affinity between the corresponding topic and document.
Alternatively, a model may return a class array whose length equals the number of documents and where each entry is an integer $k \in \{1,\dots,\text{num_topics}\}$.
Note that the ids of the topics do not necessarily correspond to the ground truth labels.
An H matrix can be converted into a class array by taking, for each column, the row index of its maximum entry.
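
The conversion from an H matrix to a class array can be sketched with NumPy as follows. The function name `h_to_classes` is illustrative, and adding 1 is an assumption made to match the 1-based topic ids described above; ties are resolved in favour of the lowest topic id.

```python
import numpy as np


def h_to_classes(H):
    """Convert a topics-by-documents matrix into a 1-based class array."""
    H = np.asarray(H)
    # argmax over axis 0 picks, for every document (column),
    # the topic (row) with the highest affinity.
    return np.argmax(H, axis=0) + 1


# Toy example: 2 topics, 4 documents (made-up values).
H = np.array([
    [0.9, 0.2, 0.6, 0.1],
    [0.1, 0.8, 0.4, 0.9],
])
classes = h_to_classes(H)  # array([1, 2, 1, 2])
```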

### Clustering

The following metrics can be used to evaluate class arrays:

- [Normalized Mutual Information (NMI)](./clustering.ipynb#NMI)
- [Adjusted Rand index (ARI)](./clustering.ipynb#ARI)
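
As a rough illustration, both metrics are available in scikit-learn. Whether the linked notebook uses these exact functions is an assumption; the labels below are made up. The example also shows why mismatched topic ids are harmless: both scores depend only on the partition, not on the ids.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [2, 2, 0, 0, 1, 1]  # same partition, different ids

# Both metrics are invariant to a permutation of the label ids,
# so this relabelled but otherwise perfect clustering gets the top score.
nmi = normalized_mutual_info_score(labels_true, labels_pred)
ari = adjusted_rand_score(labels_true, labels_pred)
```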

### Classification 

If the model returns an H matrix, a classifier can be trained on the matrix and its classification performance evaluated using the following metrics.
5-fold cross-validation is applied.

#### Classifiers
- SVM

#### Metrics
- micro-averaged F1 score
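
A minimal sketch of this evaluation with scikit-learn is given below. The linear kernel, the synthetic data and the variable names are assumptions for illustration only; the module's actual classifier settings are not documented here. Since H is topics-by-documents, it is transposed so that each document becomes one sample.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a model result: 3 topics, 60 documents.
rng = np.random.default_rng(0)
num_topics, num_docs = 3, 60
labels = np.repeat(np.arange(num_topics), num_docs // num_topics)
H = rng.random((num_topics, num_docs)) * 0.1
H[labels, np.arange(num_docs)] += 1.0  # boost the true topic of each document

# scikit-learn expects samples in rows, hence H.T (documents-by-topics).
scores = cross_val_score(SVC(kernel="linear"), H.T, labels, cv=5, scoring="f1_micro")
mean_f1 = scores.mean()
```

`scores` holds one micro-averaged F1 value per fold; reporting their mean is one common way to summarize the five folds.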



## Distiller Metrics

`TODO`

## Details

**Clustering**  
Each clustering metric is defined in `config.metrics['clustering']` as
```json
"identifier": {
        "name": STRING,
        "run": BOOLEAN,
        "filename": STRING,
        "function": STRING
      }
```
The function should have the following signature
```python
def my_metric(labels_true, labels_pred):
    # compute metric
    return metric
```
which is analogous to the metric functions in scikit-learn.
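
As an example of a function with this signature, cluster purity could be written as below. Purity is not one of the metrics listed above; it is used here only to illustrate the interface, and how the `function` string in the config is resolved to a callable is not covered by this sketch.

```python
import numpy as np


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    correct = 0
    for cluster in np.unique(labels_pred):
        # Ground truth labels of all documents assigned to this cluster.
        members = labels_true[labels_pred == cluster]
        # Size of the most frequent ground truth class within the cluster.
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()
    return correct / labels_true.size
```

Like the scikit-learn metrics, it takes the ground truth labels first and the predicted labels second, and returns a single number.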