# Models
<div style="position: absolute; right:0;top:0"><a href="../evaluation.ipynb" style="text-decoration: none"> <font size="5">↑</font></a></div>

This module runs the models, which is the first step of topic modeling.

--- 

#### [Models App](./models.app.ipynb)

Run a model.

---
## Models

- K-Means ([KMeans](./clustering.py))
- NMF ([NMF](./nmf.py))
- NMF with positive shift ([ShiftNMF](./nmf.py))
- Word-embedding NMF ([WeNMF](./wenmf.py))

---

## Config

Each model is defined as an entry in `config.models['list']` as

```json
"identifier": {
    "name": STRING,
    "run": BOOLEAN,
    "mod": STRING,
    "cls": STRING,
    "vector": STRING,
    "token": STRING,
    OPTIONAL_PARAMETERS
    }
```
with
- `identifier`: A unique string to identify the model
- `name`: Name used for printing
- `run`: True if the model should run during evaluation
- `mod`: Python module file containing the model class
- `cls`: Class name within the module
- `vector` (optional, default="BCP"): Vector types accepted by the model
- `token` (optional, default="BC"): Token types accepted for BoW vectors
- `OPTIONAL_PARAMETERS`: Additional parameters passed on for use by the model class
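
As an illustration, an entry could look like the Python dictionary below, together with one possible way to resolve the `mod` and `cls` fields to a class. This is a sketch under assumptions, not the project's actual loader: the identifier `kmeans_demo`, the parameter `n_clusters`, and the helper `resolve_class` are hypothetical, and a standard-library module stands in for a model file so that the snippet runs on its own.

```python
import importlib

# Hypothetical entry in the shape described above. The identifier, the
# parameter name "n_clusters", and all values are illustrative only.
models_list = {
    "kmeans_demo": {
        "name": "K-Means (demo)",
        "run": True,
        "mod": "collections",   # stand-in; a real entry would name a model module
        "cls": "OrderedDict",   # stand-in; a real entry would name a ModelBase subclass
        "vector": "BCP",
        "token": "BC",
        "n_clusters": 10,       # example of an optional, model-specific parameter
    }
}

REQUIRED_KEYS = {"name", "run", "mod", "cls"}


def resolve_class(entry):
    """Import entry['mod'] and return the attribute named entry['cls']."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise KeyError(f"model entry is missing keys: {sorted(missing)}")
    module = importlib.import_module(entry["mod"])
    return getattr(module, entry["cls"])


# Collect the classes of all entries flagged to run.
active = {
    identifier: resolve_class(entry)
    for identifier, entry in models_list.items()
    if entry["run"]
}
```

Checking the required keys before importing gives a clearer error message for an incomplete entry than a failed lookup would.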


## Model

The Model class needs to inherit from `ModelBase` and implement `_output_of` and `_run`. You may define more than one class per file.

```python
class MyModel(ModelBase):
    
    def _output_of(self, info):
        if <is_applicable>:
            return [<output1>,<output2>]
    
    def _run(self, info):
        <set_outputs>
```

- `_output_of()`: If the model cannot be applied to the current `info`, `_output_of` returns nothing (that is, `None`).
  Otherwise it returns a list containing at least one of the following. Do not return both `H` and `c`.
  - `'W'`: The model returns a term-by-topic matrix W. `_run` must set `self.W`.
  - `'H'`: The model returns a topic-by-document matrix H. `_run` must set `self.H`.
  - `'c'`: The model returns a document classification vector c. `_run` must set `self.c`.
- `_run()`: Runs the model and computes the outputs declared by `_output_of()`.
  - `self.input_mat`: The input matrix from the vectorizer module. It gets loaded before `_run()` is called. It may be sparse or dense.
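
The contract above can be sketched with a toy classifier. Everything here is an assumption made for illustration: the `ModelBase` stub is a minimal stand-in for the project's real base class, `info` is treated as a dictionary with a hypothetical `"vector"` key, and the input is taken to be a dense term-by-document matrix given as a list of rows. A real model would receive a NumPy or SciPy matrix instead.

```python
class ModelBase:
    """Minimal stand-in for the project's ModelBase, for illustration only."""

    def __init__(self, input_mat=None):
        self.input_mat = input_mat
        self.W = self.H = self.c = None


class DominantTermModel(ModelBase):
    """Toy model: labels each document with the row index of its largest entry."""

    def _output_of(self, info):
        # Hypothetical applicability check: assumes `info` is a dict
        # carrying a "vector" key; the value "B" is illustrative.
        if info.get("vector") == "B":
            return ["c"]
        # Falling through returns None: the model is not applicable.

    def _run(self, info):
        # Assumes a dense term-by-document matrix (rows = terms).
        columns = zip(*self.input_mat)
        self.c = [max(range(len(col)), key=col.__getitem__) for col in columns]


# Two terms (rows) and three documents (columns).
model = DominantTermModel(input_mat=[[3, 0, 1], [0, 2, 5]])
info = {"vector": "B"}
if model._output_of(info):
    model._run(info)
# model.c is now [0, 1, 1]
```

Because `_output_of` declares only `'c'`, `_run` sets `self.c` and leaves `self.W` and `self.H` untouched.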