
# **Data Stream Mining**

scikit-multiflow is a library for learning from data streams, inspired by MOA and MEKA and following the scikit-learn philosophy. Once the installation below completes without errors, you are ready to use scikit-multiflow.

In [0]:
!pip install -U scikit-multiflow

Generate a synthetic data stream with SEAGenerator and evaluate a HoeffdingTree classifier on it. We use a holdout mechanism for the evaluation, with accuracy and kappa as the evaluation metrics.
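To make the two metrics concrete, here is a minimal pure-Python sketch of accuracy and Cohen's kappa. The function names `accuracy` and `cohen_kappa` are illustrative and are not part of scikit-multiflow; the library computes its metrics internally and may handle edge cases differently.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected from a predictor that only matches the label frequencies
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate single-class case; a convention assumed for this sketch
    return (p_observed - p_chance) / (1 - p_chance)


# A classifier that always predicts the majority class
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(accuracy(y_true, y_pred))     # 0.8
print(cohen_kappa(y_true, y_pred))  # 0.0
```

The example shows why kappa is reported next to accuracy: on an imbalanced stream, always predicting the majority class gives a high accuracy but a kappa of zero.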

The holdout evaluation method, or periodic holdout evaluation method, analyses each arriving sample by updating its statistics, without computing performance metrics or predicting labels or regression values.

The performance evaluation happens every n_wait analysed samples. At that moment the evaluator tests the learner's performance on a test set formed of as-yet-unseen samples, which are used to evaluate performance but not to train the model.
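The procedure can be sketched in plain Python. This is a simplified illustration, not the scikit-multiflow implementation: `MajorityClassLearner`, `toy_stream` and `holdout_evaluate` are hypothetical names, the stream is a toy generator rather than SEAGenerator, and a fresh test set is drawn at every evaluation point.

```python
import random
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0


def toy_stream(n, seed=1):
    """Yield n (features, label) pairs; the label is 1 when the features sum above 8."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.uniform(0, 10), rng.uniform(0, 10))
        yield x, int(x[0] + x[1] > 8)


def holdout_evaluate(stream, model, n_wait, test_size):
    """Train on every sample; after each n_wait training samples, test on unseen ones."""
    scores = []
    trained = 0
    for x, y in stream:
        model.partial_fit(x, y)
        trained += 1
        if trained % n_wait == 0:
            # Pull the next unseen samples: they are tested on but never trained on
            test = [s for s in (next(stream, None) for _ in range(test_size))
                    if s is not None]
            if not test:
                break
            correct = sum(model.predict(tx) == ty for tx, ty in test)
            scores.append(correct / len(test))
    return scores


scores = holdout_evaluate(toy_stream(3000), MajorityClassLearner(),
                          n_wait=500, test_size=100)
print(scores)  # one accuracy value per evaluation point
```

Because test samples are consumed from the stream without being passed to `partial_fit`, each evaluation measures performance on data the model has never learned from.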

```
class skmultiflow.evaluation.evaluate_holdout.EvaluateHoldout(n_wait=10000, max_samples=100000, batch_size=1, max_time=inf, metrics=None, output_file=None, show_plot=False, restart_stream=True, test_size=5000, dynamic_test_set=False)
```

What do you observe from the outcome of the code below?




In [0]:
from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTree
from skmultiflow.evaluation import EvaluateHoldout

# Set the stream
stream = SEAGenerator(random_state=1)
stream.prepare_for_use()

# Set the model
ht = HoeffdingTree()

# Set the evaluator
evaluator = EvaluateHoldout(max_samples=100000,
                            max_time=1000,
                            show_plot=False,
                            metrics=['accuracy', 'kappa'],
                            dynamic_test_set=True)

# Run evaluation
evaluator.evaluate(stream=stream, model=ht, model_names=['HT'])

The prequential evaluation method, or interleaved test-then-train method, is an alternative to the traditional holdout evaluation, inherited from batch setting problems.

Prequential evaluation is designed specifically for stream settings: each sample serves two purposes (it is first used to test the model and then to train it), and samples are analysed sequentially, in order of arrival, and become immediately inaccessible.
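The test-then-train loop can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the library's code: `MajorityClassLearner`, `toy_stream` and `prequential_evaluate` are hypothetical names, and only a running accuracy is tracked.

```python
import random
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0


def toy_stream(n, seed=1):
    """Yield n (features, label) pairs; the label is 1 when the features sum above 8."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.uniform(0, 10), rng.uniform(0, 10))
        yield x, int(x[0] + x[1] > 8)


def prequential_evaluate(stream, model, pretrain_size):
    """Interleaved test-then-train: predict each sample first, then learn from it."""
    correct = 0
    tested = 0
    for i, (x, y) in enumerate(stream):
        if i >= pretrain_size:
            # Test first: the model has not yet seen this sample
            correct += int(model.predict(x) == y)
            tested += 1
        # Then train on the same sample; it is not stored afterwards
        model.partial_fit(x, y)
    return correct / tested if tested else 0.0


acc = prequential_evaluate(toy_stream(1000), MajorityClassLearner(), pretrain_size=200)
print(acc)  # running accuracy over the samples after the pretraining phase
```

Unlike holdout, no samples are set aside: every sample after the first `pretrain_size` contributes to both the performance estimate and the model.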



```
class skmultiflow.evaluation.evaluate_prequential.EvaluatePrequential(n_wait=200, max_samples=100000, batch_size=1, pretrain_size=200, max_time=inf, metrics=None, output_file=None, show_plot=False, restart_stream=True, data_points_for_classification=False)
```



In [0]:
from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTree
from skmultiflow.evaluation import EvaluatePrequential

# Set the stream
stream = SEAGenerator(random_state=1)
stream.prepare_for_use()

# Set the model
ht = HoeffdingTree()

# Set the evaluator

evaluator = EvaluatePrequential(max_samples=10000,
                                max_time=1000,
                                show_plot=False,
                                metrics=['accuracy', 'kappa'])


# Run evaluation
evaluator.evaluate(stream=stream, model=ht, model_names=['HT'])


In [0]:
from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTree
from skmultiflow.trees.hoeffding_adaptive_tree import HAT
from skmultiflow.evaluation import EvaluatePrequential


# Set the stream
stream = SEAGenerator(random_state=1)
stream.prepare_for_use()

# Set the model
ht = HoeffdingTree()
hat = HAT()

# Set the evaluator

evaluator = EvaluatePrequential(max_samples=10000,
                                max_time=1000,
                                show_plot=False,
                                metrics=['accuracy', 'kappa'],
                                data_points_for_classification=False)


# Run evaluation
evaluator.evaluate(stream=stream, model=[ht, hat], model_names=['HT', 'hat'])



