
# Concept Drift Detection: DDM & ADWIN

scikit-multiflow is inspired by MOA and MEKA and follows the scikit-learn philosophy. Once the installation below completes without reporting errors, you are ready to use scikit-multiflow.

In [0]:
!pip install -U scikit-multiflow

The skmultiflow.drift_detection module includes methods for Concept Drift Detection.


DDM (Drift Detection Method) is a concept change detection method based on the PAC learning model premise that the learner's error rate will decrease as the number of analysed samples increases, as long as the data distribution is stationary.


If the algorithm detects an increase in the error rate that surpasses a calculated threshold, it either signals that change has been detected or enters the warning zone, in which it warns the user that change may occur in the near future.


The detection threshold is calculated as a function of two statistics, recorded at the point where `p_i + s_i` is minimum:

- `p_min`: the minimum recorded error rate.
- `s_min`: the minimum recorded standard deviation.

At instant `i`, the detection algorithm uses:

- `p_i`: the error rate at instant `i`.
- `s_i`: the standard deviation at instant `i`, computed as `sqrt(p_i * (1 - p_i) / i)`.

The conditions for entering the warning zone and detecting change are as follows:

- `if p_i + s_i >= p_min + 2 * s_min` -> Warning zone
- `if p_i + s_i >= p_min + 3 * s_min` -> Change detected
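To make the rule above concrete, here is a minimal from-scratch sketch of the same bookkeeping: a running error rate, its binomial standard deviation, the recorded minimum, and the two thresholds. It is an illustration only, not the scikit-multiflow implementation. The class name `SimpleDDM`, the `min_samples` warm-up of 30 and the zero-variance guard are assumptions made for this example, and the input stream is a hand-built sequence of error indicators (1 = misclassified).

```python
import math


class SimpleDDM:
    """Illustrative version of the DDM rule (hypothetical class, not skmultiflow's DDM)."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples  # assumed warm-up before any test is made
        self.reset()

    def reset(self):
        self.n = 0              # samples seen since the last reset
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate recorded when p + s was minimal
        self.s_min = math.inf   # standard deviation recorded at that point

    def update(self, error):
        """Feed 1 for a misclassified sample, 0 otherwise; return a status string."""
        self.n += 1
        self.p += (error - self.p) / self.n               # incremental mean
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)    # binomial standard deviation
        # Assumed guard: skip the test during warm-up or while the variance is zero.
        if self.n < self.min_samples or s == 0.0:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s             # new minimum of p + s
        if self.p + s >= self.p_min + 3 * self.s_min:
            self.reset()                                   # start over after a change
            return "change"
        if self.p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


# Deterministic error stream: 10% errors for 1000 samples, then 50% errors.
errors = [1 if i % 10 == 0 else 0 for i in range(1000)]
errors += [1 if i % 2 == 0 else 0 for i in range(1000)]

detector = SimpleDDM()
warning_indices, change_indices = [], []
for index, error in enumerate(errors):
    status = detector.update(error)
    if status == "warning":
        warning_indices.append(index)
    elif status == "change":
        change_indices.append(index)

print("first warning at index:", warning_indices[0])
print("first change at index:", change_indices[0])
```

In this stream the warning zone is entered shortly after the error rate jumps at index 1000, and the change is flagged a little later, once the error rate has climbed past the stricter threshold.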

**Analyse the results from DDM.**

In [0]:
import numpy as np

from skmultiflow.drift_detection import DDM
ddm = DDM()

# Simulate a data stream of uniformly random 0's and 1's
data_stream = np.random.randint(2, size=10000)
# Change the data concept from index 4999 onwards (values drawn from 4 to 7)
for i in range(4999, 10000):
    data_stream[i] = np.random.randint(4, high=8)

# Add stream elements to DDM and check whether a warning or a drift occurred.
# Note: DDM is designed for a 0/1 error indicator; the values 4 to 7 are used
# here only to force an obvious shift in the stream.
for i in range(10000):
    ddm.add_element(data_stream[i])
    if ddm.detected_warning_zone():
        print('Warning zone has been detected in data: ' + str(data_stream[i]) + ' - of index: ' + str(i))
    if ddm.detected_change():
        print('Change has been detected in data: ' + str(data_stream[i]) + ' - of index: ' + str(i))

ADWIN (ADaptive WINdowing) is an adaptive sliding window algorithm for detecting change and keeping updated statistics about a data stream. It allows algorithms that were not designed for drifting data to become resistant to this phenomenon.


The general idea is to keep statistics from a window of variable size while detecting concept drift.


The algorithm decides the size of the window by cutting the statistics' window at different points and comparing the average of some statistic over the two resulting sub-windows. If the absolute value of the difference between the two averages surpasses a pre-defined threshold, change is detected at that point and all data before that time is discarded.
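The cutting step can be sketched in a few lines of plain Python. This is a deliberately naive illustration of the idea, not ADWIN itself: it uses a fixed threshold and tries every split point, whereas the real algorithm derives its threshold from the sub-window sizes and a confidence parameter and stores the window in compressed form. The function names `best_cut` and `adaptive_window`, and the `threshold` and `min_size` parameters, are hypothetical.

```python
def best_cut(window, threshold, min_size):
    """Return the split whose sub-window means differ most, if above threshold, else None."""
    n = len(window)
    total = sum(window)
    left_sum = sum(window[:min_size - 1])
    best_split, best_diff = None, threshold
    # Only consider splits that leave at least min_size elements on each side.
    for split in range(min_size, n - min_size + 1):
        left_sum += window[split - 1]
        left_mean = left_sum / split
        right_mean = (total - left_sum) / (n - split)
        diff = abs(left_mean - right_mean)
        if diff > best_diff:
            best_split, best_diff = split, diff
    return best_split


def adaptive_window(stream, threshold=0.5, min_size=5):
    """Grow a window over the stream and drop old data whenever a cut is found."""
    window, detections = [], []
    for index, value in enumerate(stream):
        window.append(value)
        cut = best_cut(window, threshold, min_size)
        if cut is not None:
            detections.append(index)
            window = window[cut:]  # discard everything before the cut point
    return detections, window


# 50 samples from one concept (all 0.0) followed by 50 from another (all 1.0).
stream = [0.0] * 50 + [1.0] * 50
detections, window = adaptive_window(stream)
print("change detected at indices:", detections)
print("window size after the stream:", len(window))
```

The window keeps growing while the data is stationary and shrinks abruptly once the two halves disagree, which is the behaviour the description above refers to.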


**What is the difference between the results for DDM and ADWIN?**

In [0]:
import numpy as np
from skmultiflow.drift_detection.adwin import ADWIN
adwin = ADWIN()
# Simulate a data stream of uniformly random 0's and 1's
data_stream = np.random.randint(2, size=10000)
# Change the data concept from index 4999 onwards (values drawn from 4 to 7)
for i in range(4999, 10000):
    data_stream[i] = np.random.randint(4, high=8)
# Add stream elements to ADWIN and check whether drift occurred
for i in range(10000):
    adwin.add_element(data_stream[i])
    if adwin.detected_change():
        print('Change detected in data: ' + str(data_stream[i]) + ' - at index: ' + str(i))

**What happens if you alter the delta value in ADWIN?**
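As a hint for this question, the sketch below evaluates the Hoeffding-style cut threshold from the original ADWIN paper for a few values of delta. It is an assumption that this formula approximates what the library does: scikit-multiflow's `ADWIN` uses a variance-based variant of the bound, so the numbers are illustrative only. The function name `epsilon_cut` and the sub-window sizes are chosen for this example; `delta` itself is the constructor argument of `ADWIN` (default 0.002).

```python
import math


def epsilon_cut(n0, n1, delta):
    """Hoeffding-style threshold for sub-windows of sizes n0 and n1 (illustrative)."""
    n = n0 + n1
    m = 1.0 / (1.0 / n0 + 1.0 / n1)   # harmonic-mean-style size of the two sub-windows
    delta_prime = delta / n            # confidence spread over the window length
    return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))


# Two sub-windows of 500 samples each, for decreasing values of delta.
for delta in (0.5, 0.002, 1e-6):
    print("delta =", delta, "-> threshold =", round(epsilon_cut(500, 500, delta), 4))
```

A smaller delta gives a larger threshold, so the detector needs stronger evidence before it cuts the window: fewer false alarms, but slower reaction to real drift. A larger delta has the opposite effect.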


Now try another drift detector from the skmultiflow.drift_detection module, such as EDDM or PageHinkley.