DISCO.
xehivs committed Dec 11, 2019
1 parent bdb9cea commit d2f026a
Showing 4 changed files with 50 additions and 1 deletion.
Binary file added doc/_static/disco.png
33 changes: 32 additions & 1 deletion doc/streams.rst
@@ -105,7 +105,7 @@ The cyclic repetition of class distributions is a completely different property
.. image:: _static/recurring.png

-Non-recurring gradual drift
+Non-recurring drift
---------------------------

The default mode of consecutive concept occurrences is a non-recurring drift, where each concept drift generates a completely new, previously unseen class distribution.
@@ -121,9 +121,20 @@ The default mode of consecutive concept occurences is a non-recurring drift, whe
Class imbalance
===============

Another area of data stream properties, different from the concept drift phenomenon, is the a priori probability of problem classes. By default, a balanced stream is generated, i.e. one in which patterns of all classes occur in similar numbers.

.. code-block:: python

    StreamGenerator()

.. image:: _static/stationary.png


Stationary imbalanced stream
----------------------------

The basic type of problem with a disturbed class distribution is a *stationary imbalanced stream*, where the classes maintain a predetermined proportion in each chunk of the data stream. To acquire this type of stream, pass to the ``weights`` parameter of the generator a ``list`` that (i) has as many elements as there are classes in the problem and (ii) sums to one.
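The two conditions on the ``weights`` list can be checked before a generator is constructed. The snippet below is a minimal sketch using only the standard library; ``validate_weights`` and ``expected_class_counts`` are hypothetical helpers introduced for illustration and are not part of the library, and the weights ``[0.3, 0.7]`` and the chunk size of 200 are arbitrary example values.

```python
import math


def validate_weights(weights, n_classes):
    """Check the two conditions on stationary class weights.

    Hypothetical helper (not a library function): the list must hold one
    weight per class and the weights must sum to one.
    """
    if len(weights) != n_classes:
        raise ValueError("expected one weight per class")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to one")
    return list(weights)


def expected_class_counts(weights, chunk_size):
    """Expected number of patterns of each class in a single chunk."""
    return [round(w * chunk_size) for w in weights]


weights = validate_weights([0.3, 0.7], n_classes=2)
counts = expected_class_counts(weights, chunk_size=200)
print(counts)
```

With these illustrative values, each chunk of 200 patterns would be expected to contain roughly 60 patterns of the first class and 140 of the second.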


.. code-block:: python
@@ -134,9 +145,29 @@ Stationary imbalanced stream
Dynamically imbalanced stream
-----------------------------

A less common type of *imbalanced data*, impossible to obtain in static datasets, is *dynamically imbalanced data*. In this case, the class distribution is not constant over the course of a stream but changes over time, similarly to the changing concept presence in gradual streams. To get this type of data stream, we pass a ``tuple`` of three numeric values to the ``weights`` parameter of the generator:

- the number of cycles of distribution changes,
- the ``concept_sigmoid_spacing`` value, which controls the dynamics of changes on the same principle as in gradual and incremental drifts,
- the range within which the oscillation takes place.

.. code-block:: python

    StreamGenerator(weights=(2, 5, 0.9))

.. image:: _static/dynamic-imbalanced.png
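To build intuition for how the three values of the tuple could interact, the following sketch computes a per-pattern prior probability for one class of a binary problem. It is an illustration under stated assumptions, not the generator's actual implementation: ``dynamic_class_prior`` and its parameter names are hypothetical, the oscillation is assumed to follow a logistic curve that reverses direction in every cycle, and the third value is assumed to be the width of the oscillation centred on 0.5.

```python
import math


def dynamic_class_prior(n_samples, n_cycles, sigmoid_spacing, oscillation_range):
    """Per-pattern prior of one class in a binary stream (illustrative sketch).

    Assumption: each cycle sweeps a logistic curve between -sigmoid_spacing
    and +sigmoid_spacing, reversing direction every cycle, and the result is
    rescaled to a band of width oscillation_range centred on 0.5.
    """
    period = n_samples // n_cycles
    priors = []
    for cycle in range(n_cycles):
        # Even cycles rise, odd cycles fall, so the prior oscillates.
        if cycle % 2 == 0:
            start, stop = -sigmoid_spacing, sigmoid_spacing
        else:
            start, stop = sigmoid_spacing, -sigmoid_spacing
        for i in range(period):
            x = start + (stop - start) * i / max(period - 1, 1)
            logistic = 1.0 / (1.0 + math.exp(-x))
            priors.append(0.5 + (logistic - 0.5) * oscillation_range)
    return priors


# Values mirror the tuple (2, 5, 0.9); the stream length is arbitrary.
priors = dynamic_class_prior(
    n_samples=1000, n_cycles=2, sigmoid_spacing=5, oscillation_range=0.9
)
print(min(priors), max(priors))
```

Under these assumptions, a larger spacing value makes each transition between the two extremes more abrupt, while a smaller one makes the change closer to linear.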

Mixing drift properties
=======================

Of course, when generating data streams, we don't have to limit ourselves to just one modification of their properties. We can easily prepare a stream with many drifts, any dynamics of changes, a selected type of drift and a diverse, dynamically changing imbalance ratio. The last example in this chapter of the User Guide is one such proposition: DISCO (Dynamically Imbalanced Stream with Concept Oscillation).

.. code-block:: python

    StreamGenerator(
        weights=(2, 5, 0.9), n_drifts=3, concept_sigmoid_spacing=5,
        recurring=True, incremental=True
    )

.. image:: _static/disco.png
Binary file added examples/disco.png
18 changes: 18 additions & 0 deletions examples/plot_stream_plotting.py
@@ -134,3 +134,21 @@ def plot_stream(stream, filename="foo", title=""):
plot_stream(
    stream, "dynamic-imbalanced", "Data stream with dynamically imbalanced drift"
)

##############################################################################
# DISCO
#
# other

stream = sl.streams.StreamGenerator(
    **concept_kwargs,
    weights=(2, 5, 0.9),
    n_drifts=3,
    concept_sigmoid_spacing=5,
    recurring=True,
    incremental=True
)

plot_stream(
    stream, "disco", "Dynamically Imbalanced Stream with Concept Oscillation (DISCO)"
)
