Luminol is a lightweight Python library that supports two functionalities:
- *Anomaly Detection*: Given a time series, detect whether the data contains any anomalies, and return the time window in which each anomaly occurred, the timestamp at which it reaches its peak severity, and a score indicating how severe it is compared to the others in the time series.

- *Correlation*: Given two time series, help find their correlation coefficient. Because the correlation mechanism allows for some shift room, you can correlate two peaks that are slightly apart in time.
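
To make the idea of "shift room" concrete, here is a minimal sketch in plain Python. It is not Luminol's implementation: the helpers `pearson` and `best_shifted_correlation`, the `max_shift` parameter, and the two toy series are all assumptions made for illustration. The sketch slides one series against the other and keeps the shift with the highest Pearson coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def best_shifted_correlation(a, b, max_shift):
    """Try every shift of b in [-max_shift, max_shift] (hypothetical helper).

    Returns (coefficient, shift) for the shift with the highest coefficient.
    A positive shift means b lags behind a by that many steps.
    """
    best = (-1.0, 0)
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            xs, ys = a[:len(a) - shift], b[shift:]
        else:
            xs, ys = a[-shift:], b[:len(b) + shift]
        if len(xs) < 2:
            continue  # not enough overlap to correlate
        coeff = pearson(xs, ys)
        if coeff > best[0]:
            best = (coeff, shift)
    return best

# Toy data: the same peak, two steps apart in time.
peak_a = [0, 0, 1, 3, 1, 0, 0, 0]
peak_b = [0, 0, 0, 0, 1, 3, 1, 0]

coeff, shift = best_shifted_correlation(peak_a, peak_b, max_shift=3)
print(round(coeff, 3), shift)  # 1.0 2
```

Without any shift the two toy series are negatively correlated, because their peaks do not overlap. Allowing a shift of two steps lines the peaks up.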

In [1]:
import pandas as pd
import numpy as np
import time
from luminol.anomaly_detector import AnomalyDetector
from luminol.correlator import Correlator

In [2]:
# Basic example of calculating anomaly scores

ts = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}

my_detector = AnomalyDetector(ts)
score = my_detector.get_all_scores()
for timestamp, value in score.iteritems():
    print(timestamp, value)

0 0.0
1 0.8731282501307988
2 1.5716308502354377
3 2.1363368633427995
4 1.70906949067424
5 2.905418134146207
6 1.1715411093483696
7 0.9372328874786957
8 0.7497863099829566
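
As a rough illustration of how per-timestamp scores could be turned into a time window and a peak timestamp, here is a minimal sketch in plain Python. It is not Luminol's own windowing logic: `anomaly_windows`, `_summarize`, and the threshold of 1.5 are assumptions made for illustration. The sketch copies the scores printed above and groups consecutive timestamps whose score exceeds the threshold.

```python
def _summarize(points):
    """Collapse a run of (timestamp, score) pairs into one window tuple."""
    peak_ts, peak_score = max(points, key=lambda p: p[1])
    return (points[0][0], points[-1][0], peak_ts, peak_score)

def anomaly_windows(scores, threshold):
    """Group consecutive timestamps whose score exceeds `threshold`.

    Returns a list of (start, end, peak_timestamp, peak_score) tuples.
    This is a hypothetical helper, not part of Luminol.
    """
    windows = []
    current = []  # (timestamp, score) pairs in the window being built
    for timestamp in sorted(scores):
        value = scores[timestamp]
        if value > threshold:
            current.append((timestamp, value))
        elif current:
            windows.append(_summarize(current))
            current = []
    if current:
        windows.append(_summarize(current))
    return windows

# Scores copied from the output above.
scores = {
    0: 0.0,
    1: 0.8731282501307988,
    2: 1.5716308502354377,
    3: 2.1363368633427995,
    4: 1.70906949067424,
    5: 2.905418134146207,
    6: 1.1715411093483696,
    7: 0.9372328874786957,
    8: 0.7497863099829566,
}

print(anomaly_windows(scores, threshold=1.5))
# [(2, 5, 5, 2.905418134146207)]
```

With this assumed threshold, timestamps 2 through 5 form a single window and the score peaks at timestamp 5.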


In [3]:
# Correlating ts1 with ts2 on every anomaly

ts1 = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}
ts2 = {0: 0, 1: 0.5, 2: 1, 3: 0.5, 4: 1, 5: 0, 6: 1, 7: 1, 8: 1}

my_detector = AnomalyDetector(ts1, score_threshold=1.5)
score = my_detector.get_all_scores()
anomalies = my_detector.get_anomalies()
for a in anomalies:
    time_period = a.get_time_window()
    my_correlator = Correlator(ts1, ts2, time_period)
    if my_correlator.is_correlated(threshold=0.8):
        print("ts2 correlate with ts1 at time period (%d, %d)" % time_period)

ts2 correlate with ts1 at time period (2, 5)


In [9]:
a1_benchmark = pd.read_csv(r"C:\Users\Payton Rudnick\Documents\time_series_anomaly_detection\datasets\A1Benchmark\real_1.csv", index_col='timestamp')

In [10]:
a1_benchmark

timestamp,value,is_anomaly
1,0.000000,0
2,0.091758,0
3,0.172297,0
4,0.226219,0
5,0.176358,0
...,...,...
1416,0.159675,0
1417,0.137626,0
1418,0.197441,0
1419,0.161966,0


In [11]:
a1_benchmark['is_anomaly'].value_counts()

0    1418
1       2
Name: is_anomaly, dtype: int64

In [18]:
a2_benchmark = pd.read_csv(r'C:\Users\Payton Rudnick\Documents\time_series_anomaly_detection\datasets\A2Benchmark\synthetic_1.csv')

In [19]:
a2_benchmark['is_anomaly'].value_counts()

0    1417
1       4
Name: is_anomaly, dtype: int64

In [20]:
a2_benchmark

Unnamed: 0,timestamp,value,is_anomaly
0,1416726000,13.894031,0
1,1416729600,33.578274,0
2,1416733200,88.933746,0
3,1416736800,125.389424,0
4,1416740400,152.962000,0
...,...,...,...
1416,1421823600,-141.419766,0
1417,1421827200,-139.657834,0
1418,1421830800,-70.550652,0
1419,1421834400,-16.857148,0


In [21]:
a2_ts = dict(zip(a2_benchmark['timestamp'], a2_benchmark['value']))

In [23]:
# Calculating anomaly scores for the A2 benchmark series

# Create a dictionary mapping each timestamp to its value
a2_ts = dict(zip(a2_benchmark['timestamp'], a2_benchmark['value']))

my_detector = AnomalyDetector(a2_ts)
score = my_detector.get_all_scores()
for timestamp, value in score.iteritems():
    print(timestamp, value)

1416726000 0.0
1416729600 0.0
1416733200 0.15736534287506354
1416736800 0.08155272258447
1416740400 0.08283639550583931
1416744000 0.09525390475111772
1416747600 0.12305390472287678
1416751200 0.12170263756087489
1416754800 0.12322030703197885
1416758400 0.19467872105526723
1416762000 0.15969954813563883
1416765600 0.1354949811491405
1416769200 0.12343397384032302
1416772800 0.13486620857524073
1416776400 0.13600627876270913
1416780000 0.13571413194022194
1416783600 0.14208647277922698
1416787200 0.13391502812537523
1416790800 0.1439893053444283
1416794400 0.15395437287793773
1416798000 0.1362544718287898
1416801600 0.14012861190745987
1416805200 0.1352677878437798
1416808800 0.13909210745383413
1416812400 0.1294818105314619
1416816000 0.13139315955766553
1416819600 0.13377552986263555
1416823200 0.15583313845766084
1416826800 0.10846765981976256
1416830400 0.12829958926621066
1416834000 0.11525313851500527
1416837600 0.11023438422900765
1416841200 0.10934745609138243
1416844800 0.1051

The next step is to find a way to visualize these scores properly, so we can see what Luminol identified as an anomaly. We will then compare the results to the 'is_anomaly' column of the DataFrame.
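
One possible way to do that comparison is sketched below in plain Python. The function `compare_to_labels`, the threshold, and the toy scores and labels are all hypothetical and serve only to show the bookkeeping; they are not results from the benchmark data.

```python
def compare_to_labels(scores, labels, threshold):
    """Compare thresholded anomaly scores against 0/1 ground-truth labels.

    `scores` and `labels` are dicts keyed by timestamp. A timestamp is
    flagged when its score exceeds `threshold` (an assumed rule).
    """
    flagged = {ts for ts, score in scores.items() if score > threshold}
    actual = {ts for ts, label in labels.items() if label == 1}
    true_positives = len(flagged & actual)
    return {
        "true_positives": true_positives,
        "false_positives": len(flagged - actual),
        "false_negatives": len(actual - flagged),
        "precision": true_positives / len(flagged) if flagged else 0.0,
        "recall": true_positives / len(actual) if actual else 0.0,
    }

# Toy inputs for illustration only.
toy_scores = {1: 0.1, 2: 0.2, 3: 2.5, 4: 0.3, 5: 1.8, 6: 0.1}
toy_labels = {1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 1}

result = compare_to_labels(toy_scores, toy_labels, threshold=1.0)
print(result["precision"], result["recall"])  # 0.5 0.5
```

In the notebook, the inputs could presumably be built as `dict(score.iteritems())` and `dict(zip(a2_benchmark['timestamp'], a2_benchmark['is_anomaly']))`, mirroring how `a2_ts` was created. Because the labels are heavily imbalanced (4 anomalies out of 1421 rows), precision and recall are likely to be more informative than plain accuracy.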