# Anomaly Detection <a id="top"></a>

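Find all time intervals `[time1, time2]` where `time2 - time1 < time_thres` and the interval is not nested inside another such interval. An interval is flagged as an anomaly only when it contains more than one distinct `adid`.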

**Testcases**

- [Testcase 1](#testcase1)
- [Testcase 2](#testcase2)
- [Testcase 3](#testcase3)
- [Testcase 4](#testcase4)
- [Testcase 5](#testcase5)
- [Testcase 6](#testcase6)
- [Testcase 7](#testcase7)

In [1]:
from datetime import datetime, timedelta
import pandas as pd

In [12]:
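def detect_weird_timeint(data, time_window):
    '''
    Scan `data` for runs of rows that fit inside `time_window` and contain
    more than one distinct adid.

    Assumes `data` has "time" and "adid" columns, is sorted by time, and is
    indexed 0..len(data)-1, because rows are looked up by index label.
    Returns a DataFrame with one row per anomaly found.
    '''
    start_index = 0
    end_index = 1

    anomalies = pd.DataFrame(columns=["start_index", "end_index", "start_time", "end_time", "num_adid", "adids"])

    while (start_index < end_index and end_index+1 < len(data)):
        print("-"*50)
        print(start_index, end_index)

        # NOTE: despite the message text, this branch runs while the span
        # start_index..end_index still fits inside time_window.
        if (data.time[end_index] - data.time[start_index] <= time_window):
            print("time window exceeded")

            # The next row would no longer fit, so the run cannot grow further.
            if (data.time[end_index+1] - data.time[start_index] > time_window):
                # .loc slicing is inclusive: this covers rows start_index..end_index-1.
                adids = data.loc[start_index:(end_index-1), "adid"].unique()
                num_adid = len(adids)

                if num_adid > 1:
                    print("anomaly found")
                    anomalies.loc[len(anomalies)] = [start_index, end_index-1, data.time[start_index],
                                                    data.time[end_index-1], num_adid, list(adids)]

                start_index += 1

            end_index += 1

        else:
            # Span too long: drop the oldest row.
            start_index += 1

    print(start_index, end_index)

    # Tail handling: end_index reached the last row, so shrink from the left
    # until the remaining rows fit inside time_window.
    if end_index == len(data)-1:
        print("end of data")
        while (start_index < end_index):
            if (data.time[end_index] - data.time[start_index] <= time_window):
                adids = data.loc[start_index:(end_index), "adid"].unique()
                num_adid = len(adids)

                if num_adid > 1:
                    print("anomaly found")
                    anomalies.loc[len(anomalies)] = [start_index, end_index, data.time[start_index],
                                                    data.time[end_index], num_adid, list(adids)]
                return anomalies
            # Without this step the loop would never end when the span is too long.
            start_index += 1

    return anomalies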
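
As a point of comparison, here is a minimal two-pointer sketch of the task stated at the top. It is not the notebook's function: `find_maximal_windows` is a hypothetical name, and it rests on two assumptions. First, that "not nested" means a window is dropped when it lies inside the window found for the previous start row. Second, that the comparison is `<=`, as in the cell above. It works on row positions, so it only assumes the frame is sorted by `time`.

```python
from datetime import datetime, timedelta

import pandas as pd


def find_maximal_windows(data, time_window):
    """Hypothetical sketch: maximal windows with more than one distinct adid.

    Assumes `data` is sorted by "time". Reported indices are row positions.
    """
    times = list(data["time"])
    adids = list(data["adid"])
    columns = ["start_index", "end_index", "start_time", "end_time", "num_adid", "adids"]
    rows = []
    end = 0
    prev_end = -1  # furthest row reached by the previous start row
    for start in range(len(times)):
        end = max(end, start)
        # Grow the window while the next row still fits inside time_window.
        while end + 1 < len(times) and times[end + 1] - times[start] <= time_window:
            end += 1
        # If `end` did not advance, this window lies inside the previous one.
        if end > prev_end:
            unique = list(dict.fromkeys(adids[start:end + 1]))  # distinct, order kept
            if len(unique) > 1:
                rows.append([start, end, times[start], times[end], len(unique), unique])
        prev_end = end
    return pd.DataFrame(rows, columns=columns)


# Made-up data: rows 4 minutes apart, so consecutive windows overlap without nesting.
demo = pd.DataFrame({
    "time": [datetime(2020, 1, 1) + timedelta(minutes=m) for m in [0, 4, 8, 12]],
    "adid": ["A", "B", "C", "D"],
})
result = find_maximal_windows(demo, timedelta(minutes=10))
print(result[["start_index", "end_index", "num_adid"]])
```

On the made-up frame this reports rows 0 to 2 and rows 1 to 3: the two windows overlap, but neither contains the other.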

In [3]:
START_TIME = datetime(2020,1,1)
time_thres = timedelta(minutes=10)
ADIDS = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M"]
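
The testcases in this notebook build their frames one row at a time with `.loc`. As an alternative sketch, the same kind of frame can be assembled from lists in one step. `make_cluster` is a hypothetical helper, not part of the notebook, and the lowercase constants simply mirror the ones defined above so the block runs on its own.

```python
from datetime import datetime, timedelta

import pandas as pd


def make_cluster(start, n, step, adids):
    """Hypothetical helper: n rows spaced `step` apart, starting at `start`."""
    return pd.DataFrame({
        "time": [start + i * step for i in range(n)],
        "adid": list(adids[:n]),
    })


start_time = datetime(2020, 1, 1)
time_thres = timedelta(minutes=10)
adids = ["A", "B", "C", "D", "E"]

# Two clusters of five rows, one minute apart, separated by a 31-minute offset.
two_clusters = (
    pd.concat([
        make_cluster(start_time, 5, time_thres / 10, adids),
        make_cluster(start_time + timedelta(minutes=31), 5, time_thres / 10, adids),
    ])
    .sort_values(by="time")
    .reset_index(drop=True)  # keep index labels equal to row positions
)
print(two_clusters)
```

Building from lists gives the `time` column a datetime dtype up front, and the final `reset_index(drop=True)` keeps labels and positions aligned after sorting.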

## Testcase 1 <a id="testcase1"></a>

5 distinct adid within `time_thres`

[Back to top](#top)

In [4]:
testcase1_df = pd.DataFrame(columns=["time", "adid"])

for i in range(5):
    testcase1_df.loc[i,:] = [START_TIME + i*time_thres/10, ADIDS[i]]
    
testcase1_df = testcase1_df.sort_values(by="time")
testcase1_df

| | time | adid |
|---|---|---|
| 0 | 2020-01-01 00:00:00 | A |
| 1 | 2020-01-01 00:01:00 | B |
| 2 | 2020-01-01 00:02:00 | C |
| 3 | 2020-01-01 00:03:00 | D |
| 4 | 2020-01-01 00:04:00 | E |


In [13]:
detect_weird_timeint(testcase1_df, timedelta(minutes=10))

--------------------------------------------------
0 1
time window exceeded
--------------------------------------------------
0 2
time window exceeded
--------------------------------------------------
0 3
time window exceeded
0 4
end of data
anomaly found


| | start_index | end_index | start_time | end_time | num_adid | adids |
|---|---|---|---|---|---|---|
| 0 | 0 | 4 | 2020-01-01 | 2020-01-01 00:04:00 | 5 | [A, B, C, D, E] |


## Testcase 2 <a id="testcase2"></a>

2 sets of 5 distinct adid, each within `time_thres`, but more than `time_thres` apart

[Back to top](#top)

In [6]:
testcase2_df = pd.DataFrame(columns=["time", "adid"])

for i in range(5):
    testcase2_df.loc[i,:] = [START_TIME + i*time_thres/10, ADIDS[i]]
    
for i in range(5):
    testcase2_df.loc[5+i,:] = [START_TIME + timedelta(minutes=31) + i*time_thres/10, ADIDS[i]]
    
testcase2_df = testcase2_df.sort_values(by="time")
testcase2_df

| | time | adid |
|---|---|---|
| 0 | 2020-01-01 00:00:00 | A |
| 1 | 2020-01-01 00:01:00 | B |
| 2 | 2020-01-01 00:02:00 | C |
| 3 | 2020-01-01 00:03:00 | D |
| 4 | 2020-01-01 00:04:00 | E |
| 5 | 2020-01-01 00:31:00 | A |
| 6 | 2020-01-01 00:32:00 | B |
| 7 | 2020-01-01 00:33:00 | C |
| 8 | 2020-01-01 00:34:00 | D |
| 9 | 2020-01-01 00:35:00 | E |


In [14]:
detect_weird_timeint(testcase2_df, timedelta(minutes=10))

--------------------------------------------------
0 1
time window exceeded
--------------------------------------------------
0 2
time window exceeded
--------------------------------------------------
0 3
time window exceeded
--------------------------------------------------
0 4
time window exceeded
anomaly found
--------------------------------------------------
1 5
--------------------------------------------------
2 5
--------------------------------------------------
3 5
--------------------------------------------------
4 5
5 5


| | start_index | end_index | start_time | end_time | num_adid | adids |
|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 2020-01-01 | 2020-01-01 00:03:00 | 4 | [A, B, C, D] |
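
The description of this testcase asks for two sets of five, while the result above lists a single interval over indices 0 to 3. One way to cross-check what a detector should return is a brute-force enumeration. The sketch below assumes that "not nested" means "not contained in any other qualifying interval" and uses `<=` as the function does. `brute_force_intervals` is a hypothetical name, and the timestamps are retyped from the table above.

```python
from datetime import datetime, timedelta


def brute_force_intervals(times, window):
    """Hypothetical O(n^2) reference: all non-nested index pairs within `window`.

    Assumes `times` is sorted in ascending order.
    """
    n = len(times)
    # Every pair of rows whose time span fits inside the window.
    candidates = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if times[j] - times[i] <= window
    ]

    def inside(a, b):
        # True when interval a lies within a different interval b.
        return a != b and b[0] <= a[0] and a[1] <= b[1]

    return [c for c in candidates if not any(inside(c, other) for other in candidates)]


start = datetime(2020, 1, 1)
times = [start + timedelta(minutes=m) for m in [0, 1, 2, 3, 4, 31, 32, 33, 34, 35]]
expected = brute_force_intervals(times, timedelta(minutes=10))
print(expected)  # [(0, 4), (5, 9)]
```

Under those assumptions the reference answer is two intervals, rows 0 to 4 and rows 5 to 9, which gives a concrete target for the remaining testcases as well.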


## Testcase 3 <a id="testcase3"></a>

5 non-distinct adid within `time_thres`

[Back to top](#top)

In [None]:
testcase3_df = pd.DataFrame(columns=["time", "adid"])

for i in range(5):
    testcase3_df.loc[i,:] = [START_TIME + i*time_thres/10, ADIDS[i%3]]
    
testcase3_df = testcase3_df.sort_values(by="time")
testcase3_df

In [None]:
detect_weird_timeint(testcase3_df, timedelta(minutes=10))

## Testcase 4 <a id="testcase4"></a>

2 sets of 5 non-distinct adid, each within `time_thres`, but more than `time_thres` apart

[Back to top](#top)

In [None]:
testcase4_df = pd.DataFrame(columns=["time", "adid"])

for i in range(5):
    testcase4_df.loc[i,:] = [START_TIME + i*time_thres/10, ADIDS[i%3]]
    
for i in range(5):
    testcase4_df.loc[5+i,:] = [START_TIME + timedelta(minutes=31) + i*time_thres/10, ADIDS[i%3]]
    
testcase4_df = testcase4_df.sort_values(by="time")
testcase4_df

In [None]:
detect_weird_timeint(testcase4_df, timedelta(minutes=10))

## Testcase 5 <a id="testcase5"></a>

2 sets of 5 distinct adid, each within `time_thres`, but at most `time_thres` apart

[Back to top](#top)

In [None]:
testcase5_df = pd.DataFrame(columns=["time", "adid"])

for i in range(5):
    testcase5_df.loc[i,:] = [START_TIME + i*time_thres/10, ADIDS[i]]
    
for i in range(5):
    testcase5_df.loc[5+i,:] = [testcase5_df.time[2] + timedelta(seconds=5) + i*timedelta(seconds=177), ADIDS[i+5]]
    
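# Sorting interleaves the two sets, so reset the index to keep labels equal to row positions.
testcase5_df = testcase5_df.sort_values(by="time").reset_index(drop=True)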
testcase5_df
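
The second loop in this cell starts at `time[2]` plus five seconds, so its first timestamp falls between rows 2 and 3 of the first set. `sort_values` then reorders the rows, but each row keeps its original index label. The sketch below uses a made-up three-row frame to show how labels and positions drift apart after sorting, and how `reset_index(drop=True)` realigns them. This matters for any code that, like `detect_weird_timeint`, looks rows up by integer label.

```python
from datetime import datetime

import pandas as pd

# Made-up frame: the last row is inserted out of time order.
df = pd.DataFrame({
    "time": [
        datetime(2020, 1, 1, 0, 0, 0),
        datetime(2020, 1, 1, 0, 3, 0),
        datetime(2020, 1, 1, 0, 2, 5),
    ],
    "adid": ["A", "D", "F"],
})

sorted_df = df.sort_values(by="time")
labels_after_sort = list(sorted_df.index)
print(labels_after_sort)  # [0, 2, 1]: labels no longer match row positions

# Label 1 still refers to the 00:03:00 row, which is now last by position.
print(sorted_df.loc[1, "time"])

reset_df = sorted_df.reset_index(drop=True)
# After the reset, label 1 is the second row in time order (00:02:05).
print(reset_df.loc[1, "time"])
```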

In [None]:
detect_weird_timeint(testcase5_df, timedelta(minutes=10))

## Testcase 6 <a id="testcase6"></a>

2 sets of 2 non-distinct adid, each within `time_thres`, but at most `time_thres` apart

[Back to top](#top)

In [None]:
testcase6_df = pd.DataFrame(columns=["time", "adid"])

for i in range(5):
    testcase6_df.loc[i,:] = [START_TIME + i*time_thres/10, ADIDS[i%2]]
    
for i in range(5):
    testcase6_df.loc[5+i,:] = [testcase6_df.time[2] + timedelta(seconds=5) + i*timedelta(seconds=207), ADIDS[1]]

    
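# Sorting interleaves the two sets, so reset the index to keep labels equal to row positions.
testcase6_df = testcase6_df.sort_values(by="time").reset_index(drop=True)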
testcase6_df

In [None]:
detect_weird_timeint(testcase6_df, timedelta(minutes=10))

## Testcase 7 <a id="testcase7"></a>

First 5 adid each more than `time_thres` apart, followed by 5 adid all within `time_thres`

[Back to top](#top)

In [None]:
testcase7_df = pd.DataFrame(columns=["time", "adid"])

for i in range(5):
    testcase7_df.loc[i,:] = [START_TIME + i*(time_thres + timedelta(seconds=37)), ADIDS[i]]
    
for i in range(5):
    testcase7_df.loc[i+5,:] = [testcase7_df.time[4] + timedelta(minutes=31) + i*time_thres/10, ADIDS[i]]
    
testcase7_df = testcase7_df.sort_values(by="time")
testcase7_df

In [None]:
detect_weird_timeint(testcase7_df, timedelta(minutes=10))