## Example of anom_detect Usage

Below I use the commonly used sunspots dataset to show some features of the anomaly detection library, especially its plotting functionality.

To run the example yourself, first download the data set using the link in the commented-out command below.

In [1]:
from anom_detect import anom_detect
import pandas as pd
%matplotlib inline

### Load data set into Pandas

In [2]:
#!wget -c http://www-personal.umich.edu/~mejn/cp/data/sunspots.txt -P .

In [3]:
# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it here.
df = pd.read_csv('sunspots.txt', sep='\t', header=None, index_col=0)
df.index.name = 'time'
df.columns = ['sunspots']

In [None]:
df.head()

### Evaluate for Anomalies

anom_detect accepts a number of options. A short description of each follows:
- method : The data filtering method. For the moment only 'average' is available, representing the moving average method. More data modelling techniques will be implemented in the future.
- max_outliers : Defaults to None, which sets the maximum number of outliers to the size of your data set. Limit this for more efficient computation.
- window : The window size for the moving average. Defaults to 5.
- alpha : The significance level used for the ESD test.
- mode : The method used in the discrete linear convolution for dealing with boundaries. Please read the separate documentation. The default is 'same', which means the averaging window must overlap the data points over a length greater than len(window)/2.
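To make the window and mode options more concrete, here is a minimal sketch of a moving average computed with numpy.convolve. It is not taken from the anom_detect source. The helper name moving_average and the toy input are assumptions made purely for illustration.

```python
import numpy as np

def moving_average(values, window=5, mode="same"):
    """Smooth a 1-D series with a uniform window via discrete linear convolution.

    Hypothetical helper for illustration; not part of anom_detect.
    """
    kernel = np.ones(window) / window  # equal weight for every point in the window
    return np.convolve(values, kernel, mode=mode)

data = np.arange(10, dtype=float)  # toy series 0, 1, ..., 9

# 'same' keeps the input length; points near the edges are averaged
# against implicit zeros, so they are biased towards zero.
smoothed_same = moving_average(data, window=5, mode="same")

# 'valid' keeps only positions where the window fully overlaps the data.
smoothed_valid = moving_average(data, window=5, mode="valid")

print(len(smoothed_same), len(smoothed_valid))
```

With 'same' the smoothed series lines up with the original index, which is convenient for plotting, at the cost of less reliable values in the first and last window/2 points.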

In [None]:
# Use default values
an = anom_detect()

In [None]:
# Find the anomalies and print them
an.evaluate(df)

In [None]:
an.plot()

In [None]:
an.plot(left=200,right=400,top=200,bottom=0)

### Accessing data

In [None]:
# The graph values can be accessed using 'results'.
an.results.head()

In [None]:
# Anomalous data points can be printed from anoma_points.
an.anoma_points.head()

### Check Normality of Residual

To use the ESD test, the quantity being tested must be approximately normally distributed. You can use the normality function to check this with two plots.
In this implementation a residual is calculated between the approximated curve (in this case the 5-point moving average) and the actual data:

<b>residual = (actual data point) - (estimated value from moving average)</b>
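The residual and the ESD test applied to it can be sketched as follows. This is an illustrative implementation of the generalized ESD procedure (Rosner's test) on synthetic data, not the code used inside anom_detect. The function name generalized_esd, the synthetic series, and the injected spike are all assumptions.

```python
import numpy as np
from scipy import stats

def generalized_esd(x, max_outliers, alpha=0.05):
    """Return indices of outliers found by the generalized ESD test.

    Hypothetical helper for illustration; not part of anom_detect.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    remaining = np.arange(n)  # indices still in the sample
    candidates = []           # most extreme index removed at each step
    n_outliers = 0
    for i in range(1, max_outliers + 1):
        vals = x[remaining]
        dev = np.abs(vals - vals.mean())
        j = int(np.argmax(dev))
        r_i = dev[j] / vals.std(ddof=1)  # test statistic R_i
        candidates.append(int(remaining[j]))
        remaining = np.delete(remaining, j)
        # Critical value lambda_i from the t distribution
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))
        if r_i > lam:
            n_outliers = i  # keep the largest i with R_i > lambda_i
    return candidates[:n_outliers]

# Synthetic stand-in for real data: a smooth curve plus noise and one spike.
rng = np.random.default_rng(0)
t = np.arange(200)
actual = 10 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.5, size=200)
actual[120] += 30  # injected anomaly

estimated = np.convolve(actual, np.ones(5) / 5, mode="same")  # moving average
residual = actual - estimated

outliers = generalized_esd(residual, max_outliers=10, alpha=0.05)
print(outliers)
```

Because the spike also inflates the moving average of its neighbours, points next to a large anomaly can be flagged as well. Capping max_outliers keeps both the run time and the number of such secondary detections down.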

The plots are simple and qualitative checks for normality:
- <b>Distribution of residuals</b> : a histogram of the residuals in 100 bins.
- <b>Probability plot</b> : plots the actual data against its corresponding normal value approximation (uses scipy.stats.probplot). A perfectly normal data set would lie along the straight line.
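The quantities behind these two plots can be computed directly, which is handy if you want a number to go with the visual check. The sketch below uses randomly generated values as a stand-in for real residuals. The variable names are assumptions, and this is not the body of the normality function.

```python
import numpy as np
from scipy import stats

# Stand-in residuals; in practice these would come from your own data.
rng = np.random.default_rng(0)
residual = rng.normal(loc=0.0, scale=1.0, size=1000)

# Histogram of the residuals in 100 bins
counts, bin_edges = np.histogram(residual, bins=100)

# Probability plot data: ordered residuals against theoretical normal
# quantiles, plus a least-squares line through them.
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(
    residual, dist="norm"
)

# r close to 1 means the points lie close to the straight line.
print(f"probability plot correlation r = {r:.4f}")
```

Passing a matplotlib axes object through the plot argument of scipy.stats.probplot draws the probability plot instead of only returning the values.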

In [None]:
an.normality()