<div class="alert alert-block alert-info">
    <h1>Time Series Analysis</h1>
    <h3>Class 8 - PYtimeTK - Anomalies</h3>
    <h3>Exercise 4 (asynchronous, optional)</h3>
        <p>Instructor: Rodrigo Del Rosso</p>
        <p>Assistants: Sebastián Calcagno, Drago Braian</p>
</div>

**Modeltime: The Tidymodels Extension for Time Series Modeling**  

The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages.

https://business-science.github.io/modeltime/

**Exercise objective**: Introduction to the Modeltime library for R  

1- General tools  
2- Applied example

# **PYtimeTK**

https://business-science.github.io/pytimetk/guides/06_anomalize.html

In [None]:
!pip install pytimetk

In [None]:
import pytimetk as tk
import pandas as pd

df = tk.load_dataset('bike_sales_sample')
df['order_date'] = pd.to_datetime(df['order_date'])

df

Unnamed: 0,order_id,order_line,order_date,quantity,price,total_price,model,category_1,category_2,frame_material,bikeshop_name,city,state
0,1,1,2011-01-07,1,6070,6070,Jekyll Carbon 2,Mountain,Over Mountain,Carbon,Ithaca Mountain Climbers,Ithaca,NY
1,1,2,2011-01-07,1,5970,5970,Trigger Carbon 2,Mountain,Over Mountain,Carbon,Ithaca Mountain Climbers,Ithaca,NY
2,2,1,2011-01-10,1,2770,2770,Beast of the East 1,Mountain,Trail,Aluminum,Kansas City 29ers,Kansas City,KS
3,2,2,2011-01-10,1,5970,5970,Trigger Carbon 2,Mountain,Over Mountain,Carbon,Kansas City 29ers,Kansas City,KS
4,3,1,2011-01-10,1,10660,10660,Supersix Evo Hi-Mod Team,Road,Elite Road,Carbon,Louisville Race Equipment,Louisville,KY
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2461,321,3,2011-12-22,1,1410,1410,CAAD8 105,Road,Elite Road,Aluminum,Miami Race Equipment,Miami,FL
2462,322,1,2011-12-28,1,1250,1250,Synapse Disc Tiagra,Road,Endurance Road,Aluminum,Phoenix Bi-peds,Phoenix,AZ
2463,322,2,2011-12-28,1,2660,2660,Bad Habit 2,Mountain,Trail,Aluminum,Phoenix Bi-peds,Phoenix,AZ
2464,322,3,2011-12-28,1,2340,2340,F-Si 1,Mountain,Cross Country Race,Aluminum,Phoenix Bi-peds,Phoenix,AZ


Using summarize_by_time() for a Sales Analysis: The result is the total revenue for Mountain and Road bikes by month.

In [None]:
summary_category_1_df = df \
    .groupby("category_1") \
    .summarize_by_time(
        date_column  = 'order_date',
        value_column = 'total_price',
        freq         = "MS",
        agg_func     = 'sum',
        wide_format  = False
    )

# First 5 rows shown
summary_category_1_df.head()

Unnamed: 0,category_1,order_date,total_price
0,Mountain,2011-01-01,221490
1,Mountain,2011-02-01,660555
2,Mountain,2011-03-01,358855
3,Mountain,2011-04-01,1075975
4,Mountain,2011-05-01,450440
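
To make the aggregation step more concrete, the sketch below computes the same kind of monthly sum per category with plain pandas. The small `toy_df` frame is made up for illustration (it is not the `bike_sales_sample` data), and `pd.Grouper` is used here as a stand-in. It is not meant to describe how `summarize_by_time()` is implemented internally.

```python
import pandas as pd

# Hypothetical toy data (not the bike_sales_sample dataset)
toy_df = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2011-01-07", "2011-01-10", "2011-01-25", "2011-02-03", "2011-02-14"]
    ),
    "category_1": ["Mountain", "Road", "Mountain", "Mountain", "Road"],
    "total_price": [6070, 10660, 2770, 5970, 1410],
})

# Group by category and by calendar month ("MS" = month start), then sum
monthly_df = (
    toy_df
    .groupby(["category_1", pd.Grouper(key="order_date", freq="MS")])["total_price"]
    .sum()
    .reset_index()
)

print(monthly_df)
```

The result is in long format (one row per category and month), which corresponds to `wide_format = False` in the call above.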


Visualizing Sales Patterns

In [None]:
summary_category_1_df \
    .groupby('category_1') \
    .plot_timeseries(
        date_column  = 'order_date',
        value_column = 'total_price',
        smooth_frac  = 0.8
    )

**Anomaly Detection**

**m4_daily**: a sample of 4 daily-frequency time series taken from the M4 competition.

The Makridakis Competitions are a series of open competitions to evaluate and compare the accuracy of different time series forecasting methods.

Next we load the data, select an ID of interest, and plot the series:

In [None]:
# libraries
import pytimetk as tk
import pandas as pd
import numpy as np

# Import Data
m4_daily_df = tk.load_dataset('m4_daily', parse_dates = ['date'])

In [None]:
# Data filtering
df = (
    m4_daily_df
        .query("id == 'D10'")
        .query("date.dt.year == 2015")
)

In [None]:
# Plot data
tk.plot_timeseries(
    data         = df,
    date_column  = 'date',
    value_column = 'value'
)

As a first step, we run a seasonal decomposition and compute the remainders using `anomalize()`.

**iqr_alpha** controls the threshold for detecting outliers. It is the significance level used in the interquartile range (IQR) method for outlier detection. The default is 0.05, which corresponds to a 5% significance level. A lower significance level yields a higher threshold, so fewer outliers are detected; a higher significance level yields a lower threshold, so more outliers are detected.
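
The inverse relationship between alpha and the threshold can be sketched with a few lines of plain Python. The sketch assumes the convention used by R's anomalize package, where the limits sit `0.15 / alpha` IQRs beyond the first and third quartiles (so alpha = 0.05 gives 3 IQRs). Whether pytimetk uses exactly this formula internally is an assumption here, and the helper names `quantile` and `iqr_limits` and the `remainders` values are hypothetical.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def iqr_limits(values, alpha=0.05):
    """Lower/upper outlier limits, ASSUMING factor = 0.15 / alpha."""
    ordered = sorted(values)
    q1 = quantile(ordered, 0.25)
    q3 = quantile(ordered, 0.75)
    factor = 0.15 / alpha          # smaller alpha -> wider limits
    spread = factor * (q3 - q1)
    return q1 - spread, q3 + spread


# Hypothetical remainders: mostly small, with one large spike
remainders = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 14.0]

for alpha in (0.025, 0.05, 0.1):
    lower, upper = iqr_limits(remainders, alpha)
    flagged = [r for r in remainders if r < lower or r > upper]
    print(f"alpha={alpha}: limits=({lower:.1f}, {upper:.1f}), flagged={flagged}")
```

With these made-up values, the spike at 14.0 is flagged at alpha = 0.1 but not at alpha = 0.05 or 0.025, which matches the description above: raising alpha lowers the threshold and flags more points.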


In [None]:
# Anomalize
anomalize_df = tk.anomalize(
    data          = df,
    date_column   = 'date',
    value_column  = 'value',
    period        = 7,
    iqr_alpha     = 0.05, # using the default
    clean_alpha   = 0.75, # using the default
    clean         = "min_max"
)

anomalize_df.glimpse()

<class 'pandas.core.frame.DataFrame'>: 365 rows of 12 columns
date:               datetime64[ns]    [Timestamp('2015-01-01 00:00:00'), ...
observed:           float16           [2352.0, 2302.0, 2300.0, 2342.0, 2 ...
seasonal:           float16           [14.6953125, -18.203125, -32.78125 ...
seasadj:            float16           [2338.0, 2320.0, 2332.0, 2346.0, 2 ...
trend:              float16           [2324.0, 2324.0, 2322.0, 2322.0, 2 ...
remainder:          float16           [13.2109375, -2.990234375, 10.4765 ...
anomaly:            object            ['No', 'No', 'No', 'No', 'No', 'No ...
anomaly_score:      float16           [19.65625, 35.84375, 22.390625, 9. ...
anomaly_direction:  int8              [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
recomposed_l1:      float16           [2178.0, 2144.0, 2128.0, 2158.0, 2 ...
recomposed_l2:      float16           [2566.0, 2532.0, 2516.0, 2544.0, 2 ...
observed_clean:     float16           [2352.0, 2302.0, 2300.0, 2342.0, 2 ...
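
The columns above follow an additive decomposition: in the first row, seasonal + trend + remainder (about 14.70 + 2324 + 13.21) is approximately the observed value of 2352, and `seasadj` is the observed value minus the seasonal component. The sketch below illustrates that relationship with a naive classical decomposition (moving-average trend, per-weekday seasonal means) on made-up data. It is a simplified illustration, not a description of the method pytimetk uses internally; all data and variable names are hypothetical.

```python
import numpy as np

period = 7                      # weekly seasonality, as in the anomalize() call
n = 10 * period
t = np.arange(n)

# Hypothetical data: linear trend + repeating weekly pattern + one injected spike
weekly_pattern = np.array([3.0, 1.0, -2.0, -4.0, 0.0, 1.0, 1.0])  # sums to zero
observed = 100.0 + 0.5 * t + weekly_pattern[t % period]
spike_index = 33
observed[spike_index] += 25.0

# Trend: centered moving average over one full period (undefined at the edges)
half = period // 2
trend = np.full(n, np.nan)
trend[half:n - half] = np.convolve(observed, np.ones(period) / period, mode="valid")

# Seasonal: average detrended value for each position in the week, centered at zero
detrended = observed - trend
seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
seasonal_means -= seasonal_means.mean()
seasonal = seasonal_means[t % period]

# Remainder is whatever trend and seasonality do not explain
remainder = observed - trend - seasonal
seasadj = observed - seasonal   # seasonally adjusted series

print("largest |remainder| at index:", int(np.nanargmax(np.abs(remainder))))
```

In this toy series the largest remainder falls on the injected spike, which is why the anomaly detection step works on the remainder rather than on the raw values.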


We plot the seasonal decomposition to inspect its components:

In [None]:
# Plot seasonal decomposition
tk.plot_anomalies_decomp(
    data        = anomalize_df,
    date_column = 'date',
    engine      = 'plotly',
    title       = 'Seasonal Decomposition'
)

We plot the anomalies using `tk.plot_anomalies()`:

In [None]:
# Plot anomalies
tk.plot_anomalies(
    data        = anomalize_df,
    date_column = 'date',
    engine      = 'plotly',
    title       = 'Plot Anomaly Bands'
)

Finally, we look at the plot of the cleaned data using `plot_anomalies_cleaned()`:

In [None]:
# Plot cleaned anomalies
tk.plot_anomalies_cleaned(
    data        = anomalize_df,
    date_column = 'date'
)