<div style="text-align:center"><span style="color:#800000; font-family:Georgia; font-size:2.4em;"> Proyecto de Aplicación Profesional </span></div>

<img style="float: right; margin: auto;" src="https://www.ambulante.org/wp-content/uploads/2019/03/logos_web_Jalisco_ITESOJesuita.png" width="350" height="220" />

> <b> <p style = "font-family: Palatino; font-size:0.8em; color: #008080;" > Violeta | Mariana </p> </b>

<i> <p style = "font-family: Calibri Light; font-size:1.1em;color:black;"> Time Series Characteristics </p> </i>

In [5]:
import pandas as pd
import funciones as fn

# Time Series
A time series is a sequence of data points, observations, or values measured at specific moments and ordered chronologically. The data may be spaced at equal or unequal intervals.
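As a small illustration of the definition above, the sketch below builds one evenly spaced and one unevenly spaced series with pandas. The values, the timestamps, and the helper name `spacing` are made up for the example and are not part of the project data.

```python
import pandas as pd

# Evenly spaced: one observation per hour (illustrative values only)
start = pd.Timestamp("2019-07-06 00:00:00", tz="GMT")
even_index = pd.date_range(start, periods=4, freq=pd.Timedelta(hours=1))
even = pd.Series([1.1227, 1.1229, 1.1227, 1.1221], index=even_index)

# Unevenly spaced: the same values observed at irregular moments
uneven_index = pd.to_datetime(
    ["2019-07-06 00:00", "2019-07-06 00:20", "2019-07-06 02:00", "2019-07-06 02:05"]
).tz_localize("GMT")
uneven = pd.Series([1.1227, 1.1229, 1.1227, 1.1221], index=uneven_index)

def spacing(series):
    """Return the distinct gaps between consecutive timestamps (hypothetical helper)."""
    return series.index.to_series().diff().dropna().unique()

# Both series are in chronological order; only the first has a single gap size
print(even.index.is_monotonic_increasing, len(spacing(even)))
print(uneven.index.is_monotonic_increasing, len(spacing(uneven)))
```

Both are valid time series. What they share is the chronological ordering, not the regular spacing.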

In [8]:
# Time series: example

OA_Ak = '800f1b3f91d7cb0a713c532e17823f6d-f9acd6a21490f97aef649dfd8e723435'
OA_In = "EUR_USD"                   # Instrument
OA_Gn = "H1"                        # Granularity
fini = pd.to_datetime("2019-07-06 00:00:00").tz_localize('GMT')  # Start date
ffin = pd.to_datetime("2019-12-06 00:00:00").tz_localize('GMT')  # End date

# Download prices in bulk
df = fn.f_precios_masivos(p0_fini=fini, p1_ffin=ffin, p2_gran=OA_Gn, p3_inst=OA_In, p4_oatk=OA_Ak, p5_ginc=4900)

pd.set_option('display.max_rows', 10)


serie = df['Close']
serie

0       1.12275
1       1.12288
2       1.12268
3       1.12212
4       1.12276
         ...   
2616    1.11040
2617    1.11030
2618    1.11065
2619    1.11076
2620    1.11077
Name: Close, Length: 2621, dtype: float64

In [36]:
serie.describe()

count    2621.000000
mean        1.108401
std         0.008511
min         1.088060
25%         1.102030
50%         1.107700
75%         1.114360
max         1.128020
Name: Close, dtype: float64

# Time Series Analysis

- Descriptive - "seeks to summarize a characteristic of a set of data"
- Exploratory - "analyze the data to see if there are patterns, trends, or relationships between variables" (hypothesis generating)
- Inferential - "a restatement of this proposed hypothesis as a question and would be answered by analyzing a different set of data" (hypothesis testing)
- Predictive - "determine the impact on one factor based on other factor in a population - to make a prediction"
- Causal - "asks whether changing one factor will change another factor in a population - to establish a causal link"
- Mechanistic - "establish how the change in one factor results in change in another factor in a population - to determine the exact mechanism"

[Link](https://github.com/rouseguy/TimeSeriesAnalysiswithPython/blob/master/time_series/1-Frame.ipynb)

## Stationary
Augmented Dickey-Fuller (ADF) test
>H0: The series is not stationary (it has a unit root)
>
>H1: The series is stationary

Fail to reject H0 if: 
p_value > 0.05


[Link](https://machinelearningmastery.com/time-series-data-stationary-python/)

In [21]:
# Stationarity Test
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def Stationarity(serie):
    result = adfuller(serie)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))

Stationarity(serie)

ADF Statistic: -2.466725
p-value: 0.123782
Critical Values:
	1%: -3.433
	5%: -2.863
	10%: -2.567
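To make the decision rule explicit, here is a minimal sketch that applies it to the numbers printed above. The function name `adf_decision` and the default `alpha=0.05` are assumptions for illustration, not part of statsmodels.

```python
def adf_decision(adf_stat, p_value, critical_values, alpha=0.05):
    """Summarize an ADF result (hypothetical helper). H0: the series has a unit root."""
    # H0 is rejected only when the p-value is at or below the chosen level
    reject_h0 = p_value <= alpha
    # Equivalent view: the statistic must be more negative than the critical value
    below_critical = {level: adf_stat < crit for level, crit in critical_values.items()}
    return {"reject_h0": reject_h0, "stat_below_critical": below_critical}

# Values copied from the output above
result = adf_decision(
    adf_stat=-2.466725,
    p_value=0.123782,
    critical_values={"1%": -3.433, "5%": -2.863, "10%": -2.567},
)
print(result)
```

With a p-value of about 0.12 and a statistic above all three critical values, H0 is not rejected, so the closing-price series cannot be treated as stationary at these levels. A common next step, not shown in this notebook, is to test the differenced series or the returns instead.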


## Autocorrelation
The Ljung-Box test is a statistical test of whether any of a group of autocorrelations of a time series are different from zero.

*It can be defined as follows.*

>H0: The data are independently distributed (that is, the correlations in the population from which the sample is drawn are 0, so any correlation observed in the data results from the randomness of the sampling process).
>
>Ha: The data are not independently distributed.

In [22]:
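def Autocorrelation(serie):
    #res = sm.tsa.ARMA(serie, (1,1)).fit(disp=-1)
    #sm.stats.acorr_ljungbox(res.resid, lags=[10], return_df=True)
    # Ljung-Box Q statistics and p-values, one per lag.
    # The tuple unpacking matches the statsmodels release used here;
    # newer releases return a DataFrame (lb_stat, lb_pvalue) instead.
    ibva, pva = sm.stats.diagnostic.acorr_ljungbox(serie, lags=None, boxpierce=False)
    return ibva, pva
Autocorrelation(serie)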

(array([ 2602.73949408,  5186.0885978 ,  7750.49288148, 10295.9963108 ,
        12823.98960234, 15334.25439518, 17826.72267627, 20302.51511634,
        22761.93540785, 25205.22236127, 27631.95218942, 30041.96452895,
        32435.50386828, 34813.33120335, 37176.05296689, 39523.43690303,
        41855.35347134, 44171.97703692, 46472.71492204, 48757.97838646,
        51026.73291771, 53278.80428838, 55514.91668778, 57733.44792848,
        59934.26362464, 62115.87686895, 64278.134478  , 66421.63221022,
        68547.73767348, 70657.85875759, 72752.34210127, 74831.53908247,
        76896.16225483, 78946.27055224, 80981.07763759, 83001.00319595,
        85005.69907745, 86995.41727608, 88970.46177851, 90930.73519126]),
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0.]))
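The first array above holds the Ljung-Box statistic for each lag, Q(h) = n(n+2) Σ ρ_k² / (n − k) summed over k = 1..h. The second array holds the matching p-values. The sketch below recomputes Q with NumPy on synthetic data to show why a price level series produces such large values. The function name `ljung_box_q` and the simulated series are assumptions for illustration.

```python
import numpy as np

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic for lags 1..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    lags = np.arange(1, max_lag + 1)
    # Sample autocorrelation at each lag k
    rho = np.array([np.dot(xc[k:], xc[:-k]) / denom for k in lags])
    # The running sum gives Q(1), Q(2), ..., Q(max_lag)
    return n * (n + 2) * np.cumsum(rho**2 / (n - lags))

rng = np.random.default_rng(0)
noise = rng.normal(size=500)   # independent draws
walk = np.cumsum(noise)        # random walk, a rough stand-in for a price level

q_noise = ljung_box_q(noise, 5)
q_walk = ljung_box_q(walk, 5)
print(q_noise)
print(q_walk)
```

Q(1) is compared with a chi-squared distribution with 1 degree of freedom, whose 5% critical value is about 3.841. The random walk exceeds it by a wide margin, as the EUR_USD closing prices do above, which is why every reported p-value is 0.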

In [None]:
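import numpy as np

def autocorr(serie_a, serie_b):
    # Full cross-correlation; for equal-length inputs the zero lag sits at index size // 2
    result = np.correlate(serie_a, serie_b, mode='full')
    return result[result.size // 2:]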

## Normal
The Kolmogorov-Smirnov test
is a nonparametric test of the goodness of fit between two probability distributions. Here it compares the sample with a fitted normal distribution.
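The statistic behind the test is the largest vertical gap between the empirical CDF of the sample and the reference CDF. Below is a minimal sketch that computes it for a normal reference using only NumPy and the standard library. The helper names and the synthetic samples are assumptions for illustration.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic_normal(sample, mu, sigma):
    """Kolmogorov-Smirnov distance D between a sample and N(mu, sigma) (hypothetical helper)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    cdf = np.array([normal_cdf(v, mu, sigma) for v in x])
    # The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=1000)
skewed_sample = rng.exponential(scale=1.0, size=1000)

# Fit mean and standard deviation from each sample, then measure the distance
d_normal = ks_statistic_normal(normal_sample, normal_sample.mean(), normal_sample.std())
d_skewed = ks_statistic_normal(skewed_sample, skewed_sample.mean(), skewed_sample.std())
print(d_normal, d_skewed)
```

The skewed sample sits much farther from its fitted normal than the normal sample does. One caveat: when the mean and standard deviation are estimated from the same data being tested, the standard Kolmogorov-Smirnov p-value is only approximate.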

In [29]:
import scipy.stats as st
def Kolmogrov_Test(serie):
    param = st.norm.fit(serie)
    D, p = st.kstest(serie, 'norm', args=param)
    print("p value for " + 'norm' + " = "+str(p))
Kolmogrov_Test(serie)

p value for norm = 6.789048675755803e-07


[Normal](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.mstats.normaltest.html)

In [33]:
def Normal_Test(serie):
    print(st.mstats.normaltest(serie))
    
Normal_Test(serie)

NormaltestResult(statistic=86.24250174972434, pvalue=1.873607828223265e-19)


## Moving Average

In [None]:
# Rolling windows
def Movil_Wind_DS(serie, w):
    vol  = serie.rolling(window = w).std()
    return vol
    
def Movil_Wind_Mean(serie, w):
    vol  = serie.rolling(window = w).mean()
    return vol
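
A short usage sketch for the two rolling-window helpers above, using the same pandas calls on a tiny made-up series so the results can be checked by hand. The values and the window size are assumptions for illustration.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative values only
w = 3

# Same calls as in Movil_Wind_Mean and Movil_Wind_DS
rolling_mean = prices.rolling(window=w).mean()
rolling_std = prices.rolling(window=w).std()

# The first w - 1 rows are NaN because the window is not yet full
print(pd.DataFrame({"price": prices, "mean": rolling_mean, "std": rolling_std}))
```

Each full window of three consecutive values here has a mean equal to its middle value and a sample standard deviation of 1.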
    

# Feature Engineering
Examples:

1. **Feature normalization**

"(min 1 (max (+ (* slope x) intercept) 0))" : scale feature x with slope and intercept, and normalize to [0,1]

2. **Feature combination**

"(- (log2 (+ 5 impressions)) (log2 (+ 1 clicks)))" : combine #impressions and #clicks into a smoothed CTR style feature

3. **Nonlinear featurization**

"(if (> query_doc_matches 0) 0 1)" : negation of a query/document matching feature

4. **Cascading modeling**

"(sigmoid (+ (+ (..) w1) w0))" : convert a logistic regression model into a feature

5. **Model combination (e.g. combine decision tree and linear regression)**

"(+ (* model1_score w1) (* model2_score w2))" : combine two model scores into one final score

[Link](https://github.com/linkedin/FeatureFu)
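
The s-expressions listed above translate directly into ordinary functions. The sketch below is one possible Python reading of them. The function names are hypothetical, and the fourth expression elides its inner terms with `(..)`, so the linear score is taken as an input rather than reconstructed.

```python
import math

def normalize(x, slope, intercept):
    """(min 1 (max (+ (* slope x) intercept) 0)): scale x, then clip to [0, 1]."""
    return min(1.0, max(slope * x + intercept, 0.0))

def smoothed_ctr_feature(impressions, clicks):
    """(- (log2 (+ 5 impressions)) (log2 (+ 1 clicks)))."""
    return math.log2(5 + impressions) - math.log2(1 + clicks)

def negate_match(query_doc_matches):
    """(if (> query_doc_matches 0) 0 1): 0 when there is a match, 1 otherwise."""
    return 0 if query_doc_matches > 0 else 1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cascading_feature(linear_score):
    """(sigmoid (+ (+ (..) w1) w0)): the inner sum is elided in the source,
    so the caller supplies the already computed linear score."""
    return sigmoid(linear_score)

def combine_models(model1_score, model2_score, w1, w2):
    """(+ (* model1_score w1) (* model2_score w2))."""
    return model1_score * w1 + model2_score * w2

print(normalize(2, 0.2, 0.1))
print(smoothed_ctr_feature(3, 0))
print(negate_match(0), negate_match(2))
print(cascading_feature(0.0))
print(combine_models(1.0, 2.0, 0.5, 0.25))
```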

import pandas as pd
from pmdarima.arima import auto_arima
from statsmodels.stats.diagnostic import acorr_ljungbox
df = pd.read_csv("uschange.csv")
# Fit a non-seasonal ARIMA model to Consumption with Income as an exogenous regressor,
# searching p and q from 1 to 3 and letting auto_arima choose d
arima_model = auto_arima(y = df['Consumption'], 
                  exogenous = df[['Income']],
                  start_p=1, 
                  start_q=1,
                  max_p=3, 
                  max_q=3,
                  seasonal=False,
                  d = None,  
                  error_action='ignore',  
                  suppress_warnings=True, 
                  stepwise=False)
# Ljung-Box test on the model residuals
acorr_ljungbox(arima_model.resid(), lags = 10)