<img src="http://eikon.tpq.io/refinitiv_logo.png" width="28%" align="left" style="vertical-align: top; padding-top: 23px;">
<img src="http://hilpisch.com/tpq_logo_long.png" width="36%" align="right" style="vertical-align: top;">

# Eikon Data API

**Financial Time Series Prediction &mdash; Using Deep Neural Networks**

Dr. Yves J. Hilpisch | The Python Quants GmbH

<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:training@tpq.io">training@tpq.io</a>

<img src="http://hilpisch.com/images/tr_eikon_02.png" width=350px align=left>

## The Agenda

This tutorial shows

* how to retrieve historical intraday data across asset classes via the Eikon Data API,
* how to work with such data using `pandas`, `Plotly` and `Cufflinks` and
* how to apply deep learning techniques based on deep neural networks for time series prediction.

## Importing Required Packages

In [None]:
import eikon as ek  # the Eikon Python wrapper package
import numpy as np  # NumPy
import pandas as pd  # pandas
import cufflinks as cf  # Cufflinks
import tensorflow as tf  # TensorFlow
import configparser as cp  # reads the configuration file with the app key

The following **Python and package versions** are used.

In [None]:
import sys
print(sys.version)

In [None]:
ek.__version__

In [None]:
np.__version__

In [None]:
pd.__version__

In [None]:
cf.__version__

In [None]:
tf.__version__

## Connecting to Eikon Data API

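The code below reads the app key (stored under `app_id`) from the configuration file `eikon.cfg` and passes it to `ek.set_app_key()` to connect to the **Eikon Data API Proxy**, which needs to be running locally.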

In [None]:
cfg = cp.ConfigParser()
cfg.read('eikon.cfg')

In [None]:
ek.set_app_key(cfg['eikon']['app_id'])  # replaces the deprecated set_app_id function
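The lookup `cfg['eikon']['app_id']` implies an INI-style file with an `[eikon]` section. The following is a minimal sketch of what `eikon.cfg` might contain, parsed from a string so that it runs without the file. The layout is an assumption derived from that lookup, and the value is a placeholder rather than a real credential.

```python
import configparser as cp

# Assumed layout of eikon.cfg; the value is a placeholder, not a real key
sample_cfg = """
[eikon]
app_id = YOUR_APP_KEY
"""

cfg = cp.ConfigParser()
cfg.read_string(sample_cfg)  # cfg.read('eikon.cfg') would load the same layout from disk
app_key = cfg['eikon']['app_id']
print(app_key)
```

Keeping the key in a separate file avoids hard-coding it in the notebook.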

## Retrieving Intraday Data

We first define a **small universe of `RICs`** (Reuters Instrument Codes) for which to retrieve data.

In [None]:
rics = [
    'SPY',  # S&P 500 ETF
    'AAPL.O',  # Apple stock
    'AMZN.O'  # Amazon stock
]

Second, **intraday data** is retrieved.

In [None]:
data = pd.DataFrame()
for ric in rics:
    data[ric] = ek.get_timeseries(
        ric,  # the RIC
        fields='CLOSE',  # the required field
        start_date='2018-05-02 12:00:00',  # start time
        end_date='2018-05-02 16:00:00',  # end time
        interval='minute'  # bar length
    )['CLOSE']

In [None]:
data.info()

In [None]:
data.head()  # first five rows

In [None]:
data.tail()  # final five rows

In [None]:
data.dropna(inplace=True)

## Calculating the Log Returns

We next calculate the **log returns** in vectorized fashion.

In [None]:
rets = np.log(data / data.shift(1)).dropna()  # log returns in vectorized fashion

In [None]:
rets.head()
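In formula terms, with $P_t$ denoting the closing price of bar $t$ (notation introduced here for illustration only), the vectorized expression above computes

$$
r_t = \log\frac{P_t}{P_{t-1}}, \qquad \sum_{t=1}^{T} r_t = \log\frac{P_T}{P_0}
$$

Log returns are additive over time, so the gross performance over several bars is the exponential of the sum of the individual log returns.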

## Plotting the Data

Using `Cufflinks`, we can plot the normalized financial time series as **line plots** for comparison.

In [None]:
data.normalize().iplot(kind='lines')

The frequency distributions, i.e. the **histograms**, of the log returns can be plotted per `RIC`.

In [None]:
rets.iplot(kind='histogram', subplots=True)

## Preparing Lagged Data

The code that follows derives the **lagged data** for every single `RIC`. First, a function that adds columns with lagged data to a `DataFrame` object.

In [None]:
lags = 10

In [None]:
def add_lags(data, ric, lags):
    cols = []
    df = pd.DataFrame(data[ric])  # uses the data passed in, not a global
    for lag in range(1, lags + 1):
        col = 'lag_{}'.format(lag)  # defines the column name
        # creates the column with the values lagged by `lag` bars
        df[col] = df[ric].shift(lag)
        cols.append(col)  # stores the column name
    df.dropna(inplace=True)  # gets rid of incomplete data rows
    return df, cols

Second, the iteration over all `RICs`, applying the `add_lags` function to the log returns and storing the resulting `DataFrame` objects in a dictionary.

In [None]:
dfs = {}
for ric in rics:
    df, cols = add_lags(rets, ric, lags)
    dfs[ric] = df

In [None]:
cols  # the column names for the lags

In [None]:
dfs.keys()  # the keys of the dictionary

In [None]:
dfs['AAPL.O'].head(7)
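To see what the lag columns contain, here is a minimal sketch on a toy series with two lags instead of ten. The numbers and the column name `ret` are made up for illustration. It uses the same `shift` and `dropna` steps as `add_lags`.

```python
import pandas as pd

# Toy return series (made-up numbers, for illustration only)
toy = pd.DataFrame({'ret': [0.01, -0.02, 0.03, -0.01, 0.02]})

for lag in range(1, 3):  # two lags instead of ten
    toy['lag_{}'.format(lag)] = toy['ret'].shift(lag)

toy = toy.dropna()  # the first two rows lack a complete lag history
print(toy)
```

Each remaining row pairs a return with the two returns that preceded it, so the features only use information available before the bar being predicted.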

In [None]:
np.digitize(dfs['AAPL.O'].head(7), bins=[0])

In [None]:
2 ** lags  # number of patterns
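The two cells above can be read together: `np.digitize` with a single bin edge at `0` turns every return into a binary value, and `lags` binary values per row give `2 ** lags` possible patterns. Below is a small sketch with made-up returns and three lags (the names `toy_rets` and `toy_lags` are hypothetical):

```python
import itertools
import numpy as np

# With a single bin edge at 0, np.digitize maps negative values to 0
# and zero or positive values to 1
toy_rets = np.array([-0.002, 0.0, 0.001, -0.0005])
labels = np.digitize(toy_rets, bins=[0])
print(labels)

# Every lag contributes one binary digit, so 3 lags give 2 ** 3 patterns
toy_lags = 3
patterns = list(itertools.product([0, 1], repeat=toy_lags))
print(len(patterns))
```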

## The DNN Model

The matrix consisting of the lagged data columns is used to "predict" the direction of movement of the `RIC` over the next one-minute bar via a **deep neural network (DNN)** algorithm. This is a **classification algorithm** that is able to **learn from historical patterns** (10 lags) to predict whether an upwards or a downwards movement is more likely.

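In what follows, the `TensorFlow` package from Google is used (see https://www.tensorflow.org/). The code relies on the `tf.logging` and `tf.contrib` modules of TensorFlow 1.x, which are no longer part of TensorFlow 2.x.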

<img src="http://hilpisch.com/images/tensorflow_logo.png" width="15%" align="left">

In [None]:
tf.logging.set_verbosity(tf.logging.ERROR)

First, the definition of the **features**.

In [None]:
fd = {col: tf.contrib.layers.real_valued_column(col, 1)
      for col in cols}

The values are **bucketized** at the boundary `0`, i.e. separated into negative and non-negative values.

In [None]:
fc = [tf.contrib.layers.bucketized_column(fd[col], boundaries=[0])
     for col in cols]

## The Model Fitting

For the model fitting, a Python function is required to deliver the **data for the features and labels**.

In [None]:
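def get_data():
    # relies on the global names df, ric and cols as set in the loops below
    features = {col: tf.constant(df[col]) for col in cols}  # lagged returns
    # labels: 0 for a negative, 1 for a non-negative return
    labels = tf.constant(np.digitize(df[ric], bins=[0]))
    return features, labels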

Now, the DNN can be trained ("fitted") to the data. The **DNN model object** is instantiated with three hidden layers of 128 units each.

In [None]:
%%time
for ric in rics:
    dnn = tf.contrib.learn.DNNClassifier(
        hidden_units=[128, 128, 128],
        feature_columns=fc)  # the DNN model
    df = dfs[ric].copy()  # getting data for the RIC
    dnn.fit(input_fn=get_data, steps=250)  # model fitting
    # prediction step
    dfs[ric]['position'] = list(dnn.predict(input_fn=get_data))
    # transforming results to +1 and -1
    dfs[ric]['position'] = np.where(dfs[ric]['position'] > 0, 1, -1)

The prediction value is either `+1` for an upwards movement or `-1` for a downwards movement. When using these values as signals for a trading strategy, one **would go long for `+1` and go short for `-1`**.

In [None]:
for ric in rics:
    print('{:10} | {}'.format(ric, dfs[ric]['position'].values[:12]))

## Vectorized Backtesting

Let's backtest the performance of the DNN-based trading strategies. Here, vectorization is used for convenience and speed. First, the **strategy returns**, which result from multiplying the prediction or position values by the log returns of the respective `RIC`.

In [None]:
for ric in rics:
    dfs[ric]['strategy'] = dfs[ric]['position'] * dfs[ric][ric]

Second, the visualization of the **cumulative performance**.

In [None]:
for ric in rics:
    dfs[ric][[ric, 'strategy']].cumsum().apply(np.exp).iplot()
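The two backtesting steps can be traced on a handful of numbers. This is a minimal sketch with made-up values (the arrays `log_rets` and `positions` are hypothetical and not taken from the retrieved data). It mirrors the multiplication of positions by log returns and the `cumsum().apply(np.exp)` step.

```python
import numpy as np

# Made-up log returns and positions (+1 long, -1 short), for illustration only
log_rets = np.array([0.01, -0.02, 0.015, -0.005])
positions = np.array([1, -1, 1, 1])

strategy = positions * log_rets  # per-bar strategy log returns

# Cumulative gross performance: exp of the running sum of log returns
perf_asset = np.exp(np.cumsum(log_rets))
perf_strategy = np.exp(np.cumsum(strategy))
print(perf_asset[-1], perf_strategy[-1])
```

A short position in a bar with a negative return contributes a positive strategy return, which is why the toy strategy ends above the passive investment here.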

## Out-of-Sample Testing

Next, to get a more realistic picture of the trading performance to be expected, a **train-test split** is used to implement **out-of-sample backtesting**.

In [None]:
split = int(len(data) / 2)

In [None]:
vspan = [{'x0': data.index[0], 'x1': data.index[split], 'color': 'green', 'fill': True, 'opacity': .2},
        {'x0': data.index[split], 'x1': data.index[-1], 'color': 'red', 'fill': True, 'opacity': .2}]

Roughly speaking, the **green part is taken for training**, the **red part for testing**.

In [None]:
data.normalize().iplot(vspan=vspan)
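The split is chronological: the earlier half of the rows is used for fitting and the later half for evaluation, without any shuffling. Here is a minimal sketch on a toy frame with a one-minute index (values and names are made up for illustration):

```python
import pandas as pd

# Toy frame with a one-minute DatetimeIndex (made-up values)
idx = pd.date_range('2018-05-02 12:00', periods=10, freq='min')
toy = pd.DataFrame({'ret': range(10)}, index=idx)

split = int(len(toy) / 2)
train_part = toy.iloc[:split]  # earlier half, used for fitting
test_part = toy.iloc[split:]   # later half, used for evaluation only
print(train_part.index[-1], test_part.index[0])
```

Because every training timestamp precedes every test timestamp, the model is evaluated only on data it has not seen during fitting.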

In [None]:
%%time
res = {}
for ric in rics:
    dnn = tf.contrib.learn.DNNClassifier(
        hidden_units=[128, 128, 128],
        feature_columns=fc)  # the DNN model
    dfr = dfs[ric].copy()  # getting data for the RIC
    # training step
    df = dfr.iloc[:split]
    dnn.fit(input_fn=get_data, steps=250)  # model fitting
    # prediction step
    df = dfr.iloc[split:]
    pred = list(dnn.predict(input_fn=get_data))
    # transforming results to +1 and -1
    pred = np.where(np.array(pred) > 0, 1, -1)
    # collecting the results
    strat = pred * df[ric]
    res[ric] = pd.DataFrame({ric: df[ric],
                             'pred': pred,
                             'strategy': strat})

In [None]:
res['AAPL.O'].head()

In [None]:
for ric in rics:
    res[ric][[ric, 'strategy']].cumsum().apply(np.exp).iplot()

## Conclusions

Based on this tutorial, we can conclude that

* it is easy to retrieve **historical intraday data (one minute bars)** via the Eikon Data API,
* `Plotly` and `Cufflinks` make **financial data visualization** convenient,
* **deep learning (DL) techniques** such as **deep neural networks (DNN)** for classification are easily applied with Python and
* such techniques might be helpful in **predicting the direction of market movements** using a lag- and pattern-based approach.

## Eikon Data API Developer Resources

* [Overview](https://developers.thomsonreuters.com/eikon-data-apis)
* [Quick Start](https://developers.thomsonreuters.com/eikon-data-apis/quick-start)
* [Documentation](https://developers.thomsonreuters.com/eikon-data-apis/docs)
* [Downloads](https://developers.thomsonreuters.com/eikon-data-apis/downloads)
* [Tutorials](https://developers.thomsonreuters.com/eikon-data-apis/learning)
* [Q&A Forums](https://developers.thomsonreuters.com/eikon-data-apis/qa)

Data Item Browser Application: Type `DIB` into Eikon Search Bar.
