In [None]:
import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd

<h1> Transition matrices and regimes ratios distribution </h1>

Recall that we are looking at the number of ticks per minute, and that we defined three volatility regimes by splitting the number of ticks at its terciles. In particular, this notion of volatility regime is time dependent. <br/>

In this notebook we are going to study the effect known events have on the volatility regimes. We will only be looking at events predicted as being of type "3".  <br/>

The data used in the code are of two types: <br/>
- csv files of the volatility regime in every minute of every year;
- a csv file with the day number of every event, and another csv file with the minute of the day when said event happened.

I was not responsible for generating this data. Yicheng and Yifan gave me the files with the days and minutes of the known events. Sebastian gave me some of the volatility regime data.

<h1> The functions </h1>

<h3> Data processing </h3>

We need to merge the dataframes of the day number and minute number of events. 

In [None]:
def event_time(row_csv, col_csv):
    """
    Creates a DataFrame where each row corresponds to the time of an event. First column is day number, 
    second is time of day.
    row_csv and col_csv are strings.
    """
    row = pd.read_csv(row_csv)
    col = pd.read_csv(col_csv)
    row.drop(row.columns[0],axis = 1, inplace=True)   
    col.drop(col.columns[0],axis = 1, inplace=True) 
    row.rename(columns = {'x':'day'}, inplace=True)
    col.rename(columns = {'x':'minute'}, inplace=True)
    event_times = pd.concat([row, col], axis=1)
    return event_times

<h2> Transition matrices </h2>

For every day $k$ and every minute $t$, let $\lambda^k_t$ be the intensity of tick returns during minute $t$ of day $k$. We associate to $\lambda^k_t$ a volatility regime $\Lambda^k_t \in \{1,2,3\}$, according to which terciles $\lambda^k_t$ falls between. <br/>
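
As an illustration of the tercile assignment, here is a minimal sketch. It assumes a single pooled sample of tick counts; the function name `tercile_regimes` and the toy data are hypothetical, and the actual regime files may have been built differently (for instance with cut points that vary over time).

```python
import numpy as np

def tercile_regimes(intensity):
    """Map each value to regime 1, 2 or 3 using the empirical terciles.

    intensity: 1-D array of ticks per minute (hypothetical input).
    """
    intensity = np.asarray(intensity, dtype=float)
    # Cut points at the 1/3 and 2/3 empirical quantiles of the sample.
    q1, q2 = np.quantile(intensity, [1 / 3, 2 / 3])
    # np.digitize returns 0, 1 or 2; shift to regimes 1, 2, 3.
    return np.digitize(intensity, [q1, q2], right=True) + 1

# Toy data, not taken from the notebook's files.
ticks = np.array([3, 10, 25, 7, 40, 12, 5, 18, 30])
regimes = tercile_regimes(ticks)
print(regimes)
```

By construction, each regime receives roughly one third of the observations in the sample used to compute the cut points.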

We then compute the transition matrices between these volatility regimes. 

First we compute the unconditioned transition matrices,  that is 
\begin{align*}
(A_u)_{ij} & = \frac{(a_u)_{ij}}{\sum_j (a_u)_{ij}}, \hspace{2cm} \text{where}   \\
(a_u)_{ij} & = \sum_{t,k}1_{\Lambda^k_t = i, \: \Lambda^k_{t+u}=j},
\end{align*}
where we sum over all the minutes $t$ in a day, and all the days $k$ in the sample. 

In [None]:
def uncond_trans_mat(regime, u, N=3):
    """
    Unconditioned transition matrix between volatility regimes
    ##################
    regime (DataFrame): rows are days, columns are minutes of the day, values are the volatility regime.
    u (int): we compute the transition matrix between the regimes at times t and t+u.
    N (int): number of volatility regimes.
    """
    A = np.zeros((N,N))
    regime_val = regime.values #Use the argument, not a global variable
    for row in range(regime_val.shape[0]):
        for time in range(regime_val.shape[1]-u):
            for i in range(N):
                for j in range(N):
                    if regime_val[row,time] == i+1.0 and regime_val[row,time+u] == j+1.0:
                        A[i,j] = A[i,j]+1
    result = A/A.sum(axis=1).reshape(-1,1)
    return np.around(result,2) #Rounding the result to two decimals
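
The counting in the formula for $(a_u)_{ij}$ can also be done without explicit loops. The sketch below is one possible vectorized variant, not the code used to produce the results in this notebook; the names `transition_counts` and `row_normalize` are hypothetical, it takes a plain array rather than a DataFrame, and it assumes $u \geq 1$. Entries that are not a valid regime (for example NaN) are skipped.

```python
import numpy as np

def transition_counts(regime_val, u, N=3):
    """Count transitions i -> j between minute t and minute t+u within each day."""
    if u < 1:
        raise ValueError("u must be a positive integer")
    regime_val = np.asarray(regime_val)
    start = regime_val[:, :-u].ravel()  # regimes at time t
    end = regime_val[:, u:].ravel()     # regimes at time t+u, same day
    labels = np.arange(1, N + 1)
    # Keep only pairs where both entries are valid regimes 1..N.
    valid = np.isin(start, labels) & np.isin(end, labels)
    i = start[valid].astype(int) - 1
    j = end[valid].astype(int) - 1
    counts = np.zeros((N, N))
    np.add.at(counts, (i, j), 1)  # accumulate repeated (i, j) pairs
    return counts

def row_normalize(counts):
    """Divide each row by its sum; rows with no observations stay at zero."""
    row_sums = counts.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return counts / safe

# Toy example: 2 days, 4 minutes each (not real data).
toy = np.array([[1, 1, 2, 3],
                [3, 3, 1, 2]])
toy_counts = transition_counts(toy, 1)
print(row_normalize(toy_counts))
```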

Then we compute the conditioned transition matrices, that is 
\begin{align*}
(B_{u_1, u_2})_{ij} & = \frac{(b_{u_1, u_2})_{ij}}{\sum_j (b_{u_1,u_2})_{ij}}   \\
(b_{u_1,u_2})_{ij} & = \sum_{n}1_{\Lambda^{k_n}_{t_n-u_1} = i, \: \Lambda^{k_n}_{t_n+u_2}=j},
\end{align*}
where $n$ runs over the number of events.

In [None]:
def trans_mat(event_times, regime, u1, u2, N=3):
    """
    (Conditioned) transition matrix between volatility regimes
    #########################
    event_times ((-1,2) DataFrame): day and minute of events.
    regime (DataFrame): rows are days, columns are minutes of the day, values are the volatility regime.
    u1, u2 (int): we compute the transition matrix between the regimes u1 minutes before and u2 minutes 
                  after each event.
    N (int): number of volatility regimes.
    """
    A = np.zeros((N,N))
    event_time_np = event_times.values
    regime_val = regime.values
    for event in range(event_time_np.shape[0]):
        day = int(event_time_np[event,0])-1 #In the data set days are numbered starting at 1
        for i in range(N):
            for j in range(N):
                time = int(event_time_np[event,1])-1 #In the data set minutes are numbered starting at 1
                if not time+u2 > regime_val.shape[1]-1 and not time-u1 <0:
                    if regime_val[day,time-u1] == float(i)+1 and regime_val[day,time+u2] == float(j)+1:
                        A[i,j] = A[i,j]+1
    result = A/A.sum(axis=1).reshape(-1,1)
    return np.around(result,2) #Rounding the result to two decimals
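
For comparison, the event-conditioned counts $(b_{u_1,u_2})_{ij}$ can be sketched with array indexing. This is an illustrative variant under stated assumptions, not the notebook's implementation: `event_transition_counts` is a hypothetical name, days and minutes are passed as separate 1-based arrays, and events whose window leaves the day are dropped, as in the function above.

```python
import numpy as np

def event_transition_counts(days, minutes, regime_val, u1, u2, N=3):
    """Count regime u1 minutes before -> regime u2 minutes after each event."""
    regime_val = np.asarray(regime_val)
    day = np.asarray(days, dtype=int) - 1      # data set numbers days from 1
    time = np.asarray(minutes, dtype=int) - 1  # data set numbers minutes from 1
    # Drop events whose window falls outside the day.
    inside = (time - u1 >= 0) & (time + u2 < regime_val.shape[1])
    before = regime_val[day[inside], time[inside] - u1]
    after = regime_val[day[inside], time[inside] + u2]
    labels = np.arange(1, N + 1)
    valid = np.isin(before, labels) & np.isin(after, labels)
    counts = np.zeros((N, N))
    np.add.at(counts,
              (before[valid].astype(int) - 1, after[valid].astype(int) - 1),
              1)
    return counts

# Toy example (not real data): 2 days, 4 minutes, 3 events.
toy = np.array([[1, 1, 2, 3],
                [3, 3, 1, 2]])
toy_days = [1, 2, 1]
toy_minutes = [2, 3, 4]  # the last event is too close to the end of the day
event_counts = event_transition_counts(toy_days, toy_minutes, toy, 1, 1)
print(event_counts)
```

Dividing each row of the result by its row sum gives the matrix $B_{u_1,u_2}$; rows with no events would need separate handling to avoid dividing by zero.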

<h2> Regime ratios distributions </h2>

We compute on average how the ratios of regimes are distributed. By definition of the regimes, in theory we are in any given regime 1/3 of the time.

In [None]:
def uncond_vol_regime(regime):
    """
    Given a DataFrame "regime", computes the average (unconditioned) regime ratio distribution
    """
    regime_np = regime.values
    A = np.zeros(3)
    for n in range(3):
        A[n] = (regime_np ==float(n)+1).sum()
    return A/A.sum()

We look at how the ratios of regimes were distributed, on average, $u$ minutes before/after an event. 

In [None]:
def vol_regimes(event_times, regime, u):
    """
    event_times ((-1,2) DataFrame): day and minute of events.
    regime (DataFrame): rows are days, columns are minutes of the day, values are the volatility regime.
    u (int): signed offset in minutes.
    For every event, look at which volatility regime we were in |u| minutes before (u < 0) or after (u > 0) it.
    Returns the share of events found in each regime.
    """
    event_times_np = event_times.values
    regime_val = regime.values
    counts = np.zeros(3)
    for event in range(event_times_np.shape[0]):
        day = int(event_times_np[event, 0])-1 #In the data set days are numbered starting at 1
        time = int(event_times_np[event, 1])-1 #In the data set minutes are numbered starting at 1
        if 0 <= time+u <= regime_val.shape[1]-1: #Skip offsets outside the day (a negative index would wrap around)
            for n in range(3):
                if regime_val[day, time + u] == float(n)+1:
                    counts[n] = counts[n]+1

    return counts/counts.sum()

Then we plot how these ratios change, on average, from $t-1$ minutes before to $t-1$ minutes after an event. 

In [None]:
def regime_ratio(event_times, regime, t):
    """
    event_times: DataFrame
    regime: DataFrame
    t: int
    Plots the average distribution of regime ratios from t-1 minutes before to t-1 minutes after an event.
    """
    #Offsets 0, -1, ..., -(t-1), one row per offset
    dist_before = vol_regimes(event_times, regime, 0).reshape(1,-1)
    for minute in range(1,t):
        vol_reg = vol_regimes(event_times, regime, -minute)
        dist_before = np.vstack((dist_before,vol_reg))
    H_minus = np.fliplr(dist_before.transpose()) #Columns now run from -(t-1) to 0
    
    #Offsets 1, ..., t-1
    dist_after = vol_regimes(event_times, regime, 1).reshape(1,-1)
    for minute in range(2,t):
        vol_reg = vol_regimes(event_times, regime, minute)
        dist_after = np.vstack((dist_after,vol_reg))
    H_plus = dist_after.transpose()
    
    H = np.hstack((H_minus, H_plus))
    offsets = np.arange(-(t-1), t) #The 2t-1 integer offsets matching the columns of H
    for n in range(3):
        plt.plot(offsets, H[n,:])
        
    plt.legend(['low', 'medium', 'high'], loc = 2)

    return H.shape

<h1> The Data </h1>

<h3> 2017 </h3>

<b> The events </b>

In [None]:
event_times = event_time('row_regime3.csv', 'col_regime3.csv')
event_times.head()

<b> The regimes </b>

In [None]:
regime = pd.read_csv('regimes_1minute.csv')
regime.head()

In [None]:
A1 = uncond_trans_mat(regime, 1)
A2 = uncond_trans_mat(regime, 2)
A3 = uncond_trans_mat(regime, 3)

In [None]:
B01 = trans_mat(event_times, regime, 0,1)
B11 = trans_mat(event_times, regime, 1,1)
B12 = trans_mat(event_times, regime, 1, 2)

In [None]:
print('Unconditioned matrix t->t+1')
print(A1)
print('Transition matrix during event and 1 minute after')
print(B01)
print('Unconditioned matrix t->t+2')
print(A2)
print('Transition matrix 1 minute before event and 1 minute after')
print(B11)
print('Unconditioned matrix t->t+3')
print(A3)
print('Transition matrix 1 minute before event and 2 minutes after')
print(B12)

In [None]:
uncond_vol_regime(regime)

In [None]:
regime_ratio(event_times, regime, 5) #Plot regime ratios from 4 minutes before to 4 minutes after an event, on average.

In [None]:
regime_ratio(event_times, regime, 30)

<h3>2013 to 2017 </h3>

<b> The events </b>

In [None]:
event_times = event_time('row2013-2017.csv', 'col2013-2017.csv')
event_times.head()

<b> The regimes </b>

In [None]:
regime = pd.read_csv('Regimes2013-2017.csv')
regime.head()

In [None]:
A1 = uncond_trans_mat(regime, 1)
A2 = uncond_trans_mat(regime, 2)
A3 = uncond_trans_mat(regime, 3)

In [None]:
B01 = trans_mat(event_times, regime, 0,1)
B11 = trans_mat(event_times, regime, 1,1)
B12 = trans_mat(event_times, regime, 1, 2)

In [None]:
print('Unconditioned matrix t->t+1')
print(A1)
print('Transition matrix during event and 1 minute after')
print(B01)
print('Unconditioned matrix t->t+2')
print(A2)
print('Transition matrix 1 minute before event and 1 minute after')
print(B11)
print('Unconditioned matrix t->t+3')
print(A3)
print('Transition matrix 1 minute before event and 2 minutes after')
print(B12)

<b> Remark  </b>

In [None]:
print('A1^2 =')
print(np.around(np.linalg.matrix_power(A1, 2),2))
print('A1^3=')
print(np.around(np.linalg.matrix_power(A1, 3),2))

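If the regime process were a time-homogeneous Markov chain, we would have $A_u = A_1^u$. Comparing $A_1^2$ and $A_1^3$ with $A_2$ and $A_3$ above, the transitions between regimes seem to be very far from being a Markov process. 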
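
A minimal sketch of how this comparison could be quantified. The helper `chapman_kolmogorov_gap` is a hypothetical name, not part of the notebook; it reports the largest entrywise difference between an observed $u$-step matrix and the $u$-th power of the one-step matrix, which would be zero (up to sampling noise and rounding) for a time-homogeneous Markov chain. The matrix `P` below is toy data.

```python
import numpy as np

def chapman_kolmogorov_gap(A1, Au, u):
    """Largest entrywise gap between the u-step matrix Au and A1 raised to the power u."""
    A1 = np.asarray(A1, dtype=float)
    Au = np.asarray(Au, dtype=float)
    return np.max(np.abs(Au - np.linalg.matrix_power(A1, u)))

# Toy one-step matrix (rows sum to 1); for a true Markov chain the gap is zero.
P = np.array([[0.80, 0.15, 0.05],
              [0.20, 0.60, 0.20],
              [0.05, 0.15, 0.80]])
gap = chapman_kolmogorov_gap(P, P @ P, 2)
print(gap)
```

With the matrices computed above, `chapman_kolmogorov_gap(A1, A2, 2)` and `chapman_kolmogorov_gap(A1, A3, 3)` would give the corresponding gaps; since those matrices are rounded to two decimals, gaps of order 0.01 are not meaningful.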

In [None]:
uncond_vol_regime(regime)

In [None]:
regime_ratio(event_times, regime, 5)

We see that volatility is high in the minutes prior to an event. However, it seems to drop 1-2 minutes before the event, as market participants wait for the information, and picks up again 1-2 minutes after the event. 

In [None]:
regime_ratio(event_times, regime, 15)

In [None]:
regime_ratio(event_times, regime, 50)

In [None]:
regime_ratio(event_times, regime, 70)

It seems to take around 30 minutes after an event for the market to stabilize. 