In [1]:
# imports
import numpy as np
import pandas as pd

## p 29

$$
\begin{align*}
\theta_{T} &= \sum_{t = 1}^{T} b_{t} \\
&\text{The expected value of } \theta_{T} \text{ at the beginning of the bar is} \\
E_{0}[\theta_{T}] &= E_{0}[T]\ (2 * P(b_t = 1) - 1) \\
&\text{ We define a tick imbalance bar (TIB) as } \theta_{T^{*}} \text{ where} \\
T^{*} &= \arg \min_{T} \{ \left| \theta_{T} \right| \geq\ \left| E_{0}[\theta_{T}] \right|\ \} \\
&= \arg \min_{T} \{ \left| \theta_{T} \right| \geq\ E_{0}[T]\ \left| 2 * P(b_t = 1) - 1 \right|\ \} \\
\end{align*}
$$

$$
\begin{align*}
E_{0}[T] &: \text{Expected size of the tick bar (number of ticks).} \\
|2 * P(b_t = 1) - 1| &: \text{Expected imbalance per tick; multiplied by } E_{0}[T] \text{, it gives} \\
& \hspace{2.2em} \text{the expected size of the imbalance.}
\end{align*}
$$
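
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `tick_rule` and `tick_imbalance_bar_ends` are hypothetical, the first tick is arbitrarily signed +1, and $E_{0}[T]$ and $P(b_t = 1)$ are held fixed as inputs. In practice these expectations are usually re-estimated from previous bars (for example with exponentially weighted averages); that step is omitted here.

```python
import numpy as np


def tick_rule(prices):
    """Sign each tick: +1 on an uptick, -1 on a downtick, previous sign if unchanged."""
    prices = np.asarray(prices, dtype=float)
    b = np.ones(len(prices))  # assumption: the first tick is signed +1
    for t in range(1, len(prices)):
        diff = prices[t] - prices[t - 1]
        b[t] = np.sign(diff) if diff != 0 else b[t - 1]
    return b


def tick_imbalance_bar_ends(prices, expected_T, prob_buy):
    """Return the index of the last tick of each bar, using a fixed threshold."""
    threshold = expected_T * abs(2 * prob_buy - 1)  # E_0[T] * |2 P(b_t = 1) - 1|
    bar_ends, theta = [], 0.0
    for t, b_t in enumerate(tick_rule(prices)):
        theta += b_t  # running imbalance theta_T within the current bar
        if abs(theta) >= threshold:
            bar_ends.append(t)
            theta = 0.0  # start accumulating the next bar
    return bar_ends


# Hypothetical price path; threshold = 6 * |2 * 0.75 - 1| = 3 signed ticks
prices = [10.0, 10.1, 10.2, 10.2, 10.1, 10.0, 9.9, 9.8, 9.9, 10.0, 10.1, 10.2]
bar_ends = tick_imbalance_bar_ends(prices, expected_T=6, prob_buy=0.75)
print(bar_ends)
```

Each bar closes as soon as the run of same-signed ticks is long enough to reach the threshold, so one-directional stretches produce bars faster than choppy ones.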

From my understanding, the purpose of TIBs, VIBs and DIBs is this: when we have
access to high-frequency data in the form of ticks, it is much more efficient to
work with bars, which are groups of ticks, sampled in such a way that the
information carried by the ticks is not lost.

Yes, your understanding is essentially correct. Tick Imbalance Bars (TIBs), Dollar Imbalance Bars (DIBs), and Volume Imbalance Bars (VIBs) are methods used in the analysis of high-frequency trading data that aim to sample data more intelligently based on the arrival of new information to the market. These methods are part of a broader category known as "information-driven bars," which are designed to overcome the limitations of time-based sampling in financial data analysis.

Here's a breakdown of each type of bar and the philosophy behind them:

### Tick Imbalance Bars (TIBs)
- **Purpose**: TIBs are created by grouping ticks until the cumulative imbalance in trade direction exceeds its expected value. Each tick is signed by the tick rule: $b_t = 1$ if the price rose, $b_t = -1$ if it fell, and the previous sign if the price is unchanged.
- **Rationale**: The idea is that the sequence of signed ticks carries information about market direction. By closing a bar when the imbalance of signed ticks is larger than expected, you capture moments when potentially informed traders are active, ideally before the resulting price move is fully realized.

### Volume Imbalance Bars (VIBs)
- **Purpose**: Similar to TIBs but with a focus on the volume of trades. VIBs group ticks until the accumulated volume of buys versus sells exceeds a threshold.
- **Rationale**: This method assumes that large volumes signify meaningful market activity, particularly informed trading. By focusing on volume imbalances, these bars aim to capture significant market events driven by volume shifts.

### Dollar Imbalance Bars (DIBs)
- **Purpose**: These bars extend the concept to the dollar value of trades, grouping ticks until the dollar imbalance (buy dollar volume vs. sell dollar volume) exceeds a threshold.
- **Rationale**: The dollar value can provide a more accurate measure of market impact, especially in asset classes where trade sizes vary significantly. This method helps in identifying bars that represent significant monetary movements in the market, potentially signaling major strategic trades.
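
To make the difference between the three bar types concrete, the sketch below computes the running imbalance for each one on the same ticks. The tick data is made up for illustration; the only change from one bar type to the next is the quantity that each signed tick is weighted by (nothing, volume, or dollar value).

```python
import numpy as np

# Hypothetical ticks: signed direction b_t, traded volume, and trade price
b = np.array([1, 1, -1, 1, -1, -1])
volume = np.array([100, 50, 200, 10, 300, 20])
price = np.array([10.0, 10.1, 10.0, 10.1, 10.0, 9.9])

tick_imbalance = np.cumsum(b)                     # running imbalance for TIBs
volume_imbalance = np.cumsum(b * volume)          # running imbalance for VIBs
dollar_imbalance = np.cumsum(b * volume * price)  # running imbalance for DIBs

print(tick_imbalance)
print(volume_imbalance)
print(dollar_imbalance)
```

Note how the tick imbalance ends at zero while the volume and dollar imbalances end strongly negative: the sells were fewer in relative terms but much larger.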

### Common Benefits of Information-Driven Bars
- **Efficiency in Sampling**: Unlike time-based bars (like every minute or hour), which treat quiet periods and busy periods the same, information-driven bars focus on periods of significant activity. This results in more efficient use of data, as more bars are sampled during periods of high information flow and fewer during quiet times.
- **Reduction in Noise**: These methods help reduce noise by focusing on data points that have actual information content, potentially leading to better trading signals.
- **Adaptability**: These bars adapt to market conditions dynamically, allowing for variable bar sizes based on underlying market activity. This is particularly useful in avoiding the pitfalls of fixed-size bars that might either over-sample during quiet periods or under-sample during active periods.

In summary, the purpose of using TIBs, VIBs, and DIBs is to ensure that the sampling of financial data into bars is driven by the informational content of the trades rather than arbitrary time intervals. This approach aims to capture the essence of market dynamics more accurately and efficiently, particularly beneficial in algorithmic trading and quantitative analysis.

# 2.4.2 PCA Weights

In [2]:
def pcaWeights(cov, riskDist=None, riskTarget=1.):
    # Following the riskAlloc distribution, match riskTarget
    eVal, eVec = np.linalg.eigh(cov)  # must be Hermitian
    indices = eVal.argsort()[::-1]  # arguments for sorting eVal descending
    eVal, eVec = eVal[indices], eVec[:, indices]
    if riskDist is None:
        riskDist = np.zeros(cov.shape[0])
        riskDist[-1] = 1.
    loads = riskTarget * (riskDist / eVal) ** 0.5
    wghts = np.dot(eVec, np.reshape(loads, (-1, 1)))
    # ctr = (loads / riskTarget) ** 2 * eVal  # verify riskDist
    return wghts


Let's break down the inputs to the `pcaWeights` function and explore how they work and interact with each other. Here are the inputs:

1. **cov**: This is the covariance matrix of the asset returns. It is a square, symmetric matrix that contains the covariances between each pair of assets. The size of the matrix is $N \times N$, where $N$ is the number of assets.
2. **riskDist**: This is an optional input. It is an array of length $N$ that specifies the desired share of portfolio variance for each principal component, ordered from the largest eigenvalue to the smallest. It should sum to 1. If not provided, the function allocates all risk to the component with the smallest eigenvalue.
3. **riskTarget**: This is a scalar that sets the target standard deviation of the portfolio. When `riskDist` sums to 1, the resulting weights $w$ satisfy $w^{\top} \Sigma w = \text{riskTarget}^2$, where $\Sigma$ is `cov`.
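
The commented-out `ctr` line in `pcaWeights` hints at a useful check: the per-component risk contributions should reproduce `riskDist`. The sketch below restates the function in a compact form so that it can also return the loadings and eigenvalues (the name `pca_weights` and the extra return values are my own additions, not part of the original function) and then runs that check.

```python
import numpy as np


def pca_weights(cov, risk_dist=None, risk_target=1.0):
    """Compact restatement of pcaWeights that also returns loads and eigenvalues."""
    e_val, e_vec = np.linalg.eigh(cov)
    order = e_val.argsort()[::-1]  # sort eigenvalues in descending order
    e_val, e_vec = e_val[order], e_vec[:, order]
    if risk_dist is None:
        risk_dist = np.zeros(cov.shape[0])
        risk_dist[-1] = 1.0  # all risk on the smallest-eigenvalue component
    loads = risk_target * (risk_dist / e_val) ** 0.5
    return e_vec @ loads, loads, e_val


cov = np.array([[0.1, 0.02, 0.03], [0.02, 0.1, 0.04], [0.03, 0.04, 0.1]])
risk_dist = np.array([0.2, 0.3, 0.5])
risk_target = 1.0
w, loads, e_val = pca_weights(cov, risk_dist, risk_target)

ctr = (loads / risk_target) ** 2 * e_val  # variance share per component
portfolio_var = w @ cov @ w               # total variance of the portfolio

print(ctr)            # should match risk_dist up to floating-point error
print(portfolio_var)  # should match risk_target ** 2
```

The check works because the eigenvectors are orthonormal, so the portfolio variance decomposes into a sum of `loads ** 2 * e_val` terms, one per component.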

### Interpreting Different Sets of Inputs

#### Case 1: Basic Example


In [3]:
cov_matrix = np.array([[0.1, 0.02, 0.03], [0.02, 0.1, 0.04], [0.03, 0.04, 0.1]])
weights = pcaWeights(cov_matrix)
print(weights)

[[-1.05868598]
 [-2.49416586]
 [ 3.14066144]]



- **cov**: The covariance matrix provided here is a simple 3x3 matrix representing three assets.
- **riskDist**: Not provided, so it defaults to allocating all risk to the principal component with the smallest eigenvalue.
- **riskTarget**: Default value of 1.0.

**Interpretation**:
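- With the default `riskDist`, all of the portfolio's variance comes from the component with the smallest eigenvalue. The weights are scaled so that the portfolio's standard deviation equals `riskTarget`, so this does not minimize variance. Instead, because the eigenvalue is small, the weights must be large (note the magnitudes in the output above) to reach the target. The overall sign of the weight vector depends on the sign convention of the eigenvector returned by `np.linalg.eigh`.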

#### Case 2: Specifying Risk Distribution


In [4]:

cov_matrix = np.array([[0.1, 0.02, 0.03], [0.02, 0.1, 0.04], [0.03, 0.04, 0.1]])
risk_distribution = np.array([0.2, 0.3, 0.5])
weights = pcaWeights(cov_matrix, riskDist=risk_distribution)
print(weights)


[[-2.89734397]
 [-1.36761731]
 [ 1.81096786]]



- **cov**: The same covariance matrix as before.
- **riskDist**: Specifies that 20% of the portfolio variance should come from the first principal component (largest eigenvalue), 30% from the second, and 50% from the third (smallest eigenvalue).
- **riskTarget**: Default value of 1.0.

**Interpretation**:
- The function calculates the weights such that the specified risk distribution is achieved. This means the portfolio will be constructed to align with the desired risk proportions across the principal components.

#### Case 3: Adjusting the Risk Target



In [5]:

cov_matrix = np.array([[0.1, 0.02, 0.03], [0.02, 0.1, 0.04], [0.03, 0.04, 0.1]])
risk_distribution = np.array([0.2, 0.3, 0.5])
weights = pcaWeights(cov_matrix, riskDist=risk_distribution, riskTarget=0.5)
print(weights)


[[-1.44867198]
 [-0.68380866]
 [ 0.90548393]]




- **cov**: The same covariance matrix as before.
- **riskDist**: Specifies the same risk distribution as before.
- **riskTarget**: Set to 0.5, meaning the target standard deviation is halved (and the variance falls to a quarter).

**Interpretation**:
- Every weight is exactly half of its Case 2 value, as the two outputs show. The portfolio keeps the same risk distribution across components but at half the total standard deviation.

### Interaction Between Inputs

1. **cov and riskDist**:
   - The covariance matrix determines the principal components and their associated eigenvalues. The risk distribution specifies how much risk should be allocated to each of these components.
   - If `riskDist` is not provided, the function defaults to allocating all risk to the component with the smallest eigenvalue, which is the direction of lowest variance in the asset space.

2. **cov and riskTarget**:
   - The covariance matrix defines the risk structure among the assets (it carries no information about expected returns). The risk target scales the resulting weights up or down.
   - A higher `riskTarget` means a higher overall risk, while a lower `riskTarget` means a lower overall risk.

3. **riskDist and riskTarget**:
   - The risk distribution specifies the proportions of risk allocated to each component, while the risk target scales the total risk.
   - Together, they determine both the proportions and the total level of risk in the portfolio.
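
The interaction between `riskTarget` and the weights can be checked numerically. The sketch below repeats the eigendecomposition steps of `pcaWeights` inline (the helper `weights_for` is a hypothetical name used only here) and compares two risk targets for the same risk distribution.

```python
import numpy as np

cov = np.array([[0.1, 0.02, 0.03], [0.02, 0.1, 0.04], [0.03, 0.04, 0.1]])
risk_dist = np.array([0.2, 0.3, 0.5])

# Same steps as pcaWeights: decompose, then sort eigenvalues in descending order
e_val, e_vec = np.linalg.eigh(cov)
order = e_val.argsort()[::-1]
e_val, e_vec = e_val[order], e_vec[:, order]


def weights_for(risk_target):
    """Weights for a given risk target, holding risk_dist fixed."""
    return e_vec @ (risk_target * np.sqrt(risk_dist / e_val))


w_full, w_half = weights_for(1.0), weights_for(0.5)

print(w_half / w_full)        # weights scale linearly with the risk target
print(w_full @ cov @ w_full)  # variance at target 1.0
print(w_half @ cov @ w_half)  # variance at target 0.5
```

Weights scale linearly with `riskTarget`, while the portfolio variance scales with its square, so halving the target quarters the variance.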

### Examples with Explanations

1. **Equal Risk Distribution**

   ```python
   risk_distribution = np.array([1/3, 1/3, 1/3])
   weights = pcaWeights(cov_matrix, riskDist=risk_distribution)
   ```

   **Interpretation**:
   - Equal risk is allocated to each of the three principal components, resulting in a balanced risk profile.

2. **Conservative Risk Distribution**

   ```python
   risk_distribution = np.array([0.1, 0.2, 0.7])
   weights = pcaWeights(cov_matrix, riskDist=risk_distribution, riskTarget=0.3)
   ```

   **Interpretation**:
   - Most of the risk is allocated to the third principal component, which after the descending sort is the one with the smallest eigenvalue, that is, the lowest-variance direction. Setting `riskTarget=0.3` scales the portfolio's target standard deviation to 0.3.

3. **Aggressive Risk Distribution**

   ```python
   risk_distribution = np.array([0.5, 0.3, 0.2])
   weights = pcaWeights(cov_matrix, riskDist=risk_distribution, riskTarget=1.5)
   ```

   **Interpretation**:
   - More risk is allocated to the first principal component, which is the direction of largest variance (the covariance matrix alone says nothing about its return). Setting `riskTarget=1.5` scales the portfolio's target standard deviation to 1.5.

By adjusting these inputs, a portfolio manager can tailor the risk profile of the portfolio to match specific investment strategies or risk tolerances.

# 2.4.3 Single Futures Roll