<img src="https://hilpisch.com/tpq_logo_bic.png" width="20%" align="right">

# Python for Algorithmic Trading
## Vectorized Lagged-Returns OLS Strategy on a Single Asset

&copy; Dr. Yves J. Hilpisch<br>
AI-Powered by GPT 5.1<br>
The Python Quants GmbH | https://tpq.io<br>
https://hilpisch.com | https://linktr.ee/dyjh


## Notebook Goals

This notebook walks through the lagged-returns OLS strategy from Sections 5 and 6 in a fully vectorized form. You will

- load daily prices from `data/epat_eod.csv`,
- construct lagged log-return features,
- fit an OLS model to predict tomorrow's return from past returns,
- translate predictions into positions and equity curves, and
- compare the strategy against buy-and-hold and a coin-flip benchmark.


In [None]:
# Vectorized lagged-returns OLS backtest in notebook form.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

plt.style.use("seaborn-v0_8")  #  consistent look
plt.rcParams.update({"figure.dpi": 250})

DATA_PATH = Path("../data/epat_eod.csv")
DATA_URL = (
    "https://raw.githubusercontent.com/yhilpisch/epatcode/"
    "refs/heads/main/data/epat_eod.csv"
)
DATA_SRC = DATA_PATH if DATA_PATH.is_file() else DATA_URL


## 1. Load and Inspect Time Series Data

We start by loading the end-of-day dataset and extracting the symbol column as a clean price series.
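
As a minimal sketch of the loading step, the following cell parses a tiny inline CSV instead of the real file. The layout (a `Date` column plus one column per symbol) mirrors what the loading code in this section expects; the prices, the `GLD` column, and the `toy_` names are hypothetical and only serve the illustration.

```python
import io

import pandas as pd

# Hypothetical miniature CSV: Date column plus one column per symbol.
# Rows are deliberately out of order and one quote is missing.
csv_text = """Date,EURUSD,GLD
2024-01-03,1.0920,188.1
2024-01-02,1.0940,
2024-01-04,1.0945,189.3
"""

toy_raw = (
    pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
    .set_index("Date")
    .sort_index()  # enforce chronological order
)
toy_prices = toy_raw["GLD"].astype(float).dropna()  # drop the missing quote
print(toy_prices)
```

Sorting the index before slicing matters because label-based slices such as `.loc[:end]` assume a monotonic date index.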

In [None]:
symbol = "EURUSD"
end = '2025-12-31'
raw = pd.read_csv(DATA_SRC, parse_dates=["Date"]).set_index("Date").sort_index().loc[:end]
data = raw[symbol].astype(float).dropna()
display(data.to_frame(symbol).tail())
print(f"Date range: {data.index.min().date()} → {data.index.max().date()} ({len(data):,} obs)")


In [None]:
data.plot();

## 2. Construct Lagged-Returns Features

We follow the article and use a fixed number of lags (for example, seven) of daily log-returns as predictors.
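
To make the alignment of lags concrete, here is a small sketch that builds the same kind of feature table with `pandas.Series.shift` on six hypothetical prices. It is an alternative illustration, not the NumPy implementation used in the cell below; all `toy_` names and values are assumptions.

```python
import numpy as np
import pandas as pd

# Six hypothetical daily prices.
toy_prices = pd.Series(
    [100.0, 101.0, 99.0, 102.0, 103.0, 101.5],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)
toy_rets = np.log(toy_prices).diff().dropna()  # r_t = log S_t - log S_{t-1}

n_lags = 2
frame = pd.DataFrame({"r_t": toy_rets})
for k in range(1, n_lags + 1):
    # shift(k) moves the return observed k days earlier onto row t
    frame[f"r_t_minus_{k}"] = toy_rets.shift(k)
frame = frame.dropna()  # the first n_lags rows have incomplete features
print(frame.round(4))
```

Each row pairs the target `r_t` with returns that were already known at the close of day `t-1`, which is the property the strategy relies on.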

In [None]:
def make_lagged_returns(prices: pd.Series,
                        lags: int = 5) -> tuple[np.ndarray, np.ndarray, pd.DatetimeIndex]:
    """Compute log-returns and assemble a lagged design matrix."""

    log_prices = np.log(prices.to_numpy())
    rets = np.diff(log_prices)  #  r_t = log S_t - log S_{t-1}
    dates = prices.index[1:]

    n = rets.shape[0]
    if n <= lags:
        raise ValueError("Not enough observations for the chosen number of lags.")

    X = np.column_stack(
        [rets[(lags - k):(n - k)] for k in range(1, lags + 1)]
    )  #  columns r_{t-1},...,r_{t-lags}
    y = rets[lags:]  #  target r_t
    dates_eff = dates[lags:]
    return X, y, dates_eff

lags = 7
X, y, dates = make_lagged_returns(data, lags=lags)
X.shape, y.shape


In [None]:
# Diagnostic: inspect first rows of target and lagged returns.
rows = min(5, y.shape[0])
cols = min(X.shape[1], 7)
diag_cols = [y[:rows]] + [X[:rows, k] for k in range(cols)]
diag_labels = ["y_t"] + [f"r_t_minus_{k + 1}" for k in range(cols)]
diag_df = pd.DataFrame(
    np.column_stack(diag_cols),
    columns=diag_labels,
    index=dates[:rows],
)
diag_df.round(6)


## 3. Fit the OLS Model

Next we fit the linear model
$$r_t = \alpha + \beta^\top x_t + \varepsilon_t$$
where the components of $x_t$ are lagged returns.
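
As a sanity check on the estimator, the sketch below simulates data from a known linear model and recovers the coefficients in two ways: with `np.linalg.lstsq` and with the normal equations $(A^\top A)\hat\theta = A^\top y$, where $A$ is the design matrix including a column of ones. The parameter values and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_feat = 5_000, 3

# Simulated features on a scale similar to daily returns (assumption).
X_sim = rng.normal(scale=0.01, size=(n_obs, n_feat))
alpha_true = 0.0002
beta_true = np.array([0.05, -0.03, 0.01])
eps = rng.normal(scale=0.001, size=n_obs)
y_sim = alpha_true + X_sim @ beta_true + eps

A = np.column_stack([np.ones(n_obs), X_sim])  # intercept column first
coef_lstsq = np.linalg.lstsq(A, y_sim, rcond=None)[0]
coef_normal = np.linalg.solve(A.T @ A, A.T @ y_sim)  # normal equations

print("lstsq :", coef_lstsq.round(4))
print("normal:", coef_normal.round(4))
```

Both routes give the same estimate on well-conditioned data; `lstsq` is generally the safer default because it does not form $A^\top A$ explicitly.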

In [None]:
def fit_ols(X: np.ndarray, y: np.ndarray, normalize: bool = False) -> np.ndarray:
    """Estimate y = beta_0 + X beta via OLS and return coefficients.

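    If ``normalize`` is True, each feature column is standardized to
    zero mean and unit variance before fitting. This can improve
    numerical conditioning when features have different scales,
    but the trading logic based on the sign of predictions remains
    unchanged. The returned coefficients then refer to the
    standardized features, so the same scaling must be applied to
    ``X`` before forming predictions.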
    """

    if normalize:
        mean = X.mean(axis=0)
        std = X.std(axis=0, ddof=1)
        std[std == 0.0] = 1.0  #  avoid division by zero for constant cols
        X_use = (X - mean) / std
    else:
        X_use = X

    X_design = np.column_stack([np.ones(X_use.shape[0]), X_use])  #  add intercept
    beta = np.linalg.lstsq(X_design, y, rcond=None)[0]  #  regress returns r_t on lags
    return beta


beta = fit_ols(X, y, normalize=False)
beta


## 4. From Predictions to Positions

We convert one-step-ahead predictions into long/short positions. Because each forecast of $r_t$ uses only returns up to day $t-1$, the resulting position can be applied to $r_t$ without look-ahead bias in the signal. Transaction costs are modelled as a simple proportional charge on changes in position.
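
The mapping from forecasts to net returns can be traced by hand on five hypothetical days. This is a sketch under stated assumptions: the forecasts, realized returns, and `toy_` names are made up, and the first position is assumed to be entered at no cost.

```python
import numpy as np

toy_pred = np.array([0.002, 0.001, -0.001, -0.003, 0.004])  # forecasts of r_t
toy_real = np.array([0.001, -0.002, -0.004, 0.002, 0.003])  # realized r_t
toy_cost = 0.0001  # proportional cost per unit of position change

toy_pos = np.sign(toy_pred)  # long (+1) or short (-1)
gross = toy_pos * toy_real  # gross strategy log-returns

# Position changes; prepend the first position so day 0 shows no trade.
toy_turnover = np.abs(np.diff(toy_pos, prepend=toy_pos[0]))
net = gross - toy_cost * toy_turnover

print("positions:", toy_pos)
print("turnover :", toy_turnover)
print("net      :", net.round(4))
```

A flip from long to short changes the position by two units, so it is charged twice the proportional cost.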

In [None]:
def run_lag_strategy(X: np.ndarray,
                     y: np.ndarray,
                     beta: np.ndarray,
                     cost: float = 0.0001) -> np.ndarray:
    """Compute strategy returns from lagged OLS predictions."""

    X_design = np.column_stack([np.ones(X.shape[0]), X])
    y_pred = X_design @ beta  #  forecasts of r_t

    pos = np.sign(y_pred)  #  raw positions -1, 0, +1
    strat_rets = pos * y  #  gross strategy returns

    turnover = np.abs(pos[1:] - pos[:-1])  #  trades per step
    strat_rets[1:] = strat_rets[1:] - cost * turnover
    return strat_rets


strat_rets = run_lag_strategy(X, y, beta)
strat_rets[:5]


## 5. Performance Metrics: Buy & Hold vs Strategy

Before looking at equity curves, it is useful to compare basic performance metrics for buy-and-hold and the lagged-returns strategy on the same return window. We use the aligned log-return vectors `y` (for the symbol) and `strat_rets` (for the strategy) to compute annualized return, volatility, and Sharpe ratio, along with total return, maximum drawdown, and drawdown duration.
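
The metrics can be checked on a short hypothetical equity path before applying them to real data. The sketch below assumes 252 trading days per year and a zero risk-free rate for the Sharpe ratio; the equity values and `toy_` names are illustrative only.

```python
import numpy as np

toy_equity = np.array([1.00, 1.10, 1.05, 0.99, 1.08, 1.12, 1.10])
toy_log_rets = np.diff(np.log(toy_equity))  # daily log-returns

# Annualized figures (252 trading days, zero risk-free rate assumed).
ann_ret = toy_log_rets.mean() * 252
ann_vol = toy_log_rets.std(ddof=1) * np.sqrt(252)
sharpe = ann_ret / ann_vol

# Drawdown relative to the running peak.
running_peak = np.maximum.accumulate(toy_equity)
toy_dd = toy_equity / running_peak - 1.0
max_dd = toy_dd.min()

# Longest consecutive stretch below the running peak.
longest = current = 0
for is_under in toy_dd < 0.0:
    current = current + 1 if is_under else 0
    longest = max(longest, current)

print(f"max drawdown: {max_dd:.3f}, longest underwater: {longest} periods")
```

Summing log-returns and exponentiating recovers the total growth factor, which is why log-returns are convenient for aggregation over time.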

In [None]:
trading_days = 252  #  rough number of trading days per year

def max_drawdown_and_duration(equity: np.ndarray) -> tuple[float, int]:
    """Compute maximum drawdown and its duration (in periods)."""

    peak = np.maximum.accumulate(equity)
    dd = equity / peak - 1.0  #  drawdown series (<= 0)
    underwater = dd < 0.0
    max_dur = 0
    cur = 0
    for flag in underwater:
        if flag:
            cur += 1
            if cur > max_dur:
                max_dur = cur
        else:
            cur = 0
    return float(dd.min()), int(max_dur)


# equity curves on the effective window shared by y and strat_rets;
# both are log-return vectors, so compound via exp(cumsum)
eq_bh = np.exp(np.cumsum(y))
eq_strat = np.exp(np.cumsum(strat_rets))

# buy-and-hold metrics
ann_ret_bh = y.mean() * trading_days
ann_vol_bh = y.std(ddof=1) * np.sqrt(trading_days)
sharpe_bh = ann_ret_bh / ann_vol_bh if ann_vol_bh > 0.0 else np.nan
total_ret_bh = eq_bh[-1] - 1.0
max_dd_bh, dur_bh = max_drawdown_and_duration(eq_bh)

# strategy metrics
ann_ret_strat = strat_rets.mean() * trading_days
ann_vol_strat = strat_rets.std(ddof=1) * np.sqrt(trading_days)
sharpe_strat = (ann_ret_strat / ann_vol_strat
                if ann_vol_strat > 0.0 else np.nan)
total_ret_strat = eq_strat[-1] - 1.0
max_dd_strat, dur_strat = max_drawdown_and_duration(eq_strat)

summary = pd.DataFrame(
    {
        "ann_return": [ann_ret_bh, ann_ret_strat],
        "ann_vol": [ann_vol_bh, ann_vol_strat],
        "sharpe": [sharpe_bh, sharpe_strat],
        "total_return": [total_ret_bh, total_ret_strat],
        "max_drawdown": [max_dd_bh, max_dd_strat],
        "dd_duration": [dur_bh, dur_strat],
    },
    index=["buy_and_hold", "lag_ols_strategy"],
)
summary.round(3).T


## 6. Equity Curves and Benchmarks

Finally, we compare the lagged-returns strategy against buy-and-hold and a coin-flip long/short benchmark.
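
A single coin-flip path depends heavily on the seed, so it can help to look at a whole distribution of random paths. The following sketch simulates 500 coin-flip strategies on hypothetical log-returns and summarizes their final equity; the simulated returns, the number of paths, and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical daily log-returns standing in for the real series.
toy_rets = rng.normal(loc=0.0, scale=0.006, size=1_000)

n_paths = 500
coin = rng.choice([-1.0, 1.0], size=(n_paths, toy_rets.size))  # random long/short

# Final equity per path: exponentiate the summed strategy log-returns.
final_equity = np.exp((coin * toy_rets).sum(axis=1))
lo, med, hi = np.percentile(final_equity, [5, 50, 95])

print(f"5%: {lo:.3f}  median: {med:.3f}  95%: {hi:.3f}")
```

A strategy whose final equity sits well inside this band is hard to distinguish from random position-taking on the same data.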

In [None]:
eq_bh = np.exp(np.cumsum(y))  #  buy-and-hold equity (log-returns)
eq_strat = np.exp(np.cumsum(strat_rets))  #  strategy equity

rng = np.random.default_rng(seed=42)
coin_pos = rng.choice([-1.0, 1.0], size=y.shape[0])  #  random long/short
coin_rets = coin_pos * y
eq_coin = np.exp(np.cumsum(coin_rets))  #  coin-flip equity

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(dates, eq_strat, label="Lag-OLS strategy")
ax.plot(dates, eq_bh, label=f"Buy & hold ({symbol})", ls='--', lw=1)
ax.plot(dates, eq_coin, label="Coin-flip long/short", ls='-.', lw=1)
ax.set_xlabel("date")
ax.set_ylabel("normalized equity")
ax.set_title(f"Vectorized lagged-returns strategy on {symbol}")
ax.legend(loc="best")
fig.tight_layout()
plt.show()


## 7. Extensions

The vectorized pattern in this notebook extends easily:

- change the number of lags or add further features (for example, moving-average signals),
- adjust the mapping from predictions to positions (thresholds, scaling, volatility targeting), and
- move from a single time series column to a multi-asset matrix for cross-sectional strategies.
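
As one possible reading of the threshold idea, the sketch below stays flat unless the absolute forecast exceeds a cutoff. The helper `threshold_positions`, the cutoff value, and the forecasts are hypothetical and not part of the article's code.

```python
import numpy as np


def threshold_positions(pred: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical helper: trade only when |forecast| exceeds ``threshold``."""
    pos = np.zeros_like(pred, dtype=float)  # flat by default
    pos[pred > threshold] = 1.0  # long on sufficiently positive forecasts
    pos[pred < -threshold] = -1.0  # short on sufficiently negative forecasts
    return pos


toy_pred = np.array([0.0004, -0.0001, -0.0009, 0.0002, 0.0012])
toy_pos = threshold_positions(toy_pred, threshold=0.0003)
print(toy_pos)
```

Compared with a plain sign rule, a cutoff of this kind tends to reduce turnover, at the price of sitting out days with weak signals.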

The OOP backtest in the article wraps these steps into a reusable class; you can think of this notebook as the functional prototype that the class encapsulates.
