# L9a: Portfolio Rebalancing Strategies
In this lecture, we will explore portfolio rebalancing strategies that maintain optimal asset allocations despite market-driven weight changes. We will examine how portfolio drift occurs, why it matters for portfolio performance, and how autonomous agent-based strategies can be used to rebalance portfolios effectively.

> __Learning Objectives:__
> 
> By the end of this lecture, you will be able to:
> * __Quantify portfolio drift:__ Calculate how asset weights change over time due to differing returns and explain why continuous rebalancing is impractical due to transaction costs and other market frictions.
> * __Formulate bandit-based portfolio selection:__ Model portfolio asset selection as a multi-armed bandit problem where each asset combination represents an arm, and apply the epsilon-greedy algorithm to balance exploration and exploitation in the asset selection problem.
> * __Solve utility maximization problems:__ Determine optimal asset quantities given fixed budgets and investor preferences using Cobb-Douglas utility functions and understand how risk-adjusted single index model parameters inform investor preference coefficients.

Let's get started!
___

## Examples

Today, we will be using the following examples to illustrate key concepts:

> [▶ Let's build a portfolio with risky and risk-free assets](CHEME-5660-L8b-Example-SIM-MinVar-RRFA-Fall-2025.ipynb). In this example, we construct a portfolio that includes both risky assets and a risk-free asset, such as a Treasury STRIPS bond. We will use the single index model to optimize the portfolio, aiming to minimize risk while achieving a specified return. We'll explore how the inclusion of a risk-free asset affects the portfolio's risk-return profile, construct the Capital Allocation Line, and identify the tangent portfolio.

> [▶ Let's compute the drift of a risky asset maximum Sharpe portfolio.](CHEME-5660-L9a-Example-Portfolio-Drift-Fall-2025.ipynb). In this example, we'll select an optimal risky asset portfolio (computed using the single index model) and compute its drift over time. We'll analyze how the drift impacts the expected returns and the associated risk of the portfolio and discuss its implications for investment strategies.

> [▶ Let's think about bandit based rebalancing models](CHEME-5660-L9a-Example-INFORMS-Fall-2025.ipynb). In this example, we'll explore bandit-based rebalancing strategies for a portfolio of risky assets. We'll reformulate the portfolio rebalancing problem as a multi-armed bandit problem, where each __asset combination__ represents an arm of the bandit. We'll implement an epsilon-greedy algorithm to decide when and how to rebalance the portfolio based on observed returns, and analyze the performance of this strategy in terms of cumulative returns and risk.
___

<div>
    <center>
        <img src="figs/Fig-MinVar-RiskFree-Schematic.svg" width="680"/>
    </center>
</div>

## Concept Review: Risky and Risk-Free Assets
In the last lecture, we introduced the problem of portfolios with a mixture of risky and risk-free assets. We discussed the Capital Allocation Line (CAL), which represents the risk-return trade-off of a portfolio that combines a risk-free asset with a portfolio of risky assets. The slope of the CAL is determined by the __Sharpe Ratio__ of the risky asset portfolio, which measures the excess return per unit of risk.

The minimum variance portfolio problem for a portfolio $\mathcal{P}$ using the single index model with a combination of risky and risk-free assets is given by:
$$
\begin{align*}
\text{minimize}~\operatorname{Var}(g_{\mathcal{P}}) &= \sum_{i\in\mathcal{P}}\sum_{j\in\mathcal{P}}w_{i}w_{j}
\operatorname{Cov}\left(g_{i},g_{j}\right) \\
\text{subject to}~\mathbb{E}[g_{\mathcal{P}}]& = w_{f}g_{f}+\alpha_{\mathcal{P}}+\beta_{\mathcal{P}}\;\mathbb{E}[g_{M}] = {R^{*}}\\
\alpha_{\mathcal{P}} & = \sum_{i\in\mathcal{P}}w_{i}\;\alpha_{i}\\
\beta_{\mathcal{P}} & = \sum_{i\in\mathcal{P}}w_{i}\;\beta_{i} \\
w_{f}+\sum_{i\in\mathcal{P}}w_{i} & = 1 \\
w_{f}&\geq{0}\\
w_{i}&\geq{0}\qquad{\forall{i}\in\mathcal{P}}
\end{align*}
$$
where the covariance between assets $i$ and $j$ is given by:
$$
\begin{align*}
\operatorname{Cov}(g_{i}, g_{j}) & = \begin{cases}
\beta_{i}^{2}\sigma_{m}^{2}+\Delta{t}\;\sigma_{\epsilon_{i}}^{2} & i = j \\
\beta_{i}\beta_{j}\sigma_{m}^2 & i \neq j
\end{cases} \\
\end{align*}
$$

The terms $w_{i}\geq{0}$ denote the fraction of risky asset $i\in\mathcal{P}$, 
the quantity $w_{f}$ denotes the fraction of risk-free assets in the portfolio, 
$g_{f}$ denotes the risk-free rate or return, and $R^{*}$ is the minimum required growth rate (return) 
for the overall portfolio $\mathcal{P}$. 
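
To make the covariance structure concrete, here is a minimal sketch in plain Python that builds the single index model covariance matrix and evaluates the portfolio variance for a set of risky-asset weights. The function names (`sim_covariance`, `portfolio_variance`) and all numerical values are hypothetical, chosen only for illustration, and we assume $\Delta{t} = 1$. Because the risk-free asset has zero variance and zero covariance with every risky asset, only the risky weights $w_{i}$ enter the variance.

```python
def sim_covariance(betas, resid_vars, market_var, dt=1.0):
    """Covariance matrix implied by the single index model (illustrative helper)."""
    n = len(betas)
    # Off-diagonal and diagonal share the systematic term beta_i * beta_j * sigma_m^2
    cov = [[betas[i] * betas[j] * market_var for j in range(n)] for i in range(n)]
    # The diagonal adds the firm-specific (residual) variance, scaled by dt
    for i in range(n):
        cov[i][i] += dt * resid_vars[i]
    return cov


def portfolio_variance(weights, cov):
    """Double sum of w_i * w_j * Cov(g_i, g_j) over the risky assets."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))


# Hypothetical parameters (not taken from the lecture data)
betas = [0.8, 1.2, 1.5]
resid_vars = [0.02, 0.03, 0.05]
market_var = 0.04
w = [0.5, 0.3, 0.2]  # risky-asset weights only

cov = sim_covariance(betas, resid_vars, market_var)
var_p = portfolio_variance(w, cov)
print(f"portfolio variance = {var_p:.6f}")
```

Under the single index model, the same variance can be written as $\beta_{\mathcal{P}}^{2}\sigma_{m}^{2}+\Delta{t}\sum_{i}w_{i}^{2}\sigma_{\epsilon_{i}}^{2}$, which is a quick way to check the double sum.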

> __The Separation Theorem__
>
> Every investor, regardless of their risk preferences, should hold the same risky portfolio, i.e., the tangent portfolio. What differs between investors is how much they allocate between this tangent portfolio and the risk-free asset. Risk-averse investors hold more of the risk-free asset, while risk-seeking investors may borrow money (making $w_f$ negative) to invest more than 100% of their wealth in the tangent portfolio.
>
> This result is known as the __Separation Theorem__ or __Two-Fund Separation Theorem__, originally developed by James Tobin in his 1958 paper on liquidity preference. [Tobin, James (1958). "Liquidity Preference as Behavior Towards Risk". The Review of Economic Studies, 25(2): 65-86.](https://doi.org/10.2307/2296205) The investment decision (which risky portfolio to hold) is separated from the financing decision (how much to borrow or lend at the risk-free rate).


Let's finish up our example of portfolios with risky and risk-free assets.

> __Example__
>
> [▶ Let's build a portfolio with risky and risk-free assets](CHEME-5660-L8b-Example-SIM-MinVar-RRFA-Fall-2025.ipynb). In this example, we construct a portfolio that includes both risky assets and a risk-free asset, such as a Treasury STRIPS bond. We will use the single index model to optimize the portfolio, aiming to minimize risk while achieving a specified return. We'll explore how the inclusion of a risk-free asset affects the portfolio's risk-return profile, construct the Capital Allocation Line, and identify the tangent portfolio.

___

## CAL, the Tangent Portfolio, and the Sharpe Ratio

When we introduce a risk-free asset into our portfolio optimization problem, something remarkable happens: the efficient frontier transforms into a straight line in risk-return space. This line is called the __Capital Allocation Line (CAL)__.

The Capital Allocation Line describes all portfolios that can be formed by combining a risk-free asset (e.g., Treasury STRIPS) with return $g_f$ and zero variance, and a single optimal risky portfolio called the __tangent portfolio__ (denoted by $T$). The tangent portfolio has expected return $\mathbb{E}[g_T]$ and variance $\sigma_T^2$. Any portfolio on the CAL can be expressed as:
$$
\begin{align*}
\mathbb{E}[g_{\mathcal{P}}] &= w_f g_f + (1 - w_f)\;\mathbb{E}[g_T]\\
\sigma_{\mathcal{P}} &= (1 - w_f) \sigma_T
\end{align*}
$$
where $w_f$ is the fraction invested in the risk-free asset, $(1-w_f)$ is the fraction invested in the tangent portfolio, $\sigma_{\mathcal{P}} = \sqrt{\operatorname{Var}(g_{\mathcal{P}})}$ is the standard deviation of the portfolio return, and $\sigma_T = \sqrt{\operatorname{Var}(g_T)}$ is the standard deviation of the tangent portfolio return. To derive the CAL equation, we solve for $w_f$ from the second equation and substitute into the first:
$$
\begin{align*}
\sigma_{\mathcal{P}} &= (1 - w_f) \sigma_T\\
\frac{\sigma_{\mathcal{P}}}{\sigma_T} &= 1 - w_f\\
w_f &= 1 - \frac{\sigma_{\mathcal{P}}}{\sigma_T}
\end{align*}
$$
Substituting this expression for $w_f$ into the expected return equation:
$$
\begin{align*}
\mathbb{E}[g_{\mathcal{P}}] &= w_f g_f + (1 - w_f) \mathbb{E}[g_T]\\
&= \left(1 - \frac{\sigma_{\mathcal{P}}}{\sigma_T}\right) g_f + \frac{\sigma_{\mathcal{P}}}{\sigma_T} \mathbb{E}[g_T]\\
&= g_f - \frac{\sigma_{\mathcal{P}}}{\sigma_T} g_f + \frac{\sigma_{\mathcal{P}}}{\sigma_T} \mathbb{E}[g_T]\\
&= g_f + \frac{\sigma_{\mathcal{P}}}{\sigma_T} \left(\mathbb{E}[g_T] - g_f\right)\\
&= g_f + \underbrace{\left(\frac{\mathbb{E}[g_T] - g_f}{\sigma_T}\right)}_{\text{Sharpe ratio T.P.}}\;\sigma_{\mathcal{P}}\quad\blacksquare
\end{align*}
$$
This is a __linear relationship__ between expected return and risk. The slope of this line is the __Sharpe ratio of the tangent portfolio__, which measures the excess return per unit of risk for that optimal risky portfolio.
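
As a quick numerical check of the derivation above, the following sketch evaluates the two defining equations for several values of $w_f$ and compares the result with the CAL line. The tangent portfolio values (`mu_T`, `sigma_T`) and the risk-free rate are made-up numbers, and we assume $w_f \leq 1$ so that $(1 - w_f)\sigma_T$ is a valid standard deviation. A negative $w_f$ corresponds to borrowing at the risk-free rate.

```python
def cal_point(w_f, g_f, mu_T, sigma_T):
    """Expected return and risk of a mix of the risk-free asset and tangent portfolio."""
    mu_P = w_f * g_f + (1.0 - w_f) * mu_T
    sigma_P = (1.0 - w_f) * sigma_T  # valid for w_f <= 1
    return mu_P, sigma_P


# Hypothetical inputs for illustration only
g_f, mu_T, sigma_T = 0.04, 0.10, 0.20
sharpe_T = (mu_T - g_f) / sigma_T  # slope of the CAL

results = []
for w_f in (1.0, 0.5, 0.0, -0.25):
    mu_P, sigma_P = cal_point(w_f, g_f, mu_T, sigma_T)
    on_line = g_f + sharpe_T * sigma_P  # CAL evaluated at sigma_P
    results.append((w_f, mu_P, sigma_P, on_line))
    print(f"w_f={w_f:+.2f}  sigma_P={sigma_P:.3f}  E[g_P]={mu_P:.4f}  CAL={on_line:.4f}")
```

Each row should show the same value in the last two columns, since every mix of the two funds lies on the line.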

To find the tangent portfolio, we identify the risky portfolio that maximizes the Sharpe ratio. Geometrically, it is the point where a line drawn from the risk-free rate is tangent to the risky-only efficient frontier. Using the single index model, we solve the optimization problem for the weights $w_i$ of the risky assets in portfolio $\mathcal{P}$ that maximize the Sharpe ratio:
$$
\boxed{
\begin{align*}
\text{maximize} &\quad \frac{\mathbb{E}[g_{\mathcal{P}}] - g_f}{\sigma_{\mathcal{P}}} = \frac{\alpha_{\mathcal{P}} + \beta_{\mathcal{P}}\;\mathbb{E}[g_M] - g_f}{\sigma_{\mathcal{P}}}\\
\text{subject to} &\quad \sum_{i\in\mathcal{P}}w_{i} = 1\\
&\quad w_{i} \geq 0 \qquad \forall{i}\in\mathcal{P}
\end{align*}}
$$
The portfolio $\mathcal{P}$ that solves this optimization problem is the tangent portfolio $T$. Once found, its Sharpe ratio $\frac{\mathbb{E}[g_T] - g_f}{\sigma_T}$ becomes the slope of the CAL.
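
For intuition only, the sketch below finds an approximate tangent portfolio for two risky assets by brute-force grid search over the weight $w_1$ (with $w_2 = 1 - w_1$). This is not the solver used in the course examples; it is a deliberately simple stand-in that works only because the two-asset problem is one-dimensional. All parameter values are hypothetical, and the variance uses the single index model form with $\Delta{t} = 1$.

```python
import math


def sharpe_ratio(w, alphas, betas, resid_vars, g_m, market_var, g_f):
    """Sharpe ratio of a risky portfolio under the single index model (dt = 1)."""
    alpha_p = sum(wi * ai for wi, ai in zip(w, alphas))
    beta_p = sum(wi * bi for wi, bi in zip(w, betas))
    mean = alpha_p + beta_p * g_m
    var = beta_p ** 2 * market_var + sum(wi ** 2 * vi for wi, vi in zip(w, resid_vars))
    return (mean - g_f) / math.sqrt(var)


# Hypothetical two-asset universe
alphas = [0.02, 0.01]
betas = [0.9, 1.3]
resid_vars = [0.03, 0.02]
g_m, market_var, g_f = 0.08, 0.04, 0.03

# Grid over w1 in [0, 1] enforces the budget and no-short-selling constraints
grid = [k / 1000 for k in range(1001)]
best_w1 = max(
    grid,
    key=lambda w1: sharpe_ratio([w1, 1 - w1], alphas, betas, resid_vars, g_m, market_var, g_f),
)
best_sr = sharpe_ratio([best_w1, 1 - best_w1], alphas, betas, resid_vars, g_m, market_var, g_f)
print(f"approximate tangent weights: w1={best_w1:.3f}, w2={1 - best_w1:.3f}, Sharpe={best_sr:.4f}")
```

Grid search scales exponentially with the number of assets, which is one reason a convex reformulation is preferred for realistic portfolios.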

> __Practical Notes__
>
> __Concentration constraints:__ In practice, unconstrained tangent portfolio optimization often produces extreme weights in a few assets, especially with imprecise estimates. Practitioners commonly impose concentration constraints like $0 \leq w_i \leq u$ to force diversification. While these constraints theoretically reduce the Sharpe ratio, they often improve out-of-sample performance by making portfolios more robust to estimation error.

> __Tricky problem:__ Maximizing the Sharpe ratio turns out to be surprisingly tricky because the objective function is a ratio of two functions that depend on the weights $w_i$ (with the square root of a quadratic form in the denominator). The approach that we use to solve this problem is to transform it into a second-order cone program (SOCP), which can be solved efficiently using modern optimization software, such as [the COSMO.jl package in Julia](https://github.com/oxfordcontrol/COSMO.jl.git).

__However, the Sharpe ratio is more general than just the tangent portfolio__. Any risky-asset portfolio $\mathcal{P}$ has its own Sharpe ratio, which measures the risk-return trade-off of that specific portfolio. Thus, the Sharpe ratio provides a standardized (dimensionless) way to compare different portfolios or investment strategies, regardless of their individual compositions.

___

## Allocation Drift
Once we have established an optimal portfolio, market fluctuations can cause the asset weights to deviate from their target (optimal) allocations over time. This phenomenon, known as __portfolio drift__, can lead to unintended risk exposures and suboptimal performance.


### Why does drift occur?
The weight (or _dollar fraction_) of asset $i$ in portfolio $\mathcal{P}$ is given by:
$$
w_{i} = \frac{n_{i}\cdot{S}_{i}}{\sum_{j\in\mathcal{P}}n_{j}\cdot{S}_{j}}\qquad\forall{i}\in\mathcal{P}
$$
where $n_{i}$ denotes the number of shares of asset $i$, and $S_{i}$ denotes the share price of asset $i$. The numerator is the value of asset $i$ in the portfolio, while the denominator is the portfolio’s total value. Thus, because share prices change, the weight $w_{i}$ drifts away from its optimal value over time if the number of shares of each asset stays the same.

Suppose we have a portfolio $\mathcal{P}$ with $N$ risky assets, each initially allocated $w_{i}^{(0)}$ of the budget $W^{(0)}$ at time $t=0$, where the price of each asset at the time of allocation is given by $S_{i}^{(0)}$. After some time, the prices of these assets change, leading to new prices $S_{i}^{(t)}$ at time $t$. If we do not adjust the number of shares $n_{i}$ held in each asset, the new weights $w_{i}^{(t)}$ will be:
$$
\begin{align*}
w_{i}^{(t)} &= \frac{n_{i} \cdot S_{i}^{(t)}}{\sum_{j=1}^{N} n_{j} \cdot S_{j}^{(t)}} \\
&= \frac{\left(\frac{w_{i}^{(0)} \cdot W^{(0)}}{S_{i}^{(0)}}\right) \cdot S_{i}^{(t)}}{\sum_{j=1}^{N} \left(\frac{w_{j}^{(0)} \cdot W^{(0)}}{S_{j}^{(0)}}\right) \cdot S_{j}^{(t)}}\\
&= \frac{w_{i}^{(0)} \cdot \left(\frac{S_{i}^{(t)}}{S_{i}^{(0)}}\right)}{\sum_{j=1}^{N} w_{j}^{(0)} \cdot \left(\frac{S_{j}^{(t)}}{S_{j}^{(0)}}\right)}\\
& = \frac{w_{i}^{(0)} \cdot (1 + R_{i}^{(t)})}{\sum_{j=1}^{N} w_{j}^{(0)} \cdot (1 + R_{j}^{(t)})}
\end{align*}
$$
where $R_{i}^{(t)} = ({S_{i}^{(t)} - S_{i}^{(0)}})/{S_{i}^{(0)}}$ is the __fractional return__ of asset $i$ from time $0$ to time $t$. The new weight $w_{i}^{(t)}$ depends on the initial weight $w_{i}^{(0)}$ and the returns (price changes) of all assets in the portfolio. 

The only way for the weights to remain unchanged ($w_{i}^{(t)} = w_{i}^{(0)}$) is if all assets experience __exactly__ the same return over the time period, i.e., $R_{i}^{(t)} = R_{j}^{(t)}$ for all $i,j \in \mathcal{P}$ (which is highly unlikely in practice).
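
The drift formula is easy to evaluate directly. The following sketch computes the drifted weights from the initial weights and the fractional returns; the helper name `drifted_weights` and the numbers are hypothetical and serve only to illustrate the last line of the derivation above.

```python
def drifted_weights(w0, returns):
    """Weights after prices move, holding the number of shares fixed."""
    # Each position grows by its own gross return (1 + R_i)
    grown = [w * (1.0 + r) for w, r in zip(w0, returns)]
    total = sum(grown)  # normalizes the weights so they sum to one
    return [g / total for g in grown]


# Hypothetical initial allocation and returns over the holding period
w0 = [0.5, 0.3, 0.2]
R = [0.10, -0.05, 0.02]

wt = drifted_weights(w0, R)
print("initial:", w0)
print("drifted:", [round(w, 4) for w in wt])

# If every asset earns the same return, the weights do not move
same = drifted_weights(w0, [0.07, 0.07, 0.07])
print("equal returns:", [round(w, 4) for w in same])
```

In this example the first asset outperforms the others, so its weight rises above its 50% target while the other two weights fall.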

__TL;DR__: Even if we start with an optimal allocation, the weights naturally drift away from their targets because the assets earn different returns.

> __Why does this matter?__: If we do nothing, our portfolio is only optimal in the minimum-variance sense until the next trade, i.e., until the next market tick in which any of the assets in the portfolio $\mathcal{P}$ experience a price change. Thus, to maintain a truly optimal portfolio, with our specific balance of risk and reward, we must rebalance it at every market tick. 

Continuous rebalancing is impractical for many reasons, e.g., transaction costs, taxes incurred from frequent trading, data processing requirements, etc. Therefore, we need to explore strategies that allow us to rebalance our portfolio effectively without the need for constant adjustments.

Let's do an example to illustrate portfolio drift and the need for rebalancing.

> __Example__
> 
> [▶ Let's compute the drift of a risky asset maximum Sharpe portfolio.](CHEME-5660-L9a-Example-Portfolio-Drift-Fall-2025.ipynb). In this example, we'll select an optimal risky asset portfolio (computed using the single index model) and compute its drift over time. We'll analyze how the drift impacts the expected returns and the associated risk of the portfolio and discuss its implications for investment strategies.

___

<div>
    <center>
        <img src="figs/Fig-1-System-Schematic.svg" width="680"/>
    </center>
</div>

## Online Agent-Based Rebalancing
One approach to portfolio rebalancing involves using online autonomous agent-based strategies. In this method, autonomous agents monitor market conditions and make real-time decisions to adjust the portfolio based on predefined rules or algorithms.

Let's look at some experimental ideas for online agent-based strategies for portfolio rebalancing. The approach addresses two questions: which assets to include in the portfolio (which we solve as a bandit problem), and how many shares of each asset to hold (which we solve as a utility maximization problem).

### What Is a Bandit Problem?  
A bandit problem is a class of online (sequential) decision-making tasks in which an agent repeatedly chooses among $K$ options (called __arms__) and receives a reward based on that choice.

At each round $t$, the agent selects one of the $K$ arms according to its decision rule, pulls that arm, and observes a reward $r_{t}$. Good pulls yield higher rewards; poor pulls yield lower rewards (or even losses). The agent’s goal is to maximize cumulative reward over time.

Here are a few examples of applications of bandit problems:
* __Clinical Trials__: Balances learning about new treatments (exploration) with assigning patients to the current best therapy (exploitation).
* __Financial Portfolio Design__: Dynamically allocates capital across assets to maximize returns while testing novel investments.
* __Adaptive Routing__: Chooses network paths to minimize delay, trading off probing unknown routes against using established fast ones.
* __Recommendation Systems__: Iteratively selects items to display—like movies or products—balancing novel suggestions against proven favorites.

__Additional Resources__: Our lecture notes on bandits were inspired by Chapter 1 of "Introduction to Multi-Armed Bandits" by Aleksandrs Slivkins. This is an excellent resource (albeit quite technical) for learning more about bandit problems. [The book is available online](https://arxiv.org/abs/1904.07272). We also drew material from the [Bandit problem Thompson sampling tutorial by Russo et al., 2020](https://arxiv.org/abs/1707.02038).

Let's examine one of the simplest algorithms to solve bandit problems, the __epsilon-greedy algorithm__.

The agent has $K$ arms (choices), $\mathcal{A} = \left\{1,2,\dots,K\right\}$, and the total number of rounds is $T$. The agent uses the following algorithm to choose which arm to pull (which action to take) during each round:

For $t = 1,2,\dots,T$:
1. _Initialize_: Draw a uniform random number $p\in\left[0,1\right]$ and compute the exploration threshold $\epsilon_{t}\propto{t}^{-1/3}$. Note that in other sources, $\epsilon$ is a constant rather than a function of $t$.
2. _Exploration_: If $p\leq\epsilon_{t}$, choose an arm $a_{t}\in\mathcal{A}$ uniformly at random. Execute the action $a_{t}$ and receive a reward $r_{t}$ from the _adversary_ (nature).
3. _Exploitation_: Otherwise ($p>\epsilon_{t}$), choose the greedy action $a_{t} = a^{\star}_{t}$, the arm with the highest average reward so far. Execute the action $a_{t}$ and receive a reward $r_{t}$ from the _adversary_ (nature).
4. _Update_: Append $r_{t}$ to the list of rewards for the chosen arm $a_{t}\in\mathcal{A}$.

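__Theorem__: The epsilon-greedy algorithm with exploration probability $\epsilon_{t}={t^{-1/3}}\cdot\left(K\cdot\log(t)\right)^{1/3}$ achieves a regret bound of $\mathbb{E}\left[R(t)\right]\leq{t}^{2/3}\cdot{O}\left(K\cdot\log(t)\right)^{1/3}$ for each round $t$. Here $R(t)$ denotes the cumulative regret through round $t$ (not an asset return), and the bound holds up to a constant factor.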
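
The epsilon-greedy steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the course example: the rewards come from hypothetical Bernoulli arms, the exploration probability follows the schedule in the theorem and is clipped to $[0,1]$, and an arm that has never been pulled is assigned an average reward of zero.

```python
import math
import random


def epsilon_greedy(reward_fn, K, T, seed=0):
    """Run T rounds of epsilon-greedy over K arms; return pull counts and mean rewards."""
    rng = random.Random(seed)
    counts = [0] * K    # number of pulls per arm
    totals = [0.0] * K  # cumulative reward per arm

    def mean(a):
        # Assumption: untried arms are scored as 0.0
        return totals[a] / counts[a] if counts[a] > 0 else 0.0

    for t in range(1, T + 1):
        # Exploration probability from the theorem, clipped to [0, 1]
        eps_t = min(1.0, (K * math.log(t) / t) ** (1.0 / 3.0))
        p = rng.random()
        if p <= eps_t:
            arm = rng.randrange(K)        # explore: uniform random arm
        else:
            arm = max(range(K), key=mean)  # exploit: best average so far
        r = reward_fn(arm, rng)
        counts[arm] += 1
        totals[arm] += r
    return counts, [mean(a) for a in range(K)]


# Hypothetical environment: three Bernoulli arms with fixed success probabilities
probs = [0.2, 0.5, 0.8]
T = 5000
counts, means = epsilon_greedy(lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0, K=3, T=T)
print("pull counts:", counts)
print("estimated means:", [round(m, 3) for m in means])
```

With a decaying $\epsilon_{t}$, the agent explores heavily in early rounds and increasingly commits to the arm with the best observed average as evidence accumulates.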

### Maximum Utility Portfolio Optimization Problem
Suppose you have a collection of risky assets in portfolio $\mathcal{P}$.
The goal is to allocate a fixed budget $B$ across these assets to __maximize the utility__ of the portfolio, where the utility is directly informed by quantitative measures of investor preferences.

> __Utility Function:__ A utility function $U:\mathbb{R}_{+}^{P}\to\mathbb{R}$ maps the number of shares of each asset in the portfolio to a real-valued utility score that reflects the investor's satisfaction with that allocation. We use a Cobb-Douglas utility function to model investor preferences:
> $$
> \begin{align*}
> U\left(n_{1},n_{2},\dots,n_{P}\right) = \kappa(\gamma)\prod_{i\in\mathcal{P}}n_{i}^{\gamma_{i}}
> \end{align*}
> $$
> where $\gamma_{i}\in\mathbb{R}$ is the __preference coefficient__ for asset $i$ (we need to estimate these values), and $\kappa(\gamma)$ is a leading coefficient that sets the scale of the utility function.

Let $\mathbf{n} = \left(n_{1},\dots,n_{P}\right)^{\top}\in\mathbb{R}_{+}^{P}$ be the vector of share counts, where $n_{i}$ is the __number of shares__ of asset $i$ in the portfolio (we need to estimate these values).

> __Inclusion:__ Not every asset from the universe of possible assets is included in the __final__ portfolio; inclusion (or exclusion) of an asset is governed by the __binary action vector__ $\mathbf{a}$ specified by a bandit agent, where $a_{j} = 1$ if asset $j$ is included in the portfolio, and $a_{j} = 0$ otherwise.
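
To see what the bandit agent is choosing among, the sketch below enumerates the candidate binary action vectors for a small asset universe, so that each vector is one arm. Excluding the all-zeros vector (an empty portfolio) is our assumption for this illustration, and `enumerate_arms` is a hypothetical helper name.

```python
from itertools import product


def enumerate_arms(P):
    """All binary inclusion vectors of length P, excluding the empty portfolio."""
    return [a for a in product((0, 1), repeat=P) if any(a)]


arms = enumerate_arms(3)
print(f"{len(arms)} arms for P = 3 assets")
for a in arms:
    print(a)
```

The number of arms is $2^{P} - 1$ under this assumption, so the arm set grows exponentially with the size of the asset universe.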

An investment budget $B$ is allocated across the assets in the portfolio, where $p_{i}$ is the acquisition price of asset $i$ at the time of allocation. The optimal portfolio is the solution
of the utility maximization problem:
$$
\boxed{
\begin{align*}
    \underset{n_{1},\dots,n_{P}}{\text{maximize}} &\quad \kappa(\gamma)\prod_{i\in\mathcal{P}}n_{i}^{\gamma_{i}} \\
    \text{subject to}&\quad B =  \sum_{i\in\mathcal{P}}n_{i}\;{p}_{i}\\
    \epsilon\;{a_{i}}&\leq n_{i} \leq{a_{i}}\left(\frac{B}{p_{i}}\right)\quad{\forall{i}\in\mathcal{P}}\\
    a_{i} &\in\{0,1\}\quad{\forall{i}\in\mathcal{P}}\\
    \epsilon &\in\mathbb{R}_{+} \quad\text{(hyperparameter)}
\end{align*}}
$$
In this analysis, we assume that the budget $B$ is fixed during a time period, 
and that the prices $p_{i}$ are fixed at the time of allocation (and are bounded by the bid-ask spread), and we allow fractional shares of assets. 
Short selling is not allowed. 

The leading coefficient $\kappa(\gamma)$ sets the scale of the utility function. 
However, given that the utility is ordinal, we can set the scale of the utility function to an arbitrary value, 
e.g., $\kappa = \pm{1}$, where $\kappa = 1$ if all the $\gamma_{i}$ coefficients are positive, 
and $\kappa = -1$ if $\textit{any}$ of the $\gamma_{i}$ coefficients are negative.

#### Analytical Solution
The share optimization problem has an __analytical solution__. 
Let $S = \left\{i\mid{a}_{i} = 1\right\}$ be the set of assets included in the portfolio, $S_{+} = \left\{i\in{S}\mid\gamma_{i}>0\right\}$ be the set of preferred assets, and $S_{-} = \left\{i\in{S}\mid\gamma_{i}<0\right\}$ be the set of non-preferred assets.
Then, the optimal maximum utility portfolio given the action $\mathbf{a}$, the budget $B$, user preferences $\gamma_{i}$, 
and the acquisition share price $p_{i}$ of asset $i$ is given by:
$$
\begin{align*}
n_{i}^{\star} & = \begin{cases}
\left(\frac{\gamma_{i}}{\sum_{j\in{S}_{+}}\gamma_{j}}\right)\;\frac{B - \epsilon\sum_{k\in{S}_{-}}p_{k}}{p_{i}} & \forall{i}\in{S}_{+}\\
\epsilon & \forall{i}\in{S}_{-}
\end{cases}\quad\blacksquare    
\end{align*}
$$
where $n_{i}^{\star}$ is the optimal number of shares of asset $i$ in the portfolio.
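
The analytical solution translates directly into code. The sketch below is a minimal illustration with hypothetical inputs; the function name `optimal_shares` is ours. Two choices are assumptions not covered by the formula above: an included asset with $\gamma_{i} = 0$ is held at the minimum $\epsilon$, and excluded assets ($a_{i} = 0$) receive zero shares.

```python
def optimal_shares(gammas, prices, actions, budget, eps=1e-3):
    """Closed-form share counts for the Cobb-Douglas problem (illustrative sketch)."""
    S = [i for i, a in enumerate(actions) if a == 1]
    S_plus = [i for i in S if gammas[i] > 0]
    # Assumption: gamma_i == 0 is treated like a non-preferred asset (held at eps)
    S_minus = [i for i in S if gammas[i] <= 0]
    if not S_plus:
        raise ValueError("at least one included asset needs gamma > 0")

    # Budget left after buying eps shares of each non-preferred asset
    remaining = budget - eps * sum(prices[k] for k in S_minus)
    if remaining <= 0:
        raise ValueError("eps is too large for this budget")

    gamma_total = sum(gammas[j] for j in S_plus)
    n = [0.0] * len(gammas)  # excluded assets stay at zero shares
    for i in S_plus:
        n[i] = (gammas[i] / gamma_total) * remaining / prices[i]
    for i in S_minus:
        n[i] = eps
    return n


# Hypothetical inputs: four assets, the last one excluded by the bandit agent
gammas = [0.6, 0.3, -0.2, 0.5]
prices = [100.0, 50.0, 20.0, 10.0]
actions = [1, 1, 1, 0]
B = 1000.0

n_star = optimal_shares(gammas, prices, actions, B, eps=0.1)
spent = sum(n * p for n, p in zip(n_star, prices))
print("shares:", [round(n, 4) for n in n_star])
print(f"budget spent: {spent:.2f}")
```

Each preferred asset receives a share of the remaining budget proportional to its $\gamma_{i}$, so the budget constraint is satisfied with equality.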

#### Investor Preference Model
The $\gamma_{i}$ coefficients reflect the relative importance of each asset in generating utility for the investor. These coefficients can incorporate market conditions, sentiment, and other asset-specific information through an $m$-dimensional feature vector $\mathbf{x}_{i}\in\mathbb{R}^{m}$:
$$
\begin{align*}
\gamma_{i} & = \sigma\left(\mathbf{x}^{\top}_{i}\theta_{i}\right)\quad\forall{i}\in\mathcal{P}
\end{align*}
$$
where $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ is an activation function such that $\sigma(x)\in[-1,1]$,
and $\theta_{i}\in\mathbb{R}^{m}$ denotes the feature weights that can be learned from data or set based on subjective beliefs.

For a concrete implementation, we use a single index model with feature vector $\mathbf{x}_{i} = \left(1,\mathbb{E}(\bar{g}_{m})\right)$, where $\mathbb{E}(\bar{g}_{m})$ is the expected growth rate of the market portfolio.
The parameters $\theta_{i} = \left(\alpha_{i},\beta_{i}\right)$ represent firm-specific growth and relative risk with respect to the market portfolio.

We risk-adjust the single index model by dividing by $\beta_{i}^{\lambda}$, where $\lambda \geq 0$ is a risk aversion parameter that controls how much the investor penalizes higher-risk assets. 
When $\lambda = 0$, there is no risk adjustment; when $\lambda = 1$, the preference is inversely proportional to systematic risk.
Using the $\texttt{tanh}$ activation function, the coefficients are modeled as:
$$
\begin{align*}
    \gamma_{i} &= \texttt{tanh}\left(\alpha_{i}/{\beta_{i}^{\lambda}}+\beta^{1-\lambda}_{i}\cdot\mathbb{E}(\bar{g}_{m})\right)\quad\forall{i}\in\mathcal{P}\Longrightarrow{-1<\gamma_{i}<1}
\end{align*}
$$
Assets with positive expected risk-adjusted growth rates yield $\gamma_{i} > 0$ (preferred), while those with negative rates yield $\gamma_{i}<0$ (non-preferred).
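
A short sketch shows how the risk aversion parameter $\lambda$ changes the preference coefficient. The function name `preference_coefficient` and the parameter values are hypothetical; we also assume $\beta_{i} > 0$, since $\beta_{i}^{\lambda}$ is not well defined for negative $\beta_{i}$ and non-integer $\lambda$.

```python
import math


def preference_coefficient(alpha, beta, g_m, lam):
    """Risk-adjusted single index model preference, squashed by tanh into (-1, 1)."""
    if beta <= 0:
        raise ValueError("this sketch assumes beta > 0")
    return math.tanh(alpha / beta ** lam + beta ** (1.0 - lam) * g_m)


# Hypothetical assets: (alpha, beta); g_m is the expected market growth rate
g_m = 0.08
assets = {"low-beta": (0.02, 0.7), "high-beta": (0.02, 1.6), "negative-alpha": (-0.20, 1.1)}

table = {}
for name, (alpha, beta) in assets.items():
    table[name] = [preference_coefficient(alpha, beta, g_m, lam) for lam in (0.0, 0.5, 1.0)]
    print(name, [round(g, 4) for g in table[name]])
```

With $\lambda = 0$ the argument reduces to the unadjusted single index model $\alpha_{i} + \beta_{i}\,\mathbb{E}(\bar{g}_{m})$, and increasing $\lambda$ lowers the preference for the high-beta asset in this example.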

Let's play around with an example of the epsilon-greedy algorithm for portfolio rebalancing.

> __Example__
> 
> [▶ Let's think about bandit based rebalancing models](CHEME-5660-L9a-Example-INFORMS-Fall-2025.ipynb). In this example, we'll explore bandit-based rebalancing strategies for a portfolio of risky assets. We'll reformulate the portfolio rebalancing problem as a multi-armed bandit problem, where each __asset combination__ represents an arm of the bandit. We'll implement an epsilon-greedy algorithm to decide when and how to rebalance the portfolio based on observed returns, and analyze the performance of this strategy in terms of cumulative returns and risk.
___

## Summary
In this lecture, we examined portfolio rebalancing strategies for maintaining optimal allocations in dynamic markets. We analyzed portfolio drift, introduced bandit algorithms for real-time portfolio decisions, and derived analytical solutions for utility-based asset allocation.

> __Key Takeaways:__
>
> * __Portfolio drift is inevitable without rebalancing:__ Asset weights naturally diverge from target allocations due to differing returns, causing unintended risk exposures. Continuous rebalancing is impractical due to transaction costs and taxes, necessitating strategic rebalancing policies.
> * __Bandit algorithms balance exploration and exploitation:__ The epsilon-greedy algorithm provides a tractable approach to portfolio rebalancing decisions, allowing dynamic asset selection while managing the tradeoff between testing new asset combinations and committing to proven performers.
> * __Utility maximization incorporates investor preferences:__ Cobb-Douglas utility functions combined with risk-adjusted single index model parameters enable systematic portfolio construction that reflects both asset fundamentals and individual investor risk tolerance.

In the next lecture, we will explore additional portfolio management techniques.
___

## Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products or any investment or trading advice or strategy is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, investment or trading experience, or a low-risk tolerance. Only risk capital that is not required for living expenses should be used.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on evaluating your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.

___