In [2]:
from collections import defaultdict
import numpy as np
from scipy.linalg import eig

## Markov Processes:

A set of states $S$ and a state transition probability matrix $P$ whose entries are $P(s, s') = \mathbb{P}[X_{n+1} = s' \mid X_n = s]$. The process satisfies the Markov property: $\mathbb{P}[X_{n+1} \mid X_n] = \mathbb{P}[X_{n+1} \mid X_1, \dots, X_n]$

In [3]:
class MP:
    def __init__(self, P):
        self.transition_matrix = P
    
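    def stationary_distribution(self):
        # The stationary distribution is a left eigenvector of P for eigenvalue 1,
        # i.e. an eigenvector of P.T
        S, U = eig(np.asarray(self.transition_matrix).T)
        idx = np.where(np.abs(S - 1.) < 1e-8)[0][0]
        # eig returns complex arrays; keep the real part before normalizing
        stationary = np.real(U[:, idx]).flatten()
        stationary /= np.sum(stationary)
        return stationary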
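
As a quick sanity check of the eigenvector approach, here is a minimal standalone sketch on a hypothetical two-state chain. The matrix `P_example` is an assumption chosen for illustration, and `numpy.linalg.eig` stands in for `scipy.linalg.eig` to keep the example NumPy-only. The resulting distribution $\pi$ should satisfy $\pi P = \pi$.

```python
import numpy as np

# Hypothetical two-state transition matrix; each row sums to 1.
P_example = np.array([[0.9, 0.1],
                      [0.5, 0.5]])

# Left eigenvectors of P are the (right) eigenvectors of P.T.
eigenvalues, eigenvectors = np.linalg.eig(P_example.T)
idx = np.argmin(np.abs(eigenvalues - 1.0))  # eigenvalue closest to 1
pi = np.real(eigenvectors[:, idx])
pi = pi / pi.sum()  # normalize to a probability vector

print(pi)              # stationary distribution
print(pi @ P_example)  # should match pi
```

Normalizing by the sum also removes the arbitrary sign of the eigenvector.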

## Markov Reward Processes

A set of states $S$, a state transition probability matrix $P$, a reward function $R$ s.t. $R(s) = \mathbb{E}[R_{n+1} \mid S_n = s]$, and a discount factor $\gamma \in [0, 1]$

The state value function $v(s) = \mathbb{E}[G_t \mid S_t = s]$ of an MRP is the expected return starting from state $s$, where $G_t = \sum_{k=0}^\infty \gamma^k R_{t+k+1}$ is the total discounted reward from time $t$
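
To make the return $G_t$ concrete, the following sketch evaluates the discounted sum for a finite reward sequence. The helper name `discounted_return` and the reward values are hypothetical; an infinite-horizon return would be the limit of such truncated sums.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3}
rewards = [1.0, 2.0, 3.0]
g = discounted_return(rewards, gamma=0.5)
print(g)  # 1 + 0.5 * 2 + 0.25 * 3 = 2.75
```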

In [None]:
class MRP(MP):
    def __init__(self, P, R, gamma):
        MP.__init__(self, P)
        self.reward = R
        self.gamma = gamma

In [None]:
class MRP_2(MP):
    def __init__(self, P, R_m, gamma):
        super().__init__(P)
        self.transition_reward = R_m  # R_m[s][s_p]: reward for the transition s -> s_p
        self.gamma = gamma
        self.state_number = self.transition_matrix.shape[0]
    
    def get_reward_per_state(self):
        # R(s) = sum_{s'} P(s, s') * R_m(s, s')
        P = np.asarray(self.transition_matrix)
        self.reward = np.sum(P * np.asarray(self.transition_reward), axis=1)
        # Return the equivalent MRP with a per-state reward vector
        return MRP(self.transition_matrix, self.reward, self.gamma)

Bellman Equation for MRP:
$$ v(s) = \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s] = R(s) + \gamma \sum_{s' \in S} P(s, s') v(s')$$

Matrix form of the Bellman Equation for MRP:
$$ v = R + \gamma P v$$ 
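
Since the matrix form is linear in $v$, it can be solved directly as $(I - \gamma P) v = R$ when $\gamma < 1$. The sketch below does this for a hypothetical two-state MRP; `P_example`, `R_example` and the value of `gamma` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical two-state MRP
P_example = np.array([[0.9, 0.1],
                      [0.5, 0.5]])
R_example = np.array([1.0, 0.0])
gamma = 0.9

# v = R + gamma P v  <=>  (I - gamma P) v = R
v = np.linalg.solve(np.eye(2) - gamma * P_example, R_example)
print(v)
```

Using `np.linalg.solve` avoids forming the inverse explicitly. For large state spaces an iterative method is usually preferred over this direct solve.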

# Markov Decision Processes

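A set of states $S$, a finite set of actions $A$, a state transition probability matrix $P$ for each action with $P(s, s', a) = \mathbb{P}[S_{n+1} = s' \mid S_n = s, A_n = a]$, a reward function $R$ s.t. $R(s, a) = \mathbb{E}[R_{n+1} \mid S_n = s, A_n = a]$, and a discount factor $\gamma \in [0, 1]$. MDPs are similar to MRPs, but the transitions and rewards also depend on the chosen action.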

A policy $\pi$ is a probability distribution of the actions given a state: $\pi(a\mid s) = \mathbb{P}[A_t = a \mid S_t = s]$

A value function $v_\pi$ for a given policy $\pi$ is the expected return from a state $s$ that is obtained by following the policy $\pi$: $v_\pi(s) = \mathbb{E}_\pi(G_t \mid S_t = s)$

In [1]:
class MDP():
    def __init__(self, gamma, P_list=None, R_list=None, mrp_list=None):
        # One MRP per action: mrp_list[a] holds P(., ., a) and R(., a)
        self.gamma = gamma
        if mrp_list is not None:
            self.mrp_list = mrp_list
        else:
            self.mrp_list = [MRP(P_list[i], R_list[i], gamma) for i in range(len(P_list))]
    
    def get_mrp(self, policy):
        # MRP induced by the policy, with policy.proba_matrix[a, s] = pi(a | s)
        n_states = np.asarray(self.mrp_list[0].transition_matrix).shape[0]
        n_actions = len(self.mrp_list)
        P = np.zeros((n_states, n_states))
        R = np.zeros(n_states)
        for s in range(n_states):
            R[s] = sum(policy.proba_matrix[a, s] * self.mrp_list[a].reward[s] for a in range(n_actions))
            for s_p in range(n_states):
                P[s, s_p] = sum(policy.proba_matrix[a, s] * self.mrp_list[a].transition_matrix[s, s_p] for a in range(n_actions))
        return MRP(P, R, self.gamma)
        
class Policy():
    def __init__(self, P):
        self.proba_matrix = P

In [2]:
class MDP_2():
    def __init__(self, P_list, R_list, gamma):
        self.gamma = gamma
        self.mrp2_list = [MRP_2(P_list[i], R_list[i], gamma) for i in range(len(P_list))]
    
    def get_reward_per_state(self):
        # Convert each transition-reward MRP_2 into a per-state-reward MRP, then wrap them in an MDP
        return MDP(self.gamma, mrp_list=[mrp_2.get_reward_per_state() for mrp_2 in self.mrp2_list])

Bellman Expectation Equations: 
$$v_\pi(s) = \sum_{a\in A} \pi(a\mid s) q_\pi(s,a) $$
$$q_\pi(s,a) = R(s, a) + \gamma \sum_{s' \in S} P(s, s', a) v_\pi(s') $$
$$v_\pi(s) = \sum_{a\in A} \pi(a\mid s) [R(s, a) + \gamma \sum_{s' \in S} P(s, s', a) v_\pi(s')] $$
$$q_\pi(s,a) = R(s, a) + \gamma \sum_{s' \in S} P(s, s', a) [\sum_{a'\in A} \pi(a'\mid s') q_\pi(s',a')] $$

Matrix Form of the Bellman Expectation Equation:
$$v_\pi = R^\pi + \gamma P^\pi v_\pi $$
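
The matrix form can be evaluated by first averaging rewards and transitions over the policy, $R^\pi(s) = \sum_a \pi(a \mid s) R(s, a)$ and $P^\pi(s, s') = \sum_a \pi(a \mid s) P(s, s', a)$, and then solving the linear system. The following is a minimal sketch on a hypothetical 2-state, 2-action MDP. The array layouts `P[a, s, s']`, `R[a, s]` and `pi[a, s]` are assumptions that mirror the per-action indexing used in the classes above.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP: P[a, s, s'] and R[a, s]
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
# pi[a, s] = pi(a | s); each column sums to 1
pi = np.array([[0.5, 0.3],
               [0.5, 0.7]])

# R^pi(s) = sum_a pi(a|s) R(s, a)
R_pi = (pi * R).sum(axis=0)
# P^pi(s, s') = sum_a pi(a|s) P(s, s', a)
P_pi = np.einsum('as,ast->st', pi, P)

# v_pi = R^pi + gamma P^pi v_pi  <=>  (I - gamma P^pi) v_pi = R^pi
v_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
print(v_pi)
```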

Bellman Optimality Equations: 
$$v_*(s) = \max_a{q_*(s,a)} $$
$$q_*(s,a) = R(s, a)  + \gamma \sum_{s' \in S} P(s, s', a) v_*(s') $$
$$v_*(s) = \max_a [R(s, a)  + \gamma \sum_{s' \in S} P(s, s', a) v_*(s')] $$
$$q_*(s,a) = R(s, a)  + \gamma \sum_{s' \in S} P(s, s', a) \max_{a'}{q_*(s',a')} $$
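
Unlike the expectation equations, the optimality equations are nonlinear because of the $\max$, so they are typically solved iteratively. Value iteration is one standard way to do this; it is named here as an illustration, not as a method the notes above prescribe. The sketch below uses a hypothetical 2-state, 2-action MDP with assumed layouts `P[a, s, s']` and `R[a, s]`.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP: P[a, s, s'] and R[a, s]
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

v = np.zeros(2)
for _ in range(1000):
    # q[a, s] = R(s, a) + gamma * sum_{s'} P(s, s', a) v(s')
    q = R + gamma * (P @ v)
    v_new = q.max(axis=0)  # v(s) = max_a q(s, a)
    converged = np.max(np.abs(v_new - v)) < 1e-10
    v = v_new
    if converged:
        break

# Greedy policy with respect to the final value estimate
q = R + gamma * (P @ v)
greedy_actions = q.argmax(axis=0)
print(v, greedy_actions)
```

With $\gamma < 1$ the update is a contraction, so the iterates converge to $v_*$ from any starting point.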

# CARA

Absolute Risk-Aversion $A(x) = \frac{-U^{''}(x)}{U^{'}(x)} = a$ 

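$U(x) = 
\begin{cases}
x & \text{if } a = 0 \\
\frac{1 - e^{-ax}}{a} & \text{else}
\end{cases}$

For $x \sim \mathcal{N}(\mu, \sigma^2)$:

$E[U(x)] = 
\begin{cases}
\mu & \text{if } a = 0 \\
\frac{1 - e^{-a \mu + a^2 \sigma^2 / 2}}{a} & \text{else}
\end{cases}$ 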


$x_{CE} = \mu - a\sigma^2/2$

Absolute Risk Premium $\pi_A = \mu - x_{CE} = a\sigma^2/2$

Relative Risk Premium $\pi_R = \pi_A/\mathbb{E}(x)$
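
The certainty equivalent can be checked numerically by inverting the utility at the expected utility, $x_{CE} = U^{-1}(E[U(x)])$. The sketch below assumes $x \sim \mathcal{N}(\mu, \sigma^2)$ and uses the exponential utility $-e^{-ax}$ with $a > 0$; any positive affine transformation of it gives the same certainty equivalent. The function name and parameter values are hypothetical.

```python
import math

def cara_certainty_equivalent(mu, sigma, a):
    """Certainty equivalent of x ~ N(mu, sigma^2) under U(x) = -exp(-a x), a > 0."""
    # E[U(x)] from the moment generating function of a normal variable
    expected_utility = -math.exp(-a * mu + a ** 2 * sigma ** 2 / 2)
    # Invert U: U^{-1}(y) = -log(-y) / a
    return -math.log(-expected_utility) / a

mu, sigma, a = 0.1, 0.2, 3.0  # hypothetical values
x_ce = cara_certainty_equivalent(mu, sigma, a)
absolute_premium = mu - x_ce
print(x_ce, absolute_premium)  # should agree with mu - a sigma^2 / 2 and a sigma^2 / 2
```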

# CRRA

Relative Risk Aversion $R(x) = \frac{-U^{''}(x)x}{U^{'}(x)} = \gamma$

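$U(x) = 
\begin{cases}
\log x & \text{if } \gamma = 1 \\
\frac{x^{1-\gamma}}{1-\gamma} & \text{else}
\end{cases}$

For $\log x \sim \mathcal{N}(\mu, \sigma^2)$:

$E[U(x)] = 
\begin{cases}
\mu & \text{if } \gamma = 1 \\
\frac{e^{\mu(1-\gamma) + \sigma^2 (1-\gamma)^2 / 2}}{1 - \gamma} & \text{else}
\end{cases}$ 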

$x_{CE} = e^{\mu + \sigma^2 (1-\gamma)/2}$

Absolute Risk Premium $\pi_A = \pi_R \times \mathbb{E}(x)$

Relative Risk Premium $\pi_R = 1 - e^{-\sigma^2\gamma / 2}$
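
The same inversion check works for CRRA. The sketch below assumes $\log x \sim \mathcal{N}(\mu, \sigma^2)$ and $\gamma \neq 1$, so that $E[x] = e^{\mu + \sigma^2/2}$ and the relative risk premium is $1 - x_{CE}/E[x]$. The function name and parameter values are hypothetical.

```python
import math

def crra_certainty_equivalent(mu, sigma, gamma):
    """Certainty equivalent for log(x) ~ N(mu, sigma^2), U(x) = x**(1-gamma)/(1-gamma), gamma != 1."""
    # E[x**(1-gamma)] is the lognormal moment exp(mu(1-gamma) + sigma^2 (1-gamma)^2 / 2)
    expected_utility = math.exp(mu * (1 - gamma) + sigma ** 2 * (1 - gamma) ** 2 / 2) / (1 - gamma)
    # Invert U: U^{-1}(y) = ((1 - gamma) * y) ** (1 / (1 - gamma))
    return ((1 - gamma) * expected_utility) ** (1 / (1 - gamma))

mu, sigma, gamma = 0.05, 0.2, 2.0  # hypothetical values
x_ce = crra_certainty_equivalent(mu, sigma, gamma)
expected_x = math.exp(mu + sigma ** 2 / 2)
relative_premium = 1 - x_ce / expected_x
print(x_ce, relative_premium)  # should agree with the closed forms above
```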

# Merton Portfolio Problem

Optimal fraction of risky asset in CARA model: $\pi^* = \frac{\mu - r}{a \sigma^2}$

Optimal fraction of risky asset in CRRA model: $\pi^* = \frac{\mu - r}{\gamma \sigma^2}$
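
Both formulas have the same shape: excess return divided by risk aversion times variance. A small sketch with hypothetical inputs (the function name `merton_fraction` and the numbers are assumptions for illustration):

```python
def merton_fraction(mu, r, risk_aversion, sigma):
    """Optimal risky fraction (mu - r) / (risk_aversion * sigma^2)."""
    return (mu - r) / (risk_aversion * sigma ** 2)

# Hypothetical market: 8% expected return, 2% riskless rate, 20% volatility
fraction = merton_fraction(mu=0.08, r=0.02, risk_aversion=2.0, sigma=0.2)
print(fraction)  # approximately 0.75
```

Doubling the risk aversion halves the allocation, and a fraction above 1 would correspond to borrowing at the riskless rate.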

Merton's portfolio problem can be formulated as the following MDP:

- S is [Current Time, Current Holdings, Current Prices]
- A is [Allocation Quantities, Consumption Quantity]
- R is Utility of Consumption less Transaction Costs
- T governed by risky asset movements
- $\gamma$ discount factor 

worked with mdopham and csaad