# Goal
- A trader needs to decide how many options on SPY and XLE to buy or sell at each strike

- All options are assumed to have the same maturity, say 1 week or 1 day, and this period is the length of one time step

- The goal is to replicate or maybe even outperform a benchmark composed of SPY (denoted by $X$) and XLE (denoted by $Y$)

- Specifically, we want to replicate, with options on $X$ and $Y$, the payoff
\begin{align*}
\xi(X,Y) = \frac{n_X X}{n_X X+n_Y Y}\,X+\frac{n_Y Y}{n_X X+n_Y Y}\,Y,
\end{align*}
where $n_X$ and $n_Y$ are the numbers of shares outstanding of $X$ and $Y$, respectively, so each price is weighted by its share of the combined market capitalization

- The idea is that XLE provides a good hedge when inflation is high, while when inflation is low one does not want to miss the growth of the stock market, and so one holds SPY

- Short selling of options is allowed, but there is no risk-free asset
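As a quick check of the benchmark definition, here is a minimal Python sketch of $\xi$; the function name `benchmark_payoff` is hypothetical. With equal share counts ($n_X = n_Y$, which is what the environment code uses), $\xi$ can be rewritten as $\frac{X+Y}{2} + \frac{(X-Y)^2}{2(X+Y)}$: the equal-weight average plus a non-negative term whose mixed partial derivative $\partial^2/\partial X\,\partial Y$ is nonzero. That term is therefore not of the form $f(X) + g(Y)$, so a portfolio of options on $X$ plus options on $Y$ can only approximate $\xi$.

```python
def benchmark_payoff(X: float, Y: float, n_X: float = 1.0, n_Y: float = 1.0) -> float:
    """xi(X, Y): average of the prices X (SPY) and Y (XLE), weighted by market capitalization."""
    cap_X, cap_Y = n_X * X, n_Y * Y  # market capitalizations
    return (cap_X * X + cap_Y * Y) / (cap_X + cap_Y)


# Example with equal share counts, X = 400, Y = 100
print(benchmark_payoff(400.0, 100.0))  # 340.0

# With n_X = n_Y the same value follows from (X + Y)/2 + (X - Y)^2 / (2 (X + Y))
X, Y = 400.0, 100.0
print((X + Y) / 2 + (X - Y) ** 2 / (2 * (X + Y)))  # 340.0
```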

In [8]:
import numpy as np
from scipy.stats import norm
import matplotlib
import scipy
import gymnasium as gym
from gymnasium.spaces import Box, Discrete, multi_binary

In [11]:
def BSprice(S: list[float], k0: list[float], k1: list[float], r: float, T: float, sigma) -> list[float]:
    """Black-Scholes prices of European calls maturing in T.

    Returns the prices for strikes k0 on S[0] (SPY), followed by strikes k1 on S[1] (XLE).
    sigma is either a single volatility or a pair (sigma_SPY, sigma_XLE).
    """
    sig = np.broadcast_to(np.asarray(sigma, dtype=float), (2,))
    O = []
    for S_i, strikes, s in ((S[0], k0, sig[0]), (S[1], k1, sig[1])):
        K = np.asarray(strikes, dtype=float)
        vol = s * np.sqrt(T)  # total volatility over the option's life
        d1 = (np.log(S_i / K) + (r + 0.5 * s * s) * T) / vol
        d2 = d1 - vol
        O.extend(norm.cdf(d1) * S_i - norm.cdf(d2) * K * np.exp(-r * T))
    return O
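One way to sanity-check the Black-Scholes call formula that `BSprice` is meant to implement is to compare a single closed-form price with a Monte Carlo estimate under risk-neutral geometric Brownian motion. The helpers `bs_call`, `mc_call` and `norm_cdf` are hypothetical and use only the standard library; `sigma` and `T` are assumed to be in consistent units (e.g. annualized volatility with `T` in years).

```python
import math
import random


def norm_cdf(x: float) -> float:
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S: float, K: float, r: float, T: float, sigma: float) -> float:
    """Black-Scholes price of a European call."""
    vol = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / vol
    d2 = d1 - vol
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def mc_call(S: float, K: float, r: float, T: float, sigma: float,
            n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same price: discounted mean payoff under risk-neutral GBM."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n):
        S_T = S * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(S_T - K, 0.0)
    return math.exp(-r * T) * total / n


print(round(bs_call(100, 100, 0.0, 1.0, 0.2), 4))  # 7.9656
print(round(mc_call(100, 100, 0.0, 1.0, 0.2), 2))  # should agree to within a few cents
```

For an at-the-money call with $r = 0$ the formula reduces to $S\,(2N(\sigma\sqrt{T}/2) - 1)$, which gives the 7.9656 above for $\sigma\sqrt{T} = 0.2$.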



In [9]:
class BenchmarkReplication(gym.Env):
    """
    Methods:
        step: buy or sell calls on SPY and XLE, then reveal the next prices
        reset: reset the game on a newly simulated price path

    Attributes:
        N: There are N/2 calls available to trade on each underlying asset, all maturing the next period.
           Since there is no risk-free asset, the wealth is fully invested: the player makes N-1 decisions
           and the budget fixes the remaining quantity.
        W: The wealth of the player. The initial wealth is W0. The game stops when the wealth is not positive.
        time: The current time. The value is in [start_time, T-1].
        time_series: The simulated price path of (SPY, XLE), with shape (T, 2).
    """

    def __init__(self, W: float, N: int, sigma, mu, start_time: int = 0, T: int = 26, dT: float = 1, r: float = 0,
                 n_shares=(1.0, 1.0)):
        assert N >= 2 and N % 2 == 0, "N must be even: N/2 strikes per underlying"
        self.W0 = float(W)  # initial wealth
        self.W = self.W0    # current wealth
        self.N = N
        self.sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (2,))  # (sigma_SPY, sigma_XLE)
        self.mu = np.broadcast_to(np.asarray(mu, dtype=float), (2,))        # (mu_SPY, mu_XLE)
        self.start_time = start_time
        self.T = T
        self.dT = dT
        self.r = r
        self.n_shares = np.asarray(n_shares, dtype=float)  # (n_X, n_Y) in the benchmark xi; equal by default
        self.action_space = Box(low=-np.inf, high=np.inf, shape=(N - 1,))
        # observation: prices of the N options followed by the 2 underlying prices
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(N + 2,))

    def _quote(self):
        """Set the strikes (70% to 130% of spot) and their Black-Scholes premia at the current time."""
        S = self.time_series[self.time]
        m = np.linspace(0.7, 1.3, self.N // 2)
        self.k0, self.k1 = m * S[0], m * S[1]
        self.prices = np.array(BSprice(S, self.k0, self.k1, self.r, self.dT, self.sigma))
        return np.concatenate([self.prices, S]).astype(np.float32)

    def benchmark(self, S):
        """xi(X, Y) = (n_X X^2 + n_Y Y^2) / (n_X X + n_Y Y)."""
        cap = self.n_shares * S
        return float(cap @ S / cap.sum())

    def step(self, action):
        if self.time >= self.T - 1:
            raise IndexError("The game is over! Please reset the game.")
        # Assumption: the lowest-strike XLE call (index N/2, deep in the money, so its premium stays
        # well above zero) absorbs the budget, i.e. the premia paid add up to the wealth W.
        j = self.N // 2
        q = np.insert(np.asarray(action, dtype=float), j, 0.0)
        q[j] = (self.W - q @ self.prices) / self.prices[j]
        S_old = self.time_series[self.time]
        self.time += 1
        S = self.time_series[self.time]
        # the options bought last period pay off at the new prices
        payoff = np.concatenate([np.maximum(S[0] - self.k0, 0.0), np.maximum(S[1] - self.k1, 0.0)])
        W_new = float(q @ payoff)
        # Assumption: reward = new wealth minus the benchmark scaled to start from the current wealth W
        reward = W_new - self.W * self.benchmark(S) / self.benchmark(S_old)
        self.W = W_new
        terminated = self.W <= 0 or self.time >= self.T - 1
        return self._quote(), reward, terminated, False, {"wealth": self.W}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.W = self.W0             # the initial wealth
        self.time = self.start_time  # the current time
        # Assumption: independent geometric Brownian motions started at 100, with drift mu and
        # volatility sigma per unit of time, consistent with the Black-Scholes prices
        z = self.np_random.standard_normal((self.T - 1, 2))
        log_ret = (self.mu - 0.5 * self.sigma**2) * self.dT + self.sigma * np.sqrt(self.dT) * z
        self.time_series = 100.0 * np.exp(np.vstack([np.zeros((1, 2)), np.cumsum(log_ret, axis=0)]))
        return self._quote(), {"wealth": self.W}

    def render(self):
        pass
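The to-do list mentions adding correlation to the generated time series, whereas `reset` draws the shocks of the two assets independently. A minimal sketch, assuming geometric Brownian motion for both assets: multiply independent standard normal draws by the Cholesky factor of the $2\times 2$ correlation matrix. The function `simulate_paths` and its defaults (`rho`, `sigma`, weekly `dT = 1/52`) are hypothetical choices for illustration.

```python
import numpy as np


def simulate_paths(T: int, S0=(100.0, 100.0), mu=(0.0, 0.0), sigma=(0.2, 0.3),
                   rho: float = 0.5, dT: float = 1 / 52, seed=None) -> np.ndarray:
    """Simulate T prices of two correlated GBMs (SPY, XLE); returns an array of shape (T, 2)."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    # Cholesky factor L with L @ L.T equal to the correlation matrix
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    # each row of z is a pair of standard normals with correlation rho
    z = rng.standard_normal((T - 1, 2)) @ L.T
    log_ret = (mu - 0.5 * sigma**2) * dT + sigma * np.sqrt(dT) * z
    # prepend a zero log return so that the path starts at S0
    log_path = np.vstack([np.zeros((1, 2)), np.cumsum(log_ret, axis=0)])
    return np.asarray(S0, dtype=float) * np.exp(log_path)


path = simulate_paths(26, rho=-0.3, seed=1)
print(path.shape)  # (26, 2)
```

Setting `rho = 0` recovers independent paths, so the same function could stand in for the path generation inside `reset`.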

This should serve as a basic model for constructing a portfolio of options that replicates the SPY/XLE benchmark.
Some things to do include:
- playing a game (e.g., define a `play` function that first resets the game and then calls `step` until the game ends) (Note: there are surely errors in the code above, as I have not tested it)
- adding correlation to the generation of the time series
- running the game over time series of market data
- determining the optimal length of each time step (i.e., the optimal maturities to trade)
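The first to-do item can be sketched as a generic loop, assuming the environment follows the Gymnasium signatures (`reset` returning `(obs, info)` and `step` returning `(obs, reward, terminated, truncated, info)`). `play` and `ToyEnv` are hypothetical: `ToyEnv` is only a stand-in with the same signatures so the loop can run without the option environment.

```python
from typing import Any, Callable


def play(env, policy: Callable[[Any], Any], seed=None) -> float:
    """Reset the environment, then call step until the episode ends; return the total reward."""
    obs, info = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


class ToyEnv:
    """Stand-in environment: the episode lasts T steps and the reward equals the action."""

    def __init__(self, T: int = 5):
        self.T = T

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= self.T, False, {}


print(play(ToyEnv(T=5), policy=lambda obs: 1.0))  # 5.0
```

With the option environment, a random baseline would be `play(env, policy=lambda obs: env.action_space.sample())`.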