This program computes the PnL attributed to each factor on each day and reports the daily totals. <br>

Suppose that there are $N$ stocks, $T$ days and $k$ factors.

Let $\displaystyle r_{n, i}$ be the return of the $n$-th stock on the $i$-th day, <br>
$\displaystyle  \quad \ s_{n, i}$ be the number of shares held in the $n$-th stock on the $i$-th day, <br>
$\displaystyle  \quad \ f_{j, n, i}$ be the loading of the $j$-th factor in the $n$-th stock on the $i$-th day.

The algorithm is as follows: <br>

For each day (loop i from 1 to T):

- Get the stock prices $\displaystyle p_{\cdot,i}$, the numbers of shares held $\displaystyle s_{\cdot,i}$, and the factor loadings $f_{\cdot,\cdot, i}$ on the $i$-th day. (For example, $\displaystyle s_{\cdot,i} = \begin{bmatrix} s_{1, i} \\ \vdots \\ s_{N, i} \end{bmatrix}$)
- Compute the return of each stock $\displaystyle r_{\cdot,i}$, taken here as the per-share price change between the $(i - 1)$-th day and the $i$-th day: $\displaystyle r_{\cdot,i} = p_{\cdot,i} - p_{\cdot,i - 1}$
- Regress $\displaystyle r_{\cdot,i}$ on the loadings $f_{\cdot,\cdot, i}$ without an intercept (the code uses `sm.GLM` with a Gaussian family, which gives the ordinary least squares fit), so that we can write: $\displaystyle r_{\cdot,i} = \sum_{j = 1}^k b_j f_{j, \cdot, i} + \epsilon, \, f_{j, \cdot, i} = \begin{bmatrix} f_{j, 1, i} \\ \vdots \\ f_{j, N, i} \end{bmatrix}$. The coefficients $b_j$ are re-estimated on each day.
- For each factor (loop j from 1 to k), the PnL attributed to it on the $i$-th day is: $\displaystyle \text{PnL}_{j, i} = b_j (f_{j, \cdot, i})^Ts_{\cdot, i}$
- The total PnL on the $i$-th day will be: $\displaystyle \sum_{j = 1}^k\text{PnL}_{j, i}$
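
The steps above can be sketched for a single day as follows. This is a minimal illustration, not the program itself: it uses `np.linalg.lstsq` in place of `sm.GLM`, on the assumption that a Gaussian GLM with no intercept coincides with ordinary least squares, and the variable names (`r`, `F`, `s`, `b`) are chosen here for illustration. The inputs are the day-1 values of the test example further down.

```python
import numpy as np

# Day-1 inputs from the test example (N = 3 stocks, k = 2 factors).
r = np.array([0.02, 0.03, 0.03])      # price changes p_{.,1} - p_{.,0}
F = np.array([[0.25, 0.5],            # N x k loading matrix, one column per factor
              [0.35, 0.3],
              [0.40, 0.2]])
s = np.array([100.0, 100.0, 100.0])   # shares held in each stock

# Least-squares coefficients b_j in r = sum_j b_j f_j + eps (no intercept).
b, *_ = np.linalg.lstsq(F, r, rcond=None)

# PnL_j = b_j * (f_j^T s) for every factor at once, then the daily total.
factor_pnl = b * (F.T @ s)
total_pnl = factor_pnl.sum()

# The total equals the PnL of the fitted returns, (F b)^T s.
fitted_pnl = (F @ b) @ s

print(factor_pnl, total_pnl)
```

Because the total is $(Fb)^T s$, it is the PnL explained by the fitted returns; the residual $\epsilon^T s$ is not attributed to any factor.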

In [8]:
# Question 4, attribute PnL to a set of given factors over time

import numpy as np
import statsmodels.api as sm

def PnL(prices, loadings, shares, price0 = 0):
    
    '''
    the program that computes the PnL attributed to each of the factors on each day and reports the totals
    Parameters:
       prices: An N*T matrix that represents the prices for a set of N stocks on each of T days, each row is a stock 
       loadings: A k*N*T matrix that represents the loadings of each of the k factors to each of the N stocks on each 
                 of the T days
       shares: An N*T matrix that represents the number of shares held in the portfolio, in each of the N stocks, on each 
               of the T days, each row is a stock
       price0: A length N array that represents the prices for a set of N stocks on day 0, default will be a zero array
    Return: a tuple of two values
       PnL: A k*T matrix that shows the PnL on each of the k factors on each of the T days
       totalPnL: A length T array that shows the total PnL on each of the T days
    '''
    
    # extract the number of factors, stocks and days
    k = len(loadings)
    N = len(prices)
    T = len(prices[0])
    # initialize arrays to store the result
    PnL = np.zeros((k, T))
    totalPnL = np.zeros(T)
    # if no initial prices are given (scalar 0 or None), assume they are all zero;
    # checking np.ndim first avoids calling len() on a scalar default
    if np.ndim(price0) == 0 and not price0:
        price0 = np.zeros(N)

    # loop each of the day
    for i in range(T):
        # the returns on the i-th day
        if (not i):
            dayReturn = (prices[:,i] - price0).reshape(N, 1)
        else:
            dayReturn = (prices[:,i] - prices[:,i-1]).reshape(N, 1)
        # the loadings of each of the k factors to each of the N stocks on the i-th day
        dayLoading = loadings[:,:,i].T
        # the number of shares held in the portfolio on the i-th day
        dayShare = shares[:,i]
        # apply GLM so we could write returns as: r = b1f1 + ... + bkfk
        GLMresults = sm.GLM(endog = dayReturn, exog = dayLoading, family=sm.families.Gaussian()).fit()
        GLMparams = GLMresults.params

        # loop each of the factors
        for j in range(k):
            # compute PnL on the j-th factor on the i-th day
            dayLoadingj = dayLoading[:,j]
            PnL[j, i] = GLMparams[j] * np.dot(dayLoadingj, dayShare)
        # compute totalPnL on the i-th day
        totalPnL[i] = np.sum(PnL[:,i])
        
    return (PnL, totalPnL)

In [11]:
# Test if the program works as expected
# In this example, there are two factors, three stocks and four days
prices = np.array([[1.02, 1.05, 1.07, 1.11], [1.03, 1.06, 1.09, 1.11], [1.03, 1.04, 1.08, 1.10]])
price0 = np.array([1, 1, 1])
loadings = np.array([[[0.25, 0.35, 0.4, 0.3], [0.35, 0.45, 0.35, 0.25], [0.4, 0.2, 0.25, 0.45]], 
                    [[0.5, 0.5, 0.2, 0.5], [0.3, 0.3, 0.3, 0.3], [0.2, 0.2, 0.5, 0.2]]])
shares = np.array([[100, 95, 98, 103], [100, 102, 98, 99], [100, 103, 104, 98]])

PnLresult = PnL(prices, loadings, shares, price0)
# display the result
print("The PnL attributed to the first factor on each day is:", PnLresult[0][0])
print("The PnL attributed to the second factor on each day is:", PnLresult[0][1])
print("The total PnL on each day is:", PnLresult[1])

The PnL attributed to the first factor on each day is: [7.71428571 4.80277778 1.70571429 1.20231254]
The PnL attributed to the second factor on each day is: [0.28571429 2.37611111 7.35857143 6.95074982]
The total PnL on each day is: [8.         7.17888889 9.06428571 8.15306237]
