### Code for Problems 1 and 2

### Problem 1
    Given the dataset in DailyPrices.csv, for the stocks SPY, AAPL, and EQIX:

    A. Calculate the Arithmetic Returns. Remove the mean, such that each series has 0 mean.
    Present the last 5 rows and the total standard deviation.
    
    B. Calculate the Log Returns. Remove the mean, such that each series has 0 mean.
    Present the last 5 rows and the total standard deviation.

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
df = pd.read_csv('DailyPrices.csv', parse_dates=['Date'], index_col='Date')
stocks = ['SPY', 'AAPL', 'EQIX']

# Arithmetic returns: P_t / P_{t-1} - 1, then remove the mean of each series
arithmetic_returns = df[stocks].pct_change().dropna()
arithmetic_returns = arithmetic_returns - arithmetic_returns.mean()

# Log returns: ln(P_t / P_{t-1}), then remove the mean of each series
log_returns = np.log(df[stocks] / df[stocks].shift(1)).dropna()
log_returns = log_returns - log_returns.mean()

print("Arithmetic Returns - Last 5 Rows:")
print(arithmetic_returns.tail())
print("\nArithmetic Returns - Total Standard Deviation:")
print(arithmetic_returns.std())

print("\nLog Returns - Last 5 Rows:")
print(log_returns.tail())
print("\nLog Returns - Total Standard Deviation:")
print(log_returns.std())

Arithmetic Returns - Last 5 Rows:
                 SPY      AAPL      EQIX
Date                                    
2024-12-27 -0.011492 -0.014678 -0.006966
2024-12-30 -0.012377 -0.014699 -0.008064
2024-12-31 -0.004603 -0.008493  0.006512
2025-01-02 -0.003422 -0.027671  0.000497
2025-01-03  0.011538 -0.003445  0.015745

Arithmetic Returns - Total Standard Deviation:
SPY     0.008077
AAPL    0.013483
EQIX    0.015361
dtype: float64

Log Returns - Last 5 Rows:
                 SPY      AAPL      EQIX
Date                                    
2024-12-27 -0.011515 -0.014675 -0.006867
2024-12-30 -0.012410 -0.014696 -0.007972
2024-12-31 -0.004577 -0.008427  0.006602
2025-01-02 -0.003392 -0.027930  0.000613
2025-01-03  0.011494 -0.003356  0.015725

Log Returns - Total Standard Deviation:
SPY     0.008078
AAPL    0.013446
EQIX    0.015270
dtype: float64
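
Before the mean is removed, the two return definitions are linked exactly by log return = ln(1 + arithmetic return), which is why the two tables above agree to roughly three decimals for small daily moves. A minimal sketch of that relationship, using made-up prices (`demo_prices` is hypothetical and not taken from DailyPrices.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical prices for illustration only (not from DailyPrices.csv).
demo_prices = pd.Series([100.0, 101.0, 99.5, 100.2, 102.0])

demo_arith = demo_prices.pct_change().dropna()                     # P_t / P_{t-1} - 1
demo_log = np.log(demo_prices / demo_prices.shift(1)).dropna()     # ln(P_t / P_{t-1})

# Before demeaning, the two are linked exactly: log = ln(1 + arithmetic).
assert np.allclose(demo_log, np.log1p(demo_arith))

# Log returns add up over time, while arithmetic returns compound.
total_log = demo_log.sum()
total_arith = (1 + demo_arith).prod() - 1
print(total_log, total_arith)
```

Once each series is demeaned separately, the identity no longer holds exactly, so small differences between the two tables are expected.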


### Problem 2

Given the dataset in DailyPrices.csv, you have a portfolio of

    ● 100 shares of SPY
    ● 200 shares of AAPL
    ● 150 shares of EQIX

A. Calculate the current value of the portfolio given today is 1/3/2025.

B. Calculate the VaR and ES of each stock and the entire portfolio at the 5% alpha level,
assuming arithmetic returns and 0 mean return, for the following methods:

    a. Normally distributed with exponentially weighted covariance with lambda=0.97
    b. T distribution using a Gaussian Copula
    c. Historic simulation using the full history.

C. Discuss the differences between the methods.

In [4]:
date = '2025-01-03'
prices = df.loc[date, stocks]
portfolio = {'SPY': 100, 'AAPL': 200, 'EQIX': 150}
# Share counts as an array, in the same order as `stocks`
shares = np.array([portfolio[stock] for stock in stocks])
portfolio_value = sum(prices[stock] * portfolio[stock] for stock in stocks)
# Weight of each position by market value
p_weights = np.array([portfolio[stock] * prices[stock] / portfolio_value for stock in stocks])
print(p_weights)
print(f"current value for {date} is: ${portfolio_value:.2f}")

[0.23502904 0.1932483  0.57172266]
current value for 2025-01-03 is: $251862.50


In [5]:
### Exponentially weighted covariance (adapted from Project01, Problem05)
lambda_1 = 0.97


def weight_list(n, lam=lambda_1):
    """Normalized exponential weights, oldest observation first."""
    exponents = np.arange(n - 1, -1, -1)   # newest observation gets lam**0
    w = (1 - lam) * lam ** exponents
    return w / np.sum(w)


weights = weight_list(len(arithmetic_returns))


def ewcov(data1, data2):
    """Exponentially weighted covariance of two equal-length return series."""
    x = data1.dropna().to_numpy().flatten()
    y = data2.dropna().to_numpy().flatten()
    if len(x) != len(y):
        raise ValueError('data1 and data2 have different length')
    # The returns were already demeaned, so these means are ~0; kept for generality.
    return np.sum(weights * (x - np.mean(x)) * (y - np.mean(y)))


def out_ewm(data):
    # data.cov() only supplies a labeled frame; every entry is overwritten below
    init_cov = data.cov()
    for i in init_cov.columns:
        for j in init_cov.columns:
            init_cov.loc[j, i] = ewcov(data[i], data[j])
    return init_cov


evm_cov = out_ewm(arithmetic_returns)
print(evm_cov)

           SPY      AAPL      EQIX
SPY   0.000072  0.000054  0.000052
AAPL  0.000054  0.000140  0.000038
EQIX  0.000052  0.000038  0.000153


In [6]:
## a. Normal distribution with exponentially weighted covariance

alpha_level=0.05
z_alpha=stats.norm.ppf(1-alpha_level)
# Zero mean return is assumed, so VaR and ES depend only on the standard deviation
portfolio_std=np.sqrt(p_weights.T @ evm_cov @ p_weights)
VaR=z_alpha*portfolio_std*portfolio_value
ES=portfolio_std*stats.norm.pdf(z_alpha)*portfolio_value/alpha_level
results = pd.DataFrame(columns=['VaR  $', 'Expected Shortfall  $'])
results.loc['Portfolio'] = [VaR, ES]
for stock in stocks:
    position_value = portfolio[stock] * prices[stock]
    stock_std = np.sqrt(evm_cov.loc[stock, stock])
    stock_var = z_alpha * stock_std * position_value
    stock_es = stock_std * position_value * stats.norm.pdf(z_alpha) / alpha_level
    results.loc[stock] = [stock_var, stock_es]

print(results)

                VaR  $  Expected Shortfall  $
Portfolio  3856.321669            4835.982950
SPY         827.848763            1038.155747
AAPL        946.076369            1186.417935
EQIX       2933.512216            3678.742668
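
For a zero-mean normal P&L with dollar standard deviation sigma * V, the formulas used above are VaR = z * sigma * V and ES = sigma * V * pdf(z) / alpha. The ratio ES / VaR therefore depends only on alpha (about 1.254 at 5%), which is the ratio seen in every row of the table above. A small sketch with a Monte Carlo sanity check; `normal_var_es` and its inputs are hypothetical and chosen only for illustration:

```python
import numpy as np
from scipy import stats

def normal_var_es(sigma, value, alpha=0.05):
    """Closed-form VaR and ES for a zero-mean normal return."""
    z = stats.norm.ppf(1 - alpha)
    var = z * sigma * value
    es = sigma * value * stats.norm.pdf(z) / alpha
    return var, es

# Hypothetical inputs: 1% daily volatility on a $100,000 position.
demo_var, demo_es = normal_var_es(sigma=0.01, value=100_000)

# Monte Carlo check of the closed forms.
rng = np.random.default_rng(42)
pnl = rng.normal(0.0, 0.01 * 100_000, size=1_000_000)
mc_var = -np.quantile(pnl, 0.05)
mc_es = -pnl[pnl <= -mc_var].mean()

print(demo_var, demo_es, demo_es / demo_var)
print(mc_var, mc_es)
```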


In [12]:
## b. T distribution with a Gaussian copula
## Given the SPY, AAPL, and EQIX return series, fit a T distribution to each one, then:
## Step 1: map each series through its fitted T CDF to (0,1);
## Step 2: map the (0,1) values through the standard normal quantile function;
## Step 3: compute the Spearman rank correlation matrix of the result;
## Step 4: simulate from a multivariate normal with that correlation matrix
##         and map the draws back through the fitted T quantile functions.


t_params={}
for stock in stocks:
    t_params[stock]=stats.t.fit(arithmetic_returns[stock],method='mle')
U=pd.DataFrame()
for stock in stocks:
    # Degrees of freedom are named nu so the price DataFrame `df` is not overwritten
    nu,loc,scale=t_params[stock]
    U[stock]=stats.t.cdf(arithmetic_returns[stock], nu, loc, scale)
normal_quantile=U.apply(lambda x: stats.norm.ppf(x))
spearman_rank=normal_quantile.corr(method='spearman')
np.random.seed(123)
n_samples=10000
copula_sim =stats.multivariate_normal.rvs(mean=np.zeros(3), cov=spearman_rank, size=n_samples)
sim_returns=np.zeros_like(copula_sim)
for i, stock in enumerate(stocks):
    nu,loc,scale=t_params[stock]
    sim_returns[:,i]=stats.t.ppf(stats.norm.cdf(copula_sim[:,i]), nu, loc, scale)
# Dollar P&L per position: simulated return * price * shares
sim_port_t=sim_returns @ np.diag(prices) @ np.diag(shares)
sim_sum_t=sim_port_t.sum(axis=1)
var_5_t=-np.percentile(sim_sum_t, 5)
ES_5_t=-np.mean(sim_sum_t[sim_sum_t<=-var_5_t])

results_t=pd.DataFrame(columns=['VaR 5% $', 'Expected Shortfall 5% $'])
results_t.loc['Portfolio']=[var_5_t,ES_5_t]
for i, stock in enumerate(stocks):
    stock_t=sim_port_t[:,i]
    stock_var_t=-np.percentile(stock_t, 5)
    stock_es_t=-np.mean(stock_t[stock_t<=-stock_var_t])
    results_t.loc[stock]=[stock_var_t,stock_es_t]
print(results_t)

              VaR 5% $  Expected Shortfall 5% $
Portfolio  4370.396952              6015.125106
SPY         776.069970              1029.625609
AAPL       1060.162802              1508.531772
EQIX       3394.449844              4774.627478
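
The four copula steps can be followed end to end on a small synthetic example. This is a sketch under assumptions, not the notebook's calculation: the two assets `A` and `B`, the sample sizes, the seeds, and the clipping of the uniforms are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic fat-tailed, correlated returns (illustration only).
rng = np.random.default_rng(7)
true_corr = np.array([[1.0, 0.6], [0.6, 1.0]])
z = rng.multivariate_normal(np.zeros(2), true_corr, size=1000)
demo = pd.DataFrame(stats.t.ppf(stats.norm.cdf(z), df=5) * 0.01, columns=['A', 'B'])

# Step 1: fit a t distribution to each margin and map the data to (0, 1).
params = {c: stats.t.fit(demo[c]) for c in demo.columns}   # (df, loc, scale)
u = pd.DataFrame({c: stats.t.cdf(demo[c], *params[c]) for c in demo.columns})
u = u.clip(1e-10, 1 - 1e-10)   # guard against infinite normal scores (assumption)

# Step 2: map the uniforms to standard normal scores.
scores = u.apply(stats.norm.ppf)

# Step 3: Spearman rank correlation of the scores.
corr = scores.corr(method='spearman').to_numpy()

# Step 4: simulate correlated normals, then map back through each fitted t quantile.
sim_z = stats.multivariate_normal.rvs(mean=np.zeros(2), cov=corr,
                                      size=5000, random_state=1)
sim_u = stats.norm.cdf(sim_z)
sim_r = np.column_stack([stats.t.ppf(sim_u[:, i], *params[c])
                         for i, c in enumerate(demo.columns)])
print(corr)
print(sim_r[:3])
```

Because Spearman correlation depends only on ranks, step 3 gives the same matrix whether it is computed on the uniforms or on the normal scores.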


In [14]:
## c. Historical simulation: use the full history directly as the set of scenarios
## (each observed row of arithmetic_returns is one scenario; no random draws are made)
sim_returns_his=arithmetic_returns
# Dollar P&L per position: historical return * current price * shares
sim_port_his=sim_returns_his @ np.diag(prices) @ np.diag(shares)
sim_port_his.columns=stocks
sim_sum_his=sim_port_his.sum(axis=1)
var_5_his=-np.percentile(sim_sum_his, 5)
ES_5_his=-np.mean(sim_sum_his[sim_sum_his<=-var_5_his])
results_his=pd.DataFrame(columns=['VaR 5% $', 'Expected Shortfall 5% $'])
results_his.loc['Portfolio']=[var_5_his,ES_5_his]
for stock in stocks:
    stock_his=sim_port_his[stock]
    stock_var_his=-np.percentile(stock_his, 5)
    stock_es_his=-np.mean(stock_his[stock_his<=-stock_var_his])
    results_his.loc[stock]=[stock_var_his,stock_es_his]

print(results_his)

              VaR 5% $  Expected Shortfall 5% $
Portfolio  4575.034060              6059.387076
SPY         872.403863              1080.104204
AAPL       1067.114956              1437.785272
EQIX       3635.077091              4714.893996
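
Both simulation methods turn a vector of P&L scenarios into VaR and ES in the same way: VaR is the negated 5th percentile, and ES is the negated mean of the scenarios at or below that percentile. A minimal sketch of that step on a toy vector; `var_es` and `demo_pnl` are hypothetical names used only here:

```python
import numpy as np

def var_es(pnl, alpha=0.05):
    """Empirical VaR and ES, reported as positive loss amounts."""
    pnl = np.asarray(pnl, dtype=float)
    var = -np.percentile(pnl, 100 * alpha)   # linear interpolation by default
    es = -pnl[pnl <= -var].mean()            # average of the tail scenarios
    return var, es

# Toy P&L scenarios: -10, -9, ..., 9 (20 values).
demo_pnl = np.arange(-10, 10)
demo_var, demo_es = var_es(demo_pnl)
print(demo_var, demo_es)   # 9.05 and 10.0
```

ES is never smaller than VaR under this definition, because it averages only outcomes that are at least as bad as the VaR threshold.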


In [9]:
## C. Compare the methods
'''
    Comparison of the three methods.

    Method 1: Normal with exponentially weighted (EW) covariance. It produces the smallest portfolio VaR and ES
             (about $3,856 and $4,836). Two things likely contribute: the normal distribution has thin tails,
             and the EW covariance puts most of its weight on recent data.
    Method 2: T distribution with a Gaussian copula. Its portfolio result (VaR about $4,370, ES about $6,015)
             is close to historical simulation.
    Method 3: Historical simulation. It simply uses the observed returns, with no distributional model,
             to estimate VaR and ES (about $4,575 and $6,059).

    Methods 2 and 3 differ only slightly because both use the same full history with equal weight on every
    observation. The EW method differs more because it emphasizes recent data, which makes it useful when
    current volatility differs from the long-run level.

    Fitting the copula with an EW correlation instead of the Spearman rank correlation might give some
    interesting results.
'''
