## **Section 0 — Submission Information**

**Date of submission:** 16 November 2025  
**Course:** QPM 2025–2026 — Assignment 1  

**Group members (alphabetical by last name):**  
- Sumeet **BAINS**  
- Zachary **PRESUTTO**  
- Nikita **RIABOV**  
- Robin **THOMAS**  
- Dongyang **ZHAO**

**Group diversity:**  
Our group is diverse in nationality, gender, prior academic background in finance, and levels of Python programming experience. This diversity allowed us to combine different strengths and perspectives throughout the assignment. 

Fun Fact: Our group represents diversity among 4 nuclear powers :)

**Comments:**  
The notebook is organised clearly by question, and the code is modular, readable, and well-commented. 

 

## Section 1

## Q1.1 – Downloading FAANG data

In [10]:
import pandas as pd
import numpy as np
import yfinance as yf
import sys

pd.set_option("display.max_columns", 20)
pd.set_option("display.float_format", "{:.6f}".format)

faang_tickers = ["META", "AMZN", "AAPL", "NFLX", "GOOG"]
start_date = "2015-01-01"
end_date = "2020-12-31"

# Download prices; with auto_adjust=True, "Close" is adjusted for splits and dividends.
# Note: yfinance treats `end` as exclusive, so 2020-12-31 itself is not included.
faang_data = yf.download(
    faang_tickers,
    start=start_date,
    end=end_date,
    auto_adjust=True
)

# Keep only the "Close" level, which holds the adjusted prices (auto_adjust=True)
faang_prices = faang_data["Close"]

faang_prices.head()


[*********************100%***********************]  5 of 5 completed


| Date | AAPL | AMZN | GOOG | META | NFLX |
| --- | --- | --- | --- | --- | --- |
| 2015-01-02 | 24.237551 | 15.426 | 25.990791 | 77.969345 | 4.984857 |
| 2015-01-05 | 23.554745 | 15.1095 | 25.448996 | 76.717064 | 4.731142 |
| 2015-01-06 | 23.556955 | 14.7645 | 24.859163 | 75.683426 | 4.650142 |
| 2015-01-07 | 23.887281 | 14.921 | 24.816576 | 75.683426 | 4.674285 |
| 2015-01-08 | 24.805086 | 15.023 | 24.894821 | 77.700989 | 4.777928 |


## Q1.2 
Compute the first and second moments of stock returns for each of these stocks (i.e., their means, variances, and covariances).


In [11]:
# Computing daily log returns
faang_logret = np.log(faang_prices / faang_prices.shift(1)).dropna()

# First moment: mean (daily)
faang_mean = faang_logret.mean()

# Second moments: variance (diagonal) and covariance matrix
faang_var = faang_logret.var()
faang_cov = faang_logret.cov()

print("Daily mean log returns:")
display(faang_mean)

print("\nDaily return variances:")
display(faang_var)

print("\nCovariance matrix of daily returns:")
display(faang_cov)

Daily mean log returns:


Ticker
AAPL   0.001114
AMZN   0.001568
GOOG   0.000796
META   0.000824
NFLX   0.001560
dtype: float64


Daily return variances:


Ticker
AAPL   0.000349
AMZN   0.000375
GOOG   0.000286
META   0.000407
NFLX   0.000704
dtype: float64


Covariance matrix of daily returns:


| Ticker | AAPL | AMZN | GOOG | META | NFLX |
| --- | --- | --- | --- | --- | --- |
| AAPL | 0.000349 | 0.000202 | 0.000193 | 0.000217 | 0.000209 |
| AMZN | 0.000202 | 0.000375 | 0.000215 | 0.000236 | 0.000276 |
| GOOG | 0.000193 | 0.000215 | 0.000286 | 0.000229 | 0.000217 |
| META | 0.000217 | 0.000236 | 0.000229 | 0.000407 | 0.000241 |
| NFLX | 0.000209 | 0.000276 | 0.000217 | 0.000241 | 0.000704 |
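
As a small sanity check on how the second moments relate to each other, the sketch below uses synthetic returns for three hypothetical tickers (`AAA`, `BBB`, `CCC` are made up and are not part of the assignment data). It illustrates that the diagonal of the covariance matrix reproduces the variances, and that rescaling the covariances by the standard deviations gives the correlation matrix.

```python
import numpy as np
import pandas as pd

# Synthetic daily log returns for three hypothetical tickers (not real data)
rng = np.random.default_rng(0)
toy_logret = pd.DataFrame(
    rng.normal(0.0005, 0.02, size=(500, 3)),
    columns=["AAA", "BBB", "CCC"],
)

toy_var = toy_logret.var()  # sample variances (ddof=1)
toy_cov = toy_logret.cov()  # sample covariance matrix (ddof=1)

# The diagonal of the covariance matrix equals the per-ticker variances
diag_matches_var = np.allclose(np.diag(toy_cov), toy_var)

# Dividing each covariance by the product of standard deviations gives correlations
toy_std = np.sqrt(np.diag(toy_cov))
toy_corr = toy_cov / np.outer(toy_std, toy_std)
corr_matches_pandas = np.allclose(toy_corr, toy_logret.corr())

print(diag_matches_var, corr_matches_pandas)
```

The same check could be run on `faang_cov` and `faang_var`; the correlation view is often easier to read than raw covariances because it is unit-free.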


## Q1.3 
Compute the skewness and excess kurtosis for the returns for each of these stocks. Do the daily stock returns have a Normal distribution?


In [12]:
from scipy.stats import skew, kurtosis

faang_skew = faang_logret.apply(lambda x: skew(x, bias=False))
# fisher=True (default) in kurtosis => returns *excess* kurtosis
faang_excess_kurt = faang_logret.apply(lambda x: kurtosis(x, fisher=True, bias=False))

moments_q1 = pd.DataFrame({
    "mean": faang_mean,
    "variance": faang_var,
    "skewness": faang_skew,
    "excess_kurtosis": faang_excess_kurt
})

moments_q1

| Ticker | mean | variance | skewness | excess_kurtosis |
| --- | --- | --- | --- | --- |
| AAPL | 0.001114 | 0.000349 | -0.325756 | 6.871251 |
| AMZN | 0.001568 | 0.000375 | 0.520067 | 6.29188 |
| GOOG | 0.000796 | 0.000286 | 0.224432 | 9.305913 |
| META | 0.000824 | 0.000407 | -0.777739 | 14.239211 |
| NFLX | 0.001560 | 0.000704 | 0.346611 | 6.315921 |


The skewness and excess kurtosis estimates indicate that the daily log returns of all five FAANG stocks are not Normally distributed. A Normal distribution has skewness of 0 and excess kurtosis of 0. However:

- Skewness is non-zero for every stock, ranging from -0.78 (META) to 0.52 (AMZN), which indicates asymmetric returns.
- Excess kurtosis is large and positive for every stock, ranging from 6.29 (AMZN) to 14.24 (META), which means fat tails and more extreme return events than a Normal model predicts.

FAANG daily returns therefore display non-Normal behaviour, with heavy tails and skewness, which is consistent with typical financial return data.
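
To give a feel for what these magnitudes mean, the following sketch compares a synthetic Normal sample with a synthetic Student-t sample with 5 degrees of freedom, whose theoretical excess kurtosis is 6/(df - 4) = 6. The samples, seed and sample size are illustrative assumptions and are unrelated to the FAANG data.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
n = 200_000  # large synthetic sample so the estimates are reasonably stable

normal_sample = rng.normal(size=n)
t_sample = rng.standard_t(df=5, size=n)  # fat-tailed benchmark

# Same estimators as in the notebook: unbiased skewness and excess kurtosis
normal_skew = skew(normal_sample, bias=False)
normal_excess_kurt = kurtosis(normal_sample, fisher=True, bias=False)
t_excess_kurt = kurtosis(t_sample, fisher=True, bias=False)

print(f"Normal: skew={normal_skew:.3f}, excess kurtosis={normal_excess_kurt:.3f}")
print(f"t(5):   excess kurtosis={t_excess_kurt:.3f}")
```

The Normal sample should give values close to zero, while the t(5) sample should give a clearly positive excess kurtosis, in the same general range as the FAANG estimates.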

In [21]:
from scipy.stats import jarque_bera

jb_res = jarque_bera(faang_logret.dropna(), axis=0)

jb_table = pd.DataFrame(
    {
        "JB_statistic": jb_res.statistic,
        "p_value": jb_res.pvalue
    },
    index=faang_logret.columns
)

print("Jarque–Bera test (per stock):")
display(jb_table)

Jarque–Bera test (per stock):


| Ticker | JB_statistic | p_value |
| --- | --- | --- |
| AAPL | 2972.178824 | 0.0 |
| AMZN | 2537.38342 | 0.0 |
| GOOG | 5416.994783 | 0.0 |
| META | 12808.699399 | 0.0 |
| NFLX | 2518.570045 | 0.0 |


The p-values for all five stocks are below 0.05 (they print as 0.0 at the displayed precision), so the null hypothesis of normally distributed returns is rejected for each stock.
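
The Jarque–Bera statistic combines the two shape measures from the previous step: JB = n/6 · (S² + K²/4), where S is the sample skewness and K the sample excess kurtosis, both computed with the biased (population) estimators, and the p-value comes from a chi-squared distribution with 2 degrees of freedom. The sketch below recomputes it by hand on a synthetic fat-tailed sample (an illustrative assumption, not the assignment data) and compares the result with `scipy.stats.jarque_bera`.

```python
import numpy as np
from scipy.stats import skew, kurtosis, jarque_bera, chi2

rng = np.random.default_rng(1)
x = rng.standard_t(df=4, size=1500)  # synthetic fat-tailed "returns"

n = x.size
S = skew(x, bias=True)                   # biased sample skewness
K = kurtosis(x, fisher=True, bias=True)  # biased sample excess kurtosis

# Jarque-Bera statistic and its asymptotic chi-squared(2) p-value
jb_manual = n / 6.0 * (S**2 + K**2 / 4.0)
p_manual = chi2.sf(jb_manual, df=2)

jb_scipy = jarque_bera(x)
print(jb_manual, jb_scipy.statistic)
print(p_manual, jb_scipy.pvalue)
```

Because the statistic scales with n, even moderate excess kurtosis produces a very large JB value on roughly 1,500 daily observations, which is why the p-values above are indistinguishable from zero.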

## Q2.1 Preparation of Data

In [13]:
import pandas as pd
import requests
from io import StringIO

# 1. Download page with browser-like headers (to avoid 403)
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36"
}
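response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop early on HTTP errors such as 403
html = response.text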

# 2. Read all tables from the HTML
sp500_tables = pd.read_html(StringIO(html))

# 3. Finding the table that contains the 'Symbol' column
sp500_table = None
for i, tbl in enumerate(sp500_tables):
    print(f"Table {i} columns:", list(tbl.columns))
    if "Symbol" in tbl.columns:
        sp500_table = tbl
        break

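# 4. Extract ticker list (fail early if no table with a 'Symbol' column was found)
if sp500_table is None:
    raise ValueError("No table with a 'Symbol' column found on the page")
sp500_tickers = sp500_table["Symbol"].unique().tolist()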

len(sp500_tickers), sp500_tickers[:10]


Table 0 columns: [0, 1]
Table 1 columns: ['Symbol', 'Security', 'GICS Sector', 'GICS Sub-Industry', 'Headquarters Location', 'Date added', 'CIK', 'Founded']


(503, ['MMM', 'AOS', 'ABT', 'ABBV', 'ACN', 'ADBE', 'AMD', 'AES', 'AFL', 'A'])

In [19]:
start_date = "2000-01-01"
end_date   = "2022-12-31"

# Download daily prices (Close already adjusted because auto_adjust=True)
sp500_data = yf.download(
    sp500_tickers,
    start=start_date,
    end=end_date,
    auto_adjust=True
)

# Take only the 'Close' level (adjusted close prices)
sp500_prices = sp500_data["Close"]

print("Raw price shape:", sp500_prices.shape)
sp500_prices

[*********************100%***********************]  503 of 503 completed

8 Failed downloads:
['SOLS', 'SOLV', 'KVUE', 'Q', 'VLTO', 'GEV']: YFPricesMissingError('possibly delisted; no price data found  (1d 2000-01-01 -> 2022-12-31) (Yahoo error = "Data doesn\'t exist for startDate = 946702800, endDate = 1672462800")')
['BRK.B']: YFTzMissingError('possibly delisted; no timezone found')
['BF.B']: YFPricesMissingError('possibly delisted; no price data found  (1d 2000-01-01 -> 2022-12-31)')


Raw price shape: (5787, 503)


Date,A,AAPL,ABBV,ABNB,ABT,ACGL,ACN,ADBE,ADI,ADM,...,WY,WYNN,XEL,XOM,XYL,XYZ,YUM,ZBH,ZBRA,ZTS
2000-01-03,43.113319,0.839281,,,8.134049,1.215037,,16.274672,27.737482,6.064443,...,11.024052,,6.628230,17.255524,,,4.545775,,25.027779,
2000-01-04,39.819942,0.768521,,,7.901652,1.208433,,14.909400,26.334293,6.001269,...,10.609915,,6.780847,16.925013,,,4.454404,,24.666668,
2000-01-05,37.349911,0.779767,,,7.887132,1.320692,,15.204173,26.718723,5.906516,...,11.171964,,7.042488,17.847687,,,4.477246,,25.138889,
2000-01-06,35.927773,0.712287,,,8.163102,1.307485,,15.328289,25.988283,5.938099,...,11.694577,,6.977079,18.770370,,,4.439174,,23.777779,
2000-01-07,38.921757,0.746027,,,8.250253,1.380124,,16.072979,26.718723,6.032855,...,11.310016,,6.977079,18.715294,,,4.340186,,23.513889,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-12-23,146.000793,129.900299,146.128906,85.250000,102.049538,60.267948,254.155930,338.450012,156.080734,86.630249,...,28.382822,78.621727,64.013191,98.027626,106.173096,60.889999,122.271072,123.750938,248.220001,141.150131
2022-12-27,146.313858,128.097473,146.030350,83.489998,102.417442,60.496162,253.410858,335.089996,154.518387,87.811325,...,28.364796,82.137886,64.590790,99.389603,107.130997,59.860001,123.219658,124.327240,251.000000,140.704681
2022-12-28,144.885468,124.166771,145.349442,82.489998,101.719383,59.526245,251.319107,328.329987,152.689316,85.714691,...,27.607679,78.008102,64.125526,97.757011,105.408707,59.080002,122.660004,123.067162,246.839996,139.281189
2022-12-29,147.820557,127.683731,145.645081,85.230003,104.058838,60.011208,256.343201,337.579987,156.214111,85.256920,...,28.229593,79.147697,64.579865,98.496643,108.021172,62.919998,123.305023,124.864479,257.529999,143.464523
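
The download log above lists `BRK.B` and `BF.B` among the failures. A likely cause (an assumption, not verified against this run) is that Wikipedia writes share classes with a dot, whereas Yahoo Finance generally uses a hyphen, e.g. `BRK-B`. The sketch below shows one way to normalise the symbols before downloading; `to_yahoo_symbol` is a hypothetical helper and is not part of the notebook.

```python
def to_yahoo_symbol(symbol: str) -> str:
    """Hypothetical helper: convert a Wikipedia-style symbol to Yahoo style."""
    return symbol.strip().replace(".", "-")


# Example symbols taken from the ticker list and the download log
wiki_symbols = ["MMM", "BRK.B", "BF.B", "AAPL"]
yahoo_symbols = [to_yahoo_symbol(s) for s in wiki_symbols]

print(yahoo_symbols)
```

Applying this to `sp500_tickers` before calling `yf.download` would change the set of downloaded columns, so the shapes reported in this section would need to be regenerated. The remaining six failures report no price data in the requested window, so renaming would not help for them.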


In [15]:
# Drop columns that are entirely NaN
sp500_prices = sp500_prices.dropna(axis=1, how="all")

# Drop companies with more than 100 missing observations
missing_counts = sp500_prices.isna().sum()
valid_cols = missing_counts[missing_counts <= 100].index
sp500_prices = sp500_prices[valid_cols]

# Drop rows where all remaining tickers are NaN
sp500_prices = sp500_prices.dropna(how="all")

print("Cleaned price shape:", sp500_prices.shape)


Cleaned price shape: (5787, 351)
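
The filtering logic can be checked on a toy example. The sketch below applies the same steps to a small made-up price table with a hypothetical threshold of 2 missing observations (the notebook uses 100); the column names `FULL`, `GAPPY` and `EMPTY` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy price table: one complete column, one with gaps, one entirely empty
toy_prices = pd.DataFrame({
    "FULL": [10.0, 10.1, 10.2, 10.3, 10.4],
    "GAPPY": [np.nan, np.nan, np.nan, 5.0, 5.1],
    "EMPTY": [np.nan] * 5,
})
max_missing = 2  # hypothetical threshold for this toy example

# Step 1: drop columns that are entirely NaN
toy_prices = toy_prices.dropna(axis=1, how="all")

# Step 2: keep only columns with at most `max_missing` missing observations
missing_counts = toy_prices.isna().sum()
kept = toy_prices.loc[:, missing_counts <= max_missing]

print(missing_counts.to_dict())
print(list(kept.columns))
```

Here `GAPPY` has three missing values and is dropped, mirroring how stocks listed well after 2000 are excluded from the cleaned sample.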


## Q2.2 – Log returns

In [18]:


sp500_logret = np.log(sp500_prices).diff()

# Drop first NaN row and any columns that still have NaNs
sp500_logret = sp500_logret.dropna(how="all")
sp500_logret = sp500_logret.dropna(axis=1, how="any")

print("Log returns shape:", sp500_logret.shape)
sp500_logret


Log returns shape: (5786, 347)


Date,A,AAPL,ABT,ACGL,ADBE,ADI,ADM,ADP,ADSK,AEE,...,WMB,WMT,WRB,WSM,WST,WY,XEL,XOM,YUM,ZBRA
2000-01-04,-0.079464,-0.088078,-0.028988,-0.005450,-0.087618,-0.051914,-0.010472,0.000000,-0.049914,0.000000,...,-0.022427,-0.038136,-0.031253,-0.032026,0.010278,-0.038291,0.022765,-0.019340,-0.020305,-0.014533
2000-01-05,-0.064037,0.014527,-0.001840,0.088831,0.019578,0.014493,-0.015915,-0.009662,-0.065064,0.037956,...,0.060017,-0.020619,-0.016000,0.005900,-0.004098,0.051618,0.037859,0.053082,0.005115,0.018963
2000-01-06,-0.038820,-0.090514,0.034392,-0.010050,0.008130,-0.027719,0.005333,0.013261,-0.062859,-0.003732,...,0.021135,0.010854,0.056442,-0.223143,-0.010320,0.045717,-0.009331,0.050406,-0.008539,-0.055665
2000-01-07,0.080043,0.046281,0.010620,0.054067,0.047440,0.027719,0.015831,0.022500,0.112049,0.014842,...,0.026268,0.072845,-0.015362,-0.016683,0.010320,-0.033436,0.000000,-0.002939,-0.022551,-0.011160
2000-01-10,0.058813,-0.017744,-0.007068,0.032944,0.037883,0.083468,0.000000,0.024292,-0.010151,-0.007393,...,-0.018692,-0.018417,-0.009332,0.033091,0.020326,-0.004369,0.000000,-0.014079,0.039558,0.033114
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-12-23,0.001475,-0.002802,0.001387,0.008397,0.005719,0.000549,0.012014,0.007415,0.000319,0.013155,...,0.022766,0.002019,0.006420,0.018724,-0.000553,0.014071,0.012770,0.026102,0.000621,0.002864
2022-12-27,0.002142,-0.013975,0.003599,0.003780,-0.009977,-0.010060,0.013542,0.000373,-0.009988,0.008342,...,0.002697,0.000278,0.006243,-0.026808,0.006782,-0.000635,0.008983,0.013798,0.007728,0.011137
2022-12-28,-0.009811,-0.031166,-0.006839,-0.016163,-0.020380,-0.011908,-0.024166,-0.013281,-0.023848,-0.010019,...,-0.020869,-0.017679,-0.015132,-0.018193,-0.023464,-0.027055,-0.007229,-0.016563,-0.004552,-0.016713
2022-12-29,0.020055,0.027931,0.022739,0.008114,0.027783,0.022822,-0.005355,0.011579,0.033623,0.007024,...,0.007611,0.006068,0.006026,0.013997,0.032924,0.022277,0.007060,0.007538,0.005245,0.042396
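
One reason for working with log returns is that they add up over time: the sum of daily log returns equals the log of the total price ratio. The sketch below illustrates this on a made-up four-day price series and also shows the conversion back to simple returns via `exp(r) - 1`.

```python
import numpy as np
import pandas as pd

# Made-up price path for illustration only
toy_price = pd.Series([100.0, 110.0, 99.0, 108.9])

log_ret = np.log(toy_price).diff().dropna()   # same construction as sp500_logret
simple_ret = toy_price.pct_change().dropna()  # ordinary percentage returns

# Log returns are additive: their sum equals the log of the total price ratio
total_log = log_ret.sum()
total_from_prices = np.log(toy_price.iloc[-1] / toy_price.iloc[0])

# Converting log returns back to simple returns
recovered_simple = np.expm1(log_ret)

print(total_log, total_from_prices)
print(np.allclose(recovered_simple, simple_ret))
```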


## Q2.3 – Annualized mean, volatility, Sharpe ratio


In [20]:

trading_days = 252

# Daily statistics
mu_daily    = sp500_logret.mean()
sigma_daily = sp500_logret.std()

# Annualized
mu_annual    = mu_daily * trading_days
sigma_annual = sigma_daily * np.sqrt(trading_days)

rf_annual = 0.0  # assumption: risk-free rate set to zero

sharpe_ratio = (mu_annual - rf_annual) / sigma_annual

sp500_stats = pd.DataFrame({
    "mu_annual":    mu_annual,
    "sigma_annual": sigma_annual,
    "Sharpe":       sharpe_ratio
})

# Sort all stocks by Sharpe ratio, highest first
sp500_stats.sort_values("Sharpe", ascending=False)


| Ticker | mu_annual | sigma_annual | Sharpe |
| --- | --- | --- | --- |
| MNST | 0.306182 | 0.451895 | 0.677550 |
| AZO | 0.191228 | 0.283737 | 0.673962 |
| TPL | 0.258866 | 0.390458 | 0.662981 |
| WRB | 0.176441 | 0.268406 | 0.657364 |
| TSCO | 0.242690 | 0.370072 | 0.655793 |
| ... | ... | ... | ... |
| AKAM | -0.058267 | 0.635652 | -0.091666 |
| CCL | -0.055425 | 0.464019 | -0.119446 |
| GE | -0.040278 | 0.336196 | -0.119805 |
| C | -0.071155 | 0.478158 | -0.148810 |
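
With a zero risk-free rate, the annualization used above is equivalent to scaling the daily Sharpe ratio by the square root of the number of trading days, since the mean scales with 252 and the volatility with √252. The sketch below checks this on synthetic daily returns (an illustrative assumption, not the S&P 500 data).

```python
import numpy as np
import pandas as pd

trading_days = 252

# Synthetic daily log returns for one hypothetical stock
rng = np.random.default_rng(7)
toy_ret = pd.Series(rng.normal(0.0004, 0.015, size=2520))

mu_d = toy_ret.mean()
sd_d = toy_ret.std()

# Route 1: annualize mean and volatility separately, as in the notebook
sharpe_annual = (mu_d * trading_days) / (sd_d * np.sqrt(trading_days))

# Route 2: scale the daily Sharpe ratio by sqrt(252)
sharpe_scaled = (mu_d / sd_d) * np.sqrt(trading_days)

print(sharpe_annual, sharpe_scaled)
```

The square-root-of-time rule for volatility assumes that daily returns are uncorrelated over time, so the annualized figures should be read as approximations.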


## Q2.4
Would it make sense to choose portfolio weights based only on the Sharpe ratios of the stocks in your dataset? Explain the reasons for your answer.

No, it would not be sensible to choose portfolio weights based only on the individual Sharpe ratios.

The Sharpe ratio of each stock ignores the covariances between stocks, but portfolio risk depends on the full covariance matrix, not just on individual volatilities. Weighting by stand-alone Sharpe ratios would often concentrate the portfolio in a few names, with little diversification and high idiosyncratic risk. In addition, Sharpe ratios depend on estimates of expected returns, which are noisy and change over time, so weights based only on them are unstable. A better approach is to combine expected returns with the covariance matrix (e.g. mean-variance or other portfolio optimization methods) and to include practical constraints such as position limits, turnover and liquidity.
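
The role of the covariance matrix can be illustrated with a small made-up example. In the sketch below, two hypothetical assets have the highest stand-alone Sharpe ratios but are almost perfectly correlated, while a third has a lower Sharpe ratio but is uncorrelated with them. All inputs are invented for illustration, and the unconstrained tangency weights (proportional to Σ⁻¹μ) are used here only as a textbook benchmark, not as the method of this assignment.

```python
import numpy as np

# Hypothetical annual expected excess returns, volatilities and correlations
mu = np.array([0.10, 0.10, 0.06])
vol = np.array([0.20, 0.20, 0.15])
corr = np.array([
    [1.00, 0.95, 0.00],
    [0.95, 1.00, 0.00],
    [0.00, 0.00, 1.00],
])
cov = np.outer(vol, vol) * corr


def portfolio_sharpe(w):
    """Sharpe ratio of a portfolio with weights w (risk-free rate of zero)."""
    w = np.asarray(w, dtype=float)
    return (w @ mu) / np.sqrt(w @ cov @ w)


# Weights proportional to stand-alone Sharpe ratios (ignores correlations)
stand_alone_sharpe = mu / vol
w_sharpe = stand_alone_sharpe / stand_alone_sharpe.sum()

# Unconstrained tangency weights, proportional to inv(cov) @ mu
w_tan = np.linalg.solve(cov, mu)
w_tan = w_tan / w_tan.sum()

ps_sharpe = portfolio_sharpe(w_sharpe)
ps_tan = portfolio_sharpe(w_tan)

print("Sharpe-proportional weights:", w_sharpe.round(3), "Sharpe:", round(ps_sharpe, 3))
print("Tangency weights:           ", w_tan.round(3), "Sharpe:", round(ps_tan, 3))
```

In this example the covariance-aware weights shift towards the uncorrelated third asset, despite its lower stand-alone Sharpe ratio, and achieve a higher portfolio Sharpe ratio.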