Problem 1: CAPM

In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from scipy.stats import skew, kurtosis
import datetime
import requests, zipfile, io

In [None]:
end_date = datetime.datetime.today()
start_date = end_date - datetime.timedelta(days=3*365)

In [None]:
stock = yf.download("AAPL", start=start_date, end=end_date)
market = yf.download("^GSPC", start=start_date, end=end_date)

In [None]:
stock['Return'] = stock['Close'].pct_change()
market['Return'] = market['Close'].pct_change()

In [None]:
rf_daily = 0.02 / 252

In [None]:
stock['Excess Return'] = stock['Return'] - rf_daily
market['Excess Return'] = market['Return'] - rf_daily

In [None]:
merged_df = pd.concat([stock['Excess Return'], market['Excess Return']], axis=1).dropna()
merged_df.columns = ['Stock Excess Return', 'Market Excess Return']

In [None]:
X = sm.add_constant(merged_df['Market Excess Return'])
y = merged_df['Stock Excess Return']
model = sm.OLS(y, X).fit()
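beta = model.params['Market Excess Return']  # slope on the market factor; the intercept (alpha) is model.params['const']
print(f"CAPM model AAPL: beta is {beta:.4f}")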
print(model.summary())

In [None]:
plt.figure(figsize=(10,6))
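plt.scatter(merged_df['Market Excess Return'], merged_df['Stock Excess Return'], alpha=0.5, label='Data Points')
x_vals = np.linspace(merged_df['Market Excess Return'].min(), merged_df['Market Excess Return'].max(), 100)
# Fitted line: alpha + beta * market excess return, using the CAPM regression fitted above
y_vals = model.params['const'] + model.params['Market Excess Return'] * x_vals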
plt.plot(x_vals, y_vals, color='red', label='Regression Line')
plt.xlabel('Market Excess Return')
plt.ylabel('Stock Excess Return')
plt.title('CAPM: AAPL vs. Market Excess Returns')
plt.legend()
plt.show()

In [None]:
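if model.params['Market Excess Return'] > 1:  # label-based access; integer position lookups on a labelled Series are deprecated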
    print("AAPL is more volatile compared to the market.")
else:
    print("AAPL is less volatile compared to the market.")

ANALYSIS: AAPL is more volatile than the market in the CAPM sense: its estimated beta exceeds 1, so its excess return is more sensitive to market movements than the market portfolio itself. On average, AAPL's excess return moves by more than 1% for every 1% change in the market's excess return. Plausible reasons for this heightened sensitivity include rapid innovation cycles, shifting market sentiment toward technology equities, and the competitive environment in which AAPL operates. As a result, moderate market fluctuations tend to be accompanied by larger swings in AAPL's returns, which reflects its higher systematic risk relative to the overall market.
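
The "more than 1% for every 1%" reading rests on the regression slope being the ratio of the covariance between stock and market excess returns to the variance of market excess returns. The following is a minimal sketch of that identity on synthetic data; the return series, the seed, and the assumed true beta of 1.3 are made up for illustration and are not AAPL or S&P 500 data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily excess returns (assumption: NOT real AAPL or S&P 500 data).
market_excess = rng.normal(0.0004, 0.01, size=750)
true_beta = 1.3  # assumed value, chosen only for illustration
stock_excess = 0.0001 + true_beta * market_excess + rng.normal(0, 0.012, size=750)

# CAPM beta as Cov(stock, market) / Var(market), using sample moments (ddof=1).
cov_matrix = np.cov(stock_excess, market_excess, ddof=1)
beta_cov = cov_matrix[0, 1] / cov_matrix[1, 1]

# The same slope from a least-squares line fit with an intercept.
# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept].
beta_ols, alpha_ols = np.polyfit(market_excess, stock_excess, deg=1)

print(f"beta (cov/var): {beta_cov:.4f}")
print(f"beta (OLS):     {beta_ols:.4f}")
```

Both estimates should agree up to floating-point error, since the OLS slope in a one-regressor model with an intercept is exactly this covariance ratio.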

Problem 2: Fama-French Three-Factor Model

In [None]:
ff_url = "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily_CSV.zip"
response = requests.get(ff_url)
z = zipfile.ZipFile(io.BytesIO(response.content))
csv_filename = [f for f in z.namelist() if f.lower().endswith('.csv')][0]
print("Extracting file:", csv_filename)

In [None]:
ff_df = pd.read_csv(z.open(csv_filename), skiprows=3)
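# Keep only rows whose first column is a numeric date; copy so later assignments do not hit a view
ff_df = ff_df[ff_df.iloc[:, 0].apply(lambda x: str(x).strip().isdigit())].copy()
ff_df.rename(columns={ff_df.columns[0]: 'Date'}, inplace=True)
ff_df['Date'] = pd.to_datetime(ff_df['Date'].astype(str).str.strip(), format='%Y%m%d')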

In [None]:

# The factor file quotes returns in percent; convert to decimals to match pct_change()
for col in ['Mkt-RF', 'SMB', 'HML', 'RF']:
    ff_df[col] = pd.to_numeric(ff_df[col], errors='coerce') / 100

In [None]:
ff_df = ff_df[(ff_df['Date'] >= start_date) & (ff_df['Date'] <= end_date)]
ff_df = ff_df.reset_index(drop=True)

In [None]:
aapl = yf.download("AAPL", start=start_date, end=end_date)
aapl['Return'] = aapl['Close'].pct_change()

In [None]:
aapl = aapl.reset_index()

In [None]:
if isinstance(aapl.columns, pd.MultiIndex):
    aapl.columns = aapl.columns.get_level_values(0)

In [None]:
aapl['Date'] = pd.to_datetime(aapl['Date'])
aapl['Date'] = aapl['Date'].dt.normalize()

In [None]:
if isinstance(ff_df.columns, pd.MultiIndex):
    ff_df.columns = ff_df.columns.get_level_values(0)
ff_df['Date'] = pd.to_datetime(ff_df['Date'])
ff_df['Date'] = ff_df['Date'].dt.normalize()

In [None]:
merged_ff = pd.merge(aapl[['Date', 'Return']], ff_df, on='Date', how='inner')
merged_ff = merged_ff.dropna()
merged_ff.set_index('Date', inplace=True)

In [None]:
merged_ff['Excess Return'] = merged_ff['Return'] - merged_ff['RF']

In [None]:
X_ff = merged_ff[['Mkt-RF', 'SMB', 'HML']]
X_ff = sm.add_constant(X_ff)
y_ff = merged_ff['Excess Return']

In [None]:
ff_model = sm.OLS(y_ff, X_ff).fit()

print("\nFama-French Three-Factor Model Regression Results:")
print(ff_model.summary())

In [None]:
coefficients = ff_model.params
r_squared = ff_model.rsquared

In [None]:
print("\nRegression Coefficients:")
print(coefficients)
print("\nR-squared:")
print(r_squared)

ANALYSIS: The CAPM explains about 58.3% of the variation in AAPL's excess returns, while the Fama-French model explains 62.6%, so adding the SMB and HML factors raises explanatory power by 4.3 percentage points. The negative SMB coefficient (-0.266) indicates that AAPL's excess return tends to decline when small-cap stocks outperform large-cap stocks, which is consistent with AAPL behaving like a large-cap company. Similarly, the negative HML coefficient (-0.330) shows that AAPL's excess return tends to fall when value firms (those with high book-to-market ratios) outperform growth stocks, consistent with AAPL's growth orientation. Overall, market movements remain the main driver of AAPL's returns, while the size and value factors add modest context: AAPL's large-cap and growth characteristics show up as negative exposure to both SMB and HML.
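
Because R-squared cannot decrease when regressors are added, a comparison between nested models such as these is often checked against adjusted R-squared as well. The sketch below illustrates the calculation on synthetic factor data; the helper `r_squared`, the factor series, and the loadings are hypothetical and are not the notebook's results.

```python
import numpy as np

def r_squared(y, X):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X plus an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    k = design.shape[1] - 1  # number of regressors, excluding the intercept
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2

rng = np.random.default_rng(1)
n = 750

# Hypothetical daily factor returns and loadings (assumptions, not Ken French data).
mkt = rng.normal(0, 0.01, n)
smb = rng.normal(0, 0.005, n)
hml = rng.normal(0, 0.005, n)
y = 1.2 * mkt - 0.3 * smb - 0.3 * hml + rng.normal(0, 0.01, n)

r2_capm, adj_capm = r_squared(y, mkt)
r2_ff, adj_ff = r_squared(y, np.column_stack([mkt, smb, hml]))

print(f"One factor:    R2={r2_capm:.4f}, adjusted R2={adj_capm:.4f}")
print(f"Three factors: R2={r2_ff:.4f}, adjusted R2={adj_ff:.4f}")
```

In the notebook itself, the analogous figures should be available from the fitted statsmodels results as `rsquared` and `rsquared_adj`.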

Problem 3: Clustering Stocks

In [None]:
end_date = datetime.datetime.today()
start_date = end_date - datetime.timedelta(days=1095)

In [None]:
stocks = ["AAPL", "MSFT", "AMZN", "TSLA", "JPM", "PFE", "KO", "XOM", "NVDA", "META"]

In [None]:
data = yf.download(stocks, start=start_date, end=end_date)['Close']

In [None]:
returns = data.pct_change().dropna()

In [None]:
stats_df = pd.DataFrame(index=stocks, columns=['Mean Return', 'Std Dev', 'Skewness', 'Kurtosis'])
# Use `ticker` as the loop variable so the `stock` DataFrame from Problem 1 is not overwritten
for ticker in stocks:
    stats_df.loc[ticker, 'Mean Return'] = returns[ticker].mean()
    stats_df.loc[ticker, 'Std Dev'] = returns[ticker].std()
    stats_df.loc[ticker, 'Skewness'] = skew(returns[ticker])
    stats_df.loc[ticker, 'Kurtosis'] = kurtosis(returns[ticker])  # scipy default is excess (Fisher) kurtosis
stats_df = stats_df.astype(float)

In [None]:
print("Summary Statistics for Stocks:")
print(stats_df)

In [None]:
scaler = StandardScaler()
stats_normalized = scaler.fit_transform(stats_df)

In [None]:
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
clusters = kmeans.fit_predict(stats_normalized)
stats_df['Cluster'] = clusters

In [None]:
plt.figure(figsize=(10,6))
colors = ['red', 'green', 'blue']
for cluster in range(3):
    cluster_data = stats_df[stats_df['Cluster'] == cluster]
    plt.scatter(cluster_data['Std Dev'], cluster_data['Mean Return'],
                color=colors[cluster], s=100, label=f'Cluster {cluster}')
    # Label each point with its ticker symbol
    for ticker in cluster_data.index:
        plt.text(cluster_data.loc[ticker, 'Std Dev'], cluster_data.loc[ticker, 'Mean Return'],
                 ticker, fontsize=9)
plt.xlabel('Standard Deviation of Returns')
plt.ylabel('Mean Return')
plt.title('Clustering of Stocks: Mean Return vs. Standard Deviation')
plt.legend()
plt.show()

ANALYSIS: The clustering results suggest distinct groupings based on risk and return characteristics. Cluster 0, which includes NVDA and TSLA, is characterized by the highest standard deviations and relatively high mean returns. Consistent with their profiles as high-growth, tech-driven businesses, these two stocks are more volatile and offer the potential for larger gains. In contrast, Cluster 1 contains most of the stocks, including XOM, JPM, and MSFT, and shows lower mean returns and lower volatility. This cluster likely represents more mature or stable companies with less aggressive growth profiles. Within it, XOM, JPM, and MSFT sit particularly close together, suggesting they share similar return and risk dynamics. Cluster 2, which consists solely of META, falls between the other two clusters in both mean return and standard deviation. META's return and risk are therefore elevated relative to Cluster 1 but below those of Cluster 0, and its profile is distinct enough to separate it from both groups. Because the clustering also uses skewness and kurtosis, which the plot does not show, META's separation may partly reflect those features. Overall, NVDA and TSLA stand out as outliers in volatility and return, the majority of stocks, including the closely grouped XOM, JPM, and MSFT, exhibit more moderate behavior, and META occupies an intermediate position.
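
Since the k-means step runs on four standardized features while the plot shows only two, one way to see which features separate the clusters is to compare per-cluster averages of the standardized features. The sketch below uses made-up summary statistics and assumed cluster labels for six unnamed tickers; none of the numbers come from the notebook's output.

```python
import numpy as np

feature_names = ["Mean Return", "Std Dev", "Skewness", "Kurtosis"]

# Hypothetical summary statistics, one row per ticker (assumption: NOT the notebook's results).
features = np.array([
    [0.0020, 0.035,  0.4,  3.0],
    [0.0018, 0.033,  0.3,  2.5],
    [0.0006, 0.015,  0.0,  1.5],
    [0.0005, 0.014, -0.1,  1.8],
    [0.0007, 0.016,  0.1,  1.2],
    [0.0012, 0.028,  2.5, 30.0],
])
labels = np.array([0, 0, 1, 1, 1, 2])  # assumed cluster assignments

# z-score each column with the population standard deviation (ddof=0),
# which is the convention StandardScaler uses.
z = (features - features.mean(axis=0)) / features.std(axis=0)

# Average standardized feature values within each cluster.
centroids = {int(c): z[labels == c].mean(axis=0) for c in np.unique(labels)}

for c, centroid in centroids.items():
    dominant = feature_names[int(np.argmax(np.abs(centroid)))]
    summary = ", ".join(f"{name}={value:+.2f}" for name, value in zip(feature_names, centroid))
    print(f"Cluster {c}: {summary} (largest |z|: {dominant})")
```

With the notebook's own objects, the same comparison could be run on `stats_normalized` and `clusters`; a single-member cluster whose largest standardized values are on skewness or kurtosis would point to tail behavior, rather than mean and volatility, as the reason for the split.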