FROM GROUP 1 = Jay Capozzoli, Sufyan Haroon, Noah Severin

Submission Instructions:
• Submit a Jupyter Notebook with the complete code and analysis for all three problems.
• Ensure that plots are labeled appropriately, and all assumptions and interpretations are
clearly stated.

1.1 Problem 1: CAPM Model
Objective: Estimate the beta of a stock using the CAPM model and analyze its performance.
1.1.1 Steps:
1. Data Retrieval:
• Use the Yahoo Finance API to download daily adjusted closing prices for the stock of
your choice (e.g., AAPL) and a benchmark index (e.g., S&P 500) for the past 3 years.
2. Excess Returns:
• Download risk-free rate data from a reliable source (e.g., FRED) or use a constant risk-
free rate (e.g., 2% annualized).
• Calculate daily excess returns for both the stock and the index.
3. CAPM Estimation:
• Perform a linear regression with the stock’s excess returns as the dependent variable and
the index’s excess returns as the independent variable.
• Report the beta, alpha, and R-squared values.
4. Analysis:
• Interpret the beta and explain whether the stock is more or less volatile compared to
the market.
• Plot the regression line along with the scatterplot of excess returns.
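
The excess-return step above can be sketched as follows. This is a minimal illustration, not the notebook's own code: the helper names `daily_risk_free` and `excess_returns` and the example prices are hypothetical, and 252 trading days per year is an assumption. It shows both the simple division used in the solution and a compounded conversion for comparison.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year


def daily_risk_free(annual_rate, compound=False):
    """Convert an annualized risk-free rate to a per-trading-day rate."""
    if compound:
        # Daily rate that compounds to the annual rate over TRADING_DAYS days
        return (1.0 + annual_rate) ** (1.0 / TRADING_DAYS) - 1.0
    # Simple division, the approximation used with a constant 2% rate
    return annual_rate / TRADING_DAYS


def excess_returns(prices, annual_rate=0.02):
    """Simple daily returns minus the daily risk-free rate."""
    prices = np.asarray(prices, dtype=float)
    simple_returns = prices[1:] / prices[:-1] - 1.0
    return simple_returns - daily_risk_free(annual_rate)


# Hypothetical prices, for illustration only
example_prices = [100.0, 101.0, 99.99, 102.0]
example_excess = excess_returns(example_prices)
print(example_excess)
```

At a 2% annual rate the two conversions differ by less than one millionth per day, so the choice has a negligible effect on the estimated beta.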

In [None]:
## PROBLEM 1 Solution Code (JAY)


In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import statsmodels.api as sm
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans


# Fixed three-year sample window, passed to Yahoo Finance as ISO date strings.
# Note: yfinance treats `end` as exclusive, so 2024-12-31 itself is not downloaded.
start_date_str = '2022-01-01'
end_date_str = '2024-12-31'

# Download Tesla (TSLA) and S&P 500 (^GSPC) data
tsla_data = yf.download('TSLA', start=start_date_str, end=end_date_str)
sp500_data = yf.download('^GSPC', start=start_date_str, end=end_date_str)

# Use the 'Close' column (already adjusted for splits and dividends when
# yfinance's auto_adjust option is on) and squeeze to a 1-dimensional Series
tsla = tsla_data['Close'].squeeze()
sp500 = sp500_data['Close'].squeeze()

# Combine into a DataFrame; the two Series are aligned on their shared date index
data = pd.DataFrame({'TSLA': tsla, 'SP500': sp500})
data.index.name = 'Date'

# Drop missing values
data.dropna(inplace=True)

# Compute daily returns
data['TSLA_ret'] = data['TSLA'].pct_change()
data['SP500_ret'] = data['SP500'].pct_change()

# Drop missing values (from first NaN row)
data.dropna(inplace=True)

# Assume a constant annual risk-free rate of 2% (converted to daily)
risk_free_rate = 0.02 / 252  # Convert annualized rate to daily

# Compute excess returns
data['TSLA_excess'] = data['TSLA_ret'] - risk_free_rate
data['SP500_excess'] = data['SP500_ret'] - risk_free_rate

# Define independent (market excess return) and dependent (stock excess return) variables
X = sm.add_constant(data['SP500_excess'])  # Add alpha (constant)
y = data['TSLA_excess']

# Run regression
model = sm.OLS(y, X).fit()

# Get alpha, beta, and R-squared (look up parameters by name, not by position)
alpha = model.params['const']
beta = model.params['SP500_excess']
r_squared = model.rsquared

# Print results
print(f"Alpha (Intercept): {alpha:.6f}")
print(f"Beta (Market Exposure): {beta:.4f}")
print(f"R-Squared: {r_squared:.4f}")

# Plot regression line and scatterplot
plt.figure(figsize=(8,6))

# Scatterplot of excess returns
plt.scatter(data['SP500_excess'], data['TSLA_excess'], alpha=0.5, label='Excess Returns')

# Regression Line
x_range = np.linspace(data['SP500_excess'].min(), data['SP500_excess'].max(), 100)
y_range = alpha + beta * x_range
plt.plot(x_range, y_range, color='red', label='CAPM Regression Line')

# Labels
plt.xlabel('S&P 500 Excess Return')
plt.ylabel('TSLA Excess Return')
plt.title('CAPM Regression for TSLA')
plt.legend()
plt.grid()
plt.show()

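Analysis: Tesla's estimated beta is 1.9558. A beta above 1 means Tesla is more sensitive to market movements than the market itself: on average, a 1% excess return on the S&P 500 is associated with a roughly 2% excess return on TSLA in the same direction. This implies larger potential gains in rising markets but also larger losses in falling ones.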
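
As a cross-check on what beta measures, the sketch below shows that the OLS slope in a one-regressor model equals the covariance of stock and market excess returns divided by the market variance. It uses synthetic returns built with an assumed beta (`true_beta` is a made-up value), not the TSLA data, so the printed numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic excess returns (not market data): the "stock" is built with a known beta
true_beta = 1.5
market_excess = rng.normal(0.0, 0.01, size=750)
stock_excess = 0.0002 + true_beta * market_excess + rng.normal(0.0, 0.02, size=750)

# Beta as covariance(stock, market) divided by variance(market)
cov_matrix = np.cov(stock_excess, market_excess, ddof=1)
beta_cov = cov_matrix[0, 1] / cov_matrix[1, 1]

# Beta (slope) and alpha (intercept) from a least-squares straight-line fit
beta_ols, alpha_ols = np.polyfit(market_excess, stock_excess, deg=1)

# With a single regressor, R-squared is the squared correlation
r_squared = np.corrcoef(stock_excess, market_excess)[0, 1] ** 2

print(f"beta (cov/var): {beta_cov:.4f}")
print(f"beta (OLS):     {beta_ols:.4f}")
print(f"R-squared:      {r_squared:.4f}")
```

The two beta estimates agree up to floating-point error. A high beta combined with a modest R-squared means the stock amplifies market moves but also carries substantial stock-specific risk.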

1.2 Problem 2: Fama-French Three-Factor Model
Objective: Extend the analysis to the Fama-French Three-Factor Model.
1.2.1 Steps:
1. Data Retrieval:
• Download the Fama-French daily factors (MKT, SMB, and HML) from Kenneth French’s
website or another reliable source.
2. Excess Returns:
• Use the same stock as in Problem 1 and calculate its daily excess returns.
3. Model Estimation:
• Perform a multiple linear regression with the stock’s excess returns as the dependent
variable and the three factors (MKT, SMB, HML) as independent variables.
• Report the coefficients, alpha, and R-squared values.
4. Analysis:
• Compare the R-squared values of the CAPM and Three-Factor Model.
• Interpret the SMB and HML coefficients to discuss size and value effects.
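
The estimation step above can be sketched with plain NumPy. This is an illustration under stated assumptions: the factor returns are synthetic placeholders for Mkt-RF, SMB, and HML, and `true_loadings` holds made-up values. The point it makes is that the design matrix needs a column of ones for alpha to be estimated alongside the three loadings.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility
n_days = 1000

# Synthetic factor returns standing in for Mkt-RF, SMB, and HML
factors = rng.normal(0.0, 0.01, size=(n_days, 3))
true_loadings = np.array([1.8, 0.6, -0.7])  # hypothetical loadings
stock_excess = 0.0001 + factors @ true_loadings + rng.normal(0.0, 0.02, size=n_days)

# Leading column of ones so the intercept (alpha) is estimated
design = np.column_stack([np.ones(n_days), factors])
coefs, _, _, _ = np.linalg.lstsq(design, stock_excess, rcond=None)
alpha_hat, loadings_hat = coefs[0], coefs[1:]

# Centered R-squared: 1 - (residual sum of squares) / (total sum of squares)
residuals = stock_excess - design @ coefs
total_ss = np.sum((stock_excess - stock_excess.mean()) ** 2)
r_squared = 1.0 - (residuals @ residuals) / total_ss

print("alpha:", alpha_hat)
print("loadings (MKT, SMB, HML):", loadings_hat)
print("R-squared:", r_squared)
```

Without the column of ones the fit is forced through the origin, no alpha is reported, and regression packages typically switch to an uncentered R-squared, which is not directly comparable to the CAPM value.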

In [None]:
## Problem 2 Solution Code (NOAH)

In [None]:
# Step 1: Fetch Data
start_date = "2021-01-01"
end_date = "2024-12-31"
risk_free_ticker = "^IRX"  # Use the 13-week Treasury yield as a proxy for risk-free rate
market_index_ticker = "^GSPC"  # S&P 500 index
equity_tickers = ["TSLA"]  # Replace with desired stock tickers

# Fetch data
risk_free_data = yf.download(risk_free_ticker, start=start_date, end=end_date)["Close"]
market_data = yf.download(market_index_ticker, start=start_date, end=end_date)["Close"]
stock_data = yf.download(equity_tickers, start=start_date, end=end_date)["Close"]

# Step 2: Prepare Data
# Calculate daily returns
market_returns = market_data.pct_change().dropna()
stock_returns = stock_data.pct_change().dropna()

# Import FF Factors
FF_df = pd.read_csv('F-F_Research_Data_Factors_daily.CSV')
FF_df['Date'] = pd.to_datetime(FF_df['Date'], format='%Y%m%d')
FF_df['Mkt-RF'] = FF_df['Mkt-RF']/100
FF_df['SMB'] = FF_df['SMB']/100
FF_df['HML'] = FF_df['HML']/100
FF_df['RF'] = FF_df['RF']/100


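# Merge the FF factors onto the stock returns by date
# (reset_index turns the 'Date' index from yfinance into a regular column)
stock_returns = stock_returns.reset_index().merge(FF_df, on='Date', how='inner')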

# print minimal and maximum dates
print(stock_returns['Date'].min(), stock_returns['Date'].max())

In [None]:
# Check if is there any missing data
stock_returns[stock_returns['Mkt-RF'].isna()]

In [None]:
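# Three-factor model: regress TSLA excess returns on Mkt-RF, SMB, and HML
y = stock_returns['TSLA'] - stock_returns['RF']
x = sm.add_constant(stock_returns[['Mkt-RF', 'SMB', 'HML']])  # constant term estimates alpha
model = sm.OLS(y, x).fit()
print(model.summary())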

#Compare the R-squared values of the CAPM and Three-Factor Model.

The R-squared of the Three-Factor Model is 0.361, compared with 0.312 for the CAPM. The three factors together explain approximately 36% of the variance in TSLA's daily excess returns, about five percentage points more than the market factor alone. Because adding regressors can never lower R-squared on a given sample, this gain indicates a modest improvement in fit from the size and value factors rather than a fundamentally better model.
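
Since R-squared mechanically rises as regressors are added, adjusted R-squared is a fairer basis for comparing the two models. The sketch below applies the standard adjustment to the two reported values. The helper name is hypothetical, and `n_obs` is a placeholder sample size rather than the notebook's actual observation count.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared for a model with an intercept and n_regressors slopes."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


n_obs = 1000  # placeholder sample size, an assumption for illustration

capm_adj = adjusted_r_squared(0.312, n_obs, n_regressors=1)
ff3_adj = adjusted_r_squared(0.361, n_obs, n_regressors=3)

print(f"CAPM adjusted R-squared:         {capm_adj:.4f}")
print(f"Three-Factor adjusted R-squared: {ff3_adj:.4f}")
```

With around a thousand daily observations the penalty for two extra factors is small, so the ranking of the two models does not change.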

#Interpret the SMB and HML coefficients to discuss size and value effects.

The SMB coefficient measures Tesla's exposure to the size effect. In this analysis it was 0.5827, a positive loading: holding the other factors fixed, Tesla's excess return tends to be higher on days when small-cap stocks outperform large-cap stocks. This is notable, as one would expect Tesla, a large-cap company, to load negatively on SMB. One possible explanation is that Tesla is still a rapidly growing company, so its return pattern resembles that of other high-growth firms, many of which are small caps.

The HML coefficient measures Tesla's exposure to the value effect. The loading of -0.7055 is strongly negative, meaning Tesla tends to perform better when growth stocks outperform value stocks. This is not surprising, as Tesla is one of the most prominent growth stocks on the market.

1.3 Problem 3: Clustering Stocks
Objective: Use clustering to group stocks based on their historical returns.
1.3.1 Steps:
1. Data Retrieval:
• Select 10 stocks from different sectors (e.g., AAPL, MSFT, AMZN, TSLA, JPM, PFE,
KO, XOM, NVDA, META).
• Download their daily adjusted closing prices for the past 3 years.
2. Feature Engineering:
• Calculate daily returns for each stock.
• Compute summary statistics (e.g., mean return, standard deviation, skewness, kurtosis)
for each stock.
3. Clustering:
• Normalize the summary statistics.
• Use k-means clustering to group the stocks into 3 clusters.
4. Visualization:
• Plot the clusters using a 2D scatterplot (e.g., mean return vs. standard deviation) with
different colors for each cluster.
5. Analysis:
• Interpret the clusters and discuss potential similarities among stocks in the same cluster.
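
The feature-engineering and normalization steps above can be sketched in NumPy as follows. The returns are synthetic and the four columns stand for hypothetical stocks. The moments are computed with population formulas, whereas pandas' `skew()` and `kurt()` apply small-sample bias corrections, so values will differ slightly from the solution code.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed for reproducibility

# Synthetic daily returns for four hypothetical stocks with different volatilities
returns = rng.normal(0.0005, [0.01, 0.015, 0.02, 0.035], size=(750, 4))


def summary_features(r):
    """Per-column mean, standard deviation, skewness, and excess kurtosis."""
    mean = r.mean(axis=0)
    std = r.std(axis=0)
    z = (r - mean) / std
    skew = (z ** 3).mean(axis=0)
    excess_kurt = (z ** 4).mean(axis=0) - 3.0
    return np.column_stack([mean, std, skew, excess_kurt])


features = summary_features(returns)  # one row per stock, one column per statistic

# Z-score each feature column so no single statistic dominates the distances
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

print(standardized.round(3))
```

Standardizing matters here because k-means relies on Euclidean distance: without it, kurtosis (often in the single digits or higher) would swamp mean return (on the order of 0.001) when clusters are assigned.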

In [None]:
## Problem 3 Solution Code (SUFYAN)

In [None]:
# Data Retrieval
# 10 Stocks From Different Sectors
stocks = ['AAPL', 'MSFT', 'AMZN', 'TSLA', 'JPM', 'PFE', 'KO', 'XOM', 'NVDA', 'META']

# Download daily adjusted closing prices for the past 3 years
data = yf.download(stocks, start="2022-01-01", end="2024-12-31")
adj_close = data['Close']

# Feature Engineering
# Calculate daily returns using the adjusted closing prices
returns = adj_close.pct_change().dropna()

# Compute summary statistics for each stock
summary_stats = pd.DataFrame({
    'Mean Return': returns.mean(),
    'Standard Deviation': returns.std(),
    'Skewness': returns.skew(),
    'Kurtosis': returns.kurt()
})

# Clustering
# Normalize the summary statistics
scaler = StandardScaler()
normalized_stats = scaler.fit_transform(summary_stats)

# Use k-means clustering to group the stocks into 3 clusters
kmeans = KMeans(n_clusters=3, random_state=42)
summary_stats['Cluster'] = kmeans.fit_predict(normalized_stats)

# Visualization
# Plot the clusters using a 2D scatterplot (mean return vs. standard deviation)
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Mean Return', y='Standard Deviation', hue='Cluster', data=summary_stats, palette='viridis', s=100)
plt.title('Clustering of Stocks based on Historical Returns')
plt.xlabel('Mean Return')
plt.ylabel('Standard Deviation')
# Label each point with its ticker (iterate rows to avoid positional indexing on a labeled Series)
for ticker, row in summary_stats.iterrows():
    plt.annotate(ticker, (row['Mean Return'], row['Standard Deviation']), fontsize=9)
plt.show()

# Analysis
# Display the clusters and summary statistics
print("Summary Statistics for Each Stock:")
print(summary_stats[['Mean Return', 'Standard Deviation', 'Skewness', 'Kurtosis', 'Cluster']])


# Analysis

The clustering placed the 10 stocks into three groups with distinct risk-return profiles. Cluster 0, which includes KO (Coca-Cola), is characterized by low volatility and negative skewness. The low volatility suggests a stable, defensive stock with consistent returns, which appeals to risk-averse investors. The negative skewness indicates a longer left tail: daily returns typically sit slightly above the mean, but the occasional losses are larger than the typical gains.
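
To make the reading of negative skewness concrete, here is a small sketch with a made-up sample of ten daily returns: nine small gains and one large loss. The numbers are purely illustrative and are not taken from any stock in the analysis.

```python
import numpy as np

# Hypothetical sample: nine gains of 0.5% and a single loss of 6%
sample = np.array([0.005] * 9 + [-0.06])

mean = sample.mean()
median = np.median(sample)
std = sample.std()

# Population skewness: average cubed z-score
skewness = np.mean(((sample - mean) / std) ** 3)

# Fraction of observations that lie above the mean
share_above_mean = np.mean(sample > mean)

print(f"mean={mean:.4f}, median={median:.4f}, skewness={skewness:.2f}")
print(f"share of returns above the mean: {share_above_mean:.0%}")
```

The single large loss drags the mean below the median and makes the skewness negative, even though most observations are above the mean.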

Cluster 1 consists of stocks such as NVDA (Nvidia) and TSLA (Tesla), which show higher volatility and positive skewness. These growth-oriented stocks tend to have larger price swings, offering higher potential returns at the cost of higher risk. The positive skewness indicates a longer right tail, meaning occasional large gains, which makes them appealing to investors looking for high-risk, high-reward stocks.

Cluster 2 includes stocks such as AAPL (Apple), AMZN (Amazon), JPM (JPMorgan), META (Meta Platforms, formerly Facebook), MSFT (Microsoft), PFE (Pfizer), and XOM (ExxonMobil). These stocks exhibit low to moderate volatility but more diverse return distributions, with varying degrees of skewness and kurtosis. Stocks like META show extreme kurtosis, meaning their return distributions have fat tails: most days are quiet, but the rare large moves are very large. This cluster represents a broader range of stocks that balance risk and return, with some showing more consistent patterns and others having potential for larger, more unpredictable moves.

In summary, Cluster 0 represents stable, defensive stocks with consistent, lower-risk returns; Cluster 1 consists of high-volatility growth stocks with greater potential for returns but higher risk; and Cluster 2 is a mix of stocks with moderate risk and varying degrees of return distribution, offering a balance of stability and growth potential with some stocks displaying more unpredictable price movements.