# Optimal Portfolio Simulator Refactored Notebook

This notebook demonstrates how to organize, refactor, and document code for an ETF portfolio optimization process. Specifically, it:

1. Loads and processes historical ETF price data.  
2. Defines helpers for performance metrics (periodic returns and the Sharpe Ratio).  
3. Provides a skeleton for optimizing a portfolio subject to specified constraints.  
4. Illustrates best practices for code structure, readability, and explanatory markdown.

---

## Table of Contents
1. [Imports and Setup](#imports)  
2. [Helper Functions](#helper-functions)  
3. [Data Loading](#data-loading)  
4. [Data Preprocessing](#data-preprocessing)  
5. [Exploratory Data Analysis](#exploratory-data-analysis)  
6. [Portfolio Optimization](#portfolio-optimization)  
7. [Conclusion](#conclusion)

## 1. Imports and Setup
<a id="imports"></a>

In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Optional: If you use PyPortfolioOpt or other libraries, import them similarly
# from pypfopt import EfficientFrontier, risk_models, expected_returns, plotting

# Configure pandas for better display
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

# List of tickers for demonstration; replace or adapt with your actual tickers
tickers = [
    "SPY", "IVV", "VOO", "VTI", "QQQ", 
    "VUG", "VEA", "VTV", "BND", "AGG"
]
start_date = "2019-01-01"
end_date   = "2023-01-01"

# Example constraint parameters
max_etfs   = 8
max_weight = 0.20
benchmark_tickers = ["^GSPC", "^IXIC"]  # S&P 500, NASDAQ Composite
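
The two constraint parameters interact. If the portfolio is assumed to be fully invested and long-only (an assumption; the notebook does not state it), then at most `max_etfs` holdings capped at `max_weight` each can sum to 1 only when `max_etfs * max_weight >= 1`. A minimal sanity check, reusing the values from the setup cell:

```python
import math

max_etfs = 8       # same value as in the setup cell
max_weight = 0.20  # same value as in the setup cell

# Largest total weight reachable if every allowed slot is filled to the cap
max_total_weight = max_etfs * max_weight

# Fewest holdings needed to reach 100% without breaching the cap
# (the small epsilon guards against floating-point noise in 1 / max_weight)
min_holdings = math.ceil(1 / max_weight - 1e-9)

constraints_feasible = max_total_weight >= 1.0
print(f"Feasible: {constraints_feasible}; at least {min_holdings} ETFs are required")
```

With these values the check passes, and any feasible portfolio must hold between 5 and 8 ETFs.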

## 2. Helper Functions
<a id="helper-functions"></a>

Below are utility functions that make our main notebook cleaner. Each function has a single purpose. By isolating logic into these functions, we avoid repeated code and improve maintainability.

In [None]:
def get_stock_data(tickers: list, start_date: str, end_date: str) -> pd.DataFrame:
    """
    Download historical adjusted close prices for a given list of tickers and date range.
    
    Args:
        tickers (list): A list of ticker symbols (strings) to download.
        start_date (str): Start date in 'YYYY-MM-DD' format.
        end_date (str): End date in 'YYYY-MM-DD' format.
    
    Returns:
        pd.DataFrame: DataFrame of adjusted close prices (columns as tickers, rows as dates).
    """
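    # Recent yfinance releases default to auto_adjust=True, which drops the
    # "Adj Close" column; request it explicitly so this lookup keeps working.
    data = yf.download(tickers, start=start_date, end=end_date, auto_adjust=False)["Adj Close"]
    return data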


def calculate_sharpe_ratio(returns_series: pd.Series, risk_free_rate: float = 0.0) -> float:
    """
    Calculate the Sharpe Ratio for a returns series.

    Args:
        returns_series (pd.Series): A time series of periodic returns (e.g., daily returns).
        risk_free_rate (float): The risk-free rate for the same timeframe. Defaults to 0.0.

    Returns:
        float: The per-period (non-annualized) Sharpe Ratio, or NaN if the standard deviation is zero.
    """
    # risk_free_rate must use the same frequency as returns_series: if the returns
    # are daily, convert an annual rate to a daily rate before passing it in.
    mean_return = returns_series.mean()
    return_stdev = returns_series.std()

    # Avoid divide-by-zero errors
    if return_stdev == 0:
        return np.nan

    sharpe_ratio = (mean_return - risk_free_rate) / return_stdev
    return sharpe_ratio


def calculate_periodic_returns(price_data: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate daily (periodic) returns from price data.

    Args:
        price_data (pd.DataFrame): DataFrame of historical prices (one column per asset).

    Returns:
        pd.DataFrame: DataFrame of daily returns.
    """
    returns = price_data.pct_change().dropna()
    return returns
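
`calculate_sharpe_ratio` works at the frequency of its input, so daily returns produce a daily Sharpe Ratio and require a daily risk-free rate. The sketch below shows one common way to convert between frequencies. It assumes 252 trading days per year and a hypothetical 2% annual risk-free rate, and the prices are made up for illustration. Scaling by the square root of time is a convention that treats daily returns as independent.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumption: typical number of trading days per year

# Synthetic prices, for illustration only (not real market data)
prices = pd.Series([100.0, 101.0, 100.5, 102.0, 103.0, 102.5])
daily_returns = prices.pct_change().dropna()

# Hypothetical annual risk-free rate, converted to a daily compounding rate
annual_rf = 0.02
daily_rf = (1 + annual_rf) ** (1 / TRADING_DAYS) - 1

# Same formula as calculate_sharpe_ratio, applied to daily data
daily_sharpe = (daily_returns.mean() - daily_rf) / daily_returns.std()

# Square-root-of-time scaling to an annual figure
annualized_sharpe = daily_sharpe * np.sqrt(TRADING_DAYS)

print(f"Daily risk-free rate: {daily_rf:.6f}")
print(f"Daily Sharpe: {daily_sharpe:.3f}, annualized: {annualized_sharpe:.3f}")
```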

## 3. Data Loading
<a id="data-loading"></a>

In this section, we fetch historical ETF data from Yahoo Finance. We also fetch benchmark data for comparison.

In [None]:
# --- Load ETF data ---
etf_prices = get_stock_data(tickers, start_date, end_date)

# --- Load benchmark data ---
benchmark_data = get_stock_data(benchmark_tickers, start_date, end_date)

# Quick preview
print("ETF Prices (head):")
display(etf_prices.head())

print("Benchmark Prices (head):")
display(benchmark_data.head())

## 4. Data Preprocessing
<a id="data-preprocessing"></a>

This step cleans the raw downloads before analysis. The cell below only drops empty rows and aligns the two date indexes; in practice, you might also:

- Drop tickers with insufficient data.  
- Fill missing values.  
- Align date indexes among multiple datasets.  
- Perform additional transformations (e.g., log prices or monthly aggregation).

In [None]:
# Example: drop rows with all NaNs
etf_prices.dropna(how="all", inplace=True)
benchmark_data.dropna(how="all", inplace=True)

# Ensure aligned date indexes for both DataFrames
etf_prices, benchmark_data = etf_prices.align(benchmark_data, join="inner", axis=0)

print(f"Final ETF data shape: {etf_prices.shape}")
print(f"Final benchmark data shape: {benchmark_data.shape}")
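
The other preprocessing options listed at the top of this section can be sketched as follows. The tickers `AAA`, `BBB`, and `CCC`, the synthetic prices, and the 80% coverage threshold are all hypothetical choices for illustration, not values from the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic business-day prices, for illustration only
dates = pd.date_range("2022-01-03", periods=60, freq="B")
prices = pd.DataFrame(
    {
        "AAA": np.linspace(100, 110, 60),
        "BBB": np.linspace(50, 45, 60),
        "CCC": np.nan,  # mostly missing; filled in for the last 10 days below
    },
    index=dates,
)
prices.loc[dates[5], "AAA"] = np.nan    # a single missing quote
prices.loc[dates[-10:], "CCC"] = 20.0   # CCC has only 10 observations

# 1. Drop tickers with insufficient data (hypothetical 80% coverage threshold)
min_coverage = 0.8
coverage = prices.notna().mean()
kept = prices.loc[:, coverage >= min_coverage]

# 2. Fill remaining gaps by carrying the last known price forward
filled = kept.ffill()

# 3. Aggregate to month-end prices (last available price in each calendar month)
monthly = filled.groupby(filled.index.to_period("M")).last()

print("Kept tickers:", list(kept.columns))
print(monthly)
```

Forward-filling is only one option; dropping incomplete rows is stricter and may suit short gaps better when the gaps are not shared across tickers.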

## 5. Exploratory Data Analysis
<a id="exploratory-data-analysis"></a>

Here's where you'd explore descriptive statistics, visualize price trends, etc. For brevity, let's just plot the historical prices of three selected ETFs (SPY, QQQ, and VTI) and print summary statistics for all price columns:

In [None]:
# Simple price plot
etf_prices[["SPY","QQQ","VTI"]].plot(figsize=(12,6), title="Selected ETF Prices")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.grid(True)
plt.show()

# Quick descriptive stats
stats = etf_prices.describe()
print("Price Data Statistics:")
display(stats)
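
Raw price levels are hard to compare across ETFs, so exploratory analysis often rebases prices to a common starting value and summarizes returns instead. The sketch below does this on synthetic data; the tickers, the random returns, and the 252-day annualization factor are assumptions for illustration.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumption: typical number of trading days per year

# Synthetic price paths built from random daily returns (not real market data)
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-03", periods=250, freq="B")
simulated = pd.DataFrame(
    rng.normal(0.0004, 0.01, size=(250, 3)),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)
prices = 100 * (1 + simulated).cumprod()

# Rebase every series to 100 on the first date so the paths are comparable
rebased = prices / prices.iloc[0] * 100

# Per-ticker annualized return and volatility from daily returns
returns = prices.pct_change().dropna()
summary = pd.DataFrame(
    {
        "ann_return": returns.mean() * TRADING_DAYS,
        "ann_volatility": returns.std() * np.sqrt(TRADING_DAYS),
    }
)

# Pairwise correlation of daily returns, useful for spotting redundant holdings
correlation = returns.corr()

print(summary.round(3))
print(correlation.round(2))
```

Calling `rebased.plot()` in place of the raw price plot would put all series on one scale.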

## 6. Portfolio Optimization
<a id="portfolio-optimization"></a>

Your original notebook had code for maximizing the Sharpe Ratio subject to constraints (e.g., a maximum number of ETFs and a maximum weight per ETF). That logic belongs in this section. The skeleton below shows one way to organize it; it returns equal-weight placeholder values rather than an optimized portfolio.

**Note**: The actual portfolio optimization logic from your original file can be adapted and integrated here.

In [None]:
# Example function skeleton: adapt with your actual logic from the original notebook
def optimize_portfolio(price_data: pd.DataFrame, max_etfs: int, max_weight: float) -> tuple:
    """
    Optimize a portfolio to maximize Sharpe Ratio subject to constraints.

    Placeholder: currently returns equal weights over the first `max_etfs` columns.

    Args:
        price_data (pd.DataFrame): Historical price data (columns = tickers).
        max_etfs (int): Maximum number of ETFs allowed in the portfolio.
        max_weight (float): Maximum weight (fraction) for any single ETF.

    Returns:
        tuple: (optimal_weights, performance_metrics), where optimal_weights is a
            dict {ticker: weight} and performance_metrics is a tuple
            (annualized_return, annualized_volatility, sharpe_ratio).
    """
    # 1. Compute periodic returns (the input for expected returns, or use PyPortfolioOpt)
    returns_data = calculate_periodic_returns(price_data)
    # ... additional logic, constraints, optimization ...

    # Placeholder: equal weights over the selected tickers, so the weights sum to 1.
    # max_weight is not enforced here; equal weights breach it when 1/n > max_weight.
    selected = list(price_data.columns[:max_etfs])
    optimal_weights = {ticker: 1 / len(selected) for ticker in selected}
    performance_metrics = (0.0, 0.0, 0.0)  # placeholder

    return optimal_weights, performance_metrics

# Example usage
example_weights, example_perf = optimize_portfolio(etf_prices, max_etfs, max_weight)
print("Example Portfolio Weights:", example_weights)
print("Example Performance:", example_perf)
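
One simple way to fill in the placeholder is a Monte Carlo random search: draw many random weight vectors that respect both constraints and keep the one with the highest Sharpe Ratio. This is a sketch of a swapped-in technique, not the method from your original notebook. The function name `random_search_max_sharpe`, the Dirichlet sampling, the zero risk-free rate, the 252-day annualization, and the synthetic returns are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumption: typical number of trading days per year


def random_search_max_sharpe(returns, max_etfs, max_weight, n_trials=5000, seed=0):
    """Return (weights, annualized Sharpe) of the best feasible random portfolio.

    Hypothetical helper: long-only, fully invested, risk-free rate assumed 0.
    """
    rng = np.random.default_rng(seed)
    n_assets = returns.shape[1]
    k = min(max_etfs, n_assets)  # number of ETFs held in each trial
    if k * max_weight < 1.0:
        raise ValueError("Infeasible constraints: k * max_weight must be >= 1.")

    mean = returns.mean().to_numpy()
    cov = returns.cov().to_numpy()
    best_sharpe, best_weights = -np.inf, None

    for _ in range(n_trials):
        # Pick k assets at random and give them random weights that sum to 1
        picked = rng.choice(n_assets, size=k, replace=False)
        w = np.zeros(n_assets)
        w[picked] = rng.dirichlet(np.full(k, 5.0))
        if w.max() > max_weight:
            continue  # reject draws that breach the per-ETF cap

        vol = np.sqrt(w @ cov @ w)
        if vol == 0:
            continue
        sharpe = (w @ mean) / vol * np.sqrt(TRADING_DAYS)
        if sharpe > best_sharpe:
            best_sharpe, best_weights = sharpe, w

    if best_weights is None:
        raise RuntimeError("No feasible portfolio found; increase n_trials.")
    return pd.Series(best_weights, index=returns.columns), best_sharpe


# Synthetic daily returns for six hypothetical tickers (not real market data)
demo_rng = np.random.default_rng(42)
demo_returns = pd.DataFrame(
    demo_rng.normal(0.0005, 0.01, size=(500, 6)),
    columns=["A", "B", "C", "D", "E", "F"],
)

weights, sharpe = random_search_max_sharpe(demo_returns, max_etfs=4, max_weight=0.40)
print(weights.round(3))
print(f"Annualized Sharpe (risk-free rate assumed 0): {sharpe:.2f}")
```

Random search is easy to read and handles the cardinality limit directly, but it only approximates the optimum. A solver-based approach such as PyPortfolioOpt's `EfficientFrontier` (mentioned in the imports) is likely to be more precise for the weight cap, although a hard limit on the number of holdings needs extra handling there.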

## 7. Conclusion
<a id="conclusion"></a>

In this notebook, we demonstrated:

1. How to structure a Jupyter Notebook with clear sections and headings.  
2. How to create helper functions with docstrings to improve maintainability.  
3. Basic data loading, cleaning, and descriptive analysis.  
4. A placeholder for the portfolio optimization logic, which can be swapped with your actual optimization routines.  