In [None]:
# import necessary libs (available in jupyter/scipy-notebook docker image)
import os
import pandas as pd 
import numpy as np
import math
import matplotlib.pyplot as plt 
from matplotlib.ticker import FormatStrFormatter

# define watermark 
def add_watermark(ax, x, y):
    ax.text(ax.get_xlim()[0]+ x,
            ax.get_ylim()[0]+ y,
            "ladydragoncapital",
            alpha=0.3, fontsize=16)


# globals
HOME_DIR = '/home/jovyan/_jupyter'
DATA_DIR = os.path.join(HOME_DIR, 'data')

# read in csv data
index = pd.read_csv(os.path.join(DATA_DIR, 'Historical_Returns_on_Stocks_Bonds_and_Bills_1928_2024.csv'))
cd = pd.read_csv(os.path.join(DATA_DIR, 'CD_data_06182025.csv'))
nsdq = pd.read_csv(os.path.join(DATA_DIR, 'NASDAQ100.csv'))
                 

## Introduction ##

This article explores a structured investment methodology: the Time-Segmentation Strategy (TSS). We will define a framework for dividing capital based on future financial obligations and demonstrate how to construct optimal, goal-oriented portfolios for each segment by developing tailored utility functions.


## Time-Segmentation Strategy ##

A Time-Segmentation Strategy (also known as a "bucket" or "horizon-based" strategy) is an asset allocation approach that matches investments to specific future financial needs. Instead of managing a single portfolio for all goals, an investor divides their capital into separate segments, each corresponding to a distinct time horizon (e.g., short-term, medium-term, long-term). This allows each segment to be invested in a portfolio that optimally balances risk and return for its specific purpose, enhancing the overall plan's robustness.

## A Formal Framework for Segmentation ##

Consider an investor with a known annual cost of living $C$ and a total life expense $W$, funded by current wealth and future income. The strategy partitions $W$ into $n$ segments. A segment $S_i$ is the capital covering one or more consecutive years of expenses, each of size $C$, grouped by a shared investment horizon and risk tolerance. (Note: this assumes flat expenses; the model can be extended to include inflation.)

$$
W = \sum_{i=1}^n S_i 
$$

where, for example:

+ $S_1$ (Short-Term): covers Year 1-3

+ $S_2$ (Medium-Term): covers Year 4-10

+ $S_3$ (Long-Term): covers the years beyond Year 10


## Utility Functions and Optimal Portfolios per Segment ##

Each segment $S_i$ has a unique investment objective, which is achieved by selecting an optimal portfolio from a curated set of assets. The process to build these portfolios has three steps: defining permissible investment options, creating a utility function if needed, and optimizing the portfolio to maximize utility.

**Step 1, define the permissible investments for each segment.**

Each segment $S_i$ has a filtered set of permissible investment options $O_i$, drawn from a global universe $\mathcal{O}$.

$$
O_i \subseteq \mathcal{O} 
$$

The filtering is governed by a rule $G_i$ based on the segment's goals:

$$
O_i = \{ A \in \mathcal{O} \mid G_i(A) = \text{True} \}
$$

Where:

+ $\mathcal{O}$: The global investment options space
  
+ $G_i$: The requirement for filtering the options for segment $S_i$
  
+ $O_i$: The set of investment options associated with segment $S_i$
  
+ $A$: The individual investment option
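
As a minimal sketch of Step 1, the rule $G_i$ can be treated as a predicate applied to each option in the universe. The asset names and figures below are made-up placeholders, not the data from the appendices, and `filter_options` and `g_mean_exceeds_std` are hypothetical helpers introduced only for illustration.

```python
# Hypothetical universe: option name -> (mean return %, standard deviation %).
# These numbers are illustrative placeholders, not data from this article.
universe = {
    "Asset A": (2.0, 0.5),
    "Asset B": (8.0, 15.0),
    "Asset C": (5.0, 4.0),
}

def filter_options(universe, rule):
    """Return O_i: the options in the universe for which the rule G_i is True."""
    return {name for name, stats in universe.items() if rule(*stats)}

def g_mean_exceeds_std(mean, std):
    """Example rule: keep an option only if mean minus standard deviation is positive."""
    return mean - std > 0

permissible = filter_options(universe, g_mean_exceeds_std)
print(sorted(permissible))
```

Keeping the rule as a separate function means each segment can supply its own $G_i$ without changing the filtering code.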

**Step 2, define a segment-specific utility function, if needed**

For each segment, if necessary, we define a utility function that quantifies the trade-off between expected return and risk. A common form is a utility function $U_i$ of the portfolio's mean return $R$ and standard deviation $\sigma$:

$$
U_i(R,\sigma): S_i \rightarrow P_i
$$

**Step 3, optimize the portfolio to maximize utility**

The optimal investment portfolio $P_i^*$ (i.e., the asset weights) for the capital in segment $S_i$ is the one that maximizes its utility function $U_i$ over its investment options $O_i$.


$$
P_i^* = \arg\max_{P} U_i[R(P), \sigma(P)]
$$
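
One simple way to approximate this arg max is a grid search over all weight vectors that sum to 1. The sketch below is an illustration under assumptions: `simplex_grid` and `argmax_utility` are hypothetical helpers, and `toy_utility` is a made-up function, not one of the segment utilities. Building the grid from integer steps avoids floating-point drift when checking the sum-to-one constraint.

```python
from itertools import product

def simplex_grid(n_assets, steps):
    """Yield weight tuples k/steps whose integer numerators sum to `steps`."""
    for combo in product(range(steps + 1), repeat=n_assets - 1):
        remainder = steps - sum(combo)
        if remainder >= 0:
            yield tuple(k / steps for k in combo + (remainder,))

def argmax_utility(utility, n_assets, steps=20):
    """Return the grid point that maximizes utility(weights)."""
    return max(simplex_grid(n_assets, steps), key=utility)

def toy_utility(w):
    """Toy example only: reward the first asset, penalize concentration."""
    return w[0] - sum(x * x for x in w)

grid = list(simplex_grid(3, 20))   # 5% steps for three assets
best = argmax_utility(toy_utility, 3)
print(len(grid), best)
```

A finer grid (larger `steps`) gives a closer approximation at the cost of more evaluations.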

## Example ##

Assume an investor has an annual cost of living $C = 8,000$. We define a three-segment strategy for the total life expense $W$:

+ Segment 1 ($S_1$): Short-Term (Years 1-3). Capital: $S_1 = 3C$

+ Segment 2 ($S_2$): Medium-Term (Years 4-10). Capital: $S_2 = 7C$ 

+ Segment 3 ($S_3$): Long-Term (Year 11+). Capital: $S_3 = nC$, where $n$ here denotes the number of years covered beyond Year 10
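
The segment sizes follow directly from $C$. The short sketch below works through the arithmetic; the 30 years assigned to the long-term segment is an assumed placeholder, since the article leaves that number open.

```python
C = 8_000  # annual cost of living from the example

# Years of expenses covered by each segment. The 30 years for S3 is an
# assumption made for illustration; the article does not fix this value.
years_per_segment = {"S1": 3, "S2": 7, "S3": 30}

segments = {name: years * C for name, years in years_per_segment.items()}
W = sum(segments.values())  # total life expense under these assumptions

print(segments, W)
```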

**Investment options**

Based on the writer's current knowledge and experience, $\mathcal{O}$ consists of the following options:

+ certificates of deposit (CDs) (China)
  
+ S&P 500 index fund ETF (USA)
  
+ NASDAQ 100 index fund ETF (USA)
  
+ Hushen 300 index fund ETF (China)
  
+ CSI dividend low-volatility index ETF (China)
  
+ Hong Kong dividend low-volatility index fund ETF (China)
  
+ DAX 40 index fund ETF (Germany)

+ Cash
  
**Step 1, build Segment 1's portfolio**

Goal: Capital preservation and high liquidity (funds needed in 1–3 years; no tolerance for loss).

**Step 1.1, define permissible investments**

Filter ($G_1$): 

$$
G_1: M_{1y} - D_{1y} > 0
$$

Where: 

+ $M_{1y}$: average annual growth rate of the investment option, including interest, dividend, and/or appreciation.
  
+ $D_{1y}$: annual standard deviation.

Applying this filter, the following options are selected (see Appendix I & II for data on means and standard deviations of all the investment options):

$$
O_1 = \{\text{3-month CD, 6-month CD, 1-year CD, 2-year CD, Cash}\}
$$

**Step 1.2, define utility function**

For $S_1$, utility depends almost entirely on liquidity and capital preservation (return is secondary). Thus, $U_1$ simplifies to: "Maximize liquidity (match CD maturity to expense timing) while ensuring no loss of principal."

**Step 1.3, optimize to find $P_1^*$**

Since $O_1$ consists of CDs (fixed maturity, no principal loss) and cash (instant liquidity), $P_1^*$ is designed to match CD maturities to annual expenses:

+ Cash: $0.25C$, to cover Q1 expenses.
  
+ 3-month CD: $0.25C$, to cover Q2 expenses.
  
+ 6-month CD: $0.5C$, to cover Q3 and Q4 expenses.
  
+ 1-year CD: $C$, to cover Year 2 expenses.
  
+ 2-year CD: $C$, to cover Year 3 expenses.
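
With $C = 8,000$, the ladder above translates into the following amounts. This is only the arithmetic of the allocation, written out as a quick check that the pieces add up to $S_1 = 3C$.

```python
C = 8_000  # annual cost of living from the example

# Amount placed in each instrument, as fractions of C listed above.
ladder = {
    "Cash": 0.25 * C,        # Q1 expenses
    "3-month CD": 0.25 * C,  # Q2 expenses
    "6-month CD": 0.5 * C,   # Q3 and Q4 expenses
    "1-year CD": 1.0 * C,    # Year 2 expenses
    "2-year CD": 1.0 * C,    # Year 3 expenses
}

total = sum(ladder.values())
print(ladder, total)
```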


**Step 2, build Segment 2's portfolio**


Goal: Moderate growth with controlled risk. This segment has a longer time horizon than Segment 1 (funds needed in 4–10 years), allowing it to accept more market volatility in pursuit of higher returns.

**Step 2.1, define permissible investments**

Filter ($G_2$): To meet the above objective, assets must have demonstrated positive risk-adjusted returns over a relevant medium-term period. The filter rule $G_2$ requires that an asset's average 3-year return ($M_{3y}$) has exceeded its 3-year standard deviation ($D_{3y}$):

$$
G_2: M_{3y} - D_{3y} > 0
$$

Using the above filter and asset data in Appendix I, the following options are selected:

$$
O_2 = \{ \text{3-year CD}, \text{CSI Dividend Low-Volatility Index}, \text{SP 500 index} \}
$$

**Step 2.2, define utility function**

Based on the historical average returns and standard deviations of the assets in $O_2$ (Appendix II), we use a linear mean-standard deviation utility function; that is, the investor accepts more risk only if compensated by a proportionally higher return:

$$
U_2 = \gamma R - \lambda \sigma
$$

$$
U_2 = \gamma \sum_{i=1}^N w_i r_i -  \lambda \sqrt{
    \sum_{i=1}^N w_i^2 \sigma_i^2 +
    \sum_{i=1}^N \sum_{j \neq i} w_i w_j \sigma_{ij}
}
$$

Where:

+ $\gamma$: Coefficient for return rate. 

+ $w_i$: Weight of asset i in the portfolio

+ $r_i$: Expected appreciation (growth rate and dividend rate) of asset i

+ $\lambda$: Risk aversion coefficient (higher $\lambda$ means more penalty for risk). 

+ $\sigma_i^2$: variance of asset i (individual risk). $\sigma_i$ is the standard deviation of asset i.

+ $\sigma_{ij}$: covariance between asset i and j (diversification effect)
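
The square-root term is the portfolio standard deviation, which can equivalently be written in matrix form as $\sqrt{w^\top \Sigma w}$, where $\Sigma$ is the covariance matrix. The check below uses a made-up covariance matrix and weights (assumptions for illustration only) to confirm that the double-sum form and the matrix form agree.

```python
import numpy as np

# Made-up weights and covariance matrix, for illustration only.
w = np.array([0.5, 0.3, 0.2])
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.010, 0.001],
    [0.000, 0.001, 0.0004],
])

n = len(w)
# Double-sum form: own-variance terms plus every i != j covariance term.
var_sum = sum(w[i] ** 2 * cov[i, i] for i in range(n))
var_sum += sum(w[i] * w[j] * cov[i, j]
               for i in range(n) for j in range(n) if j != i)

# Matrix form: w^T Sigma w.
var_mat = w @ cov @ w

sigma_sum, sigma_mat = np.sqrt(var_sum), np.sqrt(var_mat)
print(sigma_sum, sigma_mat)
```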

The coefficients $\gamma = 1$ and $\lambda = 1$ explicitly encode the strategy's preference: return is as important as risk reduction in the utility calculation.

**Step 2.3, optimal portfolio**

Using the asset data in Appendix II, we solve for $w_i$ (weights) that maximize $U_2$:

In [None]:

# Annual return rates of the three Segment 2 assets
s2 = index[['S&P 500 (includes dividends) %', 'Chinese Dividend Low-Volatility Index Plus Dividend',
            '3-year CD']]
df_s2 = s2[86:]
return_rates = df_s2.mean()
variances = df_s2.var()
cov_matrix = df_s2.cov()  # Full covariance matrix

# Create DataFrame with Asset names as a column
statistics_df = pd.DataFrame({
    'Asset': return_rates.index,  # Add Asset names as a column
    'Expected Return': return_rates.values,
    'Variance': variances.values
}).round(2)

def portfolio_utility(weights, return_rates, variances, cov_matrix, gamma, lambda_):
    """Return (expected return, standard deviation, utility) for the given weights."""
    port_return = np.dot(weights, return_rates)
    port_variance = 0
    n_assets = len(weights)

    # Own-variance terms: w_i^2 * sigma_i^2
    for i in range(n_assets):
        port_variance += (weights[i] ** 2) * variances.iloc[i]

    # Covariance terms: each pair (i, j) with i < j is counted once, times 2
    for i in range(n_assets):
        for j in range(i + 1, n_assets):
            port_variance += 2 * weights[i] * weights[j] * cov_matrix.iloc[i, j]

    # Risk is measured by the standard deviation (square root of the variance)
    port_std_dev = port_variance ** 0.5
    # Linear mean-standard deviation utility: U = gamma * R - lambda * sigma
    utility = (gamma * port_return) - (lambda_ * port_std_dev)

    return port_return, port_std_dev, utility


# Generate candidate weights on a grid (5% intervals)
allocations = np.arange(0, 1.01, 0.05)
results = []

# Return coefficient (gamma) and risk-aversion coefficient (lambda_)
gamma = 1
lambda_ = 1

for w_sp500 in allocations:
    for w_csi in allocations:
        w_3year_cd = 1 - w_sp500 - w_csi
        # Keep only valid combinations; the small tolerance stops floating-point
        # error from dropping mixes where the CD weight should be exactly 0
        if w_3year_cd >= -1e-9:
            w_3year_cd = max(w_3year_cd, 0.0)
            weights = np.array([w_sp500, w_csi, w_3year_cd])
            ret, std_dev, util = portfolio_utility(weights, return_rates, variances, cov_matrix, gamma, lambda_)

            results.append({
                'S&P 500 weight': w_sp500,
                'CSI Dividend Low-Volatility Index weight': w_csi,
                '3-year CD weight': w_3year_cd,
                'Expected Return': round(ret, 6),
                'Portfolio Std Dev': round(std_dev, 6),  # Standard deviation (risk)
                'Utility': round(util, 6)
            })

# Convert to DataFrame
results_df = pd.DataFrame(results)
results_df_sorted = results_df.sort_values(by='Utility', ascending=False)

# Find maximum utility point
max_idx = results_df['Utility'].idxmax()
max_point = results_df.loc[max_idx]

print("Optimal Allocation Details for Segment 2:")
print(f"S&P 500 weight                               {max_point['S&P 500 weight']:.2f}")
print(f"CSI Dividend Low-Volatility Index weight     {max_point['CSI Dividend Low-Volatility Index weight']:.2f}")
print(f"3-year CD weight                             {max_point['3-year CD weight']:.2f}")


**Step 3, build Segment 3's portfolio**

Goal: Maximize long-term growth. This segment has the longest investment horizon (funds needed in Year 11 and beyond), allowing it to fully tolerate short-term market volatility in exchange for the highest expected returns. 

**Step 3.1, define permissible investments**

Filter ($G_3$): To select assets suited for aggressive long-term growth, the filter $G_3$ requires a strong historical track record. An asset is included only if its average 10-year return ($M_{10y}$) has exceeded its 10-year standard deviation ($D_{10y}$):

$$
G_3: M_{10y} - D_{10y} > 0
$$

Using the above filter and the asset data in Appendix I, the following options are selected:

$$
O_3 = \{ \text{SP 500 index}, \text{NASDAQ 100 index}, \text{DAX 40} \}
$$

**Step 3.2, define utility function**

Based on the historical average returns and standard deviations of the assets in $O_3$ (Appendix II), we again use the mean-standard deviation utility function:

$$
U_3 = \gamma R - \lambda \sigma
$$

$$
U_3 = \gamma \sum_{i=1}^N w_i r_i - \lambda \sqrt{
    \sum_{i=1}^N w_i^2 \sigma_i^2 +
    \sum_{i=1}^N \sum_{j \neq i} w_i w_j \sigma_{ij}
}
$$

The coefficients $\gamma = 2$ and $\lambda = 1$ explicitly encode the strategy's preference: return is twice as important as risk reduction in the utility calculation.
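
To see how the return coefficient changes the outcome, the sketch below compares $\gamma = 1$ and $\gamma = 2$ on two made-up, uncorrelated assets. The numbers and the helper `best_weight` are assumptions for illustration and are unrelated to the data in the appendices; the point is only that a larger $\gamma$ moves the optimum toward the higher-return, higher-risk asset.

```python
import numpy as np

# Two made-up, uncorrelated assets (illustrative numbers only).
returns = np.array([10.0, 4.0])  # expected returns in %
stds = np.array([15.0, 1.0])     # standard deviations in %

def best_weight(gamma, lambda_=1.0, steps=20):
    """Grid-search the weight on the first asset that maximizes U."""
    best_w, best_u = None, -np.inf
    for w in np.linspace(0.0, 1.0, steps + 1):
        weights = np.array([w, 1.0 - w])
        port_return = weights @ returns
        # With zero covariance, variance is the sum of squared weighted stds
        port_std = np.sqrt(np.sum((weights * stds) ** 2))
        utility = gamma * port_return - lambda_ * port_std
        if utility > best_u:
            best_w, best_u = w, utility
    return best_w

w_gamma1 = best_weight(gamma=1)
w_gamma2 = best_weight(gamma=2)
print(w_gamma1, w_gamma2)
```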

**Step 3.3, optimize utility**



In [None]:
# Return rates of the S3 options
s3 = index[['S&P 500 (includes dividends) %', 'NASDAQ100', 
            'DAX 40 Index']]
df_s3 = s3[62:]
return_rates = df_s3.mean()
variances = df_s3.var()
cov_matrix = df_s3.cov()  # Full covariance matrix

def portfolio_utility(weights, return_rates, variances, cov_matrix, gamma, lambda_):
    # Calculate portfolio return
    port_return = np.dot(weights, return_rates)
    
    # Calculate portfolio variance (standard formula)
    port_variance = 0
    n_assets = len(weights)
    
    # Individual variance terms
    for i in range(n_assets):
        port_variance += (weights[i] ** 2) * variances.iloc[i]
    
    # Covariance terms (avoid double-counting)
    for i in range(n_assets):
        for j in range(i + 1, n_assets):
            port_variance += 2 * weights[i] * weights[j] * cov_matrix.iloc[i, j]
    
    # Use standard deviation (square root of variance) for risk measurement
    port_std_dev = port_variance**0.5
    
    # Utility function: return minus risk penalty
    utility = (gamma * port_return) - (lambda_ * port_std_dev)
    
    return port_return, port_std_dev, utility


# Generate all valid allocations (5% intervals to reduce computation time)
allocations = np.arange(0, 1.01, 0.05)
results = []

# Risk-aversion and return preference parameters (adjust based on your tolerance)
gamma = 2 # Weight on return (higher = more focus on returns)
lambda_ = 1  # Weight on risk (higher = more risk-averse)

# Loop through all weight combinations for the 3 assets
for w_sp500 in allocations:
    for w_nasdaq in allocations:
        # The DAX 40 weight is whatever remains after the other two
        w_dax = 1 - w_sp500 - w_nasdaq

        # Keep only non-negative weights, tolerating floating-point error
        if w_dax >= -1e-9:
            w_dax = max(w_dax, 0.0)
            weights = np.array([w_sp500, w_nasdaq, w_dax])

            # Calculate portfolio metrics
            ret, std_dev, util = portfolio_utility(weights, return_rates, variances, cov_matrix, gamma, lambda_)

            # Store results
            results.append({
                'S&P 500 weight': w_sp500,
                'NASDAQ100 weight': w_nasdaq,
                'DAX 40 Index weight': w_dax,
                'Expected Return': round(ret, 6),
                'Portfolio Std Dev': round(std_dev, 6),  # Standard deviation (risk)
                'Utility': round(util, 6)
            })

# Convert results to DataFrame and analyze
results_df = pd.DataFrame(results)
results_df_sorted = results_df.sort_values(by='Portfolio Std Dev', ascending=False)

# Find the optimal portfolio
max_idx = results_df['Utility'].idxmax()
max_point = results_df.loc[max_idx]


print("Optimal Allocation Details for Segment 3:")
print(f"S&P 500 weight                               {max_point['S&P 500 weight']:.2f}")

print(f"NASDAQ100 weight                             {max_point['NASDAQ100 weight']:.2f}")
print(f"DAX 40 Index weight                          {max_point['DAX 40 Index weight']:.2f}")


## Conclusion ##

By grounding decisions in time horizons, risk tolerance, and measurable utility, TSS lets investors pursue growth without sacrificing security, building a more predictable path toward financial stability. The practical example further illustrates TSS’s flexibility: even with modest initial wealth, the strategy scales to cover decades of expenses by combining current capital with future income, so that each phase of an investor’s financial journey is supported. For long-term segments (Years 11+), TSS also leverages the power of compounding by permitting higher allocations to growth assets.

Moving forward, as we gain knowledge on more investment options, we will continue to expand the investment universe and improve this strategy.


## Appendix I ##

The following figures show the average return rates and standard deviations over various holding periods for all the investment options considered in this article. The longest horizon shown for each option is limited by how long the investment has existed.
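
The multi-year figures in this appendix compound consecutive annual returns over a rolling window. As a small worked sketch of that calculation, the hypothetical helper `rolling_cumulative_return` below applies it to three made-up annual returns; the numbers are for illustration only.

```python
import pandas as pd

def rolling_cumulative_return(annual_pct, years):
    """Compound `years` consecutive annual returns (in %) into one return (in %)."""
    growth = 1 + annual_pct / 100
    return growth.rolling(years).apply(lambda x: x.prod() - 1, raw=True) * 100

# Made-up annual returns for illustration: +10%, +20%, -5%.
annual = pd.Series([10.0, 20.0, -5.0])
two_year = rolling_cumulative_return(annual, 2)

# The first entry is NaN because a 2-year window needs two observations.
print(two_year.round(2).tolist())
```

The mean and standard deviation of such a rolling series are the quantities plotted in the figures; note that overlapping windows share years, so consecutive values are not independent.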

In [None]:
# Mean and standard deviation of rolling n-year cumulative returns of the S&P 500
sp500 = index['S&P 500 (includes dividends) %']
horizons = [1, 2, 3, 4, 5, 10, 15, 20]

rows = []
for years in horizons:
    if years == 1:
        # 1-year figures use the annual returns directly
        period_return = sp500
    else:
        # Compound `years` consecutive annual returns (in %) into one return (in %)
        period_return = (1 + sp500 / 100).rolling(years).apply(lambda x: x.prod() - 1) * 100
    rows.append({
        'Time Horizon': f'{years}-year',
        'Average Return Rate': period_return.mean(),
        'Standard Deviation': period_return.std(),
    })

# Summary table used for the S&P 500 figure (values are not rounded)
df_sp500 = pd.DataFrame(rows)

In [None]:
# plot the means and stdev
fig_count = 1
# set the style to a dark theme
plt.style.use("dark_background")

# match website background
plt.rcParams["figure.facecolor"] = "#181818"
plt.rcParams["axes.facecolor"] = "#181818"
plt.rcParams["axes.edgecolor"] = "#181818"

fig, ax = plt.subplots(figsize=(8, 6))

width = 0.4
ind = np.arange(len(df_sp500['Time Horizon']))  # x-axis positions


# Plot the bars
bars = ax.bar(ind, df_sp500['Average Return Rate'], width, color='lightblue')

# Add error bars (standard deviation)
ax.errorbar(ind, df_sp500['Average Return Rate'], yerr=df_sp500['Standard Deviation'], capsize=5, fmt='none', color='yellow')

# Add labels, title, and legend
ax.set_xticks(ind)
ax.set_xticklabels(df_sp500['Time Horizon'])

# Customize plot
plt.xlabel('Time Horizon', fontsize=12)
plt.ylabel('Average Return Rate (%)', fontsize=12)
#plt.xlim([-1, 3])
# set title
plt.suptitle(
    f"Figure {fig_count}. American Index Fund S&P 500: Average Return Rate with Standard Deviation", y=0.0001, fontsize=10
)
fig_count += 1
plt.show()


In [None]:
# Convert the 'observation_date' column to datetime format
nsdq['observation_date'] = pd.to_datetime(nsdq['observation_date'])

# Extract the year from the 'observation_date' column
nsdq['Year'] = nsdq['observation_date'].dt.year

# Interpolate the missing values in the 'NASDAQ100' column
nsdq['NASDAQ100'] = nsdq['NASDAQ100'].interpolate()

# Group by year and get the earliest and latest NASDAQ 100 values each year
grouped = nsdq.groupby('Year')['NASDAQ100'].agg(['first', 'last'])

# Calculate the annual growth rate
grouped['Annual Growth Rate (%)'] = ((grouped['last'] - grouped['first']) / grouped['first'] * 100).round(2)
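
The growth rate above is measured within each calendar year, from the first to the last observation of that year. The sketch below uses made-up index levels to show how that differs from a year-end to year-end measure; the alternative is shown only for comparison and is not the method used in this article.

```python
import pandas as pd

# Made-up index levels for illustration only.
toy = pd.DataFrame({
    "observation_date": pd.to_datetime(
        ["2020-01-02", "2020-12-31", "2021-01-04", "2021-12-31"]),
    "NASDAQ100": [100.0, 110.0, 121.0, 132.0],
})
toy["Year"] = toy["observation_date"].dt.year
by_year = toy.groupby("Year")["NASDAQ100"].agg(["first", "last"])

# Within-year growth: first to last observation of the same year.
within_year = (by_year["last"] - by_year["first"]) / by_year["first"] * 100

# Alternative for comparison: last observation of one year to the next.
year_end = by_year["last"].pct_change() * 100

print(within_year.round(2).to_dict())
print(year_end.round(2).to_dict())
```

The within-year measure ignores any move between the last observation of one year and the first of the next, which is why the two results can differ.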


In [None]:
# Mean and standard deviation of rolling n-year cumulative returns of the NASDAQ 100
nasdaq_annual = grouped['Annual Growth Rate (%)']
horizons = [1, 2, 3, 4, 5, 10]

rows = []
for years in horizons:
    if years == 1:
        # 1-year figures use the annual growth rates directly
        period_return = nasdaq_annual
    else:
        # Compound `years` consecutive annual returns (in %) into one return (in %)
        period_return = (1 + nasdaq_annual / 100).rolling(years).apply(lambda x: x.prod() - 1) * 100
    rows.append({
        'Time Horizon': f'{years}-year',
        'Average Return Rate': period_return.mean(),
        'Standard Deviation': period_return.std(),
    })

# Summary table used for the NASDAQ 100 figure (values are not rounded)
df_nasdaq = pd.DataFrame(rows)


In [None]:
# plot the means and stdev of nasdaq
fig, ax = plt.subplots(figsize=(8, 6))

# Create the bar plot with error bars
bars = plt.bar(df_nasdaq['Time Horizon'], df_nasdaq['Average Return Rate'], yerr=df_nasdaq['Standard Deviation'], capsize=5, color='skyblue', ecolor='yellow')

# Customize plot
plt.xlabel('Time Horizon', fontsize=12)
plt.ylabel('Average Return Rate (%)', fontsize=12)
#plt.xlim([-1, 3])
# set title
plt.suptitle(
    f"Figure {fig_count}. American Index Fund NASDAQ 100: Average Return Rate with Standard Deviation", y=0.0001, fontsize=10
)
fig_count += 1
plt.show()


In [None]:
# Mean and standard deviation of rolling n-year cumulative returns of the DAX 40
dax = index['DAX 40 Index']
horizons = [1, 2, 3, 4, 5, 10]

rows = []
for years in horizons:
    if years == 1:
        # 1-year figures use the annual returns directly
        period_return = dax
    else:
        # Compound `years` consecutive annual returns (in %) into one return (in %)
        period_return = (1 + dax / 100).rolling(years).apply(lambda x: x.prod() - 1) * 100
    rows.append({
        'Time Horizon': f'{years}-year',
        'Average Return Rate': period_return.mean(),
        'Standard Deviation': period_return.std(),
    })

# Summary table used for the DAX 40 figure (values are not rounded)
df_dax = pd.DataFrame(rows)


In [None]:
# plot the means and stdev of DAX
fig, ax = plt.subplots(figsize=(8, 6))

# Create the bar plot with error bars
bars = plt.bar(df_dax['Time Horizon'], df_dax['Average Return Rate'], yerr=df_dax['Standard Deviation'], capsize=5, color='skyblue', ecolor='yellow')

# Customize plot
plt.xlabel('Time Horizon', fontsize=12)
plt.ylabel('Average Return Rate (%)', fontsize=12)
#plt.xlim([-1, 3])
# set title
plt.suptitle(
    f"Figure {fig_count}. German Index Fund Dax 40: Average Return Rate with Standard Deviation", y=0.0001, fontsize=10
)
fig_count += 1
plt.show()


In [None]:
# Exclude the 'Year' column as we only want to analyze CD return rate columns
cd_columns = cd.columns[1:]

# Calculate the mean and standard deviation for each CD type
mean_values = cd[cd_columns].mean()
std_values = cd[cd_columns].std()

statistics_cd = pd.DataFrame({
    'Asset': mean_values.index,  # Add Asset names as a column
    'Expected Annual Return (%)': mean_values.values,
    'Standard Deviation': std_values.values
}).round(2)

# set the style to a dark theme
plt.style.use("dark_background")

# match website background
plt.rcParams["figure.facecolor"] = "#181818"
plt.rcParams["axes.facecolor"] = "#181818"
plt.rcParams["axes.edgecolor"] = "#181818"

fig, ax = plt.subplots(figsize=(8, 6))


# Create the bar plot with error bars
bars = plt.bar(mean_values.index, mean_values, yerr=std_values, capsize=5, color='skyblue', ecolor='yellow')

# Add labels and title
plt.xlabel('CD Type')
plt.ylabel('Return Rate (%)')
# set title
plt.suptitle(
    f"Figure {fig_count}. Mean and Standard Deviation of Different CDs", y=0.0001, fontsize=10
)
fig_count += 1
# Add annotations on top of the bars
for bar in bars:
    height = bar.get_height()
    plt.annotate(f'{height:.2f}', 
                 xy=(bar.get_x() + bar.get_width() / 2, height),
                 xytext=(0, 3),  # 3 points vertical offset
                 textcoords='offset points',
                 ha='center', va='bottom')

plt.show()


In [None]:
# Mean and standard deviation of rolling n-year cumulative returns of the Hushen 300
hushen = index[77:]
hushen300 = hushen['Hushen300']
horizons = [1, 2, 3, 4, 5]

rows = []
for years in horizons:
    if years == 1:
        # 1-year figures use the annual returns directly
        period_return = hushen300
    else:
        # Compound `years` consecutive annual returns (in %) into one return (in %)
        period_return = (1 + hushen300 / 100).rolling(years).apply(lambda x: x.prod() - 1) * 100
    rows.append({
        'Time Horizon': f'{years}-year',
        'Average Return Rate': period_return.mean(),
        'Standard Deviation': period_return.std(),
    })

# Summary table used for the Hushen 300 figure (values are not rounded)
df_hushen = pd.DataFrame(rows)


In [None]:
# plot hushen300 average and stdev
fig, ax = plt.subplots(figsize=(8, 6))


bars = plt.bar(df_hushen['Time Horizon'], df_hushen['Average Return Rate'], yerr=df_hushen['Standard Deviation'], capsize=5, color='skyblue', ecolor='yellow')

# Customize plot
plt.xlabel('Time Horizon', fontsize=12)
plt.ylabel('Average Return Rate (%)', fontsize=12)
#plt.xlim([-1, 3])
# set title
plt.suptitle(
    f"Figure {fig_count}. Chinese Index Fund Hushen 300: Average Return Rate with Standard Deviation", y=0.0001, fontsize=10
)
fig_count += 1
plt.show()


In [None]:
# Calculate for CSI Dividend Low-vol index with dividend
# calculate the average and stdev of 1-year return rate 
dividend = index[86:]
one_average_return = dividend['Chinese Dividend Low-Volatility Index Plus Dividend'].mean()
one_std_return = dividend['Chinese Dividend Low-Volatility Index Plus Dividend'].std()


# Calculate the 2-year return rate 
two_year_return = (1 + dividend['Chinese Dividend Low-Volatility Index Plus Dividend'] / 100).rolling(2).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 3-year return rate
two_average_return = two_year_return.mean()
two_std_return = two_year_return.std()


# Calculate the 3-year return rate 
three_year_return = (1 + dividend['Chinese Dividend Low-Volatility Index Plus Dividend'] / 100).rolling(3).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 3-year return rate
three_average_return = three_year_return.mean()
three_std_return = three_year_return.std()
three_divi_return = 4.5 * 3

# Calculate the 4-year return rate 
four_year_return = (1 + dividend['Chinese Dividend Low-Volatility Index Plus Dividend'] / 100).rolling(4).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 4-year return rate
four_average_return = four_year_return.mean()
four_std_return = four_year_return.std()


# Calculate the 5-year return rate 
five_year_return = (1 + dividend['Chinese Dividend Low-Volatility Index Plus Dividend'] / 100).rolling(5).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 5-year return rate
five_average_return = five_year_return.mean()
five_std_return = five_year_return.std()


# Summary for the CSI Dividend Low-Volatility Index with dividends (mean and standard deviation, in %)
data = {
    'Time Horizon': ['1-year', '2-year', '3-year', '4-year', '5-year'],
    'Average Return Rate': [one_average_return, two_average_return, three_average_return, four_average_return, five_average_return],
    'Standard Deviation': [one_std_return, two_std_return, three_std_return, four_std_return, five_std_return]
}
df_withdividend = pd.DataFrame(data)


In [None]:
# Calculate for CSI Dividend Low-vol index without dividend
# calculate the average and stdev of 1-year return rate 
dividend = index[86:]
one_average_return = dividend['CSI Dividend Low-Volatility Index'].mean()
one_std_return = dividend['CSI Dividend Low-Volatility Index'].std()


# Calculate the 2-year return rate 
two_year_return = (1 + dividend['CSI Dividend Low-Volatility Index'] / 100).rolling(2).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 2-year return rate
two_average_return = two_year_return.mean()
two_std_return = two_year_return.std()


# Calculate the 3-year return rate 
three_year_return = (1 + dividend['CSI Dividend Low-Volatility Index'] / 100).rolling(3).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 3-year return rate
three_average_return = three_year_return.mean()
three_std_return = three_year_return.std()
three_divi_return = 4.5 * 3

# Calculate the 4-year return rate 
four_year_return = (1 + dividend['CSI Dividend Low-Volatility Index'] / 100).rolling(4).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 4-year return rate
four_average_return = four_year_return.mean()
four_std_return = four_year_return.std()


# Calculate the 5-year return rate 
five_year_return = (1 + dividend['CSI Dividend Low-Volatility Index'] / 100).rolling(5).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 5-year return rate
five_average_return = five_year_return.mean()
five_std_return = five_year_return.std()



# Summary for the CSI Dividend Low-Volatility Index without dividends (mean and standard deviation, in %)
data = {
    'Time Horizon': ['1-year', '2-year', '3-year', '4-year', '5-year'],
    'Average Return Rate': [one_average_return, two_average_return, three_average_return, four_average_return, five_average_return],
    'Standard Deviation': [one_std_return, two_std_return, three_std_return, four_std_return, five_std_return]
}
df_csi = pd.DataFrame(data)
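
The horizon calculations above repeat one pattern for each index: convert annual percentage returns to growth factors, compound them over a rolling window of n years, and take the mean and sample standard deviation of the result. A minimal sketch of that pattern as a reusable helper follows. The name `horizon_stats` and the sample series are hypothetical and not part of the original notebook; the real input would be a column of `index`, such as `dividend['CSI Dividend Low-Volatility Index']`.

```python
import pandas as pd


def horizon_stats(annual_returns_pct, horizons=(1, 2, 3, 4, 5)):
    """Mean and sample std (in %) of compounded rolling returns for each horizon.

    Hypothetical helper: it mirrors the rolling-product pattern used in the
    cells above, but it is an illustration, not the notebook's own code.
    """
    # Convert percentage returns to growth factors, e.g. 10% -> 1.10
    growth = 1 + annual_returns_pct / 100
    rows = []
    for n in horizons:
        # Compound n consecutive annual growth factors, then convert back to %
        rolled = (growth.rolling(n).apply(lambda x: x.prod(), raw=True) - 1) * 100
        rows.append({
            'Time Horizon': f'{n}-year',
            'Average Return Rate': rolled.mean(),
            'Standard Deviation': rolled.std(),
        })
    return pd.DataFrame(rows)


# Made-up annual returns in percent, for illustration only
sample = pd.Series([10.0, -5.0, 20.0, 0.0, 15.0, 5.0])
stats = horizon_stats(sample)
print(stats.round(2))
```

Windows with fewer than n observations yield NaN, which `mean()` and `std()` skip by default, so the 1-year row matches the plain column mean and standard deviation used above.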


In [None]:
# plot CSI Dividend Low-Volatility Index (with dividend) average and stdev
fig, ax = plt.subplots(figsize=(8, 6))

width = 0.4
ind = np.arange(len(df_withdividend['Time Horizon']))  # x-axis positions
average_return_rate = df_withdividend['Average Return Rate'] 


bars = plt.bar(df_withdividend['Time Horizon'], average_return_rate, yerr=df_withdividend['Standard Deviation'], capsize=5, color='skyblue', ecolor='yellow')


# Customize plot
plt.xlabel('Time Horizon', fontsize=12)
plt.ylabel('Average Return Rate (%)', fontsize=12)
#plt.xlim([-1, 3])
# set title
plt.suptitle(
    f"Figure {fig_count}. Chinese Dividend Low-Volatility Index: Average Return Rate with Standard Deviation", y=0.0001, fontsize=10
)
fig_count += 1
plt.show()


In [None]:
# calculate the average and stdev of 1-year return rate of HK dividend
hkdividend = index[89:]
one_average_return = hkdividend['HK  Dividend Low-Volatility Index'].mean()
one_std_return = hkdividend['HK  Dividend Low-Volatility Index'].std()
one_divi_return = 4.5

# Calculate the 2-year return rate 
two_year_return = (1 + hkdividend['HK  Dividend Low-Volatility Index'] / 100).rolling(2).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 2-year return rate
two_average_return = two_year_return.mean()
two_std_return = two_year_return.std()
two_divi_return = 4.5 * 2

# Calculate the 3-year return rate 
three_year_return = (1 + hkdividend['HK  Dividend Low-Volatility Index'] / 100).rolling(3).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 3-year return rate
three_average_return = three_year_return.mean()
three_std_return = three_year_return.std()
three_divi_return = 4.5 * 3

# Calculate the 4-year return rate 
four_year_return = (1 + hkdividend['HK  Dividend Low-Volatility Index'] / 100).rolling(4).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 4-year return rate
four_average_return = four_year_return.mean()
four_std_return = four_year_return.std()
four_divi_return = 4.5 * 4

# Calculate the 5-year return rate 
five_year_return = (1 + hkdividend['HK  Dividend Low-Volatility Index'] / 100).rolling(5).apply(lambda x: x.prod() - 1) * 100

# Calculate the average and standard deviation of the 5-year return rate
five_average_return = five_year_return.mean()
five_std_return = five_year_return.std()
five_divi_return = 4.5 * 5


# Summary for the HK Dividend Low-Volatility Index (mean, standard deviation, and dividend rate, in %)
data = {
    'Time Horizon': ['1-year', '2-year', '3-year', '4-year', '5-year'],
    'Average Return Rate': [one_average_return, two_average_return, three_average_return, four_average_return, five_average_return],
    'Standard Deviation': [one_std_return, two_std_return, three_std_return, four_std_return, five_std_return],
    'Dividend Rate': [one_divi_return, two_divi_return, three_divi_return, four_divi_return, five_divi_return]
}
hk_dividend = pd.DataFrame(data)

In [None]:
# plot HK Dividend Low-Volatility Index average (price return plus dividend rate) and stdev
fig, ax = plt.subplots(figsize=(8, 6))

width = 0.4
ind = np.arange(len(hk_dividend['Time Horizon']))  # x-axis positions
average_return_rate = hk_dividend['Average Return Rate'] + hk_dividend['Dividend Rate']


bars = plt.bar(hk_dividend['Time Horizon'], average_return_rate, yerr=hk_dividend['Standard Deviation'], capsize=5, color='skyblue', ecolor='yellow')


# Customize plot
plt.xlabel('Time Horizon', fontsize=12)
plt.ylabel('Average Return Rate (%)', fontsize=12)
#plt.xlim([-1, 3])
# set title
plt.suptitle(
    f"Figure {fig_count}. Hong Kong Dividend Low-Volatility Index: Average Return Rate with Standard Deviation", y=0.0001, fontsize=10
)
fig_count += 1
plt.show()


## Appendix II ##

Table 1 lists the average annual return rate and standard deviation of each investment option considered in this article.

In [None]:
# Return rates of investments for s2
df_2 = index[['S&P 500 (includes dividends) %', 'Hushen300',
               'Chinese Dividend Low-Volatility Index Plus Dividend',
           'HK  Dividend Low-Volatility Index Dividend']]
df_s2 = df_2[86:]
return_rates = df_s2.mean()  # average annual return (%) per asset
std_devs = df_s2.std()  # sample standard deviation (%) per asset
cov_matrix = df_s2.cov()  # covariance matrix of the annual returns

df_3 = index[[ 'NASDAQ100', 'DAX 40 Index']]
df_s3 = df_3[62:]
return_rates_3 = df_s3.mean()  # average annual return (%) per asset
std_devs_3 = df_s3.std()  # sample standard deviation (%) per asset
cov_matrix_3 = df_s3.cov()  # covariance matrix of the annual returns

# Create DataFrame for s2 investments
statistics_df2 = pd.DataFrame({
    'Asset': return_rates.index,  # Add Asset names as a column
    'Expected Annual Return (%)': return_rates.values,
    'Standard Deviation': std_devs.values  # sample standard deviation (%)
}).round(2)

# Create DataFrame for s3 investments
statistics_df3 = pd.DataFrame({
    'Asset': return_rates_3.index,  # Add Asset names as a column
    'Expected Annual Return (%)': return_rates_3.values,
    'Standard Deviation': std_devs_3.values  # sample standard deviation (%)
}).round(2)

# Combine them by adding rows (stack them on top of each other)
combined_df = pd.concat([statistics_cd, statistics_df2, statistics_df3], axis=0) # axis=0 is the default

# Reset the index since the old indices are preserved
combined_df = combined_df.reset_index(drop=True)

# MANUALLY WRAP THE LONG ASSET NAMES INTO MULTIPLE LINES
def wrap_text(text, max_line_length=20):
    """Wrap text into multiple lines at approximately max_line_length characters"""
    words = text.split()
    lines = []
    current_line = []
    
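    for word in words:
        # Always accept a word on an empty line, so an over-long first word
        # does not produce a leading blank line
        if not current_line or len(' '.join(current_line + [word])) <= max_line_length:
            current_line.append(word)
        else:
            lines.append(' '.join(current_line))
            current_line = [word]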
    
    if current_line:
        lines.append(' '.join(current_line))
    
    return '\n'.join(lines)

# Apply text wrapping to the Asset column
combined_df['Asset'] = combined_df['Asset'].apply(wrap_text)

# Set dark theme styling
plt.style.use("dark_background")
plt.rcParams["figure.facecolor"] = "#181818"
plt.rcParams["axes.facecolor"] = "#181818"
plt.rcParams["axes.edgecolor"] = "#181818"

# Create the figure, tall enough to accommodate wrapped text
plt.figure(figsize=(10, 3))
ax = plt.gca()
ax.axis('off')

# Create the table
table = plt.table(
    cellText=combined_df.values,
    colLabels=combined_df.columns,
    cellLoc='center',
    loc='center',
    colColours=['#40466e'] * len(combined_df.columns)
)

# Style cells and adjust row heights for multi-line text
for (i, j), cell in table.get_celld().items():
    if i == 0:  # Header row
        cell.set_text_props(weight='bold', color='white')
        cell.set_facecolor('#40466e')
        cell.set_height(0.15)  # Header row height
    else:
        cell.set_facecolor('#181818')
        # Increase row height for data rows to accommodate wrapped text
        cell.set_height(0.25)
    cell.set_edgecolor('gray')
    
    # Specifically adjust width of first column to be wider
    if j == 0:  # First column (Asset names)
        cell.set_width(0.5)  # Make first column wider

# Style the table
table.auto_set_font_size(False)
table.set_fontsize(11)  # fixed font size chosen to fit wrapped text
table.scale(1.2, 1.8)  # stretch cells, mostly vertically
plt.title('Table 1. Investment Options Statistics', y=-2.55, color='white', pad=20)

plt.show()


## References ##

+ https://baijiahao.baidu.com/s?id=1802034225405933396&wfr=spider&for=pc
+ https://xueqiu.com/8821615579/328936666
+ https://www.nasdaq.com/market-activity/index/ndx/historical?page=252&rows_per_page=10&timeline=y10