# Task 4: [Strategy] Simple RSI-Based Trading Strategy

What is the total profit (in $thousands) you would have earned by investing $1000 every time a stock was oversold (RSI < 25)?

Goal:

Apply a simple rule-based trading strategy using the Relative Strength Index (RSI) technical indicator to identify oversold signals and calculate profits.

Steps:

## 1. Run the full notebook from Lecture 2 (33 stocks)

- Ensure you can generate the merged DataFrame containing:
  - OHLCV data
  - Technical indicators
  - Macro indicators
- Focus on getting RSI computed using Code Snippets 8 and 9.
- This process is essential and will help during the capstone project.
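If Code Snippets 8 and 9 are not at hand, RSI can be computed directly from closing prices. The sketch below is a stand-in, not the lecture's snippets (they are not reproduced here). The helper name `wilder_rsi` is hypothetical. It assumes the common 14-period window and applies Wilder's smoothing as an exponential average with `alpha = 1/period`. Early values can differ slightly from libraries that seed the average with a simple mean.

```python
import numpy as np
import pandas as pd


def wilder_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Hypothetical helper: RSI from a price series using Wilder's smoothing."""
    delta = close.diff()
    # Split each day's price change into a gain part and a loss part (both >= 0)
    gain = delta.clip(lower=0.0)
    loss = (-delta).clip(lower=0.0)
    # Wilder's moving average is an exponential average with alpha = 1/period;
    # min_periods leaves the warm-up rows as NaN
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss  # inf when there were no losses, which gives an RSI of 100
    return 100 - 100 / (1 + rs)


# Quick check on a noisy synthetic price path (made-up data)
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 250).cumsum())
rsi = wilder_rsi(prices)
print(rsi.describe())
```

On the 33-stock panel, the helper would be applied per ticker, for example `df.groupby('Ticker')['Close_x'].transform(wilder_rsi)`. This assumes rows are sorted by `Date` within each ticker.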
## 2. Alternative (if tech indicators fail to generate):
Download precomputed data using this snippet:

```python
import gdown
import pandas as pd

file_id = "1grCTCzMZKY5sJRtdbLVCXg8JXA8VPyg-"
gdown.download(f"https://drive.google.com/uc?id={file_id}", "data.parquet", quiet=False)
df = pd.read_parquet("data.parquet", engine="fastparquet")
```

```
Downloading...
From (original): https://drive.google.com/uc?id=1grCTCzMZKY5sJRtdbLVCXg8JXA8VPyg-
From (redirected): https://drive.google.com/uc?id=1grCTCzMZKY5sJRtdbLVCXg8JXA8VPyg-&confirm=t&uuid=627243a4-2832-46f6-b4ce-24a790e36ed9
To: /Users/tuantran/Documents/stock-market-analytics/artifacts/Module2/data.parquet
100%|██████████| 130M/130M [00:03<00:00, 42.4MB/s]
```
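Whichever route produced `df`, it helps to confirm early that the columns this task relies on are present. It is also worth making `Date` a real datetime so that date comparisons behave as dates. Below is a minimal sketch. The helper name `prepare_for_rsi_task` is hypothetical, and the toy values are made up.

```python
import pandas as pd

# Columns used by the RSI strategy in this task
REQUIRED_COLUMNS = ["Ticker", "Date", "rsi", "growth_future_30d"]


def prepare_for_rsi_task(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fail early on missing columns and parse Date as datetime."""
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    out = frame.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    return out


# Usage on a tiny made-up frame; in the notebook this would be prepare_for_rsi_task(df)
toy = pd.DataFrame({
    "Ticker": ["MSFT", "MSFT"],
    "Date": ["2020-03-16", "2020-03-17"],
    "rsi": [22.5, 31.0],
    "growth_future_30d": [1.12, 1.08],
})
print(prepare_for_rsi_task(toy).dtypes)
```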


```python
df.to_pickle('dataframe_final.pkl')
```

```python
print(f'Final stocks_df shape = {df.shape}')
```

```
Final stocks_df shape = (229932, 203)
```


```python
print(df.head(6))
```

```
       Open      High       Low   Close_x        Volume  Dividends  \
0  0.054277  0.062259  0.054277  0.059598  1.031789e+09        0.0   
1  0.059598  0.062791  0.059598  0.061726  3.081600e+08        0.0   
2  0.061726  0.063323  0.061726  0.062791  1.331712e+08        0.0   
3  0.062791  0.063323  0.060662  0.061194  6.776640e+07        0.0   
4  0.061194  0.061726  0.059598  0.060130  4.789440e+07        0.0   
5  0.060130  0.060130  0.058001  0.058533  5.843520e+07        0.0   

   Stock Splits Ticker  Year      Month  ...  growth_brent_oil_7d  \
0           0.0   MSFT  1986 1986-03-01  ...                  NaN   
1           0.0   MSFT  1986 1986-03-01  ...                  NaN   
2           0.0   MSFT  1986 1986-03-01  ...                  NaN   
3           0.0   MSFT  1986 1986-03-01  ...                  NaN   
4           0.0   MSFT  1986 1986-03-01  ...                  NaN   
5           0.0   MSFT  1986 1986-03-01  ...                  NaN   

  growth_brent_oil_30d  g
```

```python
print(df.describe())
```

```
                Open           High            Low        Close_x  \
count  229932.000000  229932.000000  229932.000000  229932.000000   
mean      173.752609     175.632427     171.771753     173.699429   
min         0.000000       0.032597       0.030566       0.031283   
25%         9.211445       9.323409       9.091979       9.209488   
50%        35.180027      35.647109      34.722365      35.209103   
75%       142.764649     144.364608     141.074416     142.786453   
max      4493.237606    4509.193819    4430.395122    4471.390137   
std       422.196533     426.346835     417.778376     422.002171   

             Volume      Dividends   Stock Splits           Year  \
count  2.299320e+05  229932.000000  229932.000000  229932.000000   
mean   4.938914e+07       0.010303       0.001398    2009.337861   
min    0.000000e+00       0.000000       0.000000    1972.000000   
25%    1.095200e+06       0.000000       0.000000    2003.000000   
50%    3.684500e+06       0.000000    
```

```python
print(df.columns.tolist())
```

```
['Open', 'High', 'Low', 'Close_x', 'Volume', 'Dividends', 'Stock Splits', 'Ticker', 'Year', 'Month', 'Weekday', 'Date', 'growth_1d', 'growth_3d', 'growth_7d', 'growth_30d', 'growth_90d', 'growth_365d', 'growth_future_30d', 'SMA10', 'SMA20', 'growing_moving_average', 'high_minus_low_relative', 'volatility', 'is_positive_growth_30d_future', 'ticker_type', 'index_x', 'adx', 'adxr', 'apo', 'aroon_1', 'aroon_2', 'aroonosc', 'bop', 'cci', 'cmo', 'dx', 'macd', 'macdsignal', 'macdhist', 'macd_ext', 'macdsignal_ext', 'macdhist_ext', 'macd_fix', 'macdsignal_fix', 'macdhist_fix', 'mfi', 'minus_di', 'mom', 'plus_di', 'dm', 'ppo', 'roc', 'rocp', 'rocr', 'rocr100', 'rsi', 'slowk', 'slowd', 'fastk', 'fastd', 'fastk_rsi', 'fastd_rsi', 'trix', 'ultosc', 'willr', 'index_y', 'ad', 'adosc', 'obv', 'atr', 'natr', 'ht_dcperiod', 'ht_dcphase', 'ht_phasor_inphase', 'ht_phasor_quadrature', 'ht_sine_sine', 'ht_sine_leadsine', 'ht_trendmod', 'avgprice', 'medprice', 'typprice', 'wclprice', 'index', 'cdl2crows', '
```

```python
print(df['rsi'].describe())
```

```
count    229470.000000
mean         52.972321
std          12.085901
min           0.000000
25%          44.535042
50%          53.066628
75%          61.511647
max         100.000000
Name: rsi, dtype: float64
```
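The RSI `count` (229,470) is 462 below the frame's 229,932 rows, so some rows have no RSI value. Since 462 = 33 × 14, one plausible explanation is a 14-row warm-up at the start of each of the 33 tickers. This is an assumption and is not verified here. Counting missing values per ticker is a quick way to check it. The helper `missing_by_ticker` is hypothetical, and the toy data is made up.

```python
import numpy as np
import pandas as pd


def missing_by_ticker(frame: pd.DataFrame, column: str = "rsi") -> pd.Series:
    """Hypothetical helper: number of NaN values in `column` for each ticker."""
    return frame[column].isna().groupby(frame["Ticker"]).sum()


# Made-up example: two tickers with a few leading NaN values each
toy = pd.DataFrame({
    "Ticker": ["AAA"] * 4 + ["BBB"] * 3,
    "rsi": [np.nan, np.nan, 40.0, 55.0, np.nan, 20.0, 35.0],
})
print(missing_by_ticker(toy))
```

On the real data, a result of 14 for every ticker from `missing_by_ticker(df)` would support the warm-up explanation.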


## 3. RSI Strategy Setup:

- RSI is already available in the dataset as the `rsi` column.
- The threshold for oversold is defined as RSI < 25.
## 4. Filter the dataset by RSI and date:

- RSI is a momentum oscillator used in technical analysis to measure the speed and magnitude of price movements.
- Range: 0 to 100.
- Interpretation:
  - RSI > 70 → Overbought (possible sell signal)
  - RSI < 30 → Oversold (possible buy signal)
  - RSI around 50 → Neutral
- This task uses a stricter oversold cutoff (RSI < 25), so its signals are a subset of those the conventional 30 cutoff would produce.
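The interpretation bands can be expressed as a small labelling function. This sketch uses the conventional 30/70 cutoffs by default and also accepts this task's stricter oversold level of 25. The name `rsi_zone` is hypothetical. Rows with no RSI are labelled `missing` rather than silently treated as neutral.

```python
import numpy as np
import pandas as pd


def rsi_zone(rsi: pd.Series, oversold: float = 30.0, overbought: float = 70.0) -> pd.Series:
    """Hypothetical helper: label each RSI value as oversold, overbought, neutral or missing."""
    conditions = [rsi < oversold, rsi > overbought, rsi.notna()]
    labels = ["oversold", "overbought", "neutral"]
    # np.select picks the first matching condition; NaN fails all three and gets the default
    return pd.Series(np.select(conditions, labels, default="missing"), index=rsi.index)


values = pd.Series([10.0, 27.0, 50.0, 80.0, np.nan])
print(", ".join(rsi_zone(values)))
# oversold, oversold, neutral, overbought, missing
print(", ".join(rsi_zone(values, oversold=25)))
# oversold, neutral, neutral, overbought, missing
```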

```python
rsi_threshold = 25
selected_df = df[
    (df['rsi'] < rsi_threshold) &
    (df['Date'] >= '2000-01-01') &
    (df['Date'] <= '2025-06-01')
]
```
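The same filter can be wrapped in a function and checked on a few made-up rows. Two details are worth confirming. First, rows with a missing RSI never pass `rsi < 25`, because comparisons with NaN are False. Second, both date bounds are inclusive. The name `oversold_signals` is hypothetical. It converts `Date` with `pd.to_datetime` so that the bounds are compared as dates rather than as strings.

```python
import numpy as np
import pandas as pd


def oversold_signals(frame: pd.DataFrame, threshold: float = 25.0,
                     start: str = "2000-01-01", end: str = "2025-06-01") -> pd.DataFrame:
    """Hypothetical helper: rows with RSI strictly below `threshold` and Date in [start, end]."""
    dates = pd.to_datetime(frame["Date"])
    in_window = dates.between(pd.Timestamp(start), pd.Timestamp(end))  # inclusive on both ends
    return frame[(frame["rsi"] < threshold) & in_window]


# Made-up rows: before the window, inside it, NaN RSI, RSI exactly 25, both edges, after it
toy = pd.DataFrame({
    "Date": ["1999-12-31", "2000-01-01", "2010-05-05", "2012-01-03",
             "2015-07-01", "2025-06-01", "2025-06-02"],
    "rsi": [10.0, 20.0, 24.9, np.nan, 25.0, 5.0, 3.0],
})
print(oversold_signals(toy)["Date"].tolist())
# ['2000-01-01', '2010-05-05', '2025-06-01']
```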

## 5. Calculate Net Profit Over 25 Years:

- Total number of trades: 1568

- For each trade, you invest $1000

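- Use the 30-day forward growth (`growth_future_30d`) to compute net earnings. `growth_future_30d` is a price ratio (future price ÷ current price), not a percentage return, which is why the code below subtracts 1.

KEY ASSUMPTIONS
- For each signal, the outcome is measured by `growth_future_30d`.
E.g., if `growth_future_30d` = 1.10 (a 10% gain), your $1,000 becomes $1,100, a profit of $100.
- Nothing is compounded: each trade is independent and uses a fresh $1,000.
- The answer is the sum of the net profits (final value minus the $1,000 stake) of all those trades over ~25 years.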
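The per-trade arithmetic behind these assumptions can be written as a minimal sketch. The name `trade_profit` is hypothetical. As stated above, it assumes that `growth_future_30d` is a ratio where 1.10 means +10%.

```python
def trade_profit(growth_ratio: float, stake: float = 1000.0) -> float:
    """Hypothetical helper: net profit of one independent fixed-stake trade."""
    # growth_ratio = value after 30 days / value at entry, e.g. 1.10 for a 10% gain
    return stake * (growth_ratio - 1.0)


# Three independent trades, each with a fresh $1,000 (no compounding)
ratios = [1.10, 0.95, 1.02]
total = sum(trade_profit(r) for r in ratios)
print(round(total, 2))  # 70.0
```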

```python
# growth_future_30d is a ratio, so (ratio - 1) is each trade's return; .sum() skips NaN rows
net_income = 1000 * (selected_df['growth_future_30d'] - 1).sum()
```

Final Answer:
What is the net income in $K (i.e., in thousands of dollars) that could be earned using this RSI-based oversold strategy from 2000–2025?

```python
print(net_income)
```

```
24295.523125248386
```
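The question asks for the answer in $K, so the printed dollar figure only needs to be divided by 1,000:

```python
net_income = 24295.523125248386  # dollars, as printed by the cell above
net_income_k = net_income / 1000
print(f"Net income: ${net_income_k:,.1f}K")  # Net income: $24.3K
```

That is roughly $24.3K of net profit over the 2000-01-01 to 2025-06-01 window.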


```python
# 1. Convert Date to datetime if needed (skip if it already is)
# df['Date'] = pd.to_datetime(df['Date'])

# 2. Keep calendar years 2000 through 2024 (Jan 1, 2000 to Dec 31, 2024)
df_filtered = df[(df['Year'] >= 2000) & (df['Year'] <= 2024)]
```
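This `Year` filter ends on Dec 31, 2024, whereas the filter in step 4 runs to 2025-06-01, so trade counts from the two windows need not match. For dates without a time-of-day component, the `Year` condition is equivalent to an inclusive `Date` range. A few made-up dates can confirm this:

```python
import pandas as pd

# Made-up dates around the window edges
toy = pd.DataFrame({"Date": pd.to_datetime(["1999-12-31", "2000-01-03", "2024-12-31", "2025-03-14"])})
toy["Year"] = toy["Date"].dt.year

by_year = toy[(toy["Year"] >= 2000) & (toy["Year"] <= 2024)]
by_date = toy[toy["Date"].between(pd.Timestamp("2000-01-01"), pd.Timestamp("2024-12-31"))]

print(by_year.index.tolist(), by_date.index.tolist())  # [1, 2] [1, 2]
```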

```python
# Signals over the full history in df (all years, not df_filtered)
# 1. Keep rows where RSI is under 25
rsi_signals = df[df['rsi'] < 25].copy()

# 2. Drop rows where the forward growth ratio is missing
rsi_signals = rsi_signals.dropna(subset=['growth_future_30d'])

# 3. Each trade invests $1,000
initial_investment = 1000

# 4. Value of each trade after 30 days: growth_future_30d is already a ratio
#    (1.10 means +10%), so multiply by it directly; adding 1 would count the stake twice
rsi_signals['trade_value'] = initial_investment * rsi_signals['growth_future_30d']

# 5. Final value: sum over all trades
final_value = rsi_signals['trade_value'].sum()

# 6. Number of trades taken
num_trades = len(rsi_signals)

print(f"Total trades taken: {num_trades}")
print(f"Total capital invested: ${num_trades * initial_investment:,.2f}")
print(f"Final value of all trades: ${final_value:,.2f}")
print(f"Net gain: ${final_value - num_trades * initial_investment:,.2f}")
```

```
Total trades taken: 2001
Total capital invested: $2,001,000.00
Final value of all trades: $2,035,022.73
Net gain: $34,022.73
```
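To compare the date-filtered `selected_df` from step 4 with the full-history `rsi_signals` using identical arithmetic, the totals can be packaged into one helper. This sketch relies on the same assumption as above: `growth_future_30d` is a ratio. The name `summarize_trades` is hypothetical. The numbers below come from made-up ratios, not from the dataset.

```python
import numpy as np
import pandas as pd


def summarize_trades(signals: pd.DataFrame, stake: float = 1000.0) -> dict:
    """Hypothetical helper: totals for independent, fixed-stake trades."""
    ratios = signals["growth_future_30d"].dropna()  # trades without a 30-day outcome are skipped
    num_trades = len(ratios)
    invested = stake * num_trades
    final_value = stake * ratios.sum()  # each stake grows by its own ratio, no compounding
    return {
        "num_trades": num_trades,
        "invested": invested,
        "final_value": final_value,
        "net_gain": final_value - invested,
    }


toy = pd.DataFrame({"growth_future_30d": [1.10, 0.90, 1.05, np.nan]})
summary = summarize_trades(toy)
print(summary["num_trades"], round(summary["net_gain"], 2))  # 3 50.0
```

Applied to the notebook's frames, `summarize_trades(selected_df)["net_gain"]` should agree with `net_income` up to floating-point rounding. Both sum `1000 * (ratio - 1)` over the non-missing rows.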
