# Task 3: [IPO] ‘Fixed Months Holding Strategy’

What is the optimal number of months (1 to 12) to hold a newly IPO'd stock in order to maximize average growth?
(Assume you buy at the close of the first trading day and sell after a fixed number of trading days.)

## Goal:

Investigate which fixed holding period, measured in months from the first trading day, gives the highest average return on an IPO stock, using future growth columns.






In [3]:
import pandas as pd

stocks_df = pd.read_pickle('dataframe.pkl')


In [4]:
print(stocks_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26094 entries, 0 to 26093
Data columns (total 30 columns):
 #   Column                         Non-Null Count  Dtype         
---  ------                         --------------  -----         
 0   Open                           26094 non-null  float64       
 1   High                           26094 non-null  float64       
 2   Low                            26094 non-null  float64       
 3   Close                          26094 non-null  float64       
 4   Volume                         26094 non-null  int64         
 5   Dividends                      26094 non-null  float64       
 6   Stock Splits                   26094 non-null  float64       
 7   Ticker                         26094 non-null  object        
 8   Sector                         26094 non-null  object        
 9   Industry                       26094 non-null  object        
 10  Year                           26094 non-null  int32         
 11  Month          

## 1. Start from the existing DataFrame from Question 2 (75 tickers from IPOs in the first 5 months of 2024).

Add 12 new columns:
future_growth_1m, future_growth_2m, ..., future_growth_12m
(Assume 1 month = 21 trading days, so growth is calculated over 21, 42, ..., 252 trading days)
This logic is similar to historyPrices['growth_future_30d'] from Code Snippet 7, but extended to longer timeframes.

In [5]:
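# Assumes stocks_df has a 'Close' column and is sorted by date within each ticker.
# Caveat: shift() runs over the whole frame here, not per ticker, so the last
# `days` rows of each ticker pick up prices from the next ticker in the frame.

for m in range(1, 13):
    days = 21 * m  # 1 month = 21 trading days
    col_name = f'future_growth_{m}m'
    # Ratio of the close `days` rows ahead to the current close
    stocks_df[col_name] = stocks_df['Close'].shift(-days) / stocks_df['Close']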


In [6]:
print(stocks_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26094 entries, 0 to 26093
Data columns (total 42 columns):
 #   Column                         Non-Null Count  Dtype         
---  ------                         --------------  -----         
 0   Open                           26094 non-null  float64       
 1   High                           26094 non-null  float64       
 2   Low                            26094 non-null  float64       
 3   Close                          26094 non-null  float64       
 4   Volume                         26094 non-null  int64         
 5   Dividends                      26094 non-null  float64       
 6   Stock Splits                   26094 non-null  float64       
 7   Ticker                         26094 non-null  object        
 8   Sector                         26094 non-null  object        
 9   Industry                       26094 non-null  object        
 10  Year                           26094 non-null  int32         
 11  Month          
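
Because `shift()` works by row position, applying it to the whole frame lets the tail of one ticker read prices from the next one. The following is a minimal sketch of the difference between a whole-frame shift and a per-ticker shift. The tickers `AAA` and `BBB`, their prices, and the 2-row horizon are invented for illustration and stand in for the real data and the 21-day months.

```python
import pandas as pd

# Toy data: two hypothetical tickers, five trading days each (not from the notebook)
toy = pd.DataFrame({
    "Ticker": ["AAA"] * 5 + ["BBB"] * 5,
    "Date": list(pd.date_range("2024-01-01", periods=5, freq="B")) * 2,
    "Close": [10.0, 11.0, 12.0, 13.0, 14.0, 100.0, 90.0, 80.0, 70.0, 60.0],
})

# Sort so that shift() walks forward in time within each ticker
toy = toy.sort_values(["Ticker", "Date"]).reset_index(drop=True)

horizon = 2  # stand-in for 21 * m trading days

# Whole-frame shift: the last `horizon` rows of AAA read BBB's prices
toy["growth_ungrouped"] = toy["Close"].shift(-horizon) / toy["Close"]

# Per-ticker shift: rows without enough future history become NaN instead
toy["growth_grouped"] = toy.groupby("Ticker")["Close"].shift(-horizon) / toy["Close"]

print(toy)
```

In the toy frame, the fourth `AAA` row gets a ratio above 7 from the whole-frame shift (100 / 13) and `NaN` from the grouped one. With the grouped variant, an IPO with fewer than 253 rows of history would show `NaN` at the 12-month horizon instead of a number taken from another stock.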

## 2. Determine the first trading day (min_date) for each ticker.
This is the earliest date in the data for each stock.

In [7]:
# Ensure 'Date' is datetime type
stocks_df['Date'] = pd.to_datetime(stocks_df['Date'])

# Group by ticker and get earliest date
earliest_dates = stocks_df.groupby('Ticker')['Date'].min().reset_index()

print(earliest_dates)


   Ticker       Date
0     AHR 2024-02-07
1    ALAB 2024-03-20
2    ANRO 2024-02-02
3      AS 2024-02-01
4    AUNA 2024-03-22
..    ...        ...
72   YIBO 2024-01-25
73   YYGH 2024-04-22
74   ZBAO 2024-04-02
75     ZK 2024-05-10
76   ZONE 2024-04-26

[77 rows x 2 columns]


## 3. Join the data:
Perform an inner join between the min_date DataFrame and the future growth data on both ticker and date.
➤ You should end up with 75 records (one per IPO) with all 12 future_growth_... fields populated.

In [8]:
# Step 1: Get earliest trading date per ticker (min_date)
min_date = stocks_df.groupby('Ticker')['Date'].min().reset_index().rename(columns={'Date': 'min_date'})

# Step 2: Merge min_date back with stocks_df on Ticker and Date == min_date
merged = pd.merge(
    stocks_df,
    min_date,
    left_on=['Ticker', 'Date'],
    right_on=['Ticker', 'min_date'],
    how='inner'
)

# Step 3: Drop the redundant 'min_date' column
merged = merged.drop(columns=['min_date'])

# Step 4: Select only the columns you want, including all future_growth_... columns
future_growth_cols = [f'future_growth_{m}m' for m in range(1, 13)]

result = merged[['Ticker', 'Date'] + future_growth_cols]

print(f"Number of records: {len(result)}")  # should be 75
print(result.head())


Number of records: 77
  Ticker       Date  future_growth_1m  future_growth_2m  future_growth_3m  \
0    RAY 2024-05-15          0.839243          0.777778          0.368794   
1    HDL 2024-05-17          0.775236          0.750112          0.673845   
2   JDZG 2024-05-15          0.242998          0.205160          0.122850   
3   NAKA 2024-05-31          0.728477          0.552980          0.397351   
4   RFAI 2024-07-05          1.002191          1.006972          1.007968   

   future_growth_4m  future_growth_5m  future_growth_6m  future_growth_7m  \
0          0.463357          0.397163          0.406619          0.395508   
1          0.643786          0.720502          0.719605          1.255271   
2          0.160442          0.272727          0.199017          0.139066   
3          0.350993          0.341060          0.387417          0.410596   
4          1.014940          1.014940          1.017928          1.022908   

   future_growth_8m  future_growth_9m  future_growth
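
The join returns 77 rows where the task expects 75. One visible candidate for the difference is RFAI, whose first trading day in the output above is 2024-07-05, outside the first five months of 2024. A minimal sketch of filtering by first trading day follows. Treating the earliest date in the data as the IPO date, and using a January to May window as the selection rule, are assumptions here and not something the notebook confirms.

```python
import pandas as pd

# Three first-trading-day rows copied from the outputs above (AHR, ZK, RFAI);
# the real table has 77 rows
first_days = pd.DataFrame({
    "Ticker": ["AHR", "ZK", "RFAI"],
    "min_date": pd.to_datetime(["2024-02-07", "2024-05-10", "2024-07-05"]),
})

# Assumed selection window: first trading day within the first five months of 2024
window_start = pd.Timestamp("2024-01-01")
window_end = pd.Timestamp("2024-05-31")
in_window = first_days["min_date"].between(window_start, window_end)

ipo_universe = first_days[in_window]
excluded = first_days[~in_window]

print("kept:", list(ipo_universe["Ticker"]))
print("excluded:", list(excluded["Ticker"]))
```

Running the same filter on the full `min_date` table would show whether this rule accounts for both extra tickers or only one of them.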

## 4. Compute descriptive statistics for the resulting DataFrame:
Use .describe() or similar to analyze each of the 12 columns:

- future_growth_1m
- future_growth_2m
...
- future_growth_12m

In [9]:
future_growth_cols = [f'future_growth_{m}m' for m in range(1, 13)]

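# Describe statistics for these columns.
# Note: this summarizes every row of stocks_df, not the one-row-per-IPO `result` frame from step 3.
stats = stocks_df[future_growth_cols].describe()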

print(stats)


       future_growth_1m  future_growth_2m  future_growth_3m  future_growth_4m  \
count      26073.000000      26052.000000      26031.000000      26010.000000   
mean          10.534496         17.267833         23.665922         26.121714   
std          343.997991        425.618372        484.630526        473.698450   
min            0.001658          0.001637          0.001453          0.001253   
25%            0.856459          0.760305          0.673591          0.598742   
50%            0.998026          0.986687          0.982424          0.973898   
75%            1.100290          1.164379          1.241441          1.309559   
max        23733.331825      26933.331800      26666.665400      25466.665012   

       future_growth_5m  future_growth_6m  future_growth_7m  future_growth_8m  \
count      25989.000000      25968.000000      25947.000000      25926.000000   
mean          28.458947         29.659655         30.201446         30.072224   
std          481.801807    
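
The counts near 26,000 in the table above show that these statistics cover every row of `stocks_df` rather than one row per IPO. A minimal sketch of the per-IPO version follows, assuming the joined frame from step 3 is the intended input. It uses only the five rows and three horizons visible in the `head()` output of step 3, so the numbers it prints are not the full-sample answer.

```python
import pandas as pd

# Five first-trading-day rows copied from the head() output in step 3
# (a partial sample: 5 of 77 rows, 3 of 12 horizons)
sample = pd.DataFrame({
    "Ticker": ["RAY", "HDL", "JDZG", "NAKA", "RFAI"],
    "future_growth_1m": [0.839243, 0.775236, 0.242998, 0.728477, 1.002191],
    "future_growth_2m": [0.777778, 0.750112, 0.205160, 0.552980, 1.006972],
    "future_growth_3m": [0.368794, 0.673845, 0.122850, 0.397351, 1.007968],
})

growth_cols = [c for c in sample.columns if c.startswith("future_growth_")]

# One row per IPO, so `count` equals the number of tickers in the sample
stats = sample[growth_cols].describe()
print(stats.loc[["count", "mean", "50%"]])
```

On the full data the equivalent call would be `result[future_growth_cols].describe()`, where `count` should match the number of joined records.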

## 5. Determine the best holding period:

Find the number of months (1 to 12) where the average (mean) future growth is maximal.
This optimal month shows an uplift of >1% compared to all others.
Still, the average growth ratio remains less than 1 (i.e., on average the position ends below its purchase price).

In [10]:
# List of future growth columns
future_growth_cols = [f'future_growth_{m}m' for m in range(1, 13)]

# Calculate the mean of each column
# Note: the mean is taken over every row of stocks_df, not only the first-trading-day rows in `result`
mean_growth = stocks_df[future_growth_cols].mean()

# Find the month with maximal mean growth
max_month = mean_growth.idxmax()  # e.g., 'future_growth_5m'
max_value = mean_growth[max_month]

# Drop the best month so it can be compared against the remaining eleven
others = mean_growth.drop(max_month)

# Check that the best mean exceeds every other mean by more than 0.01
# (an absolute difference in growth ratio, i.e. one percentage point)
is_uplift = ((max_value - others) > 0.01).all()

# A growth ratio below 1 means the position loses value on average
below_break_even = max_value < 1

print(f"Month with max average growth: {max_month}")
print(f"Max average growth value: {max_value:.4f}")
print(f"Is uplift > 1% compared to all others? {'Yes' if is_uplift else 'No'}")
print(f"Is max average growth less than 1? {'Yes' if below_break_even else 'No'}")


Month with max average growth: future_growth_7m
Max average growth value: 30.2014
Is uplift > 1% compared to all others? Yes
Is max average growth less than 1? No
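
A maximum mean of about 30 contradicts the expectation stated in this step that the best average stays below 1, which suggests the means should be taken over the one-row-per-IPO frame instead of all rows. The selection logic itself can be sketched as a small helper. `best_holding_month` is a hypothetical name, the demo means are invented, and the uplift is measured relative to the runner-up rather than as an absolute difference (for ratios close to 1 the two are nearly the same).

```python
import pandas as pd

def best_holding_month(mean_growth: pd.Series, min_uplift: float = 0.01):
    """Return (label, value, clears_uplift) for the column with the highest mean."""
    best_label = mean_growth.idxmax()
    best_value = float(mean_growth[best_label])
    runner_up = float(mean_growth.drop(best_label).max())
    # Relative uplift over the second-best month; beating it beats all others
    clears_uplift = bool((best_value / runner_up - 1) > min_uplift)
    return best_label, best_value, clears_uplift

# Invented means for illustration only (not results from the notebook)
demo_means = pd.Series({
    "future_growth_1m": 0.90,
    "future_growth_2m": 0.94,
    "future_growth_3m": 0.92,
})

label, value, clears = best_holding_month(demo_means)

# Recover the number of months from a label such as 'future_growth_2m'
months = int(label.split("_")[-1].rstrip("m"))

print(f"Best holding period: {months} month(s), mean growth {value:.4f}")
print(f"Uplift over runner-up > 1%: {clears}")
print(f"Below break-even (mean < 1): {value < 1}")
```

On the real data the input would be something like `result[future_growth_cols].mean()`, assuming `result` holds one first-trading-day row per IPO.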
