# Project submission
**Due Friday May 16th before class.** Counts for 25% of the final course grade.

You should address all the questions relevant to your project.
You will not be graded on the values of the model performance, but on whether you have applied the right methodology: formulated the business problem, translated it into the right machine learning approach, analyzed your data, prepared it for modeling, applied at least 5 different machine learning algorithms as well as a neural network, used cross validation for model tuning, justified your tuning metric, set up a proper machine learning pipeline without data leakage, evaluated your model using all the relevant metrics, interpreted your model, and justified all your decisions.

If you have tried different approaches, please include them all, and not just the best one.
If doing some feature engineering has improved your model, also please include all of the steps, not just the most successful ones.

You should submit the notebook with the code, output and explanations. The notebook should be executable and comprehensible.

Points will be deducted for the following reasons:
- data leakage
- unjustified decisions (no discussion on: choice of metric for optimization, blind removal of features, blind removal of outliers...)
- notebook not comprehensible
- notebook with incomplete output
- notebook not executable
- blind copy pasting from ChatGPT, if the copied code is not suitable for the task
- writing your own code (or copy-pasting it from an outside source) for simple functions that we covered and that already exist in `sklearn` (train-test split, plain grid search, encoding of categorical variables, ...), as this leads to:
    - convoluted code prone to bugs
    - code that is hard to understand and review
    - waste of data scientist's time if ready-made simple functions exist

Additional points will be awarded for trying and testing different relevant approaches, from exploratory data analysis, to feature engineering, to modeling and evaluation.

There should be one submission per group, but team member evaluation can be submitted per person. If not submitted, the default is that all the team members have contributed equally to the project and should get the same grade.

### Group number:
### Student IDs:
### Project name:

## What business problem are you solving?
- Please state clearly what business problem you are solving. (one sentence)
- Explain why this is a relevant problem and what you can do with the model output to create business value, i.e., how the model output is actionable. (2-3 paragraphs)

## What is the machine learning problem that you are solving?
- Please state clearly what the ML problem is.
- If applicable, state your target.

## Data exploration and preparation 

- How many data instances do you have?
- Do you have duplicates?
- How many features? What type are they?
- If they are categorical, what categories do they have, and how frequent is each?
- If they are numerical, what is their distribution?
- Do you have outliers, and do you need to do anything about them?
- What is the distribution of the target variable?
- If you have a target, you can also check the relationship between the target and the variables.
- Do you have missing data? If yes, how are you going to handle it?
- Can you use the features in their original form, or do you need to alter them in some way?
- What have you learned about your data? Is there anything that can help you in feature engineering or modeling?


## Feature engineering
Creating good features is probably the most important step in the machine learning process. 
This might involve doing:
- transformations
- aggregating over data points or over time and space, or finding differences (for example: differences between two monthly bills, time difference between two contacts with the client) 
- creating dummy (binary) variables
- discretization

Business insight is very relevant in this process. If it is possible you can also find additional relevant data.
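
The feature types listed above can be sketched with `pandas` as follows. This is only a minimal illustration: the `contacts` frame, its column names and the bin edges are hypothetical and not part of any project dataset.

```python
import pandas as pd

# Hypothetical toy data: one row per client contact, with the bill at that time.
contacts = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "contact_date": pd.to_datetime(
        ["2024-01-05", "2024-02-04", "2024-03-10", "2024-01-20", "2024-03-01"]
    ),
    "monthly_bill": [50.0, 55.0, 52.0, 80.0, 95.0],
})

contacts = contacts.sort_values(["client_id", "contact_date"]).reset_index(drop=True)

# Difference between two consecutive bills of the same client.
contacts["bill_diff"] = contacts.groupby("client_id")["monthly_bill"].diff()

# Time difference (in days) between two contacts with the same client.
contacts["days_since_contact"] = (
    contacts.groupby("client_id")["contact_date"].diff().dt.days
)

# Dummy (binary) variable: did the bill increase since the previous contact?
contacts["bill_increased"] = (contacts["bill_diff"] > 0).astype(int)

# Discretization of a numerical feature into three labelled bins (edges are arbitrary).
contacts["bill_band"] = pd.cut(
    contacts["monthly_bill"],
    bins=[0, 60, 90, float("inf")],
    labels=["low", "mid", "high"],
)

print(contacts)
```

The first row of each client has no previous contact, so its difference features are missing and need to be handled explicitly before modeling.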

## Modeling
You should implement AT LEAST FIVE approaches we covered, and tune at least two hyperparameters of each approach.
Do not forget that you should split your data.
You should do model selection and tuning using cross validation on the train set, avoiding data leakage.
Explain and justify what is the metric you are using for model selection and tuning. If your data is imbalanced, consider using techniques for data balancing.
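
One possible way to set this up with `sklearn` is sketched below. It is not the required solution: the synthetic data, the choice of logistic regression, the F1 metric and the grid values are all placeholder assumptions. The point is that preprocessing lives inside a `Pipeline`, so it is refit on the training part of each fold only, and that the test set is never touched during tuning.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced placeholder data; replace with the project data.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Hold out a test set that is only used for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling is part of the pipeline, so each CV fold fits it on its own training part.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Two hyperparameters are tuned; class_weight is one simple way to address imbalance.
param_grid = {
    "clf__C": [0.01, 0.1, 1.0, 10.0],
    "clf__class_weight": [None, "balanced"],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=cv)
search.fit(X_train, y_train)

print(search.best_params_, round(search.best_score_, 3))
```

For time-ordered data, shuffled folds can themselves leak future information; a chronological split together with `TimeSeriesSplit` may be a better fit in that case.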

Separately, you should train a neural network. Visualize the training and validation loss. Discuss the network performance.

In model selection, make sure when you compare different models and approaches that you compare them on the same dataset, though different transformations could be applied to the comparison dataset.

## Model evaluation

After selecting your final model, which could be a compromise of performance, interpretability and complexity, you should evaluate its performance on the test set. 
You might have tuned your model using a certain metric, but now you should describe the model performance using all relevant metrics. 
If you have some business insight into why a certain metric is relevant, you should explain it. 
Construct a suitable baseline to benchmark your results and put them in context.
Discuss your results: do they seem good enough to be used in practice? If not, what should be improved? Discuss what types of errors your model is making.
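
A baseline comparison could look like the sketch below. The data is synthetic and the models are placeholders (a mean-predicting `DummyRegressor` against a linear regression); for a classification task, `DummyClassifier` would play the same role.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic placeholder data; the last 20% of rows act as the test set.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300)
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

# Baseline that always predicts the training mean, and the model to benchmark.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

# Report several metrics for both, on the same test set.
scores = {}
for name, estimator in [("baseline", baseline), ("model", model)]:
    pred = estimator.predict(X_test)
    scores[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
print(scores)

# Indices of the five largest absolute errors, as a starting point for error analysis.
residuals = y_test - model.predict(X_test)
worst = np.argsort(-np.abs(residuals))[:5]
print(worst, residuals[worst])
```

A model that does not clearly beat such a baseline on the test set is unlikely to be usable in practice, whatever its absolute metric values are.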


## Model interpretation

Use at least two different techniques for model interpretability. Discuss which features are the most important in your model, and how they impact the model performance. Pick a few examples of errors that your model is making, and check which features lead to these errors.
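
As an illustration of two such techniques, the sketch below compares impurity-based importances of a random forest with permutation importance on held-out data. The data is synthetic (feature 0 dominates by construction, feature 2 is pure noise) and the model choice is an assumption; other techniques may suit a given project better.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic placeholder data: feature 0 matters most, feature 2 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Technique 1: impurity-based importances, computed from the training process.
impurity = model.feature_importances_

# Technique 2: permutation importance, the drop in test score when a feature is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for i in range(X.shape[1]):
    print(f"feature {i}: impurity={impurity[i]:.3f}, permutation={perm.importances_mean[i]:.3f}")
```

If the two techniques disagree on the ranking, that disagreement is itself worth discussing.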

In [182]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_colwidth', None)

In [183]:
df = pd.read_csv('pr13_stocks (1).csv', index_col=0)
df.head()

Unnamed: 0,Date,Dividends,Stock Splits,Brand_Name,Ticker,Industry_Tag,Country,Volume,Open,High,Low,Close
0,2021-01-25 00:00:00-05:00,0.0,0.0,crocs,CROX,footwear,usa,1102500.0,73.18,74.75,71.050003,73.910004
1,2019-09-12 00:00:00-04:00,0.0,0.0,target,TGT,retail,usa,3185700.0,100.810594,101.077753,99.999911,100.359192
2,2015-12-29 00:00:00-05:00,0.0,0.0,unilever,UL,consumer goods,netherlands,1278700.0,33.785507,33.970634,33.700657,33.908924
3,2014-06-13 00:00:00-04:00,0.0,0.0,amd,AMD,technology,usa,17734600.0,4.36,4.39,4.24,4.28
4,2017-10-06 00:00:00-04:00,0.0,0.0,the walt disney company,DIS,entertainment,usa,4360200.0,96.426167,96.734885,95.837672,96.541939


In [184]:
df['Date'] = pd.to_datetime(df['Date'], utc=True)

df

Unnamed: 0,Date,Dividends,Stock Splits,Brand_Name,Ticker,Industry_Tag,Country,Volume,Open,High,Low,Close
0,2021-01-25 05:00:00+00:00,0.0,0.0,crocs,CROX,footwear,usa,1102500.0,73.180000,74.750000,71.050003,73.910004
1,2019-09-12 04:00:00+00:00,0.0,0.0,target,TGT,retail,usa,3185700.0,100.810594,101.077753,99.999911,100.359192
2,2015-12-29 05:00:00+00:00,0.0,0.0,unilever,UL,consumer goods,netherlands,1278700.0,33.785507,33.970634,33.700657,33.908924
3,2014-06-13 04:00:00+00:00,0.0,0.0,amd,AMD,technology,usa,17734600.0,4.360000,4.390000,4.240000,4.280000
4,2017-10-06 04:00:00+00:00,0.0,0.0,the walt disney company,DIS,entertainment,usa,4360200.0,96.426167,96.734885,95.837672,96.541939
...,...,...,...,...,...,...,...,...,...,...,...,...
99995,2004-06-18 04:00:00+00:00,0.0,0.0,fedex,FDX,logistics,usa,957800.0,66.332358,66.936849,65.991798,66.604805
99996,2022-06-24 04:00:00+00:00,0.0,0.0,hershey company,HSY,food & beverage,usa,1154300.0,213.073770,216.005978,212.311397,215.966888
99997,2017-09-06 04:00:00+00:00,0.0,0.0,unilever,UL,consumer goods,netherlands,1079900.0,47.701035,47.904677,47.546265,47.880238
99998,2021-06-04 04:00:00+00:00,0.0,0.0,amazon,AMZN,e-commerce,usa,44994000.0,160.600006,161.050003,159.940506,160.311005


In [185]:
df.isna().sum()

Date             16
Dividends        59
Stock Splits    458
Brand_Name      225
Ticker          175
Industry_Tag     34
Country         100
Volume          852
Open            803
High            664
Low             681
Close           194
dtype: int64

In [186]:
#drop the 16 rows without any date data
df = df.dropna(subset=['Date'])

In [187]:
df

Unnamed: 0,Date,Dividends,Stock Splits,Brand_Name,Ticker,Industry_Tag,Country,Volume,Open,High,Low,Close
0,2021-01-25 05:00:00+00:00,0.0,0.0,crocs,CROX,footwear,usa,1102500.0,73.180000,74.750000,71.050003,73.910004
1,2019-09-12 04:00:00+00:00,0.0,0.0,target,TGT,retail,usa,3185700.0,100.810594,101.077753,99.999911,100.359192
2,2015-12-29 05:00:00+00:00,0.0,0.0,unilever,UL,consumer goods,netherlands,1278700.0,33.785507,33.970634,33.700657,33.908924
3,2014-06-13 04:00:00+00:00,0.0,0.0,amd,AMD,technology,usa,17734600.0,4.360000,4.390000,4.240000,4.280000
4,2017-10-06 04:00:00+00:00,0.0,0.0,the walt disney company,DIS,entertainment,usa,4360200.0,96.426167,96.734885,95.837672,96.541939
...,...,...,...,...,...,...,...,...,...,...,...,...
99995,2004-06-18 04:00:00+00:00,0.0,0.0,fedex,FDX,logistics,usa,957800.0,66.332358,66.936849,65.991798,66.604805
99996,2022-06-24 04:00:00+00:00,0.0,0.0,hershey company,HSY,food & beverage,usa,1154300.0,213.073770,216.005978,212.311397,215.966888
99997,2017-09-06 04:00:00+00:00,0.0,0.0,unilever,UL,consumer goods,netherlands,1079900.0,47.701035,47.904677,47.546265,47.880238
99998,2021-06-04 04:00:00+00:00,0.0,0.0,amazon,AMZN,e-commerce,usa,44994000.0,160.600006,161.050003,159.940506,160.311005


In [188]:
df[df['Brand_Name'].isna() & df['Ticker'].isna()]

Unnamed: 0,Date,Dividends,Stock Splits,Brand_Name,Ticker,Industry_Tag,Country,Volume,Open,High,Low,Close
29955,2000-11-16 05:00:00+00:00,0.0,0.0,,,apparel,usa,17245350.0,4.804951,5.124243,4.804951,5.038579


In [189]:
#drop the row without both Brand_name and Ticker as we cannot say which company this entry refers to
df = df[~(df['Brand_Name'].isna() & df['Ticker'].isna())].copy()

In [190]:
df

Unnamed: 0,Date,Dividends,Stock Splits,Brand_Name,Ticker,Industry_Tag,Country,Volume,Open,High,Low,Close
0,2021-01-25 05:00:00+00:00,0.0,0.0,crocs,CROX,footwear,usa,1102500.0,73.180000,74.750000,71.050003,73.910004
1,2019-09-12 04:00:00+00:00,0.0,0.0,target,TGT,retail,usa,3185700.0,100.810594,101.077753,99.999911,100.359192
2,2015-12-29 05:00:00+00:00,0.0,0.0,unilever,UL,consumer goods,netherlands,1278700.0,33.785507,33.970634,33.700657,33.908924
3,2014-06-13 04:00:00+00:00,0.0,0.0,amd,AMD,technology,usa,17734600.0,4.360000,4.390000,4.240000,4.280000
4,2017-10-06 04:00:00+00:00,0.0,0.0,the walt disney company,DIS,entertainment,usa,4360200.0,96.426167,96.734885,95.837672,96.541939
...,...,...,...,...,...,...,...,...,...,...,...,...
99995,2004-06-18 04:00:00+00:00,0.0,0.0,fedex,FDX,logistics,usa,957800.0,66.332358,66.936849,65.991798,66.604805
99996,2022-06-24 04:00:00+00:00,0.0,0.0,hershey company,HSY,food & beverage,usa,1154300.0,213.073770,216.005978,212.311397,215.966888
99997,2017-09-06 04:00:00+00:00,0.0,0.0,unilever,UL,consumer goods,netherlands,1079900.0,47.701035,47.904677,47.546265,47.880238
99998,2021-06-04 04:00:00+00:00,0.0,0.0,amazon,AMZN,e-commerce,usa,44994000.0,160.600006,161.050003,159.940506,160.311005


In [191]:
df['Brand_Name'] = df['Brand_Name'].str.lower()
df['Ticker'] = df['Ticker'].str.upper()

brand_to_ticker = df.dropna(subset=['Brand_Name', 'Ticker'])\
                        .drop_duplicates(subset=['Brand_Name'])\
                        .set_index('Brand_Name')['Ticker'].to_dict()

ticker_to_brand = df.dropna(subset=['Brand_Name', 'Ticker'])\
                        .drop_duplicates(subset=['Ticker'])\
                        .set_index('Ticker')['Brand_Name'].to_dict()

# Fill missing Ticker using Brand_Name
missing_ticker = df['Ticker'].isna() & df['Brand_Name'].notna()
df.loc[missing_ticker, 'Ticker'] = df.loc[missing_ticker, 'Brand_Name'].map(brand_to_ticker)

# Fill missing Brand_Name using Ticker
missing_brand = df['Brand_Name'].isna() & df['Ticker'].notna()
df.loc[missing_brand, 'Brand_Name'] = df.loc[missing_brand, 'Ticker'].map(ticker_to_brand)

In [192]:
df.isna().sum() 

Date              0
Dividends        59
Stock Splits    458
Brand_Name        0
Ticker            0
Industry_Tag     34
Country         100
Volume          852
Open            803
High            664
Low             681
Close           194
dtype: int64

In [193]:
#As dividend payments and stock splits are sparse, event-based features, missing values here most likely mean that
#no such event has occurred. Therefore, missing values are filled with 0. Also, the occurrence of missing values
#is relatively low here (59 and 458, respectively) compared to the overall dataset size (~100,000).
df['Dividends'] = df['Dividends'].fillna(0.0)
df['Stock Splits'] = df['Stock Splits'].fillna(0.0)

In [194]:
df.isna().sum()

Date              0
Dividends         0
Stock Splits      0
Brand_Name        0
Ticker            0
Industry_Tag     34
Country         100
Volume          852
Open            803
High            664
Low             681
Close           194
dtype: int64

In [195]:
df['Industry_Tag'].value_counts()

Industry_Tag
technology            21573
retail                 8600
automotive             8047
finance                7146
apparel                6708
consumer goods         6378
food & beverage        6371
entertainment          4095
food                   3836
footwear               3748
aviation               3583
gaming                 3575
hospitality            3310
e-commerce             2797
healthcare             2153
logistics              2106
manufacturing          2105
luxury goods           1597
financial services      726
music                   483
social media            396
fitness                 376
cryptocurrency          220
techcology                1
fooT                      1
technRlogy                1
tecanology                1
tecbnology                1
financial Tervices        1
fgnance                   1
retasl                    1
e-commVrce                1
healthcarE                1
axparel                   1
ghming                    1
tRchnol

In [196]:
#Imputing industry tag values with the most frequent value for each brand.
#Every row gets its brand's most frequent tag, which fills missing tags and
#also overwrites misspelled ones (e.g. 'techcology').

df['Industry_Tag'] = df['Industry_Tag'].str.strip().str.lower()
industry_map = (
    df.dropna(subset=['Industry_Tag'])
      .groupby('Brand_Name')['Industry_Tag']
      .agg(lambda x: x.mode()[0])
      .to_dict()
)

#Same result as the row-wise rule "missing or different from the mode -> use the mode"
df['Industry_Tag'] = df['Brand_Name'].map(industry_map)

In [197]:
df['Industry_Tag'].value_counts()

Industry_Tag
technology            21587
retail                 8606
automotive             8054
finance                7150
apparel                6710
consumer goods         6381
food & beverage        6372
entertainment          4096
food                   3838
footwear               3748
aviation               3584
gaming                 3577
hospitality            3310
e-commerce             2803
healthcare             2155
logistics              2106
manufacturing          2106
luxury goods           1597
financial services      728
music                   483
social media            396
fitness                 376
cryptocurrency          220
Name: count, dtype: int64

In [198]:
#Same approach for Country: every row gets its brand's most frequent country,
#which fills missing values and overwrites deviating ones.

df['Country'] = df['Country'].str.strip()

country_map = (
    df.dropna(subset=['Country'])
      .groupby('Brand_Name')['Country']
      .agg(lambda x: x.mode()[0])
      .to_dict()
)

#Same result as the row-wise rule "missing or different from the mode -> use the mode"
df['Country'] = df['Brand_Name'].map(country_map)

In [199]:
print(df['Country'].value_counts())
print(df.isna().sum())
print(f"Number of duplicates: {df.duplicated().sum()}")

Country
usa            78572
japan           6413
germany         5082
netherlands     4263
france          2823
switzerland     2132
canada           698
Name: count, dtype: int64
Date              0
Dividends         0
Stock Splits      0
Brand_Name        0
Ticker            0
Industry_Tag      0
Country           0
Volume          852
Open            803
High            664
Low             681
Close           194
dtype: int64
Number of duplicates: 0


In [200]:
print(df.shape)
df = df.drop(columns=['Open', 'High', 'Low','Brand_Name'])
df = df.dropna(subset=['Close'])
print(df.shape)

(99983, 12)
(99789, 8)


- Drop `Open`, `High` and `Low`: these are same-day values that are not lagged, so keeping them as features would cause data leakage.
- Drop `Brand_Name`: it contains the same information as `Ticker`.
- Drop the 194 rows with a missing `Close`: they are only 194 of ~100k rows and are spread across dates/companies/strategies.
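
An alternative to dropping the same-day columns entirely would be to keep only their lagged versions. The sketch below illustrates the idea on a made-up toy frame that reuses the notebook's column names; `High_Lag_1` is a hypothetical feature name, and this is not what the notebook itself does.

```python
import pandas as pd

# Toy frame with the same column names as the notebook's data; values are made up.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05", "2021-01-06"]
    ),
    "Ticker": ["AAA", "BBB", "AAA", "BBB", "AAA"],
    "High": [10.0, 20.0, 11.0, 21.0, 12.0],
    "Close": [9.5, 19.0, 10.5, 20.5, 11.5],
})

toy = toy.sort_values(["Ticker", "Date"]).reset_index(drop=True)

# Previous row's High of the same ticker: already known before the current day.
toy["High_Lag_1"] = toy.groupby("Ticker")["High"].shift(1)

# The same-day High is dropped so that it cannot leak into the features.
toy = toy.drop(columns=["High"])

print(toy)
```

The first row of each ticker has no previous value, so its lagged feature is missing.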

In [201]:
df['Date'] = pd.to_datetime(df['Date'])  # TODO: decide what to do with the hour component of the timestamps

# Time-based features
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['DayOfWeek'] = df['Date'].dt.dayofweek  # 0 = Monday, 6 = Sunday

# Running day number
df['Day_Number'] = (df['Date'] - df['Date'].min()).dt.days
df = df.sort_values(by='Date').reset_index(drop=True)
df

Unnamed: 0,Date,Dividends,Stock Splits,Ticker,Industry_Tag,Country,Volume,Close,Year,Month,Day,DayOfWeek,Day_Number
0,2000-01-03 05:00:00+00:00,0.0,0.0,TM,automotive,japan,21200.0,72.034958,2000,1,3,0,0
1,2000-01-03 05:00:00+00:00,0.0,0.0,KO,food & beverage,usa,10997000.0,14.781797,2000,1,3,0,0
2,2000-01-03 05:00:00+00:00,0.0,0.0,CSCO,technology,usa,53076000.0,37.496925,2000,1,3,0,0
3,2000-01-03 05:00:00+00:00,0.0,0.0,FL,footwear,usa,240000.0,3.957437,2000,1,3,0,0
4,2000-01-03 05:00:00+00:00,0.0,0.0,HD,retail,usa,12030800.0,40.083256,2000,1,3,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
99784,2023-09-20 04:00:00+00:00,0.0,0.0,TSLA,automotive,usa,122514600.0,262.589996,2023,9,20,2,8660
99785,2023-09-20 04:00:00+00:00,0.0,0.0,PMMAF,apparel,germany,0.0,63.369999,2023,9,20,2,8660
99786,2023-09-20 04:00:00+00:00,0.0,0.0,NVDA,technology,usa,36710800.0,422.390015,2023,9,20,2,8660
99787,2023-09-20 04:00:00+00:00,0.0,0.0,LOGI,technology,switzerland,473100.0,71.800003,2023,9,20,2,8660


In [202]:
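def add_lags(df, lags):
    """Add per-ticker lagged Close/Volume and the day gap to each lagged row.

    Assumes df is already sorted by Date; modifies df in place and returns it.
    """
    for lag in lags:
        df[f'Close_Lag_{lag}'] = df.groupby('Ticker')['Close'].shift(lag)
        df[f'Volume_Lag_{lag}'] = df.groupby('Ticker')['Volume'].shift(lag)
        df[f'Days_Since_Lag_{lag}'] = df.groupby('Ticker')['Date'].diff(lag).dt.days
    return df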

In [203]:
def add_weighted_mean_std(df, lookback_periods):
    df = df.sort_values(['Ticker', 'Date']).copy()

    for lookback in lookback_periods:
        weighted_means = []
        weighted_stds = []

        for ticker, group in df.groupby('Ticker'):
            group = group.sort_values('Date')
            closes = group['Close'].reset_index(drop=True)
            dates = group['Date'].reset_index(drop=True)

            n = len(group)
            mean_vals = np.full(n, np.nan)
            std_vals = np.full(n, np.nan)

            for idx in range(n):
                if idx < lookback:
                    continue  # can't calculate for first points

                close_lags = closes.iloc[idx-lookback:idx]
                date_lags = dates.iloc[idx-lookback:idx]
                current_date = dates.iloc[idx]

                days_diff = (current_date - date_lags).dt.days

                if (days_diff <= 0).any():
                    continue  # a zero-day gap would give an infinite weight (and a NaN mean)

                weight = 1 / days_diff

                if pd.isna(close_lags).any():
                    continue  # Skip iteration or handle NaNs in the appropriate way

                weighted_mean = (weight * close_lags).sum() / weight.sum()
                mean_vals[idx] = weighted_mean

                weighted_var = (weight * (close_lags - weighted_mean)**2).sum() / weight.sum()
                weighted_std = np.sqrt(weighted_var)
                std_vals[idx] = weighted_std

            weighted_means.append(pd.Series(mean_vals, index=group.index))
            weighted_stds.append(pd.Series(std_vals, index=group.index))

        df[f'Weighted_Mean_{lookback}'] = pd.concat(weighted_means)
        df[f'Weighted_Std_{lookback}'] = pd.concat(weighted_stds)

    return df

In [204]:
def add_simple_mean_std(df, lookback_periods):
    df = df.sort_values(['Ticker', 'Date']).copy()

    for lookback in lookback_periods:
        means = []
        stds = []

        for ticker, group in df.groupby('Ticker'):
            group = group.sort_values('Date')
            closes = group['Close_Lag_1'].reset_index(drop=True)

            mean_vals = closes.rolling(window=lookback, min_periods=lookback).mean()
            std_vals = closes.rolling(window=lookback, min_periods=lookback).std()

            means.append(pd.Series(mean_vals.values, index=group.index))
            stds.append(pd.Series(std_vals.values, index=group.index))

        df[f'Simple_Mean_{lookback}'] = pd.concat(means)
        df[f'Simple_Std_{lookback}'] = pd.concat(stds)

    return df
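
The per-ticker loop above can probably be written more compactly with `groupby` and `transform`. The function below is a hypothetical variant, not the notebook's implementation; it assumes the same `Ticker`, `Date` and `Close_Lag_1` columns and is checked here only on a small made-up frame.

```python
import pandas as pd


def add_simple_mean_std_vectorized(df, lookback_periods, col="Close_Lag_1"):
    """Hypothetical variant of add_simple_mean_std using groupby + rolling."""
    df = df.sort_values(["Ticker", "Date"]).copy()
    grouped = df.groupby("Ticker")[col]

    for lookback in lookback_periods:
        # transform keeps the original index, so results align row by row.
        df[f"Simple_Mean_{lookback}"] = grouped.transform(
            lambda s: s.rolling(window=lookback, min_periods=lookback).mean()
        )
        df[f"Simple_Std_{lookback}"] = grouped.transform(
            lambda s: s.rolling(window=lookback, min_periods=lookback).std()
        )
    return df


# Made-up toy data with two tickers.
toy = pd.DataFrame({
    "Ticker": ["A", "A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime([
        "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07",
        "2021-01-04", "2021-01-05", "2021-01-06",
    ]),
    "Close_Lag_1": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0],
})

out = add_simple_mean_std_vectorized(toy, [2])
print(out)
```

Because the rolling window is applied within each ticker, the first `lookback - 1` rows of every ticker stay missing, as in the loop-based version.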


In [205]:
df = add_lags(df, [1,2,3])

df = add_weighted_mean_std(df,[3,50, 100])

df = add_simple_mean_std(df,[3,50, 100])

df




Unnamed: 0,Date,Dividends,Stock Splits,Ticker,Industry_Tag,Country,Volume,Close,Year,Month,Day,DayOfWeek,Day_Number,Close_Lag_1,Volume_Lag_1,Days_Since_Lag_1,Close_Lag_2,Volume_Lag_2,Days_Since_Lag_2,Close_Lag_3,Volume_Lag_3,Days_Since_Lag_3,Weighted_Mean_3,Weighted_Std_3,Weighted_Mean_50,Weighted_Std_50,Weighted_Mean_100,Weighted_Std_100,Simple_Mean_3,Simple_Std_3,Simple_Mean_50,Simple_Std_50,Simple_Mean_100,Simple_Std_100
8,2000-01-03 05:00:00+00:00,0.0,0.0,AAPL,technology,usa,535796800.0,0.848323,2000,1,3,0,0,,,,,,,,,,,,,,,,,,,,,
22,2000-01-04 05:00:00+00:00,0.0,0.0,AAPL,technology,usa,512377600.0,0.776801,2000,1,4,1,1,0.848323,535796800.0,1.0,,,,,,,,,,,,,,,,,,
25,2000-01-05 05:00:00+00:00,0.0,0.0,AAPL,technology,usa,778321600.0,0.788168,2000,1,5,2,2,0.776801,512377600.0,1.0,0.848323,535796800.0,2.0,,,,,,,,,,,,,,,
50,2000-01-07 05:00:00+00:00,0.0,0.0,AAPL,technology,usa,460734400.0,0.754065,2000,1,7,4,4,0.788168,778321600.0,2.0,0.776801,512377600.0,3.0,0.848323,535796800.0,4.0,0.798552,0.027695,,,,,0.804431,0.038434,,,,
71,2000-01-11 05:00:00+00:00,0.0,0.0,AAPL,technology,usa,441548800.0,0.702910,2000,1,11,1,8,0.754065,460734400.0,4.0,0.788168,778321600.0,6.0,0.776801,512377600.0,7.0,0.770028,0.014953,,,,,0.773011,0.017364,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99405,2023-08-28 04:00:00+00:00,0.0,0.0,ZM,technology,usa,1841300.0,67.570000,2023,8,28,0,8637,67.699997,2454200.0,3.0,65.830002,13083400.0,6.0,65.360001,2315600.0,11.0,66.812563,1.020372,67.895422,2.855357,69.997881,8.683605,66.296666,1.237832,69.4347,4.525669,80.42825,16.853808
99479,2023-09-01 04:00:00+00:00,0.0,0.0,ZM,technology,usa,4024300.0,71.720001,2023,9,1,4,8641,67.570000,1841300.0,4.0,67.699997,2454200.0,7.0,65.830002,13083400.0,10.0,67.254637,0.720929,68.009222,2.838600,70.197776,8.793639,67.033333,1.044140,69.3887,4.532838,80.03395,16.686280
99611,2023-09-08 04:00:00+00:00,0.0,0.0,ZM,technology,usa,3885348.0,72.059998,2023,9,8,4,8648,71.720001,4024300.0,7.0,67.570000,1841300.0,11.0,67.699997,2454200.0,14.0,69.542978,2.042771,68.784276,3.225493,71.247199,9.290619,68.996666,2.359373,69.4037,4.539413,79.67665,16.474492
99696,2023-09-15 04:00:00+00:00,0.0,0.0,ZM,technology,usa,6238300.0,71.110001,2023,9,15,4,8655,72.059998,3885348.0,7.0,71.720001,4024300.0,14.0,67.570000,1841300.0,18.0,71.045587,1.775437,69.282407,3.337904,71.727347,9.221328,70.449999,2.499940,69.4563,4.554935,79.24265,16.089462


In [None]:
pd.set_option('display.max_columns', None)
df[df['Ticker'] == 'FL'][:20]

Unnamed: 0,Date,Dividends,Stock Splits,Ticker,Industry_Tag,Country,Volume,Close,Year,Month,Day,DayOfWeek,Day_Number,Close_Lag_1,Volume_Lag_1,Days_Since_Lag_1,Close_Lag_2,Volume_Lag_2,Days_Since_Lag_2,Close_Lag_3,Volume_Lag_3,Days_Since_Lag_3,Weighted_Mean_3,Weighted_Std_3,Weighted_Mean_50,Weighted_Std_50,Weighted_Mean_100,Weighted_Std_100,Simple_Mean_3,Simple_Std_3,Simple_Mean_50,Simple_Std_50,Simple_Mean_100,Simple_Std_100
3172,2001-01-25 05:00:00+00:00,0.0,0.0,FL,footwear,usa,1500800.0,6.572171,2001,1,25,3,388,6.607511,1629400.0,1.0,5.900821,2222000.0,3.0,6.077493,3219300.0,6.0,6.391578,0.309138,6.927219,0.810857,,,6.195275,0.367773,7.574959,0.927493,,
3274,2001-02-07 05:00:00+00:00,0.0,0.0,FL,footwear,usa,1760100.0,7.298642,2001,2,7,2,401,6.572171,1500800.0,13.0,6.607511,1629400.0,14.0,5.900821,2222000.0,16.0,6.385143,0.31471,7.344043,0.901807,,,6.360168,0.398198,7.589093,0.906048,,
3334,2001-02-14 05:00:00+00:00,0.0,0.0,FL,footwear,usa,508000.0,6.891595,2001,2,14,2,408,7.298642,1760100.0,7.0,6.572171,1500800.0,20.0,6.607511,1629400.0,21.0,7.010736,0.348467,7.399514,0.817329,,,6.826108,0.409608,7.612809,0.881813,,
3337,2001-02-15 05:00:00+00:00,0.0,0.0,FL,footwear,usa,714800.0,6.67111,2001,2,15,3,409,6.891595,508000.0,1.0,7.298642,1760100.0,8.0,6.572171,1500800.0,21.0,6.922014,0.1445,7.124765,0.604909,,,6.920803,0.364115,7.634038,0.850232,,
3367,2001-02-20 05:00:00+00:00,0.0,0.0,FL,footwear,usa,237500.0,6.535422,2001,2,20,1,414,6.67111,714800.0,5.0,6.891595,508000.0,6.0,7.298642,1760100.0,13.0,6.862772,0.223203,7.216861,0.746867,,,6.953782,0.318354,7.64803,0.827773,,
3385,2001-02-21 05:00:00+00:00,0.0,0.0,FL,footwear,usa,468700.0,6.388434,2001,2,21,2,415,6.535422,237500.0,1.0,6.67111,714800.0,6.0,6.891595,508000.0,7.0,6.591547,0.11416,6.885681,0.639414,,,6.699376,0.179761,7.657895,0.811014,,
3413,2001-02-23 05:00:00+00:00,0.0,0.0,FL,footwear,usa,500800.0,6.218827,2001,2,23,4,417,6.388434,468700.0,2.0,6.535422,237500.0,3.0,6.67111,714800.0,8.0,6.476431,0.100965,6.876608,0.696505,,,6.531655,0.141376,7.659874,0.807721,,
3473,2001-03-02 05:00:00+00:00,0.0,0.0,FL,footwear,usa,1533200.0,5.992689,2001,3,2,4,424,6.218827,500800.0,7.0,6.388434,468700.0,9.0,6.535422,237500.0,10.0,6.361508,0.13034,7.030398,0.810733,,,6.380894,0.158432,7.661994,0.803707,,
3513,2001-03-07 05:00:00+00:00,0.0,0.0,FL,footwear,usa,1330400.0,6.897249,2001,3,7,2,429,5.992689,1533200.0,5.0,6.218827,500800.0,12.0,6.388434,468700.0,14.0,6.125489,0.160965,6.910187,0.875624,,,6.199983,0.198544,7.645458,0.829487,,
3551,2001-03-12 05:00:00+00:00,0.0,0.0,FL,footwear,usa,,6.902904,2001,3,12,0,434,6.897249,1330400.0,5.0,5.992689,1533200.0,10.0,6.218827,500800.0,17.0,6.533942,0.414114,7.017764,0.794449,,,6.369588,0.470748,7.630759,0.836212,,
