# FEATURE SELECTION USING RECURSIVE FEATURE ELIMINATION
AIM: Select features by recursively eliminating the least important ones, using the weights an estimator assigns to each feature.

As the name suggests, recursive feature elimination (RFE) removes features in repeated rounds. Each round relies on an estimator that assigns a weight to every feature, for instance the coefficients of a linear regression or the feature importances of a decision tree.
The process starts by training the estimator on all features. The least important features are then pruned, the estimator is retrained on the remaining features, and the least important ones are pruned again. This repeats until the desired number of features is reached.
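
The loop described above can be sketched by hand. This is only an illustration on synthetic data, not scikit-learn's implementation: the function name `manual_rfe`, the use of absolute least-squares coefficients as the importance score, and the toy data are all assumptions made for the example.

```python
import numpy as np

def manual_rfe(X, y, n_features_to_select, step=1):
    """Toy recursive feature elimination driven by least-squares coefficients."""
    remaining = list(range(X.shape[1]))  # indices of features still in play
    while len(remaining) > n_features_to_select:
        # Fit ordinary least squares (with an intercept column) on the remaining features.
        A = np.column_stack([X[:, remaining], np.ones(len(X))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        importance = np.abs(coef[:-1])  # ignore the intercept
        # Drop the `step` weakest features, but never go below the target count.
        n_drop = min(step, len(remaining) - n_features_to_select)
        drop_positions = set(np.argsort(importance)[:n_drop].tolist())
        remaining = [f for i, f in enumerate(remaining) if i not in drop_positions]
    return remaining

# Hypothetical data: only features 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.01 * rng.normal(size=200)

selected = manual_rfe(X, y, n_features_to_select=2)
print(selected)
```

Because the score is a raw coefficient, the features should be on comparable scales for the ranking to be meaningful, which is one reason the notebook normalizes the data before running RFE.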

In [1]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# RFE performs the recursive feature elimination
from sklearn.feature_selection import RFE
# LinearRegression is the estimator whose coefficients rank the features
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [2]:
# show every column when displaying a DataFrame
pd.set_option('display.max_columns', None)
# silence library warnings in the notebook output
warnings.filterwarnings('ignore')

In [3]:
df = pd.read_csv('project_data.csv')

In [4]:
df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,SP_open,SP_high,SP_low,SP_close,SP_Ajclose,SP_volume,DJ_open,DJ_high,DJ_low,DJ_close,DJ_Ajclose,DJ_volume,EG_open,EG_high,EG_low,EG_close,EG_Ajclose,EG_volume,EU_Price,EU_open,EU_high,EU_low,EU_Trend,OF_Price,OF_Open,OF_High,OF_Low,OF_Volume,OF_Trend,OS_Price,OS_Open,OS_High,OS_Low,OS_Trend,SF_Price,SF_Open,SF_High,SF_Low,SF_Volume,SF_Trend,USB_Price,USB_Open,USB_High,USB_Low,USB_Trend,PLT_Price,PLT_Open,PLT_High,PLT_Low,PLT_Trend,PLD_Price,PLD_Open,PLD_High,PLD_Low,PLD_Trend,RHO_PRICE,USDI_Price,USDI_Open,USDI_High,USDI_Low,USDI_Volume,USDI_Trend,GDX_Open,GDX_High,GDX_Low,GDX_Close,GDX_Adj Close,GDX_Volume,USO_Open,USO_High,USO_Low,USO_Close,USO_Adj Close,USO_Volume
0,2011-12-15,154.740005,154.949997,151.710007,152.330002,152.330002,21521900,123.029999,123.199997,121.989998,122.180000,105.441238,199109200,11825.29004,11967.83984,11825.21973,11868.80957,11868.80957,136930000,74.550003,76.150002,72.150002,72.900002,70.431755,787900,1.3018,1.2982,1.3051,1.2957,1,105.09,104.88,106.50,104.88,14330,1,93.42,94.91,96.00,93.33,0,53604,54248,54248,52316,119440,1,1.911,1.911,1.911,1.911,1,1414.65,1420.30,1423.35,1376.85,0,618.85,614.70,615.00,614.60,1,1425,80.341,80.565,80.630,80.130,22850,0,53.009998,53.139999,51.570000,51.680000,48.973877,20605600,36.900002,36.939999,36.049999,36.130001,36.130001,12616700
1,2011-12-16,154.309998,155.369995,153.899994,155.229996,155.229996,18124300,122.230003,122.949997,121.300003,121.589996,105.597549,220481400,11870.25000,11968.17969,11819.30957,11866.38965,11866.38965,389520000,73.599998,75.099998,73.349998,74.900002,72.364037,896600,1.3035,1.3020,1.3087,1.2997,1,103.35,103.51,104.56,102.46,140080,0,93.79,93.43,94.80,92.53,1,53458,53650,54030,52890,65390,0,1.851,1.851,1.851,1.851,0,1420.25,1414.75,1431.75,1400.70,1,623.65,622.60,623.45,622.30,1,1400,80.249,80.175,80.395,79.935,13150,0,52.500000,53.180000,52.040001,52.680000,49.921513,16285400,36.180000,36.500000,35.730000,36.270000,36.270000,12578800
2,2011-12-19,155.479996,155.860001,154.360001,154.869995,154.869995,12547200,122.059998,122.320000,120.029999,120.290001,104.468536,183903000,11866.54004,11925.87988,11735.19043,11766.25977,11766.25977,135170000,69.099998,69.800003,64.199997,64.699997,62.509384,2096700,1.2995,1.3043,1.3044,1.2981,0,103.64,103.63,104.57,102.37,147880,1,94.09,93.77,94.43,92.55,1,52961,53400,53400,52544,67280,0,1.810,1.810,1.810,1.810,0,1411.10,1422.65,1427.60,1404.60,0,608.80,626.00,630.00,608.60,0,1400,80.207,80.300,80.470,80.125,970,0,52.490002,52.549999,51.029999,51.169998,48.490578,15120200,36.389999,36.450001,35.930000,36.200001,36.200001,7418200
3,2011-12-20,156.820007,157.429993,156.580002,156.979996,156.979996,9136300,122.180000,124.139999,120.370003,123.930000,107.629784,225418100,11769.20996,12117.12988,11768.83008,12103.58008,12103.58008,165180000,66.449997,68.099998,66.000000,67.000000,64.731514,875300,1.3079,1.3003,1.3133,1.2994,1,106.73,104.30,107.27,103.91,170240,1,95.55,96.39,99.70,96.39,1,53487,52795,53575,52595,55130,1,1.927,1.927,1.927,1.927,1,1434.75,1408.95,1436.55,1408.15,1,626.65,622.45,622.45,622.45,1,1400,80.273,80.890,80.940,80.035,22950,1,52.380001,53.250000,52.369999,52.990002,50.215282,11644900,37.299999,37.610001,37.220001,37.560001,37.560001,10041600
4,2011-12-21,156.979996,157.529999,156.130005,157.160004,157.160004,11996100,123.930000,124.360001,122.750000,124.169998,107.838242,194230900,12103.58008,12119.70020,11999.44043,12107.74023,12107.74023,163250000,67.099998,69.400002,66.900002,68.500000,66.180725,837600,1.3045,1.3079,1.3197,1.3024,0,107.71,107.15,108.17,106.16,145090,1,99.01,97.54,99.26,96.81,1,53148,53519,54184,52937,75950,0,1.970,1.970,1.970,1.970,1,1429.05,1434.40,1453.75,1417.65,0,635.90,625.70,641.50,623.80,1,1400,80.350,80.105,80.445,79.550,24140,1,53.150002,53.430000,52.419998,52.959999,50.186852,8724300,37.669998,38.240002,37.520000,38.110001,38.110001,10728000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1713,2018-12-24,119.570000,120.139999,119.570000,120.019997,120.019997,9736400,239.039993,240.839996,234.270004,234.339996,234.339996,147311600,22317.27930,22339.86914,21792.19922,21792.19922,21792.19922,308420000,2.950000,3.050000,2.900000,2.950000,2.950000,655100,1.1400,1.1370,1.1439,1.1350,1,50.47,53.49,54.66,50.36,76220,0,42.27,45.34,45.95,42.22,0,37541,37325,37600,37305,9460,1,2.736,2.826,2.826,2.733,0,788.40,791.85,798.30,788.30,0,1172.80,1163.40,1177.60,1157.90,1,2480,96.007,96.440,96.440,95.870,13930,0,20.700001,21.110001,20.650000,21.090000,21.090000,60507000,9.490000,9.520000,9.280000,9.290000,9.290000,21598200
1714,2018-12-26,120.620003,121.000000,119.570000,119.660004,119.660004,14293500,235.970001,246.179993,233.759995,246.179993,246.179993,218485400,21857.73047,22878.91992,21712.52930,22878.44922,22878.44922,433080000,3.000000,3.050000,2.900000,3.000000,3.000000,746300,1.1353,1.1363,1.1423,1.1342,0,54.47,50.84,55.29,49.93,77000,1,46.39,43.09,46.78,42.34,1,38253,37607,38489,37574,19410,1,2.810,2.751,2.815,2.720,1,799.25,788.75,804.30,788.75,1,1190.10,1176.00,1191.00,1175.50,1,2480,96.568,96.100,96.650,96.020,15660,1,21.350000,21.400000,20.530001,20.620001,20.620001,76365200,9.250000,9.920000,9.230000,9.900000,9.900000,40978800
1715,2018-12-27,120.570000,120.900002,120.139999,120.570000,120.570000,11874400,242.570007,248.289993,238.960007,248.070007,248.070007,186267300,22629.06055,23138.89063,22267.41992,23138.82031,23138.82031,407940000,2.950000,3.000000,2.900000,2.950000,2.950000,744000,1.1430,1.1353,1.1457,1.1349,1,52.16,54.65,54.67,51.94,102590,0,45.23,46.41,46.41,44.20,0,38690,38274,38783,38081,19650,1,2.774,2.803,2.806,2.733,0,795.50,799.40,802.20,785.10,0,1196.00,1190.05,1198.40,1181.50,1,2470,96.001,96.460,96.495,95.935,20520,0,20.840000,21.000000,20.700001,20.969999,20.969999,52393000,9.590000,9.650000,9.370000,9.620000,9.620000,36578700
1716,2018-12-28,120.800003,121.080002,120.720001,121.059998,121.059998,6864700,249.580002,251.399994,246.449997,247.750000,247.750000,153100200,23213.60938,23381.88086,22981.33008,23062.40039,23062.40039,336510000,2.850000,2.950000,2.850000,2.900000,2.900000,1061100,1.1438,1.1429,1.1478,1.1424,1,52.20,53.44,53.80,51.60,17110,1,44.92,45.23,46.02,44.27,0,38706,38749,38880,38587,14200,1,2.716,2.768,2.781,2.713,0,790.25,795.60,800.50,787.75,0,1185.20,1196.00,1204.80,1179.50,0,2460,95.965,96.020,96.050,95.740,14170,0,20.889999,21.020000,20.570000,20.600000,20.600000,49835000,9.540000,9.650000,9.380000,9.530000,9.530000,22803400


In [5]:
df.shape

(1718, 81)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1718 entries, 0 to 1717
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Date           1718 non-null   object 
 1   Open           1718 non-null   float64
 2   High           1718 non-null   float64
 3   Low            1718 non-null   float64
 4   Close          1718 non-null   float64
 5   Adj Close      1718 non-null   float64
 6   Volume         1718 non-null   int64  
 7   SP_open        1718 non-null   float64
 8   SP_high        1718 non-null   float64
 9   SP_low         1718 non-null   float64
 10  SP_close       1718 non-null   float64
 11  SP_Ajclose     1718 non-null   float64
 12  SP_volume      1718 non-null   int64  
 13  DJ_open        1718 non-null   float64
 14  DJ_high        1718 non-null   float64
 15  DJ_low         1718 non-null   float64
 16  DJ_close       1718 non-null   float64
 17  DJ_Ajclose     1718 non-null   float64
 18  DJ_volum

Since df.info() reports no null values, we can proceed directly to feature selection.
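
With 81 columns, it is easy to miss a gap in the info() listing, so the absence of nulls can also be checked programmatically. A minimal sketch on a hypothetical toy frame (the frame `toy` and its columns are invented for illustration; on the real data the same calls would be applied to `df`):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, standing in for the real dataset.
toy = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4, 5, 6]})

nulls_per_column = toy.isnull().sum()          # null count for each column
total_nulls = int(nulls_per_column.sum())      # 0 would mean the frame is complete
columns_with_nulls = nulls_per_column[nulls_per_column > 0].index.tolist()

print(total_nulls, columns_with_nulls)
```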

In [7]:
# Dropping the Date column: it is a non-numeric (object) column and cannot be scaled
df.drop(labels = ['Date'], axis = 1, inplace = True)

In [8]:
df

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,SP_open,SP_high,SP_low,SP_close,SP_Ajclose,SP_volume,DJ_open,DJ_high,DJ_low,DJ_close,DJ_Ajclose,DJ_volume,EG_open,EG_high,EG_low,EG_close,EG_Ajclose,EG_volume,EU_Price,EU_open,EU_high,EU_low,EU_Trend,OF_Price,OF_Open,OF_High,OF_Low,OF_Volume,OF_Trend,OS_Price,OS_Open,OS_High,OS_Low,OS_Trend,SF_Price,SF_Open,SF_High,SF_Low,SF_Volume,SF_Trend,USB_Price,USB_Open,USB_High,USB_Low,USB_Trend,PLT_Price,PLT_Open,PLT_High,PLT_Low,PLT_Trend,PLD_Price,PLD_Open,PLD_High,PLD_Low,PLD_Trend,RHO_PRICE,USDI_Price,USDI_Open,USDI_High,USDI_Low,USDI_Volume,USDI_Trend,GDX_Open,GDX_High,GDX_Low,GDX_Close,GDX_Adj Close,GDX_Volume,USO_Open,USO_High,USO_Low,USO_Close,USO_Adj Close,USO_Volume
0,154.740005,154.949997,151.710007,152.330002,152.330002,21521900,123.029999,123.199997,121.989998,122.180000,105.441238,199109200,11825.29004,11967.83984,11825.21973,11868.80957,11868.80957,136930000,74.550003,76.150002,72.150002,72.900002,70.431755,787900,1.3018,1.2982,1.3051,1.2957,1,105.09,104.88,106.50,104.88,14330,1,93.42,94.91,96.00,93.33,0,53604,54248,54248,52316,119440,1,1.911,1.911,1.911,1.911,1,1414.65,1420.30,1423.35,1376.85,0,618.85,614.70,615.00,614.60,1,1425,80.341,80.565,80.630,80.130,22850,0,53.009998,53.139999,51.570000,51.680000,48.973877,20605600,36.900002,36.939999,36.049999,36.130001,36.130001,12616700
1,154.309998,155.369995,153.899994,155.229996,155.229996,18124300,122.230003,122.949997,121.300003,121.589996,105.597549,220481400,11870.25000,11968.17969,11819.30957,11866.38965,11866.38965,389520000,73.599998,75.099998,73.349998,74.900002,72.364037,896600,1.3035,1.3020,1.3087,1.2997,1,103.35,103.51,104.56,102.46,140080,0,93.79,93.43,94.80,92.53,1,53458,53650,54030,52890,65390,0,1.851,1.851,1.851,1.851,0,1420.25,1414.75,1431.75,1400.70,1,623.65,622.60,623.45,622.30,1,1400,80.249,80.175,80.395,79.935,13150,0,52.500000,53.180000,52.040001,52.680000,49.921513,16285400,36.180000,36.500000,35.730000,36.270000,36.270000,12578800
2,155.479996,155.860001,154.360001,154.869995,154.869995,12547200,122.059998,122.320000,120.029999,120.290001,104.468536,183903000,11866.54004,11925.87988,11735.19043,11766.25977,11766.25977,135170000,69.099998,69.800003,64.199997,64.699997,62.509384,2096700,1.2995,1.3043,1.3044,1.2981,0,103.64,103.63,104.57,102.37,147880,1,94.09,93.77,94.43,92.55,1,52961,53400,53400,52544,67280,0,1.810,1.810,1.810,1.810,0,1411.10,1422.65,1427.60,1404.60,0,608.80,626.00,630.00,608.60,0,1400,80.207,80.300,80.470,80.125,970,0,52.490002,52.549999,51.029999,51.169998,48.490578,15120200,36.389999,36.450001,35.930000,36.200001,36.200001,7418200
3,156.820007,157.429993,156.580002,156.979996,156.979996,9136300,122.180000,124.139999,120.370003,123.930000,107.629784,225418100,11769.20996,12117.12988,11768.83008,12103.58008,12103.58008,165180000,66.449997,68.099998,66.000000,67.000000,64.731514,875300,1.3079,1.3003,1.3133,1.2994,1,106.73,104.30,107.27,103.91,170240,1,95.55,96.39,99.70,96.39,1,53487,52795,53575,52595,55130,1,1.927,1.927,1.927,1.927,1,1434.75,1408.95,1436.55,1408.15,1,626.65,622.45,622.45,622.45,1,1400,80.273,80.890,80.940,80.035,22950,1,52.380001,53.250000,52.369999,52.990002,50.215282,11644900,37.299999,37.610001,37.220001,37.560001,37.560001,10041600
4,156.979996,157.529999,156.130005,157.160004,157.160004,11996100,123.930000,124.360001,122.750000,124.169998,107.838242,194230900,12103.58008,12119.70020,11999.44043,12107.74023,12107.74023,163250000,67.099998,69.400002,66.900002,68.500000,66.180725,837600,1.3045,1.3079,1.3197,1.3024,0,107.71,107.15,108.17,106.16,145090,1,99.01,97.54,99.26,96.81,1,53148,53519,54184,52937,75950,0,1.970,1.970,1.970,1.970,1,1429.05,1434.40,1453.75,1417.65,0,635.90,625.70,641.50,623.80,1,1400,80.350,80.105,80.445,79.550,24140,1,53.150002,53.430000,52.419998,52.959999,50.186852,8724300,37.669998,38.240002,37.520000,38.110001,38.110001,10728000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1713,119.570000,120.139999,119.570000,120.019997,120.019997,9736400,239.039993,240.839996,234.270004,234.339996,234.339996,147311600,22317.27930,22339.86914,21792.19922,21792.19922,21792.19922,308420000,2.950000,3.050000,2.900000,2.950000,2.950000,655100,1.1400,1.1370,1.1439,1.1350,1,50.47,53.49,54.66,50.36,76220,0,42.27,45.34,45.95,42.22,0,37541,37325,37600,37305,9460,1,2.736,2.826,2.826,2.733,0,788.40,791.85,798.30,788.30,0,1172.80,1163.40,1177.60,1157.90,1,2480,96.007,96.440,96.440,95.870,13930,0,20.700001,21.110001,20.650000,21.090000,21.090000,60507000,9.490000,9.520000,9.280000,9.290000,9.290000,21598200
1714,120.620003,121.000000,119.570000,119.660004,119.660004,14293500,235.970001,246.179993,233.759995,246.179993,246.179993,218485400,21857.73047,22878.91992,21712.52930,22878.44922,22878.44922,433080000,3.000000,3.050000,2.900000,3.000000,3.000000,746300,1.1353,1.1363,1.1423,1.1342,0,54.47,50.84,55.29,49.93,77000,1,46.39,43.09,46.78,42.34,1,38253,37607,38489,37574,19410,1,2.810,2.751,2.815,2.720,1,799.25,788.75,804.30,788.75,1,1190.10,1176.00,1191.00,1175.50,1,2480,96.568,96.100,96.650,96.020,15660,1,21.350000,21.400000,20.530001,20.620001,20.620001,76365200,9.250000,9.920000,9.230000,9.900000,9.900000,40978800
1715,120.570000,120.900002,120.139999,120.570000,120.570000,11874400,242.570007,248.289993,238.960007,248.070007,248.070007,186267300,22629.06055,23138.89063,22267.41992,23138.82031,23138.82031,407940000,2.950000,3.000000,2.900000,2.950000,2.950000,744000,1.1430,1.1353,1.1457,1.1349,1,52.16,54.65,54.67,51.94,102590,0,45.23,46.41,46.41,44.20,0,38690,38274,38783,38081,19650,1,2.774,2.803,2.806,2.733,0,795.50,799.40,802.20,785.10,0,1196.00,1190.05,1198.40,1181.50,1,2470,96.001,96.460,96.495,95.935,20520,0,20.840000,21.000000,20.700001,20.969999,20.969999,52393000,9.590000,9.650000,9.370000,9.620000,9.620000,36578700
1716,120.800003,121.080002,120.720001,121.059998,121.059998,6864700,249.580002,251.399994,246.449997,247.750000,247.750000,153100200,23213.60938,23381.88086,22981.33008,23062.40039,23062.40039,336510000,2.850000,2.950000,2.850000,2.900000,2.900000,1061100,1.1438,1.1429,1.1478,1.1424,1,52.20,53.44,53.80,51.60,17110,1,44.92,45.23,46.02,44.27,0,38706,38749,38880,38587,14200,1,2.716,2.768,2.781,2.713,0,790.25,795.60,800.50,787.75,0,1185.20,1196.00,1204.80,1179.50,0,2460,95.965,96.020,96.050,95.740,14170,0,20.889999,21.020000,20.570000,20.600000,20.600000,49835000,9.540000,9.650000,9.380000,9.530000,9.530000,22803400


In [9]:
# scale every column to the [0, 1] range with MinMaxScaler so that
# regression coefficients are comparable across features
scaler = MinMaxScaler()
df2 = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
df2

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,SP_open,SP_high,SP_low,SP_close,SP_Ajclose,SP_volume,DJ_open,DJ_high,DJ_low,DJ_close,DJ_Ajclose,DJ_volume,EG_open,EG_high,EG_low,EG_close,EG_Ajclose,EG_volume,EU_Price,EU_open,EU_high,EU_low,EU_Trend,OF_Price,OF_Open,OF_High,OF_Low,OF_Volume,OF_Trend,OS_Price,OS_Open,OS_High,OS_Low,OS_Trend,SF_Price,SF_Open,SF_High,SF_Low,SF_Volume,SF_Trend,USB_Price,USB_Open,USB_High,USB_Low,USB_Trend,PLT_Price,PLT_Open,PLT_High,PLT_Low,PLT_Trend,PLD_Price,PLD_Open,PLD_High,PLD_Low,PLD_Trend,RHO_PRICE,USDI_Price,USDI_Open,USDI_High,USDI_Low,USDI_Volume,USDI_Trend,GDX_Open,GDX_High,GDX_Low,GDX_Close,GDX_Adj Close,GDX_Volume,USO_Open,USO_High,USO_Low,USO_Close,USO_Adj Close,USO_Volume
0,0.744604,0.738369,0.708213,0.708932,0.708932,0.216899,0.005672,0.005128,0.011410,0.010907,0.005227,0.357232,0.003723,0.002793,0.005980,0.006808,0.006808,0.144065,0.927031,0.937940,0.923507,0.910390,0.899367,0.062991,0.741754,0.731583,0.736430,0.732978,1.0,0.785133,0.783233,0.780231,0.794484,0.005052,1.0,0.798448,0.814096,0.807724,0.809329,0.0,0.636137,0.654244,0.643157,0.624960,0.586185,1.0,0.293993,0.291288,0.278075,0.308901,1.0,0.664293,0.673522,0.665882,0.645945,0.0,0.204113,0.211690,0.192849,0.215172,1.0,0.548077,0.081679,0.093315,0.079047,0.080869,0.159639,0.0,0.899375,0.893381,0.882804,0.871333,0.866741,0.069810,0.860865,0.843595,0.843889,0.827313,0.827313,0.106029
1,0.738655,0.744116,0.738341,0.748598,0.748598,0.180089,0.000994,0.003671,0.007393,0.007502,0.006067,0.401814,0.006707,0.002815,0.005588,0.006648,0.006648,0.427205,0.914762,0.924504,0.939470,0.936364,0.925062,0.073974,0.746546,0.742309,0.746503,0.744186,1.0,0.767439,0.769278,0.760763,0.769765,0.231123,0.0,0.802866,0.796265,0.793551,0.799687,1.0,0.631592,0.635704,0.636378,0.643179,0.320831,0.0,0.262095,0.259220,0.245989,0.277487,0.0,0.670114,0.667815,0.674665,0.670759,1.0,0.210715,0.222403,0.204337,0.225793,1.0,0.538462,0.077997,0.077795,0.069712,0.073024,0.091692,0.0,0.887996,0.894269,0.893396,0.893556,0.889119,0.050814,0.839550,0.830756,0.834374,0.831424,0.831424,0.105682
2,0.754842,0.750821,0.744669,0.743674,0.743674,0.119667,0.000000,0.000000,0.000000,0.000000,0.000000,0.325512,0.006461,0.000000,0.000000,0.000000,0.000000,0.142092,0.856645,0.856686,0.817746,0.803896,0.794015,0.195237,0.735269,0.748800,0.734471,0.739703,0.0,0.770388,0.770500,0.760863,0.768846,0.245146,1.0,0.806448,0.800361,0.789182,0.799928,1.0,0.616120,0.627953,0.616786,0.632197,0.330109,0.0,0.240298,0.237306,0.224064,0.256021,0.0,0.660603,0.675938,0.670326,0.674817,0.0,0.190290,0.227014,0.213242,0.206897,0.0,0.538462,0.076317,0.082770,0.072691,0.080668,0.006374,0.0,0.887773,0.880275,0.870633,0.860000,0.855329,0.045691,0.845767,0.829297,0.840321,0.829369,0.829369,0.058437
3,0.773381,0.772304,0.775210,0.772534,0.772534,0.082714,0.000702,0.010605,0.001979,0.021005,0.016988,0.412112,0.000000,0.012728,0.002235,0.022395,0.022395,0.175731,0.822420,0.834933,0.841692,0.833766,0.823565,0.071822,0.758951,0.737511,0.759373,0.743345,1.0,0.801810,0.777325,0.787958,0.784576,0.285344,1.0,0.823881,0.831928,0.851423,0.846209,1.0,0.632495,0.609196,0.622228,0.633816,0.270460,1.0,0.302499,0.299840,0.286631,0.317277,1.0,0.685187,0.661851,0.679684,0.678510,1.0,0.214841,0.222200,0.202977,0.226000,1.0,0.538462,0.078958,0.106248,0.091360,0.077047,0.160339,1.0,0.885319,0.895824,0.900834,0.900444,0.896056,0.030410,0.872706,0.863146,0.878680,0.869310,0.869310,0.082454
4,0.775595,0.773673,0.769019,0.774997,0.774997,0.113697,0.010934,0.011887,0.015834,0.022390,0.018108,0.347056,0.022196,0.012899,0.017554,0.022671,0.022671,0.173568,0.830815,0.851568,0.853665,0.853247,0.842836,0.068013,0.749366,0.758961,0.777280,0.751751,0.0,0.811775,0.806356,0.796989,0.807559,0.240130,1.0,0.865194,0.845783,0.846227,0.851272,1.0,0.621941,0.631643,0.641167,0.644671,0.372674,0.0,0.325359,0.322822,0.309626,0.339791,1.0,0.679262,0.688021,0.697668,0.688394,0.0,0.227563,0.226607,0.228876,0.227862,1.0,0.538462,0.082039,0.075010,0.071698,0.057534,0.168675,1.0,0.902499,0.899822,0.901961,0.899778,0.895385,0.017568,0.883659,0.881529,0.887600,0.885463,0.885463,0.088738
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1713,0.258024,0.262042,0.266061,0.266995,0.266995,0.089215,0.683974,0.690595,0.665037,0.658145,0.697890,0.249183,0.700205,0.693068,0.668067,0.665639,0.665639,0.336296,0.002325,0.002559,0.002262,0.001948,0.001995,0.049572,0.285593,0.276602,0.285395,0.282712,1.0,0.229713,0.259753,0.260010,0.237589,0.116317,0.0,0.187701,0.216867,0.216606,0.193323,0.0,0.136075,0.129565,0.125447,0.148511,0.046247,1.0,0.732589,0.780331,0.767380,0.739267,0.0,0.013306,0.027301,0.012338,0.033606,0.0,0.966027,0.955791,0.957719,0.964552,1.0,0.953846,0.708620,0.725030,0.707051,0.714142,0.097156,0.0,0.178492,0.181919,0.185936,0.191556,0.208286,0.245259,0.049438,0.043478,0.047874,0.039060,0.039060,0.188254
1714,0.272551,0.273810,0.266061,0.262071,0.262071,0.138587,0.666024,0.721711,0.662068,0.726470,0.761514,0.397651,0.669699,0.728943,0.662775,0.737757,0.737757,0.476034,0.002970,0.002559,0.002262,0.002597,0.002660,0.058787,0.272343,0.274626,0.280918,0.280471,0.0,0.270388,0.232759,0.266332,0.233197,0.117719,1.0,0.236896,0.189759,0.226408,0.194769,1.0,0.158240,0.138308,0.153093,0.157049,0.095095,1.0,0.771930,0.740246,0.761497,0.732461,1.0,0.024584,0.024113,0.018611,0.034074,1.0,0.989822,0.972878,0.975936,0.988828,1.0,0.953846,0.731071,0.711500,0.715392,0.720177,0.109274,1.0,0.192994,0.188361,0.183232,0.181111,0.197188,0.314989,0.042333,0.055150,0.046387,0.056975,0.056975,0.365682
1715,0.271859,0.272441,0.273903,0.274518,0.274518,0.112378,0.704613,0.734005,0.692339,0.737377,0.771670,0.330444,0.720902,0.746244,0.699635,0.755043,0.755043,0.447853,0.002325,0.001919,0.002262,0.001948,0.001995,0.058555,0.294051,0.271804,0.290431,0.282432,1.0,0.246899,0.271570,0.260110,0.253728,0.163724,0.0,0.223045,0.229759,0.222039,0.217187,0.0,0.171845,0.158988,0.162235,0.173142,0.096274,1.0,0.752791,0.768038,0.756684,0.739267,0.0,0.020686,0.035064,0.016416,0.030276,0.0,0.997937,0.991931,0.985997,0.997103,1.0,0.950000,0.708380,0.725826,0.709235,0.716757,0.143317,0.0,0.181615,0.179476,0.187063,0.188889,0.205452,0.209582,0.052398,0.047272,0.050550,0.048752,0.048752,0.325400
1716,0.275042,0.274904,0.281882,0.281220,0.281220,0.058103,0.745600,0.752127,0.735941,0.735530,0.769951,0.261258,0.759705,0.762415,0.747059,0.749970,0.749970,0.367784,0.001033,0.001280,0.001596,0.001299,0.001330,0.090596,0.296307,0.293254,0.296307,0.303446,1.0,0.247305,0.259244,0.251380,0.250255,0.010050,1.0,0.219343,0.215542,0.217432,0.218031,0.0,0.172343,0.173715,0.165252,0.189202,0.069517,1.0,0.721956,0.749332,0.743316,0.728796,0.0,0.015229,0.031157,0.014638,0.033033,0.0,0.983082,1.000000,0.994698,0.994345,0.0,0.946154,0.706939,0.708317,0.691559,0.708912,0.098837,0.0,0.182731,0.179920,0.184133,0.180667,0.196715,0.198334,0.050918,0.047272,0.050847,0.046109,0.046109,0.199288


In [10]:
# 'Adj Close' is the target variable
# independent variables X: every other column
X = df2.drop(columns=['Adj Close'])
# dependent variable y
y = df2['Adj Close']

In [11]:
# 70/30 train-test split with a fixed random state of 42 for reproducibility
# the held-out test set lets us check how well the model generalizes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
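
Note that the scaling cell above fits MinMaxScaler on the full dataset before the split, so the test rows influence the scaling parameters. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on hypothetical synthetic data; the names `toy`, `toy_scaler`, and the `_toy`/`_scaled` variables are assumptions for the example, and the notebook's own results were produced with the original ordering.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the real data: two features and a target.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "target"])

X_toy = toy.drop(columns=["target"])
y_toy = toy["target"]

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42)

# Fit the scaler on the training rows only, then reuse it for the test rows.
toy_scaler = MinMaxScaler()
X_tr_scaled = pd.DataFrame(
    toy_scaler.fit_transform(X_tr), columns=X_tr.columns, index=X_tr.index)
X_te_scaled = pd.DataFrame(
    toy_scaler.transform(X_te), columns=X_te.columns, index=X_te.index)
```

With this ordering the scaled test values may fall slightly outside [0, 1], which is expected: the range is defined by the training data alone.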

We have 79 features and one target variable. We use RFE to select the best 20 features.

In [12]:
# LinearRegression supplies the coefficients that RFE uses to rank features
lr = LinearRegression()
# keep 20 features, removing one feature per iteration (step=1)
rfe = RFE(estimator=lr, n_features_to_select=20, step=1)
# fit RFE on the training data only
rfe.fit(X_train, y_train)

RFE(estimator=LinearRegression(), n_features_to_select=20)

In [13]:
rfe.get_support()

array([ True,  True,  True,  True, False, False, False,  True, False,
       False, False, False, False,  True, False,  True, False,  True,
       False,  True,  True,  True, False,  True, False,  True,  True,
       False,  True, False, False, False, False, False, False,  True,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False,  True,  True,  True,  True,
       False, False, False, False, False, False, False])

In [14]:
rfe.ranking_

array([ 1,  1,  1,  1, 55, 15, 24,  1, 37, 14, 43,  2,  9,  1, 56,  1, 47,
        1, 20,  1,  1,  1, 35,  1,  8,  1,  1, 48,  1, 38, 25,  7, 53, 57,
        6,  1, 29, 18, 52, 10, 11, 28, 12, 54, 46, 41, 40, 44, 23, 49, 60,
       34, 33, 22, 50, 42, 26, 13,  5, 51, 58, 27, 32,  4, 19, 45, 59,  3,
        1,  1,  1,  1, 39, 31, 17, 21, 16, 30, 36])
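
In the ranking above, rank 1 marks the 20 selected features. Since one feature is removed per iteration, the remaining 59 features receive ranks 2 to 60, and a higher rank means the feature was eliminated earlier. The raw array is hard to read without column names, so a small sketch of attaching them follows. It uses hypothetical synthetic data, and the names `X_toy`, `y_toy`, and `toy_rfe` are assumptions; on the notebook's data the same pattern would use `rfe.ranking_` and `X_train.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Hypothetical data: only columns "a" and "d" influence the target.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
y_toy = 3.0 * X_toy["a"] - 2.0 * X_toy["d"] + 0.01 * rng.normal(size=200)

toy_rfe = RFE(estimator=LinearRegression(), n_features_to_select=2, step=1)
toy_rfe.fit(X_toy, y_toy)

# Label each rank with its column name and sort from most to least important.
ranking = pd.Series(toy_rfe.ranking_, index=X_toy.columns).sort_values()
selected = ranking[ranking == 1].index.tolist()

print(ranking)
```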

In [15]:
# total number of features selected
sum(rfe.get_support())

20

In [16]:
# columns eliminated by RFE (those not flagged in the support mask)
dropped_columns = X_train.columns[~rfe.get_support()].tolist()
dropped_columns

['Volume',
 'SP_open',
 'SP_high',
 'SP_close',
 'SP_Ajclose',
 'SP_volume',
 'DJ_open',
 'DJ_high',
 'DJ_close',
 'DJ_volume',
 'EG_high',
 'EG_volume',
 'EU_open',
 'EU_Trend',
 'OF_Open',
 'OF_High',
 'OF_Low',
 'OF_Volume',
 'OF_Trend',
 'OS_Price',
 'OS_High',
 'OS_Low',
 'OS_Trend',
 'SF_Price',
 'SF_Open',
 'SF_High',
 'SF_Low',
 'SF_Volume',
 'SF_Trend',
 'USB_Price',
 'USB_Open',
 'USB_High',
 'USB_Low',
 'USB_Trend',
 'PLT_Price',
 'PLT_Open',
 'PLT_High',
 'PLT_Low',
 'PLT_Trend',
 'PLD_Price',
 'PLD_Open',
 'PLD_High',
 'PLD_Low',
 'PLD_Trend',
 'RHO_PRICE',
 'USDI_Price',
 'USDI_Open',
 'USDI_High',
 'USDI_Low',
 'USDI_Volume',
 'USDI_Trend',
 'GDX_Open',
 'GDX_Volume',
 'USO_Open',
 'USO_High',
 'USO_Low',
 'USO_Close',
 'USO_Adj Close',
 'USO_Volume']

In [17]:
# number of columns dropped
len(dropped_columns)

59

Train and Test dataset after feature selection using RFE

In [18]:
# keep only the 20 selected features in the training set
X_train = X_train.drop(dropped_columns, axis = 1).reset_index(drop = True)
X_train

Unnamed: 0,Open,High,Low,Close,SP_low,DJ_low,DJ_Ajclose,EG_open,EG_low,EG_close,EG_Ajclose,EU_Price,EU_high,EU_low,OF_Price,OS_Open,GDX_High,GDX_Low,GDX_Close,GDX_Adj Close
0,0.202961,0.202791,0.197414,0.200383,0.488357,0.374166,0.399139,0.375565,0.376081,0.377922,0.383338,0.535382,0.541130,0.539087,0.319199,0.351687,0.131719,0.127789,0.139778,0.139617
1,0.198395,0.202244,0.201541,0.206812,0.703342,0.633996,0.640202,0.120496,0.121990,0.121429,0.124336,0.304483,0.297426,0.291959,0.199715,0.208193,0.195913,0.199684,0.203778,0.214516
2,0.089236,0.089217,0.093135,0.090685,0.513040,0.372431,0.383672,0.225107,0.227750,0.233117,0.237267,0.195658,0.197538,0.200897,0.212121,0.179036,0.048645,0.052964,0.056444,0.056050
3,0.333702,0.336344,0.338561,0.342498,0.920072,0.891253,0.890468,0.036549,0.036185,0.035714,0.036569,0.548915,0.539452,0.546372,0.376957,0.419880,0.197024,0.201713,0.206444,0.221403
4,0.548838,0.555008,0.537488,0.555464,0.221621,0.196086,0.194839,0.456929,0.452574,0.450649,0.453102,0.787426,0.790711,0.790137,0.732866,0.793012,0.381386,0.376831,0.381778,0.382540
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1197,0.386967,0.385605,0.390012,0.391328,0.571545,0.451584,0.453551,0.243187,0.240388,0.240909,0.245204,0.272625,0.265249,0.263939,0.233984,0.236265,0.396713,0.403876,0.406222,0.424731
1198,0.295102,0.291051,0.289586,0.290384,0.658691,0.574228,0.573507,0.200568,0.201144,0.198701,0.203459,0.091345,0.089256,0.100588,0.254728,0.308193,0.247668,0.245211,0.242667,0.255294
1199,0.067654,0.072934,0.069060,0.066612,0.531843,0.407413,0.408510,0.174738,0.179859,0.175974,0.178633,0.155061,0.154169,0.131690,0.296522,0.272410,0.035762,0.036511,0.036667,0.035547
1200,0.251522,0.250411,0.254230,0.252086,0.847421,0.848516,0.845989,0.046235,0.046162,0.046104,0.047208,0.392162,0.404029,0.400392,0.360281,0.354337,0.205464,0.211855,0.213333,0.224536


In [19]:
# apply the same column selection to the test set
X_test = X_test.drop(dropped_columns, axis = 1).reset_index(drop = True)
X_test

Unnamed: 0,Open,High,Low,Close,SP_low,DJ_low,DJ_Ajclose,EG_open,EG_low,EG_close,EG_Ajclose,EU_Price,EU_high,EU_low,OF_Price,OS_Open,GDX_High,GDX_Low,GDX_Close,GDX_Adj Close
0,0.374516,0.371237,0.375017,0.372042,0.443009,0.338340,0.344560,0.459512,0.459891,0.448701,0.454668,0.928108,0.918299,0.934155,0.858349,0.941325,0.304976,0.309894,0.306000,0.310810
1,0.216658,0.213738,0.220250,0.218438,0.700373,0.635729,0.634291,0.129536,0.131968,0.129870,0.132980,0.292078,0.281477,0.276828,0.205715,0.220843,0.195024,0.203741,0.200444,0.211021
2,0.779469,0.772441,0.769432,0.777185,0.041390,0.019922,0.022254,0.709415,0.723294,0.743506,0.739211,0.595151,0.585339,0.572990,0.721680,0.670241,0.764771,0.754113,0.774667,0.772439
3,0.743636,0.746579,0.744394,0.746683,0.090174,0.070433,0.073371,0.753326,0.775841,0.786364,0.781880,0.619961,0.630666,0.623704,0.706325,0.690602,0.713461,0.720757,0.717555,0.714763
4,0.446182,0.444581,0.423580,0.418137,0.266038,0.232469,0.238223,0.407852,0.410004,0.399351,0.401453,0.691288,0.691382,0.687868,0.780557,0.819036,0.316970,0.314627,0.309111,0.308397
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
511,0.180548,0.181719,0.183381,0.185337,0.486785,0.390040,0.390482,0.362004,0.368099,0.375325,0.380701,0.581618,0.604085,0.589241,0.522981,0.586145,0.146379,0.141086,0.153778,0.154036
512,0.231738,0.231253,0.235521,0.233073,0.829258,0.829412,0.836799,0.004908,0.004257,0.003247,0.003324,0.261911,0.275042,0.269823,0.328656,0.283855,0.165704,0.170385,0.167556,0.180296
513,0.279607,0.280104,0.284221,0.281904,0.544708,0.427402,0.427321,0.199923,0.206465,0.205195,0.208825,0.165492,0.163962,0.172037,0.252085,0.286145,0.268103,0.269552,0.275556,0.288111
514,0.212369,0.210317,0.207181,0.206812,0.520666,0.404516,0.405451,0.244479,0.239058,0.238312,0.242558,0.205244,0.218802,0.214626,0.218019,0.265542,0.223012,0.220194,0.218000,0.227933


In [20]:
# shape of test and train data after feature selection
X_train.shape, X_test.shape

((1202, 20), (516, 20))
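
Dropping the eliminated columns by name, as done above, works. An equivalent route is to index with the boolean support mask, or to call the fitted selector's `transform` method, which by default returns a plain NumPy array without column names. The sketch below compares the two on hypothetical synthetic data; `X_toy`, `y_toy`, and `toy_rfe` are names invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Hypothetical data: only columns "x" and "z" influence the target.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(rng.normal(size=(120, 4)), columns=["w", "x", "y", "z"])
y_toy = 2.0 * X_toy["x"] + 1.5 * X_toy["z"] + 0.01 * rng.normal(size=120)

toy_rfe = RFE(estimator=LinearRegression(), n_features_to_select=2)
toy_rfe.fit(X_toy, y_toy)

# Option 1: boolean mask on the columns, which keeps the DataFrame and its labels.
kept = X_toy.loc[:, toy_rfe.support_]

# Option 2: the selector's own transform, which returns the same values unlabeled.
as_array = toy_rfe.transform(X_toy)

print(kept.columns.tolist(), as_array.shape)
```

The mask-based form is convenient when the column names are needed later, for example to report which features a downstream model was trained on.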

In [21]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1202 entries, 0 to 1201
Data columns (total 20 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Open           1202 non-null   float64
 1   High           1202 non-null   float64
 2   Low            1202 non-null   float64
 3   Close          1202 non-null   float64
 4   SP_low         1202 non-null   float64
 5   DJ_low         1202 non-null   float64
 6   DJ_Ajclose     1202 non-null   float64
 7   EG_open        1202 non-null   float64
 8   EG_low         1202 non-null   float64
 9   EG_close       1202 non-null   float64
 10  EG_Ajclose     1202 non-null   float64
 11  EU_Price       1202 non-null   float64
 12  EU_high        1202 non-null   float64
 13  EU_low         1202 non-null   float64
 14  OF_Price       1202 non-null   float64
 15  OS_Open        1202 non-null   float64
 16  GDX_High       1202 non-null   float64
 17  GDX_Low        1202 non-null   float64
 18  GDX_Clos