## Introduction to Recursive Feature Elimination
Welcome! Today's topic is an essential technique in data science and machine learning, called Recursive Feature Elimination (RFE). It's a method used for feature selection, that is, for choosing the relevant input variables in our training data.

In Recursive Feature Elimination, we initially fit the model using all available features. Then we recursively eliminate the least important features and fit the model again. We continue this process until we are left with the specified number of features. The result is a model that is potentially more efficient and may perform better.

Sound exciting? Let's go ahead and dive into action!

## Understanding Recursive Feature Elimination
The concept of Recursive Feature Elimination is simple yet powerful. It is based on the idea of recursively removing the least important features from the model. The process involves the following steps:

1. Fit the model using all available features.
2. Rank the features based on their importance to the model using a specific criterion (like coefficients, feature importance, etc.).
3. Remove the least important feature(s) from the model.
4. Repeat steps 1-3 until the desired number of features is reached.
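
The steps above can be sketched as a short loop. This is a minimal illustration, not Scikit-learn's actual implementation: the helper name `manual_rfe` and the small demo dataset are assumptions made for this example, and the loop removes exactly one feature per iteration using a decision tree's importance scores.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def manual_rfe(estimator, X, y, n_features_to_select):
    """Hypothetical helper: drop the least important feature one at a time."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    while len(remaining) > n_features_to_select:
        # Step 1: fit a fresh copy of the model on the remaining columns
        fitted = clone(estimator).fit(X[:, remaining], y)
        # Step 2: find the feature with the lowest importance score
        weakest = int(np.argmin(fitted.feature_importances_))
        # Step 3: remove that feature, then loop again (step 4)
        del remaining[weakest]
    return remaining


# Small demo dataset (assumed parameters, chosen only for illustration)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=2, random_state=0
)
kept = manual_rfe(DecisionTreeClassifier(random_state=0), X_demo, y_demo, 3)
print("Kept feature indices:", kept)
```

When several features tie for the lowest importance, this sketch drops the first one it finds, so its result may differ from what a library implementation selects.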

## Data Generation With Scikit-learn
Our exploration of RFE starts with generating some data. We will use a utility from Scikit-learn called make_classification to create a mock (synthetic) dataset. It is extremely useful for trying out different algorithms and understanding their impacts. Here is how we do it.


In [1]:
# Import necessary libraries
from sklearn.datasets import make_classification

# Creating a mock dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)

The make_classification function generates a random n-class classification problem (two classes by default). In the above example, we've generated data with 1000 samples and 10 features, of which only 5 are informative. The remaining 5 features are redundant, meaning they are random linear combinations of the informative ones.
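
To see what was generated, a quick inspection like the one below can help. It is a sketch that rebuilds the same dataset under the names `X_check` and `y_check` (names chosen here only to avoid clobbering the lesson's variables). Because the redundant columns are linear combinations of the informative ones, the matrix rank should come out as 5 rather than 10.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters as the lesson's dataset
X_check, y_check = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1
)

print("X shape:", X_check.shape)  # (1000, 10)
print("y shape:", y_check.shape)  # (1000,)
print("Class counts:", np.bincount(y_check))

# Redundant columns add no new directions, so the rank reflects
# only the informative features
rank = np.linalg.matrix_rank(X_check)
print("Matrix rank:", rank)
```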

After generating the dataset, we can split the data into training and testing sets using the train_test_split function from Scikit-learn, so that we can train our model on the training data and evaluate it on the testing data. We will skip this step in this lesson, and focus on applying RFE directly.
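
For reference, the split that this lesson skips could look like the sketch below. The 80/20 ratio, the `stratify=y` argument, and the random seed are assumptions chosen for illustration, not values prescribed by the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1
)

# Hold out 20% of the samples for evaluation; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

print("Training set:", X_train.shape)  # (800, 10)
print("Testing set:", X_test.shape)    # (200, 10)
```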

## Applying Recursive Feature Elimination
Now comes the engaging part: applying RFE. We will use Scikit-learn's RFE class, which handles the repeated fit, rank, and eliminate cycle under the hood and gives us a convenient interface.

Before applying RFE, let's briefly talk about DecisionTreeClassifier, which we will use as the base estimator in RFE. A decision tree is a flowchart-like structure in which each internal node represents a test on a feature, each branch represents the outcome of that test, and each leaf node represents a predicted class. It's a popular choice for classification tasks, and once fitted it exposes a feature_importances_ attribute, which RFE can use to rank the features.
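
As a small illustration of the ranking criterion, the sketch below fits a single decision tree on the lesson's dataset and prints its importance scores. The variable names `tree` and `importances` are chosen here for the example. The scores are non-negative and normalized to sum to 1.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1
)

# Fit one tree on all features and read its importance scores
tree = DecisionTreeClassifier(random_state=1).fit(X, y)
importances = tree.feature_importances_

for index, score in enumerate(importances):
    print(f"feature {index}: {score:.3f}")
```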

With that in place, let's now see how to use it in practice:

In [2]:
# Import necessary libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE

# Initialize the base estimator
model = DecisionTreeClassifier(random_state=1)

# Applying RFE
rfe = RFE(estimator=model, n_features_to_select=5)
rfe.fit(X, y)

In the above code, we first initialize a DecisionTreeClassifier model, which will be our base estimator. Then we instantiate the RFE class with the base model and the desired number of features to select. Finally, we fit the RFE object to our data (the full dataset here, since we skipped the train/test split).
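
Once fitted, the RFE object can also reduce the data to the selected columns. The sketch below uses `fit_transform` for that and spells out the `step` parameter, which controls how many features are removed per iteration (1 is the default). The name `X_reduced` is chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1
)

# step=1 removes one feature per iteration (the default behavior)
rfe = RFE(
    estimator=DecisionTreeClassifier(random_state=1),
    n_features_to_select=5,
    step=1,
)

# Fit RFE and keep only the selected columns
X_reduced = rfe.fit_transform(X, y)

print("Number of selected features:", rfe.n_features_)
print("Reduced data shape:", X_reduced.shape)
```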

## Interpreting Feature Rankings from RFE
The last step in applying RFE is getting the rankings of the features. RFE provides an attribute called ranking_ which gives the ranking of all features. All selected features are assigned rank 1; eliminated features receive higher ranks, with the largest rank going to the feature that was removed first. So, let's get the ranking now.



In [3]:
# Retrieving the feature ranking
ranking = rfe.ranking_
print('Feature Ranking:', ranking)

Feature Ranking: [3 5 1 1 1 6 1 4 1 2]


This output indicates the rank assigned to each feature. Features with a rank of 1 are considered most informative for the model according to RFE analysis. We can also get the selected features using the support_ attribute of the RFE object.

In [4]:
# Retrieving the selected features by RFE
selected_features = rfe.support_
print('Selected Features:', selected_features) # [False False True True True False True False True False]

Selected Features: [False False  True  True  True False  True False  True False]


A True value in the output indicates that the corresponding feature is selected by RFE. In this case, the selected features are the 3rd, 4th, 5th, 7th, and 9th features.
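
If column indices are more convenient than a boolean mask, `get_support(indices=True)` returns them directly. The sketch below refits the same RFE setup so that it runs on its own; `selected_indices` and `selected_positions` are names chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1
)
rfe = RFE(estimator=DecisionTreeClassifier(random_state=1), n_features_to_select=5)
rfe.fit(X, y)

# Zero-based column indices of the selected features
selected_indices = rfe.get_support(indices=True)

# One-based positions, matching the "3rd, 4th, ..." wording in the text
selected_positions = [int(i) + 1 for i in selected_indices]

print("Selected indices:", selected_indices)
print("Selected positions:", selected_positions)
```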

## Importance of Feature Selection
While analyzing the result, think about this: what if we had used all the features without selection? Thinking about this gives us a clear understanding of why feature selection is an essential step in machine learning model development.

Feature selection can improve the efficiency of the model by reducing its computational cost, and it can also improve performance by eliminating irrelevant and redundant features. Using techniques like RFE, we can narrow down the most significant features from the hundreds or thousands of features in a dataset.
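
Whether selection actually helps depends on the data and the model, so it is worth measuring. The sketch below compares 5-fold cross-validated accuracy with and without RFE on the lesson's dataset. Wrapping RFE in a Pipeline is a design choice made here so that selection is refit inside each fold; the step names `"select"` and `"tree"` are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1
)

# Baseline: a decision tree trained on all 10 features
all_scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=5)

# RFE followed by a decision tree trained on the 5 selected features
pipeline = Pipeline([
    ("select", RFE(estimator=DecisionTreeClassifier(random_state=1), n_features_to_select=5)),
    ("tree", DecisionTreeClassifier(random_state=1)),
])
rfe_scores = cross_val_score(pipeline, X, y, cv=5)

print(f"All features: {all_scores.mean():.3f}")
print(f"RFE features: {rfe_scores.mean():.3f}")
```

The two numbers may be close, or the baseline may even win; the comparison is what tells us whether selection paid off for a given problem.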

Moreover, by understanding which features are most influential in the model's decisions, we can gain valuable insight into the problem at hand. This is especially useful in data-driven decision making, where understanding the most influential factors becomes crucial.



## Lesson Summary and Practice
Congratulations! You've just understood the concept of Recursive Feature Elimination, and you learned how to generate synthetic data with Scikit-learn and apply RFE on that data. Furthermore, you also got hands-on with interpreting the RFE results, understanding the importance of each feature in the model.

Remember that understanding the theory behind the process is just as essential as getting hands-on practice. So, make sure you understand the insights you gained from this lesson.

Now, it's time to practice implementing and interpreting Recursive Feature Elimination for different datasets and models. This practice will give you a greater intuition of how to go about choosing features in real scenarios. Not to mention, it will bring you one step closer to becoming a data science expert. Happy learning!



## Unveiling the Top Features with Recursive Feature Elimination

In the context of predicting creditworthiness, which attributes of a person are most indicative of their likelihood to repay a loan? To find out, we will use the Recursive Feature Elimination (RFE) method with a Decision Tree Classifier to identify the top three contributing features from a synthetic dataset. Let's run the code and see what we discover!

In [5]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE

# Creating a synthetic dataset to simulate predicting creditworthiness
X, y = make_classification(n_samples=100, n_features=8, n_informative=3, n_clusters_per_class=1, random_state=42)

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initializing the Decision Tree Classifier
model = DecisionTreeClassifier(random_state=42)

# Applying Recursive Feature Elimination to select the top 3 features
selector = RFE(estimator=model, n_features_to_select=3)
selector = selector.fit(X_train, y_train)

# Getting the ranking of the features
ranking = selector.ranking_

# Printing the ranking of features
print('Feature Ranking:', ranking)

Feature Ranking: [1 1 1 5 4 3 6 2]
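
Since this example holds out a test set, the fitted selector can also be evaluated on it. The sketch below repeats the setup so that it runs on its own and then calls `score`, which reduces the test data to the selected features and reports the accuracy of the underlying estimator. The variable name `test_accuracy` is chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=100, n_features=8, n_informative=3, n_clusters_per_class=1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

selector = RFE(estimator=DecisionTreeClassifier(random_state=42), n_features_to_select=3)
selector.fit(X_train, y_train)

# score() applies the feature mask to X_test before predicting
test_accuracy = selector.score(X_test, y_test)
print(f"Test accuracy with 3 features: {test_accuracy:.2f}")
```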


## Adjusting Feature Selection with RFE

On we go, Space Explorer! Now, let's tweak the number of features selected by the Recursive Feature Elimination (RFE) process. Your mission is to adjust the code to select the 3 most important features instead of 5. Remember, this can impact our model's view of the data universe!

In [6]:
# Import necessary libraries
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE

# Creating mock data
X, y = make_classification(n_samples=200, n_features=15, n_informative=5, n_redundant=10, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the base estimator
model = DecisionTreeClassifier(random_state=42)

# Applying RFE
rfe = RFE(estimator=model, n_features_to_select=5)
rfe.fit(X_train, y_train)

# Retrieving the feature ranking
ranking = rfe.ranking_

# Print statement added for output readability
print('Feature Ranking:', ranking)

# Retrieving the selected features based on the ranking
selected_features = [i for i in range(len(ranking)) if ranking[i] == 1]
print('Selected Features:', selected_features)

Feature Ranking: [11 10  4  8  9  1  5  1  6  7  1  1  1  3  2]
Selected Features: [5, 7, 10, 11, 12]
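
One possible way to complete the task above is to change only the `n_features_to_select` argument. The sketch below does that and is otherwise the same as the cell above; it is one solution, not the only one.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=200, n_features=15, n_informative=5, n_redundant=10, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The only change: keep 3 features instead of 5
rfe = RFE(estimator=DecisionTreeClassifier(random_state=42), n_features_to_select=3)
rfe.fit(X_train, y_train)

ranking = rfe.ranking_
selected_features = [i for i in range(len(ranking)) if ranking[i] == 1]

print('Feature Ranking:', ranking)
print('Selected Features:', selected_features)
```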


## Navigating the Stars of Feature Selection

Great strides, Space Voyager! Now that you understand how to apply Recursive Feature Elimination, let's see if you can handle the feature selection controls. In the snippet below, add the lines of code needed to initialize a classifier and apply RFE for feature ranking. Remember, the galaxy's knowledge is within your grasp!

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Loading the Iris dataset once and reusing it for data and feature names
iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the decision tree classifier with a random state of 42
classifier = DecisionTreeClassifier(random_state=42)

# Create an RFE instance with the classifier you initialized, aiming to select 2 features, and fit it to the training data
rfe = RFE(estimator=classifier, n_features_to_select=2)
rfe.fit(X_train, y_train)

# Retrieving the feature ranking
ranking = rfe.ranking_
print('Feature Ranking:', ranking)

# Retrieving the selected feature names based on the ranking
selected_features = [feature_names[i] for i in range(len(ranking)) if ranking[i] == 1]
print('Selected Features:', selected_features)


Feature Ranking: [3 2 1 1]
Selected Features: ['petal length (cm)', 'petal width (cm)']
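
To read the full ranking alongside the feature names, the two can be paired and sorted. The sketch below repeats the Iris setup so that it runs on its own; `ranked_features` is a name chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

rfe = RFE(estimator=DecisionTreeClassifier(random_state=42), n_features_to_select=2)
rfe.fit(X_train, y_train)

# Pair each rank with its feature name, then sort so rank 1 comes first
ranked_features = sorted(
    (int(rank), name) for rank, name in zip(rfe.ranking_, iris.feature_names)
)

for rank, name in ranked_features:
    print(f"rank {rank}: {name}")
```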


## Navigating the Stars: Recursive Feature Elimination

Having explored the depths of space with Recursive Feature Elimination, you're now ready to construct your own starship. Calibrate your spacecraft's instruments to identify the most informative features in a dataset of celestial objects. Remember to assess the data, prepare for the journey, and outfit your vessel with the right tools to navigate the cosmos!

In [9]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE

# Creating artificial data for classification
X, y = make_classification(n_samples=200, n_features=8, n_informative=4, n_redundant=2, n_classes=2, random_state=7)

# Splitting into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

# Initialize the classifier model you'll be using
classifier = DecisionTreeClassifier(random_state=7)

# Apply Recursive Feature Elimination to find the top features for the model
rfe = RFE(estimator=classifier, n_features_to_select=4)
rfe.fit(X_train, y_train)

# Print out the selected features and their rankings
ranking = rfe.ranking_
selected_features = [i for i in range(len(ranking)) if ranking[i] == 1]

print('Feature Ranking:', ranking)
print('Selected Features:', selected_features)


Feature Ranking: [3 1 1 5 4 1 2 1]
Selected Features: [1, 2, 5, 7]
