# Feature Selection – Dropping Constant Features

Feature selection is the process of selecting relevant features for building
machine learning models.

## Why drop constant features?

- Constant features have zero variance.
- They do not help the model differentiate between data points.
- Removing them reduces dimensionality and training cost without discarding any information.
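To make the zero-variance point concrete, here is a minimal sketch that computes the per-column variance with pandas. The small table is illustrative (it uses the same values as the example DataFrame in this notebook), and `variances` and `constant_cols` are names chosen for this sketch only.

```python
import pandas as pd

# Illustrative data: A and B vary, C and D are constant
df = pd.DataFrame({"A": [1, 2, 4, 1, 2, 4],
                   "B": [4, 5, 6, 7, 8, 9],
                   "C": [0, 0, 0, 0, 0, 0],
                   "D": [1, 1, 1, 1, 1, 1]})

# Population variance (ddof=0) of each column
variances = df.var(ddof=0)
print(variances)

# Columns whose variance is exactly zero are the constant ones
constant_cols = variances[variances == 0].index.tolist()
print(constant_cols)
```

A constant column has every value equal to its mean, so all deviations (and therefore the variance) are zero.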

To remove constant features, we use **VarianceThreshold** from scikit-learn.


In [5]:
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

In [6]:
df = pd.DataFrame({"A": [1, 2, 4, 1, 2, 4],
                   "B": [4, 5, 6, 7, 8, 9],
                   "C": [0, 0, 0, 0, 0, 0],
                   "D": [1, 1, 1, 1, 1, 1]})

## VarianceThreshold

`VarianceThreshold` removes every feature whose variance does not exceed a given threshold.

- `threshold = 0` → removes only **constant features**
- It is fit on the **input features (X)** alone; the target variable (y) is ignored
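As a rough illustration of how the threshold changes the outcome, the sketch below fits a selector with a non-zero threshold and inspects the fitted `variances_` attribute. The value `2.0` is arbitrary and picked only for this example, and `vt_strict`, `per_feature`, and `kept` are hypothetical names.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Illustrative data: same values as the notebook's example DataFrame
df = pd.DataFrame({"A": [1, 2, 4, 1, 2, 4],
                   "B": [4, 5, 6, 7, 8, 9],
                   "C": [0, 0, 0, 0, 0, 0],
                   "D": [1, 1, 1, 1, 1, 1]})

# threshold=2.0 is an arbitrary value chosen for illustration
vt_strict = VarianceThreshold(threshold=2.0)
vt_strict.fit(df)

# variances_ holds the per-feature variance computed during fit
per_feature = dict(zip(df.columns, vt_strict.variances_))
print(per_feature)

# Only features whose variance exceeds the threshold are kept
kept = df.columns[vt_strict.get_support()].tolist()
print(kept)
```

With a threshold above zero, low-variance features that are not strictly constant are dropped as well, so the value should be chosen with the scale of the features in mind.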


In [7]:
vt = VarianceThreshold(threshold=0)
vt.fit(df)



## Support Mask

The `get_support()` method returns a boolean mask:
- `True` → feature is kept
- `False` → feature is removed


In [8]:
vt.get_support()

array([ True,  True, False, False])
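When integer positions are more convenient than a boolean mask, `get_support` also accepts `indices=True`. The following is a small sketch of that variant; the data and the names `kept_idx` and `kept_cols` are assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Illustrative data: same values as the notebook's example DataFrame
df = pd.DataFrame({"A": [1, 2, 4, 1, 2, 4],
                   "B": [4, 5, 6, 7, 8, 9],
                   "C": [0, 0, 0, 0, 0, 0],
                   "D": [1, 1, 1, 1, 1, 1]})

vt = VarianceThreshold(threshold=0)
vt.fit(df)

# Integer positions of the retained features instead of a boolean mask
kept_idx = vt.get_support(indices=True)
print(kept_idx)

# The positions can index the column labels directly
kept_cols = df.columns[kept_idx].tolist()
print(kept_cols)
```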

## Selecting Non-Constant Features

We now extract the column names that are not constant.


In [9]:
df.columns[vt.get_support()]

Index(['A', 'B'], dtype='object')

In [10]:
# Identify constant columns by inverting the support mask
const_col = df.columns[~vt.get_support()].tolist()

In [11]:
# Drop constant columns
df_filtered = df.drop(columns=const_col)
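An alternative to dropping columns by name is to let the selector do the filtering through `fit_transform`. This is only a sketch of one possible approach: `fit_transform` returns a NumPy array by default, so the column labels are restored from the support mask. The names `reduced` and `df_reduced` are hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Illustrative data: same values as the notebook's example DataFrame
df = pd.DataFrame({"A": [1, 2, 4, 1, 2, 4],
                   "B": [4, 5, 6, 7, 8, 9],
                   "C": [0, 0, 0, 0, 0, 0],
                   "D": [1, 1, 1, 1, 1, 1]})

vt = VarianceThreshold(threshold=0)

# Fit and filter in one step; the result is a plain NumPy array
reduced = vt.fit_transform(df)

# Re-attach the labels of the retained columns and the original index
df_reduced = pd.DataFrame(reduced,
                          columns=df.columns[vt.get_support()],
                          index=df.index)
print(df_reduced.head())
```

Dropping by name keeps the original DataFrame dtypes untouched, while the `fit_transform` route fits more naturally into a scikit-learn pipeline.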

## Final Result

- Constant features **C** and **D** are removed
- Remaining features **A** and **B** vary across rows, so they can carry information for a model
- The dataset is now smaller and free of zero-variance columns
