## Drop constant features - low variance

**Removing Constant & Low Variance Features**

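In machine learning, some features carry little or no useful information.
A constant feature has the same value in every row, so it cannot help a model tell samples apart.
Removing constant and low-variance features is therefore a common preprocessing step.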

In [34]:
import pandas as pd 
from sklearn.feature_selection import VarianceThreshold

In [35]:
data = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [1, 1, 1, 1, 1], "c": [2, 3, 4, 5, 6], "d": [2, 2, 2, 2, 2]})
data

|   | A | B | c | d |
|---|---|---|---|---|
| 0 | 1 | 1 | 2 | 2 |
| 1 | 2 | 1 | 3 | 2 |
| 2 | 3 | 1 | 4 | 2 |
| 3 | 4 | 1 | 5 | 2 |
| 4 | 5 | 1 | 6 | 2 |


In [36]:
varience=VarianceThreshold(threshold=0)

In [37]:
reduced_feature=varience.fit_transform(data)

In [39]:
varience.get_support()

array([ True, False,  True, False])
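To see why `B` and `d` are marked `False`, it can help to look at the per-column variances the selector computed during fitting. The sketch below rebuilds the same toy DataFrame and reads the fitted `variances_` attribute; the names `selector` and `variance_by_column` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Same toy data as in the cells above
data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [1, 1, 1, 1, 1],
    "c": [2, 3, 4, 5, 6],
    "d": [2, 2, 2, 2, 2],
})

# Illustrative name; the notebook calls this object `varience`
selector = VarianceThreshold(threshold=0)
selector.fit(data)

# Pair each fitted variance with its column name for readability
variance_by_column = pd.Series(selector.variances_, index=data.columns)
print(variance_by_column)
```

Columns whose value here is not greater than the threshold are the ones `get_support()` reports as `False`.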

In [41]:
remaining_feature=data.columns[varience.get_support(indices=True)]

In [42]:
remaining_feature

Index(['A', 'c'], dtype='str')

In [45]:
type(varience)

sklearn.feature_selection._variance_threshold.VarianceThreshold

In [47]:
reduced_data=pd.DataFrame(reduced_feature,columns=remaining_feature)
reduced_data.head()

|   | A | c |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 3 |
| 2 | 3 | 4 |
| 3 | 4 | 5 |
| 4 | 5 | 6 |
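As a possible shortcut, the boolean mask from `get_support()` can be used to index the original DataFrame directly. This keeps the original index and column dtypes without rebuilding a DataFrame from the NumPy array. This is a minimal sketch, not the only way to do it; `selector` and `reduced` are illustrative names.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [1, 1, 1, 1, 1],
    "c": [2, 3, 4, 5, 6],
    "d": [2, 2, 2, 2, 2],
})

selector = VarianceThreshold(threshold=0)
selector.fit(data)

# Boolean mask selects the surviving columns straight from the DataFrame
reduced = data.loc[:, selector.get_support()]
print(reduced.head())
```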


## Pros

Very simple and fast

Removes constant / near-constant features

Reduces dataset size

Does not need target variable

## Cons

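Ignores the target, so it may remove a low-variance feature that is still predictive

Scale dependent: variance changes with a feature's units, so one threshold is not comparable across unscaled features

Cannot detect redundant (highly correlated) features

May remove imbalanced binary features when the threshold is above 0, because a rarely active flag has low variance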
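The scale dependence and the imbalanced-binary issue can be illustrated with a small made-up dataset. In this sketch, `height_m` and `height_cm` hold the same measurements in different units, and `rare_flag` is 1 in 5 of 100 rows. The column names, the values, and the threshold of 0.1 are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data: the same heights in metres and centimetres,
# plus a binary flag that is active in only 5% of rows
height_m = np.tile([1.5, 1.6, 1.7, 1.8, 1.9], 20)
demo = pd.DataFrame({
    "height_m": height_m,
    "height_cm": height_m * 100,
    "rare_flag": [1] * 5 + [0] * 95,
})

# Arbitrary non-zero threshold, chosen only for this demonstration
selector = VarianceThreshold(threshold=0.1)
selector.fit(demo)

kept = demo.columns[selector.get_support()].tolist()
variances = dict(zip(demo.columns, selector.variances_))
print(kept)
print(variances)
```

Here only `height_cm` survives: `height_m` carries the same information but has a variance 10,000 times smaller, and `rare_flag` has variance p(1 - p) = 0.05 × 0.95 = 0.0475, which falls under the threshold even though such a flag could be informative.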

## When to Use

As a first data cleaning step

When the dataset has many uninformative features

To remove constant columns
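One way to use the filter as a first cleaning step is to place it at the start of a scikit-learn `Pipeline`, so later steps never see the constant columns. The sketch below is only an illustration: the `StandardScaler` step and the step names `drop_constant` and `scale` are assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [1, 1, 1, 1, 1],
    "c": [2, 3, 4, 5, 6],
    "d": [2, 2, 2, 2, 2],
})

# Constant columns are dropped first; scaling is an illustrative follow-up step
preprocess = Pipeline([
    ("drop_constant", VarianceThreshold(threshold=0)),
    ("scale", StandardScaler()),
])

cleaned = preprocess.fit_transform(data)
print(cleaned.shape)
```

Because the filter does not need a target, the pipeline can be fitted on the features alone, as shown here.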