# Filtering

Load an example dataset for filtering and split it into features and labels.

In [1]:
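from sklearn.datasets import fetch_covtype

# Downloads the Forest Covertype data on first use, then reads it from the local cache
data = fetch_covtype()

features = data.data  # feature matrix, one row per forest patch
labels = data.target  # cover-type label for each row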

Print the dataset description.

In [2]:
print(data.DESCR)

.. _covtype_dataset:

Forest covertypes
-----------------

The samples in this dataset correspond to 30×30m patches of forest in the US,
collected for the task of predicting each patch's cover type,
i.e. the dominant species of tree.
There are seven covertypes, making this a multiclass classification problem.
Each sample has 54 features, described on the
`dataset's homepage <https://archive.ics.uci.edu/ml/datasets/Covertype>`__.
Some of the features are boolean indicators,
while others are discrete or continuous measurements.

**Data Set Characteristics:**

    Classes                        7
    Samples total             581012
    Dimensionality                54
    Features                     int

:func:`sklearn.datasets.fetch_covtype` will load the covertype dataset;
it returns a dictionary-like object
with the feature matrix in the ``data`` member
and the target values in ``target``.
The dataset will be downloaded from the web if necessary.
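To confirm the characteristics listed above (sample count, 54 features, seven classes) on the loaded arrays, a small helper can report the shape and the class balance. This is a minimal sketch; the function name `summarize_dataset` is hypothetical and not part of scikit-learn, and it only relies on `ndarray.shape` and `np.unique(..., return_counts=True)`.

```python
import numpy as np


def summarize_dataset(features, labels):
    """Return (n_samples, n_features, classes, counts) for a feature matrix and label vector."""
    n_samples, n_features = features.shape
    # np.unique returns the sorted distinct labels and how often each one occurs
    classes, counts = np.unique(labels, return_counts=True)
    return n_samples, n_features, classes, counts
```

Calling `summarize_dataset(features, labels)` on the covtype arrays should report 581012 samples, 54 features and seven distinct classes; the counts also show how unbalanced the classes are, which is worth knowing before interpreting any per-feature score.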



Apply the mutual information method to keep the 10 best features.

In [3]:
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 10 features with the highest estimated mutual information with the label
features_filtered = SelectKBest(score_func=mutual_info_classif, k=10).fit_transform(features, labels)
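Many covtype columns are boolean indicators, and for discrete features the mutual information can be estimated simply by counting co-occurrences. The sketch below implements that plug-in estimate from scratch for illustration; `plugin_mutual_info` is a hypothetical helper, not a scikit-learn function. It should agree with `sklearn.metrics.mutual_info_score`, which computes the same contingency-table quantity in nats. Note that with the default `discrete_features='auto'`, `mutual_info_classif` treats every column of a dense array as continuous and uses its nearest-neighbour estimator instead, so its scores will not match this count-based value exactly.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def plugin_mutual_info(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete 1-D arrays."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = x.size
    # Map each distinct value to a contiguous index so a contingency table can be built
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((x_vals.size, y_vals.size))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= n  # empirical p(x, y)
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x), shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, ny)
    nz = joint > 0  # cells with p(x, y) = 0 contribute nothing to the sum
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


# Compare against scikit-learn on random discrete data
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 7, size=1000)
x_informative = (y_demo == 0).astype(int)  # binary flag tied to the label
x_noise = rng.integers(0, 2, size=1000)  # binary flag unrelated to the label
print(plugin_mutual_info(x_informative, y_demo), mutual_info_score(y_demo, x_informative))
print(plugin_mutual_info(x_noise, y_demo), mutual_info_score(y_demo, x_noise))
```

The informative flag should score clearly higher than the noise flag, which is the ranking `SelectKBest` relies on.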

Print the number of features before and after filtering.

In [4]:
print(f'there are {features.shape[1]} features before filtering')
print(f'there are {features_filtered.shape[1]} features after filtering')

there are 54 features before filtering
there are 10 features after filtering
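To see which columns survived rather than only how many, keep the fitted selector instead of calling `fit_transform` in one step. A sketch, assuming a synthetic stand-in from `make_classification` (54 columns, 7 classes) so it runs without downloading covtype; `get_support`, `scores_` and `transform` are standard `SelectKBest` members, while wrapping `mutual_info_classif` in `functools.partial` to pin `random_state` is a choice made here, not part of the original notebook. `mutual_info_classif` adds a small random noise to continuous features, so fixing `random_state` makes the scores reproducible between runs.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in for covtype: 54 columns, 7 classes, only a few informative columns
X, y = make_classification(
    n_samples=2000,
    n_features=54,
    n_informative=8,
    n_redundant=0,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=0,
)

# Fixing random_state makes the nearest-neighbour MI estimate reproducible
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func=score_func, k=10).fit(X, y)

kept = selector.get_support(indices=True)  # column indices of the 10 selected features
X_filtered = selector.transform(X)

# List the kept columns from highest to lowest estimated mutual information
for idx in kept[np.argsort(selector.scores_[kept])[::-1]]:
    print(f"feature {idx}: MI = {selector.scores_[idx]:.3f}")
```

On the real data, substituting `features, labels` for `X, y` gives the covtype column indices that the filtering kept; expect the scoring step to take noticeably longer on the full dataset.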
