This project includes the implementation of evolutionary feature selection models based on MapReduce.


This repository contains the MapReduce implementations used in [1]. The implementation is based on the Apache Mahout 0.8 library. The goal of the Apache Mahout project is to build an environment for quickly creating scalable, performant machine learning applications.


Prerequisites:

  • Hadoop 2.5
  • Ant

Associated paper:

  • D. Peralta, S. Del Río, S. Ramírez-Gallego, I. Triguero, J.M. Benítez, F. Herrera. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach.

Compile the whole project with Ant:

$ ant

Copy the dataset folder into HDFS:

$ hadoop fs -put datasets/

Generate the descriptor file needed by the Mahout code:

$ hadoop jar Model.jar -p  datasets/page-blocks-10-fold/  -f  datasets/page-blocks-10-fold/ -d  10 N L

Show the options accepted by the feature selection model:

$ hadoop jar Model.jar org.apache.mahout.classifier.feature_selection.mapreduce.FeatureSelectionModel -h

 [--data  --dataset  --header  --output ]          
  --data (-d) path           Data path                                          
  --dataset (-ds) dataset    The path of the file descriptor of the dataset     
  --header (-he) header      Header of the dataset in Keel format               
  --output (-o) path         Output path, will contain the set of selected      

Example of use:

To control the number of mappers, first check the size in bytes of the training file:

$ ls -l datasets/page-blocks-10-fold/ 
 -rw-rw-r-- 1 isaac isaac 221580 jul 15  2013 datasets/page-blocks-10-fold/ 

To get 4 map tasks, divide this size by 4 (221580 / 4 = 55395) and pass the result as the maximum split size:

$ hadoop jar Model.jar org.apache.mahout.classifier.feature_selection.mapreduce.FeatureSelectionModel -Dmapred.max.split.size=55395 -d datasets/page-blocks-5-fold/ -he datasets/page-blocks-5-fold/page-blocks.header -ds datasets/page-blocks-5-fold/ -o output-FS-pageblocks
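
The split-size arithmetic can be scripted so it does not have to be redone by hand for each dataset. The following is a minimal sketch, not part of the project: the helper name `split_size` is hypothetical, and it rounds up so that a size not divisible by the number of maps does not leave a small extra split. The actual number of map tasks is still decided by Hadoop's input format.

```shell
#!/bin/sh
# split_size BYTES MAPS
# Hypothetical helper: prints the ceiling of BYTES / MAPS, a candidate
# value for -Dmapred.max.split.size.
split_size() {
    bytes=$1
    maps=$2
    echo $(( (bytes + maps - 1) / maps ))
}

# Size of the training file from the example above, split across 4 maps.
split_size 221580 4
```

For a local copy of the training file, the size can be obtained with `wc -c < FILE` instead of reading it from `ls -l`.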

Build the preprocessed dataset for classification purposes:

$ hadoop jar Model.jar org.apache.mahout.classifier.feature_selection.mapreduce.FSconstructor -i datasets/page-blocks-5-fold/ -fs output-FS-pageblocks/seleccionadas.txt -ds datasets/page-blocks-5-fold/ -he datasets/page-blocks-5-fold/page-blocks.header -o output-FSconstructor
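
The two steps are linked by the feature selection output: the `seleccionadas.txt` file written under the `-o` path of the first command is the `-fs` argument of the second. The sketch below makes that dependency explicit by printing both commands for review rather than running them. The function name `fs_pipeline` and the `TRAIN`, `DESCRIPTOR` and `HEADER` arguments are placeholders introduced for illustration; only the class names and flags come from the commands above.

```shell
#!/bin/sh
# Package of the two drivers used in this README.
PKG=org.apache.mahout.classifier.feature_selection.mapreduce

# fs_pipeline TRAIN DESCRIPTOR HEADER FS_OUT FINAL_OUT
# Hypothetical helper: prints (does not run) the two pipeline commands.
fs_pipeline() {
    train=$1; descriptor=$2; header=$3; fs_out=$4; final_out=$5
    # Step 1: feature selection, writing its result under $fs_out.
    echo "hadoop jar Model.jar $PKG.FeatureSelectionModel -d $train -he $header -ds $descriptor -o $fs_out"
    # Step 2: build the reduced dataset from the selected features.
    echo "hadoop jar Model.jar $PKG.FSconstructor -i $train -fs $fs_out/seleccionadas.txt -ds $descriptor -he $header -o $final_out"
}

# Placeholders stand in for the actual dataset files.
fs_pipeline TRAIN DESCRIPTOR HEADER output-FS-pageblocks output-FSconstructor
```

Piping the output to `sh` would execute the commands once the placeholders are replaced with real paths.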