Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling
This repository contains a Python 3 implementation of the meta-feature representations, sparsification, and selective sampling steps studied in the proposed extended pre-processing pipeline.
@article{cunha20,
title = {Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling},
journal = {Information Processing \& Management},
volume = {57},
number = {4},
pages = {102263},
year = {2020},
issn = {0306-4573},
doi = {10.1016/j.ipm.2020.102263},
url = {https://www.sciencedirect.com/science/article/pii/S030645731931461X},
author = {Washington Cunha and Sérgio Canuto and Felipe Viegas and Thiago Salles and Christian Gomes and Vitor Mangaravite and Elaine Resende and Thierson Rosa and Marcos André Gonçalves and Leonardo Rocha}
}
Clone this repository to your machine and run the installation with pip.
This project is based on python==3.6. The dependencies are as follows:
Cython==0.29.23
joblib==1.0.1
mlpack3==3.0.2.post1
numpy==1.17.4
pandas==1.1.5
python-dateutil==2.8.1
pytz==2021.1
scikit-learn==0.21.3
scipy==1.5.4
six==1.16.0
tqdm==4.60.0
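Because every dependency is pinned to an exact version, it can help to confirm that the active environment matches the list before running the pipeline. The following is a minimal sketch, not part of this repository: `parse_pins`, `find_mismatches`, and `installed_version` are hypothetical helper names, and the lookup assumes `pkg_resources` (shipped with setuptools) is available.

```python
# Sketch: compare installed package versions against the pinned list above.
# All helper names here are illustrative and are not part of this repository.

PINNED = """
Cython==0.29.23
joblib==1.0.1
mlpack3==3.0.2.post1
numpy==1.17.4
pandas==1.1.5
python-dateutil==2.8.1
pytz==2021.1
scikit-learn==0.21.3
scipy==1.5.4
six==1.16.0
tqdm==4.60.0
"""


def parse_pins(text):
    """Return {package: version} from lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if unavailable."""
    try:
        # Assumption: setuptools is installed, so pkg_resources can be imported.
        import pkg_resources
        return pkg_resources.get_distribution(name).version
    except Exception:
        return None


def find_mismatches(pins, lookup):
    """Return {package: (wanted, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(parse_pins(PINNED), installed_version).items():
        print("{}: expected {}, found {}".format(name, wanted, found))
```

Run as a script, it prints one line per package whose installed version differs from the pin and prints nothing when the environment matches.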
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
See also the list of contributors who participated in this project.
This project is licensed under the MIT License; see the LICENSE.md file for details.
Example run on the webkb dataset with 10 folds:
python pipeline.py -d webkb --folds 10 --out out --datain datasets/ --MFmetric cosine l2 --MFapproach knn cent --cover 0.70
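To repeat the invocation above for several datasets or settings, the argument list can be assembled programmatically. This is a minimal sketch that uses only the flags shown in the example command; `build_command` and `to_shell` are hypothetical helpers, and the default values simply mirror that example rather than any documented defaults of `pipeline.py`.

```python
import shlex


def build_command(dataset, folds=10, out="out", datain="datasets/",
                  mf_metrics=("cosine", "l2"), mf_approaches=("knn", "cent"),
                  cover=0.70):
    """Build the argument list for pipeline.py from the flags in the example run."""
    return [
        "python", "pipeline.py",
        "-d", dataset,
        "--folds", str(folds),
        "--out", out,
        "--datain", datain,
        "--MFmetric", *mf_metrics,        # one or more metric names
        "--MFapproach", *mf_approaches,   # one or more approach names
        "--cover", "{:.2f}".format(cover),
    ]


def to_shell(cmd):
    """Render an argument list as a single shell-safe string."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    print(to_shell(build_command("webkb")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root; any dataset name other than `webkb` is an assumption about what is available under `datasets/`.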