Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling

This repository contains a Python 3 implementation of the meta-feature representations, sparsification, and selective sampling steps studied in the proposed extended pre-processing pipeline.

Citation

@article{cunha20,
title = {Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling},
journal = {Information Processing \& Management},
volume = {57},
number = {4},
pages = {102263},
year = {2020},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2020.102263},
url = {https://www.sciencedirect.com/science/article/pii/S030645731931461X},
author = {Washington Cunha and Sérgio Canuto and Felipe Viegas and Thiago Salles and Christian Gomes and Vitor Mangaravite and Elaine Resende and Thierson Rosa and Marcos André Gonçalves and Leonardo Rocha}
}

Installing

Clone this repository to your machine and install it with pip.

Requirements

This project targets Python 3.6. The pinned dependencies are as follows:

Cython==0.29.23
joblib==1.0.1
mlpack3==3.0.2.post1
numpy==1.17.4
pandas==1.1.5
python-dateutil==2.8.1
pytz==2021.1
scikit-learn==0.21.3
scipy==1.5.4
six==1.16.0
tqdm==4.60.0
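
To confirm that an environment matches the pins above, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the helper names `parse_pins`, `installed_version`, and `find_mismatches` are hypothetical, and the pin list is simply copied from the list above.

```python
"""Check an environment against the pinned requirements (illustrative sketch)."""

# Pins copied verbatim from the Requirements list above.
PINS = """\
Cython==0.29.23
joblib==1.0.1
mlpack3==3.0.2.post1
numpy==1.17.4
pandas==1.1.5
python-dateutil==2.8.1
pytz==2021.1
scikit-learn==0.21.3
scipy==1.5.4
six==1.16.0
tqdm==4.60.0
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict (hypothetical helper)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python >= 3.8
    except ImportError:
        import pkg_resources  # fallback for Python 3.6/3.7 (provided by setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, found)} for every package that differs from its pin."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    report = find_mismatches(parse_pins(PINS))
    for name, (pinned, found) in sorted(report.items()):
        print(f"{name}: pinned {pinned}, found {found or 'not installed'}")
```

Running the script prints one line per package whose installed version differs from its pin, and prints nothing when the environment matches.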

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Authors

Washington Cunha - Initial work - waashk
Sérgio Canuto - scanfbr

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Example

python pipeline.py -d webkb --folds 10 --out out --datain datasets/ --MFmetric cosine l2 --MFapproach knn cent --cover 0.70
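
For scripted or repeated runs, the same command can be assembled and launched from Python. The sketch below is a convenience wrapper written under assumptions and is not part of the repository: `build_command` and `run_pipeline` are hypothetical helper names, and only the flags and values shown in the example above are used. What each flag means is defined by `pipeline.py` itself.

```python
import subprocess
import sys


def build_command(dataset="webkb", folds=10, out="out", datain="datasets/",
                  mf_metrics=("cosine", "l2"), mf_approaches=("knn", "cent"),
                  cover=0.70, python=sys.executable):
    """Assemble the argument list for the example call (hypothetical helper).

    The defaults reproduce the example command above; the flag names are
    copied from it, and nothing beyond those flags is assumed.
    """
    return [
        python, "pipeline.py",
        "-d", dataset,
        "--folds", str(folds),
        "--out", out,
        "--datain", datain,
        "--MFmetric", *mf_metrics,        # one or more metrics, space separated
        "--MFapproach", *mf_approaches,   # one or more approaches, space separated
        "--cover", f"{cover:.2f}",        # formatted to two decimals, as in the example
    ]


def run_pipeline(**kwargs):
    """Run pipeline.py from the repository root.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    return subprocess.run(build_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_command(python="python")))
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` makes a failed run raise an exception instead of passing silently.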
