refactor code to reduce redundancy
yzhao062 committed Jul 29, 2019
1 parent ec7be23 commit 9bb6fc5
Showing 6 changed files with 22 additions and 23 deletions.
3 changes: 2 additions & 1 deletion CHANGES.txt
@@ -9,4 +9,5 @@ v<0.0.4>, <07/17/2019> -- Update documentation.
v<0.0.4>, <07/21/2019> -- Add code maintainability.
v<0.0.5>, <07/27/2019> -- Add median combination and score_to_proba function.
v<0.0.5>, <07/28/2019> -- Add Stacking (meta ensembling).
-v<0.0.6>, <07/29/2019> -- Enable Appveyor integration.
+v<0.0.6>, <07/29/2019> -- Enable Appveyor integration.
+v<0.0.6>, <07/29/2019> -- Update requirements file.
21 changes: 10 additions & 11 deletions README.rst
@@ -68,16 +68,16 @@ combo: A Python Toolbox for Machine Learning Model Combination
-----


-**combo** is a Python toolbox for combining or aggregating ML models and
-scores for various tasks, including **classification**, **clustering**,
-**anomaly detection**, and **raw score**. It has been widely used in data
-science competitions and real-world tasks, such as Kaggle.
+**combo** is a comprehensive Python toolbox for combining machine
+learning (ML) models and scores for various tasks, including **classification**,
+**clustering**, **anomaly detection**, and **raw score**.

-Model and score combination can be regarded as a subtask of
+Model combination has been widely used in data science competitions and
+real-world tasks, such as Kaggle. It can be considered as a subtask of
`ensemble learning <https://en.wikipedia.org/wiki/Ensemble_learning>`_,
but is often beyond the scope of ensemble learning. For instance,
averaging the results of multiple runs of a ML model is deemed as
-a reliable way of eliminating the randomness for better stability. See
+a reliable way of eliminating the randomness. See
figure below for some popular combination approaches.

.. image:: https://raw.githubusercontent.com/yzhao062/combo/master/docs/figs/framework_demo.png
@@ -88,9 +88,8 @@ figure below for some popular combination approaches.
combo is featured for:

* **Unified APIs, detailed documentation, and interactive examples** across various algorithms.
-* **Advanced models**, including dynamic classifier/ensemble selection and LSCP.
-* **Broad applications** for classification, clustering, anomaly detection, and raw score.
-* **Comprehensive coverage** for supervised, unsupervised, and semi-supervised scenarios.
+* **Advanced models**, such as dynamic classifier/ensemble selection.
+* **Comprehensive coverage** for classification, clustering, anomaly detection, and raw score.
* **Optimized performance with JIT and parallelization** when possible, using `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_.


@@ -106,7 +105,7 @@ combo is featured for:
KNeighborsClassifier(), RandomForestClassifier(),
GradientBoostingClassifier()]
-clf = Stacking(base_clfs=classifiers) # initialize a Stacking model
+clf = Stacking(base_estimators=classifiers) # initialize a Stacking model
clf.fit(X_train)
# predict on unseen data
@@ -340,7 +339,7 @@ demonstrates the basic API of stacking (meta ensembling).
from combo.models.stacking import Stacking
-clf = Stacking(base_clfs=classifiers, n_folds=4, shuffle_data=False,
+clf = Stacking(base_estimators=classifiers, n_folds=4, shuffle_data=False,
keep_original=True, use_proba=False, random_state=random_state)
clf.fit(X_train, y_train)
2 changes: 1 addition & 1 deletion docs/example.rst
@@ -128,7 +128,7 @@ demonstrates the basic API of stacking (meta ensembling).
from combo.models.stacking import Stacking
-clf = Stacking(base_clfs=classifiers, n_folds=4, shuffle_data=False,
+clf = Stacking(base_estimators=classifiers, n_folds=4, shuffle_data=False,
keep_original=True, use_proba=False, random_state=random_state)
clf.fit(X_train, y_train)
17 changes: 8 additions & 9 deletions docs/index.rst
@@ -73,16 +73,16 @@ Welcome to combo's documentation!
-----


-**combo** is a Python toolbox for combining or aggregating ML models and
-scores for various tasks, including **classification**, **clustering**,
-**anomaly detection**, and **raw score**. It has been widely used in data
-science competitions and real-world tasks, such as Kaggle.
+**combo** is a comprehensive Python toolbox for combining machine
+learning (ML) models and scores for various tasks, including **classification**,
+**clustering**, **anomaly detection**, and **raw score**.

-Model and score combination can be regarded as a subtask of
+Model combination has been widely used in data science competitions and
+real-world tasks, such as Kaggle. It can be considered as a subtask of
`ensemble learning <https://en.wikipedia.org/wiki/Ensemble_learning>`_,
but is often beyond the scope of ensemble learning. For instance,
averaging the results of multiple runs of a ML model is deemed as
-a reliable way of eliminating the randomness for better stability. See
+a reliable way of eliminating the randomness. See
figure below for some popular combination approaches.

.. image:: https://raw.githubusercontent.com/yzhao062/combo/master/docs/figs/framework_demo.png
@@ -93,9 +93,8 @@ figure below for some popular combination approaches.
combo is featured for:

* **Unified APIs, detailed documentation, and interactive examples** across various algorithms.
-* **Advanced models**, including dynamic classifier/ensemble selection and LSCP.
-* **Broad applications** for classification, clustering, anomaly detection, and raw score.
-* **Comprehensive coverage** for supervised, unsupervised, and semi-supervised scenarios.
+* **Advanced models**, such as dynamic classifier/ensemble selection.
+* **Comprehensive coverage** for classification, clustering, anomaly detection, and raw score.
* **Optimized performance with JIT and parallelization** when possible, using `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_.
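
Note on the README text in this hunk: it mentions averaging the results of multiple runs of an ML model to stabilize the output. A minimal sketch of that idea follows, using only the Python standard library. `noisy_score` is a hypothetical stand-in for one full training run and is not part of combo; the true score of 0.80 and the noise level are assumed values chosen for illustration. Averaging narrows the run-to-run spread rather than removing it entirely.

```python
import random
import statistics


def noisy_score(seed):
    """Hypothetical stand-in for one training run.

    Returns an assumed true score of 0.80 plus seed-dependent Gaussian noise.
    """
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0.0, 0.05)


def averaged_score(seeds):
    """Combine several runs by simple averaging."""
    return statistics.mean(noisy_score(s) for s in seeds)


# Spread of 200 individual runs.
single_runs = [noisy_score(s) for s in range(200)]

# Spread of 20 averaged results, each built from 10 distinct runs.
averaged_runs = [averaged_score(range(i * 10, i * 10 + 10)) for i in range(20)]

spread_single = statistics.stdev(single_runs)
spread_averaged = statistics.stdev(averaged_runs)
print(f"single-run stdev:   {spread_single:.4f}")
print(f"averaged-run stdev: {spread_averaged:.4f}")
```

The averaged results should vary noticeably less than the individual runs, which is the stability benefit the README describes.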


Empty file added docs/rebuilt.sh
Empty file.
2 changes: 1 addition & 1 deletion examples/stacking_example.py
@@ -53,7 +53,7 @@

print()
# build a Stacking model and evaluate
-clf = Stacking(base_clfs=classifiers, n_folds=4, shuffle_data=False,
+clf = Stacking(classifiers, n_folds=4, shuffle_data=False,
keep_original=True, use_proba=False,
random_state=random_state)
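
Note on the rename in this commit: changing `base_clfs` to `base_estimators` only affects callers that pass the argument by keyword, while the positional call adopted in examples/stacking_example.py works under either name. The sketch below illustrates this with `StackingStub`, a hypothetical stand-in class whose signature is assumed for illustration; it is not combo's `Stacking` implementation.

```python
class StackingStub:
    """Hypothetical stand-in with the renamed first parameter (not combo's class)."""

    def __init__(self, base_estimators, n_folds=2):
        self.base_estimators = base_estimators
        self.n_folds = n_folds


# Placeholder strings stand in for real estimator objects.
classifiers = ["clf_a", "clf_b"]

# Positional call: independent of the parameter's name.
positional = StackingStub(classifiers, n_folds=4)

# Keyword call with the new name.
keyword = StackingStub(base_estimators=classifiers, n_folds=4)

# Keyword call with the old name: Python rejects the unknown keyword.
try:
    StackingStub(base_clfs=classifiers, n_folds=4)
    old_keyword_ok = True
except TypeError:
    old_keyword_ok = False

print(positional.base_estimators == keyword.base_estimators)
print(old_keyword_ok)
```

Code that still passes `base_clfs=` would therefore need updating after a rename like this one, unless the library chose to keep a compatibility alias.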
