Suggestions for paper #5

Merged: 1 commit merged on Mar 6, 2024
22 changes: 10 additions & 12 deletions paper/paper.md
@@ -21,30 +21,28 @@ bibliography: paper.bib

# Summary

Performance metrics are pivotal in machine learning field, especially for tasks like regression, classification, and clustering [@saura_using_2021]. They offer quantitative measures to assess the accuracy and efficacy of models, aiding researchers and practitioners in evaluating, contrasting, and enhancing algorithms and models.
Performance metrics are pivotal in machine learning, especially for tasks like regression, classification, and clustering [@saura_using_2021]. They offer quantitative measures to assess the accuracy and efficacy of models, aiding researchers and practitioners in evaluating, contrasting, and enhancing algorithms and models.
In regression tasks, where continuous predictions are made, metrics such as mean squared error (MSE), root mean square error (RMSE), and Coefficient of Determination (COD) [@nguyen2018resource; @nguyen2019building] can reveal how well models capture data patterns. In classification tasks, metrics such as accuracy, precision, recall, F1-score, and AUC-ROC [@luque_impact_2019] assess a model's ability to classify instances correctly, detect false results, and gauge overall predictive performance. Clustering tasks aim to discover inherent patterns and structures within unlabeled data by grouping similar instances together. Metrics like Silhouette coefficient, Davies-Bouldin index, and Calinski-Harabasz index [@nainggolan_improved_2019] measure clustering quality, helping evaluate how well algorithms capture data distribution and assign instances to clusters.
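For reference, RMSE and the COD (also written as $R^2$) follow their standard definitions:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{COD} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
$$

where $y_i$ are the observed values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the observed values.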
In general, performance metrics serve multiple purposes. They enable researchers to compare different models and algorithms [@ahmed2021comprehensive], identify strengths and weaknesses [@Nguyen2019], and make informed decisions about model selection and parameter tuning [@nguyen2020new]. Moreover, it also plays a crucial role in the iterative process of model development and improvement. By quantifying the model's performance, metrics guide the optimization process [@thieu_groundwater_2023], allowing researchers to fine-tune algorithms, explore feature engineering techniques [@nguyen2021multi], and address issues such as overfitting, underfitting, and bias [@nguyen2020eo].
This paper introduces a Python framework named **PerMetrics** (PERformance METRICS), designed to offer comprehensive performance metrics for machine learning models. The library, packaged as `permetrics`, is open-source and written in Python. It provides a wide number of metrics to enable users to evaluate their models effectively. `permetrics` is hosted on GitHub and is under continuous development and maintenance by the dedicated team. The framework is accompanied by comprehensive documentation, examples, and test cases, facilitating easy comprehension and integration into users' workflows.

In general, performance metrics serve multiple purposes. They enable researchers to compare different models and algorithms [@ahmed2021comprehensive], identify strengths and weaknesses [@Nguyen2019], and make informed decisions about model selection and parameter tuning [@nguyen2020new]. Moreover, they also play a crucial role in the iterative model development and improvement process. By quantifying a model's performance, metrics guide the optimization process [@thieu_groundwater_2023], allowing researchers to fine-tune algorithms, explore feature engineering techniques [@nguyen2021multi], and address issues such as overfitting, underfitting, and bias [@nguyen2020eo].
This paper introduces a Python framework, **PerMetrics** (PERformance METRICS), designed to offer comprehensive performance metrics for machine learning models. The library, packaged as `permetrics`, is open-source and written in Python. It provides a wide range of metrics to enable users to evaluate their models effectively. `permetrics` is hosted on GitHub and is under continuous development and maintenance by a dedicated team. The framework includes comprehensive documentation, examples, and test cases, facilitating easy comprehension and integration into users' workflows.

# Statement of need

**PerMetrics** is a Python project developed in the field of performance assessment and machine learning. To the best of our knowledge, it is the first open-source framework that contributes a significant number of metrics, totaling 111 methods, for three fundamental problems: regression, classification, and clustering. This library relies exclusively on only two well-known third-party Python scientific computing packages: `NumPy` [@harris2020array] and `SciPy` [@virtanen2020scipy]. The modules of `permetrics` are extensively documented, and the automatically generated API provides a complete and up-to-date description of both the object-oriented and functional implementations underlying the framework.
**PerMetrics** is a Python project developed in the field of performance assessment and machine learning. To our knowledge, it is the first open-source framework that contributes a significant number of metrics, totaling 111 methods, for three fundamental problems: regression, classification, and clustering. This library relies on only two well-known third-party Python scientific computing packages: `NumPy` [@harris2020array] and `SciPy` [@virtanen2020scipy]. The modules of `permetrics` are extensively documented, and the automatically generated API provides a complete and up-to-date description of both the object-oriented and functional implementations underlying the framework.

To gain a better understanding of the necessity of **PerMetrics** library, this section will compare it to several notable libraries currently are available. Most notably, `Scikit-Learn` [@scikit_learn], which also encompasses an assortment of metrics for regression, classification, and clustering problems. Nevertheless, a few classification metrics present in `Scikit-Learn` lack support for multiple outputs, such as the Matthews correlation coefficient (MCC) and Hinge loss. Furthermore, critical metrics such as RMSE, mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE) are absent. `permetrics` addresses these deficiencies. Additionally, `Scikit-Learn` is deficient in various vital clustering metrics, including but not limited to Ball Hall index, Banfeld Raftery index, sum of squared error, Duda Hart index, and Hartigan index [@van2023metacluster].
This section provides context and comparisons with existing libraries to describe the gap that the **PerMetrics** library fills for performance measurement. The most notable comparison is `Scikit-Learn` [@scikit_learn], which also encompasses an assortment of metrics for regression, classification, and clustering problems. Nevertheless, a few classification metrics in `Scikit-Learn` lack support for multiple outputs, such as the Matthews correlation coefficient (MCC) and Hinge loss. Furthermore, critical metrics such as RMSE, mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE) are absent. `permetrics` addresses these deficiencies. Additionally, `Scikit-Learn` lacks various vital clustering metrics, including but not limited to the Ball Hall index, Banfeld Raftery index, sum of squared error, Duda Hart index, and Hartigan index [@van2023metacluster].

Another popular package is `Metrics` [@benhamner]. It provides a variety of metrics for different programming languages such as Python, MATLAB, R, and Haskell. However, the development team has ceased activity since 2015. They offer a limited number of metrics because they focused on creating a single set of metrics for multiple programming languages. Additionally, the metrics are not packaged as a complete library but rather exist as repository code on GitHub.
Another popular package is `Metrics` [@benhamner]. It provides a variety of metrics for different programming languages, such as Python, MATLAB, R, and Haskell. However, the development team has been inactive since 2015. The package offers only a limited number of metrics because its focus was on providing a single set of metrics across multiple programming languages. Additionally, the metrics are not packaged as a complete library but rather exist as repository code on GitHub.

`TorchMetrics` [@torchmetrics] is a widely recognized framework for performance metrics developed for PyTorch users. The library includes over 100 metrics, covering various domains such as regression, classification, audio, detection, and text. However, `TorchMetrics` does not provide metrics specifically for clustering tasks. Although it offers a substantial number of metrics, it falls short compared to `permetrics`. Moreover, it relies heavily on other major libraries such as `NumPy`, `Torch`, `Typing-extensions`, `Packaging`, and `Lightning-utilities`. Additionally, using this library may not be easy for beginners in Python programming, as it requires a deep understanding of the Torch library to utilize `TorchMetrics` effectively.
`TorchMetrics` [@torchmetrics] is a widely recognized framework for performance metrics developed for PyTorch users. The library includes over 100 metrics, covering various domains such as regression, classification, audio, detection, and text. However, `TorchMetrics` does not provide metrics specifically for clustering tasks. Although it offers a substantial number of metrics, it falls short compared to `permetrics`. Moreover, it relies heavily on other major libraries such as `NumPy`, `Torch`, `Typing-extensions`, `Packaging`, and `Lightning-utilities`. Additionally, using this library may not be easy for beginners in Python programming, as it requires a deep understanding of the `Torch` library to utilize `TorchMetrics` effectively.

Other popular libraries such as `TensorFlow` [@abadi2016tensorflow], `Keras` [@chollet2017xception], `CatBoost` [@prokhorenkova2018catboost], and `MxNet` [@chen2015mxnet] also contain modules dedicated to metrics. However, each of these metric modules is specific to its own framework. Combining metric modules from different libraries is challenging and, when possible at all, often requires installing numerous related libraries. Furthermore, the metric modules within each library are tailored to users who are already familiar with that framework, so using them in combination requires learning multiple libraries, syntax structures, and commands. These are significant obstacles when using metrics from such libraries.

All the aforementioned challenges are addressed by our **PerMetrics** library. It not only offers a simple and concise syntax and usage but also does not require any knowledge of other major libraries such as `TensorFlow`, `Keras`, or `PyTorch`. Additionally, it can be seamlessly integrated with any computational or machine learning library. In the future, we plan to expand `permetrics` to include other domains such as text metrics, audio metrics, detection metrics, and image metrics.

All the aforementioned challenges are addressed by our **PerMetrics** library. It offers a simple and concise syntax and usage and does not require any knowledge of other major libraries such as `TensorFlow`, `Keras`, or `PyTorch`. Additionally, it can be seamlessly integrated with any computational or machine learning library. In the future, we plan to expand `permetrics` to include other domains such as text, audio, detection, and image metrics.

# Available Methods

At the time of publication, `PerMetrics` provides three types of performance metrics include regression, classification, and clustering metrics. We listed all methods of each type below.
At the time of publication, `PerMetrics` provides three types of performance metrics: regression, classification, and clustering metrics. We list all methods of each type below.

| **Problem** | **ID** | **Metric** | **Metric Fullname** |
|----------------|--------|------------|--------------------------------------------------|
@@ -171,7 +169,7 @@ At the time of publication, `PerMetrics` provides three types of performance met
```
pip install permetrics
```

Below are a few fundamental examples illustrating the usage of the `permetrics` library. We have prepared a folder `examples` in Github repository that contains these examples and more advances one. Furthermore, to gain a comprehensive understanding of our library, we recommend reading the documentation available at the following [link](https://permetrics.readthedocs.io/).
Below are a few fundamental examples illustrating the usage of the `permetrics` library. We have prepared a folder named `examples` in the GitHub repository containing these and more advanced examples. Furthermore, to gain a comprehensive understanding of our library, we recommend reading the documentation available at the following [link](https://permetrics.readthedocs.io/).

## Regression Metrics
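
The snippet below is a minimal sketch of how a regression metric might be computed with `permetrics`; it is not taken from the paper's own examples, and the import path, the `RegressionMetric` class, and the `RMSE()`/`MAE()` method names are assumptions based on the library's documentation and may differ between versions.

```python
# A minimal usage sketch (not copied from the paper's examples folder).
# The import path, the RegressionMetric class, and the RMSE()/MAE() method
# names are assumptions based on the permetrics documentation and may
# differ slightly between library versions.
import numpy as np
from permetrics import RegressionMetric

y_true = np.array([3.0, -0.5, 2.0, 7.0, 5.5])   # observed values
y_pred = np.array([2.5, 0.0, 2.0, 8.0, 5.0])    # model predictions

evaluator = RegressionMetric(y_true=y_true, y_pred=y_pred)
print(evaluator.RMSE())  # root mean square error
print(evaluator.MAE())   # mean absolute error
```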
