In the article “Which Machine Learning (ML) to choose?” [1], which helps you choose the right ML method for your data, we indicated that “From a business perspective, two of the most significant measurements are accuracy [2] and interpretability.” However, we did not discuss computational complexity.
The computational complexity of an algorithm is a fundamental concept in computer science. It must be taken into account because it determines the amount of resources required to run your model. These resources are estimated in terms of time, space, and sample complexity.
These complexities are approximated for a variety of machine learning algorithms, separately for training and for prediction.
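To make the split between training and prediction cost concrete, here is a minimal sketch of a brute-force k-nearest-neighbors classifier in plain Python. The class name `BruteForceKNN` and the `distance_ops` counter are hypothetical and exist only for this illustration; a production library would use optimized data structures. With n stored samples and p features, training only keeps a reference to the data (almost no time, but memory for n × p values), while a single prediction touches every stored value.

```python
class BruteForceKNN:
    """Illustrative brute-force kNN classifier that counts its own work."""

    def fit(self, X, y):
        # "Training" only stores the data: no per-sample computation,
        # but the model must keep all n * p feature values in memory.
        self.X, self.y = X, y
        return self

    def predict_one(self, x, k=3):
        self.distance_ops = 0  # elementary operations spent on distances
        dists = []
        for xi, yi in zip(self.X, self.y):
            d = 0.0
            for a, b in zip(xi, x):
                d += (a - b) ** 2      # one operation per feature
                self.distance_ops += 1
            dists.append((d, yi))
        # Sorting adds an extra n * log(n) term on top of the n * p distances.
        dists.sort(key=lambda t: t[0])
        nearest = [label for _, label in dists[:k]]
        # Majority vote among the k nearest labels.
        return max(set(nearest), key=nearest.count)


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
model = BruteForceKNN().fit(X, y)
label = model.predict_one([0.2, 0.1], k=3)
print(label, model.distance_ops)
```

Doubling either the number of stored samples or the number of features doubles `distance_ops`, which is the behavior an O(n · p) prediction cost describes.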
- Time, Space, & Sample
“Time complexity is the computational complexity that estimates the number of elementary operations performed by an algorithm.” [Wikipedia]
“Space complexity of an algorithm is an estimated amount of memory space required to solve an instance of the computational problem as a function of characteristics of the input.” [Wikipedia]
“Sample complexity of a machine learning algorithm represents the number of training samples that it needs in order to successfully learn a target function.” [Wikipedia]
- Big O notation
“Big O notation is used to find the upper bound (the highest possible amount) of a function’s growth rate, meaning it works out the time of the longest route it will take to turn input into output. It is often used to compare the efficiency of different algorithms, which is done by calculating, in a worst-case scenario, how much memory is needed, and how much time it takes to complete.” [Wikipedia]
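As a small illustration of how growth rates differ, the sketch below counts loop iterations for two ways of searching a sorted list when the target is absent, which is the worst case for both. The function names are made up for this example. Linear search grows in proportion to n, written O(n), while binary search grows with the logarithm of n, written O(log n).

```python
def linear_search_steps(items, target):
    """Return how many elements are inspected before stopping."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Return how many midpoints are inspected before stopping."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return steps


for n in (1_000, 1_000_000):
    data = list(range(n))
    missing = n  # larger than every element, so never found
    print(n, linear_search_steps(data, missing), binary_search_steps(data, missing))
```

Growing the input a thousandfold multiplies the linear count by a thousand, while the binary count only roughly doubles.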
The overview below covers data-driven ML methods such as Artificial Neural Network (ANN), Decision Tree (DT), Generalized Density-Based Spatial Clustering of Applications with Noise (GDBSCAN), Gaussian Means (GM), Hierarchical Clustering (HC), k-Nearest Neighbors (kNN), Random Forest (RF), and Support-Vector Machines (SVM).
ML Methods Overview. Adapted: Ceren Ates et al.
The table below gives a theoretical view of the upper bounds for some algorithms. If you deploy your model inside an ensemble method, the ensemble multiplies the complexity of your model, roughly by the number of base models it trains.
Computational complexity of machine learning algorithms [3]
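The multiplying effect of an ensemble can be sketched with a toy cost model. Everything here is an assumption made for illustration: `tree_training_cost` uses a hypothetical n · p · log2(n) estimate for one base model, not a figure taken from the table, and `bagging_training_cost` assumes every base model is trained independently on a sample of the same size.

```python
import math


def tree_training_cost(n_samples, n_features):
    # Hypothetical cost model, assumed for illustration only:
    # roughly n * p * log2(n) elementary operations for one base model.
    return n_samples * n_features * math.log2(n_samples)


def bagging_training_cost(base_cost, n_estimators, n_samples, n_features):
    # Each base model is trained independently on a bootstrap sample
    # of the same size, so the base cost is paid once per estimator.
    return n_estimators * base_cost(n_samples, n_features)


single = tree_training_cost(1024, 10)
ensemble = bagging_training_cost(tree_training_cost, 100, 1024, 10)
print(single, ensemble, ensemble / single)
```

Whatever the true cost of the base model is, the ratio between the ensemble and a single model stays equal to the number of estimators under these assumptions.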
Efficient ML algorithms matter for the efficiency of ensemble methods such as Bagging, Boosting, Stacking, and Cascading, which take as input the results of numerous ML algorithms.
An example from the domain of sorting algorithms illustrates the impact of computational complexity on algorithms’ efficiency. “Efficient sorting is consequential for optimizing the efficiency of other algorithms, such as search and merge algorithms, that require input data to be in sorted lists.”
The following animations illustrate how quickly data sets with different initial orderings can be sorted by different algorithms.
Sorting Algorithms Animations. Toptal
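What the animations show can also be measured by counting comparisons. The sketch below, with function names chosen for this example, compares a bubble sort that stops early when a pass makes no swaps against a merge sort, on already sorted, reversed, and shuffled input. Bubble sort needs on the order of n² comparisons in the worst case, while merge sort stays near n · log2(n) regardless of the starting order.

```python
import random


def bubble_sort_comparisons(values):
    """Sort a copy with bubble sort; return (sorted_list, comparisons)."""
    data = list(values)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:
            break  # a pass without swaps means the list is sorted
    return data, comparisons


def merge_sort_comparisons(values):
    """Sort a copy with merge sort; return (sorted_list, comparisons)."""
    data = list(values)
    if len(data) <= 1:
        return data, 0
    mid = len(data) // 2
    left, c_left = merge_sort_comparisons(data[:mid])
    right, c_right = merge_sort_comparisons(data[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged, comparisons


n = 200
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
inputs = {"sorted": list(range(n)), "reversed": list(range(n, 0, -1)), "shuffled": shuffled}
for name, values in inputs.items():
    _, bubble = bubble_sort_comparisons(values)
    _, merge = merge_sort_comparisons(values)
    print(name, bubble, merge)
```

On sorted input the early exit lets bubble sort finish after a single pass, which is why the starting point matters as much as the algorithm.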
Statistical Machine Learning (SML) merges statistics with the computational sciences: computer science, systems science, and optimization. SML provides mathematical tools for analyzing the behavior and generalization performance of ML algorithms. “The major difference between machine learning and statistics is their purpose. Machine learning models are designed to make the most accurate predictions possible. Statistical models are designed for inference about the relationships between variables.” [4]
Next, read my article “Interpretability/Explainability: Understanding models’ prediction/decision reasoning/transparency: Why? — Trust; How? — ‘Seeing Machines Learn’” at https://www.linkedin.com/pulse/interpretabilityexplainability-understanding-models-rajwan-ms-dsc
— — — — — — — — — — — — — — — — — — — — — — — — — — — — -
[1] https://www.linkedin.com/pulse/machine-learning-101-which-ml-choose-yair-rajwan-ms-dsc
[2] https://www.linkedin.com/pulse/accuracy-bias-variance-tradeoff-yair-rajwan-ms-dsc
[3] https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms
[4] https://towardsdatascience.com/the-actual-difference-between-statistics-and-machine-learning-64b49f07ea3