<a href="https://colab.research.google.com/github/RaneemQaddoura/EvoCluster/blob/master/EvoCluster.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>EvoCluster</h1>
An Open-Source Nature-Inspired Optimization Clustering Framework in Python

EvoCluster is an open-source, cross-platform framework implemented in Python. It includes the most well-known and recent nature-inspired metaheuristic optimizers, customized to perform partitional clustering tasks.

The goal of this framework is to provide a user-friendly and customizable implementation of metaheuristic-based clustering algorithms that both experienced and non-experienced users can apply to different applications.

The framework can also be used by researchers who can benefit from the implementation of the metaheuristic optimizers for their research studies. 

EvoCluster can be extended by designing other optimizers, including more objective functions, adding other evaluation measures, and using more data sets.

The current implementation of the framework includes ten metaheuristic optimizers, thirty datasets, five objective functions, and twelve evaluation measures.

The full list of implemented optimizers is available here: https://github.com/7ossam81/EvoloPy/wiki/List-of-optimizers

<h2>Features</h2>

*   Ten nature-inspired metaheuristic optimizers are implemented.
*   The implementation uses fast array manipulation with [NumPy](http://www.numpy.org/).
*   Matrix support using the [SciPy](https://www.scipy.org/) package.
*   More optimizers are coming soon.

<h2>Installation</h2>

Python 3.x is required.

<h2>GitHub</h2>

Clone the Git repository from GitHub:
git clone https://github.com/RaneemQaddoura/EvoCluster.git

In [None]:
!git clone https://github.com/RaneemQaddoura/EvoCluster.git

In [46]:
# Change working directory to the cloned repository
# (os.chdir returns None, so there is nothing useful to print)
import os
os.chdir("EvoCluster/")


FileNotFoundError: [Errno 2] No such file or directory: 'EvoCluster/'

<h2>Install Packages</h2>

In [None]:
# Install NumPy, SciPy, scikit-learn, pandas, and matplotlib
!pip install -r requirements.txt

<h2>User Preferences</h2>

In [37]:
# Select optimizers
# "SSA","PSO","GA","BAT","FFA","GWO","WOA","MVO","MFO","CS"
optimizer=["SSA","PSO"]

In [38]:
# Select objective function
# "SSE","TWCV","SC","DB","DI"
objectivefunc=["SSE"] 
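As a rough illustration of what an SSE objective measures in partitional clustering, the sketch below computes the sum of squared Euclidean distances from each point to its nearest centroid. This is the textbook definition only; the function name `sse` and the toy data are assumptions for illustration, and EvoCluster's own implementation may differ (for example in normalization or in how a candidate solution encodes centroids).

```python
import numpy as np

def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise squared distances, shape (n_points, n_centroids)
    diffs = points[:, None, :] - centroids[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Each point contributes its squared distance to the closest centroid
    return float(np.sum(np.min(sq_dists, axis=1)))

# Toy data (hypothetical): two tight groups and one centroid per group
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids = [[0.0, 1.0], [10.0, 1.0]]
print(sse(points, centroids))  # 4.0
```

A metaheuristic optimizer searching for centroids would try to minimize a value of this kind, so lower is better.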

In [39]:
# Select data sets
#"aggregation","aniso","appendicitis","balance","banknote","blobs","Blood","circles","diagnosis_II","ecoli","flame","glass","heart","ionosphere","iris","iris2D","jain","liver","moons","mouse","pathbased","seeds","smiley","sonar","varied","vary-density","vertebral2","vertebral3","wdbc","wine"
dataset_List = ["iris","aggregation"]

In [40]:
# Select number of repetitions for each experiment. 
# To obtain meaningful statistical results, usually 30 independent runs are executed for each algorithm.
NumOfRuns=3

In [41]:
# Select general parameters for all optimizers (population size, number of iterations)
params = {'PopulationSize' : 30, 'Iterations' : 50}

In [42]:
# Choose whether to export the results in different formats
export_flags = {'Export_avg': True, 'Export_details': True, 'Export_details_labels': True,
                'Export_convergence': True, 'Export_boxplot': True}
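To get a feel for how large an experiment these preferences describe, the sketch below counts the combinations of dataset, optimizer, and objective function, then multiplies by the number of repetitions. It assumes the framework performs one independent run per combination per repetition, which matches the one-row-per-run layout of the detailed results file but is not taken from the framework's code.

```python
from itertools import product

# Same values as the preference cells above
optimizer = ["SSA", "PSO"]
objectivefunc = ["SSE"]
dataset_List = ["iris", "aggregation"]
NumOfRuns = 3

# One configuration per (dataset, optimizer, objective) combination
combos = list(product(dataset_List, optimizer, objectivefunc))
total_runs = len(combos) * NumOfRuns  # assumption: one run per combination per repetition

print(len(combos), total_runs)  # 4 12
```

Raising `NumOfRuns` to 30, as the comment above recommends for meaningful statistics, would scale the total proportionally.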

<h2>Run Framework</h2>

In [43]:
# Run EvoCluster
from optimizer import run
run(optimizer, objectivefunc, dataset_List, NumOfRuns, params, export_flags)

PermissionError: [Errno 13] Permission denied: '2021-10-28-14-19-32'

<h2>Results Files and Plots</h2>

In [8]:
# Import some useful packages to view the result files in Colab
import pandas as pd
from IPython.display import Image
import os
import datetime
import ipywidgets as widgets

In [9]:
#Select the experiments folder
foldernames = [filename for filename in os.listdir() if filename.startswith(str(datetime.datetime.now().year))]
drop_folder = widgets.Dropdown(options=foldernames, description='Select folder:')
drop_folder

Dropdown(description='Select folder:', options=('2021-05-28-21-17-27', '2021-05-28-22-34-06', '2021-05-29-01-4…

In [15]:
#Get the selected folder
foldername = drop_folder.value
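When the dropdown widget is not available (for example when running as a plain script), the most recent experiment folder can be picked programmatically. The sketch below relies on the timestamp-style folder names shown in the dropdown output, which sort chronologically as plain strings; the helper name `latest_experiment_folder` and the sample list are assumptions for illustration.

```python
import re

# Matches folder names such as '2021-05-28-21-17-27'
STAMP = re.compile(r"^\d{4}(-\d{2}){5}$")

def latest_experiment_folder(names):
    """Return the newest timestamp-named folder, or None if there is none."""
    stamped = [name for name in names if STAMP.match(name)]
    # Zero-padded timestamps sort chronologically as strings
    return max(stamped) if stamped else None

names = ["2021-05-28-21-17-27", "EvoCluster.ipynb", "2021-05-28-22-34-06"]
print(latest_experiment_folder(names))  # 2021-05-28-22-34-06
```

In the notebook this could be called as `latest_experiment_folder(os.listdir())`. Unlike the year-based filter above, it also finds folders created in earlier years.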

<h4>Average Results File</h4>

In [16]:
#Show the average results file
filename = foldername +'/experiment.csv' 
df = pd.read_csv(filename)
df.head(4)

Unnamed: 0,Dataset,Optimizer,objfname,k,ExecutionTime,SSE,Purity,Entropy,HS,CS,...,Iter41,Iter42,Iter43,Iter44,Iter45,Iter46,Iter47,Iter48,Iter49,Iter50
0,iris,SSA,SSE,3,3.95,4.72,0.78,0.33,0.67,0.78,...,4.72,4.72,4.72,4.72,4.72,4.72,4.72,4.72,4.72,4.72
1,aggregation,SSA,SSE,7,20.52,12.85,0.85,0.17,0.81,0.73,...,12.85,12.85,12.85,12.85,12.85,12.85,12.85,12.85,12.85,12.85
2,iris,SSA,TWCV,3,3.7,38.05,0.96,0.14,0.86,0.86,...,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83
3,aggregation,SSA,TWCV,7,18.33,25.45,0.91,0.09,0.9,0.81,...,11.13,11.13,11.13,11.13,11.13,11.13,11.13,11.13,11.13,11.13


<h4>Detailed Results File</h4>

In [17]:
#Show the detailed results file
filename = foldername +'/experiment_details.csv' 
df = pd.read_csv(filename)
df.head(12)

Unnamed: 0,Dataset,Optimizer,objfname,k,ExecutionTime,SSE,Purity,Entropy,HS,CS,...,Iter41,Iter42,Iter43,Iter44,Iter45,Iter46,Iter47,Iter48,Iter49,Iter50
0,iris,SSA,SSE,3,3.94,5.99,0.67,0.44,0.56,0.72,...,5.99,5.99,5.99,5.99,5.99,5.99,5.99,5.99,5.99,5.99
1,iris,SSA,SSE,3,3.95,5.2,0.74,0.39,0.61,0.76,...,5.2,5.2,5.2,5.2,5.2,5.2,5.2,5.2,5.2,5.2
2,iris,SSA,SSE,3,3.96,2.97,0.94,0.15,0.85,0.86,...,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97
3,aggregation,SSA,SSE,7,19.84,13.66,0.84,0.19,0.78,0.71,...,13.67,13.67,13.67,13.66,13.66,13.66,13.66,13.66,13.66,13.66
4,aggregation,SSA,SSE,7,19.94,11.72,0.88,0.12,0.87,0.77,...,11.72,11.72,11.72,11.72,11.72,11.72,11.72,11.72,11.72,11.72
5,aggregation,SSA,SSE,7,21.78,13.16,0.83,0.2,0.78,0.71,...,13.16,13.16,13.16,13.16,13.16,13.16,13.16,13.16,13.16,13.16
6,iris,SSA,TWCV,3,3.73,80.46,0.96,0.14,0.86,0.86,...,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83
7,iris,SSA,TWCV,3,3.76,13.17,0.96,0.14,0.86,0.86,...,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83
8,iris,SSA,TWCV,3,3.62,20.52,0.96,0.14,0.86,0.86,...,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83,2.83
9,aggregation,SSA,TWCV,7,18.31,26.51,0.91,0.08,0.9,0.81,...,11.18,11.18,11.18,11.18,11.18,11.18,11.18,11.18,11.18,11.18
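The SSE values in the two tables above are consistent with the average results file holding the per-configuration mean of the detailed runs. The sketch below rebuilds a small frame from the SSA/SSE rows shown above and averages it with a pandas `groupby`; it illustrates the relationship only and is not the framework's own export code.

```python
import pandas as pd

# SSE column of the SSA/SSE rows from the detailed results shown above
details = pd.DataFrame({
    "Dataset": ["iris", "iris", "iris", "aggregation", "aggregation", "aggregation"],
    "Optimizer": ["SSA"] * 6,
    "objfname": ["SSE"] * 6,
    "SSE": [5.99, 5.20, 2.97, 13.66, 11.72, 13.16],
})

# Mean over the repeated runs of each configuration, rounded to two decimals
averages = (
    details.groupby(["Dataset", "Optimizer", "objfname"], sort=False)["SSE"]
    .mean()
    .round(2)
    .reset_index()
)
print(averages)
```

The resulting means, 4.72 for iris and 12.85 for aggregation, match the corresponding rows of the average results file.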


<h4>Labels Results File</h4>

In [18]:
#Show the labels results file
filename = foldername +'/experiment_details_Labels.csv' 
header_names=['Dataset','Optimizer','objfname'] + ['label' + str(i) for i in range(50)]
df = pd.read_csv(filename,names=header_names,dtype=object)[1:]
df.head(12)

Unnamed: 0,Dataset,Optimizer,objfname,label0,label1,label2,label3,label4,label5,label6,...,label40,label41,label42,label43,label44,label45,label46,label47,label48,label49
1,iris,SSA,SSE,3,1,1,1,1,0,0,...,1,1,1,1,1,0,1,0,1,0
2,iris,SSA,SSE,3,2,2,2,2,2,2,...,2,2,1,2,2,2,2,2,2,2
3,iris,SSA,SSE,3,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
4,aggregation,SSA,SSE,7,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
5,aggregation,SSA,SSE,7,0,0,0,0,0,0,...,3,3,3,3,3,3,3,3,3,3
6,aggregation,SSA,SSE,7,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
7,iris,SSA,TWCV,3,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,iris,SSA,TWCV,3,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,iris,SSA,TWCV,3,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10,aggregation,SSA,TWCV,7,5,5,5,5,5,5,...,5,5,5,5,5,5,5,5,5,5


<h4>Convergence Curve Plot</h4>

In [19]:
#Select convergence curve to show
filenames = [filename for filename in os.listdir(foldername) if filename.startswith('convergence')]

drop_plot_convergence = widgets.Dropdown(options=filenames, description='Select plot:')
drop_plot_convergence

Dropdown(description='Select plot:', options=(), value=None)

In [20]:
#Show selected convergence curve
Image(foldername +'/' + drop_plot_convergence.value)

TypeError: can only concatenate str (not "NoneType") to str
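The `TypeError` above occurs because the dropdown has no options, so its `value` is `None` and cannot be concatenated with the folder name. A minimal guard, sketched below, returns no path when nothing is selected. The helper name `selected_plot_path` and the file name `convergence-iris.png` are hypothetical.

```python
import os

def selected_plot_path(foldername, selected):
    """Return the path to the chosen plot, or None when nothing is selected."""
    if not selected:
        return None
    return os.path.join(foldername, selected)

# No plot files found: the dropdown value is None
print(selected_plot_path("2021-05-28-21-17-27", None))

# A plot was selected (hypothetical file name)
print(selected_plot_path("2021-05-28-21-17-27", "convergence-iris.png"))
```

In the notebook, the result can be checked before calling `Image`, and a short message printed instead when it is `None`. An empty dropdown usually means the selected folder contains no exported plots, for example because the run did not finish or the export flag was off.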

<h4>Box Plot</h4>

In [22]:
#Select boxplot to show
filenames = [filename for filename in os.listdir(foldername) if filename.startswith('boxplot')]

drop_boxplot = widgets.Dropdown(options=filenames, description='Select plot:')
drop_boxplot

Dropdown(description='Select plot:', options=(), value=None)

In [23]:
#Show selected boxplot
Image(foldername +'/' + drop_boxplot.value)

TypeError: can only concatenate str (not "NoneType") to str

<h2>Citation Request</h2>

Please include these citations if you plan to use this framework:


*   Qaddoura, Raneem, Hossam Faris, Ibrahim Aljarah, and Pedro A. Castillo. "EvoCluster: An Open-Source Nature-Inspired Optimization Clustering Framework in Python." In International Conference on the Applications of Evolutionary Computation (Part of EvoStar), pp. 20-36. Springer, Cham, 2020.
*   Ruba Abu Khurma, Ibrahim Aljarah, Ahmad Sharieh, and Seyedali Mirjalili. EvoloPy-FS: An open-source nature-inspired optimization framework in Python for feature selection. In Evolutionary Machine Learning Techniques, pages 131–173. Springer, 2020.
*   Hossam Faris, Ibrahim Aljarah, Seyedali Mirjalili, Pedro Castillo, and J.J. Merelo. "EvoloPy: An Open-source Nature-inspired Optimization Framework in Python". In Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 3: ECTA, ISBN 978-989-758-201-1, pages 171-177.
*   Raneem Qaddoura, Hossam Faris, and Ibrahim Aljarah. An efficient clustering algorithm based on the k-nearest neighbors with an indexing ratio. International Journal of Machine Learning and Cybernetics, pages 1–40, 2019.
*   Ibrahim Aljarah, Majdi Mafarja, Ali Asghar Heidari, Hossam Faris, and Seyedali Mirjalili. Clustering analysis using a novel locality-informed grey wolf-inspired clustering approach. Knowledge and Information Systems, pages 1–33, 2019.
*   Sarah Shukri, Hossam Faris, Ibrahim Aljarah, Seyedali Mirjalili, and Ajith Abraham. Evolutionary static and dynamic clustering algorithms based on multi-verse optimizer. Engineering Applications of Artificial Intelligence, 72:54–66, 2018.
*   Ibrahim Aljarah, Majdi Mafarja, Ali Asghar Heidari, Hossam Faris, and Seyedali Mirjalili. Multi-verse optimizer: Theory, literature review, and application in data clustering. In Nature-Inspired Optimizers: Theories, Literature Reviews and Applications, pages 123–141. Springer International Publishing, Cham, 2020.






