The algorithms in this library compute explanations that resemble those of methods based on Shapley values (e.g. [shap](https://github.com/shap/shap)). Together with each prediction, the library also computes the *relevance* of the features in making that prediction. Relevance is computed not just for single features but for *sets of features*, since the interaction of several features can be a stronger indicator for some prediction tasks. For instance, if we want to flag suspicious hotels in a city, a hotel in a good location is not necessarily suspicious, nor is a low-priced hotel. However, the combination of the two, a hotel in a good location with low prices, might raise our suspicion.

Roughly speaking, these algorithms generate agents, each of which looks at the data from a different perspective: an agent observes only a given set of features and uses it to create a hierarchical categorization of the data (a formal ontology, built using formal concept analysis, FCA), which it then uses to make a prediction. After all agents have made their predictions, their scores are combined into a final prediction, and each agent's contribution to this deliberation corresponds to the relevance of the feature set it observes.
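
The idea of agents that each observe one feature set can be illustrated with a deliberately simplified sketch. This is not the library's algorithm: instead of building a formal ontology with FCA, each hypothetical agent here simply scores how rare the observed combination of values is in the training data, and the agents are combined with uniform weights. The function `agent_scores` and the hotel data are invented for illustration.

```python
from itertools import combinations

import pandas as pd


def agent_scores(train: pd.DataFrame, row: dict, max_size: int = 2) -> dict:
    """Toy stand-in for the agents: one score per feature subset.

    The score is 1 minus the fraction of training rows that agree with
    `row` on every feature in the subset (rarer combination, higher score).
    """
    scores = {}
    features = list(train.columns)
    for size in range(1, max_size + 1):
        for subset in combinations(features, size):
            # Training rows that match `row` on all features of this subset.
            matches = pd.Series(True, index=train.index)
            for feature in subset:
                matches &= train[feature] == row[feature]
            scores[subset] = 1.0 - matches.mean()
    return scores


# Hypothetical training data: good locations are expensive, bad ones cheap.
train = pd.DataFrame({
    "location": ["good", "good", "bad", "bad", "good", "bad"],
    "price": ["high", "high", "low", "low", "high", "low"],
})
row = {"location": "good", "price": "low"}

scores = agent_scores(train, row)

# Uniform weights (an assumption; a trained model would learn them).
weights = {subset: 1.0 / len(scores) for subset in scores}
prediction = sum(weights[s] * scores[s] for s in scores)

# Rank feature sets by their contribution to the combined score.
ranking = sorted(scores, key=lambda s: weights[s] * scores[s], reverse=True)
print(prediction)
print(ranking[0])
```

In this toy setting neither `location` nor `price` alone is unusual, but the agent observing both features together assigns the highest score, mirroring the hotel example above.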

To see an example, let's first install the library by running:

In [1]:
!pip install mlconcepts

Collecting mlconcepts
  Downloading mlconcepts-0.0.1a5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (256 kB)
Installing collected packages: mlconcepts
Successfully installed mlconcepts-0.0.1a5


Now consider the following dataset containing data about some mushrooms.

In [2]:
import pandas
df = pandas.DataFrame({
    "feat1" : [5.3, 3.1, 3.2, 1.2, 3.2, 9.0, 3.6, 5.6],
    "feat2" : [3.2, 8.7, 8.8, 5.3, 1.2, 8.9, 1.7, 6.9],
    "feat3" : [2.4, 4.2, 5.2, 8.8, 1.0, 2.0, 1.4, 2.3],
    "color" : ["red", "green", "green", "red", "red", "green", "green", "red"],
    "poisonous" : ["no", "yes", "yes", "no", "no", "yes", "no", "no"]
})
df

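|   | feat1 | feat2 | feat3 | color | poisonous |
|---|-------|-------|-------|-------|-----------|
| 0 | 5.3 | 3.2 | 2.4 | red | no |
| 1 | 3.1 | 8.7 | 4.2 | green | yes |
| 2 | 3.2 | 8.8 | 5.2 | green | yes |
| 3 | 1.2 | 5.3 | 8.8 | red | no |
| 4 | 3.2 | 1.2 | 1.0 | red | no |
| 5 | 9.0 | 8.9 | 2.0 | green | yes |
| 6 | 3.6 | 1.7 | 1.4 | green | no |
| 7 | 5.6 | 6.9 | 2.3 | red | no |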


A model for outlier detection can be trained on this labelled data as follows:

In [4]:
import mlconcepts
model = mlconcepts.SODModel(n=4)
model.fit(df, labels="poisonous")

Let's say that we want to predict whether the following mushrooms are poisonous:

In [5]:
testset = pandas.DataFrame({
    "feat1" : [5.4, 2.2],
    "feat2" : [2.2, 8.75],
    "feat3" : [1.7, 6.3],
    "color" : ["red", "green"],
    "poisonous" : ["no", "yes"]
})
testset

|   | feat1 | feat2 | feat3 | color | poisonous |
|---|-------|-------|-------|-------|-----------|
| 0 | 5.4 | 2.2 | 1.7 | red | no |
| 1 | 2.2 | 8.75 | 6.3 | green | yes |


The model can produce predictions with attached explanations as follows:

In [7]:
explanations = model.predict_explain(testset, labels="poisonous")
print(explanations[1])
print(explanations[0])

Prediction: 0.824282003767305. Explainers: { { feat3 } : 1.784811525525969, { feat1, color } : 1.6528690194238378, { feat2, feat3 } : 1.5222287112378285 }
Prediction: 0.2138955648232009. Explainers: { { feat1, feat2 } : 1.2089614242250488, full : 0.986947081552558, { feat1, color } : 0.03027335209014317 }


For each test entry, this output lists the three feature sets that contribute most to a *positive* prediction, i.e., to saying that the entry *is* an outlier. Keep in mind that every run of the algorithm might produce different results (depending on the starting weights of the model); hence, if you re-run this notebook, the discussion below may no longer match the output.

For the second element (`explanations[1]`, printed first above, which is an outlier), `feat3` is the strongest predictor: indeed, this feature is usually either very high or very low for inliers, while it has a value in between for this entry. The combination of `feat1` and `color` is also a strong predictor here: green mushrooms with low values of `feat1` (lower than 3.6) were outliers in the training set.
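
Both observations can be checked directly against the training data with plain pandas. The sketch below rebuilds the training set from above (under the name `train`, chosen here for illustration) and treats `poisonous == "yes"` as the outlier class, as in the discussion; it does not use the library itself.

```python
import pandas as pd

# Same training data as in the notebook above.
train = pd.DataFrame({
    "feat1": [5.3, 3.1, 3.2, 1.2, 3.2, 9.0, 3.6, 5.6],
    "feat2": [3.2, 8.7, 8.8, 5.3, 1.2, 8.9, 1.7, 6.9],
    "feat3": [2.4, 4.2, 5.2, 8.8, 1.0, 2.0, 1.4, 2.3],
    "color": ["red", "green", "green", "red", "red", "green", "green", "red"],
    "poisonous": ["no", "yes", "yes", "no", "no", "yes", "no", "no"],
})

# feat3 values of the inliers (non-poisonous mushrooms), sorted.
inlier_feat3 = (
    train.loc[train["poisonous"] == "no", "feat3"].sort_values().tolist()
)
print(inlier_feat3)

# Green mushrooms with feat1 below 3.6 and their labels.
green_low = train[(train["color"] == "green") & (train["feat1"] < 3.6)]
print(green_low["poisonous"].tolist())
```

The inlier values of `feat3` are either at most 2.4 or equal to 8.8, so the test value 6.3 falls in the gap between them, and both green training mushrooms with `feat1` below 3.6 are labelled poisonous.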