# 3. Synthetic experiments with Symbolic Pursuit

In this notebook, we shall reproduce one of the experiments from Section 6.1 of the paper.
The idea is to start with a linear pseudo black-box whose importance vector is known unambiguously, and to see which interpretability method identifies this vector most precisely. Let us start with the useful imports.


In [1]:
from symbolic_pursuit.models import SymbolicRegressor  # our symbolic model class
from sklearn.metrics import mean_squared_error # we are going to assess the quality of the model based on the generalization MSE
from sympy import init_printing # we use sympy to display mathematical expressions
import numpy as np # we use numpy to deal with arrays
import lime 
import lime.lime_tabular
init_printing()

We now define a linear pseudo black-box $f$ on a 3-dimensional feature space:

$$ f(x_1,x_2,x_3)= x_1 + 2 \cdot x_2 + 3 \cdot x_3$$ 

The importance vector associated with this model is trivially $\beta = (1,2,3)$. Unlike in the main paper, we keep it unnormalised here, since we deal with only a few examples. Let us translate this into Python.

In [2]:
def f(X):
    return X[:, 0]+2*X[:,1]+3*X[:,2]

dim_X = 3
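Because $f$ is linear, its gradient equals $\beta$ at every point of the feature space, which is what makes $\beta$ an unambiguous reference. The sketch below checks this numerically with central finite differences; `finite_difference_gradient` is a hypothetical helper introduced for illustration, not part of `symbolic_pursuit`.

```python
import numpy as np

def f(X):
    # Linear pseudo black-box: f(x) = x_1 + 2*x_2 + 3*x_3, evaluated row-wise
    return X[:, 0] + 2 * X[:, 1] + 3 * X[:, 2]

def finite_difference_gradient(func, x, eps=1e-6):
    """Central-difference estimate of the gradient of `func` at one point `x`.

    `func` maps an array of shape (n_points, dim) to an array of shape (n_points,).
    (Hypothetical helper for illustration only.)
    """
    dim = x.shape[0]
    grad = np.zeros(dim)
    for j in range(dim):
        step = np.zeros(dim)
        step[j] = eps
        # Evaluate x + eps*e_j and x - eps*e_j together as a batch of two rows
        values = func(np.stack([x + step, x - step]))
        grad[j] = (values[0] - values[1]) / (2 * eps)
    return grad

beta = np.array([1.0, 2.0, 3.0])
rng = np.random.default_rng(0)
for x in rng.uniform(0, 1, (5, 3)):
    # For a linear model the estimated gradient is the same at every point
    print(np.round(finite_difference_gradient(f, x), 6))
```

Every printed gradient rounds to $(1, 2, 3)$, whichever point is sampled.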

Now we draw 100 training points uniformly from $[0,1]^3$. They are used both to build the *LIME* explainer <cite data-cite="2480681/WCEBQ7N9"></cite> and to train a Symbolic model.

In [3]:
n_pts = 100
X = np.random.uniform(0, 1, (n_pts, dim_X))

Next, we draw 10 test points $x_{test} \sim U([0,1]^3)$ that we use to evaluate the performance of both explainers on unseen data.

In [4]:
n_test = 10
X_test = np.random.uniform(0, 1, (n_test, dim_X))
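The cells above draw from NumPy's global random state, so the points (and therefore the outputs printed below) change from run to run. A sketch of the same two draws with a seeded NumPy `Generator`, assuming reproducibility is wanted; the seed value 42 is an arbitrary choice.

```python
import numpy as np

dim_X, n_pts, n_test = 3, 100, 10

# A seeded Generator makes both draws reproducible; 42 is an arbitrary seed
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, (n_pts, dim_X))        # training points for LIME and the symbolic model
X_test = rng.uniform(0, 1, (n_test, dim_X))  # unseen test points

print(X.shape, X_test.shape)  # (100, 3) (10, 3)
```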

Since LIME returns importance vectors as a list of $(\text{feature domain}, \text{importance})$ tuples, with the features appearing in decreasing order of absolute importance, we implement a function that identifies the feature from the first entry of each tuple and reorders the importances as $(\text{importance}(x_1), \text{importance}(x_2), \text{importance}(x_3))$.

In [5]:
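def order_weights(exp_list):
    # exp_list holds (feature domain, importance) tuples, e.g. ("x_2 <= 0.25", weight)
    ordered_weights = [0 for _ in range(dim_X)]
    for tup in exp_list:
        # The digit right after "x_" identifies the feature (assumes dim_X < 10)
        feature_id = int(tup[0].split('x_')[1][0])
        ordered_weights[feature_id-1] = tup[1]
    return ordered_weights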
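To see what `order_weights` does, here it is applied to a hand-written list in the same $(\text{feature domain}, \text{importance})$ format; the labels and weights are invented for illustration. Because the function reads only the single character after `x_`, it assumes fewer than 10 features, which holds here. `order_weights_regex` is a hypothetical variant without that limit.

```python
import re

dim_X = 3

def order_weights(exp_list):
    # Same logic as the notebook: read the digit right after "x_" in each label
    ordered_weights = [0 for _ in range(dim_X)]
    for label, weight in exp_list:
        feature_id = int(label.split('x_')[1][0])
        ordered_weights[feature_id - 1] = weight
    return ordered_weights

def order_weights_regex(exp_list, dim):
    # Hypothetical variant that also handles multi-digit indices such as x_12
    ordered_weights = [0.0] * dim
    for label, weight in exp_list:
        match = re.search(r"x_(\d+)", label)
        ordered_weights[int(match.group(1)) - 1] = weight
    return ordered_weights

# Invented explanation, listed in decreasing order of |importance|
example = [("x_3 > 0.75", 1.5), ("x_1 <= 0.25", -0.5), ("0.25 < x_2 <= 0.50", -0.3)]
print(order_weights(example))               # [-0.5, -0.3, 1.5]
print(order_weights_regex(example, dim=3))  # [-0.5, -0.3, 1.5]
```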

We are now ready to extract the feature importances that the LIME explainer assigns to our 10 test points:

In [6]:
lime_weight_list = []
explainer = lime.lime_tabular.LimeTabularExplainer(X, 
                                                   feature_names=["x_"+str(k) for k in range(1,dim_X+1)], 
                                                   class_names=['f'], 
                                                   verbose=True,
                                                   mode='regression')

for i in range(n_test):
    exp = explainer.explain_instance(X_test[i], f, num_features=dim_X)
    lime_weight_list.append(order_weights(exp.as_list()))  
                            
print(lime_weight_list)    

Intercept 3.386273715800808
Prediction_local [2.32476258]
Right: 2.3711302504096565
Intercept 3.818870316655268
Prediction_local [1.09391178]
Right: 1.3370154443345572
Intercept 3.4778543326370928
Prediction_local [2.0747781]
Right: 1.500451586763697
Intercept 3.102545862568385
Prediction_local [3.21653876]
Right: 3.2635243572291324
Intercept 3.096609492223863
Prediction_local [3.27596983]
Right: 3.1166072960471225
Intercept 2.883947537132836
Prediction_local [3.9040365]
Right: 4.323964016519295
Intercept 2.8748723576455206
Prediction_local [3.82875395]
Right: 3.6301947889118558
Intercept 2.7850041334448656
Prediction_local [4.15189643]
Right: 4.421333902840418
Intercept 2.6997517673896967
Prediction_local [4.39485284]
Right: 4.329928624641883
Intercept 3.7323180952250192
Prediction_local [1.33898044]
Right: 1.3248403686712038
[[-0.4972218108046295, 0.9741903931179863, -1.5384797197686464], [-0.19503610213758726, -0.989548961836806, -1.5403734774107953], [-0.14650378788933416, 0.324203

As the last output (the list of predicted importance vectors) shows, LIME produces widely varying importance vectors from one test point to the next, which is surprising for a globally linear model. The relative importances also appear inconsistent with the true importance vector $\beta$ defined above: for instance, several weights are negative even though every entry of $\beta$ is positive. Let us now train a Symbolic model for $f$ on our training set.
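One plausible contributor to this behaviour is that `LimeTabularExplainer` discretizes continuous features into quartiles by default (`discretize_continuous=True`), so each weight measures the effect of falling into a bin such as `x_1 <= 0.25` rather than a slope; this matches the $(\text{feature domain}, \text{importance})$ labels handled by `order_weights`. For contrast, a simplified LIME-style surrogate fitted on the raw features recovers $\beta$ for a linear $f$. `local_linear_surrogate` is our own hypothetical sketch, not the `lime` implementation, and its kernel and perturbation settings are arbitrary.

```python
import numpy as np

def f(X):
    # Linear pseudo black-box from the notebook
    return X[:, 0] + 2 * X[:, 1] + 3 * X[:, 2]

def local_linear_surrogate(func, x0, n_samples=5000, scale=0.1,
                           kernel_width=0.75, seed=0):
    """Simplified LIME-style explanation on raw, non-discretized features.

    Perturbs x0 with Gaussian noise, weights each sample with an exponential
    kernel of its distance to x0 and fits a weighted least-squares linear model.
    Returns one slope per feature (the intercept is dropped).
    """
    rng = np.random.default_rng(seed)
    Z = x0 + scale * rng.standard_normal((n_samples, x0.shape[0]))
    y = func(Z)
    dist = np.linalg.norm(Z - x0, axis=1)
    weights = np.exp(-dist**2 / kernel_width**2)
    # Design matrix: intercept column plus centred features
    A = np.hstack([np.ones((n_samples, 1)), Z - x0])
    # Weighted least squares = ordinary least squares on rows scaled by sqrt(weight)
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef[1:]

x0 = np.array([0.3, 0.6, 0.9])
print(np.round(local_linear_surrogate(f, x0), 6))  # [1. 2. 3.]
```

Since $f$ is exactly linear, the weighted fit has nothing to approximate and returns $\beta$ at any $x_0$, whatever kernel width is chosen.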

In [7]:
symbolic_model = SymbolicRegressor()
symbolic_model.fit(f, X)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Now working on term number  1 .
Now working on hyperparameter tree number  1 .
         Current function value: 1.181178
         Iterations: 29
         Function evaluations: 1086
         Gradient evaluations: 154
Now working on hyperparameter tree number  2 .
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 82
         Function evaluations: 1701
         Gradient evaluations: 189
The algorithm stopped because the desired precision was achieved.
The tree number  2  was selected as the best.
Backfitting complete.
The current model has the following expression:  6.48072783395268*[ReLU(P1)]**1.0000840052335*hyper((5.04766488358399e-5,), (1.46967073623424, 1.46967073623424), 1.0/[ReLU(P1)])
The current value of the loss is:  9.804671340834355e-10 .
------------------------------------------------------------------------------------------

We now ask our symbolic model to predict the importance vectors for each test point.
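A sketch of the underlying idea, assuming the importance of $x_i$ at a point is the partial derivative $\partial g / \partial x_i$ of a closed-form expression $g$ evaluated there. The expressions below are hand-written sympy stand-ins, not the model fitted by `SymbolicRegressor`, and `sympy_feature_importance` is a hypothetical helper, not `get_feature_importance` itself. A linear expression of the same form as $f$ gives $\beta$ at every point, whereas a nonlinear one gives point-dependent importances.

```python
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
features = (x1, x2, x3)

def sympy_feature_importance(expr, symbols, point):
    """Partial derivatives of `expr` w.r.t. `symbols`, evaluated at `point`."""
    at_point = dict(zip(symbols, point))
    return [float(sp.diff(expr, s).subs(at_point)) for s in symbols]

# Stand-in expressions (NOT the model fitted by SymbolicRegressor)
g_linear = x1 + 2 * x2 + 3 * x3   # same form as the black-box f
g_nonlinear = x1**2 + x2 * x3     # importances now depend on the point

print(sympy_feature_importance(g_linear, features, (0.2, 0.5, 0.9)))     # [1.0, 2.0, 3.0]
print(sympy_feature_importance(g_nonlinear, features, (0.5, 1.0, 2.0)))  # [1.0, 2.0, 1.0]
```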

In [8]:
symbolic_weight_list = [] 
for k in range(n_test):
    symbolic_weight_list.append(symbolic_model.get_feature_importance(X_test[k]))
    

In [9]:
print(symbolic_weight_list)

[[0.999972339093126, 1.99996360729793, 2.99995165206373], [0.999866544038656, 1.99975201412831, 2.99963426112001], [0.999895578667953, 1.99981008422689, 2.99972136659427], [1.00000913478169, 2.00003719973957, 3.00006204113983], [1.00000424913202, 2.00002742829888, 3.00004738392387], [1.00003710448909, 2.00009313996353, 3.00014595179020], [1.00002005292583, 2.00005903634370, 3.00009479616877], [1.00003921407823, 2.00009735920285, 3.00015228067289], [1.00003723531082, 2.00009340161078, 3.00014634426255], [0.999863963806811, 1.99974685358998, 2.99962652028350]]


As we can see, the importance vectors predicted by the symbolic model are consistent across test points and agree with the true importance vector $\beta$ to within $5 \times 10^{-4}$ in every entry.
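To put a number on this, a small helper (hypothetical, not part of `symbolic_pursuit`) computes the worst-case entry-wise deviation from $\beta$ and the per-feature spread across test points, here applied to the symbolic importance vectors printed above. Calling it on `lime_weight_list` gives the corresponding figures for LIME.

```python
import numpy as np

beta = np.array([1.0, 2.0, 3.0])

def importance_errors(weight_list, beta):
    """Max absolute deviation from beta and per-feature std across test points."""
    W = np.asarray(weight_list, dtype=float)  # shape (n_test, dim_X)
    max_abs_error = np.max(np.abs(W - beta))  # worst entry over all test points
    spread = W.std(axis=0)                    # variability of each feature's importance
    return max_abs_error, spread

# Importance vectors printed by the symbolic model above (In [9])
symbolic_weight_list = [
    [0.999972339093126, 1.99996360729793, 2.99995165206373],
    [0.999866544038656, 1.99975201412831, 2.99963426112001],
    [0.999895578667953, 1.99981008422689, 2.99972136659427],
    [1.00000913478169, 2.00003719973957, 3.00006204113983],
    [1.00000424913202, 2.00002742829888, 3.00004738392387],
    [1.00003710448909, 2.00009313996353, 3.00014595179020],
    [1.00002005292583, 2.00005903634370, 3.00009479616877],
    [1.00003921407823, 2.00009735920285, 3.00015228067289],
    [1.00003723531082, 2.00009340161078, 3.00014634426255],
    [0.999863963806811, 1.99974685358998, 2.99962652028350],
]

max_err, spread = importance_errors(symbolic_weight_list, beta)
print(f"max |error| = {max_err:.1e}")  # max |error| = 3.7e-04
```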

## References

<div class="cite2c-biblio"></div>