# DECISION TREE

This notebook trains a decision tree model.
It was chosen because it is simple to understand, and it was the first experiment with machine learning algorithms.

In [34]:
import pandas as pd
import numpy as np 
import joblib
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree

The data contains 14 EEG channels; the last column, `eye`, is the state of the eye (`open` or `close`).

In [35]:
eye_data = pd.read_csv('../eeg_eye_state.arff')
eye_data

index,AF3,F7,F3,FC5,T7,P7,O1,O2,P8,T8,FC6,F4,F8,AF4,eye
0,4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.90,4393.85,open
1,4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.10,open
2,4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72,4389.23,open
3,4328.72,4011.79,4296.41,4155.90,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,4396.41,open
4,4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.90,4627.69,4210.77,4244.10,4212.82,4288.21,4632.82,4398.46,open
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14975,4281.03,3990.26,4245.64,4116.92,4333.85,4614.36,4074.87,4625.64,4203.08,4221.54,4171.28,4269.23,4593.33,4340.51,close
14976,4276.92,3991.79,4245.13,4110.77,4332.82,4615.38,4073.33,4621.54,4194.36,4217.44,4162.56,4259.49,4590.26,4333.33,close
14977,4277.44,3990.77,4246.67,4113.85,4333.33,4615.38,4072.82,4623.59,4193.33,4212.82,4160.51,4257.95,4591.79,4339.49,close
14978,4284.62,3991.79,4251.28,4122.05,4334.36,4616.41,4080.51,4628.72,4200.00,4220.00,4165.64,4267.18,4596.41,4350.77,close


The data should be standardized by removing the mean (so that each channel is centered at zero) and scaling to unit variance.

In [36]:
from sklearn.preprocessing import StandardScaler

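def signals_preprocessing(data, columns):
    """Standardize every column to zero mean and unit variance."""
    scaler = StandardScaler()
    data = np.asarray(data.astype(float))
    data = scaler.fit_transform(data)
    # Rebuild the DataFrame so the channel names are preserved
    data = pd.DataFrame(data, columns=columns)

    return data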
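
To make the standardization step concrete, the sketch below compares a manual computation with `StandardScaler`. The input is synthetic (random values around 4300, invented here as a stand-in for two EEG channels), and the names `signals`, `manual` and `scaled` are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two EEG channels (values are made up for illustration)
rng = np.random.default_rng(0)
signals = rng.normal(loc=4300.0, scale=25.0, size=(200, 2))

# Manual standardization: subtract the column mean, divide by the column std
manual = (signals - signals.mean(axis=0)) / signals.std(axis=0)

# StandardScaler applies the same formula (population std, ddof=0)
scaled = StandardScaler().fit_transform(signals)

print(np.allclose(manual, scaled))   # do both approaches agree?
print(scaled.mean(axis=0))           # close to 0 for each column
print(scaled.std(axis=0))            # close to 1 for each column
```

After scaling, each column is expressed in units of its own standard deviation, so channels with different baselines become directly comparable.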

The next step is to train a decision tree on 80% of the data and report its accuracy score on the remaining 20%.

In [37]:
x = eye_data.drop(columns=['eye'])
x = signals_preprocessing(x,['AF3','F7','F3','FC5','T7','P7','O1','O2','P8','T8','FC6','F4','F8','AF4'])
y = eye_data['eye']

# Divide the data into training data and test data  
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Create decision tree
model = DecisionTreeClassifier()
# Train it
model.fit(x_train, y_train)
prediction = model.predict(x_test)

# Accuracy classification score
score = accuracy_score(y_test, prediction)
score

0.8324432576769025

On this run, the decision tree reaches an accuracy of approximately 83% on the test data. Because the train/test split is random, the figure varies slightly between runs.
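
Since a single random split gives a score that changes from run to run, one possible way to get a steadier estimate is k-fold cross-validation. The sketch below is not part of the original experiment: it uses synthetic data from `make_classification` as a stand-in for the EEG table, and the names `X_demo`, `y_demo`, `demo_model` and `fold_scores` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the EEG table: 14 numeric features, binary label
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=14, n_informative=6, random_state=0
)

# A fixed random_state makes the tree itself reproducible
demo_model = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation: five train/test splits, one accuracy per fold
fold_scores = cross_val_score(demo_model, X_demo, y_demo, cv=5, scoring="accuracy")

print(fold_scores)
print(f"mean accuracy: {fold_scores.mean():.3f} (std {fold_scores.std():.3f})")
```

The spread across folds gives a rough idea of how much a single accuracy figure can be trusted.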

This trained model is stored using the joblib library for future use.

In [38]:
# Save model to external file
joblib.dump(model, 'decision-tree-eye-state-prediction.joblib')
# Load model from this file 
# model = joblib.load('decision-tree-eye-state-prediction.joblib')

['decision-tree-eye-state-prediction.joblib']
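
As a quick check that a saved model can be restored, the following sketch does a dump/load round trip and compares predictions. It assumes a small tree trained on synthetic data and a hypothetical file name inside a temporary directory, so it does not touch the file written above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the notebook's real model would be used instead
X_demo, y_demo = make_classification(n_samples=300, n_features=14, random_state=0)
demo_model = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, written to a temporary directory
    model_path = os.path.join(tmp_dir, "demo-tree.joblib")
    joblib.dump(demo_model, model_path)
    restored_model = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(
    demo_model.predict(X_demo), restored_model.predict(X_demo)
)
print(same_predictions)
```

A joblib file is generally best loaded with the same scikit-learn version that produced it.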

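Try a prediction for a single sample of 14 channel values. Note that the model was trained on standardized features, whereas the values below are raw readings, so this call is only a smoke test; in practice a sample should first be scaled in the same way as the training data.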

In [39]:
predictions = model.predict([[4328.23, 4009.23, 4289.23, 4148.21, 4320.26, 4586.15, 4096.92, 4641.03, 4252.05, 4238.46, 4211.28, 4280.51, 4635.9, 4393.85]])
predictions

array(['open'], dtype=object)
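
One way to make sure new samples receive the same scaling as the training data is to bundle the scaler and the tree into a single scikit-learn pipeline. The sketch below is an assumption about how that could look, not the notebook's method: the data and labels are synthetic (the label is derived from the `O1` column purely for illustration), and `raw`, `labels`, `pipeline` and `sample` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

channels = ['AF3', 'F7', 'F3', 'FC5', 'T7', 'P7', 'O1', 'O2',
            'P8', 'T8', 'FC6', 'F4', 'F8', 'AF4']

# Synthetic stand-in for the raw EEG table (values and labels are made up)
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    rng.normal(4300.0, 30.0, size=(500, len(channels))), columns=channels
)
labels = np.where(raw['O1'] > 4300.0, 'open', 'close')

# The pipeline standardizes, then classifies; both steps are fitted together
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
pipeline.fit(raw, labels)

# A new raw sample passes through the same fitted scaler before reaching the tree
sample = pd.DataFrame([[4330.0] * len(channels)], columns=channels)
prediction = pipeline.predict(sample)
print(prediction)
```

Saving such a pipeline with joblib would store the fitted scaler together with the tree, so raw readings could be passed to `predict` directly.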

The next cell creates the file 'eye-state-decision-tree.dot' containing the graph data. It can be viewed in Visual Studio Code with a Graphviz extension (however, this decision tree is too large to display there).

In [40]:
tree.export_graphviz(
    model,
    out_file='eye-state-decision-tree.dot',
    feature_names=['AF3','F7','F3','FC5','T7','P7','O1','O2','P8','T8','FC6','F4','F8','AF4'],
    class_names=sorted(y.unique()),
    label='all',
    rounded=True,
    filled=True)
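
Because the full tree is too large to view, a possible workaround is to export only its top levels. The sketch below uses the `max_depth` parameter of `export_graphviz` and the plain-text `export_text` helper. It is trained on synthetic data, so the tree differs from the one above; the class names are assigned to the synthetic labels 0 and 1 only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

feature_names = ['AF3', 'F7', 'F3', 'FC5', 'T7', 'P7', 'O1', 'O2',
                 'P8', 'T8', 'FC6', 'F4', 'F8', 'AF4']

# Synthetic stand-in data; labels 0/1 play the role of 'close'/'open'
X_demo, y_demo = make_classification(n_samples=500, n_features=14, random_state=0)
demo_model = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# out_file=None returns the DOT source as a string; max_depth truncates the drawing
dot_source = export_graphviz(
    demo_model,
    out_file=None,
    max_depth=3,
    feature_names=feature_names,
    class_names=['close', 'open'],
    rounded=True,
    filled=True,
)

# Plain-text view of the same top levels, readable without Graphviz
text_view = export_text(demo_model, feature_names=feature_names, max_depth=3)
print(text_view)
```

Limiting the depth only affects what is drawn; the fitted model itself is unchanged.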