EEG Turkish Sentence Decoding using Deep Learning project

Dataset link: https://www.kaggle.com/datasets/mehmetbayin/turkish-sentence-eeg-dataset

    - Reading Demonstration Set: 15-second 14-channel EEG signals recorded from 20 volunteers
    
    - Reading Listening Set: 15-second 14-channel EEG signals recorded from 20 volunteers
    
    - EMOTIV EPOC+ mobile system 
        - collected 14-channel EEG signals from 16 scalp zones: AF3 (1), F7 (2), F3 (3), FC5 (4), T7 (5), P7 (6), O1 (7), O2 (8), P8 (9), T8 (10), FC6 (11), F4 (12), F8 (13), AF4 (14), P3 (reference zone), and P4 (reference zone)
        - sampling rate 128 Hz, bandwidth 0.16-43 Hz
        - dataset contains 1600 observations and 1600 labels
        - .mat file
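
As a quick sanity check on the figures above, the expected size of one recording follows from the duration and the sampling rate. This is a back-of-the-envelope sketch: the constant names are illustrative, and it assumes every recording lasts exactly 15 seconds, which the notebook has not verified.

```python
# Recording parameters as stated in the dataset description above.
SAMPLING_RATE_HZ = 128
DURATION_S = 15
N_CHANNELS = 14
N_OBSERVATIONS = 1600

# Assumption: every recording lasts exactly DURATION_S seconds.
samples_per_channel = SAMPLING_RATE_HZ * DURATION_S
values_per_recording = samples_per_channel * N_CHANNELS
total_values = values_per_recording * N_OBSERVATIONS

# The Nyquist frequency bounds the highest frequency the recording can represent.
nyquist_hz = SAMPLING_RATE_HZ / 2

print(samples_per_channel)   # 1920
print(values_per_recording)  # 26880
print(total_values)          # 43008000
print(nyquist_hz)            # 64.0
```

The stated 0.16-43 Hz bandwidth sits below the 64 Hz Nyquist limit, so the two figures are consistent with each other.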

Methodology:
    - may require transfer learning in a multi-class classification setting (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10400490/)
    - recurrent architectures such as LSTMs are a natural fit for EEG: they are designed to process sequential data, which suits time-series EEG signals, and their gated memory cells can capture long-term dependencies in those signals
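
To make the LSTM idea concrete, here is a minimal NumPy sketch of one LSTM layer reading a recording one time step at a time, followed by a softmax over 20 sentence classes. It is not the model used in this project: the weights are random and untrained, the input is synthetic noise, and the names (`lstm_forward`, `W`, `U`, `b`, `W_out`), the hidden size of 32, and the 1920-step sequence length (15 s at 128 Hz) are all assumptions made for illustration. A real model would be built and trained in a deep learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b):
    """Run one LSTM layer over a sequence and return the final hidden state.

    x: (time_steps, n_features) input sequence
    W: (n_features, 4 * hidden) input weights
    U: (hidden, 4 * hidden) recurrent weights
    b: (4 * hidden,) biases
    Gate order inside the stacked weights: input, forget, candidate, output.
    """
    hidden = U.shape[0]
    h = np.zeros(hidden)  # hidden state (short-term output)
    c = np.zeros(hidden)  # cell state (long-term memory)
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:hidden])                  # input gate
        f = sigmoid(z[hidden:2 * hidden])        # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell values
        o = sigmoid(z[3 * hidden:])              # output gate
        c = f * c + i * g                        # keep old memory, add new
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
time_steps, n_channels, hidden, n_classes = 1920, 14, 32, 20

# Synthetic stand-in for one standardized recording (NOT real EEG).
x = rng.standard_normal((time_steps, n_channels))

# Random, untrained weights: for shape illustration only.
W = rng.standard_normal((n_channels, 4 * hidden)) * 0.1
U = rng.standard_normal((hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)
W_out = rng.standard_normal((hidden, n_classes)) * 0.1

h_final = lstm_forward(x, W, U, b)

# Softmax over the 20 sentence classes.
logits = h_final @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(h_final.shape, probs.shape)
```

The cell state `c` is what lets the layer carry information across many time steps: the forget gate decides how much of it survives each step.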

In [1]:
from scipy.io import loadmat
import numpy as np
import pandas as pd
# Load the EEG data from the .mat file into a Python dict
eeg_data = loadmat('data/TurkishSentenceEEGData.mat')


In [2]:
eeg_data

{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jul 11 23:18:14 2022',
 '__version__': '1.0',
 '__globals__': [],
 'Labels': array([[ 1],
        [ 1],
        [ 1],
        ...,
        [20],
        [20],
        [20]], dtype=uint8),
 'Observations': array([[array([[4387.307617, 4396.410156, 4389.743652, ..., 4395.128418,
                 4393.333496, 4395.769043],
                [4378.589844, 4384.358887, 4382.05127 , ..., 4399.358887,
                 4391.538574, 4393.077148],
                [4398.077148, 4400.512695, 4396.538574, ..., 4399.615234,
                 4397.179688, 4399.615234],
                ...,
                [4539.102539, 4542.692383, 4540.641113, ..., 4548.077148,
                 4547.05127 , 4550.897461],
                [4396.282227, 4401.025879, 4398.974121, ..., 4427.436035,
                 4427.820313, 4427.179688],
                [4445.128418, 4445.897461, 4445.769043, ..., 4460.641113,
                 4456.410156, 4457.563

In [3]:
len(eeg_data)

5

In [4]:
eeg_data.keys()

dict_keys(['__header__', '__version__', '__globals__', 'Labels', 'Observations'])
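
Printing the whole dictionary produces a lot of output, so a compact shape summary can be easier to read. The helper below is a hypothetical sketch (`summarize_mat_dict` is not part of SciPy) that skips the double-underscore metadata keys and reports the shape and dtype of each remaining variable. It is demonstrated on a small synthetic dictionary that mimics the key layout shown above, not on the real file.

```python
import numpy as np

def summarize_mat_dict(mat_dict):
    """Return {name: (shape, dtype)} for the data variables in a loadmat-style dict."""
    summary = {}
    for name, value in mat_dict.items():
        # Metadata keys such as '__header__' start and end with double underscores.
        if name.startswith('__') and name.endswith('__'):
            continue
        arr = np.asarray(value)
        summary[name] = (arr.shape, str(arr.dtype))
    return summary

# Synthetic stand-in with the same keys as the real file (shapes are made up).
fake = {
    '__header__': b'MATLAB 5.0 MAT-file',
    '__version__': '1.0',
    '__globals__': [],
    'Labels': np.ones((4, 1), dtype=np.uint8),
    'Observations': np.empty((4, 1), dtype=object),
}

print(summarize_mat_dict(fake))
```

On the real data the same call would be `summarize_mat_dict(eeg_data)`.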

In [5]:
eeg_data.values()

dict_values([b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jul 11 23:18:14 2022', '1.0', [], array([[ 1],
       [ 1],
       [ 1],
       ...,
       [20],
       [20],
       [20]], dtype=uint8), array([[array([[4387.307617, 4396.410156, 4389.743652, ..., 4395.128418,
                4393.333496, 4395.769043],
               [4378.589844, 4384.358887, 4382.05127 , ..., 4399.358887,
                4391.538574, 4393.077148],
               [4398.077148, 4400.512695, 4396.538574, ..., 4399.615234,
                4397.179688, 4399.615234],
               ...,
               [4539.102539, 4542.692383, 4540.641113, ..., 4548.077148,
                4547.05127 , 4550.897461],
               [4396.282227, 4401.025879, 4398.974121, ..., 4427.436035,
                4427.820313, 4427.179688],
               [4445.128418, 4445.897461, 4445.769043, ..., 4460.641113,
                4456.410156, 4457.563965]])                             ],
       [array([[4426.538574, 4424.358887, 

In [6]:
# Count how many observations each label has
labels_array = eeg_data['Labels']
unique_values, counts = np.unique(labels_array, return_counts=True)
value_counts_dict = dict(zip(unique_values, counts))
value_counts_dict

{1: 80,
 2: 80,
 3: 80,
 4: 80,
 5: 80,
 6: 82,
 7: 80,
 8: 79,
 9: 80,
 10: 80,
 11: 79,
 12: 80,
 13: 80,
 14: 80,
 15: 80,
 16: 80,
 17: 80,
 18: 80,
 19: 81,
 20: 79}
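
The counts above are close to balanced but not identical, which matters if class weights are used during training. The sketch below recomputes the total and derives per-class weights with the common "balanced" heuristic, total / (n_classes * count). The counts are copied from the output above; the weighting scheme is an illustrative choice, not something the notebook prescribes.

```python
import numpy as np

# Per-label counts copied from the output above.
counts = {1: 80, 2: 80, 3: 80, 4: 80, 5: 80, 6: 82, 7: 80, 8: 79, 9: 80, 10: 80,
          11: 79, 12: 80, 13: 80, 14: 80, 15: 80, 16: 80, 17: 80, 18: 80, 19: 81, 20: 79}

label_ids = np.array(list(counts.keys()))
n = np.array(list(counts.values()))

total = int(n.sum())
# "Balanced" heuristic: rarer classes receive slightly larger weights.
class_weights = total / (len(n) * n)

print(total)              # 1600
print(n.min(), n.max())   # 79 82
print(dict(zip(label_ids.tolist(), class_weights.round(4).tolist())))
```

With counts between 79 and 82, every weight stays within about 2.5% of 1.0, so the imbalance is mild.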

In [7]:
# Extract the observations and flatten the labels into a 1D array
observations = eeg_data['Observations']
labels = eeg_data['Labels'].ravel()

In [8]:
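# Split by position: rows 0-799 are treated as the demonstration set, rows 800-1599 as the listening set.
# NOTE: this assumes the file stores the two sets back to back. The labels printed below
# run 1-10 in the first half and 10-20 in the second, so the ordering should be verified.
demonstration_data = observations[:800]
listening_data = observations[800:]
demonstration_labels = labels[:800]
listening_labels = labels[800:]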
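
Before relying on a positional split, it may be worth checking which labels land in each half. The sketch below rebuilds a label vector from the per-label counts printed earlier, under the assumption that labels are stored in ascending order (as the printed `Labels` array suggests), and then applies the same `[:800]` / `[800:]` slicing. It uses reconstructed labels, not the real array.

```python
import numpy as np

# Per-label counts for labels 1..20, copied from the earlier output.
counts = [80, 80, 80, 80, 80, 82, 80, 79, 80, 80,
          79, 80, 80, 80, 80, 80, 80, 80, 81, 79]

# Assumption: labels are stored in ascending order in the file.
labels_sorted = np.repeat(np.arange(1, 21), counts)

first_half = labels_sorted[:800]
second_half = labels_sorted[800:]

print(np.unique(first_half))                     # labels 1 through 10
print(np.unique(second_half))                    # labels 10 through 20
print(np.intersect1d(first_half, second_half))   # [10]
```

Under that assumption the split separates sentences 1-10 from sentences 10-20 rather than separating two recording conditions, with label 10 straddling the boundary. This matches the first row of `listen_df` carrying label 10, so the dataset documentation should be consulted to confirm how the two sets are actually laid out.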


In [10]:
# Each row of the (n, 1) object array wraps one 2D recording;
# tolist() turns that row into a one-element list holding the recording
demonstration_data_list = [obs.tolist() for obs in demonstration_data]
listening_data_list = [obs.tolist() for obs in listening_data]

In [11]:
# Convert to Pandas DataFrame
demo_df = pd.DataFrame({'EEG_Data': demonstration_data_list, 'Label': demonstration_labels})
listen_df = pd.DataFrame({'EEG_Data': listening_data_list, 'Label': listening_labels})
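
Storing whole recordings in a DataFrame column is handy for browsing, but a model usually needs a single numeric array. The sketch below shows one way to unwrap the `(n, 1)` object array into a 3D float array and remove the large constant offset visible in the raw values (roughly 4300-4500 in the output above). `stack_recordings` is a hypothetical helper, the data is synthetic, and the `(14, 1920)` channels-by-time orientation is an assumption that should be checked against the real recordings.

```python
import numpy as np

def stack_recordings(observations):
    """Unwrap an (n, 1) object array of 2D recordings into one (n, rows, cols) float array."""
    recordings = [np.asarray(row[0], dtype=np.float64) for row in observations]
    shapes = {rec.shape for rec in recordings}
    if len(shapes) != 1:
        # Recordings of different lengths would need cropping or padding first.
        raise ValueError(f"recordings differ in shape: {sorted(shapes)}")
    return np.stack(recordings)

# Synthetic stand-in that mimics the nesting of eeg_data['Observations'].
rng = np.random.default_rng(0)
fake_obs = np.empty((3, 1), dtype=object)
for i in range(3):
    # Assumed orientation: (channels, time samples).
    fake_obs[i, 0] = 4300.0 + rng.standard_normal((14, 1920))

X = stack_recordings(fake_obs)

# Remove each channel's mean, assuming the last axis is time.
X_centered = X - X.mean(axis=-1, keepdims=True)

print(X.shape)  # (3, 14, 1920)
```

On the real data the call would be `stack_recordings(observations)`; the `ValueError` is deliberate, so that unequal recording lengths surface early instead of being silently reshaped.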

In [12]:
demo_df


                                              EEG_Data  Label
0    [[[4387.307617, 4396.410156, 4389.743652, 4378...      1
1    [[[4426.538574, 4424.358887, 4433.589844, 4440...      1
2    [[[4368.717773, 4364.102539, 4365.128418, 4366...      1
3    [[[4395.256348, 4390.0, 4387.692383, 4390.3847...      1
4    [[[4389.102539, 4398.077148, 4395.128418, 4394...      1
...                                                ...    ...
795  [[[4351.153809, 4335.512695, 4334.871582, 4349...     10
796  [[[4285.384766, 4278.077148, 4272.436035, 4271...     10
797  [[[4349.871582, 4345.641113, 4328.333496, 4328...     10
798  [[[4351.025879, 4361.282227, 4354.487305, 4347...     10


In [13]:
listen_df

                                              EEG_Data  Label
0    [[[4344.743652, 4346.410156, 4336.794922, 4325...     10
1    [[[4315.512695, 4308.205078, 4309.358887, 4314...     11
2    [[[4341.794922, 4335.128418, 4324.358887, 4333...     11
3    [[[4289.487305, 4295.769043, 4301.538574, 4298...     11
4    [[[4466.794922, 4470.641113, 4477.179688, 4477...     11
...                                                ...    ...
795  [[[4410.0, 4408.846191, 4403.589844, 4392.1796...     20
796  [[[4366.794922, 4362.179688, 4349.615234, 4352...     20
797  [[[4356.922852, 4356.153809, 4357.307617, 4356...     20
798  [[[4346.282227, 4352.179688, 4356.153809, 4354...     20
