\subsection{Software package for feature computation}
Nevertheless, realistically, neither algorithm would be used for long time series.
Several \texttt{PyEEG} functions were also found to be inconsistent with their mathematical
definitions and were corrected in the new \pr{} package (see \pr{} documentation, appendix).
This unfortunately appears to be a common issue for academic software.
The general status of the peer-review process and the reproducibility of programs and algorithms have
recently drawn attention (see \cite{morin_shining_2012,crick_can_2014} for
\subsection{Exhaustive and time-aware feature extraction}
but also on all wavelet frequency sub-bands.
Then, new variables were created to account for temporal consistency of vigilance state episodes.

Discrete wavelet decomposition is an extremely fast and accurate algorithm to filter a periodic
signal into complementary and exclusive frequency sub-bands (fig.~\ref{fig:dwd}).
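As a minimal illustration of this complementarity, a single-level Haar transform (chosen here for simplicity; it is not necessarily the wavelet used in this study) splits a signal into an approximation and a detail sub-band, from which the original signal can be reconstructed exactly:

```python
import math

def haar_dwt_step(signal):
    """One level of the Haar discrete wavelet transform.

    Splits an even-length signal into a low-frequency (approximation)
    and a high-frequency (detail) sub-band, each half the length.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def haar_idwt_step(approx, detail):
    """Inverse step: the two sub-bands are complementary and exclusive,
    so the original signal is recovered exactly."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out
```

Applying the forward step recursively to the successive approximations yields the dyadic hierarchy of frequency sub-bands.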
\c{S}en et al.\cite{sen_comparative_2014} obtained very promising results by
computing a large number of features on the raw \gls{eeg} signal and a limited subset of features (\ie{} mean power and absolute values) in some wavelet coefficients.

Many authors have modelled time series of epochs as if each epoch were statistically independent of the others.
This assumption makes it straightforward to use classical machine learning techniques such as
random forests\cite{breiman_random_2001} and others.
They have the advantage of coping very well with non-linearity, can handle a large number of predictors, and have many optimised implementations.
However, working under this assumption generally does not allow the temporal consistency of vigilance states to be taken into account.
Indeed, prior knowledge of, for instance, the state transition probabilities cannot be modelled.
Manual scorers use contextual information to make decisions.
For example, if a given epoch has features that are ambiguous between ``\gls{rem}'' and ``awake'',
it is likely to be classified as ``awake'' given that the surrounding epochs are, less ambiguously, ``awake''.
For this reason, explicit temporal modelling, using, for instance, Hidden Markov Models, has been investigated\cite{doroshenkov_classification_2007,pan_transition-constrained_2012}.
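As an illustration, the state transition probabilities on which such explicit temporal models rely can be estimated directly from a scored hypnogram; the short label sequence below is purely hypothetical:

```python
from collections import Counter

def transition_probabilities(states):
    """Estimate first-order transition probabilities from a sequence
    of vigilance state labels (one label per epoch)."""
    pair_counts = Counter(zip(states[:-1], states[1:]))
    origin_totals = Counter(states[:-1])
    return {(src, dst): n / origin_totals[src]
            for (src, dst), n in pair_counts.items()}

# Hypothetical hypnogram: W = awake, N = NREM, R = REM
hypnogram = list("WWWNNNNRRNNW")
probs = transition_probabilities(hypnogram)
```

Such a matrix, estimated from training annotations, is precisely the kind of prior a Hidden Markov Model would use as its transition model.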
In order to benefit from the classical machine learning
framework whilst including temporal information,
it is possible to create new variables to account for the temporal
variation\cite{dietterich_machine_2002}.
This study demonstrated that the addition of temporal context significantly improved predictive accuracy (fig.~\ref{fig:temporal_integration}).
The convolution approach (eq.~\ref{eq:window}) appeared to provide better results.
Instead of averaging features after calculation, it may be advantageous to compute features over epochs of different lengths in the first place.
Thus, the accuracy of local non-additive features, such as the median, would be improved. In addition to the local mean of each feature, other variables, such as its local
slope and local variance, may improve predictive accuracy.
%DID YOU INCLUDE THAT IN YOUR ALGORITHM, THEN REFER TO YOUR RESULTS, OR PHRASE IT AS AN OUTLOOK
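The kind of window-based variables discussed above can be sketched as follows (a simplified illustration only; the exact window shape and size used in the actual feature extraction may differ):

```python
def local_temporal_features(values, half_window):
    """For each epoch, derive variables summarising one feature over a
    centred window of neighbouring epochs: local mean, local variance
    and local slope (windows are truncated at the ends of the series).
    """
    out = []
    n = len(values)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        w = values[lo:hi]
        mean = sum(w) / len(w)
        var = sum((v - mean) ** 2 for v in w) / len(w)
        # crude slope estimate: rise over run across the window
        slope = (w[-1] - w[0]) / (hi - lo - 1) if hi - lo > 1 else 0.0
        out.append({"mean": mean, "var": var, "slope": slope})
    return out
```

Each original feature thus generates several time-aware companion variables that can be fed to a conventional classifier alongside it.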
Although the addition of time-dependent variables improved accuracy over a time-unaware model, their use can be seen as controversial.
Indeed, including prior information about sleep structure will cause problems if the aim is to find differences in sleep structure.
As an example, let us consider a training set made only of healthy adult wild-type animals,
and let us assume that \gls{nrem} episodes are always at least 5min long.
Implicitly, this information becomes a prior. That is, the implicit definition of \gls{nrem} is that it
is uninterrupted.
The same classifier is not expected to perform well if used on an animal which, for instance, shows frequent interruptions of \gls{nrem} sleep by short awake episodes.
Indeed, a `time-aware' model will need much more evidence to correctly classify a very short waking episode inside sleep (because this never occurred in the training set).
Therefore, predictive accuracy should not be the exclusive goal.
Models which can perform well without including too much temporal information ought to be preferred insofar as
they are more likely to be generalisable.
\subsection{Random forest classification}
In this study, random forest classifiers\cite{breiman_random_2001} were exclusively used.
In addition to their capacity to model non-linearity, they are very efficient at handling a very large number of variables.
Recently, very promising classifications of sleep stages in humans were generated
using this algorithm\cite{sen_comparative_2014}.
A very interesting feature of random forests is their
natural ability to generate relative values of importance for the different predictors.
These values quantify how much each variable contributes to the predictive power of the model.
This feature is extremely useful because it allows random forests to be used for variable selection.
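The principle behind such importance values can be illustrated with permutation importance, one of the measures Breiman proposed for random forests: each variable is shuffled in turn and the resulting drop in accuracy is recorded. The threshold `model' below is a stand-in for a trained random forest, not the classifier used in this study:

```python
import random

def _accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, seed=0):
    """Importance of each variable = drop in accuracy after shuffling
    that variable's column across samples, leaving the rest intact."""
    rng = random.Random(seed)
    base = _accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(base - _accuracy(model, shuffled, labels))
    return importances

# Toy data: the label depends only on the first variable; the second is noise.
rows = [[float(i % 2), float(i % 5)] for i in range(40)]
labels = [int(r[0]) for r in rows]
model = lambda r: int(r[0] > 0.5)  # stand-in for a trained classifier
imp = permutation_importance(model, rows, labels)
```

Variables whose shuffling barely changes accuracy (here, the second one) are candidates for elimination.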
Variable selection can be used to reduce the dimensionality of the variable space without losing predictive power (fig.~\ref{fig:variable_elimination}),
but also to study conditional variable importance\cite{strobl_conditional_2008}, or, for instance,
the features (and labels) at a given time are highly correlated with surrounding features.
Therefore, if random sampling of even 50\% of all epochs, from all time series, was performed,
most points in the training set would have a direct neighbour in the testing set.
This amounts to an artificial duplication of the dataset before cross-validation, which is then likely to fail to detect overfitting.
In the preliminary steps of this study, it was observed that almost perfect accuracy could be achieved when performing naive cross-validation (data not shown).
Further supporting this idea, such surprisingly high accuracy was not observed when training the model
with all the even hours (from start of the experiment) and testing it with all the odd ones.
There are several ways to reduce overfitting, including limiting the maximal number of splits when growing classification trees, or pruning trees.
However, it is impossible to ensure \emph{a priori} that a model will not overfit.
Thus, it remains necessary to assess the model fairly.
In this study, systematic stratified cross-validation was
performed\cite{ding_querying_2008}.
As a result, all predictions made on any 24h time series are generated by models
that did not use any point originating from this same time series. This precaution simulates the behaviour of the predictor with new recordings.
Cross-validation was not only used to generate an overall value of accuracy, but also to further assess differences in sleep patterns (fig.~\ref{fig:struct_assess}).
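The splitting strategy can be sketched as leave-one-recording-out cross-validation, in which all epochs of a given time series are held out together; the recording identifiers below are hypothetical:

```python
def leave_one_recording_out(epochs):
    """Yield (held_out, train, test) splits such that no epoch of the
    tested recording ever appears in the training set, avoiding leakage
    between temporally adjacent epochs of the same time series."""
    recordings = sorted({rec_id for rec_id, _ in epochs})
    for held_out in recordings:
        train = [e for e in epochs if e[0] != held_out]
        test = [e for e in epochs if e[0] == held_out]
        yield held_out, train, test

# Epochs as (recording_id, feature) pairs -- identifiers are made up.
epochs = [("mouse_a", 0.1), ("mouse_a", 0.2), ("mouse_b", 0.3), ("mouse_c", 0.4)]
splits = list(leave_one_recording_out(epochs))
```

Grouping the folds by recording, rather than sampling epochs at random, is what prevents the near-duplicate neighbours described above from inflating the cross-validated accuracy.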
\subsection{Quality of the raw data}

Vigilance states can be viewed as discrete representations of a phenomenon that is, in fact, continuous.
In this case, the borders between different states are, by nature, fuzzy and somewhat arbitrary.
Therefore, ground truth data cannot be assumed to be entirely correctly labelled.
In particular, transitions between states could be intrinsically inaccurate.
The assessment of prediction doubt (fig.~\ref{fig:error}, fourth row) illustrates the high uncertainty inherent to transitions.

The ground truth labels used in this study have been generated by a two-pass semi-automatic method.
First, an automatic annotation is performed based on a human-defined variable threshold.
Then, the expert visually inspects the result and corrects ambiguities.
The first pass was originally designed to combine, through logical rules, four
epochs of five seconds to produce 20s
epochs\cite{costa-miserachs_automated_2003}.
Several studies have used ground-truth data that was manually scored independently by several experts,
which often appears to show good mutual agreement.
This seems extremely important for several reasons.
First of all, it permits the comparison of inter-human error to the automatic classifier error.
Then, it allows a value of confidence to be allocated to each annotation.
For instance, if, for a given epoch, there is strong disagreement between experts, the confidence will be low.
When training a model, this uncertainty can be included, for instance, as a weight.
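One possible way to derive such weights (a sketch only; multi-expert annotations were not available in this study) is to use, for each epoch, the fraction of scorers agreeing with the majority label:

```python
from collections import Counter

def annotation_weights(annotations):
    """For each epoch, return the majority label and a confidence
    weight equal to the fraction of scorers agreeing with it."""
    out = []
    for labels in annotations:  # one list of expert labels per epoch
        (label, votes), = Counter(labels).most_common(1)
        out.append((label, votes / len(labels)))
    return out

# Hypothetical labels from three scorers for two epochs
weights = annotation_weights([["NREM", "NREM", "NREM"], ["REM", "W", "REM"]])
```

Epochs with low agreement would then contribute less to the training loss, for instance through per-sample weights.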
\subsection{Overall results}
The predictions of the classifier presented in this research agreed with ground truth for 92\% of epochs (table~\ref{tab:confus}).
Although the limitations of the ground truth annotation make it difficult to
put this result into perspective, this score is very promising.
In addition, prediction did not result in significant differences in prevalences.
However, there were, on average, far fewer \gls{rem} episodes in the predicted time series.
The duration of \gls{rem} episodes was also over-estimated by prediction (although this result is only marginally significant).
Altogether, these findings indicate that the \gls{rem} state is less fragmented in the predicted data.
In contrast, the awake state was more fragmented in the predicted time series.
Although statistically significant, these differences in variables characterising sleep structure are never greater than twofold.

It would be very interesting to investigate further the extent to which such classifiers could be used to detect alterations
in the structure of sleep.
One way could be to analyse the sleep structure of two groups of animals for which differences were already found, and quantify how much more, or less,
difference is found using automatic scoring.
\section*{Conclusion}
The aim of the study herein was to build a classifier that could accurately predict vigilance states from \gls{eeg} and \gls{emg} data
and serve as a basis for an efficient and flexible software implementation.
First, \pr{}, a new Python package, was designed to efficiently extract a large number of features from electrophysiological recordings.