Commit 2982d6b

Update discussion.tex
1 parent b90e764 commit 2982d6b

File tree

1 file changed: +53 -47 lines

report/discussion.tex

@@ -5,15 +5,15 @@ \subsection{Software package for feature computation}
 In order to train statistical learning methods to classify vigilance states,
 it was necessary to compute an exhaustive set of features for all consecutive five-second epochs
 over long (24h) time series.
-For this purpose, \pr{}, a new \py{} package was developed based on
+For this purpose, \pr{}, a new \py{} package, was developed based on
 \pyeeg\cite{bao_pyeeg:_2011}, which already implements several
 algorithms often used to study \gls{eeg}.
 Very significant improvements in performance were achieved for almost all functions implemented in \texttt{PyEEG}
 (table~\ref{tab:benchmark}). These improvements will considerably speed up prototyping of feature extraction
 and may be essential in order to build real-time classifiers.
 In addition, such modifications will make it possible to compute features for a large number
 of recordings in reasonable time.
-Further improvements are possible, for instance,
+Further improvements are possible: for instance,
 sample entropy was tentatively implemented in the Julia programming language\cite{bezanson_julia:_2012}
 and performed 25 times faster than \pr{}'s implementation\footnote{Implementation available at
 \href{https://github.com/qgeissmann/Physiology.jl/blob/master/src/univariate.jl}{https://github.com/qgeissmann/Physiology.jl/blob/master/src/univariate.jl}.}
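For a flavour of the kind of optimisation involved, the sketch below is a vectorised sample entropy in NumPy; the function name, defaults and tolerance convention are illustrative assumptions, and it reproduces neither \pr{}'s nor the Julia implementation.

    import numpy as np

    def sample_entropy(x, m=2, r=0.2):
        # Tolerance is taken, conventionally, as a fraction of the signal's SD.
        x = np.asarray(x, dtype=float)
        tol = r * x.std()

        def n_matches(length):
            # All overlapping templates of the given length, as one 2-D view.
            t = np.lib.stride_tricks.sliding_window_view(x, length)
            count = 0
            for i in range(len(t) - 1):
                # Chebyshev distance from template i to every later template.
                d = np.abs(t[i + 1:] - t[i]).max(axis=1)
                count += int((d <= tol).sum())
            return count

        b, a = n_matches(m), n_matches(m + 1)
        return -np.log(a / b) if a > 0 and b > 0 else float("inf")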
@@ -23,7 +23,7 @@ \subsection{Software package for feature computation}
 Nevertheless, realistically, neither algorithm would be used for long time series.
 
 Several \texttt{PyEEG} functions were also found to be inconsistent with mathematical
-definitions (see \pr{} documentation, appendix).
+definitions and were corrected in the new \pr{} package (see \pr{} documentation, appendix).
 This unfortunately appears to be a common issue for academic software.
 The general status of the peer-review process and the reproducibility of programs and algorithms have
 recently drawn attention (see \cite{morin_shining_2012,crick_can_2014} for
@@ -36,7 +36,7 @@ \subsection{Exhaustive and time-aware feature extraction}
 but also on all wavelet frequency sub-bands.
 Then, new variables were created to account for temporal consistency of vigilance state episodes.
 
-Discrete wavelet decomposition is an extremely fast an accurate algorithm to filter a periodic
+Discrete wavelet decomposition is an extremely fast and accurate algorithm to filter a periodic
 signal into complementary and exclusive frequency sub-bands (fig.~\ref{fig:dwd}).
 \c{S}en et al.\cite{sen_comparative_2014} obtained very promising results by
 computing a large number of features on the raw \gls{eeg} signal and a limited subset of features (\ie{} mean power and absolute values) in some wavelet coefficients.
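As a concrete illustration of such a decomposition, the PyWavelets call below splits a five-second epoch into complementary sub-bands; the library, wavelet and sampling rate are assumptions made for the sketch, not a statement about \pr{}'s internals.

    import numpy as np
    import pywt  # PyWavelets, assumed here purely for illustration

    fs = 256                          # hypothetical sampling rate (Hz)
    epoch = np.random.randn(5 * fs)   # one 5 s EEG-like epoch

    # A 5-level decomposition with Daubechies-4 returns coefficients covering
    # roughly 0-4 (cA5), 4-8 (cD5), 8-16, 16-32, 32-64 and 64-128 Hz (cD1).
    cA5, cD5, cD4, cD3, cD2, cD1 = pywt.wavedec(epoch, "db4", level=5)

    # Per-band summary features in the spirit of Sen et al.: mean absolute value.
    features = {name: np.mean(np.abs(c)) for name, c in
                [("cA5", cA5), ("cD5", cD5), ("cD4", cD4),
                 ("cD3", cD3), ("cD2", cD2), ("cD1", cD1)]}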
@@ -46,39 +46,43 @@ \subsection{Exhaustive and time-aware feature extraction}
 
 
 
-Many authors have modelled time series of epochs as if each epoch was statistically independent from each other.
+Many authors have modelled time series of epochs as if each epoch were statistically independent of the others.
 This assumption makes it straightforward to use classical machine learning techniques such as
 \glspl{ann}, \glspl{svm}\cite{crisler_sleep-stage_2008},
 random forests\cite{breiman_random_2001} and others.
-They have the advantage coping very well with non-linearity, can handle a large number of predictors and have many optimised implementations.
+They have the advantage of coping very well with non-linearity, can handle a large number of predictors and have many optimised implementations.
 
 However, working with this assumption generally does not allow one to account for the temporal consistency of vigilance states.
 Indeed, prior knowledge of, for instance, the state transition probabilities cannot be modelled.
 Manual scorers use contextual information to make decisions.
-For example, if a given epoch has ambiguous features between \gls{rem} and awake,
-it is likely to be classified as awake given surrounding epochs are, less ambiguously, awake.
+For example, if a given epoch has ambiguous features between ``\gls{rem}'' and ``awake'',
+it is likely to be classified as ``awake'' given that surrounding epochs are, less ambiguously, ``awake''.
 For this reason, explicit temporal modelling, using, for instance, Hidden Markov Models, has been investigated\cite{doroshenkov_classification_2007,pan_transition-constrained_2012}.
 
 In order to benefit from the classical machine learning
 framework whilst including temporal information,
-it is possible to create, new variables, accounting for the temporal
+it is possible to create new variables to account for the temporal
 variation\cite{dietterich_machine_2002}.
-This study demonstrated that addition of temporal context significantly improved predictive accuracy (fig.\ref{fig:temporal_integration}).
-The convolution approach (eq.\ref{eq:window}) appeared to provide better results.
-Instead of averaging feature after calculation, it may be advantageous to compute features over epochs of different length in a first place.
-Thus, the accuracy of local of non additive features, such as median, will be improved. In addition to local mean of feature, other interval variables, such as local
-slope and local variance of each feature may improve
-classification\cite{rodriguez_support_2005,deng_time_2013}.
+This study demonstrated that the addition of temporal context significantly improved predictive accuracy (fig.~\ref{fig:temporal_integration}).
+The convolution approach (eq.~\ref{eq:window}) provided better results.
+%_____________________________________________________________
+%[REFER TO RESULTS FOR THIS CLAIM].
+Instead of averaging features after calculation, it may be advantageous to compute features over epochs of different lengths in the first place.
+Thus, the accuracy of local non-additive features, such as the median, would be improved. In addition to the local mean of features, other variables, such as the local
+slope and local variance of each feature, may improve
+classification\cite{deng_time_2013}.
+%______________________________________________________________
+%DID YOU INCLUDE THAT IN YOUR ALGORITHM, THEN REFER TO YOUR RESULTS, OR PHRASE IT AS AN OUTLOOK
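To make the local-context idea concrete: given a table of per-epoch features, rolling means, variances and slopes could be appended along the lines below. This is a pandas sketch; the window length and the slope proxy are arbitrary assumptions, not the exact convolution approach described above.

    import pandas as pd

    def add_local_context(features: pd.DataFrame, window: int = 12) -> pd.DataFrame:
        # features: one row per 5 s epoch, one column per feature.
        # window=12 epochs, i.e. about one minute of local context.
        rolled = features.rolling(window, center=True, min_periods=1)
        local_mean = rolled.mean().add_suffix("_mean")
        local_var = rolled.var().add_suffix("_var")
        # First difference of the local mean, a cheap proxy for local slope.
        local_slope = rolled.mean().diff().add_suffix("_slope")
        return pd.concat([features, local_mean, local_var, local_slope], axis=1)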
 
 Although the addition of time-dependent variables improved accuracy over a time-unaware model, their use can be seen as controversial.
 Indeed, including prior information about sleep structure will cause problems if the aim is to find differences in sleep structure.
 As an example, let us consider a training set made only of healthy adult wild-type animals,
-and let us assume that \gls{nrem} episodes are always at least, 5min long.
+and let us assume that \gls{nrem} episodes are always at least 5min long.
 Implicitly, this information becomes a prior. That is, the implicit definition of \gls{nrem} is that it
 is uninterrupted.
-The same classifier is not expected to perform well if used on an animal which, for instance, show frequent interruption of \gls{nrem} sleep by short awake episodes.
+The same classifier is not expected to perform well if used on an animal which, for instance, shows frequent interruptions of \gls{nrem} sleep by short awake episodes.
 Indeed, a `time-aware' model will need much more evidence to correctly classify a very short waking episode inside sleep (because this never occurred in the training set).
-Therefore, predictive accuracy alone should not be the ultimate end-goal.
+Therefore, predictive accuracy alone should not be the exclusive goal.
 Models which can perform well without including too much temporal information ought to be preferred in so far as
 they are more likely to be generalisable.
 
@@ -87,12 +91,12 @@ \subsection{Exhaustive and time-aware feature extraction}
 \subsection{Random forest classification}
 
 In this study, random forest classifiers\cite{breiman_random_2001} were exclusively used.
-In addition to their capacity to model non-linearity, they are very efficient at handling very large number of variables.
-Recently very promising classification of sleep stages in human were generated
+In addition to their capacity to model non-linearity, they are very efficient at handling a very large number of variables.
+Recently, very promising classifications of sleep stages in humans were generated
 using this algorithm\cite{sen_comparative_2014}.
 A very interesting feature of random forests is their
 natural ability to generate relative values of importance for the different predictors.
-These values quantifies how much each variables contributes to the predictive power of the model.
+These values quantify how much each variable contributes to the predictive power of the model.
 This feature is extremely useful because it allows using random forests for variable selection.
 This can be used to reduce the dimensionality of the variable space without losing predictive power (fig.~\ref{fig:variable_elimination}),
 but also to study conditional variable importance\cite{strobl_conditional_2008}, or, for instance,
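For illustration, with scikit-learn (an assumed library choice, not a claim about this study's code) the importance scores fall directly out of a fitted forest and can drive variable elimination:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 200))    # placeholder per-epoch features
    y = rng.integers(0, 3, size=1000)   # placeholder vigilance states

    forest = RandomForestClassifier(n_estimators=500, n_jobs=-1).fit(X, y)

    # Rank predictors by the forest's own importance scores and keep the
    # strongest ones, reducing dimensionality without losing much power.
    order = np.argsort(forest.feature_importances_)[::-1]
    X_reduced = X[:, order[:50]]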
@@ -105,38 +109,38 @@ \subsection{Random forest classification}
 
 \subsection{Rigorous and comprehensive model evaluation}
 
-Previous research, using classical statistical learning framework,
+Previous research, using classical statistical learning frameworks,
 has often assessed classifiers through cross-validation.
-It however often unclear how sampling was performed to generate training and
+It is, however, often unclear how sampling was performed to generate training and
 testing sets\cite{ebrahimi_automatic_2008, chapotot_automated_2010, sen_comparative_2014}.
 Time series of epochs are dense and, in general,
-the features (and labels) at a given time are very correlated with surrounding features.
+the features (and labels) at a given time are highly correlated with surrounding features.
 Therefore, if random sampling of even 50\% of all epochs, from all time series, was performed,
 most points in the training set would have a direct neighbour in the testing set.
-This almost corresponds to an artificial duplication of a dataset before cross-validation and is likely to fail to detect overfitting.
+This corresponds to an artificial duplication of a dataset before cross-validation and is likely to fail to detect overfitting.
 In the preliminary steps of this study, it was observed that almost perfect accuracy could be achieved when performing naive cross-validation (data not shown).
-Supporting further this idea, such surprisingly high accuracy was not observed when training the model
+Further supporting this idea, such surprisingly high accuracy was not observed when training the model
 with all the even hours (from the start of the experiment) and testing it with all the odd ones.
-There are several way to reduce overfitting including limiting the maximal number of splits when growing classification trees, or pruning trees.
-However, it never possible to unsure a model will not overfit \emph{a priori}.
-Thus it remain necessary to assess the model fairly.
+There are several ways to reduce overfitting, including limiting the maximal number of splits when growing classification trees, or pruning trees.
+However, it is impossible to ensure \emph{a priori} that a model will not overfit.
+Thus, it remains necessary to assess the model fairly.
 In this study, systematic stratified cross-validation was
 performed\cite{ding_querying_2008}.
 As a result, all predictions made on any 24h time series are generated by models
-that did not use any point originating from this same time series. This precaution simulate the the behaviour of the predictor with new recordings.
-Cross-validation was not only used to generate overall value of accuracy, but also, to further assess differences in sleep patterns (fig. \ref{fig:struct_assess}).
+that did not use any point originating from this same time series. This precaution simulates the behaviour of the predictor with new recordings.
+Cross-validation was not only used to generate an overall value of accuracy, but also to further assess differences in sleep patterns (fig.~\ref{fig:struct_assess}).
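One way to realise the holdout scheme described above, where no point of a tested 24h recording is ever seen during training, is grouped cross-validation. A minimal sketch with scikit-learn's LeaveOneGroupOut follows; the names and synthetic data are assumptions, not the study's actual pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 50))         # placeholder epoch features
    y = rng.integers(0, 3, size=2000)       # placeholder vigilance states
    groups = np.repeat(np.arange(10), 200)  # one recording id per epoch

    # Each fold tests on one whole recording, so no epoch in the testing set
    # has a direct temporal neighbour in the training set.
    scores = cross_val_score(RandomForestClassifier(n_estimators=100, n_jobs=-1),
                             X, y, groups=groups, cv=LeaveOneGroupOut())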
 
 \subsection{Quality of the raw data}
 
-Vigilance states can be viewed as discrete representation of a phenomena that is, in fact, continuous.
+Vigilance states can be viewed as discrete representations of a phenomenon that is, in fact, continuous.
 In this case, the borders between different states are, by nature, fuzzy and somewhat arbitrary.
 Therefore, ground truth data cannot be assumed to be entirely correctly labelled.
-In particular, transitions between states will be intricately inaccurate.
-The assessment of prediction doubt (fig.~\ref{fig:error}, fourth row) illustrate the high uncertainty inherent to transitions.
+In particular, transitions between states could be intrinsically inaccurate.
+The assessment of prediction doubt (fig.~\ref{fig:error}, fourth row) illustrates the high uncertainty inherent to transitions.
 
-The ground truth labels used in this study has been generated by a two pass semi-automatic method.
-In a first place, an automatic annotation is performed based on a human-defined variable threshold.
-Then, the expert visually inspect the result and correct ambiguities.
+The ground truth labels used in this study have been generated by a two-pass semi-automatic method.
+First, an automatic annotation is performed based on a human-defined variable threshold.
+Then, the expert visually inspects the result and corrects ambiguities.
 The first pass was originally designed to combine, through logical rules, four
 epochs of five seconds to produce 20s
 epochs\cite{costa-miserachs_automated_2003}.
@@ -146,34 +150,36 @@ \subsection{Quality of the raw data}
 
 Several studies have used ground-truth data that was manually scored independently by several experts,
 which often appears to show good mutual agreement.
-This seem extremely important for several reasons.
-First of all, it permits to compare inter-human error to the automatic classifier error.
-Then, it allow to allocate a value of confidence to each annotation.
+This seems extremely important for several reasons.
+First of all, it permits the comparison of inter-human error to the automatic classifier error.
+Then, it allows a value of confidence to be allocated to each annotation.
 For instance, if, for a given epoch, there is strong disagreement between experts, the confidence will be low.
 When training a model, this uncertainty can be included, for instance, as a weight.
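A sketch of that weighting idea (scikit-learn style; the agreement-to-weight mapping, the names and the multi-expert labels are all assumptions for illustration, since this study itself used a single two-pass annotation):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 50))               # placeholder features
    labels = rng.integers(0, 3, size=(1000, 4))   # 4 hypothetical expert scorings

    # Majority label per epoch; confidence = fraction of agreeing experts.
    y = np.array([np.bincount(row).argmax() for row in labels])
    agreement = (labels == y[:, None]).mean(axis=1)

    # Epochs with strong expert disagreement contribute less to training.
    clf = RandomForestClassifier(n_estimators=100).fit(X, y, sample_weight=agreement)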
 
 \subsection{Overall results}
 The predictions of the classifier presented in this research agreed with the ground truth for 92\% of epochs (table~\ref{tab:confus}).
-Although the limitation of the ground truth annotation makes it is difficult to
+Although the limitations of the ground truth annotation make it difficult to
 put this result into perspective, this score is very promising.
-In addition, prediction did not result in significant difference in prevalences.
+In addition, prediction did not result in significant differences in prevalences.
 However, there were, on average, many fewer \gls{rem} episodes in the predicted time series.
-The duration of \gls{rem} episodes was also over-estimated by prediction (though this is only marginally significant).
-Altogether, this indicates that \gls{rem} state is less fragmented in the predicted data.
+The duration of \gls{rem} episodes was also over-estimated by prediction (although this result is only marginally significant).
+Altogether, these findings indicate that the \gls{rem} state is less fragmented in the predicted data.
 In contrast, the awake state was more fragmented in the predicted time series.
 Although statistically significant, these differences in variables characterising sleep structure are never greater than twofold.
 
-It would be very interesting to investigate further the extent to which such classifier could be used to detect alteration
+It would be very interesting to investigate further the extent to which such classifiers could be used to detect alterations
 in the structure of sleep.
-One way could be analyse the sleep structure of two groups of animals for which differences were already found, and quantify how much more, or less,
+One way could be to analyse the sleep structure of two groups of animals for which differences were already found, and quantify how much more, or less,
 difference is found using automatic scoring.
 
 
 \section*{Conclusion}
 
 The aim of the study herein was to build a classifier that could accurately predict vigilance states from \gls{eeg} and \gls{emg} data
 and serve as a basis for an efficient and flexible software implementation.
-In a first place, \pr{}, a new python package was designed to efficiently extract a large number of features from electrophysiological recordings.
+First, \pr{}, a new \py{} package, was designed to efficiently extract a large number of features from electrophysiological recordings.
+% ___________________________________________________________________
+% ...MENTION TIME-AWARE MODELING IF THAT'S THE OTHER NEW POINT OF YOUR APPROACH
 Then, a random forest approach was used to eliminate irrelevant variables.
 Importantly, this study shows that prediction accuracy can then be improved by including features derived from restricted local averages.
 The overall achieved accuracy was as high as 92\%, and although some significant structural differences were induced by prediction,
