putting things together

qgeissmann · qgeissmann · commit fead8cb02b0f · 2014-09-06T19:21:50.000+01:00
diff --git a/report/discussion.tex b/report/discussion.tex
@@ -148,10 +148,10 @@ \subsection{Quality of the raw data}
 When training a model, this uncertainty can be included, for instance, as a weight.
 
 \subsection{Final result}
-The predictions of the classifier presented in this research agreed with ground truth for 92\% of epochs (table~/ref{tab:confus}).
+The predictions of the classifier presented in this research agreed with ground truth for 92\% of epochs (table~\ref{tab:confus}).
 Although the limitations of the ground truth annotation make it hard to put this result into perspective,
-this score is satisfactory.
-In addition, prediction did not result significant difference in prevalences.
+this score is very promising.
+In addition, prediction did not result in a significant difference in prevalences.
 However, there were, on average, far fewer \gls{rem} episodes in the predicted time series.
 The duration of \gls{rem} episodes was also over-estimated by prediction (though this is only marginally significant).
 Altogether, this indicates that \gls{rem} state is less fragmented in the predicted data.
@@ -166,3 +166,21 @@ \subsection{Final result}
 Data from different instruments, animals and labs will be needed to generate a ubiquitous predictor.
 It is expected that non-linear interactions arise, so RF will help ;)
 RF feature discovery
+
+
+\section{Conclusion}
+The aim of the study herein was to build a classifier that could accurately predict vigilance states from \gls{eeg} and \gls{emg} data.
+First, \pr{}, a new Python package, was designed to efficiently extract a large number of features from electrophysiological recordings.
+Then, a random forest approach was used to eliminate irrelevant variables.
+Importantly, this study shows that prediction accuracy can then be improved by including features derived from restricted local averages.
+The overall achieved accuracy was as high as 92\%, and although some significant structural differences were induced by prediction,
+the classifier was, overall, satisfactory.
+In addition, the presented classifier can generate confidence values that can be used to moderate each prediction, and ultimately to decide whether to trust it.
+Before considering implementation of this promising classifier as a ubiquitous software tool,
+it would be necessary to generalise its results by the inclusion of different sources of data.
+
+\section{Availability}
+The source code of \pr{} is available at \href{https://github.com/gilestrolab/pyrem}{https://github.com/gilestrolab/pyrem}
+and the package will be released shortly, as open-source software, in the official Python repositories.
+
+
diff --git a/report/intro.tex b/report/intro.tex
@@ -1,63 +1,74 @@
 \section{Introduction} \label{intro}
 
-Sleep is considered to be ubicuitous and necessary in so far as it was observed in most animal models.
+Sleep is considered to be a ubiquitous and necessary behaviour amongst animals.
+However, its real physiological functions remain debated.
+In vertebrates, electrophysiological recordings, in particular \gls{eeg},
+but also \gls{emg} and \gls{eog}, have been extensively used to study the structure of sleep during the last century.
+They have the advantage of being non-invasive and relatively high throughput.
+Today, \gls{eeg} remains one of the main assets in the study of sleep physiology.
 
-In vertebrate, electrophysiological recordings, in particular, \gls{eeg}, but also \gls{emg}
-Classically, activity has been classified in several discrete \emph{vigilance states}...
-
-In rodents, three vigilance states are usually defined on the basis of \gls{eeg} and \gls{emg} (fig.~\ref{fig:sleep_description}).
-When awake (WAKE), an animal tends to have a high muscular activity which translates as a high amplitude in the \gls{emg} and a relatively low amplitude
-\gls{eeg} domitated by oscilations of frequency between six and ten hertz often refered as theta waves.
-In contrast, \gls{nrem} sleep, also called slow wave sleep is a period of muscular inactivity (low \gls{emg}) dominated by slow oscilations (below 4Hz) of high amplitude named delta waves.
-Finally, \gls{rem} sleep is characterised by a complete lack of muscular activity (atony) and an \gls{eeg} activity very similar to the awake state.
-\gls{rem} sleep is the least prevalent of all three stages, and generally represents less that  20\% of all sleeping time.
+Rodent models, in particular mice and rats, have proved very successful for understanding the mechanisms of sleep in mammals.
+Classically, three main types of sleep-related behaviours are defined and referred to as \emph{vigilance states}.
+Vigilance states are usually defined on the basis of \gls{eeg} and \gls{emg} (fig.~\ref{fig:sleep_description}).
+When awake (WAKE), an animal tends to have a high muscular activity which translates as high-amplitude voltage changes in the \gls{emg}.
+During wakefulness, \gls{eeg} is dominated by relatively low-amplitude oscillations of frequency
+between six and ten hertz, often referred to as theta waves.
+In contrast, \gls{nrem} sleep, also called slow wave sleep, is a period of muscular inactivity (low \gls{emg})
+dominated by slow oscillations (below 4Hz) of high amplitude named delta waves.
+The third state, \gls{rem} sleep, is characterised by a complete lack of muscular activity (atony) and an \gls{eeg} activity very similar to the awake state.
+\gls{rem} sleep is the least prevalent of all three stages, and generally represents 20\% of all sleeping time.
+The prevalence of these three states as well as their structural succession are extremely important observations in sleep research.
 
 \input{./figures/sleep_description}
 
 
-Although this definitions appear straigthforward, in practice, many cases are ambigous. 
+Although definitions of sleep stages appear straightforward, in practice, many cases are ambiguous.
 For instance, it is difficult to characterise transitions between two states.
-In addition, there are many sources of variability including how surgery was performed by the experimenter,
- the type of recorder used and inter-animal variability.
-The quality of the aquisition can also be made considerably worst by noisy  signals or when the presence of artefacts.
-For these reasons, sleep scoring, the attribution of discrete vigilence states to electrophysiological time series, is traditionnally performed by trained human experts.
-This task is very time consuming; several hours of work have been reported in order to score 24h of recording.
-This severely limits data throughput and human subjectivity is likely to introduce systematic bias. 
-Indeed, it is expected that scoring will be perfomred differently by each expert, making result difficult to reproduce independently.
-Often, two experts score the same time data, in order to ensure satifying aggreement. In general, the inter-human aggrement is important \citationneeded{}.
-It can however be argued that experts most likely work in the same laboratory and trained one another, or were trained by the same third person.
-In this context, aggremment between experts does not account for the variability between communities of researchers, and cannot be used to assess reproducibility.
+In addition, there are many sources of variability including how surgery was performed by the experimenter, the type of recorder used and inter-animal variability.
+The quality of the acquisition can also be made considerably worse by noisy signals or by the presence of artefacts.
+For these reasons, sleep scoring, the attribution of discrete vigilance states to electrophysiological time series,
+is traditionally performed by trained human experts.
+Such manual annotation is very time consuming; several hours of work have been reported in order to score 24h of recording.
+This severely limits data throughput and human subjectivity is likely to introduce systematic bias.
+Indeed, it is expected that scoring will be performed differently by each expert, making results difficult to reproduce independently.
+Often, two experts score the same data in order to ensure satisfactory agreement.
+Although manual scorers are generally reported to reach a high level of consensus\citationneeded{},
+it can be argued that experts most likely work in the same laboratory and trained one another, or were trained by the same third person.
+In this context, agreement between experts does not account for the variability between communities of researchers, and cannot be used to assess reproducibility.
 
 In order to overcome both speed and subjectivity limitations, efforts have been directed towards automation of sleep scoring.
-However, there is little addoption of automatic method and very few available implementations in the form of software that biologists could use.
-Typically, two different approches have been followed: unsupervided or supervided learning.
-
-Unsupervised learning has the advantage of making no assumption about the nature of the different vigilence states.
-Therefore, this approach can lead to the discovery of truely new states.
-One issue is that the choice of the variables used for clustering will be critical.
-Often, variables such as frequency domain variables chosen in order to generate clusters that will match human defined clusters.
-In addition, unsupervided methods may lack robustness in so far as the cannot easily include covariates explaining, for instance, variability between recording equipments.
+However, little adoption has occurred, and very few implementations in the form of software that biologists could use have been developed.
+Typically, two different approaches to classification have been followed: unsupervised or supervised learning.
 
+Unsupervised learning has the advantage of making no assumption about the nature of the different vigilance states, and how they should be defined.
+Therefore, this approach can lead to the discovery of truly new states.
+One issue is that the choice of the variables used for clustering is critical.
+Often, variables such as frequency domain variables are in fact chosen in order to generate clusters that will match human-defined clusters.
+In addition, unsupervised methods may lack robustness in so far as they cannot easily include covariates explaining, for instance, variability between recording equipment.
 
-Another approach is to assume human annotations are in general biologically relevant and consistant, and to use supervided learning teachniques.
-Of course, if human decisions were baised, such a method may reproduce this biais.
-However, a vast corpus of experimental work has provided hypothesis about function of these states.
-Building a classifier that would produce a consensual prediction of vigilence states could be seen as an attempt to formalised and rationnalise the definition of such states.
-This could improving future research without denying decades of sleep neurobiology. 
+Another approach is to assume human annotations are, although imperfect, biologically relevant and generally consistent,
+ and therefore to use supervised learning techniques.
+Of course, if human decisions were biased, such a method may reproduce this bias.
+However, a vast corpus of experimental work has provided hypotheses about the function of these states, which tends to validate the actual `existence' of these discrete vigilance states.
+Building a classifier that would produce a consensual prediction of vigilance states could be seen as an attempt to formalise and rationalise the definition of such states.
+This would improve future research without denying decades of sleep neurobiology.
 
 Many supervised learning techniques, ranging from SVMs and ANNs to HMMs, have been investigated.
-In general, the first step is to compute features on subsequent segments, know as epochs, of annotated electrophysiological signals.
-Then, the relation between the response variable(annotation) and the independent variables (features) can be modeled.
-Either epochs are considered to be independent from one another or time-dependent structures are explicitely modeled (\eg{} HMMs).
+In general, the first step is to compute features on consecutive segments, known as epochs, of annotated electrophysiological signals.
+Then, the relation between the response variable (annotation) and the independent variables (features) can be modelled.
+Either epochs are considered to be independent from one another or time-dependent structures are explicitly modelled (\eg{} using HMMs).
+Time-aware modelling has the advantage of accounting for the interdependence of consecutive epochs (see fig.~\ref{fig:sleep_description}B).
+However, it generally does not perform as well as classical classifiers when modelling non-linear relationships between large numbers of predictors.
 
-Recently, promissing results were obtained for scoring human sleep stages by performing an exhaustive feature extraction including variables resulting from discrete wavelet decomposition.
-Then, the authors compared several classifiers and found that random forest were the most accurate.
+Recently, promising results were obtained for scoring human sleep stages by performing an exhaustive
+feature extraction including variables resulting from discrete wavelet decomposition.
+Then, the authors compared several classifiers and found that random forests were, overall, the most accurate predictors.
 
-The study herein bases itself on these promissing results by computing an even larger number of features.
+The study herein bases itself on these promising results by computing an even larger number of features.
 An important addition is the computation of time-aware features which significantly improved accuracy.
-In addition, rigourous startified cross-validation procedure and comparisons of sleep structure were performed.
-
-In order to pave the way to an implementation of an ubicuitous sleep scoring software.
-\pr, a new \py{} package was also build to facilitate efficient feature extraction. 
-This new package is here demonstated to be significantly more performant than preexisting implementation.
+Furthermore, a rigorous stratified cross-validation procedure and comparisons of sleep structure were performed.
+These improvements together contributed to achieving a very satisfactory overall accuracy of 92\%.
+In order to pave the way to an implementation of ubiquitous sleep scoring software,
+\pr, a new \py{} package, was also built to facilitate efficient feature extraction.
+This new package is here demonstrated to be significantly more performant than alternative implementations.
 
diff --git a/report/report.tex b/report/report.tex
@@ -5,10 +5,11 @@
 \usepackage{standalone}
 
 \usepackage{geometry} % Used to adjust the document margins
-\geometry{bindingoffset=1cm}
+%~ \geometry{bindingoffset=1cm}
 \usepackage{fullpage}
 \usepackage{multirow}
-
+\usepackage{setspace}
+\doublespacing
 
 \usepackage{graphicx}
 \usepackage[font=footnotesize]{caption}
@@ -92,6 +93,11 @@
 \newacronym{nrem}{NREM}{Non-Rapid Eye Movement (slow wave sleep)}
 \newacronym{eeg}{EEG}{ElectroEncephaloGram}
 \newacronym{emg}{EMG}{ElectroMyoGram}
+\newacronym{eog}{EOG}{ElectroOculoGram}
+\newacronym{svm}{SVM}{Support Vector Machine}
+\newacronym{ann}{ANN}{Artificial Neural Network}
+\newacronym{hmm}{HMM}{Hidden Markov Model}
+
 %~ 
 %~ \newglossaryentry{epoch}
 %~ {
@@ -112,7 +118,10 @@
 
 
 \begin{abstract}
-	\TODO{write the abstract}
+	
+	
+	
+	
 \end{abstract}
 
 
diff --git a/report/tables/importances.tex b/report/tables/importances.tex
@@ -3,7 +3,7 @@
 \caption{\ctit{Relative variable importance of the 21 selected features.}
 The random forest algorithm can produce a value to quantify variable importance.
 Variable importance quantifies how much, statistically, a variable contributes to predictive accuracy.
-Starting from 164 variables, the least imporatn variables were recursively eliminated.
+Starting from 164 variables, the least important variables were recursively eliminated.
 This table represents the 21 most important remaining features.
 Features from both \gls{eeg} and \gls{emg} are important for accurate prediction.
 \label{tab:importances}}