% report/discussion.tex
\section{Discussion} \label{discussion}
\subsection{Software package for feature computation}
In order to train statistical learning methods to classify vigilance states,
it was necessary to compute an exhaustive set of features for all consecutive five second epochs
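The per-epoch computation described above can be sketched as follows. This is a minimal illustration in plain NumPy, assuming a one-dimensional signal sampled at 256Hz; the function names and the three features shown are hypothetical stand-ins, not the actual \pr{} API.

```python
import numpy as np

def epochs(signal, fs=256, epoch_s=5):
    """Yield consecutive, non-overlapping epochs of epoch_s seconds."""
    n = int(fs * epoch_s)
    for start in range(0, len(signal) - n + 1, n):
        yield signal[start:start + n]

def features(epoch):
    """A tiny illustrative feature set (the real set is far larger)."""
    z = (epoch - epoch.mean()) / epoch.std()
    return {
        "mean": float(np.mean(epoch)),
        "sd": float(np.std(epoch)),
        "skew": float(np.mean(z ** 3)),
    }

signal = np.random.randn(256 * 60)            # one minute of toy EEG at 256 Hz
table = [features(e) for e in epochs(signal)]  # one feature row per 5 s epoch
```

In practice, one such row would be computed for every epoch of every wavelet sub-band, yielding the large feature matrix used for training.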
Several \texttt{PyEEG} functions were also found to be inconsistent with mathematical
definitions (see \pr{} documentation, appendix).
This unfortunately appears to be a common issue for academic software.
The general status of the peer-review process and the reproducibility of programs and algorithms have
recently drawn attention (see \citationneeded{Black-box; Can I reproduce your algo} for discussions about this issue).
\subsection{Exhaustive feature extraction}
Feature extraction in the present study contrasts with previous work in two respects.
First of all, features were exhaustively computed not only on raw signals,
but also on all wavelet frequency sub-bands.
Then, new variables were created to account for temporal consistency of vigilance state episodes.
The convolution approach (eq.\ref{eq:window}) appeared to provide better results.
Instead of averaging features after calculation, it may be advantageous to compute features over epochs of different lengths in the first place.
Thus, the accuracy of local non-additive features, such as the median, would be improved. In addition to the local mean of each feature, other variables, such as the local
slope and local variance of each feature, may improve classification \citationneeded{Deng 2013}.
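Such local variables can be sketched with a centred sliding window. This is a hypothetical illustration only: the window length, the use of \texttt{np.convolve} for the local mean, and the least-squares slope are assumptions, not the implementation used in this study.

```python
import numpy as np

def local_stats(feature, w=5):
    """Local mean, variance and slope of a per-epoch feature series,
    over a centred sliding window of w epochs (w odd)."""
    half = w // 2
    padded = np.pad(feature, half, mode="edge")        # repeat edge values
    kernel = np.ones(w) / w
    mean = np.convolve(padded, kernel, mode="valid")   # local mean
    sq_mean = np.convolve(padded ** 2, kernel, mode="valid")
    var = sq_mean - mean ** 2                          # local variance
    x = np.arange(w) - half
    slope = np.array([np.polyfit(x, padded[i:i + w], 1)[0]
                      for i in range(len(feature))])   # local least-squares slope
    return mean, var, slope

f = np.linspace(0.0, 1.0, 11)   # a toy, linearly increasing feature series
m, v, s = local_stats(f, w=5)   # interior slope is the constant step, 0.1
```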
Although the addition of time-dependent variables improved accuracy over a time-unaware model, their use can be seen as controversial.
Indeed, including prior information about sleep structure will cause problems if the aim is to find differences in sleep structure.
\subsection{Random forest classification}
In this study, random forest\citationneeded{} classifiers were exclusively used.
In addition to their capacity to model non-linearity, they are very efficient at handling a very large number of variables.
\subsection{Rigorous and comprehensive model evaluation}
Previous research, using classical statistical learning frameworks,
has often assessed classifiers through cross-validation.
with all the even hours (from start of the experiment) and testing it with all the odd ones.
There are several ways to reduce overfitting, including limiting the maximal number of splits when growing classification trees, or pruning trees.
However, it is never possible to ensure \emph{a priori} that a model will not overfit.
Thus it remains necessary to assess the model fairly.
In this study, systematic stratified cross-validation was performed.
As a result, all predictions made on any 24h time series are generated by models
that did not use any point originating from this same time series. This precaution simulates the behaviour of the predictor with new recordings.
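This leave-one-recording-out scheme can be sketched with scikit-learn's \texttt{LeaveOneGroupOut} splitter. The data below are fabricated toy values; the actual feature matrix, state labels and forest settings of this study differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))             # toy feature matrix (epochs x features)
y = rng.integers(0, 3, size=600)          # toy vigilance states
recording = np.repeat(np.arange(6), 100)  # which 24h recording each epoch belongs to

accs = []
for train, test in LeaveOneGroupOut().split(X, y, groups=recording):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train], y[train])
    # every epoch of the held-out recording is predicted by a model
    # that never saw any epoch from that same recording
    accs.append(clf.score(X[test], y[test]))
```

Random k-fold splitting, by contrast, would leave temporal neighbours of each test epoch in the training set and inflate the apparent accuracy.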
For instance, if, for a given epoch, there is strong disagreement between experts, the confidence will be low.
When training a model, this uncertainty can be included, for instance, as a weight.
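With scikit-learn's random forest, such a per-epoch confidence could be passed as \texttt{sample\_weight} at fitting time. This is a sketch on fabricated toy data; the confidence values here are hypothetical placeholders for inter-expert agreement.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))    # toy epoch features
y = rng.integers(0, 3, size=300) # toy annotations
# hypothetical per-epoch annotation confidence in [0.5, 1.0]
confidence = rng.uniform(0.5, 1.0, size=300)

clf = RandomForestClassifier(n_estimators=20, random_state=0)
# epochs annotated with low confidence contribute less to the fitted trees
clf.fit(X, y, sample_weight=confidence)
```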
\subsection{Overall results}
The predictions of the classifier presented in this research agreed with ground truth for 92\% of epochs (table~\ref{tab:confus}).
Although the limitations of the ground truth annotation make it hard to put this result into perspective,
this score is very promising.
In addition, prediction did not result in significant differences in prevalence.
However, there were, on average, far fewer \gls{rem} episodes in the predicted time series.
The duration of \gls{rem} episodes was also over-estimated by prediction (though this is only marginally significant).
and the package will be released shortly, as an open-source software, in the official python repositories.
% report/matmet.tex
\section{Material and Methods} \label{matmet}
\subsection{Data acquisition}
%~
In this study, 12 male mice (Tg(Gal-cre)KI87Gsat/Mmucd, FVB/N-Crl:CD1(ICR) hybrid strain),
between 8 and 12 weeks old were used.
Animals were housed under a 12h light/dark cycle.
Small (diam. = 1mm) holes were drilled in the skull of anaesthetised animals.
For the \gls{eeg}, the reference-ground electrode was placed into the parietal bone (Bregma -1.5, mediolateral (ML) +1.5) and
the other electrode was placed into the frontal bone (Bregma +1.5, ML -1.5).
For \gls{emg} acquisition, three polytetrafluoroethylene-insulated stainless steel electrodes were placed into the neck muscles.
Both signals were recorded by a miniature `neurologger' \citationneeded{Vyssotski et al., 2006}.
The recording device applies band-pass analogue filtering between 1 and 70Hz and converts both analogue signals to digital time series at a sampling rate of approximately 200.0Hz.
All 12 animals were monitored for approximately 24h.
Sleep scoring was performed in a semi-automatic fashion by a trained expert.
A first, human-assisted pass was applied to generate preliminary annotations on the basis of logical rules \citationneeded{}.
Then, the expert visually inspected and, when required, corrected the annotations.
Annotations were generated for consecutive epochs of approximately 5.0s.
%~
\subsection{Data preprocessing}
\gls{eeg} and \gls{emg} signals were resampled from approximately 200.0Hz to 256.0Hz using
A sampling frequency of $f_s = 256.0$Hz is convenient since it implies that discrete wavelet decomposition (see subsection~\ref{sub:features} and fig.~\ref{fig:dwd}) will separate
frequencies above 4.0Hz from those below 4.0Hz (since $4 = 256.0/2^6$).
This frequency is typically used as a cut-off value between theta and delta waves \citationneeded{}.
In addition, \gls{eeg} and \gls{emg} signals were standardised ($E[x] = 0, Var[x] = 1$) to account for the variability in baseline amplitude due to acquisition.
Vigilance state annotations were resampled at exactly 0.20Hz using nearest neighbour interpolation.
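To make the band splitting concrete, a five-level discrete wavelet decomposition of a 256Hz signal can be sketched as below. The Haar wavelet is used here purely for simplicity and is an assumption of this sketch, not necessarily the wavelet used in this study; after five levels, the remaining approximation spans 0--4.0Hz, since $4 = 256/2^6$.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar wavelet transform: (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-pass half-band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-pass half-band
    return a, d

fs = 256.0
x = np.sin(2 * np.pi * 2.0 * np.arange(0, 4, 1 / fs))  # 4 s toy signal
bands = []
approx = x
for level in range(1, 6):  # details span 64-128, 32-64, 16-32, 8-16, 4-8 Hz
    approx, detail = haar_dwt(approx)
    bands.append((fs / 2 ** (level + 1), fs / 2 ** level, detail))
# 'approx' now spans 0-4 Hz: the classical delta band sits below the cut-off
```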
\subsection{Feature extraction from time series}
\label{sub:features}
A wavelet transform based feature extraction strategy was adopted.
\subsection{Stratified Cross Validation and sampling}
Stratified cross-validation was systematically applied to generate vigilance state predictions.
Class imbalance was accounted for by fitting predictors on balanced subsamples (750 epochs of each class per tree).
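Drawing such a class-balanced subsample can be sketched as follows. The labels are fabricated toy data mimicking the imbalance of sleep recordings (\gls{rem} is rare), and the helper name is hypothetical.

```python
import numpy as np

def balanced_subsample(y, n_per_class, rng):
    """Indices of a subsample containing n_per_class epochs of each state."""
    idx = []
    for label in np.unique(y):
        candidates = np.flatnonzero(y == label)
        idx.append(rng.choice(candidates, size=n_per_class, replace=False))
    return np.concatenate(idx)

rng = np.random.default_rng(0)
# toy labels with a REM-like imbalance: 50% wake, 40% NREM, 10% REM
y = np.array(["wake"] * 5000 + ["nrem"] * 4000 + ["rem"] * 1000)
idx = balanced_subsample(y, n_per_class=750, rng=rng)  # 750 epochs per class
```

Each tree of the forest would then be grown on one such subsample, so that the rare class is not drowned out by the majority classes.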
\subsection{Random forests analysis}
Unless specified otherwise, random forests \citationneeded{} were trained with balanced samples of 1000 epochs per class.
In order to select variables and define new features, forests with 50 classification trees were built.
% report/tables/benchmark.tex
\begin{table}[!h]
\begin{center}
\caption{\ctit{Performance improvements over \texttt{PyEEG}.}
In order to improve performance, modifications of the algorithms implemented in \texttt{PyEEG} were carried out.
This table compares how long, on average, each algorithm would take, for a random sequence of length $1280$ (\ie{} $5s$ at $256$Hz).
It also represents how many added points would lead to a tenfold runtime increase.
For the tested range ($n \in [1280; 7680]$), all algorithms have approximately
exponential time complexity ($10^{O(n)}$, $R^2 > 0.95$, for all).
Several mathematical inconsistencies were also discovered and corrected.
The rightmost column (\textbf{\textdagger}) indicates whether the original implementation was
corrected in order to match the mathematical definition. Each alteration is mathematically justified in the section \texttt{pyrem.univariate} of the \pr{} documentation (see appendix).
\textbf{(-)}: indicates a worse performance of \pr{} over \pyeeg{}.
Significance levels: $^{***}$, $p$-value $< 10^{-3}$; $^{**}$, $p$-value $< 10^{-2}$; see Material and Methods for details about the statistical analysis.
\label{tab:benchmark}
}
\hline
\hline
algorithm & function & \specialcell{$t$(ms) for \\$n = 1280$} & \specialcell{$n$ for $\times10$\\increase} & \specialcell{$t$(ms) for \\$n = 1280$} & \specialcell{$n$ for $\times10$\\ increase} & fix\textsuperscript{\textdagger}\\
% report/tables/importances.tex
\begin{center}
\caption{\ctit{Relative variable importance of the 21 selected features.}
The random forest algorithm can produce a value that quantifies variable importance.
Variable importance corresponds to how much, statistically, a variable contributes to reducing the prediction inaccuracy (or, to be more precise, the Gini impurity).
Starting from 164 variables, the least important variables were recursively eliminated.
This table represents the 21 most important remaining features.
Features from both \gls{eeg} and \gls{emg} are important for accurate prediction.