paper_gecco_8/learningMethods.tex.orig

\subsection{Parametrized Motion Models}
\seclabel{motionModel}

\edit{this might no longer flow...fix} There are a plethora of methods available for the creation of a vector motion generating function $g(t)$, and we've chosen two for comparison in this study.

The first method is to choose a family of \emph{parametrized functions}.  By fixing the parameters to a particular set of values, we obtain a deterministic motion function over time.  We tried several parametrizations on the robot and, obtaining reasonble early success, settled on one particular parametrization.  We dubbed this the \emph{SineModel5}, as it has five parameters and employs a sine wave as its root pattern.

The five parameters in $\vec{\theta}$ are as follows:

 - amp - Amplitude       - allowable range
 - tau   - sine wave period   - allowable range
 - multIO - multiplier scaling inner joings from outer joints  - allowable range
 - multFB - multiplier scaling front joints from back joints  - allowable range
 - multLR - multiplier scaling left joints from right joints  - allowable range
(verify this)

\newcommand{\amp}{\mathrm{amp}}

Inuitively, SineModel5 starts with 8 identicaly sine waves of amplitude $\amp$ and period $\tau$, multiplies the waves for all inner motors by multIO, multiplies the waves for all front motors by multFB, and multiplies the waves for all left motors by multLR.  To obtain the actual motor position commands, these waves are offset by an appropriate fixed constant so that the base position (when the sine waves are at 0) is approximately a crouch (the position shown in \figref{crouchingRobot}).  Finally, in this motion model, the ninth "hip" motor is kept at a neutral position.  Thus, the commanded position for all motors, as a vector function of time, is:

\begin{verbatim}
        [g_0(t) ]    [ \amp * sin(\tau t) ]
        [g_1(t) ]    [ \amp * sin(\tau t) ]
        [g_2(t) ]    [ \amp * sin(\tau t) ]
g(t)   [g_3(t) ] = [ \amp * sin(\tau t) ]
        [g_4(t) ]    [ \amp * sin(\tau t) ]
        [g_5(t) ]    [ \amp * sin(\tau t) ]
        [g_6(t..) ]    [ \amp * sin(\tau t) ]
\end{verbatim}

The nine subscripted functions correspond to the nine labeled motors in \figref{robotDiagram}

% used is a function parametrized by five numbers, shown in \figref{foo}


\subsection{Learning Methods for Parametrized Motion Models}
\seclabel{learningMethods}

Using the SineModel5 parametrized motion model described in the previous section, along with the allowable ranges for each of the five parameters (shown in \tabref{parameterTable}), the task becomes how to choose the combination of five parameters that results in the fastest motion, per the evaluation methods in \secref{fitnessEvaluation}.

If we choose a value for the five dimensional parameter $\vec{\theta}$, then a given physical trial gives us one measurement of the fitness of that parameter vector, or $f(\vec{\theta})$.  Two things make learnign difficult.  First, each evaluation of $f(\vec{\theta})$ is expensive, taking 15-20 seconds of time on average.  Second, the fitness returned by such evaluations has proven to be very noisy, with the standard deviation of the noise often being roughly equivalent to the size of the measurement.

For both of these reasons, an intelligent learning algorithm is needed.

In this context, by learning algorithm we simply mean a method for choosing:

1) The value for $\vec{\theta}$ for the initial trial, and
2) The next value of $\vec{\theta}$ to try, given the $\vec{\theta}$ and $f(\vec{\theta})$ for all previous trials

We evaluated a total of eight subtly or not so subtly different learning methods.  All employed a simple random sampling method for requirement (1); that is, all methods picked their initial $\vec{\theta}$ value via uniform random sampling within the allowed parameter ranges. \edit{make sure this is true}.  Thus, the differences in the algorithms was how the selected new $\vec{\theta}$ values to try from their past experience.

The eight methods used, and their methods for choosing the next $\vec{\theta}$ are:


\edit{make figure to go here, show drawing with inner/outer, l/r, and f/b motors.}
\edit{JMC: Use photo of physical robot in this image, and add labels in photoshop?}

\edit{write this.  Old quadratot Method section below alsdkfj }

%\section{Method}
%\seclabel{method}

% Describe in reasonable detail the algorithm you are using to address
% this problem. A pseudo-code description of the algorithm you are
% using is frequently useful. If it makes sense for your project,
% trace through a concrete example, showing how your algorithm
% processes this example. The example should be complex enough to
% illustrate all of the important aspects of the problem but simple
% enough to be easily understood. If possible, an intuitively
% meaningful example is better than one with meaningless symbols.

%We use several parametrized motion models that command motors to
%positions based on a sine wave, creating a periodic pattern.  While we
%investigated several models, for the bulk of our experiments, we used
%a model whose five parameters are: amplitude, wavelength, scale inner
%vs outer motors, scale left vs right motors, scale back vs front
%motors. Each strategy below attempts to choose the best possible
%parameters for this motion model.  

\edit{did you want the next paragraph commented out}
We implemented and tested 8 different learning strategies.  All
strategies except for the HyperNEAT method\cite{clune2009evolving} were
constrained to pick parameters from within predetermined ranges.

\edit{JMC:Bullets=narrower column=wasted space. subsubsection instead?}

\begin{itemize}
\item \emph{Random}: This method randomly generates parameter vectors
  in the allowable range for every trial. This strategy was used only as baseline for comparison.

\item \emph{Uniform random hill climbing}: This method begins by
  selecting a single random parameter vector.  Subsequent iterations
  generate a neighbor by randomly choosing one parameter to adjust and
  replacing it with a new value chosen with uniform probability in the
  allowable range for that parameter. The neighbor is evaluated by
  running the robot with the newly chosen parameters. If this neighbor
  results in a longer distance walked than the previous best gait, it
  is saved as the new best gait. The process is then repeated, always
  starting with the best gait.

\item \emph{Gaussian random hill climbing}: This method works
  similarly to Uniform random hill climbing, except neighbors are
  generated by adding random Gaussian noise to the current best gait.
  This results in all parameters being changed at once, but the
  resulting vector is always fairly close to the previous best gait.
  We used independently selected noise in each dimension, scaled such
  that the standard deviation of the noise was 5\% of the range of
  that dimension.

\item \emph{N-dimensional policy gradient descent}: In \cite{kohl},
  Kohl and Stone document a method for local gradient ascent for gait
  learning with noisy fitness evaluations, and we have included this
  method in our evaluation.  This strategy explicitly estimates the
  gradient for the objective function. It does this by first
  evaluating \emph{n} randomly generated parameter vectors near the
  initial vector, each dimension of these vectors being perturbed by
  either $-\epsilon$, $0$, or $\epsilon$. Then, for each dimension, it
  groups vectors into three groups: $-\epsilon$, $0$, and $\epsilon$.
  The gradient along this dimension is then estimated as the average
  score for the $\epsilon$ group minus the average score for the
  $-\epsilon$ group. Finally, the method creates a new vector by
  changing all parameters by a fixed-size step in the direction of the
  gradient.

\item \emph{Nelder-Mead simplex method}: The Nelder-Mead simplex
  method \cite{nm} creates an initial simplex with $d+1$ vertices,
  where $d$ is the dimension of the parameter space. The initial
  parameter vector is stored as the first vertex and the other five
  vertices are created by adding to one dimension at a time one tenth
  of the allowable range for that parameter. It then tests the fitness
  of each vertex and based on these fitnesses, it reflects the worst
  point over the centroid in an attempt to improve it.  In general,
  the worst vertex is reflected; however, to prevent cycles and
  becoming stuck in local minima, several other rules are used.  If
  the reflected point is better than the second worst point and worse
  than the best point, then the reflected point replaces the worst. If
  the reflected point is better than the best point, the simplex is
  expanded in the direction of the reflected point. The better of the
  reflected and the expanded point replaces the worst point. If the
  reflected point is worse than the second worst point, then the
  simplex is contracted away from the reflected point. If the
  contracted point is better than the reflected point, the contracted
  point replaces the worst point. If the contracted point is worse
  than the reflected point, the entire simplex is shrunk \cite{nm}.

\item \emph{Linear regression}: To initialize, this method chooses and
  evaluates five random parameter vectors. It then fits a linear model
  from parameter vector to fitness. In a loop, the method chooses and
  evaluates a new parameter vector generated by taking a fixed-size
  step in the direction of the gradient for each parameter, and fits a
  new linear model to all vectors evaluated so far, choosing the model
  to minimize the sum of squared errors.

\item \emph{SVM regression}: Similarly to linear regression, this
  model starts with several random vectors, but this time they are
  chosen in a small neighborhood about some initial random vector.
  These vectors (generally 8) are evaluated, and a support vector
  regression model is fit to the observed fitnesses.  To choose the
  next vector for evaluation, we randomly generate some number
  (typically 100) of vectors in the neighborhood of the best observed
  gait, and select for evaluation the vector with the best predicted
  performance.  We suspected that if we always chose the best
  predicted point out of 100, we may end up progressing along a narrow
  subspace, prohibiting learning of the true local fitness function.
  Put another way, we would always choose exploitation of knowledge
  vs. exploration of the space.  To address this concern, we added a
  parameter dubbed \code{bumpBy} that added noise to the final
  selected point before it was evaluated.

  Such a method naturally has many tunable parameters, and we
  endeavored to select these parameters by tuning the method in
  simulation.  To estimate the performance of the algorithm, we ran it
  against a simulation with a known optimum.  The simulated function
  was in the same five dimensional parameter space, and simply
  returned a fitness determined as the height of a Gaussian with a
  random mean.  The width of the Gaussian in each dimension was 20\%
  of the range of each dimension, and the maximum value at the peak
  was 100.  \figref{svm_sim_results} shows the learning results on
  this simulated model using the ultimately selected SVM parameters.
  Interestingly, a non-zero value of \code{bumpBy} resulted in better
  learning than noise free (exploration free) learning.

  \edit{JMC: This next paragraph sounds like it belongs in Results?}
  Ultimately, however, the version of SVM tuned for simulation still
  did not show competitive performance on the real robot.  We tried
  tuning some parameters on the real robot, but after some amount of
  tuning, the method still exhibited too little exploration and easily
  became stuck in local minima.

\item \emph{Neuroevolution (Evolving Artificial Neural Networks with HyperNEAT)}: 
    
  HyperNEAT is an indirect encoding for evolving artificial neural networks (ANNs) that is inspired by the way natural organisms develop~\cite{stanley2009hypercube}. It evolves Compositional Pattern Producing Networks (CPPNs)~\cite{stanley2007compositional}, each of which is a genome that encodes an ANN phenotype~\cite{stanley2009hypercube}. Each CPPN is itself a directed graph, where each node is a mathematical function, such as sine or Gaussian. The nature of these functions can facilitate the evolution of properties such as symmetry (e.g., an absolute value or Gaussian function) and repetition (e.g., a sine function)~\cite{stanley2009hypercube, stanley2007compositional}. The signal on each link in the CPPN is multiplied by that link's weight, which can alter its effect.
  
A CPPN is queried once for each link in the ANN substrate to determine that link's weight. The inputs to the CPPN are the Cartesian coordinates of both the source (e.g., $x = 2$, $y = 4$) and target (e.g., $x = 3$, $y = 5$) nodes of a link and a constant bias value. The CPPN takes these five values as inputs and produces one or two output values, depending on the substrate topology. If there is no hidden layer in the substrate, the single output is the weight of the link between a source node on the input layer and a target node on the output layer. If there is a hidden layer, the first output value determines the weight of the link between the associated input (source) and hidden layer (target) nodes, and the second output value determines the weight of the link between the associated hidden (source) and output (target) layer nodes. All pairwise combinations of source and target nodes are iteratively passed as inputs to a CPPN to deter-mine the weight of each substrate link. 
HyperNEAT is capable of exploiting the geometry of a problem: because the link values between nodes in the final ANN substrate are a function of the geometric positions of those nodes, HyperNEAT can exploit such information when solving a problem~\cite{stanley2009hypercube, clune2009sensitivity, clune2011performance}. In the case of quadruped locomotion, this property helped Hyper-NEAT produce gaits with front-back, left-right, and four-way symmetries~\cite{clune2009evolving, clune2011performance}.
  
The evolution of the population of CPPNs occurs according to the principles of the NeuroEvolution of Augmenting Topologies (NEAT) algorithm~\cite{stanley2002evolving}, which was originally designed to evolve ANNs. NEAT can be fruitfully applied to CPPNs because of their structural similarity to ANNs. For example, mutations can add a node, and thus a function, to a CPPN graph, or change its link weights. The NEAT algorithm is unique in three main ways~\cite{stanley2002evolving}. Initially, it starts with small genomes that encode simple networks and slowly complexifies them via mutations that add nodes and links to the network, enabling the algorithm to evolve the network topology in addition to its weights. Secondly, NEAT has a fitness sharing mechanism that preserves diversity in the system and gives time for new innovations to be tuned by evolution before competing them against more adapted rivals. Finally, NEAT tracks historical information to perform intelligent crossover while avoiding the need for expensive topological analysis. A full explanation of NEAT can be found in~\cite{stanley2002evolving}]. 
  
\edit{JMC: fix next paragraph}  
Following Clune et al.~\cite{clune2011performance, clune2009evolving}, the ANNs for the robot consist of three 5x4 Cartesian grids of nodes forming input, hidden, and output layers. Adjacent layers were completely connected, meaning that there were (5x4)2 x 2 = 800 links in each substrate. The inputs to the substrate were the current angles of each of the 12 joints of the robot (described below), a touch sen-sor that provides a 1 if the lower leg is touching the ground and a 0 if it is not, the pitch, roll, and yaw of the torso, and a modified sine wave (to facilitate the production of periodic behaviors). The sine wave was the following function of time (t) in milli-seconds: sin(120 x t) x ¹. This function produces numbers from Ð¹?to ¹, which was the range of the unconstrained joints. During preliminary tests, we experimentally found the constant 120 to produce fast, natural gaits, and determined that the touch, pitch, roll, yaw, and sine inputs all contributed to the ability to evolve fast gaits [3].

The ANN substrate outputs were the desired angles for each joint, which were fed into proportional controllers that applied forces to move the joints toward the desired angles. Robots were evaluated in the ODE physics simulator (www.ode.org). The rectangular torso of the robot was (in ODE units) 0.15 wide, 0.3 long, and .05 tall. Each of four legs was composed of three cylinders (length 0.075, radius 0.02) and three hinge joints. The first cylinder functioned as a hip bone. It was parallel to the proximal-distal axis of the torso and barely stuck out from it. The other two cylinders were the upper and lower leg. There were two hip joints and one knee joint. The first hip joint allowed the legs to swing forward and backward (anterior-posterior) and was constrained to 180¡ such that, at maximum extension, it was parallel with the torso. The second hip joint allowed a leg to swing in and out (proximal-distal). Together, the two hip joints approximated a universal joint. The knee joint swung forward and backward. The second hip and knee joints were unconstrained.  

Each controller in a population of 150 was simulated for 3000 time steps (3 sec-onds). Experiments lasted 1000 generations with a switch point at generation 500. Trials were cut short if any part of the robot except its lower leg touched the ground, or if the number of direction changes in joints exceeded 960. The latter condition re-flects the fact that servo motors cannot be vibrated incessantly without breaking. The fitness of controllers was the following function of d, the maximum distance traveled:  . The exponential nature of the function magnified the selective advantage of small increases in the distance traveled. Because HyperNEAT greatly outperforms FT-NEAT on this problem [3], we compare HybrID to only HyperNEAT. 


  % Preliminary HyperNEAT runs were promising and resulted in several
  % interesting gaits.  

  % Unfortunately, the gaits generated by HyperNEAT
  % also tended to stress the robot more than typical gaits had before,
  % and the servos would often overheat and malfunction, requiring
  % restarts.  We think these issues may be addressed by adding a small
  % layer between the HyperNEAT strategy and the robot that disallows
  % quickly shifting commanded positions, and we hope to be able to test
  % these methods further once this filter is in place.

\end{itemize}

\acmFig{svm_sim_results}{1}{Results for the SVM regression strategy
  in simulation.  This simulation was used to tune the SVM strategy's
  parameters before trying it on the physical robot.  The strategy
  quickly approaches the maximum simulated fitness of 100.}