\subsection{Parametrized Motion Models} \seclabel{motionModel} \edit{this might no longer flow...fix} There are a plethora of methods available for the creation of a vector motion generating function $g(t)$, and we've chosen two for comparison in this study. The first method is to choose a family of \emph{parametrized functions}. By fixing the parameters to a particular set of values, we obtain a deterministic motion function over time. We tried several parametrizations on the robot and, obtaining reasonble early success, settled on one particular parametrization. We dubbed this the \emph{SineModel5}, as it has five parameters and employs a sine wave as its root pattern. The five parameters in $\vec{\theta}$ are as follows: - amp - Amplitude - allowable range - tau - sine wave period - allowable range - multIO - multiplier scaling inner joings from outer joints - allowable range - multFB - multiplier scaling front joints from back joints - allowable range - multLR - multiplier scaling left joints from right joints - allowable range (verify this) \newcommand{\amp}{\mathrm{amp}} Inuitively, SineModel5 starts with 8 identicaly sine waves of amplitude $\amp$ and period $\tau$, multiplies the waves for all inner motors by multIO, multiplies the waves for all front motors by multFB, and multiplies the waves for all left motors by multLR. To obtain the actual motor position commands, these waves are offset by an appropriate fixed constant so that the base position (when the sine waves are at 0) is approximately a crouch (the position shown in \figref{crouchingRobot}). Finally, in this motion model, the ninth "hip" motor is kept at a neutral position. Thus, the commanded position for all motors, as a vector function of time, is: \begin{verbatim} [g_0(t) ] [ \amp * sin(\tau t) ] [g_1(t) ] [ \amp * sin(\tau t) ] [g_2(t) ] [ \amp * sin(\tau t) ] g(t) [g_3(t) ] = [ \amp * sin(\tau t) ] [g_4(t) ] [ \amp * sin(\tau t) ] [g_5(t) ] [ \amp * sin(\tau t) ] [g_6(t..) ] [ \amp * sin(\tau t) ] \end{verbatim} The nine subscripted functions correspond to the nine labeled motors in \figref{robotDiagram} % used is a function parametrized by five numbers, shown in \figref{foo} \subsection{Learning Methods for Parametrized Motion Models} \seclabel{learningMethods} Using the SineModel5 parametrized motion model described in the previous section, along with the allowable ranges for each of the five parameters (shown in \tabref{parameterTable}), the task becomes how to choose the combination of five parameters that results in the fastest motion, per the evaluation methods in \secref{fitnessEvaluation}. If we choose a value for the five dimensional parameter $\vec{\theta}$, then a given physical trial gives us one measurement of the fitness of that parameter vector, or $f(\vec{\theta})$. Two things make learnign difficult. First, each evaluation of $f(\vec{\theta})$ is expensive, taking 15-20 seconds of time on average. Second, the fitness returned by such evaluations has proven to be very noisy, with the standard deviation of the noise often being roughly equivalent to the size of the measurement. For both of these reasons, an intelligent learning algorithm is needed. In this context, by learning algorithm we simply mean a method for choosing: 1) The value for $\vec{\theta}$ for the initial trial, and 2) The next value of $\vec{\theta}$ to try, given the $\vec{\theta}$ and $f(\vec{\theta})$ for all previous trials We evaluated a total of eight subtly or not so subtly different learning methods. All employed a simple random sampling method for requirement (1); that is, all methods picked their initial $\vec{\theta}$ value via uniform random sampling within the allowed parameter ranges. \edit{make sure this is true}. Thus, the differences in the algorithms was how the selected new $\vec{\theta}$ values to try from their past experience. The eight methods used, and their methods for choosing the next $\vec{\theta}$ are: \edit{make figure to go here, show drawing with inner/outer, l/r, and f/b motors.} \edit{JMC: Use photo of physical robot in this image, and add labels in photoshop?} \edit{write this. Old quadratot Method section below alsdkfj } %\section{Method} %\seclabel{method} % Describe in reasonable detail the algorithm you are using to address % this problem. A pseudo-code description of the algorithm you are % using is frequently useful. If it makes sense for your project, % trace through a concrete example, showing how your algorithm % processes this example. The example should be complex enough to % illustrate all of the important aspects of the problem but simple % enough to be easily understood. If possible, an intuitively % meaningful example is better than one with meaningless symbols. %We use several parametrized motion models that command motors to %positions based on a sine wave, creating a periodic pattern. While we %investigated several models, for the bulk of our experiments, we used %a model whose five parameters are: amplitude, wavelength, scale inner %vs outer motors, scale left vs right motors, scale back vs front %motors. Each strategy below attempts to choose the best possible %parameters for this motion model. \edit{did you want the next paragraph commented out} We implemented and tested 8 different learning strategies. All strategies except for the HyperNEAT method\cite{clune2009evolving} were constrained to pick parameters from within predetermined ranges. \edit{JMC:Bullets=narrower column=wasted space. subsubsection instead?} \begin{itemize} \item \emph{Random}: This method randomly generates parameter vectors in the allowable range for every trial. This strategy was used only as baseline for comparison. \item \emph{Uniform random hill climbing}: This method begins by selecting a single random parameter vector. Subsequent iterations generate a neighbor by randomly choosing one parameter to adjust and replacing it with a new value chosen with uniform probability in the allowable range for that parameter. The neighbor is evaluated by running the robot with the newly chosen parameters. If this neighbor results in a longer distance walked than the previous best gait, it is saved as the new best gait. The process is then repeated, always starting with the best gait. \item \emph{Gaussian random hill climbing}: This method works similarly to Uniform random hill climbing, except neighbors are generated by adding random Gaussian noise to the current best gait. This results in all parameters being changed at once, but the resulting vector is always fairly close to the previous best gait. We used independently selected noise in each dimension, scaled such that the standard deviation of the noise was 5\% of the range of that dimension. \item \emph{N-dimensional policy gradient descent}: In \cite{kohl}, Kohl and Stone document a method for local gradient ascent for gait learning with noisy fitness evaluations, and we have included this method in our evaluation. This strategy explicitly estimates the gradient for the objective function. It does this by first evaluating \emph{n} randomly generated parameter vectors near the initial vector, each dimension of these vectors being perturbed by either $-\epsilon$, $0$, or $\epsilon$. Then, for each dimension, it groups vectors into three groups: $-\epsilon$, $0$, and $\epsilon$. The gradient along this dimension is then estimated as the average score for the $\epsilon$ group minus the average score for the $-\epsilon$ group. Finally, the method creates a new vector by changing all parameters by a fixed-size step in the direction of the gradient. \item \emph{Nelder-Mead simplex method}: The Nelder-Mead simplex method \cite{nm} creates an initial simplex with $d+1$ vertices, where $d$ is the dimension of the parameter space. The initial parameter vector is stored as the first vertex and the other five vertices are created by adding to one dimension at a time one tenth of the allowable range for that parameter. It then tests the fitness of each vertex and based on these fitnesses, it reflects the worst point over the centroid in an attempt to improve it. In general, the worst vertex is reflected; however, to prevent cycles and becoming stuck in local minima, several other rules are used. If the reflected point is better than the second worst point and worse than the best point, then the reflected point replaces the worst. If the reflected point is better than the best point, the simplex is expanded in the direction of the reflected point. The better of the reflected and the expanded point replaces the worst point. If the reflected point is worse than the second worst point, then the simplex is contracted away from the reflected point. If the contracted point is better than the reflected point, the contracted point replaces the worst point. If the contracted point is worse than the reflected point, the entire simplex is shrunk \cite{nm}. \item \emph{Linear regression}: To initialize, this method chooses and evaluates five random parameter vectors. It then fits a linear model from parameter vector to fitness. In a loop, the method chooses and evaluates a new parameter vector generated by taking a fixed-size step in the direction of the gradient for each parameter, and fits a new linear model to all vectors evaluated so far, choosing the model to minimize the sum of squared errors. \item \emph{SVM regression}: Similarly to linear regression, this model starts with several random vectors, but this time they are chosen in a small neighborhood about some initial random vector. These vectors (generally 8) are evaluated, and a support vector regression model is fit to the observed fitnesses. To choose the next vector for evaluation, we randomly generate some number (typically 100) of vectors in the neighborhood of the best observed gait, and select for evaluation the vector with the best predicted performance. We suspected that if we always chose the best predicted point out of 100, we may end up progressing along a narrow subspace, prohibiting learning of the true local fitness function. Put another way, we would always choose exploitation of knowledge vs. exploration of the space. To address this concern, we added a parameter dubbed \code{bumpBy} that added noise to the final selected point before it was evaluated. Such a method naturally has many tunable parameters, and we endeavored to select these parameters by tuning the method in simulation. To estimate the performance of the algorithm, we ran it against a simulation with a known optimum. The simulated function was in the same five dimensional parameter space, and simply returned a fitness determined as the height of a Gaussian with a random mean. The width of the Gaussian in each dimension was 20\% of the range of each dimension, and the maximum value at the peak was 100. \figref{svm_sim_results} shows the learning results on this simulated model using the ultimately selected SVM parameters. Interestingly, a non-zero value of \code{bumpBy} resulted in better learning than noise free (exploration free) learning. \edit{JMC: This next paragraph sounds like it belongs in Results?} Ultimately, however, the version of SVM tuned for simulation still did not show competitive performance on the real robot. We tried tuning some parameters on the real robot, but after some amount of tuning, the method still exhibited too little exploration and easily became stuck in local minima. \item \emph{Neuroevolution (Evolving Artificial Neural Networks with HyperNEAT)}: HyperNEAT is an indirect encoding for evolving artificial neural networks (ANNs) that is inspired by the way natural organisms develop~\cite{stanley2009hypercube}. It evolves Compositional Pattern Producing Networks (CPPNs)~\cite{stanley2007compositional}, each of which is a genome that encodes an ANN phenotype~\cite{stanley2009hypercube}. Each CPPN is itself a directed graph, where each node is a mathematical function, such as sine or Gaussian. The nature of these functions can facilitate the evolution of properties such as symmetry (e.g., an absolute value or Gaussian function) and repetition (e.g., a sine function)~\cite{stanley2009hypercube, stanley2007compositional}. The signal on each link in the CPPN is multiplied by that link's weight, which can alter its effect. A CPPN is queried once for each link in the ANN substrate to determine that link's weight. The inputs to the CPPN are the Cartesian coordinates of both the source (e.g., $x = 2$, $y = 4$) and target (e.g., $x = 3$, $y = 5$) nodes of a link and a constant bias value. The CPPN takes these five values as inputs and produces one or two output values, depending on the substrate topology. If there is no hidden layer in the substrate, the single output is the weight of the link between a source node on the input layer and a target node on the output layer. If there is a hidden layer, the first output value determines the weight of the link between the associated input (source) and hidden layer (target) nodes, and the second output value determines the weight of the link between the associated hidden (source) and output (target) layer nodes. All pairwise combinations of source and target nodes are iteratively passed as inputs to a CPPN to deter-mine the weight of each substrate link. HyperNEAT is capable of exploiting the geometry of a problem: because the link values between nodes in the final ANN substrate are a function of the geometric positions of those nodes, HyperNEAT can exploit such information when solving a problem~\cite{stanley2009hypercube, clune2009sensitivity, clune2011performance}. In the case of quadruped locomotion, this property helped Hyper-NEAT produce gaits with front-back, left-right, and four-way symmetries~\cite{clune2009evolving, clune2011performance}. The evolution of the population of CPPNs occurs according to the principles of the NeuroEvolution of Augmenting Topologies (NEAT) algorithm~\cite{stanley2002evolving}, which was originally designed to evolve ANNs. NEAT can be fruitfully applied to CPPNs because of their structural similarity to ANNs. For example, mutations can add a node, and thus a function, to a CPPN graph, or change its link weights. The NEAT algorithm is unique in three main ways~\cite{stanley2002evolving}. Initially, it starts with small genomes that encode simple networks and slowly complexifies them via mutations that add nodes and links to the network, enabling the algorithm to evolve the network topology in addition to its weights. Secondly, NEAT has a fitness sharing mechanism that preserves diversity in the system and gives time for new innovations to be tuned by evolution before competing them against more adapted rivals. Finally, NEAT tracks historical information to perform intelligent crossover while avoiding the need for expensive topological analysis. A full explanation of NEAT can be found in~\cite{stanley2002evolving}]. \edit{JMC: fix next paragraph} Following Clune et al.~\cite{clune2011performance, clune2009evolving}, the ANNs for the robot consist of three 5x4 Cartesian grids of nodes forming input, hidden, and output layers. Adjacent layers were completely connected, meaning that there were (5x4)2 x 2 = 800 links in each substrate. The inputs to the substrate were the current angles of each of the 12 joints of the robot (described below), a touch sen-sor that provides a 1 if the lower leg is touching the ground and a 0 if it is not, the pitch, roll, and yaw of the torso, and a modified sine wave (to facilitate the production of periodic behaviors). The sine wave was the following function of time (t) in milli-seconds: sin(120 x t) x ¹. This function produces numbers from Ð¹?to ¹, which was the range of the unconstrained joints. During preliminary tests, we experimentally found the constant 120 to produce fast, natural gaits, and determined that the touch, pitch, roll, yaw, and sine inputs all contributed to the ability to evolve fast gaits [3]. The ANN substrate outputs were the desired angles for each joint, which were fed into proportional controllers that applied forces to move the joints toward the desired angles. Robots were evaluated in the ODE physics simulator (www.ode.org). The rectangular torso of the robot was (in ODE units) 0.15 wide, 0.3 long, and .05 tall. Each of four legs was composed of three cylinders (length 0.075, radius 0.02) and three hinge joints. The first cylinder functioned as a hip bone. It was parallel to the proximal-distal axis of the torso and barely stuck out from it. The other two cylinders were the upper and lower leg. There were two hip joints and one knee joint. The first hip joint allowed the legs to swing forward and backward (anterior-posterior) and was constrained to 180¡ such that, at maximum extension, it was parallel with the torso. The second hip joint allowed a leg to swing in and out (proximal-distal). Together, the two hip joints approximated a universal joint. The knee joint swung forward and backward. The second hip and knee joints were unconstrained. Each controller in a population of 150 was simulated for 3000 time steps (3 sec-onds). Experiments lasted 1000 generations with a switch point at generation 500. Trials were cut short if any part of the robot except its lower leg touched the ground, or if the number of direction changes in joints exceeded 960. The latter condition re-flects the fact that servo motors cannot be vibrated incessantly without breaking. The fitness of controllers was the following function of d, the maximum distance traveled: . The exponential nature of the function magnified the selective advantage of small increases in the distance traveled. Because HyperNEAT greatly outperforms FT-NEAT on this problem [3], we compare HybrID to only HyperNEAT. % Preliminary HyperNEAT runs were promising and resulted in several % interesting gaits. % Unfortunately, the gaits generated by HyperNEAT % also tended to stress the robot more than typical gaits had before, % and the servos would often overheat and malfunction, requiring % restarts. We think these issues may be addressed by adding a small % layer between the HyperNEAT strategy and the robot that disallows % quickly shifting commanded positions, and we hope to be able to test % these methods further once this filter is in place. \end{itemize} \acmFig{svm_sim_results}{1}{Results for the SVM regression strategy in simulation. This simulation was used to tune the SVM strategy's parameters before trying it on the physical robot. The strategy quickly approaches the maximum simulated fitness of 100.}