-
Notifications
You must be signed in to change notification settings - Fork 6
/
learningMethods.tex.orig
229 lines (176 loc) · 18.6 KB
/
learningMethods.tex.orig
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
\subsection{Parametrized Motion Models}
\seclabel{motionModel}
\edit{this might no longer flow...fix} There are a plethora of methods available for the creation of a vector motion generating function $g(t)$, and we've chosen two for comparison in this study.
The first method is to choose a family of \emph{parametrized functions}. By fixing the parameters to a particular set of values, we obtain a deterministic motion function over time. We tried several parametrizations on the robot and, obtaining reasonble early success, settled on one particular parametrization. We dubbed this the \emph{SineModel5}, as it has five parameters and employs a sine wave as its root pattern.
The five parameters in $\vec{\theta}$ are as follows:
- amp - Amplitude - allowable range
- tau - sine wave period - allowable range
- multIO - multiplier scaling inner joings from outer joints - allowable range
- multFB - multiplier scaling front joints from back joints - allowable range
- multLR - multiplier scaling left joints from right joints - allowable range
(verify this)
\newcommand{\amp}{\mathrm{amp}}
Inuitively, SineModel5 starts with 8 identicaly sine waves of amplitude $\amp$ and period $\tau$, multiplies the waves for all inner motors by multIO, multiplies the waves for all front motors by multFB, and multiplies the waves for all left motors by multLR. To obtain the actual motor position commands, these waves are offset by an appropriate fixed constant so that the base position (when the sine waves are at 0) is approximately a crouch (the position shown in \figref{crouchingRobot}). Finally, in this motion model, the ninth "hip" motor is kept at a neutral position. Thus, the commanded position for all motors, as a vector function of time, is:
\begin{verbatim}
[g_0(t) ] [ \amp * sin(\tau t) ]
[g_1(t) ] [ \amp * sin(\tau t) ]
[g_2(t) ] [ \amp * sin(\tau t) ]
g(t) [g_3(t) ] = [ \amp * sin(\tau t) ]
[g_4(t) ] [ \amp * sin(\tau t) ]
[g_5(t) ] [ \amp * sin(\tau t) ]
[g_6(t..) ] [ \amp * sin(\tau t) ]
\end{verbatim}
The nine subscripted functions correspond to the nine labeled motors in \figref{robotDiagram}
% used is a function parametrized by five numbers, shown in \figref{foo}
\subsection{Learning Methods for Parametrized Motion Models}
\seclabel{learningMethods}
Using the SineModel5 parametrized motion model described in the previous section, along with the allowable ranges for each of the five parameters (shown in \tabref{parameterTable}), the task becomes how to choose the combination of five parameters that results in the fastest motion, per the evaluation methods in \secref{fitnessEvaluation}.
If we choose a value for the five dimensional parameter $\vec{\theta}$, then a given physical trial gives us one measurement of the fitness of that parameter vector, or $f(\vec{\theta})$. Two things make learnign difficult. First, each evaluation of $f(\vec{\theta})$ is expensive, taking 15-20 seconds of time on average. Second, the fitness returned by such evaluations has proven to be very noisy, with the standard deviation of the noise often being roughly equivalent to the size of the measurement.
For both of these reasons, an intelligent learning algorithm is needed.
In this context, by learning algorithm we simply mean a method for choosing:
1) The value for $\vec{\theta}$ for the initial trial, and
2) The next value of $\vec{\theta}$ to try, given the $\vec{\theta}$ and $f(\vec{\theta})$ for all previous trials
We evaluated a total of eight subtly or not so subtly different learning methods. All employed a simple random sampling method for requirement (1); that is, all methods picked their initial $\vec{\theta}$ value via uniform random sampling within the allowed parameter ranges. \edit{make sure this is true}. Thus, the differences in the algorithms was how the selected new $\vec{\theta}$ values to try from their past experience.
The eight methods used, and their methods for choosing the next $\vec{\theta}$ are:
\edit{make figure to go here, show drawing with inner/outer, l/r, and f/b motors.}
\edit{JMC: Use photo of physical robot in this image, and add labels in photoshop?}
\edit{write this. Old quadratot Method section below alsdkfj }
%\section{Method}
%\seclabel{method}
% Describe in reasonable detail the algorithm you are using to address
% this problem. A pseudo-code description of the algorithm you are
% using is frequently useful. If it makes sense for your project,
% trace through a concrete example, showing how your algorithm
% processes this example. The example should be complex enough to
% illustrate all of the important aspects of the problem but simple
% enough to be easily understood. If possible, an intuitively
% meaningful example is better than one with meaningless symbols.
%We use several parametrized motion models that command motors to
%positions based on a sine wave, creating a periodic pattern. While we
%investigated several models, for the bulk of our experiments, we used
%a model whose five parameters are: amplitude, wavelength, scale inner
%vs outer motors, scale left vs right motors, scale back vs front
%motors. Each strategy below attempts to choose the best possible
%parameters for this motion model.
\edit{did you want the next paragraph commented out}
We implemented and tested 8 different learning strategies. All
strategies except for the HyperNEAT method\cite{clune2009evolving} were
constrained to pick parameters from within predetermined ranges.
\edit{JMC:Bullets=narrower column=wasted space. subsubsection instead?}
\begin{itemize}
\item \emph{Random}: This method randomly generates parameter vectors
in the allowable range for every trial. This strategy was used only as baseline for comparison.
\item \emph{Uniform random hill climbing}: This method begins by
selecting a single random parameter vector. Subsequent iterations
generate a neighbor by randomly choosing one parameter to adjust and
replacing it with a new value chosen with uniform probability in the
allowable range for that parameter. The neighbor is evaluated by
running the robot with the newly chosen parameters. If this neighbor
results in a longer distance walked than the previous best gait, it
is saved as the new best gait. The process is then repeated, always
starting with the best gait.
\item \emph{Gaussian random hill climbing}: This method works
similarly to Uniform random hill climbing, except neighbors are
generated by adding random Gaussian noise to the current best gait.
This results in all parameters being changed at once, but the
resulting vector is always fairly close to the previous best gait.
We used independently selected noise in each dimension, scaled such
that the standard deviation of the noise was 5\% of the range of
that dimension.
\item \emph{N-dimensional policy gradient descent}: In \cite{kohl},
Kohl and Stone document a method for local gradient ascent for gait
learning with noisy fitness evaluations, and we have included this
method in our evaluation. This strategy explicitly estimates the
gradient for the objective function. It does this by first
evaluating \emph{n} randomly generated parameter vectors near the
initial vector, each dimension of these vectors being perturbed by
either $-\epsilon$, $0$, or $\epsilon$. Then, for each dimension, it
groups vectors into three groups: $-\epsilon$, $0$, and $\epsilon$.
The gradient along this dimension is then estimated as the average
score for the $\epsilon$ group minus the average score for the
$-\epsilon$ group. Finally, the method creates a new vector by
changing all parameters by a fixed-size step in the direction of the
gradient.
\item \emph{Nelder-Mead simplex method}: The Nelder-Mead simplex
method \cite{nm} creates an initial simplex with $d+1$ vertices,
where $d$ is the dimension of the parameter space. The initial
parameter vector is stored as the first vertex and the other five
vertices are created by adding to one dimension at a time one tenth
of the allowable range for that parameter. It then tests the fitness
of each vertex and based on these fitnesses, it reflects the worst
point over the centroid in an attempt to improve it. In general,
the worst vertex is reflected; however, to prevent cycles and
becoming stuck in local minima, several other rules are used. If
the reflected point is better than the second worst point and worse
than the best point, then the reflected point replaces the worst. If
the reflected point is better than the best point, the simplex is
expanded in the direction of the reflected point. The better of the
reflected and the expanded point replaces the worst point. If the
reflected point is worse than the second worst point, then the
simplex is contracted away from the reflected point. If the
contracted point is better than the reflected point, the contracted
point replaces the worst point. If the contracted point is worse
than the reflected point, the entire simplex is shrunk \cite{nm}.
\item \emph{Linear regression}: To initialize, this method chooses and
evaluates five random parameter vectors. It then fits a linear model
from parameter vector to fitness. In a loop, the method chooses and
evaluates a new parameter vector generated by taking a fixed-size
step in the direction of the gradient for each parameter, and fits a
new linear model to all vectors evaluated so far, choosing the model
to minimize the sum of squared errors.
\item \emph{SVM regression}: Similarly to linear regression, this
model starts with several random vectors, but this time they are
chosen in a small neighborhood about some initial random vector.
These vectors (generally 8) are evaluated, and a support vector
regression model is fit to the observed fitnesses. To choose the
next vector for evaluation, we randomly generate some number
(typically 100) of vectors in the neighborhood of the best observed
gait, and select for evaluation the vector with the best predicted
performance. We suspected that if we always chose the best
predicted point out of 100, we may end up progressing along a narrow
subspace, prohibiting learning of the true local fitness function.
Put another way, we would always choose exploitation of knowledge
vs. exploration of the space. To address this concern, we added a
parameter dubbed \code{bumpBy} that added noise to the final
selected point before it was evaluated.
Such a method naturally has many tunable parameters, and we
endeavored to select these parameters by tuning the method in
simulation. To estimate the performance of the algorithm, we ran it
against a simulation with a known optimum. The simulated function
was in the same five dimensional parameter space, and simply
returned a fitness determined as the height of a Gaussian with a
random mean. The width of the Gaussian in each dimension was 20\%
of the range of each dimension, and the maximum value at the peak
was 100. \figref{svm_sim_results} shows the learning results on
this simulated model using the ultimately selected SVM parameters.
Interestingly, a non-zero value of \code{bumpBy} resulted in better
learning than noise free (exploration free) learning.
\edit{JMC: This next paragraph sounds like it belongs in Results?}
Ultimately, however, the version of SVM tuned for simulation still
did not show competitive performance on the real robot. We tried
tuning some parameters on the real robot, but after some amount of
tuning, the method still exhibited too little exploration and easily
became stuck in local minima.
\item \emph{Neuroevolution (Evolving Artificial Neural Networks with HyperNEAT)}:
HyperNEAT is an indirect encoding for evolving artificial neural networks (ANNs) that is inspired by the way natural organisms develop~\cite{stanley2009hypercube}. It evolves Compositional Pattern Producing Networks (CPPNs)~\cite{stanley2007compositional}, each of which is a genome that encodes an ANN phenotype~\cite{stanley2009hypercube}. Each CPPN is itself a directed graph, where each node is a mathematical function, such as sine or Gaussian. The nature of these functions can facilitate the evolution of properties such as symmetry (e.g., an absolute value or Gaussian function) and repetition (e.g., a sine function)~\cite{stanley2009hypercube, stanley2007compositional}. The signal on each link in the CPPN is multiplied by that link's weight, which can alter its effect.
A CPPN is queried once for each link in the ANN substrate to determine that link's weight. The inputs to the CPPN are the Cartesian coordinates of both the source (e.g., $x = 2$, $y = 4$) and target (e.g., $x = 3$, $y = 5$) nodes of a link and a constant bias value. The CPPN takes these five values as inputs and produces one or two output values, depending on the substrate topology. If there is no hidden layer in the substrate, the single output is the weight of the link between a source node on the input layer and a target node on the output layer. If there is a hidden layer, the first output value determines the weight of the link between the associated input (source) and hidden layer (target) nodes, and the second output value determines the weight of the link between the associated hidden (source) and output (target) layer nodes. All pairwise combinations of source and target nodes are iteratively passed as inputs to a CPPN to deter-mine the weight of each substrate link.
HyperNEAT is capable of exploiting the geometry of a problem: because the link values between nodes in the final ANN substrate are a function of the geometric positions of those nodes, HyperNEAT can exploit such information when solving a problem~\cite{stanley2009hypercube, clune2009sensitivity, clune2011performance}. In the case of quadruped locomotion, this property helped Hyper-NEAT produce gaits with front-back, left-right, and four-way symmetries~\cite{clune2009evolving, clune2011performance}.
The evolution of the population of CPPNs occurs according to the principles of the NeuroEvolution of Augmenting Topologies (NEAT) algorithm~\cite{stanley2002evolving}, which was originally designed to evolve ANNs. NEAT can be fruitfully applied to CPPNs because of their structural similarity to ANNs. For example, mutations can add a node, and thus a function, to a CPPN graph, or change its link weights. The NEAT algorithm is unique in three main ways~\cite{stanley2002evolving}. Initially, it starts with small genomes that encode simple networks and slowly complexifies them via mutations that add nodes and links to the network, enabling the algorithm to evolve the network topology in addition to its weights. Secondly, NEAT has a fitness sharing mechanism that preserves diversity in the system and gives time for new innovations to be tuned by evolution before competing them against more adapted rivals. Finally, NEAT tracks historical information to perform intelligent crossover while avoiding the need for expensive topological analysis. A full explanation of NEAT can be found in~\cite{stanley2002evolving}].
\edit{JMC: fix next paragraph}
Following Clune et al.~\cite{clune2011performance, clune2009evolving}, the ANNs for the robot consist of three 5x4 Cartesian grids of nodes forming input, hidden, and output layers. Adjacent layers were completely connected, meaning that there were (5x4)2 x 2 = 800 links in each substrate. The inputs to the substrate were the current angles of each of the 12 joints of the robot (described below), a touch sen-sor that provides a 1 if the lower leg is touching the ground and a 0 if it is not, the pitch, roll, and yaw of the torso, and a modified sine wave (to facilitate the production of periodic behaviors). The sine wave was the following function of time (t) in milli-seconds: sin(120 x t) x ¹. This function produces numbers from й?to ¹, which was the range of the unconstrained joints. During preliminary tests, we experimentally found the constant 120 to produce fast, natural gaits, and determined that the touch, pitch, roll, yaw, and sine inputs all contributed to the ability to evolve fast gaits [3].
The ANN substrate outputs were the desired angles for each joint, which were fed into proportional controllers that applied forces to move the joints toward the desired angles. Robots were evaluated in the ODE physics simulator (www.ode.org). The rectangular torso of the robot was (in ODE units) 0.15 wide, 0.3 long, and .05 tall. Each of four legs was composed of three cylinders (length 0.075, radius 0.02) and three hinge joints. The first cylinder functioned as a hip bone. It was parallel to the proximal-distal axis of the torso and barely stuck out from it. The other two cylinders were the upper and lower leg. There were two hip joints and one knee joint. The first hip joint allowed the legs to swing forward and backward (anterior-posterior) and was constrained to 180¡ such that, at maximum extension, it was parallel with the torso. The second hip joint allowed a leg to swing in and out (proximal-distal). Together, the two hip joints approximated a universal joint. The knee joint swung forward and backward. The second hip and knee joints were unconstrained.
Each controller in a population of 150 was simulated for 3000 time steps (3 sec-onds). Experiments lasted 1000 generations with a switch point at generation 500. Trials were cut short if any part of the robot except its lower leg touched the ground, or if the number of direction changes in joints exceeded 960. The latter condition re-flects the fact that servo motors cannot be vibrated incessantly without breaking. The fitness of controllers was the following function of d, the maximum distance traveled: . The exponential nature of the function magnified the selective advantage of small increases in the distance traveled. Because HyperNEAT greatly outperforms FT-NEAT on this problem [3], we compare HybrID to only HyperNEAT.
% Preliminary HyperNEAT runs were promising and resulted in several
% interesting gaits.
% Unfortunately, the gaits generated by HyperNEAT
% also tended to stress the robot more than typical gaits had before,
% and the servos would often overheat and malfunction, requiring
% restarts. We think these issues may be addressed by adding a small
% layer between the HyperNEAT strategy and the robot that disallows
% quickly shifting commanded positions, and we hope to be able to test
% these methods further once this filter is in place.
\end{itemize}
\acmFig{svm_sim_results}{1}{Results for the SVM regression strategy
in simulation. This simulation was used to tune the SVM strategy's
parameters before trying it on the physical robot. The strategy
quickly approaches the maximum simulated fitness of 100.}