\section{Methods}

\subsection{Policy Representation by Splines}

The simplest model with backward compatibility is the geometric
spline. For a given model $f(x)$ with $K$ knots, we can preserve the
exact shape of the generated curve while adding extra knots to the
original spline. For example, if we put one additional knot between
every two consecutive knots of the original spline, we end up with
$2K - 1$ knots and a spline that has the same shape as the original
one. To do this, we need an algorithm for evolving the
parameterization from $K$ to $L$ knots ($L > K$), which is formulated
as Algorithm 1 in \cite{kormushev2011bipedal-walking-energy}. Without
loss of generality, the policy parameters are normalized into
$[0, 1]$, and scaled and shifted as necessary when they are used.

\subsection{Parameterized Gaits by RL PoWER}

We used an RL approach that changes the complexity of the policy
representation dynamically while the trial is running. The study by
\cite{kormushev2011bipedal-walking-energy} on reducing energy
consumption for bipedal robots used a mechanism that can evolve the
policy parameterization. The method starts from a very simple
parameterization and gradually increases its representational
capability. It was shown to be capable of generating an adaptive
policy parameterization that can accommodate increasingly complex
policies. As reported by \cite{kormushev2011bipedal-walking-energy},
the policy generated by this approach reaches the global optimum
quickly when applied to the energy reduction problem.
Another property of this method is that it is less likely to converge
to a suboptimal solution, because this effect is less pronounced in a
lower-dimensional representation.

\figp{powerSplinesExample}{.6}{An example of an evolving policy
parameterization based on a spline representation of the policy. The
set of spline knots is the policy parameterization, and the knot
values are the actual policy parameter values. The parameterization
starts from 4 knots and grows to 32 knots.}

\cite{kober2009learning-motor-primitives} proposed an RL algorithm
called Policy learning by Weighting Exploration with the Returns
(PoWER), which is based on the Expectation-Maximization (EM)
algorithm. The technique for evolving the policy parameterization is
combined with this EM-based algorithm. We chose PoWER because it has
relatively few parameters that need tuning. We evolved the policy
parameterization only from those past trials ranked highest by the
importance sampling technique used by the PoWER algorithm. The
intuition is that highly ranked parameterizations have more potential
to evolve into even better ones in the future. In addition, evolving
all the parameterizations would enlarge the search space. Since our
experiment is done on a physical robot, exploring all the variations
of every parameterization is not practical. Future work may
incorporate simulations into the study, as illustrated in
\cite{bongard2006resilient-machines-through}.

For the experiment, we set 3 knots for each servo, and there are 8
servos in total. The servo in the hip is not used in our experiment.
Previous work has verified that quadruped gaits perform better when
they are coordinated
\cite{clune2009evolving-coordinated-quadruped,clune2011on-the-performance-of-indirect-encoding,valsalam2008modular-neuroevolution-for-multilegged}.
For each spline, we calculate its corresponding parameterized gait for
one unit time cycle.
We then apply the same pattern to every cycle throughout the 12
seconds of one trial. Specifically, each spline (a set of 3 knots) is
mapped to its corresponding servo positions as follows:

\begin{table}[b]
\begin{center}
\begin{tabular}{|c|c|c|}
\hline
Parameters & & \\
in $\vec{\theta}$ & Description & Range \\
\hline
\hline
$f(s_1, s_2, s_3)$ & Spline function & [0,1] \\ %subject to change
\hline
$R$ & Position multiplier & [256, 768] \\
\hline
\end{tabular}
\caption{The \emph{RL PoWER} motion model parameters.}
\tablabel{parameters}
\label{tab:params}
\end{center}
\end{table}

$\vec{g}(t) =
\left[ {
\begin{array}{c@{ }c@{ }c@{ }l@{ }l}
R \cdot f(s_1, s_2, s_3) & \ \ & \ & \ & + C \\ % 10
 & \ & \ & \ & + C_C \\ % 8
\end{array} } \right]
$
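
As a sketch of how this pattern repeats over a trial, assume a cycle
length $T$ and a normalized phase $\tau \in [0, 1)$; neither symbol
appears in the model above, and both are introduced here only for
illustration. Writing $f(\tau)$ for the value at phase $\tau$ of the
spline defined by a servo's three knots, the commanded position $g(t)$
of that servo, one component of $\vec{g}(t)$, could be written as
\[
g(t) = R \cdot f(\tau) + C, \qquad
\tau = \frac{t \bmod T}{T}, \qquad
0 \le t \le 12\,\mathrm{s}.
\]
With 3 knots for each of the 8 servos, this reading gives
$3 \times 8 = 24$ spline knot values in total.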