# Cartpole Problem

The state and control vectors $\textbf{x}$ and $\textbf{u}$ are defined as follows:

$$
\begin{equation*}
\textbf{x} = \begin{bmatrix}
    x & \theta & \dot{x} & \dot{\theta}
    \end{bmatrix}^T
\end{equation*}
$$

$$
\begin{equation*}
\textbf{u} = \begin{bmatrix}
    F_{x}
    \end{bmatrix}^T
\end{equation*}
$$

The goal is to swing the pendulum upright:

$$
\begin{equation*}
\textbf{x}_{goal} = \begin{bmatrix}
    0 & 0 & 0 & 0
    \end{bmatrix}^T
\end{equation*}
$$

**Note**: The force is constrained to lie between $-1$ and $1$. To enforce this,
we optimize over unconstrained actions and pass them through a squashing
function $\tanh(\textbf{u})$. The squashing is embedded directly in the dynamics
model so that it can be auto-differentiated. As a result, we must also apply this
transformation manually to the controls returned by iLQR at the end.
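
As a minimal sketch of the squashing described in the note, the mapping from unconstrained to bounded actions could look like the following. The helper name `squash` and its `low`/`high` arguments are illustrative assumptions, not part of the `ilqr` API.

```python
import numpy as np

def squash(u, low=-1.0, high=1.0):
    """Map unconstrained actions into [low, high] with tanh (hypothetical helper)."""
    half_range = (high - low) / 2.0
    midpoint = (high + low) / 2.0
    return half_range * np.tanh(u) + midpoint

# Unconstrained actions, e.g. as an optimizer might return them.
us_unconstrained = np.array([[-5.0], [0.0], [0.3], [5.0]])

# With the default bounds this reduces to tanh(u), so every entry lies in [-1, 1].
us_constrained = squash(us_unconstrained)
```

With the default bounds of $-1$ and $1$ the helper is exactly $\tanh(\textbf{u})$; the general form is shown only to make the role of the bounds explicit.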

In [1]:
%matplotlib inline


In [2]:
from __future__ import print_function

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import time

In [4]:
from ilqr.cost import QRCost, FiniteDiffCost
from ilqr.mujoco_dynamics import MujocoDynamics
from ilqr.mujoco_controller import iLQR, RecedingHorizonController
from ilqr.examples.cartpole import CartpoleDynamics
from ilqr.dynamics import constrain

from scipy.optimize import approx_fprime

import mujoco_py
from mujoco_py import MjViewer
import os

Choosing the latest nvidia driver: /usr/lib/nvidia-418, among ['/usr/lib/nvidia-375', '/usr/lib/nvidia-418']


In [5]:
def on_iteration(iteration_count, xs, us, J_opt, accepted, converged):
    # Called once per iLQR iteration. J_hist is a global list that must be
    # defined before the controller runs.
    J_hist.append(J_opt)
    info = "converged" if converged else ("accepted" if accepted else "failed")
    final_state = xs[-1]
    print("iteration", iteration_count, info, J_opt, final_state)

In [6]:
xml_path = os.path.join('..', 'ilqr', 'xmls', 'inverted_pendulum.xml')
dynamics = MujocoDynamics(xml_path, frame_skip = 2, use_multiprocessing = True)
print(dynamics.dt)


Finished loading process 3653
Finished loading process 3655
Finished loading process 3656
Finished loading process 3654
Finished loading process 3669
Finished loading process 3665
Finished loading process 3666
Finished loading process 3697
0.04
Finished loading process 3710
Finished loading process 3723
Finished loading process 3718
Finished loading process 3732


In [7]:
x_goal = np.array([0.0, 0.0, 0.0, 0.0])

# Instantaneous state cost.
Q = np.eye(4)
Q[0, 0] = 2.0
Q[1, 1] = 10.0


# Terminal state cost.
Q_terminal = 10 * Q

# Instantaneous control cost.
R = np.eye(1)

cost1 = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)

In [8]:
cost2 = FiniteDiffCost(lambda x, u, i: 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2,
                      lambda x, i: (2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2),
                      4, 1, use_multiprocessing = True)

Finished loading process 3768
Finished loading process 3777
Finished loading process 3772
Finished loading process 3782
Finished loading process 3791
Finished loading process 3817
Finished loading process 3822
Finished loading process 3813
Finished loading process 3825
Finished loading process 3846
Finished loading process 3855
Finished loading process 3867
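
The running-cost lambda passed to `FiniteDiffCost` spells out the same quadratic form that `Q` and `R` encode. The sketch below checks that numerically, assuming the standard form $(\textbf{x} - \textbf{x}_{goal})^T Q (\textbf{x} - \textbf{x}_{goal}) + \textbf{u}^T R \textbf{u}$, and compares a forward-difference gradient from `scipy.optimize.approx_fprime` with the analytic one. The function names `quadratic_cost` and `lambda_cost` and the test point are illustrative assumptions. Note that the terminal lambda in `cost2` reuses the running-cost weights, whereas `cost1` scales them by 10 through `Q_terminal`.

```python
import numpy as np
from scipy.optimize import approx_fprime

x_goal = np.zeros(4)
Q = np.diag([2.0, 10.0, 1.0, 1.0])  # same weights as Q in the notebook
R = np.eye(1)

def quadratic_cost(x, u):
    """Quadratic running cost: (x - x_goal)^T Q (x - x_goal) + u^T R u."""
    dx = x - x_goal
    return dx @ Q @ dx + u @ R @ u

def lambda_cost(x, u):
    """Running cost written out term by term, as in the FiniteDiffCost lambda."""
    return 2 * x[0] ** 2 + 10 * x[1] ** 2 + x[2] ** 2 + x[3] ** 2 + u[0] ** 2

# Arbitrary test point (an assumption for illustration).
x = np.array([0.1, -0.5, 0.2, 0.3])
u = np.array([0.4])

# Forward-difference gradient with respect to the state,
# compared against the analytic gradient 2 Q (x - x_goal).
grad_fd = approx_fprime(x, lambda x_: quadratic_cost(x_, u), 1e-6)
grad_exact = 2 * Q @ (x - x_goal)
```

The two gradients agree only up to the truncation error of the forward difference, which is the trade-off a finite-difference cost makes in exchange for not needing analytic derivatives.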


In [9]:
N = 100
x0 = np.array([0.0, np.random.uniform(-np.pi, np.pi), 0.0, 0.0])
"""us_init = np.array([[-4.76751939e-01],
 [ 3.34490970e-01],
 [-3.99608551e-01],
 [ 8.41882163e-01],
 [-8.93302461e-01],
 [-3.57273055e-01],
 [-3.32158856e-01],
 [-4.82030121e-01],
 [-6.84388675e-01],
 [-4.26475287e-01],
 [-4.90913171e-01],
 [ 1.14754770e-01],
 [ 3.90275383e-01],
 [-4.36421243e-01],
 [ 5.57806778e-01],
 [ 7.83813923e-01],
 [-3.27778717e-01],
 [ 8.00582346e-01],
 [-8.49640982e-01],
 [-5.69222128e-01],
 [ 2.58447724e-01],
 [ 6.02857039e-01],
 [-6.11855326e-01],
 [ 7.00853348e-01],
 [-9.31090157e-01],
 [ 4.97665652e-01],
 [ 2.45721323e-01],
 [-1.92025996e-01],
 [ 2.72219728e-02],
 [ 7.95701514e-01],
 [-8.92320606e-01],
 [ 3.22802941e-02],
 [ 2.69562194e-01],
 [-1.46125346e-01],
 [-3.15934186e-02],
 [ 6.61809200e-01],
 [ 4.76622656e-01],
 [-9.78007260e-01],
 [ 5.73481914e-01],
 [-1.28208542e-02],
 [ 1.48147746e-01],
 [ 1.39421731e-04],
 [ 1.08812740e-01],
 [ 6.16007441e-01],
 [ 2.66982969e-01],
 [-2.09250070e-02],
 [ 6.04343953e-02],
 [ 4.14836049e-01],
 [-7.01346473e-01],
 [ 2.94563133e-01],
 [-3.07180590e-01],
 [ 6.53429823e-01],
 [ 3.87696411e-01],
 [-1.60361255e-01],
 [-7.91982930e-01],
 [ 3.04331662e-01],
 [-3.33057338e-01],
 [-1.45487867e-01],
 [-4.48293362e-01],
 [-4.56753222e-01],
 [-5.63113978e-02],
 [ 9.17106858e-01],
 [-7.79117478e-01],
 [-7.74944928e-01],
 [ 1.26081663e-01],
 [ 8.11397037e-02],
 [-6.58667412e-01],
 [ 9.01877119e-01],
 [-7.59017615e-01],
 [-6.54909707e-01],
 [-7.19152458e-01],
 [-8.23250291e-01],
 [-1.96576912e-01],
 [ 3.31076346e-01],
 [-9.59322994e-01],
 [ 6.61615691e-01],
 [-4.48940253e-01],
 [-4.10547311e-01],
 [-8.26340358e-01],
 [ 7.48939731e-01],
 [-8.83894866e-01],
 [ 4.12684469e-01],
 [-4.61578622e-01],
 [-8.29689676e-01],
 [-9.02561735e-01],
 [-2.44970624e-01],
 [ 2.86652487e-01],
 [-8.59512109e-01],
 [-5.89043961e-01],
 [ 6.21286175e-01],
 [-4.02464523e-01],
 [-7.80221770e-01],
 [-7.58513349e-01],
 [ 5.35469863e-01],
 [ 7.43535637e-01],
 [ 9.40814704e-01],
 [-9.31071558e-01],
 [-4.20465454e-01],
 [-1.28056017e-01],
 [-2.09487816e-01]])"""
us_init = np.random.uniform(-1, 1, (N, dynamics.action_size))
ilqr = iLQR(dynamics, cost2, N)
mpc = RecedingHorizonController(x0, ilqr)
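
The receding-horizon pattern used here can be sketched on a toy problem: plan over the whole horizon, apply only the first control, then shift the plan forward as a warm start for the next replanning step. This is a sketch of the general idea under stated assumptions, not the library's implementation. `receding_horizon`, `toy_plan`, and `toy_step` are hypothetical names, and the toy planner is a simple proportional rule standing in for iLQR.

```python
import itertools
import numpy as np

def receding_horizon(x0, us_init, plan, step):
    """Yield (state, control) pairs, replanning after every applied control."""
    x = np.asarray(x0, dtype=float)
    us = np.asarray(us_init, dtype=float)
    while True:
        us = plan(x, us)   # re-optimize the whole horizon from the current state
        u = us[0]
        x = step(x, u)     # apply only the first planned control
        yield x, u
        # Warm start: drop the applied control and pad the tail with zeros.
        us = np.vstack([us[1:], np.zeros_like(us[:1])])

def toy_plan(x, us):
    """Stand-in planner (assumption): push the state halfway to zero each step."""
    return np.tile(-0.5 * x, (len(us), 1))

def toy_step(x, u):
    """Toy dynamics (assumption): a discrete-time integrator."""
    return x + u

# Run ten closed-loop steps from x = 1 with a horizon of five controls.
trajectory = list(itertools.islice(
    receding_horizon([1.0], np.zeros((5, 1)), toy_plan, toy_step), 10))
final_state = trajectory[-1][0]
```

Because each step halves the state, ten steps bring it from 1 to $0.5^{10}$. In the notebook the planner is iLQR, and the warm start is presumably why later steps converge in far fewer iterations than the first.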

In [10]:
t0 = time.time()
J_hist = []
controls = mpc.control(us_init, initial_n_iterations = 500, subsequent_n_iterations = 100, on_iteration = on_iteration)
us = []
for i in range(100):
    print('ITERATION', i, '\n')
    us.append(next(controls)[1])
    
print('time', time.time() - t0)

ITERATION 0 

iteration 0 accepted 10635.090204523765 [-2.63330418  2.38628719 -5.10295909  1.45788601]
iteration 1 accepted 9995.477840814894 [-2.58783978  2.45123803 -4.6641613   1.67769198]
iteration 2 accepted 9696.79345926941 [-2.11492944  2.55962825 -4.13920904  2.04776674]
iteration 3 accepted 9688.00649139876 [-2.10094165  2.51421408 -4.32939054  2.05902317]
iteration 4 accepted 9208.055861514957 [-1.91627104  2.51310703 -4.31151025  1.99918333]
iteration 5 accepted 9007.106123720017 [-1.99288411  2.49089538 -4.34337049  2.02518945]
iteration 6 accepted 8981.26639245459 [-1.97737148  2.49522324 -4.34781221  2.02819938]
iteration 7 accepted 8757.796434406313 [-1.95626061  2.49889678 -4.31484445  2.01987812]
iteration 8 accepted 8672.608783954023 [-1.94234264  2.50445158 -4.34160935  2.03173426]
iteration 9 accepted 8598.787430124767 [-1.89336314  2.52451071 -4.36506617  2.02492349]
iteration 10 accepted 8348.778724924292 [-1.93366224  2.51466769 -4.40381737  2.04216891]
iteratio

iteration 91 accepted 3827.329631884912 [ 0.11345225 -0.00449269  0.01506507  0.00619091]
iteration 92 accepted 3826.547842269244 [ 0.11151359 -0.00461279  0.01576323  0.00717067]
iteration 93 converged 3826.5461835266897 [ 0.10439518 -0.00476232  0.01791113  0.00903735]
ITERATION 1 

iteration 0 accepted 3776.94406778967 [ 0.10353823 -0.00454211  0.01743622  0.00474731]
iteration 1 accepted 3775.6536332540186 [ 0.09998067 -0.00454591  0.01714267  0.00553585]
iteration 2 accepted 3775.638774989958 [ 0.08680145 -0.0045408   0.01642046  0.00886227]
iteration 3 accepted 3775.4725562928343 [ 0.08702758 -0.00454638  0.01624734  0.00892839]
iteration 4 accepted 3775.458101211906 [ 0.08676407 -0.00455426  0.01631151  0.00905745]
iteration 5 converged 3775.457587758383 [ 0.08613501 -0.00457136  0.01645138  0.0093671 ]
ITERATION 2 

iteration 0 accepted 3720.6596723217795 [ 0.08562032 -0.00425789  0.01579949  0.00559731]
iteration 1 accepted 3720.6554454455113 [ 0.08278189 -0.00421424  0.013001

iteration 0 converged 58.11583517931131 [ 4.14121416e-03 -7.42148446e-05  1.38057973e-03  9.06698812e-04]
ITERATION 56 

iteration 0 converged 46.36817300548279 [ 3.94332209e-03 -8.72445415e-05  1.54184513e-03  1.00259460e-03]
ITERATION 57 

iteration 0 converged 36.99527259049856 [ 3.75811631e-03 -9.99337218e-05  1.69790304e-03  1.08426140e-03]
ITERATION 58 

iteration 0 converged 29.60450118242425 [ 0.00358476 -0.00011221  0.00184726  0.00115623]
ITERATION 59 

iteration 0 converged 23.811972026996198 [ 0.0034225  -0.00012403  0.00198925  0.00122115]
ITERATION 60 

iteration 0 converged 19.28178740687801 [ 0.00327061 -0.00013538  0.00212368  0.00128063]
ITERATION 61 

iteration 0 converged 15.733562058014828 [ 0.00312844 -0.00014626  0.00225062  0.00133564]
ITERATION 62 

iteration 0 converged 12.940778677613302 [ 0.00299537 -0.00015665  0.00237029  0.00138683]
ITERATION 63 

iteration 0 converged 10.72521627883752 [ 0.00287083 -0.00016658  0.00248296  0.00143463]
ITERATION 64 

iter

In [14]:
viewer = MjViewer(dynamics.sim)
dynamics.set_state(x0)
print(dynamics.get_state())
for u in us:
    # Each entry of us holds the controls from one receding-horizon step; apply the first.
    dynamics.step(u[0])
    viewer.render()

Creating window glfw
[ 0.         -1.82006901  0.          0.        ]
