# Cartpole Problem

The state and control vectors $\textbf{x}$ and $\textbf{u}$ are defined as follows:

$$
\begin{equation*}
\textbf{x} = \begin{bmatrix}
    x & \theta & \dot{x} & \dot{\theta}
    \end{bmatrix}^T
\end{equation*}
$$

$$
\begin{equation*}
\textbf{u} = \begin{bmatrix}
    F_{x}
    \end{bmatrix}^T
\end{equation*}
$$

The goal is to swing the pendulum upright:

$$
\begin{equation*}
\textbf{x}_{goal} = \begin{bmatrix}
    0 & 0 & 0 & 0
    \end{bmatrix}^T
\end{equation*}
$$
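
One way to read this goal is as the minimizer of a quadratic cost. The sketch below restates the running cost that this notebook later passes to `FiniteDiffCost` as a plain function, so the weights on each state component are easier to see. The names `Q_DIAG`, `X_GOAL`, and `running_cost` are illustrative and are not part of the `ilqr` package.

```python
import numpy as np

# Diagonal state weights taken from the cost lambda used later in this notebook:
# 2*x^2 + 10*theta^2 + x_dot^2 + theta_dot^2, plus u^2 for the control effort.
Q_DIAG = np.array([2.0, 10.0, 1.0, 1.0])
X_GOAL = np.zeros(4)  # [x, theta, x_dot, theta_dot] at the upright equilibrium


def running_cost(x, u):
    """Quadratic penalty on the distance to X_GOAL plus control effort."""
    dx = np.asarray(x, dtype=float) - X_GOAL
    u = np.asarray(u, dtype=float)
    return float(dx @ (Q_DIAG * dx) + u @ u)
```

The cost is zero only at the goal state with zero force, and the pole angle carries the largest weight, so angle errors dominate the objective.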

**Note**: The force is constrained to the interval $[-1, 1]$. Rather than
constraining the optimizer directly, iLQR fits unconstrained actions, which are
then passed through a squashing function, $\tanh(\textbf{u})$. The squashing is
embedded in the dynamics model so that it is auto-differentiated along with the
rest of the model. As a consequence, the same transformation must be applied
manually to the controls returned by iLQR at the end.
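
As a minimal sketch of the squashing described in the note, the helper below maps unconstrained controls into a bounded interval with a scaled $\tanh$. The name `squash` and its signature are assumptions made for illustration. The notebook imports `constrain` from `ilqr.dynamics`, which may play this role, but the code here is a standalone stand-in and not that function.

```python
import numpy as np


def squash(u_unconstrained, low=-1.0, high=1.0):
    """Map unconstrained controls into [low, high] with a scaled tanh."""
    half_range = (high - low) / 2.0
    midpoint = (high + low) / 2.0
    return half_range * np.tanh(u_unconstrained) + midpoint


# Hypothetical raw optimizer output, one control per time step.
us_raw = np.array([[-5.0], [0.0], [0.3], [5.0]])
us_applied = squash(us_raw)
```

Large raw values saturate near the bounds while small values pass through almost unchanged, which is why the raw optimizer output cannot be sent to the actuator as-is.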

In [1]:
%matplotlib inline

In [2]:
from __future__ import print_function

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import time

In [4]:
from ilqr.cost import QRCost, FiniteDiffCost
from ilqr.mujoco_dynamics import MujocoDynamics
from ilqr.mujoco_controller import iLQR, RecedingHorizonController
from ilqr.examples.cartpole import CartpoleDynamics
from ilqr.dynamics import constrain

from scipy.optimize import approx_fprime

import mujoco_py
from mujoco_py import MjViewer
import os

Choosing the latest nvidia driver: /usr/lib/nvidia-418, among ['/usr/lib/nvidia-375', '/usr/lib/nvidia-418']


In [5]:
def on_iteration(iteration_count, xs, us, J_opt, accepted, converged):
    # Log the status, total cost, and final predicted state of each iLQR iteration.
    info = "converged" if converged else ("accepted" if accepted else "failed")
    final_state = xs[-1]
    print("iteration", iteration_count, info, J_opt, final_state)

In [7]:
xml_path = os.path.join('..', 'ilqr', 'xmls', 'inverted_pendulum.xml')
dynamics = MujocoDynamics(xml_path, frame_skip = 2, use_multiprocessing = True)
print(dynamics.dt)


Finished loading process 3701
Finished loading process 3702
Finished loading process 3703
Finished loading process 3704
Finished loading process 3705
Finished loading process 3710
Finished loading process 3728
Finished loading process 3750
Finished loading process 3737
0.04
Finished loading process 3755
Finished loading process 3773
Finished loading process 3786


In [9]:
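# Quadratic cost differentiated by finite differences: the running cost penalizes
# state error and control effort, the terminal cost penalizes state error only.
cost2 = FiniteDiffCost(
    lambda x, u, i: 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2,
    lambda x, i: 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2,
    4, 1, use_multiprocessing=True)  # state size 4, action size 1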

Finished loading process 3816
Finished loading process 3819
Finished loading process 3828
Finished loading process 3837
Finished loading process 3846
Finished loading process 3855
Finished loading process 3865
Finished loading process 3872
Finished loading process 3881
Finished loading process 3894
Finished loading process 3901
Finished loading process 3910


In [9]:
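N = 100  # planning horizon, in time steps
# Start at rest with the cart at the origin and a random pole angle.
x0 = np.array([0.0, np.random.uniform(-np.pi, np.pi), 0.0, 0.0])
# A fixed initial control sequence, kept disabled inside a string literal;
# the random initialization after it is what actually runs.
"""us_init = np.array([[-4.76751939e-01],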
 [ 3.34490970e-01],
 [-3.99608551e-01],
 [ 8.41882163e-01],
 [-8.93302461e-01],
 [-3.57273055e-01],
 [-3.32158856e-01],
 [-4.82030121e-01],
 [-6.84388675e-01],
 [-4.26475287e-01],
 [-4.90913171e-01],
 [ 1.14754770e-01],
 [ 3.90275383e-01],
 [-4.36421243e-01],
 [ 5.57806778e-01],
 [ 7.83813923e-01],
 [-3.27778717e-01],
 [ 8.00582346e-01],
 [-8.49640982e-01],
 [-5.69222128e-01],
 [ 2.58447724e-01],
 [ 6.02857039e-01],
 [-6.11855326e-01],
 [ 7.00853348e-01],
 [-9.31090157e-01],
 [ 4.97665652e-01],
 [ 2.45721323e-01],
 [-1.92025996e-01],
 [ 2.72219728e-02],
 [ 7.95701514e-01],
 [-8.92320606e-01],
 [ 3.22802941e-02],
 [ 2.69562194e-01],
 [-1.46125346e-01],
 [-3.15934186e-02],
 [ 6.61809200e-01],
 [ 4.76622656e-01],
 [-9.78007260e-01],
 [ 5.73481914e-01],
 [-1.28208542e-02],
 [ 1.48147746e-01],
 [ 1.39421731e-04],
 [ 1.08812740e-01],
 [ 6.16007441e-01],
 [ 2.66982969e-01],
 [-2.09250070e-02],
 [ 6.04343953e-02],
 [ 4.14836049e-01],
 [-7.01346473e-01],
 [ 2.94563133e-01],
 [-3.07180590e-01],
 [ 6.53429823e-01],
 [ 3.87696411e-01],
 [-1.60361255e-01],
 [-7.91982930e-01],
 [ 3.04331662e-01],
 [-3.33057338e-01],
 [-1.45487867e-01],
 [-4.48293362e-01],
 [-4.56753222e-01],
 [-5.63113978e-02],
 [ 9.17106858e-01],
 [-7.79117478e-01],
 [-7.74944928e-01],
 [ 1.26081663e-01],
 [ 8.11397037e-02],
 [-6.58667412e-01],
 [ 9.01877119e-01],
 [-7.59017615e-01],
 [-6.54909707e-01],
 [-7.19152458e-01],
 [-8.23250291e-01],
 [-1.96576912e-01],
 [ 3.31076346e-01],
 [-9.59322994e-01],
 [ 6.61615691e-01],
 [-4.48940253e-01],
 [-4.10547311e-01],
 [-8.26340358e-01],
 [ 7.48939731e-01],
 [-8.83894866e-01],
 [ 4.12684469e-01],
 [-4.61578622e-01],
 [-8.29689676e-01],
 [-9.02561735e-01],
 [-2.44970624e-01],
 [ 2.86652487e-01],
 [-8.59512109e-01],
 [-5.89043961e-01],
 [ 6.21286175e-01],
 [-4.02464523e-01],
 [-7.80221770e-01],
 [-7.58513349e-01],
 [ 5.35469863e-01],
 [ 7.43535637e-01],
 [ 9.40814704e-01],
 [-9.31071558e-01],
 [-4.20465454e-01],
 [-1.28056017e-01],
 [-2.09487816e-01]])"""
us_init = np.random.uniform(-1, 1, (N, dynamics.action_size))
ilqr = iLQR(dynamics, cost2, N)
mpc = RecedingHorizonController(x0, ilqr)

In [10]:
t0 = time.time()
controls = mpc.control(us_init, initial_n_iterations = 500, subsequent_n_iterations = 100, on_iteration = on_iteration)
us = []
for i in range(100):
    print('ITERATION', i, '\n')
    us.append(next(controls)[1])
    
print('time', time.time() - t0)

ITERATION 0 

         12913 function calls (12850 primitive calls) in 0.151 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(all)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(any)
       36    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(atleast_1d)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(column_stack)
      222    0.000    0.000    0.002    0.000 <__array_function__ internals>:2(concatenate)
        8    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(diag)
       10    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(empty_like)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ones_like)
       18    0.

        7    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
        7    0.000    0.000    0.000    0.000 threading.py:1104(is_alive)
        2    0.000    0.000    0.000    0.000 threading.py:215(__init__)
        4    0.000    0.000    0.000    0.000 threading.py:239(__enter__)
        4    0.000    0.000    0.000    0.000 threading.py:242(__exit__)
        2    0.000    0.000    0.000    0.000 threading.py:248(_release_save)
        2    0.000    0.000    0.000    0.000 threading.py:251(_acquire_restore)
        4    0.000    0.000    0.000    0.000 threading.py:254(_is_owned)
        2    0.000    0.000    0.100    0.050 threading.py:263(wait)
        2    0.000    0.000    0.000    0.000 threading.py:334(notify)
        2    0.000    0.000    0.000    0.000 threading.py:498(__init__)
        9    0.000    0.000    0.000    0.000 threading.py:506(is_set)
        2    0.000    0.000    0.100    0.050 threading.py:533(wait)
      100    0.001    0.000    0.

NameError: name 'J_hist' is not defined

In [14]:
viewer = MjViewer(dynamics.sim)
dynamics.set_state(x0)
print(dynamics.get_state())
for i, u in enumerate(us):
    dynamics.step(u[0])
    viewer.render()

Creating window glfw
[ 0.         -1.82006901  0.          0.        ]


In [6]:
def run_mpc():
    xml_path = os.path.join('..', 'ilqr', 'xmls', 'inverted_pendulum.xml')
    dynamics = MujocoDynamics(xml_path, frame_skip = 2, use_multiprocessing = True)
    print(dynamics.dt)
    cost2 = FiniteDiffCost(lambda x, u, i: 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2,
                      lambda x, i: (2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2),
                      4, 1, use_multiprocessing = True)
    N = 100
    x0 = np.array([0.0, np.random.uniform(-np.pi, np.pi), 0.0, 0.0])

    us_init = np.random.uniform(-1, 1, (N, dynamics.action_size))
    ilqr = iLQR(dynamics, cost2, N)
    mpc = RecedingHorizonController(x0, ilqr)
    t0 = time.time()
    controls = mpc.control(us_init, initial_n_iterations = 500, subsequent_n_iterations = 100, on_iteration = on_iteration)
    us = []
    for i in range(100):
        print('ITERATION', i, '\n')
        us.append(next(controls)[1])

    print('time', time.time() - t0)
    
import cProfile
cProfile.run('run_mpc()')


Finished loading process 5706
Finished loading process 5708
Finished loading process 5707
Finished loading process 5718
Finished loading process 5711
Finished loading process 5719
Finished loading process 5724
0.04
Finished loading process 5741
Finished loading process 5756
Finished loading process 5769
Finished loading process 5780
Finished loading process 5801
Finished loading process 5806
Finished loading process 5789
Finished loading process 5825
Finished loading process 5839
Finished loading process 5846
Finished loading process 5855
Finished loading process 5866
Finished loading process 5873
Finished loading process 5882
Finished loading process 5895
Finished loading process 5902
Finished loading process 5911
ITERATION 0 

iteration 0 accepted 10065.979664620818 [ 3.00106931e+00 -2.80344014e+00 -4.98248567e-03 -5.07765244e+00]
iteration 1 accepted 9785.663861908346 [ 2.6023478  -2.80945043 -2.62641463 -6.63500605]
iteration 2 accepted 9202.394513303316 [ 2.9833879  -2.09636296  0

iteration 0 accepted 3939.522064851824 [-0.16643687 -0.42331757 -1.74964422 -0.90529532]
iteration 1 accepted 3939.4412885474676 [-0.16850754 -0.4263664  -1.77234516 -0.89290559]
iteration 2 accepted 3939.4203218390226 [-0.17031389 -0.43020553 -1.80526405 -0.86625403]
iteration 3 accepted 3939.4027298859924 [-0.17049034 -0.43111535 -1.81374085 -0.85700837]
iteration 4 converged 3939.4025050655396 [-0.17044018 -0.4323667  -1.82578905 -0.84252214]
ITERATION 3 

iteration 0 accepted 3882.325161096838 [-0.18730756 -0.36259058 -1.65466213 -0.82629798]
iteration 1 accepted 3882.1587366013778 [-0.18890061 -0.36529641 -1.67291135 -0.81565632]
iteration 2 accepted 3882.1243313874056 [-0.1897171  -0.36707012 -1.68675686 -0.80409774]
iteration 3 accepted 3882.118633292216 [-0.19027835 -0.369512   -1.70661912 -0.78443804]
iteration 4 accepted 3882.1093883469757 [-0.19029402 -0.37015286 -1.71183143 -0.77838826]
iteration 5 converged 3882.107321741595 [-0.1901964  -0.37062041 -1.71563871 -0.77357301

iteration 1 accepted 2184.966937086076 [-0.15983312  0.00306739 -0.08886023 -0.03604408]
iteration 2 accepted 2184.961273850733 [-0.15770048  0.00339486 -0.08563086 -0.03573118]
iteration 3 converged 2184.960725428288 [-0.15437815  0.00388603 -0.0806733  -0.03534827]
ITERATION 30 

iteration 0 accepted 2180.8125876916083 [-0.15161235  0.00351757 -0.07171772 -0.02861773]
iteration 1 accepted 2180.807065966058 [-0.14978975  0.00375848 -0.06912413 -0.02841848]
iteration 2 converged 2180.805572934289 [-0.14835418  0.00393662 -0.06702068 -0.02839397]
ITERATION 31 

iteration 0 accepted 2177.5097742048933 [-0.13681194  0.00464442 -0.04775962 -0.02072053]
iteration 1 accepted 2177.5065040152003 [-0.13678432  0.00460791 -0.0475225  -0.02109658]
iteration 2 converged 2177.506061770209 [-0.13673363  0.00457931 -0.04731188 -0.02146074]
ITERATION 32 

iteration 0 converged 2174.8726287942663 [-0.13030632  0.00436653 -0.03636978 -0.01583911]
ITERATION 33 

iteration 0 accepted 2172.749533733246 [-0

iteration 0 converged 54.55194192802744 [-0.00249741 -0.00064664  0.00739794  0.00358606]
ITERATION 92 

iteration 0 converged 42.77603101439618 [-0.00226812 -0.00063079  0.00721464  0.00346357]
ITERATION 93 

iteration 0 converged 33.44837321176244 [-0.00205361 -0.00061537  0.00703545  0.00336239]
ITERATION 94 

iteration 0 converged 26.1839108336463 [-0.00185287 -0.00060047  0.00686301  0.00327523]
ITERATION 95 

iteration 0 converged 20.581755418276643 [-0.001665   -0.00058616  0.0066986   0.00319783]
ITERATION 96 

iteration 0 converged 16.286549608094063 [-0.00148917 -0.00057245  0.00654271  0.00312767]
ITERATION 97 

iteration 0 converged 13.000121643852736 [-0.00132461 -0.00055935  0.0063954   0.00306321]
ITERATION 98 

iteration 0 converged 10.481075848853981 [-0.00117059 -0.00054685  0.00625652  0.00300351]
ITERATION 99 

iteration 0 converged 8.53949674472962 [-0.00102646 -0.00053496  0.00612577  0.00294792]
time 30.057520866394043
         4141767 function calls (4121119 pri