# Cartpole Problem

The state and control vectors $\textbf{x}$ and $\textbf{u}$ are defined as follows:

$$
\begin{equation*}
\textbf{x} = \begin{bmatrix}
    x & \theta & \dot{x} & \dot{\theta}
    \end{bmatrix}^T
\end{equation*}
$$

$$
\begin{equation*}
\textbf{u} = \begin{bmatrix}
    F_{x}
    \end{bmatrix}^T
\end{equation*}
$$

The goal is to swing the pendulum upright:

$$
\begin{equation*}
\textbf{x}_{goal} = \begin{bmatrix}
    0 & 0 & 0 & 0
    \end{bmatrix}^T
\end{equation*}
$$

**Note**: The force is constrained to the interval $[-1, 1]$. Rather than
constraining the optimizer directly, we optimize over unconstrained actions
$\textbf{u}$ and pass them through the squashing function $\tanh(\textbf{u})$.
The squashing is embedded directly in the dynamics model so that it is
auto-differentiated along with it. As a consequence, the controls returned by
iLQR are the unconstrained ones, so we must apply the same transformation to
them manually at the end.
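
As a minimal sketch of that last step, the snippet below squashes a raw control sequence into the allowed force range. The helper name `squash` and the sample values are illustrative assumptions, not part of the `ilqr` package; the only thing taken from the note above is the use of $\tanh$ and the bounds $[-1, 1]$.

```python
import numpy as np

def squash(u_unconstrained, low=-1.0, high=1.0):
    """Map unconstrained controls into [low, high] with a scaled tanh.

    Hypothetical helper for illustration only.
    """
    half_range = (high - low) / 2.0
    midpoint = (high + low) / 2.0
    return half_range * np.tanh(u_unconstrained) + midpoint

# Example raw (unconstrained) controls, shape (N, action_size).
us_raw = np.array([[-3.0], [0.0], [0.5], [10.0]])

# Forces that would actually be applied to the cart.
us_applied = squash(us_raw)
print(us_applied)
```

With the default bounds this reduces to plain `np.tanh`, so large raw controls saturate near $\pm 1$ while small ones pass through almost unchanged.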

In [1]:
%matplotlib inline

In [2]:
from __future__ import print_function

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import time

In [4]:
from ilqr.cost import QRCost, FiniteDiffCost, Cost
from ilqr.mujoco_dynamics import MujocoDynamics
from ilqr.mujoco_controller import iLQR, RecedingHorizonController
from ilqr.examples.cartpole import CartpoleDynamics
from ilqr.dynamics import constrain

from scipy.optimize import approx_fprime

import mujoco_py
from mujoco_py import MjViewer
import os

Choosing the latest nvidia driver: /usr/lib/nvidia-418, among ['/usr/lib/nvidia-375', '/usr/lib/nvidia-418']


In [5]:
def on_iteration(iteration_count, xs, us, J_opt, accepted, converged):
    info = "converged" if converged else ("accepted" if accepted else "failed")
    final_state = xs[-1]
    print("iteration", iteration_count, info, J_opt, final_state)

In [13]:
xml_path = os.path.join('..', 'ilqr', 'xmls', 'inverted_pendulum.xml')
dynamics = MujocoDynamics(xml_path, frame_skip = 2, use_multiprocessing = True)
print(dynamics.dt)


0.04
Finished loading process 7174
Finished loading process 7170
Finished loading process 7173
Finished loading process 7171
Finished loading process 7172
Finished loading process 7176
Finished loading process 7175
Finished loading process 7177
Finished loading process 7178
Finished loading process 7180
Finished loading process 7179
Finished loading process 7181


In [14]:
cost2 = FiniteDiffCost(lambda x, u, i: 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2,
                      lambda x, i: (2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2),
                      4, 1, use_multiprocessing = True)

Finished loading process 7281
Finished loading process 7284
Finished loading process 7293
Finished loading process 7303
Finished loading process 7312
Finished loading process 7319
Finished loading process 7339
Finished loading process 7330
Finished loading process 7345
Finished loading process 7347
Finished loading process 7342
Finished loading process 7349
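
The lambda passed to `FiniteDiffCost` is a plain quadratic, so it can also be read as $\textbf{x}^T Q \textbf{x} + \textbf{u}^T R \textbf{u}$ with $Q = \mathrm{diag}(2, 10, 1, 1)$ and $R = [1]$. The sketch below checks that reading numerically; the names `Q`, `R`, `stage_cost_lambda` and `stage_cost_matrix` are introduced here for illustration and are not part of the notebook.

```python
import numpy as np

# Weights read off the lambda above (illustrative names).
Q = np.diag([2.0, 10.0, 1.0, 1.0])
R = np.array([[1.0]])

def stage_cost_lambda(x, u):
    # Same expression as the running cost passed to FiniteDiffCost.
    return 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2

def stage_cost_matrix(x, u):
    # Equivalent quadratic form x^T Q x + u^T R u.
    return x @ Q @ x + u @ R @ u

rng = np.random.default_rng(0)
x = rng.normal(size=4)
u = rng.normal(size=1)

print(stage_cost_lambda(x, u), stage_cost_matrix(x, u))
```

In this form the derivatives follow directly: the state gradient is $2Q\textbf{x}$ and the state Hessian is $2Q$, both of which are constant-coefficient and cheap to evaluate.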


In [15]:
N = 100
x0 = np.array([0.0, np.random.uniform(-np.pi, np.pi), 0.0, 0.0])
"""us_init = np.array([[-4.76751939e-01],
 [ 3.34490970e-01],
 [-3.99608551e-01],
 [ 8.41882163e-01],
 [-8.93302461e-01],
 [-3.57273055e-01],
 [-3.32158856e-01],
 [-4.82030121e-01],
 [-6.84388675e-01],
 [-4.26475287e-01],
 [-4.90913171e-01],
 [ 1.14754770e-01],
 [ 3.90275383e-01],
 [-4.36421243e-01],
 [ 5.57806778e-01],
 [ 7.83813923e-01],
 [-3.27778717e-01],
 [ 8.00582346e-01],
 [-8.49640982e-01],
 [-5.69222128e-01],
 [ 2.58447724e-01],
 [ 6.02857039e-01],
 [-6.11855326e-01],
 [ 7.00853348e-01],
 [-9.31090157e-01],
 [ 4.97665652e-01],
 [ 2.45721323e-01],
 [-1.92025996e-01],
 [ 2.72219728e-02],
 [ 7.95701514e-01],
 [-8.92320606e-01],
 [ 3.22802941e-02],
 [ 2.69562194e-01],
 [-1.46125346e-01],
 [-3.15934186e-02],
 [ 6.61809200e-01],
 [ 4.76622656e-01],
 [-9.78007260e-01],
 [ 5.73481914e-01],
 [-1.28208542e-02],
 [ 1.48147746e-01],
 [ 1.39421731e-04],
 [ 1.08812740e-01],
 [ 6.16007441e-01],
 [ 2.66982969e-01],
 [-2.09250070e-02],
 [ 6.04343953e-02],
 [ 4.14836049e-01],
 [-7.01346473e-01],
 [ 2.94563133e-01],
 [-3.07180590e-01],
 [ 6.53429823e-01],
 [ 3.87696411e-01],
 [-1.60361255e-01],
 [-7.91982930e-01],
 [ 3.04331662e-01],
 [-3.33057338e-01],
 [-1.45487867e-01],
 [-4.48293362e-01],
 [-4.56753222e-01],
 [-5.63113978e-02],
 [ 9.17106858e-01],
 [-7.79117478e-01],
 [-7.74944928e-01],
 [ 1.26081663e-01],
 [ 8.11397037e-02],
 [-6.58667412e-01],
 [ 9.01877119e-01],
 [-7.59017615e-01],
 [-6.54909707e-01],
 [-7.19152458e-01],
 [-8.23250291e-01],
 [-1.96576912e-01],
 [ 3.31076346e-01],
 [-9.59322994e-01],
 [ 6.61615691e-01],
 [-4.48940253e-01],
 [-4.10547311e-01],
 [-8.26340358e-01],
 [ 7.48939731e-01],
 [-8.83894866e-01],
 [ 4.12684469e-01],
 [-4.61578622e-01],
 [-8.29689676e-01],
 [-9.02561735e-01],
 [-2.44970624e-01],
 [ 2.86652487e-01],
 [-8.59512109e-01],
 [-5.89043961e-01],
 [ 6.21286175e-01],
 [-4.02464523e-01],
 [-7.80221770e-01],
 [-7.58513349e-01],
 [ 5.35469863e-01],
 [ 7.43535637e-01],
 [ 9.40814704e-01],
 [-9.31071558e-01],
 [-4.20465454e-01],
 [-1.28056017e-01],
 [-2.09487816e-01]])"""
us_init = np.random.uniform(-1, 1, (N, dynamics.action_size))
ilqr = iLQR(dynamics, cost2, N)
mpc = RecedingHorizonController(x0, ilqr)

In [16]:
t0 = time.time()
controls = mpc.control(us_init, initial_n_iterations = 500, subsequent_n_iterations = 100, on_iteration = on_iteration)
us = []
for i in range(100):
    print('ITERATION', i, '\n')
    us.append(next(controls)[1])
    
print('time', time.time() - t0)

ITERATION 0 

iteration 0 accepted 12564.546284043461 [ 2.80903313 -2.94598595 -0.48616295 -0.0830509 ]
iteration 1 accepted 12455.300802662583 [ 3.00152834 -3.2945255  -0.01051473 -0.5753186 ]
iteration 2 accepted 12240.566118587934 [ 3.00120385 -3.50105848 -0.00895483 -0.6043347 ]
iteration 3 accepted 11960.16485908725 [ 3.00118565 -3.58826476 -0.00949836 -1.06649372]
iteration 4 accepted 11752.27597564621 [ 3.00151193 -3.48537252 -0.0201105  -4.08058114]
iteration 5 accepted 10970.840714555039 [ 3.01311086 -2.75381801 -0.21138982 -6.88012975]
iteration 6 accepted 10488.20364161994 [ 3.01636014 -2.10424817  0.09760284 -5.34995764]
iteration 7 accepted 10086.354487263447 [ 3.00651477 -1.91125753  0.72905722 -3.66144823]
iteration 8 accepted 9728.9362779536 [ 2.99463945 -2.28791513  0.64903436 -2.72115512]
iteration 9 accepted 9182.165039483474 [ 3.00572733 -2.30394274 -0.02268582 -3.85129112]
iteration 10 accepted 9044.875735629192 [ 2.96383502 -3.62520639  0.20482178 -3.32074969]
ite

iteration 92 accepted 4285.2043732560005 [-0.54689515 -0.34618898 -0.76070215 -1.77342336]
iteration 93 accepted 4253.17458180804 [-0.49585763 -0.20967161 -0.66773199 -1.10142271]
iteration 94 accepted 4218.426568230987 [-0.45180768 -0.1686694  -0.62097703 -0.90429284]
iteration 95 accepted 4199.843239651759 [-0.38687034 -0.10580387 -0.5436913  -0.60035978]
iteration 96 accepted 4193.974008869977 [-0.36501233 -0.08700752 -0.52445742 -0.51290269]
iteration 97 accepted 4187.045599203477 [-0.32966098 -0.05799436 -0.48773261 -0.37623204]
iteration 98 accepted 4184.863044853403 [-0.32490287 -0.04940258 -0.48276125 -0.33852348]
iteration 99 accepted 4181.2852573044665 [-0.30721263 -0.03592475 -0.46504499 -0.27648704]
iteration 100 accepted 4180.875993578915 [-0.31482345 -0.03208486 -0.46608808 -0.2610817 ]
iteration 101 accepted 4178.0150451137615 [-0.30948525 -0.02583409 -0.45612637 -0.23234067]
iteration 102 accepted 4173.910350899388 [-0.30574954 -0.02501668 -0.45257625 -0.22769408]
itera

iteration 0 accepted 2183.0690437845665 [-0.05731918  0.00062158  0.02187707  0.02279251]
iteration 1 converged 2183.0685355142973 [-0.05676559  0.00044977  0.02294263  0.02069931]
ITERATION 31 

iteration 0 converged 2178.4636874169664 [-0.05297969  0.00014423  0.02376746  0.01956214]
ITERATION 32 

iteration 0 converged 2175.076143569873 [-0.05131635  0.00070453  0.0222074   0.02379189]
ITERATION 33 

iteration 0 converged 2172.559640452591 [-4.72293361e-02  4.55090834e-05  2.29561351e-02  2.12159403e-02]
ITERATION 34 

iteration 0 converged 2170.608267581307 [-0.04562947  0.00063695  0.02114781  0.02563885]
ITERATION 35 

iteration 0 converged 2168.9598066008307 [-0.04296425  0.00063185  0.020492    0.02639491]
ITERATION 36 

iteration 0 converged 2167.3752552444066 [-0.04128321  0.00121351  0.01863727  0.03061497]
ITERATION 37 

iteration 0 converged 2165.634075270444 [-0.03568195 -0.0011865   0.0212733   0.01882316]
ITERATION 38 

iteration 0 accepted 2163.5079752212705 [-0.034594

iteration 1 converged 0.8955634632491822 [ 0.00023265 -0.00041059  0.00498521  0.00241774]
ITERATION 96 

iteration 0 accepted 0.7929547772272573 [ 0.00029157 -0.00041868  0.00491672  0.0024001 ]
iteration 1 converged 0.7929547597959774 [ 0.00028613 -0.00040545  0.00493467  0.0023968 ]
ITERATION 97 

iteration 0 accepted 0.7056673476578788 [ 0.00034158 -0.00041373  0.00486951  0.00238065]
iteration 1 converged 0.7056673304500488 [ 0.00033615 -0.0004006   0.00488735  0.0023772 ]
ITERATION 98 

iteration 0 accepted 0.6312896060424243 [ 0.00038835 -0.00040907  0.00482531  0.00236245]
iteration 1 converged 0.6312895890409265 [ 0.00038293 -0.00039604  0.00484305  0.00235885]
ITERATION 99 

iteration 0 accepted 0.5677493143974297 [ 0.00043211 -0.00040469  0.00478394  0.0023454 ]
iteration 1 converged 0.5677492975865575 [ 0.0004267  -0.00039175  0.00480158  0.00234167]
time 26.349048137664795


In [14]:
viewer = MjViewer(dynamics.sim)
dynamics.set_state(x0)
print(dynamics.get_state())
for i, u in enumerate(us):
    dynamics.step(u[0])
    viewer.render()

Creating window glfw
[ 0.         -1.82006901  0.          0.        ]


In [6]:
def run_mpc():
    xml_path = os.path.join('..', 'ilqr', 'xmls', 'inverted_pendulum.xml')
    dynamics = MujocoDynamics(xml_path, frame_skip = 2, use_multiprocessing = True)
    print(dynamics.dt)
    cost2 = FiniteDiffCost(lambda x, u, i: 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2,
                      lambda x, i: (2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2),
                      4, 1, use_multiprocessing = True)
    N = 100
    x0 = np.array([0.0, np.random.uniform(-np.pi, np.pi), 0.0, 0.0])

    us_init = np.random.uniform(-1, 1, (N, dynamics.action_size))
    ilqr = iLQR(dynamics, cost2, N)
    mpc = RecedingHorizonController(x0, ilqr)
    t0 = time.time()
    controls = mpc.control(us_init, initial_n_iterations = 500, subsequent_n_iterations = 100, on_iteration = on_iteration)
    us = []
    for i in range(100):
        print('ITERATION', i, '\n')
        us.append(next(controls)[1])

    print('time', time.time() - t0)
    
import cProfile
cProfile.run('run_mpc()')


Finished loading process 21544
Finished loading process 21546
Finished loading process 21547
Finished loading process 21545
0.04
Finished loading process 21548
Finished loading process 21556
Finished loading process 21549
Finished loading process 21583
Finished loading process 21606
Finished loading process 21595
Finished loading process 21592
Finished loading process 21619
Finished loading process 21611
Finished loading process 21656
Finished loading process 21663
Finished loading process 21676
Finished loading process 21685
Finished loading process 21694
Finished loading process 21707
Finished loading process 21712
Finished loading process 21719
Finished loading process 21730
Finished loading process 21739
Finished loading process 21748
ITERATION 0 

iteration 0 accepted 12834.14329818068 [0.58621932 3.61705717 2.66121069 5.11575213]
iteration 1 accepted 10782.977152782507 [-2.85937597  2.41199809 -4.94540893  1.66886076]
iteration 2 accepted 10341.187819553892 [-2.64246775  2.419936

iteration 83 accepted 5084.191597167458 [-0.12449239 -0.01889786  0.17346066  0.07237435]
iteration 84 accepted 5069.051671210082 [-0.27462848 -0.02268608  0.26051135  0.10792908]
iteration 85 accepted 5045.36138613033 [-0.10375114 -0.01941271  0.17096598  0.07293424]
iteration 86 accepted 5032.466280434582 [-0.30557884 -0.02341178  0.27536392  0.1136979 ]
iteration 87 accepted 5028.890273707137 [-0.06832293 -0.01822271  0.14692751  0.06329805]
iteration 88 accepted 5015.826188092781 [-0.09025206 -0.01809682  0.15496665  0.06625898]
iteration 89 accepted 5014.30230959825 [-0.00111919 -0.01567986  0.10062931  0.0446794 ]
iteration 90 accepted 5011.078058280982 [-0.0007059  -0.01561416  0.10024656  0.04451702]
iteration 91 accepted 5004.4675519135435 [-0.44424507 -0.02518264  0.32899609  0.133875  ]
iteration 92 accepted 4941.312361312003 [-0.27060076 -0.02152302  0.2414242   0.09975113]
iteration 93 accepted 4938.009444849302 [-0.26244554 -0.02137813  0.23828392  0.09856658]
iteration 9

iteration 173 accepted 3649.178832094161 [-0.10827069 -0.01069046  0.10649693  0.04447739]
iteration 174 accepted 3647.4808894585603 [-0.07799648 -0.00927522  0.08455762  0.03560569]
iteration 175 accepted 3640.0286401712933 [-0.06192933 -0.00852046  0.07280058  0.03084837]
iteration 176 accepted 3627.8416154129495 [-0.06164034 -0.00857897  0.07320847  0.03103406]
iteration 177 accepted 3605.8301555252483 [-0.14471101 -0.01255466  0.13494749  0.05604403]
iteration 178 accepted 3586.9993649219437 [-0.07677312 -0.00919675  0.08355225  0.03519445]
iteration 179 accepted 3585.4730709170603 [-0.16734202 -0.01366585  0.15185687  0.06289879]
iteration 180 accepted 3580.587458119949 [-0.00337229 -0.00500868  0.02328167  0.01059104]
iteration 181 accepted 3547.310104327967 [-0.04923467 -0.00727256  0.05870826  0.02497713]
iteration 182 accepted 3523.2570020657763 [-0.01172184 -0.00517138  0.02960358  0.01313112]
iteration 183 accepted 3507.163956621223 [-0.01076624 -0.00515304  0.0292816   0.01

iteration 263 accepted 2033.6563177218445 [-0.0400874  -0.00433889  0.04299719  0.01817938]
iteration 264 accepted 2003.6532556078748 [-0.00631245 -0.00181006  0.01338889  0.00600025]
iteration 265 accepted 1967.6097341670707 [-0.00841635 -0.00194959  0.01528634  0.00677986]
iteration 266 accepted 1907.5492568643463 [-0.00821266 -0.00188903  0.01496351  0.0066405 ]
iteration 267 accepted 1893.0204261269182 [-0.01596828 -0.00243216  0.02162963  0.00937689]
iteration 268 accepted 1873.576905251305 [ 0.00263841 -0.00096147  0.00506211  0.0025506 ]
iteration 269 accepted 1869.306480408584 [-0.02509196 -0.00308204  0.02942894  0.01257882]
iteration 270 accepted 1848.849451111478 [-0.01896882 -0.00259541  0.02401142  0.01034691]
iteration 271 accepted 1822.9449050653564 [-0.01886312 -0.00260588  0.02396667  0.010331  ]
iteration 272 accepted 1806.3041641008745 [-0.02358027 -0.00295008  0.02808244  0.01202282]
iteration 273 accepted 1788.4477612175635 [-0.0121102  -0.00204262  0.01787234  0.0

iteration 1 converged 1.4835091570821315 [ 0.00093457 -0.00034061  0.00431462  0.00213974]
ITERATION 53 

iteration 0 accepted 1.2659906810932329 [ 0.00094786 -0.00035155  0.00429074  0.00214225]
iteration 1 converged 1.2659906664150307 [ 0.00094258 -0.00033963  0.00430734  0.00213671]
ITERATION 54 

iteration 0 accepted 1.0794022898092517 [ 0.00095536 -0.00035063  0.00428394  0.00213946]
iteration 1 converged 1.0794022751501235 [ 0.00095008 -0.00033871  0.00430054  0.00213389]
ITERATION 55 

iteration 0 accepted 0.9195698029966997 [ 0.00096237 -0.00034977  0.00427759  0.00213686]
iteration 1 converged 0.9195697883555192 [ 0.00095709 -0.00033787  0.00429419  0.00213125]
ITERATION 56 

iteration 0 accepted 0.7828371567354068 [ 0.00096893 -0.00034898  0.00427165  0.00213443]
iteration 1 converged 0.7828371421112514 [ 0.00096366 -0.00033708  0.00428825  0.00212878]
ITERATION 57 

iteration 0 accepted 0.6660109044024296 [ 0.00097507 -0.00034824  0.0042661   0.00213216]
iteration 1 converge

iteration 2 converged 0.006078403940582722 [ 0.00104943 -0.00032838  0.00421048  0.00209365]
ITERATION 91 

iteration 0 accepted 0.00562088522096287 [ 0.0010553  -0.00033909  0.00419442  0.00210162]
iteration 1 accepted 0.0056208711830350025 [ 0.00105004 -0.00032747  0.00421037  0.00209628]
iteration 2 converged 0.005620871117920702 [ 0.00105007 -0.00032832  0.00420989  0.00209341]
ITERATION 92 

iteration 0 accepted 0.005227922298044222 [ 0.0010559  -0.00033903  0.00419387  0.0021014 ]
iteration 1 accepted 0.005227908262111204 [ 0.00105065 -0.0003274   0.00420982  0.00209605]
iteration 2 converged 0.005227908197004644 [ 0.00105068 -0.00032825  0.00420934  0.00209318]
ITERATION 93 

iteration 0 accepted 0.00488988739988546 [ 0.00105646 -0.00033897  0.00419335  0.00210119]
iteration 1 accepted 0.004889873365823452 [ 0.00105121 -0.00032734  0.0042093   0.00209583]
iteration 2 converged 0.00488987330072416 [ 0.00105124 -0.00032819  0.00420882  0.00209297]
ITERATION 94 

iteration 0 accept

        1    0.000    0.000    0.000    0.000 pool.py:53(RemoteTraceback)
        1    0.000    0.000    0.000    0.000 pool.py:59(ExceptionWithTraceback)
        1    0.000    0.000    0.000    0.000 pool.py:617(ApplyResult)
      916    0.003    0.000    0.012    0.000 pool.py:619(__init__)
      916    0.001    0.000    0.001    0.000 pool.py:627(ready)
      916    0.001    0.000   39.356    0.043 pool.py:634(wait)
      916    0.002    0.000   39.359    0.043 pool.py:637(get)
        1    0.000    0.000    0.000    0.000 pool.py:661(MapResult)
      916    0.003    0.000    0.015    0.000 pool.py:663(__init__)
        1    0.000    0.000    0.000    0.000 pool.py:702(IMapIterator)
        1    0.000    0.000    0.000    0.000 pool.py:76(MaybeEncodingError)
        1    0.000    0.000    0.000    0.000 pool.py:766(IMapUnorderedIterator)
        1    0.000    0.000    0.000    0.000 pool.py:780(ThreadPool)
        1    0.000    0.000    0.000    0.000 popen_fork.py:1(<module>)
     

In [1]:
#HARD CODED COST DERIVATIVES

import multiprocessing as mp
import os
import numpy as np
from ilqr.cost import QRCost, FiniteDiffCost, Cost
from ilqr.mujoco_dynamics import MujocoDynamics
from ilqr.mujoco_controller import iLQR, RecedingHorizonController
from ilqr.examples.cartpole import CartpoleDynamics
from ilqr.dynamics import constrain

from scipy.optimize import approx_fprime

import mujoco_py
from mujoco_py import MjViewer
import time


class ExactCost(Cost):

    def __init__(self,
                 l,
                 l_terminal,
                 state_size,
                 action_size,
                 x_eps=None,
                 u_eps=None,
                 use_multiprocessing = False):
        
        self._l = l
        self._l_terminal = l_terminal
        self._state_size = state_size
        self._action_size = action_size

        self._x_eps = x_eps if x_eps else np.sqrt(np.finfo(float).eps)
        self._u_eps = u_eps if u_eps else np.sqrt(np.finfo(float).eps)

        self._x_eps_hess = np.sqrt(self._x_eps)
        self._u_eps_hess = np.sqrt(self._u_eps)

        self.multiprocessing = use_multiprocessing
        if self.multiprocessing:
            self._pool = mp.Pool(initializer = ExactCost._worker_init,
                                 initargs = (l, l_terminal, state_size, action_size, x_eps, u_eps, False))

        super(ExactCost, self).__init__()

    @staticmethod
    def _worker_init(l,
                     l_terminal,
                     state_size,
                     action_size,
                     x_eps,
                     u_eps,
                     use_multiprocessing):
        """
        Initializes sims for workers in multiprocessing Pool.
        """
        global cost
        cost = ExactCost(l, l_terminal, state_size, action_size, x_eps, u_eps, use_multiprocessing)
        print("Finished loading process", os.getpid())

    @staticmethod
    def _worker(x, u, i):
        return (cost.l(x, u, i), cost.l_x(x, u, i), cost.l_u(x, u, i), cost.l_xx(x, u, i), cost.l_ux(x, u, i), cost.l_uu(x, u, i))

    def l_derivs(self, xs, us):
        if self.multiprocessing:
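            # Guard against a chunksize of 0 when there are more CPUs than time steps.
            chunksize = max(1, us.shape[0] // mp.cpu_count())
            results = self._pool.starmap(ExactCost._worker, [(xs[i], us[i], i) for i in range(us.shape[0])], chunksize = chunksize)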
            return ([result[0] for result in results],
                    [result[1] for result in results],
                    [result[2] for result in results],
                    [result[3] for result in results],
                    [result[4] for result in results],
                    [result[5] for result in results])

        L = [self.l(xs[i], us[i], i) for i in range(us.shape[0])]
        L_x = [self.l_x(xs[i], us[i], i) for i in range(us.shape[0])]
        L_u = [self.l_u(xs[i], us[i], i) for i in range(us.shape[0])]
        L_xx = [self.l_xx(xs[i], us[i], i) for i in range(us.shape[0])]
        L_ux = [self.l_ux(xs[i], us[i], i) for i in range(us.shape[0])]
        L_uu = [self.l_uu(xs[i], us[i], i) for i in range(us.shape[0])]
        return (L, L_x, L_u, L_xx, L_ux, L_uu)

    def l(self, x, u, i, terminal=False):
        """Instantaneous cost function.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            i: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            Instantaneous cost (scalar).
        """
        if terminal:
            return self._l_terminal(x, i)

        return self._l(x, u, i)

    
    def l_x(self, x, u, i, terminal=False):
        # d/dx of 2*x0^2 + 10*x1^2 + x2^2 + x3^2; the same for running and terminal cost.
        return np.array([4, 20, 2, 2]) * x

    def l_u(self, x, u, i, terminal=False):
        # The terminal cost does not depend on u.
        if terminal:
            return np.zeros(1)
        return np.array([2]) * u

    def l_xx(self, x, u, i, terminal=False):
        # Constant Hessian of the quadratic state cost.
        return np.diag([4.0, 20.0, 2.0, 2.0])

    def l_ux(self, x, u, i, terminal=False):
        # No cross terms between state and control.
        return np.zeros((1, 4))

    def l_uu(self, x, u, i, terminal=False):
        if terminal:
            return np.zeros((1, 1))
        return np.array([[2]])

def on_iteration(iteration_count, xs, us, J_opt, accepted, converged):
    info = "converged" if converged else ("accepted" if accepted else "failed")
    final_state = xs[-1]
    print("iteration", iteration_count, info, J_opt, final_state)

def run_mpc():
    xml_path = os.path.join('..', 'ilqr', 'xmls', 'inverted_pendulum.xml')
    dynamics = MujocoDynamics(xml_path, frame_skip = 2, use_multiprocessing = True)
    print(dynamics.dt)
    cost = ExactCost(lambda x, u, i: 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2,
                      lambda x, i: (2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2),
                      4, 1, use_multiprocessing = True)

    N = 100
    x0 = np.array([0.0, np.random.uniform(-np.pi, np.pi), 0.0, 0.0])
    print("hi")
    us_init = np.random.uniform(-1, 1, (N, dynamics.action_size))
    ilqr = iLQR(dynamics, cost, N)
    mpc = RecedingHorizonController(x0, ilqr)
    t0 = time.time()
    controls = mpc.control(us_init, initial_n_iterations = 500, subsequent_n_iterations = 100, on_iteration = on_iteration)
    us = []
    print("hi 2")
    for i in range(100):
        print('ITERATION', i, '\n')
        us.append(next(controls)[1])

    print('time', time.time() - t0)
    
import cProfile
cProfile.run('run_mpc()')


Choosing the latest nvidia driver: /usr/lib/nvidia-418, among ['/usr/lib/nvidia-375', '/usr/lib/nvidia-418']
0.04
Finished loading process 22194
Finished loading process 22179
Finished loading process 22195
Finished loading process 22181
Finished loading process 22183
Finished loading process 22205
Finished loading process 22198
Finished loading process 22210
Finished loading process 22189
Finished loading process 22230
Finished loading process 22247
Finished loading process 22219
Finished loading process 22187
Finished loading process 22182
Finished loading process 22180
Finished loading process 22184
Finished loading process 22289
Finished loading process 22185
Finished loading process 22260
Finished loading process 22186
Finished loading process 22354
Finished loading process 22188
Finished loading process 22377
Finished loading process 22190
hi
hi 2
ITERATION 0 

iteration 0

iteration 84 accepted 7990.907961611437 [-1.04488493  0.4281073   1.48637618  0.61724475]
iteration 85 accepted 7963.024842656578 [-0.57016279  0.55437012  2.25223366  0.98164213]
iteration 86 accepted 7935.550975180697 [-0.6359047   0.53034286  2.16467172  0.95494381]
iteration 87 accepted 7931.792701396215 [-1.46533949  0.26649357  0.61276705  0.23018241]
iteration 88 accepted 7898.376279158348 [-0.97215628  0.40540994  1.65414649  0.67953455]
iteration 89 accepted 7893.48447369845 [-0.42164861  0.573588    2.60427238  1.1339907 ]
iteration 90 accepted 7845.512222697924 [-0.36098384  0.57915266  2.73674781  1.24698439]
iteration 91 accepted 7816.327921638666 [-0.08121191  0.5487579   2.62603166  1.26692339]
iteration 92 accepted 7783.280016040915 [-0.21042463  0.5091193   2.50986442  1.19948962]
iteration 93 accepted 7757.632916431641 [-0.67520688  0.38710533  1.82380328  0.8387513 ]
iteration 94 accepted 7725.919634648466 [-0.60423558  0.38076318  1.84863754  0.84052007]
iteration 9

iteration 175 accepted 5698.017284396503 [-0.33661733 -0.01147574  0.39813735  0.16430285]
iteration 176 accepted 5648.341731193549 [-0.33543224 -0.0117871   0.38703452  0.15962024]
iteration 177 accepted 5611.37376005612 [-0.31002831 -0.01053756  0.35442887  0.14611095]
iteration 178 accepted 5604.076290287302 [ 0.23140002 -0.00825991  0.26692693  0.11976634]
iteration 179 accepted 5571.101288132495 [-0.25539643 -0.01174146  0.34785819  0.14454942]
iteration 180 accepted 5544.3285166077685 [-0.36502375 -0.01195856  0.35480197  0.14526249]
iteration 181 accepted 5521.831150458379 [-0.49854083 -0.01205268  0.35192085  0.14127833]
iteration 182 accepted 5459.841768312232 [-0.22175041 -0.01148502  0.30470001  0.12680603]
iteration 183 accepted 5436.230295303647 [-0.35600785 -0.0119059   0.32285263  0.13181087]
iteration 184 accepted 5432.756748660081 [-0.23912052 -0.01192445  0.29942449  0.12423582]
iteration 185 accepted 5407.719388147258 [ 0.16036259 -0.01313745  0.26130271  0.11636229]

iteration 266 accepted 3151.53027715738 [-0.05929583 -0.00590892  0.06579703  0.02766271]
iteration 267 accepted 3127.938981943394 [ 0.01618097 -0.00259949  0.01146299  0.00572011]
iteration 268 accepted 3113.9070533169374 [ 0.00929201 -0.00286921  0.01614476  0.00760143]
iteration 269 accepted 3098.2490164398087 [ 3.29446149e-02 -2.01577342e-03 -9.71783370e-05  1.07785310e-03]
iteration 270 accepted 3066.810223632946 [-0.02809281 -0.00459963  0.04304004  0.01847129]
iteration 271 accepted 3048.5309818461046 [-0.00959234 -0.00371097  0.02911903  0.01282812]
iteration 272 accepted 3026.6944962036027 [ 0.02026982 -0.00214286  0.00632332  0.0035676 ]
iteration 273 accepted 3026.597871676429 [ 0.08584244  0.00125758 -0.0439104  -0.01683882]
iteration 274 accepted 3001.489444897201 [ 0.07618036  0.00030948 -0.03445304 -0.01291186]
iteration 275 accepted 3000.948191873484 [-0.0441243  -0.00544579  0.05428494  0.02301955]
iteration 276 accepted 2972.4570442847416 [-0.01531202 -0.0040243   0.0

iteration 358 accepted 2087.7920035564343 [ 0.02559783  0.00049574 -0.01299933 -0.00482179]
iteration 359 accepted 2085.806168739805 [ 0.02539629  0.00048165 -0.01283772 -0.00475564]
iteration 360 accepted 2082.5762716126605 [ 0.02308741  0.00033611 -0.01096517 -0.00399034]
iteration 361 accepted 2077.4139893910815 [ 0.02383852  0.00037879 -0.01157608 -0.0042396 ]
iteration 362 accepted 2074.5314225449365 [ 1.12132260e-05 -1.21108420e-03  8.41931847e-03  3.95294234e-03]
iteration 363 accepted 2065.53233876673 [ 0.01485599 -0.00023326 -0.00406083 -0.0011596 ]
iteration 364 accepted 2062.019150781511 [ 0.02043744  0.00015396 -0.00880922 -0.00310764]
iteration 365 accepted 2060.5500533080935 [ 0.02154705  0.00023974 -0.00978438 -0.00350907]
iteration 366 accepted 2060.101136817024 [ 0.02253111  0.00032564 -0.01068495 -0.00388127]
iteration 367 accepted 2059.841907872292 [ 0.02272624  0.00034453 -0.0108701  -0.00395805]
iteration 368 accepted 2059.7733097999185 [ 0.02294688  0.00036786 -0.

iteration 1 converged 0.46825827031703937 [ 0.00160996 -0.00026959  0.00367953  0.00187634]
ITERATION 57 

iteration 0 accepted 0.42023125682958135 [ 0.00157948 -0.00028387  0.0036977   0.00189835]
iteration 1 converged 0.4202312443275178 [ 0.00157434 -0.00027311  0.00371332  0.00189035]
ITERATION 58 

iteration 0 accepted 0.3789590352007507 [ 0.00154618 -0.00028725  0.00372926  0.00191134]
iteration 1 converged 0.3789590225790362 [ 0.00154103 -0.00027643  0.00374495  0.00190346]
ITERATION 59 

iteration 0 accepted 0.343304301591799 [ 0.00151503 -0.00029043  0.00375879  0.00192351]
iteration 1 converged 0.3433042888581584 [ 0.00150987 -0.00027955  0.00377454  0.00191572]
ITERATION 60 

iteration 0 accepted 0.31231023372279376 [ 0.00148589 -0.00029342  0.00378642  0.00193488]
iteration 1 converged 0.3123102208844823 [ 0.00148073 -0.00028248  0.00380223  0.0019272 ]
ITERATION 61 

iteration 0 accepted 0.2851777656192508 [ 0.00145864 -0.00029623  0.00381227  0.00194553]
iteration 1 conver

iteration 2 converged 0.010992053643704945 [ 0.00109503 -0.00032352  0.00416798  0.002076  ]
ITERATION 98 

iteration 0 accepted 0.009971573964911657 [ 0.00109794 -0.00033447  0.00415471  0.00208528]
iteration 1 accepted 0.009971560082056977 [ 0.0010927  -0.00032292  0.0041706   0.00207978]
iteration 2 converged 0.009971560016235939 [ 0.00109273 -0.00032377  0.00417014  0.00207689]
ITERATION 99 

iteration 0 accepted 0.009059737757668915 [ 0.0010958  -0.00033471  0.00415673  0.00208611]
iteration 1 accepted 0.009059723867545891 [ 0.00109055 -0.00032315  0.00417262  0.00208062]
iteration 2 converged 0.009059723801697029 [ 0.00109058 -0.000324    0.00417215  0.00207773]
time 20.652040004730225
         7310577 function calls (7296485 primitive calls) in 20.861 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      535    0.001    0.000    0.012    0.000 <__array_function__ internals>:2(amax)
      535    0.001    0.000    0.00
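
Since the derivatives in `ExactCost` are written out by hand, it may be worth checking them against a finite-difference estimate before relying on them. The sketch below does this with `scipy.optimize.approx_fprime` (already imported above) for the running cost only; the test point and the helper name `stage_cost` are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.optimize import approx_fprime

def stage_cost(x, u):
    # Same running cost as the lambda passed to ExactCost.
    return 2 * (x[0] ** 2) + 10 * (x[1] ** 2) + x[2] ** 2 + x[3] ** 2 + u[0] ** 2

# Step size matching the default used for x_eps / u_eps in ExactCost.
eps = np.sqrt(np.finfo(float).eps)

# Arbitrary test point (illustrative values).
x = np.array([0.3, -1.2, 0.5, -0.7])
u = np.array([0.4])

# Hand-written gradients, as in ExactCost.l_x and ExactCost.l_u.
l_x_exact = np.array([4, 20, 2, 2]) * x
l_u_exact = np.array([2]) * u

# Forward-difference estimates of the same gradients.
l_x_fd = approx_fprime(x, lambda x_: stage_cost(x_, u), eps)
l_u_fd = approx_fprime(u, lambda u_: stage_cost(x, u_), eps)

print("max |l_x error|:", np.max(np.abs(l_x_fd - l_x_exact)))
print("max |l_u error|:", np.max(np.abs(l_u_fd - l_u_exact)))
```

Small discrepancies are expected from the forward-difference approximation; a large one would point to a mistake in the hand-derived expressions.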

In [2]:
dynamics.sim.reset()
dynamics.set_state(x0)
video = []
for i in range(len(us)):
    dynamics.step(us[i])
    video.append(dynamics.sim.render(512, 512))
    

NameError: name 'dynamics' is not defined

In [6]:
from ilqr.utils.visualization import make_video_fn
%load_ext autoreload
%autoreload 2
make_video_fn(video)()