### Mountain Car With an Explicit Policy

The explicit policy used for this Mountain Car problem is as follows:<br />
In each state, check the direction of the car's current velocity, then take the action that accelerates the car in that same direction. Of the available actions, this one should give the car the most momentum it can gain in that state.
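
The rule above can be tried without the Gym dependency. The sketch below is not part of the original notebook: the helper names (`explicit_policy`, `step`, `simulate`) are hypothetical, and the constants and update rule are written from my understanding of Gym's `MountainCar-v0` source, so treat them as assumptions rather than an authoritative reimplementation.

```python
import math

# Constants assumed from Gym's MountainCar-v0 source (not taken from this notebook).
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
FORCE, GRAVITY = 0.001, 0.0025


def explicit_policy(velocity):
    """Push left (0) when moving left, otherwise push right (2)."""
    return 0 if velocity < 0 else 2


def step(position, velocity, action):
    """Advance the assumed dynamics by one time step."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops when it hits the left wall
    return position, velocity


def simulate(start_position=-0.5, max_steps=200):
    """Return the number of steps needed to reach the goal, or None."""
    position, velocity = start_position, 0.0
    for t in range(1, max_steps + 1):
        action = explicit_policy(velocity)
        position, velocity = step(position, velocity, action)
        if position >= GOAL_POSITION:
            return t
    return None


steps_needed = simulate()
print(steps_needed)
```

Keeping the policy as a function of velocity alone makes it clear that the rule never looks at the position: the car simply pushes in whichever direction it is already moving.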

In [1]:
import numpy as np
import gym

In [2]:
# get mountain car environment
env = gym.make("MountainCar-v0")
# goal: car reaches the flag on top of the mountain on the right side
# action: 0 (accelerate to the left), 1 (no acceleration), 2 (accelerate to the right)
# observation (state): [position (initial value: random in [-0.6, -0.4]), velocity (initial value: 0.0)]

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.


In [3]:
def run_episode():
    total_reward = 0
    time_step = 0
    observation = env.reset()
    for time_step in range(200):
        env.render()
        # get position and velocity from observation (state)
        position, velocity = observation
        # apply the explicit policy
        if velocity < 0:
            action = 0
        else:
            action = 2
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, time_step + 1

In [4]:
# test episode runs
for episode in range(20):
    total_reward, time_step = run_episode()
    print(f"Episode:{episode:2d}, Reward:{total_reward:3.2f}, Timestep:{time_step:3d}")

Episode: 0, Reward:-118.00, Timestep:118
Episode: 1, Reward:-114.00, Timestep:114
Episode: 2, Reward:-121.00, Timestep:121
Episode: 3, Reward:-123.00, Timestep:123
Episode: 4, Reward:-121.00, Timestep:121
Episode: 5, Reward:-114.00, Timestep:114
Episode: 6, Reward:-119.00, Timestep:119
Episode: 7, Reward:-121.00, Timestep:121
Episode: 8, Reward:-121.00, Timestep:121
Episode: 9, Reward:-121.00, Timestep:121
Episode:10, Reward:-115.00, Timestep:115
Episode:11, Reward:-121.00, Timestep:121
Episode:12, Reward:-116.00, Timestep:116
Episode:13, Reward:-123.00, Timestep:123
Episode:14, Reward:-122.00, Timestep:122
Episode:15, Reward:-114.00, Timestep:114
Episode:16, Reward:-121.00, Timestep:121
Episode:17, Reward:-124.00, Timestep:124
Episode:18, Reward:-122.00, Timestep:122
Episode:19, Reward:-121.00, Timestep:121
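
The 20 runs above can be summarized in a few lines of standard-library Python. This snippet is an addition, not one of the notebook's cells; the list is copied by hand from the printed Timestep column.

```python
from statistics import mean

# Timestep column copied from the 20 episodes printed above.
timesteps = [118, 114, 121, 123, 121, 114, 119, 121, 121, 121,
             115, 121, 116, 123, 122, 114, 121, 124, 122, 121]

shortest, longest, average = min(timesteps), max(timesteps), mean(timesteps)
print(f"min={shortest}, max={longest}, mean={average:.1f}")
```

Every episode ends well before the 200-step limit used in `run_episode`, so the car reached the flag in all 20 runs.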


In [5]:
# close mountain car environment
env.close()