In [1]:
# activate project environment
# include these lines of code in any future scripts/notebooks
#---
import Pkg
if !haskey(Pkg.installed(), "AA228FinalProject")
    jenv = joinpath(dirname(@__FILE__()), "..") # this assumes the notebook is one directory below
    # the Project.toml file, which should be in the top-level dir of the project.
    # Change accordingly if this is not the case.
    Pkg.activate(jenv)
end
#---

# import necessary packages
using AA228FinalProject
using POMDPs
using POMDPPolicies
using BeliefUpdaters
using ParticleFilters
using POMDPSimulators
using Cairo
using Gtk
using Random
using Printf
using POMDPModels
using BasicPOMCP
using QMDP
using ARDESPOT

┌ Info: Loading Cairo backend into Compose.jl
└ @ Compose /Users/adamthorne/.julia/packages/Compose/BYWXX/src/Compose.jl:161
│ - If you have Compose checked out for development and have
│   added Cairo as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with Compose
└ @ nothing nothing:840


In [2]:
sensor = Bumper() # or Lidar() for the lidar version of the environment
config = 3 # 1,2, or 3
vlist = [5.0]
omlist = [-0.5,0,0.5]
aspace = vec(collect(RoombaAct(v, om) for v in vlist, om in omlist))

num_x_pts = 100
num_y_pts = 100
num_th_pts = 16


m = RoombaPOMDP(sensor=sensor, mdp=RoombaMDP(config=config, aspace=aspace))

RoombaPOMDP{Bumper,Bool}(Bumper(), RoombaMDP{ContinuousRoombaStateSpace,Array{RoombaAct,1}}
  v_max: Float64 10.0
  om_max: Float64 1.0
  dt: Float64 0.5
  contact_pen: Float64 -1.0
  time_pen: Float64 -0.1
  goal_reward: Float64 10.0
  stairs_penalty: Float64 -10.0
  config: Int64 3
  room: AA228FinalProject.Room
  sspace: ContinuousRoombaStateSpace ContinuousRoombaStateSpace()
  aspace: Array{RoombaAct}((3,))
  _amap: Dict{RoombaAct,Int64}
)

In [3]:
num_particles = 2000
resampler = BumperResampler(num_particles)

spf = SimpleParticleFilter(m, resampler)

v_noise_coefficient = 2.0
om_noise_coefficient = 0.5

belief_updater = RoombaParticleFilter(spf, v_noise_coefficient, om_noise_coefficient);

### Define a DESPOT Planner

In [4]:
# initialize a solver and compute a policy
solver = DESPOTSolver(bounds=(-10.0, 10.0))
planner = solve(solver, m)

DESPOTPlanner{RoombaPOMDP{Bumper,Bool},Tuple{Float64,Float64},MemorizingSource{MersenneTwister},MersenneTwister}(DESPOTSolver
  epsilon_0: Float64 0.0
  xi: Float64 0.95
  K: Int64 500
  D: Int64 90
  lambda: Float64 0.01
  T_max: Float64 1.0
  max_trials: Int64 9223372036854775807
  bounds: Tuple{Float64,Float64}
  default_action: ExceptionRethrow ExceptionRethrow()
  rng: MersenneTwister
  random_source: MemorizingSource{MersenneTwister}
, RoombaPOMDP{Bumper,Bool}(Bumper(), RoombaMDP{ContinuousRoombaStateSpace,Array{RoombaAct,1}}
  v_max: Float64 10.0
  om_max: Float64 1.0
  dt: Float64 0.5
  contact_pen: Float64 -1.0
  time_pen: Float64 -0.1
  goal_reward: Float64 10.0
  stairs_penalty: Float64 -10.0
  config: Int64 3
  room: AA228FinalProject.Room
  sspace: ContinuousRoombaStateSpace ContinuousRoombaStateSpace()
  aspace: Array{RoombaAct}((3,))
  _amap: Dict{RoombaAct,Int64}
), (-10.0, 10.0), MemorizingSource{MersenneTwister}(MersenneTwister(UInt32[0x594ccccc], Random.DSFMT.DSFMT_s

### Define a policy

Here we demonstrate how to wrap the planner in a custom policy that attempts to navigate the Roomba to the goal. The policy we define here checks whether the first particle of the belief is in contact with a wall; if so, it returns a fixed turning action, and otherwise it returns the action that the DESPOT planner selects for the current belief.

First we create a struct that subtypes the Policy abstract type, defined in the package ```POMDPPolicies.jl```. Here, we can also define certain parameters, such as a variable tracking the current time-step.

Next, we define a function that can take in our policy and the belief state and return the desired action. We do this by defining a new ```POMDPs.action``` function that will work with our policy. 

In [5]:
# Define the policy to test
mutable struct ToEnd <: Policy
    ts::Int64 # to track the current time-step.
end

# extract goal for heuristic controller
goal_xy = get_goal_xy(m)
print(goal_xy)

# define a new function that takes in the policy struct and current belief,
# and returns the desired action
function POMDPs.action(p::ToEnd, b::ParticleCollection{RoombaState})
    p.ts += 1
#     if length(particles(b)) == 0
#         return action(QMDPPolicy, uniform_belief(m))
#     end
    if AA228FinalProject.wall_contact(m,particles(b)[1])
        return RoombaAct(3.0,-pi)
    end

    a = action(planner, b)
    return a
end

[15.0, 0.0]

### Simulation and rendering

Here, we will demonstrate how to seed the environment, run a simulation, and render the simulation. To render the simulation, we use the ```Gtk``` package. 

The simulation is carried out using the ```stepthrough``` function defined in the package ```POMDPSimulators.jl```. During a simulation, a window will open that renders the scene. It may be hidden behind other windows on your desktop.
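
As a minimal sketch of why seeding matters, the snippet below uses only the `Random` standard library and is independent of the Roomba environment. Re-seeding the global random number generator with the same value reproduces the same sequence of draws, which is what makes a simulation run repeatable. The variable names are illustrative and do not appear elsewhere in this notebook.

```julia
using Random

# seed the global RNG and draw three numbers
Random.seed!(2)
first_draw = rand(3)

# re-seed with the same value and draw again
Random.seed!(2)
second_draw = rand(3)

# the two draws are identical because the seed is the same
println(first_draw == second_draw)
```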

In [None]:
# first seed the environment
Random.seed!(2)

# reset the policy
p = ToEnd(0) # here, the argument sets the time-steps elapsed to 0

# run the simulation
c = @GtkCanvas()
win = GtkWindow(c, "Roomba Environment", 600, 600)
for (t, step) in enumerate(stepthrough(m, p, belief_updater, max_steps=100))
    @guarded draw(c) do widget
        
        # the following lines render the room, the particles, and the roomba
        ctx = getgc(c)
        set_source_rgb(ctx,1,1,1)
        paint(ctx)
        render(ctx, m, step)
        
        # render some information that can help with debugging
        # here, we render the time-step, the state, and the observation
        move_to(ctx,300,400)
        # the bumper observation is a Bool, so it is formatted as a string
        show_text(ctx, @sprintf("t=%d, state=%s, o=%s",t,string(step.s),string(step.o)))
    end
    show(c)
    sleep(0.1) # to slow down the simulation
end

### Evaluation 

Here, we demonstrate a simple evaluation of the policy's performance over a set of random seeds. This is meant to serve only as an example, and we encourage you to develop your own evaluation metrics.

We initialize the robot using 100 different random seeds and simulate each trial for at most 100 time-steps, stopping early if the robot reaches the goal. We count the trials that reach the goal and record the total reward of each of those trials.
Finally, we report the success rate, the mean total reward over the successful trials, and a score equal to the squared success rate times that mean. The cell does not compute a standard error; if you add one, it is the standard deviation of a sample set divided by the square root of the number of samples, and it represents the uncertainty in the estimate of the mean value.
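
As a minimal sketch of the standard error calculation described above, the snippet below uses only the `Statistics` standard library. The reward values are made up for illustration and are not results from this notebook, and `standard_error` is a hypothetical helper name.

```julia
using Statistics

# hypothetical per-trial total rewards, for illustration only
example_rewards = [2.0, 4.0, 4.0, 6.0]

# hypothetical helper: sample standard deviation divided by sqrt(n)
standard_error(x) = std(x) / sqrt(length(x))

mean_reward = mean(example_rewards)
se_reward = standard_error(example_rewards)
println("mean = ", mean_reward, ", standard error = ", se_reward)
```

Note that `std` uses the sample (n-1) normalization by default, which is the usual choice when estimating the standard error from a finite set of trials.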

In [6]:
using Statistics

total_rewards = Float64[] # total reward of each trial that reached the goal
num_success = 0
num_seeds = 100
skip = 0 # seed offset, incremented whenever a trial throws an exception

for trial = 1:num_seeds
    try
        println(trial)

        Random.seed!(trial + skip)

        p = ToEnd(0)
        traj_rewards = 0.0
        for step in stepthrough(m, p, belief_updater, max_steps=100)
            traj_rewards += step.r
            # a step reward above 5 can only come from the goal reward (10.0)
            if step.r > 5
                println("reached goal")
                num_success += 1
                push!(total_rewards, traj_rewards)
                break
            end
        end
    catch ex
        # a trial that throws is counted as unsuccessful;
        # later trials then use seeds shifted by one
        skip += 1
    end
end

success_rate = num_success / num_seeds
mtr = mean(total_rewards) # mean over successful trials only
score = success_rate * success_rate * mtr
@printf("Percent that reached goal: %.3f%%\n", success_rate*100)
@printf("Mean Total Reward: %.3f\n", mtr)
@printf("Score: %.3f", score)

1
2
reached goal
3
4
5
6
7
8
9
10
reached goal
11
12
13
14
15
reached goal
16
17
18
reached goal
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
reached goal
35
36
37
38
39
40
41
42
43
reached goal
44
45
46
reached goal
47
48
49
50
51
52
53
54
reached goal
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
reached goal
77
reached goal
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
reached goal
97
98
99
100
Percent that reached goal: 11.000%
Mean Total Reward: 7.482
Score: 0.091