Trinh Pham and Angel

# Introduction

AWS DeepRacer is a cloud-based machine learning racing simulator in which a model learns to complete a given race track, guided by the user's reward function. For our first few models, we varied the training time (1hr, 30mins, 3hrs) to see whether it has a significant impact on the model. To our surprise, training time had little to no impact on how well the model performed.

Throughout training, we stuck with one racetrack, the DBRO Raceway track. This track has obstacles of its own that the model needs to get past, such as sudden wide curves. With each model we trained, we made considerable progress on completion time.

---



# First Model
In the first model, we used the standard 'stay within borders' reward function without adjusting the given code and simply trained the model for 1 hour. With the first model we achieved rank *211* and a time of *2:17*.

---



# Second Model

In the second model, since we were on a time crunch, we trained for only 10 minutes. The model's poor performance on the track reflects the short training time. Unlike the other models we trained, this one went off track more frequently and accelerated more slowly. The model had a completion time of 05:29.

Since we chose the 'stay within borders' reward function, the model is rewarded whenever it stays within the track borders.



In [None]:
def reward_function(params):
    # Example of rewarding the agent to stay inside the two borders of the track

    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']

    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and
    # the agent is somewhere in between the track borders
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)
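
To make the border condition above concrete, here is a minimal sketch that evaluates the same check for a few car positions. The helper names `border_margin` and `stays_inside`, and the 1.0 m track width, are hypothetical and chosen only for illustration.

```python
def border_margin(track_width, distance_from_center):
    """Distance left between the car's center and the nearest track border."""
    return 0.5 * track_width - distance_from_center


def stays_inside(all_wheels_on_track, track_width, distance_from_center,
                 min_margin=0.05):
    """Same condition the reward function uses to grant the high reward."""
    margin = border_margin(track_width, distance_from_center)
    return bool(all_wheels_on_track and margin >= min_margin)


TRACK_WIDTH = 1.0  # hypothetical track width, for illustration only

for distance in (0.0, 0.3, 0.44, 0.46):
    inside = stays_inside(True, TRACK_WIDTH, distance)
    reward = 1.0 if inside else 1e-3
    print(f"distance_from_center={distance:.2f} -> reward={reward}")
```

The reward is all-or-nothing: a car hugging the border earns the same 1.0 as a car on the center line, as long as it keeps the 0.05 margin.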


# Third Model
With our third model, we tried different reward functions: the pre-coded 'prevent zig-zag' reward function and one additional reward function to increase the model's speed. We set the max speed of the agent to 4 to compensate for its slow start. Although we adjusted the speed, we still needed more trial and error to find the optimal max speed. This model had a completion time of 2:20.

In [None]:
MAX_SPEED = 4  # max speed of the agent


def reward_function(params):
    # Example of penalizing steering, which helps mitigate zig-zag behaviors,
    # combined with a reward for driving fast

    # Read input parameters
    speed = params['speed']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle'])  # only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 5.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.3
    else:
        reward = 1e-10  # likely crashed / close to off track

    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    # Speed reward: the squared ratio of current speed to MAX_SPEED.
    # Adding it to the position reward is an assumption about how the
    # two rewards are meant to combine.
    speed_rate = speed / MAX_SPEED
    reward += speed_rate ** 2

    return float(reward)
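
The speed term used in this model squares the ratio of the current speed to `MAX_SPEED`. A minimal sketch of how that term behaves on its own is shown below; the helper name `speed_reward` is hypothetical, and the sample speeds are chosen only for illustration.

```python
MAX_SPEED = 4  # same cap as in the third model


def speed_reward(speed, max_speed=MAX_SPEED):
    """Squared ratio of current speed to the maximum speed."""
    speed_rate = speed / max_speed
    return speed_rate ** 2


for speed in (1, 2, 3, 4):
    print(f"speed={speed} -> speed reward={speed_reward(speed):.4f}")
```

Because the ratio is squared, half of the max speed earns only a quarter of the full speed reward, so the term favors driving close to the cap much more strongly than a linear ratio would.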




---



# Fourth Model
The fourth model is almost identical to the third model, but with a longer training time (3hrs), an added progress reward, and a different reward approach (follow the center line). With a longer training time and a new reward system, we hoped to get better lap times.

While evaluating the fourth model's performance, I realized that it wasn't doing better than our first model, where we didn't implement any additional rewards. Within 10 minutes, the model was able to complete the track with ease; however, its speed was still a bit slower than expected. With our fourth model we achieved a completion time of 02:39.787.
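
To compare the four models directly, the completion times reported above can be converted to seconds. This is a small sketch; the helper name `lap_seconds` is hypothetical, and the times are the ones listed in each model's section.

```python
def lap_seconds(lap_time):
    """Convert a 'minutes:seconds' lap time string to seconds."""
    minutes, seconds = lap_time.split(":")
    return int(minutes) * 60 + float(seconds)


# Completion times as reported for each model
lap_times = {
    "first": "2:17",
    "second": "05:29",
    "third": "2:20",
    "fourth": "02:39.787",
}

# Print the models from fastest to slowest
for model, lap in sorted(lap_times.items(), key=lambda item: lap_seconds(item[1])):
    print(f"{model:>6} model: {lap_seconds(lap):8.3f} s")
```

In seconds, the fourth model's lap is roughly 22.8 s slower than the first model's, even though it trained for three times as long.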

---



In [None]:
MAX_SPEED = 4  # max speed of the agent, used to normalize the speed reward


def reward_function(params):
    # Example of rewarding the agent to follow center line,
    # with added speed and progress rewards

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    speed = params['speed']
    progress = params['progress']
    steps = params['steps']

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    # A speed reward that penalizes the model when the speed is slow
    speed_rate = speed / MAX_SPEED
    reward += speed_rate ** 2

    # The model is rewarded based on its progress: the further it gets along
    # the track in fewer steps, the higher the reward
    if steps > 0:  # avoid dividing by zero on the very first step
        reward += progress / steps

    # Summing the three terms is an assumption about how they are meant to combine
    return float(reward)
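
The progress term in this model divides `progress` by `steps`. A minimal sketch of that ratio is shown below; the helper name `progress_per_step` and the sample values are hypothetical. In DeepRacer, `progress` is the percentage of the track completed and `steps` is the number of steps taken so far.

```python
def progress_per_step(progress, steps):
    """Percentage of track completed per step taken; 0.0 before any step."""
    if steps <= 0:
        return 0.0
    return progress / steps


# Two hypothetical runs that both reach the halfway point of the track
fast_run = progress_per_step(progress=50.0, steps=100)
slow_run = progress_per_step(progress=50.0, steps=200)

print(f"fast run: {fast_run}")
print(f"slow run: {slow_run}")
```

The run that reaches the same point in half as many steps earns twice the progress reward, which is what pushes the model toward faster laps.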


# Conclusion

As we implemented new reward functions and increased the training time, we realized that neither made a substantial difference. For significant change to occur, we must dive deeper into the model's overall code and analyze the track. If we can implement a way to foresee the track's layout, then we can reward the model better.
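
One possible way to take the upcoming track layout into account is to use the `waypoints`, `closest_waypoints`, and `heading` input parameters that DeepRacer passes to the reward function. The sketch below is not something we trained; it is an untested idea. It rewards the car when its heading lines up with the direction toward a waypoint a few positions ahead. The function name `lookahead_reward` and the `look_ahead` and `max_diff_deg` settings are hypothetical.

```python
import math


def lookahead_reward(params, look_ahead=3, max_diff_deg=10.0):
    # look_ahead and max_diff_deg are hypothetical tuning knobs
    waypoints = params['waypoints']
    prev_index, next_index = params['closest_waypoints']
    heading = params['heading']  # car heading in degrees

    # Pick a waypoint a few positions ahead, wrapping around the lap
    target_index = (next_index + look_ahead) % len(waypoints)
    start_x, start_y = waypoints[prev_index]
    target_x, target_y = waypoints[target_index]

    # Direction from the last passed waypoint to the look-ahead waypoint
    track_direction = math.degrees(
        math.atan2(target_y - start_y, target_x - start_x))

    # Smallest angle between the track direction and the car's heading
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Halve the reward if the car points too far away from where the track goes
    reward = 1.0
    if direction_diff > max_diff_deg:
        reward *= 0.5
    return float(reward)


# Hypothetical straight stretch of track along the x-axis
straight = [(float(i), 0.0) for i in range(10)]
aligned = {'waypoints': straight, 'closest_waypoints': [0, 1], 'heading': 0.0}
sideways = {'waypoints': straight, 'closest_waypoints': [0, 1], 'heading': 90.0}
print(lookahead_reward(aligned), lookahead_reward(sideways))
```

A larger `look_ahead` would make the car start turning earlier before a curve, which might help with the sudden wide curves on this track, but the right value would need the same trial and error as the max speed.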