[smartcab] Incomplete reward system #185

pedropb · 2017-01-15T01:41:01Z

While working on this project I detected a small flaw in the reward system.

        # Agent wants to perform no action:
        elif action == None:
            if light == 'green' and inputs['oncoming'] != 'left': # No oncoming traffic
                violation = 1 # Minor violation

        # (...)

        # Did the agent attempt a valid move?
        if violation == 0:
            if action == agent.get_next_waypoint(): # Was it the correct action?
                reward += 2 - penalty # (2, 1)
            elif action == None and light != 'green': # Was the agent stuck at a red light?
                reward += 2 - penalty # (2, 1)
            else: # Valid but incorrect
                reward += 1 - penalty # (1, 0)

This doesn't cover the case where the agent wants to turn left on a green light and the oncoming traffic is going forward or right. The optimal policy would be to take no action (None), but the first check flags idling on a green light as a minor violation, and even without that check, None, right and forward would all get the same reward: the "Valid but incorrect" case.
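To make the gap concrete, here is a minimal sketch of the branches quoted above as a standalone function. The name `classify_current` and the string labels are hypothetical, only the idle check shown in the issue is modelled (the elided `(...)` checks are skipped), and `waypoint` stands in for `agent.get_next_waypoint()`.

```python
def classify_current(action, waypoint, light, oncoming):
    """Return a label for the branch the quoted code would take.

    Hypothetical helper: models only the idle check and the reward
    branches quoted in this issue, not the full environment.
    """
    violation = 0

    # Agent wants to perform no action
    if action is None:
        if light == 'green' and oncoming != 'left':
            violation = 1  # Minor violation

    if violation != 0:
        return 'minor violation'

    # Did the agent attempt a valid move?
    if action == waypoint:
        return 'correct'              # reward += 2 - penalty
    elif action is None and light != 'green':
        return 'stuck at red'         # reward += 2 - penalty
    else:
        return 'valid but incorrect'  # reward += 1 - penalty


# Agent wants to turn left on green while oncoming traffic goes forward.
for action in (None, 'right', 'forward'):
    print(action, '->', classify_current(action, 'left', 'green', 'forward'))
```

Under these assumptions, waiting is never rewarded as the right choice in this situation, while turning right or going forward is merely "valid but incorrect".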

The solution would be to expand:

        # Agent wants to perform no action:
        elif action == None:
            if light == 'green' and inputs['oncoming'] != 'left': # No oncoming traffic
                violation = 1 # Minor violation

to (assuming waypoint holds agent.get_next_waypoint()):

        # Agent wants to perform no action:
        elif action == None:
            if light == 'green' and inputs['oncoming'] != 'left' and waypoint != 'left': # No oncoming traffic and not waiting to turn left
                violation = 1 # Minor violation

and expand:

            elif action == None and light != 'green': # Was the agent stuck at a red light?
                reward += 2 - penalty # (2, 1)

to:

            elif action == None and light != 'green': # Was the agent stuck at a red light?
                reward += 2 - penalty # (2, 1)
            elif action == None and light == 'green' and inputs['oncoming'] in ['forward', 'right']: # Was the agent yielding to oncoming traffic?
                reward += 2 - penalty # (2, 1)
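The intended behaviour after both changes can be sketched as a standalone function. This is an illustration rather than the project's code: `classify_proposed` and its labels are hypothetical, `waypoint` stands in for `agent.get_next_waypoint()`, the elided violation checks are skipped, and the idle check is written so that waiting on green is exempt whenever the agent wants to turn left.

```python
def classify_proposed(action, waypoint, light, oncoming):
    """Return a label for the branch taken under the proposed changes.

    Hypothetical helper: a sketch of the intent described in this issue,
    not the environment's actual implementation.
    """
    violation = 0

    # Agent wants to perform no action
    if action is None:
        # Idling on green is tolerated if oncoming traffic turns left
        # (as before) or if the agent itself is waiting to turn left.
        exempt = oncoming == 'left' or waypoint == 'left'
        if light == 'green' and not exempt:
            violation = 1  # Minor violation

    if violation != 0:
        return 'minor violation'

    # Did the agent attempt a valid move?
    if action == waypoint:
        return 'correct'                    # reward += 2 - penalty
    elif action is None and light != 'green':
        return 'stuck at red'               # reward += 2 - penalty
    elif action is None and light == 'green' and oncoming in ('forward', 'right'):
        return 'yielding before left turn'  # reward += 2 - penalty
    else:
        return 'valid but incorrect'        # reward += 1 - penalty


# Same situation as in the issue: waypoint is left, oncoming goes forward.
for action in (None, 'right', 'forward'):
    print(action, '->', classify_proposed(action, 'left', 'green', 'forward'))
```

In this sketch, waiting with a left waypoint and no oncoming traffic still falls through to "valid but incorrect", so the full reward is reserved for the case where yielding is actually necessary.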

pedropb added a commit to pedropb/machine-learning that referenced this issue Jan 15, 2017

Fixed udacity#185

f851b61

pedropb mentioned this issue Jan 15, 2017

Fixes #159, #178, #184 and #185 #179

Closed

pedropb added a commit to pedropb/machine-learning that referenced this issue Jan 16, 2017

Fixed udacity#185

76906ba

adarsh0806 closed this as completed May 31, 2017
