This repository has been archived by the owner on Feb 24, 2022. It is now read-only.

[smartcab] Incomplete reward system #185

Closed
pedropb opened this issue Jan 15, 2017 · 0 comments

pedropb commented Jan 15, 2017

While working on this project I detected a small flaw in the reward system.

File environment.py, lines 328-341:

        # Agent wants to perform no action:
        elif action == None:
            if light == 'green' and inputs['oncoming'] != 'left': # No oncoming traffic
                violation = 1 # Minor violation

        # (...)

        # Did the agent attempt a valid move?
        if violation == 0:
            if action == agent.get_next_waypoint(): # Was it the correct action?
                reward += 2 - penalty # (2, 1)
            elif action == None and light != 'green': # Was the agent stuck at a red light?
                reward += 2 - penalty # (2, 1)
            else: # Valid but incorrect
                reward += 1 - penalty # (1, 0)

This doesn't cover the case where the agent wants to turn left at a green light while oncoming traffic is going forward or turning right. The optimal policy there is to take no action (None) and wait, but None, right, and forward all result in the same reward, because each falls into the "Valid but incorrect" case.

The solution would be to expand the violation check for the no-action case to:

        # Agent wants to perform no action:
        elif action == None:
            if light == 'green' and (inputs['oncoming'] != 'left' or waypoint != 'left'): # No oncoming traffic
                violation = 1 # Minor violation

and expand:

elif action == None and light != 'green':
    reward += 2 - penalty # (2, 1)

to:

elif action == None and light != 'green':
    reward += 2 - penalty # (2, 1)
elif action == None and light == 'green' and inputs['oncoming'] in ['forward', 'right']:
    reward += 2 - penalty # (2, 1)
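
To make the intended behavior easier to check, here is a minimal standalone sketch of the two proposed changes. It is not the code from environment.py: the function names `is_idle_violation` and `valid_move_reward` are hypothetical, and `penalty` is passed in rather than computed. One caveat: the violation condition as quoted above joins its two tests with `or`, so it still evaluates to true whenever `inputs['oncoming'] != 'left'`, including the left-turn case this issue is about. The sketch therefore assumes the intent was to exempt only the case where the agent is waiting to turn left and oncoming traffic is going forward or right.

```python
def is_idle_violation(light, oncoming, waypoint):
    """Return True if taking no action counts as a minor violation.

    Assumption: idling at a green light is acceptable when oncoming
    traffic is turning left (the original rule) or when the agent is
    yielding before its own left turn (the case raised in this issue).
    """
    if light != 'green':
        return False
    yielding_left = waypoint == 'left' and oncoming in ('forward', 'right')
    return oncoming != 'left' and not yielding_left


def valid_move_reward(action, waypoint, light, oncoming, penalty=0):
    """Reward for a move that caused no violation (violation == 0)."""
    if action == waypoint:  # Correct action
        return 2 - penalty
    if action is None and light != 'green':  # Stuck at a red light
        return 2 - penalty
    if action is None and light == 'green' and oncoming in ('forward', 'right'):
        return 2 - penalty  # Yielding before a left turn (proposed branch)
    return 1 - penalty  # Valid but incorrect


# Agent wants to turn left, light is green, oncoming car goes forward.
print(is_idle_violation('green', 'forward', 'left'))
print(valid_move_reward(None, 'left', 'green', 'forward'))
print(valid_move_reward('right', 'left', 'green', 'forward'))
```

Under these assumptions, waiting in that situation is no longer flagged as a violation and earns the same reward as a correct move, while turning right or going forward stays in the "Valid but incorrect" tier.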
pedropb added a commit to pedropb/machine-learning that referenced this issue Jan 15, 2017
pedropb added a commit to pedropb/machine-learning that referenced this issue Jan 16, 2017