
How did you write the code for MountainCart.py? #2

Open
montallban opened this issue Sep 23, 2020 · 2 comments

Comments

montallban commented Sep 23, 2020

Have you done something like this before? I'm trying to understand it and see what it's doing. The comments are a bit sparse, so it hasn't been exactly easy. Perhaps you could walk me through it at some point? What will I have to change in order to implement eligibility traces?

Edit: Really, I think just seeing the pseudocode would help.

tharmoth (Owner) commented Sep 23, 2020

I worked through this tutorial on Q-learning. Then I figured out discretization on my own and solved the cart-pole problem, and adapted that for MountainCart. I believe at some point I changed the Q-learning algorithm to the one we learned in class, since the implementation from that tutorial wasn't working for MountainCart.
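The discretization step could look something like this. This is only a minimal sketch of the idea, not the repo's actual `bin_data`: the bin count and the position/velocity bounds for MountainCar-v0 are assumptions here.

```python
import numpy as np

def bin_data(state, n_bins=20):
    # Map each continuous observation dimension onto a discrete bin index,
    # so the (position, velocity) pair can index into a Q-table.
    # Bounds below are MountainCar-v0's observation limits
    # (position in [-1.2, 0.6], velocity in [-0.07, 0.07]).
    position, velocity = state
    pos_edges = np.linspace(-1.2, 0.6, n_bins - 1)
    vel_edges = np.linspace(-0.07, 0.07, n_bins - 1)
    # np.digitize returns 0..n_bins-1, i.e. exactly n_bins discrete cells.
    return np.digitize(position, pos_edges), np.digitize(velocity, vel_edges)
```

With 20 bins per dimension the Q-table would be a 20 x 20 x n_actions array; more bins give finer control at the cost of slower learning.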

Sorry about the readability; I just pushed the code I was playing around with and haven't made it human-readable yet. I'll go back through and clean it up.

As for implementing eligibility traces, you would need to change the train() method to use eligibility traces instead of plain Q-learning, and possibly also change the .evaluate() method if eligibility traces change more than just the Q-table.
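The per-step update for Watkins's Q(lambda) could be sketched like this. Everything here is illustrative (the function name, the table shapes, and the hyperparameter values are assumptions, not code from the repo); the key difference from plain Q-learning is the extra eligibility-trace table that spreads each TD error back over recently visited state-action pairs.

```python
import numpy as np

def q_lambda_update(q_table, e_table, s_old, a, reward, s_new, greedy,
                    alpha=0.1, gamma=0.99, lam=0.9):
    # One step of Watkins's Q(lambda). s_old / s_new are tuples of
    # discrete state indices; e_table has the same shape as q_table.
    next_max = np.max(q_table[s_new])
    delta = reward + gamma * next_max - q_table[s_old][a]

    # Bump the trace for the state-action pair just visited...
    e_table[s_old][a] += 1.0

    # ...then update *every* entry in proportion to its trace,
    # instead of only the most recent (s, a) as in one-step Q-learning.
    q_table += alpha * delta * e_table

    # Decay all traces; Watkins's variant zeroes them after an
    # exploratory (non-greedy) action, since the backup chain breaks.
    if greedy:
        e_table *= gamma * lam
    else:
        e_table[:] = 0.0
    return q_table, e_table
```

With lam=0 this collapses back to the one-step update already in train(), which is a handy sanity check.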

tharmoth (Owner) commented Sep 23, 2020

Here's some rough pseudocode of the train() method.

    def train(self):
        streak = 0
        max_iterations = 10000
        for episode in range(max_iterations):
            # update hyperparameters here (e.g. decay self.epsilon)

            # reset the gym environment and bin the starting observation
            angle, velocity = self.bin_data(self.env.reset())
            epochs = 0
            done = False

            # run the simulation until the episode is complete
            while not done:

                # save the old state
                angle_old, velocity_old = angle, velocity

                # either do something random or take the model's best predicted action
                if random.uniform(0, 1) < self.epsilon:
                    action = self.env.action_space.sample()  # Explore action space
                else:
                    action = np.argmax(self.q_table[angle_old][velocity_old])  # Exploit learned values

                # advance the simulation one step
                next_state, reward, done, info = self.env.step(action)

                # convert the continuous state data to discrete data
                angle, velocity = self.bin_data(next_state)

                # update the Q-learning model
                next_max = np.max(self.q_table[angle][velocity])
                old_value = self.q_table[angle_old][velocity_old][action]
                self.q_table[angle_old][velocity_old][action] += self.alpha * (reward + self.gamma * next_max - old_value)

                # get ready for the next loop
                epochs += 1

            # The rest of the code is arbitrary conditions that signal the model is trained.
            # I was playing around with them and these conditions seem to yield good results
            # most of the time. Feel free to tweak them as much as you'd like.
            if epochs < 130:
                streak += 1
            else:
                streak = 0

            if streak > 2:
                print("Found Streak at Episode: " + str(episode))
                break

            if epochs < 100:
                print("Optimal Detected")
                # break

            # Print progress bar and then add data to graph
            if episode % (max_iterations // 10) == 0:
                # print("Training " + str(episode / max_iterations * 100) + "% Complete.")
                pass
            self.convergence_graph.append(epochs)
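For the "update hyperparameters" step at the top of the loop, one common choice is multiplicative epsilon decay with a floor. This is just a sketch of that idea; the rate and floor values are assumptions, not what the repo uses.

```python
def decay(value, rate=0.999, floor=0.01):
    # Multiplicative decay with a floor, so the agent keeps a little
    # exploration even late in training (e.g. self.epsilon = decay(self.epsilon)).
    return max(floor, value * rate)
```

The same schedule can be applied to alpha if you want the learning rate to anneal as well.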
