
Why does the program only use two states? #33

Closed
guotong1988 opened this issue Mar 7, 2017 · 8 comments

Comments

@guotong1988

guotong1988 commented Mar 7, 2017

I read from here.
Why does the program only use the current state and the next state?
Why does using only these two states work?
Thank you @yenchenlin

@ColdCodeCool

@guotong1988 I think you should learn the very basic concepts of reinforcement learning. It is basically dynamic programming: the state changes from step to step. You'd better learn the Markov Decision Process and the Bellman Equation first.
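The Bellman update that such code implements only ever touches the current state and the next one. A minimal tabular sketch (names and sizes here are illustrative, not from the repo):

```python
import numpy as np

# Tabular Q-learning update: only the current state s and the next
# state s_next appear in the update -- no older history is needed.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate, discount factor

def q_update(s, a, r, s_next, done):
    # Bellman target: immediate reward plus discounted best future value.
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2, done=False)
# Q[0, 1] is now 0.1: alpha * (target 1.0 - old value 0.0)
```

A DQN replaces the table `Q` with a neural network, but the update still consumes only the pair (current state, next state).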

@guotong1988
Author

> the state changes from step to step

Thank you.
Could you please have a look at another question of mine? Thanks!
That question is also in the issues.

@guotong1988
Author

Looking at it the other way: why not use just one state instead of two?

@ColdCodeCool

ColdCodeCool commented Apr 12, 2017

@guotong1988 No, you cannot use only one state. Intuitively, you must interact with the environment by acting in order to learn. Once your action is done, you are in another state, and you receive a reward or punishment from the environment; that is how you learn something.
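The point above can be sketched as an interaction loop: every experience the agent learns from is a transition (s, a, r, s_next), which is why two states are involved. (`ToyEnv` below is a stand-in environment, not the repo's game.)

```python
import random

class ToyEnv:
    """Toy chain environment: move left/right on states 0..3."""
    def __init__(self):
        self.s = 0

    def step(self, a):
        # Action 1 moves right, action 0 moves left; state 3 pays reward 1.
        self.s = max(0, min(3, self.s + (1 if a == 1 else -1)))
        r = 1.0 if self.s == 3 else 0.0
        return self.s, r

env = ToyEnv()
transitions = []
s = env.s
for _ in range(5):
    a = random.choice([0, 1])          # act in the environment
    s_next, r = env.step(a)            # land in another state, get reward
    transitions.append((s, a, r, s_next))  # all learning uses these tuples
    s = s_next
```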

@ColdCodeCool

@guotong1988 For a comprehensive understanding, you should learn MDP theory first.

@guotong1988
Author

guotong1988 commented Apr 12, 2017

The key point is that these two states are adjacent.
That is, the situation in the second state was determined by the previous several steps.

@ColdCodeCool

ColdCodeCool commented Apr 12, 2017

@guotong1988 Like I said, you really need to learn MDP first. The Markov property means the current state captures all relevant information from the history, so the future state depends only on the current state. In mathematical form, P[s_{t+1} | s_t] = P[s_{t+1} | s_1, ..., s_t].
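The Markov property can be checked empirically on a toy Markov chain: conditioning on extra history does not change the next-state distribution. (The transition matrix below is made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
# Hand-picked 3-state transition matrix: T[i, j] = P(next=j | cur=i).
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

# Simulate a long trajectory from the chain.
n = 200_000
states = np.empty(n, dtype=int)
states[0] = 0
for t in range(1, n):
    states[t] = rng.choice(3, p=T[states[t - 1]])

# Estimate P(next=0 | cur=1), ignoring all older history.
cur1 = states[1:-1] == 1
p_given_cur = (states[2:][cur1] == 0).mean()

# Estimate P(next=0 | prev=2, cur=1): condition on extra history.
mask = (states[:-2] == 2) & cur1
p_given_hist = (states[2:][mask] == 0).mean()

# Both estimates agree with T[1, 0] = 0.1: the extra history is irrelevant.
```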

@guotong1988
Author

The answer: one state contains 4 frames.
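That is, the DQN's input state stacks the last 4 preprocessed screens, so a single state already encodes short-term motion. A sketch of the frame stacking (the 80x80 size and stack depth follow the usual DQN setup and are illustrative, not read from the repo):

```python
import numpy as np

def new_frame():
    # Stand-in for one preprocessed game screen.
    return np.zeros((80, 80), dtype=np.float32)

# A state is the last 4 frames stacked along the channel axis.
state = np.stack([new_frame()] * 4, axis=-1)       # shape (80, 80, 4)

def next_state(state, frame):
    # Drop the oldest frame, append the newest one.
    return np.concatenate([state[:, :, 1:], frame[:, :, None]], axis=-1)

s_next = next_state(state, new_frame())            # still (80, 80, 4)
```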
