Why does the program only use two states? #33
Comments
@guotong1988 I think you should learn the very basic concepts of reinforcement learning first. It is basically dynamic programming: the state changes from step to step. You'd better learn about the Markov Decision Process and the Bellman equation first.
the state changes from time to time

Think of it the other way around: why not use just one state instead of two?
@guotong1988 No, you cannot use only one state. Intuitively, you must interact with the environment by taking actions in order to learn. Once your action is done, you are in another state, and you receive a reward or punishment from the environment; that is what allows you to learn something.
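The act-transition-reward loop described above can be sketched in a few lines. This is a toy illustration, not the repository's code; `step` is a made-up deterministic environment invented here for the example:

```python
# Minimal sketch of the agent-environment loop: act, land in a new
# state, receive a reward. `step` is a hypothetical toy environment.
def step(state, action):
    """Toy transition: moving right (+1) toward state 3 earns a reward."""
    next_state = max(0, min(3, state + action))
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward

state = 0
for _ in range(3):
    action = 1                                # always move right
    next_state, reward = step(state, action)  # act -> end up in another state
    state = next_state                        # the new state becomes current
print(state, reward)  # -> 3 1.0
```

The point is simply that every interaction produces a pair of states, the one before the action and the one after, plus a reward signal in between.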
@guotong1988 For a comprehensive understanding, you should learn MDP theory first.
The key point is that these two states are directly adjacent (consecutive in time).
@guotong1988 Like I said, you really need to learn MDP first. The Markov property means that the current state captures all relevant information from the history, so the future state depends only on the current state. In mathematical form: P[s_{t+1} | s_t] = P[s_{t+1} | s_1, ..., s_t].
The answer: one state contains 4 frames.
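That detail matters because a single game frame is not Markov (it carries no velocity information), so DQN-style agents stack the last few frames into one state. A sketch of the idea, assuming 84x84 preprocessed frames and a stack of 4 (common choices, not taken from this repo):

```python
# Sketch of "one state = 4 frames": the network's state is a stack of
# the 4 most recent frames, restoring enough history (e.g. motion) for
# the Markov assumption to hold. Frame size is an assumption.
from collections import deque

import numpy as np

FRAME_SHAPE = (84, 84)    # typical preprocessed frame size (assumption)
frames = deque(maxlen=4)  # old frames fall off automatically

def push_frame(frame):
    """Append a frame and return the stacked state."""
    frames.append(frame)
    while len(frames) < 4:      # at episode start, pad by repeating
        frames.append(frame)
    return np.stack(frames, axis=-1)  # shape (84, 84, 4)

state = push_frame(np.zeros(FRAME_SHAPE))
print(state.shape)  # -> (84, 84, 4)
```

So "the current state and the next state" are each already 4-frame stacks; the pair still forms a single Markov transition.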
I read it from here.
Why does the program use only the current state and the next state?
Why does using just these two states work?
Thank you @yenchenlin