Question: meaning of column in ray-stat #154

Closed
pnprog opened this issue Feb 25, 2018 · 5 comments

Comments

@pnprog

pnprog commented Feb 25, 2018

Hi!

When I enter the ray-stat command, I get a result similar to this one:

|Move|Count|Simulation|Policy    |Value     |Win       |Best Sequence
| C16|   98|   59.1837|   20.1409|   64.1175|   63.1308|w Q4
| D16|   84|   47.6190|    8.1501|   68.1366|   64.0331|w D4
|  Q3|   45|   53.3333|   26.9417|   67.6667|   64.8000|w C16
| D17|   35|   51.4286|   10.9871|   71.1593|   67.2131|w D15
|  Q4|   32|   43.7500|    8.1715|   67.0206|   62.3664|w D16
|  D4|   29|   37.9310|    0.9061|   69.0554|   62.8305|w D16

I would like to confirm the meaning of the different columns. In my understanding:

  • Move: The move considered by Ray (sorted from best to worst).
  • Count: Is this the number of MC simulations performed to evaluate that move?
  • Simulation: Is this the winning rate (%) of the MC simulations?
  • Policy: Is this the policy network value, i.e. the probability that a human would play that move, as given by a neural network?
  • Value: Is this the value network win rate (%), i.e. the probability of winning from that position, as given by a neural network?
  • Win: Is this the win rate (%), calculated as a weighted average of Simulation and Value? If so, I am not sure why the values are not decreasing. In the example above, which is the best move: D17 or C16?
  • Best Sequence: The best sequence for both players starting from that move.

Thanks a lot!

@zakki
Owner

zakki commented Feb 26, 2018

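Your understanding is correct.
About Win: Ray plays the move with the largest Count, not the largest Win. For example, a move that has been searched only once has a Win of either 100% or 0%, and playing a move just because a single search gave 100% doesn't make sense.
Early in the search, Ray tends to select moves with a higher Policy. As it thinks longer, it selects moves with a higher Win.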
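
A minimal sketch of how a client could parse the table and apply the "largest Count" rule described above. The names `parse_ray_stat` and `pick_move` are hypothetical, and the column layout is assumed from the output pasted in this issue rather than taken from Ray's source.

```python
SAMPLE = """\
|Move|Count|Simulation|Policy    |Value     |Win       |Best Sequence
| C16|   98|   59.1837|   20.1409|   64.1175|   63.1308|w Q4
| D16|   84|   47.6190|    8.1501|   68.1366|   64.0331|w D4
|  Q3|   45|   53.3333|   26.9417|   67.6667|   64.8000|w C16
| D17|   35|   51.4286|   10.9871|   71.1593|   67.2131|w D15
|  Q4|   32|   43.7500|    8.1715|   67.0206|   62.3664|w D16
|  D4|   29|   37.9310|    0.9061|   69.0554|   62.8305|w D16
"""


def parse_ray_stat(text):
    """Parse pipe-separated ray-stat rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        # Drop the empty cell produced by the leading pipe.
        cells = [c.strip() for c in line.split("|")[1:]]
        if len(cells) < 7 or cells[0] == "Move":
            continue  # skip the header and malformed lines
        move, count, sim, policy, value, win, seq = cells[:7]
        rows.append({
            "move": move,
            "count": int(count),
            "simulation": float(sim),
            "policy": float(policy),
            # Value and Win may be empty in some rows, so keep them optional.
            "value": float(value) if value else None,
            "win": float(win) if win else None,
            "sequence": seq,
        })
    return rows


def pick_move(rows):
    """Return the row with the largest Count, the move Ray would play."""
    return max(rows, key=lambda r: r["count"])


rows = parse_ray_stat(SAMPLE)
best = pick_move(rows)
print(best["move"], best["count"], best["win"])
```

On the sample above this selects C16 (Count 98), even though D17 has the highest Win, because D17 has been searched far fewer times.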

@pnprog
Author

pnprog commented Feb 26, 2018

Thanks for your prompt reply.

Sometimes I get no data for Win:

|Move|Count|Simulation|Policy    |Value     |Win       |Best Sequence
| F17|  409|   50.3667|   69.7768|||w C10
| P17|  157|   52.8662|   25.1722|||w C6
|  C6|  104|   51.9231|    0.2533|   55.9466|   55.1419|w F4
| O17|   11|   36.3636|    2.5871|||
|  F3|    9|   22.2222|    0.2220|   53.7620|   47.4540|
| P16|    6|   50.0000|    1.0271|||
| C14|    4|   75.0000|    0.2098|||

Is that an indication that my computer is not fast enough, and that I should give Ray more thinking time?

@zakki
Owner

zakki commented Feb 26, 2018

Are you running Ray without a GPU and with multiple threads, e.g. --thread 4?
Rn's current network is too heavy for a CPU, so the NN evaluation didn't complete in time.

@pnprog
Author

pnprog commented Feb 26, 2018

My computer is old and slow (no GPU), so that's why.
For my application it is not a problem, though. The important thing is that when values for Win are missing, I can warn the user and suggest increasing the thinking time.
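
A minimal sketch of such a check, assuming the pipe-separated layout shown earlier in this thread. The function name `moves_missing_win` and the warning text are hypothetical and not part of Ray or goreviewpartner.

```python
SAMPLE = """\
|Move|Count|Simulation|Policy    |Value     |Win       |Best Sequence
| F17|  409|   50.3667|   69.7768|||w C10
| P17|  157|   52.8662|   25.1722|||w C6
|  C6|  104|   51.9231|    0.2533|   55.9466|   55.1419|w F4
| O17|   11|   36.3636|    2.5871|||
|  F3|    9|   22.2222|    0.2220|   53.7620|   47.4540|
| P16|    6|   50.0000|    1.0271|||
| C14|    4|   75.0000|    0.2098|||
"""


def moves_missing_win(text):
    """Return the moves whose Win cell is empty in ray-stat output."""
    missing = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().split("|")]
        # After splitting, a data row is:
        # '', Move, Count, Simulation, Policy, Value, Win, Best Sequence
        if len(cells) < 8 or cells[1] in ("", "Move"):
            continue  # skip the header, blank lines, and malformed rows
        if cells[6] == "":
            missing.append(cells[1])
    return missing


missing = moves_missing_win(SAMPLE)
if missing:
    print("Warning: no Win value for " + ", ".join(missing)
          + "; consider increasing the thinking time.")
```

Whether to warn on any missing row or only when the most-searched move lacks a Win value is a design choice left to the application.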

Thanks!

@pnprog pnprog closed this as completed Feb 26, 2018
@zakki
Owner

zakki commented Feb 27, 2018

@pnprog Perhaps --reuse-subtree would be useful for goreviewpartner.
