
Discussion on expanding research possibilities #4

Open
jiahui-x opened this issue Mar 5, 2021 · 4 comments

@jiahui-x

jiahui-x commented Mar 5, 2021

Dear author,
This is really excellent work on a poker bot!

Here is one question about further research possibilities you may have considered: can a Deepbot agent in training learn from more intelligent opponent AIs like DeepStack (we have recreated a 6-handed version, including multi-iteration children versions), and if so, how, especially in 6-handed No-Limit Texas Hold'em games?


Thank you in advance!!!
@schreven
Collaborator

Hello, thank you for your message!

The first thing to note is that this bot was aimed at 6-handed play, but in Sit and Go format. DeepStack targets cash games, as do most AIs out there. The first iteration of Deepbot was also for cash games, but Heads Up. See /code/main_functions/u_training_games::run_games() for the three main 'environments' in which the bot can play, and /code/bots/networks.py for the four different neural nets the bot can be composed of.

I don't see any blockers to having it work for a 6-handed cash game. The main thing to sort out will be the reward function. First generations are random, so many instances will go all-in constantly. It's important to play enough hands that selection doesn't just preserve the genes of the 'lucky' instances that won a couple of all-ins.
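To make that concrete, a fitness measure along these lines (a rough sketch with hypothetical names, not code from this repo) would average results over many hands, so the variance from a few lucky all-ins washes out:

```python
import random

def play_one_hand(bot, opponents):
    # Stand-in for running one hand through the poker engine; in practice
    # this would be a pypokerengine game loop. Returns chips won (or lost).
    return random.gauss(0, 20)

def evaluate_fitness(bot, opponents, n_hands=5000, big_blind=2):
    """Hypothetical fitness: average big blinds won per hand.

    A large n_hands matters: over only a few hands, random first-generation
    bots that shoved all-in and got lucky would dominate selection and
    crowd out genuinely good play.
    """
    total = sum(play_one_hand(bot, opponents) for _ in range(n_hands))
    return total / (n_hands * big_blind)
```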

In practical terms, this bot is built to play on pypokerengine; the code for that is in /code/bots/bot_Deep_Bot.py. If you would like to have it play on a different API, you would likely have to edit that part to conform to the other API. Another, perhaps simpler, option is to make your DeepStack implementation compatible with pypokerengine. That consists in extending BasePokerPlayer and implementing the main callbacks, mainly declare_action off the top of my head.
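For illustration, a minimal adapter could look like the sketch below. pypokerengine's BasePokerPlayer interface is real; deepstack_act() is a hypothetical stand-in for however your implementation exposes its decisions:

```python
from pypokerengine.players import BasePokerPlayer

def deepstack_act(hole_card, round_state, valid_actions):
    # Placeholder: call into your DeepStack engine here. As a fallback
    # this just calls, so the adapter runs end to end.
    call = next(a for a in valid_actions if a["action"] == "call")
    return call["action"], call["amount"]

class DeepStackPlayer(BasePokerPlayer):
    """Hypothetical adapter exposing a DeepStack implementation to pypokerengine."""

    def declare_action(self, valid_actions, hole_card, round_state):
        action, amount = deepstack_act(hole_card, round_state, valid_actions)
        return action, amount

    # The remaining callbacks can be no-ops if your engine keeps its own state.
    def receive_game_start_message(self, game_info):
        pass

    def receive_round_start_message(self, round_count, hole_card, seats):
        pass

    def receive_street_start_message(self, street, round_state):
        pass

    def receive_game_update_message(self, action, round_state):
        pass

    def receive_round_result_message(self, winners, hand_info, round_state):
        pass
```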

Thinking about it more theoretically, I believe this bot is also able to learn from 'smarter' AIs. The ones it was presented with here were wrecked, so it definitely has spare capacity. The genetic algorithm is also quite robust, so learning should not get stuck in some local minimum or the like. However, I do not know if or when it will hit a limit. The network could always be made wider or deeper, but still.
If it reaches that point, another idea I had was to switch to gradient descent at the end; apparently that can be stronger for optimizing the last bit with precision.
Finally, this bot had a strong focus on 'opponent modelling'. The game will be different if there are 5 DeepStacks + 1 Deepbot at the table, or 3 fish + 2 DeepStacks + 1 Deepbot. The 'opponent modelling' part of the neural network could also be stripped if it only ever plays against the same table composition / bots, but that would be a shame in a way ;) This approach will likely always struggle to beat 'Nash equilibrium' bots; it is rather intended for the real world, where exploiting weaker players yields a lot of return.

You'll find much more info in docs/Deepbot_report.pdf in case you did not see it. Feel free to ask if you have other questions.

I have a couple myself. In what language is your version of DeepStack? Is it compatible with a public API? Do you want to play on a uniform table composition (all the same bots) or not? Are you aiming for opponent modelling, or rather for approaching an optimum?

@jiahui-x
Author

Thank you so much for the thoughtful answer!

Here are my answers to your questions: our work is written in Lua, and we are trying to make it compatible with Python as well as a public API. We may first try a table composition of 4 fish + 1 DeepStack + 1 Deepbot, hoping that our AI can not only beat itself in future generations, but also learn multiple play styles through 'opponent modelling' so it becomes competitive against a variety of possible opponents. We consider opponent modelling a vital approach in multi-player games, and we aim to give it a try in the next step, in poker and in other application areas. It would be great if it worked!

So I want to ask a few more questions about Deepbot: how does the opponent network model the aggressiveness of an opponent, or is it implicit at the parameter level? And what do you think about the possible loss of information from averaging between the opponent network and the decision network in a 6-handed situation?

Thanks again for your work and for sharing your experience!

@schreven
Collaborator

You're welcome! I'm happy to hear about another project, and it seems we approach it in a similar way.

I agree opponent modelling is a vital approach, and I started with that in mind. The other thing I considered important was to have visibility into the bot's strategy. This way human players can compare their playstyle to it and get some insights. Additionally, it's useful for validating the bot and for getting ideas on how to improve it.

I think it's a great approach to start with the table composition you describe (4 fish + 1 DeepStack + 1 Deepbot). When you say "hoping that our AI can (not only) beat itself in future generations", do you mean beat DeepStack? From reading other work at the time, my intuition is that Deepbot could beat DeepStack, but only on such a table composition; it would require significantly more work to beat DeepStack head-on. In other words, it might lose in hands against DeepStack but compensate enough on the rest. It will likely also learn to avoid hands against DeepStack (the strong opponent).

Yes, the aggressiveness is modelled at the parameter level. To take a step back, the modelling happens in the memory of the LSTMs. If we take a fully trained Deepbot, the LSTMs will record and take into account past actions / results; their internal state changes accordingly. Then, at some interval, that state is reset. That interval is its memory span. As displayed in figure 4.5, there is memory at the scale of the "session" and another at the scale of a "round". For Sit and Go, a "session" lasts the length of the Sit and Go; for cash games I think I put it at 500 hands. This part models the general strategy of the opponent. What I named a "round" is one poker hand (from preflop to river / showdown / all fold); I used this wording because "hand" is often used for other things in poker and it can get confusing. These "round" LSTMs are reset at each round, and rather model the aggressiveness of the opponent player for that hand. Indirectly, they model the opponent's hand, or at least what he is representing.
Additionally (and importantly), in the "game network" the memory spans the duration of the session. That models the general strategy of the table.
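As a rough illustration of the two memory spans (a PyTorch sketch with hypothetical names, not the actual repo code), the reset logic could look like:

```python
import torch
import torch.nn as nn

class OpponentMemory(nn.Module):
    """Illustrative two-span memory: session-level and round-level LSTMs."""

    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.session_lstm = nn.LSTM(in_dim, hidden)  # general opponent strategy
        self.round_lstm = nn.LSTM(in_dim, hidden)    # aggressiveness this hand
        self.session_state = None
        self.round_state = None

    def new_session(self):
        # Reset both memories: at the start of a sit-and-go, or roughly
        # every 500 hands in a cash game.
        self.session_state = None
        self.round_state = None

    def new_round(self):
        # Reset only the short-span memory at each new poker hand ("round").
        self.round_state = None

    def forward(self, x):
        # x: (seq_len, batch, in_dim) of encoded opponent actions / results.
        s_out, self.session_state = self.session_lstm(x, self.session_state)
        r_out, self.round_state = self.round_lstm(x, self.round_state)
        return torch.cat([s_out, r_out], dim=-1)
```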

What do you think about the possible loss of information from averaging between the opponent network and the decision network in a 6-handed situation?

Good point. Indeed, there must be some loss of information there. For example, one opponent's behaviour might be more important than another's, and the averaging does not take that into account. The averaging could be replaced by another neural net layer.
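For instance, the fixed mean could be swapped for a small learned weighting over the opponents' embeddings, so a more dangerous opponent can dominate the pooled signal (a hypothetical sketch, not the current code):

```python
import torch
import torch.nn as nn

class OpponentAggregator(nn.Module):
    """Replace a plain mean over opponent embeddings with learned weights."""

    def __init__(self, emb_dim=32):
        super().__init__()
        self.score = nn.Linear(emb_dim, 1)  # importance score per opponent

    def forward(self, opp_embs):
        # opp_embs: (n_opponents, emb_dim), one embedding per opponent net.
        # A plain mean treats all opponents equally:
        #   pooled = opp_embs.mean(dim=0)
        # A softmax over learned scores lets one opponent weigh more:
        weights = torch.softmax(self.score(opp_embs), dim=0)  # (n_opponents, 1)
        return (weights * opp_embs).sum(dim=0)                # (emb_dim,)
```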

This leads me to the next thing I would have done had I continued this project: generating a graphic like 5.13, but where each quadrant is a player at the table instead of a full table composition. Currently there is no validation of the (individual) opponent modelling; there is only table modelling. Given that the opponents are not so strong, it could be that the bot got away with doing only that. This is one of the reasons I also think it could be very interesting to present Deepbot with a stronger opponent!

Cheers!

@schreven
Collaborator

Regarding loss of information, you might want to take a close look at paragraph 3.3. I chose the features I deemed most relevant while trying to keep their number low, but there is almost inevitably some loss of information there.
