Discussion on expanding research possibilities #4
Hello, thank you for your message!

First thing to note is that this bot was aimed at 6-handed play, but Sit and Go. DeepStack is for cash games, as are most AIs out there. The first iteration of Deepbot was also for cash games, but Heads Up. That said, I don't see any blockers to having it work for a 6-handed cash game.

The main thing will be the reward function. First generations are random, so lots of them will go all in all the time. It's important to play enough hands so that selection doesn't merely preserve the genes of the 'lucky' instances that won a couple of all-ins. In practical terms, this bot here is built to play on

Thinking about it more theoretically, I think this bot is also able to learn from 'smarter' AIs. The ones presented to it here were wrecked, so it definitely has more capacity. Also, the genetic algorithm is quite robust, so the learning should not get stuck in some local minimum or such. However, I do not know if/when it will hit a limit. The network could always be made wider / deeper, but still. You'll find much more info in the

I have a couple of questions. In what language is your version of DeepStack? Is it compatible with a public API? Do you want to play on a uniform table composition (all the same bots) or not? Are you aiming for opponent modelling, or rather approaching an optimum?
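The point about variance in early generations can be sketched as a minimal selection step: average each bot's profit over many hands before ranking, so that a couple of lucky all-ins can't carry a bot's genes forward. This is a hypothetical pure-Python sketch (names like `select_survivors` are mine, not from the Deepbot code):

```python
import random

def evaluate(bot, play_hand, n_hands):
    """Average profit over n_hands. With a small n_hands, a random
    all-in bot can top the ranking on variance alone; a large
    n_hands lets real skill differences dominate."""
    return sum(play_hand(bot) for _ in range(n_hands)) / n_hands

def select_survivors(population, play_hand, n_hands=1000, keep=0.2):
    """Keep the top fraction of bots by averaged fitness."""
    scored = sorted(
        population,
        key=lambda bot: evaluate(bot, play_hand, n_hands),
        reverse=True,
    )
    return scored[: max(1, int(len(scored) * keep))]
```

With enough hands per evaluation, the standard error of the mean shrinks and selection stops rewarding luck.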
Thank you so much for the thoughtful answer! Here are my answers to your questions:

Our work is written in Lua and we are trying to make it compatible with Python, as well as a public API. We may first try a table composition of 4 fish + 1 DeepStack + 1 Deepbot, hoping that our AI can not only beat itself in future generations, but also learn multiple play styles through "opponent modelling", to be competitive against a variety of possible opponents. We consider opponent modelling a vital approach in multi-player games, and we aim to give it a try in the next step, in poker and in other application areas. It would be great if it worked!

So I want to ask a few more questions about Deepbot: how does the opponent network model the aggressiveness of opponents, or is that implicit at the parameter level? What do you think about the possible loss of information after 'averaging' between the opponent network and the decision network in the 6-handed situation?

Thanks again for your work and for sharing your experience!
You're welcome! I'm happy to hear about another project, and it seems we approach it in a similar way. I agree opponent modelling is a vital approach, and I started with that in mind. The other thing I considered important was to have visibility into the strategy of the bot. This way human players can compare their playstyle to it and get some insights. Additionally, it's useful for validating the bot and getting ideas on how to improve it.

I think it's a great approach to start with the table composition you describe (4 fish + 1 DeepStack + 1 Deepbot). When you say "hoping that our AI can (not only) beat itself in the future generation", do you mean beat DeepStack? From reading other work at the time, my intuition is that Deepbot could beat DeepStack, but only on such a table composition. It would require significantly more work to beat DeepStack "heads on". In other words, it might lose in the hands it plays against DeepStack but compensate enough on the rest. It will likely also learn to avoid hands against DeepStack (the strong opponent).

The aggressiveness is modelled at the parameter level, yes. To take a step back, the modelling happens in the memory of the LSTMs. So if we take a Deepbot that is fully trained: the LSTMs will record and take into account past actions / results, and its parameters will change accordingly. Then at some interval its parameters are reset; that duration is its memory span. As displayed in figure 4.5, there is memory at the scale of the "session" and another at the scale of a "round". For Sit and Go, a "session" lasts the length of the Sit and Go; for cash games I think I put it at 500 hands. So this part models the general strategy of the opponent. Then what I named a "round" is one poker hand (from preflop to river / showdown / all fold). I used this wording because "hand" is often used for other things in poker and it can get confusing.

So these "round" LSTMs are reset at each "round", and rather model the aggressiveness of the opponent player for that hand. Indirectly this models the opponent's hand, or at least what they are representing.
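The two memory scales described above (a long-lived "session" state reset at a fixed interval, and a "round" state reset every poker hand) can be illustrated with a simplified stand-in. This is a hypothetical sketch using running averages in place of the actual LSTM states; the class and field names are mine:

```python
class OpponentMemory:
    """Toy stand-in for the two LSTM memory scales: a 'session'
    state reset every session_len rounds (e.g. 500 hands for cash
    games) and a 'round' state reset at the start of every hand."""

    def __init__(self, session_len=500):
        self.session_len = session_len
        self.rounds_seen = 0
        self.session_state = 0.0  # long-term opponent tendency
        self.round_state = 0.0    # aggressiveness within this hand

    def start_round(self):
        """Called at the start of each poker hand ('round')."""
        self.round_state = 0.0            # round memory: reset every hand
        if self.rounds_seen % self.session_len == 0:
            self.session_state = 0.0      # session memory: periodic reset
        self.rounds_seen += 1

    def observe(self, aggression):
        """Blend an observed action's aggressiveness into both scales."""
        self.session_state = 0.99 * self.session_state + 0.01 * aggression
        self.round_state = 0.8 * self.round_state + 0.2 * aggression
```

The point of the sketch is only the reset schedule: the round state forgets everything between hands, while the session state accumulates across hands until its interval elapses.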
Good point. Indeed there must be loss of information there. For example, it might be that one opponent's behaviour is more important than another's, and here the averaging does not take that into account. The averaging could be replaced by another neural-net layer.

This leads me to the next thing I would have done had I continued this project: I would have generated a graphic like 5.13, but where each quadrant is a player at the table, instead of a full table composition. Currently there is no validation of the (individual) opponent modelling; there is only table modelling. Given that the opponents are not so strong, it could be that the bot got away with doing only that. This is one of the reasons I also think it could be very interesting if Deepbot were presented with a stronger opponent!

Cheers!
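The trade-off mentioned above can be made concrete with a small sketch: uniform averaging of per-opponent outputs discards which opponent drove the signal, whereas a weighted pooling (with weights that would, in practice, come from an extra learned layer) can preserve it. Hypothetical function names, not from the project:

```python
def average_pool(opponent_vecs):
    """Current approach: uniform average of per-opponent outputs.
    Every opponent contributes equally, regardless of relevance."""
    n, dim = len(opponent_vecs), len(opponent_vecs[0])
    return [sum(v[i] for v in opponent_vecs) / n for i in range(dim)]

def weighted_pool(opponent_vecs, weights):
    """Possible replacement: weighted combination. Here the weights
    are given explicitly; in a real model they would be produced by
    an extra neural-net layer (e.g. attention over opponents)."""
    total = sum(weights)
    dim = len(opponent_vecs[0])
    return [
        sum(w * v[i] for w, v in zip(weights, opponent_vecs)) / total
        for i in range(dim)
    ]
```

With uniform averaging, a dangerous opponent's signal is diluted by the passive ones; a learned weighting could let the decision network focus on the opponent that matters in the current spot.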
Regarding loss of information, you might want to take a close look at paragraph 3.3. I chose the features I deemed most relevant, while trying to keep their number low. But there is almost inevitably loss of information there.
Dear author:
It is really an excellent job on the poker bot!!!