# COGS 188 - Project Proposal

# Names

- Zak Bamford
- Sreetama Chowdhury
- Joseph Edmonston

# Abstract 

We seek to create a poker bot that plays as close to optimally as possible in order to maximize its returns. After developing our initial bot, we will compare its performance against a baseline bot that takes random actions. From there, we will iteratively improve our bot and compare each version against prior iterations. Performance will be measured by the amount of money a bot wins against other bots over a large number of games, which minimizes the effects of variance. To improve the bot, we will use Monte Carlo methods to estimate the value of each hand and each set of table cards. These values will then be used to decide whether folding, calling, or raising is the best play given the hand's win probability. In addition to the random bot and prior iterations, we will compare the bot against publicly available poker bots to determine whether it is a good competitor.

# Background

Poker is a popular card game that blends skill, strategy, and chance. Players bet on the strength of their hands over multiple rounds, with the goal of either holding the best hand or convincing their opponents to fold through strategic bluffing. In essence, poker is a "game of chip management,"<a name="basics"></a>[<sup>[1]</sup>](#basics) emphasizing the importance of making decisions based on one's chip count relative to that of one's opponents. Poker is also mathematically complex: there are 2,598,960 unique five-card poker hands <a name="math"></a>[<sup>[2]</sup>](#math) and, in no-limit play, players can bet any amount up to their entire stack. This makes poker a great candidate for an AI algorithm, since no human could evaluate every possible state.
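The hand count quoted above is the number of ways to choose 5 cards from a 52-card deck, which can be checked in a couple of lines. This is only a sanity check; the second figure (two-card starting hands) is not taken from the cited sources and simply follows from the same formula.

```python
import math

# Distinct 5-card hands from a standard 52-card deck: C(52, 5).
five_card_hands = math.comb(52, 5)

# Distinct 2-card starting hands (not from the cited sources): C(52, 2).
starting_hands = math.comb(52, 2)

print(five_card_hands)  # 2598960
print(starting_hands)   # 1326
```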

Several aspects of poker can be expressed algorithmically, including the calculation of probabilities, pot odds, and hand strengths used to choose the best action (betting, raising, or folding). Algorithms can model and predict opponents' behavior by analyzing historical data, enabling adaptive strategies that counteract different playing styles. Game theory principles can be used to devise strategies that minimize losses and maximize gains over the long term, even against unpredictable opponents. There are multiple approaches to poker bots, including basic strategy bots and bots that adapt to their opponents' moves <a name="bots"></a>[<sup>[3]</sup>](#bots). Basic strategy bots play a fixed, mathematically derived strategy, which strong opponents can learn to exploit. Adaptive bots, on the other hand, change their strategies based on what their opponents do, but this may lead to suboptimal play since bots are not good at reading humans' bluffs.
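As a small illustration of how pot odds and win probability combine into a decision, the sketch below computes the break-even win probability for a call. The function names are hypothetical, and the model is deliberately simplified: it assumes `pot` already includes the opponent's bet, and it ignores ties and any later betting.

```python
def break_even_equity(pot: float, to_call: float) -> float:
    """Minimum win probability at which calling breaks even.

    Assumes `pot` is the pot before our call (opponent's bet included)
    and ignores ties and future betting rounds.
    """
    if to_call <= 0:
        return 0.0  # checking is free, so any hand can continue
    return to_call / (pot + to_call)


def call_expected_value(win_prob: float, pot: float, to_call: float) -> float:
    """Expected chips gained by calling: win the pot or lose the call."""
    return win_prob * pot - (1.0 - win_prob) * to_call


# Opponent bets 50 into a pot of 100, so we must call 50 to win 150.
needed = break_even_equity(pot=150, to_call=50)
ev = call_expected_value(win_prob=0.30, pot=150, to_call=50)
print(needed)  # 0.25
print(ev)      # roughly +10 chips, so calling is profitable at 30% equity
```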


# Problem Statement

For each possible state of a poker game, consisting of the bot’s hand, the table cards, and the pot, we will attempt to find the policy that maximizes our rewards from that state while playing against another poker player. Since poker is partially a game of chance, we will measure the bot’s profit over a large number of games. Initially we will design our bot for heads-up poker (poker with only two players). This may later be expanded to multi-bot play.

# Data

Since we will have bots play against each other for training and testing, we should not need an external dataset.

# Proposed Solution

We will use a Monte Carlo training method. We will run simulations in which our bot learns by playing against itself as well as against bots that play at random.

Without abstraction, the number of possible states in poker is far too large for a model to cover during training. Because of this, our solution will reduce the state and action spaces in a way that still allows the bot to play effectively.

The state will be reduced to the following components:

- The win probability of our hand against a randomly dealt opponent’s hand.
- The actions that have led up to this state since the start of the round.
- The bucketed amounts of money both players have. To further shrink the state space, instead of representing every possible amount of money a player can have, we represent each stack as a bucketed ratio of the initial buy-in. For example, the buy-in is represented as 1, so the maximum a player can hold is 2, and the opponent's value plus our value always equals 2. If we hold 0.846… of our initial buy-in, we would bucket it to 0.85.
- The current round of betting (preflop, flop, turn, or river).

For example, a state might look like `(55, preflop, opponent-check, user=0.85, opponent=1.15)`, and the bot would pick an action from this state.
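One possible encoding of this reduced state is sketched below. The `State` and `bucket` names are hypothetical, and the 0.05 bucket width and whole-percent win probability are assumptions chosen to reproduce the example above; the proposal does not fix them.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    win_pct: int              # win probability vs. a random hand, whole percent (assumed)
    street: str               # "preflop", "flop", "turn", or "river"
    history: Tuple[str, ...]  # actions taken so far this round
    user: float               # our stack as a bucketed fraction of the buy-in
    opponent: float           # opponent's stack, same units


def bucket(ratio: float, step: float = 0.05) -> float:
    """Snap a stack ratio to the nearest multiple of `step` (step is an assumption)."""
    return round(round(ratio / step) * step, 2)


def make_state(win_prob, street, history, user_stack, buy_in):
    user = bucket(user_stack / buy_in)
    # Heads-up with equal buy-ins: the two ratios are assumed to sum to 2.
    opponent = round(2.0 - user, 2)
    return State(round(win_prob * 100), street, tuple(history), user, opponent)


state = make_state(0.553, "preflop", ["opponent-check"], user_stack=84.6, buy_in=100.0)
print(state)
```

Because a `NamedTuple` is hashable, a state built this way can be used directly as a dictionary key in a value table.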

The set of actions will also be simplified, in particular the amounts our bot can bet. Our bot can fold, check (or call when facing a bet), or raise. When it decides to raise, instead of considering every possible bet, it draws from a fixed set of options. It first picks a reference quantity (the current pot, its own stack, or the opponent's stack) and then bets ¼, ½, ¾, or all of that quantity. This gives 3 × 4 = 12 possible bet sizes, which can be expanded for greater granularity if needed.
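The reduced action set can be enumerated directly. The sketch below is illustrative: the tuple layout and the `bet_amount` helper are hypothetical, and capping a bet at the bot's own stack is an assumption the proposal does not spell out.

```python
from itertools import product

BASES = ("pot", "own_stack", "opponent_stack")
FRACTIONS = (0.25, 0.5, 0.75, 1.0)

# 3 reference quantities x 4 fractions = 12 abstract raises.
RAISES = [("raise", base, frac) for base, frac in product(BASES, FRACTIONS)]
ACTIONS = [("fold",), ("check",)] + RAISES


def bet_amount(action, pot, own_stack, opponent_stack):
    """Translate an abstract raise into chips.

    Assumption: a bet can never exceed the chips we actually hold.
    """
    _, base, frac = action
    reference = {
        "pot": pot,
        "own_stack": own_stack,
        "opponent_stack": opponent_stack,
    }[base]
    return min(frac * reference, own_stack)


print(len(RAISES))  # 12
print(bet_amount(("raise", "pot", 0.5), pot=40, own_stack=100, opponent_stack=100))  # 20.0
```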

The rewards will be assigned at the end of the round and propagated back up the action tree with a decay factor. The reward will simply be the amount of money won or lost. It may also be interesting to look at what we might have won or lost had we kept playing, and to use the amount currently lost minus the amount we might have gained as the reward instead.
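A minimal sketch of this backup step is shown below, written as an every-visit Monte Carlo update that keeps a running mean per (state, action) pair. The `backup` function, the table layout, and the decay value of 0.9 are assumptions for illustration, not settled design choices.

```python
from collections import defaultdict


def backup(q, counts, trajectory, reward, decay=0.9):
    """Propagate one round's final reward back along the actions taken.

    trajectory: list of (state, action) pairs in the order they were played.
    The last decision receives the full reward; each earlier decision
    receives it multiplied by one more factor of `decay` (value assumed).
    """
    g = reward
    for state, action in reversed(trajectory):
        key = (state, action)
        counts[key] += 1
        q[key] += (g - q[key]) / counts[key]  # incremental running mean
        g *= decay


q = defaultdict(float)
counts = defaultdict(int)

# Placeholder states and actions, purely for illustration.
round_one = [("s0", "raise"), ("s1", "check")]
backup(q, counts, round_one, reward=10.0, decay=0.5)
print(q[("s1", "check")], q[("s0", "raise")])  # 10.0 5.0
```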

Our solution has a reasonable chance of success because, even with an extremely large reduction from the true number of states, the most important pieces of information remain. For example, knowing that we hold a low-probability hand and that our opponent has been betting consistently is all we really need to conclude that we should fold.
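The win probability that this argument relies on could itself be estimated by Monte Carlo rollouts: deal a random opponent hand and the remaining board cards many times, and count how often our hand comes out ahead. The sketch below is one way to do this under assumed conventions (two-character card strings such as `"As"`, hold'em-style hands of two hole cards plus five board cards, ties counted as half a win). All function names are hypothetical, and the brute-force evaluator favors clarity over speed.

```python
import random
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]


def rank5(cards):
    """Return a comparable key for a 5-card hand; larger means stronger."""
    values = sorted((RANKS.index(c[0]) for c in cards), reverse=True)
    # Group by multiplicity first, then rank, e.g. full house -> trips, pair.
    groups = sorted(Counter(values).items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    ordered = [v for v, _ in groups]
    shape = [n for _, n in groups]
    flush = len({c[1] for c in cards}) == 1
    unique = sorted(set(values), reverse=True)
    straight = len(unique) == 5 and unique[0] - unique[4] == 4
    if unique == [12, 3, 2, 1, 0]:  # A-2-3-4-5 plays as the lowest straight
        straight, ordered = True, [3, 2, 1, 0, -1]
    if straight and flush:
        category = 8
    elif shape == [4, 1]:
        category = 7
    elif shape == [3, 2]:
        category = 6
    elif flush:
        category = 5
    elif straight:
        category = 4
    elif shape == [3, 1, 1]:
        category = 3
    elif shape == [2, 2, 1]:
        category = 2
    elif shape == [2, 1, 1, 1]:
        category = 1
    else:
        category = 0
    return (category, ordered)


def best_hand(cards):
    """Best 5-card key among all 5-card subsets of the given cards."""
    return max(rank5(combo) for combo in combinations(cards, 5))


def win_probability(hole, board=(), trials=2000, seed=0):
    """Estimate equity against one random opponent hand (ties count as 0.5)."""
    rng = random.Random(seed)
    known = set(hole) | set(board)
    remaining = [c for c in DECK if c not in known]
    score = 0.0
    for _ in range(trials):
        draw = rng.sample(remaining, 2 + 5 - len(board))
        opponent, runout = draw[:2], draw[2:]
        full_board = list(board) + runout
        mine = best_hand(list(hole) + full_board)
        theirs = best_hand(opponent + full_board)
        score += 1.0 if mine > theirs else 0.5 if mine == theirs else 0.0
    return score / trials


print(win_probability(("As", "Ah"), trials=500))  # pocket aces before the flop
```

The estimate is noisy for small `trials`, so in practice the values would likely be precomputed or cached per hand and board rather than recomputed at every decision.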

Because of the reduction in states, it should be possible to train our bot in a reasonable time frame.


# Evaluation Metrics

One method to evaluate our bot is to run the trained bot against a randomly betting bot for a set number of games and measure its returns. We can also design a few other dummy bots for it to play against, such as an all-in bot or a cautious bot, to see how it performs.
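Since individual games are noisy, the returns from such a run could be summarized as a mean profit per game with an approximate confidence interval. The sketch below assumes a list of per-game profits is already available; the `summarize` helper is hypothetical, the numbers are made-up placeholders rather than results, and the interval uses a normal approximation.

```python
import math
import statistics


def summarize(profits):
    """Return (mean profit per game, approximate 95% confidence interval)."""
    n = len(profits)
    mean = statistics.fmean(profits)
    stderr = statistics.stdev(profits) / math.sqrt(n)
    return mean, (mean - 1.96 * stderr, mean + 1.96 * stderr)


# Placeholder profits for illustration only; real values would come from
# playing the trained bot against a baseline bot.
profits = [2.0, -1.0, 3.0, 0.0]
mean, (low, high) = summarize(profits)

# One possible success criterion: the whole interval lies above zero.
beats_baseline = low > 0
print(mean, beats_baseline)
```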

# Ethics & Privacy

Given our present plans, we don't think this project raises serious ethical concerns: it's a fairly simple bot, and while poker is a game played with real stakes, we're not using real money. It's not real gambling at this point, and we aren't using any data, so there aren't privacy concerns. If an upgraded version of this bot learned so well that it became unbeatable at poker and were used where actual monetary bets are made, it could be dangerous for people with gambling issues who play against it, but as of right now it's fine.

# Team Expectations 

* We will meet at least once a week to assign tasks and update each other on our progress.
* We will complete our assigned tasks on time.
* We will communicate respectfully.
* We will inform each other if we're having problems with our tasks so we can unblock each other.

# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 5/15 | 6:30 PM |  Start work on proposal  | Discuss completed proposal tasks, make a plan to get the rest of it done, discuss proposal submission |
| 5/20 | 6:30 PM |  Complete proposal | Discuss random bot for benchmark, discuss poker implementation, change proposal as needed based on feedback |
| 5/27 | 6:30 PM |  Implement random bot and poker game | Discuss strategy for Monte Carlo bot, divide tasks for its development |
| 6/3 | 6:30 PM |  Develop Monte Carlo bot | Discuss training and possible improvements to bot, discuss extensions of state space/more players |
| 6/8 | 6:30 PM | Finish bot programming, training, and testing  | Discuss results, assign tasks for final report |
| 6/12 | Before 11:59 PM |  | Submit final report |

# Footnotes
<a name="basics"></a>1.[^](#basics): Bicycle. (n.d.). Basics of poker. Bicycle Cards. https://bicyclecards.com/how-to-play/basics-of-poker <br> 
<a name="math"></a>2.[^](#math): Brilliant. (n.d.). Math of poker - basics. Brilliant Math & Science Wiki. https://brilliant.org/wiki/math-of-poker <br>
<a name="bots"></a>3.[^](#bots): Chaffin, S. (n.d.). How do you combat the perfect poker bot player?. 888 Poker Online. https://www.888poker.com/magazine/strategy/texas-holdem/playing-against-poker-bots
