Title
Our project aims to create a poker solver that is trained on low-stakes gameplay.
Who
Jason Silva (Jsilva13) Torsten Ullrich (tullric1) Ken Ngamprasertsith (tngampra)
Introduction
The goal of our project is to use a neural network to estimate the EV (expected value) of actions in poker. We will focus on No Limit Texas Hold'Em, the most popular variety of poker played in casinos across America. By training our solver on low-stakes (and notably far from theoretically optimal) gameplay, we hope the estimated EV will be more in tune with your average poker player. Other solvers estimate EV assuming perfect play from both sides, a standard that is nigh-impossible even for the best professionals to meet. This will be a supervised regression task, as the model aims to estimate an expected value.
Related work
https://www.deepstack.ai/ https://www.cs.hmc.edu/~ktantia/poker.html
The second project is the most similar to what we are attempting to accomplish here. That team created a bot using Q-learning that consulted its RL policy at each timestep. The largest shortcoming of the project was data collection: the bots were trained on essentially random poker data, and as a result exhibited highly unusual behavior after training, such as folding every hand to avoid loss. Most pedagogical poker solvers, like these two projects, also focus on heads-up play. We are attempting to create a solution for a 6-player poker table (which can be scaled to other sizes).
Data
Since we envision this project providing insight into average poker players, we will acquire data from very low stakes, often dubbed 'microstakes' (read: $5 buy-ins). Millions of hands at these stakes can be purchased online for a relatively small cost. The data is a text summary of the action within each hand. Using our knowledge of poker, we will write a parser that captures the relevant information from each hand summary, such as the sequence of moves, bet sizes, and player positions. This parser will likely take a large chunk of this project's implementation time.
Our database will consist of millions of these hand histories, which we will parse down into a vector representation capturing what we deem to be the relevant features based on common poker heuristics.
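As a rough sketch of what this parsing and vectorization might look like (the hand-history snippet, the `parse_hand` helper, and the chosen features below are our own illustrative assumptions, not any real site's format or our final feature set):

```python
import re

# Hypothetical, simplified hand-history snippet; real site exports differ.
SAMPLE_HAND = """Seat 3: Hero ($5.00) posts big blind $0.05
Villain1 raises to $0.15
Hero calls $0.10
*** FLOP *** [Ah 7d 2c]
Hero checks
Villain1 bets $0.20
Hero folds"""

def parse_hand(text):
    """Extract a crude feature dict from a hand summary."""
    return {
        "num_raises": len(re.findall(r"\braises\b", text)),
        "num_calls": len(re.findall(r"\bcalls\b", text)),
        "num_bets": len(re.findall(r"\bbets\b", text)),
        "hero_folded": bool(re.search(r"Hero folds", text)),
        # Bet sizes in dollars, in the order they occur in the hand.
        "bet_sizes": [float(m) for m in
                      re.findall(r"(?:raises to|bets|calls) \$(\d+\.\d+)", text)],
    }

def to_vector(features, max_bets=4):
    """Flatten the feature dict into a fixed-length numeric vector."""
    sizes = features["bet_sizes"][:max_bets]
    sizes += [0.0] * (max_bets - len(sizes))  # pad so every hand has equal length
    return [features["num_raises"], features["num_calls"],
            features["num_bets"], float(features["hero_folded"])] + sizes

feats = parse_hand(SAMPLE_HAND)
vec = to_vector(feats)
```

Fixed-length padding matters here: a dense network needs every hand, no matter how long the action sequence, to map to the same vector shape.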
Methodology
At this time, we plan to use a simple dense (fully connected) network. The real challenge of this project will be feature selection. Since we are not applying operations such as convolution to extract features automatically, we will need to design our vector representations carefully so that the model can learn with only dense layers.
A further benefit of selecting features ourselves is that the model is highly interpretable, which gives it practical application. After playing a hand, for example, one could consult the application to determine whether one chose the highest-expected-value move.
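A minimal sketch of the kind of dense network we have in mind, here as a plain NumPy forward pass (the input dimension, layer widths, and random weights are placeholders for illustration, not our trained architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: an 8-dimensional hand vector, two hidden layers, scalar EV output.
LAYER_SIZES = [8, 32, 16, 1]

# Randomly initialized parameters stand in for trained ones.
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(LAYER_SIZES, LAYER_SIZES[1:])]
biases = [np.zeros(n) for n in LAYER_SIZES[1:]]

def predict_ev(x):
    """Forward pass: ReLU hidden layers, linear output (an EV estimate in dollars)."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)  # ReLU activation
    return float(h @ weights[-1] + biases[-1])  # no activation: EV can be negative

ev = predict_ev([1, 1, 1, 1.0, 0.15, 0.10, 0.20, 0.0])
```

The linear output layer is the one deliberate choice here: since EV is a signed dollar amount, the final layer should not squash its range.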
Our model differs from other poker solver applications in a few key ways.
- We do not limit ourselves to heads-up (2-player) poker.
- We focus on interpretability and practicality (what is often called 'exploitation' in poker) in our algorithm, not theoretical perfection. This can be contrasted to industry poker solvers which, by brute force, determine a strategy that is guaranteed to be profitable in the long run.
- Our data is collected from real play, which allows us to capture real-life player tendencies rather than those of some artificial agent.
Metrics
Our loss function will be very simple: how closely does the model predict the outcome of the hand, down to the specific dollar amount? There are a few challenges with this approach:
- Poker is a game of chance. In an earlier round of action, a move can be theoretically strong and yet misfortune can strike. In short, there is variability even when making the same exact move. This effect dissipates in the later stages of a hand, where more information is known and less chance remains.
- How accurate, to the dollar, can a simplified representation be? If we define bet sizings as a percentage of the pot instead of specific dollar amounts, the solver is more interpretable and usable, but its expected-value output is less fine-tuned.
- A lot of moves in poker are close to break-even. Will this lead to predictions very close to 0 in order to minimize the loss function? Will our algorithm hack the reward function?
With this in mind, here are our goals:
- Base goal: Create a solver which behaves as we would expect in some common situations. For example, raise pocket aces (AA) preflop; do not fold them. This is objectively a good move in poker, and our solver should be able to capture it.
- Target goal: Be able to recognize common heuristics in our solver. For example, we would expect the player who raised before the flop (before community cards are presented) to bet a large frequency of the time on certain flops.
- Stretch goal: In situations where there is no clear theoretical best option, we hope to see that the solver makes exploitative adjustments. For example, theory usually mixes moves on certain rounds which are similar in theoretical value. However, given player tendencies -- such as calling too many attempted bluffs -- we would hope to see the solver make some adjustments not predicted by brute force solvers.
Ethics
What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?
Our dataset is scraped from a popular online poker site and provides hand histories from users on that site. This can be viewed as an invasion of privacy, as these players are likely not consenting (whether or not it is in the fine print of their site) to their data being used.
In terms of bias, our dataset is biased, and that is largely the point. We selected the dataset mentioned (microstakes, $5 buy-ins) because we hope to investigate poker within a population of weaker players. Here, we make the assumption that players at these stakes are weaker players, which is true by and large. However, there may be some strong players at these stakes; for example, players in countries with lower costs of living and incomes may play them for a living. We do not think this presents any real ethical issues, as we are not using this data maliciously against such players specifically, and this information is important to the solver as it represents the real landscape we are trying to understand.
How are you planning to quantify or measure error or success? What implications does your quantification have?
We are planning to quantify success purely in terms of determining the action which presents the highest expected value. While this is usually the best option for a poker player trying to play good poker, there are extreme scenarios where this metric is not the best.
For example, we may want to avoid variance at very high stakes. If you buy in to a poker game for $1,000,000 and are faced with a decision that theoretically would yield you $1000 every time you make it, but can range from a -$1,000,000 to +$1,000,000 decision, you may choose to forgo this spot and go for the move that yields you $100 but does not risk your ruin.
In poker, there is a practice called bankroll management, which helps you select games where you can pursue theoretically optimal plays without worrying about variance. However, there is still some merit to considering situations where variance plays a role.
In terms of a possible extension, risk of ruin is very important in tournament poker. When facing a pay jump between placements, there is merit to not pursuing pure chip gain when the risk of ruin would result in missed dollar profits.
Since our solver purely measures the expected value, these subtleties cannot be represented. But again, for our purposes, that is entirely fine. It should just be noted that the solver is not applicable to every game of poker with our intended design.
Division of Labor
Jason and Torsten are very familiar with poker theory, and as such will do the brunt of the work in data parsing and feature selection. Ken will be focused on programmatic challenges, such as designing and tuning the model.
Reflection for Check-in 3
https://docs.google.com/document/d/1ayWOn_wg-twUc6Dmsj4yp6XHJ1eXdVTOGcQNn5juTJw/edit (This link should be accessible to anyone with a Brown email address)
Final Reflection
https://docs.google.com/document/d/1hSJn3vL2-VTsYUGhyL8G0r5PYaw9zkjwXYEC-Ia5D-I/edit?usp=sharing