Overview
We began this project with the idea of creating a machine that learns how to play, and hopefully win, poker. What follows is a brief breakdown of how this idea was implemented via machine learning.
What is Monte Carlo?
Monte Carlo methods use repeated random sampling to solve problems. These algorithms work well for problems with incomplete information; they do not require full knowledge of the environment to decide how to act. In the context of poker, this means we can train a bot using only the information a human player would have: we do not need to know the other players' cards. This serves us well since, as mentioned in The Code/Implementation, we had initially planned to train our bot on historical poker datasets, but were unable to find completely transparent documentation of real poker games.
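As a quick illustration of the general technique (our own sketch, not PokerBot's code), the snippet below estimates the probability of being dealt a pocket pair purely by sampling random two-card hands. With enough trials, the estimate converges to the exact value of 3/51, about 0.059, without any combinatorial reasoning:

    import random

    # Hypothetical illustration of Monte Carlo estimation: approximate the
    # chance of being dealt a pocket pair by sampling random two-card hands.
    RANKS = "23456789TJQKA"
    DECK = [rank + suit for rank in RANKS for suit in "shdc"]

    def estimate_pair_probability(trials=100_000):
        hits = 0
        for _ in range(trials):
            card1, card2 = random.sample(DECK, 2)
            if card1[0] == card2[0]:   # same rank means a pocket pair
                hits += 1
        return hits / trials           # converges to 3/51, about 0.059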
Taking Action
Before training, the bot chooses among all possible actions uniformly at random. Note that, in our game of poker, only four possible actions exist: raising, calling, checking, or folding.
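In code, the untrained bot amounts to a uniform random choice; a minimal sketch (the action names here are illustrative, not necessarily PokerBot's identifiers):

    import random

    ACTIONS = ["raise", "call", "check", "fold"]

    def untrained_action():
        # Before training, every legal action is equally likely.
        return random.choice(ACTIONS)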
Eventually, due to the rules of the environment (i.e., the rules of poker, applied after its moves and its opponents' moves), the bot is rewarded or penalized for those actions, in proportion to the amount of money it has gained or lost by the game's end.
By continuing to play games and averaging the sampled rewards for every state/action pair it encounters, the bot builds a decent approximation of which action produces which reward in any state of the game.
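A rough sketch of that bookkeeping, assuming a hypothetical state encoding and hand loop (the names record_hand and estimated_value are ours; PokerBot's actual structures may differ):

    from collections import defaultdict

    reward_sum = defaultdict(float)   # total sampled reward per (state, action)
    sample_count = defaultdict(int)   # number of samples per (state, action)

    def record_hand(visited_pairs, net_winnings):
        # visited_pairs: every (state, action) the bot took during one hand.
        # net_winnings: money gained or lost once the hand ended.
        for pair in visited_pairs:
            reward_sum[pair] += net_winnings
            sample_count[pair] += 1

    def estimated_value(state, action):
        # Average of all rewards sampled so far for this state/action pair.
        pair = (state, action)
        return reward_sum[pair] / sample_count[pair] if sample_count[pair] else 0.0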
We usually want this policy to be “greedy”, feeding its own improvement by returning the single action that maximizes reward. However, to encourage semi-exploratory behavior, we have the bot pseudorandomly choose from a weighted distribution over actions. That is, the “best” action has an increased probability of being chosen, but the algorithm may still explore unseen paths to find an even more rewarding strategy.
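One common way to realize this weighting is a softmax over the current value estimates; the sketch below is an assumption about the approach, not PokerBot's exact code:

    import math
    import random

    def weighted_action(q_values, temperature=1.0):
        # q_values: hypothetical dict mapping each legal action to its current
        # estimated reward, e.g. {"raise": 1.2, "call": 0.4, "check": 0.3, "fold": -0.1}.
        # Softmax weighting: the best action is most likely to be picked, but
        # every action keeps a nonzero chance of being explored.
        actions = list(q_values)
        weights = [math.exp(q_values[a] / temperature) for a in actions]
        return random.choices(actions, weights=weights, k=1)[0]

With a temperature near zero this behaves almost greedily; higher temperatures flatten the weights and push the bot toward more exploration.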
Research suggests that this problem is best solved via reinforcement learning, which is concerned with how agents “take actions in an environment so as to maximize some notion of cumulative reward” (Reinforcement Learning, 2017).
In other words, we want the bot to develop a policy which maps the state of the environment to the best possible action. For PokerBot, we do this by employing a Monte Carlo method.
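For completeness, here is a minimal sketch of what extracting that greedy mapping could look like, given the same hypothetical action-value estimates used above:

    def greedy_policy(q_values):
        # q_values: hypothetical dict mapping each legal action to its current
        # estimated reward for the present game state.
        # The learned policy simply returns the highest-valued action.
        return max(q_values, key=q_values.get)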
Image taken from en.wikipedia.org/wiki/Reinforcement_learning