As some of my followers and most of my friends know, I love poker. As an engineer, I always have efficiency and automation at the forefront of my mind. Therefore, when playing online poker, I always ask myself: How could I make this more efficient?

Obviously, as a lover of the game, I know that cheating, and especially defrauding my opponents, is highly unethical and simply wrong. I would never use a poker bot, nor would I provide a full tutorial on how to build one.

Yet whenever I played, this inner restlessness, coupled with curiosity, kept resurfacing. I simply had to prove to myself that I could make such a poker bot if I wanted to.

As is so often the case in engineering, as well as in physics, ability equals causality.

Because we can


First we need to make a list of our prerequisites. For a simple proof of concept, we'll need to be able to satisfy the following requirements:

  1. Be able to capture a poker table within a poker client
  2. Analyze the image to recognize the hands and the table position
  3. Calculate the poker math to decide whether to push, fold, or raise, and by how much.
  4. Provide a User Interface output for the user.
  5. Optionally, send trackpad signals to the operating system to automate the player's moves.

1. Capturing a table

Poker sites, for many obvious reasons, typically do not provide APIs. Therefore, there's no easy way for us to access the table information of any online poker client, such as player positions, chip stacks, and playing cards.

My idea was to use image recognition as an input into our application by taking screenshots of the desktop, and then using a Machine Learning model to recognize what the user sees.

As a disclaimer, it is not my intention to write an application for use in production. Most values will therefore not be dynamic: to save time, I'm going to hard-code everything to my specific environment.

I'm using my iMac, with the poker client locked onto the bottom left corner of the screen and Xcode taking up the right side.

The result is a capture of our poker table on our desktop
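
With everything hard-coded, finding the area to capture comes down to a bit of rectangle arithmetic. The following is a minimal Python sketch of that idea, not the code from this project. The `Rect` type, both helper functions, and all pixel values are hypothetical, and it assumes screen coordinates whose origin is the top-left corner.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixels, origin at the top-left of the screen."""
    x: int
    y: int
    width: int
    height: int


def bottom_left_region(screen_w: int, screen_h: int,
                       client_w: int, client_h: int) -> Rect:
    """Region covered by a window pinned to the bottom-left screen corner."""
    if client_w > screen_w or client_h > screen_h:
        raise ValueError("client window does not fit on the screen")
    return Rect(0, screen_h - client_h, client_w, client_h)


def sub_region(parent: Rect, dx: int, dy: int, width: int, height: int) -> Rect:
    """Rectangle at a hard-coded offset inside `parent`, e.g. one card slot."""
    if dx < 0 or dy < 0 or dx + width > parent.width or dy + height > parent.height:
        raise ValueError("sub-region falls outside its parent")
    return Rect(parent.x + dx, parent.y + dy, width, height)


# Hypothetical numbers: a 2560x1440 screen and a 1000x700 client window.
table = bottom_left_region(2560, 1440, 1000, 700)
first_hole_card = sub_region(table, dx=440, dy=520, width=60, height=90)
```

Keeping the card slots as offsets relative to the table means only one rectangle has to change if the client window ever moves.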

2. Recognizing hands and table positions

Now that we have a live capture of our table, we need to analyze the state of the game. We will write recognizers to read the cards the user is holding and the board on the table (the five community cards), and to determine the user's position at the table in relation to the dealer button.

Since we don't know when the situation on the table changes, we will also need to perform the recognition described above at a timed interval. I'm going to use one second as the repeat interval.
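
As an illustration of such a timed loop, here is a minimal Python sketch; it is not the code from this project. The `poll` function and its parameters are hypothetical, and the recognizer is replaced by a stand-in that replays a fixed list of states. The sketch only reports a state when it differs from the previous one, which is one way to avoid reacting to every identical capture.

```python
import time
from typing import Callable, Optional, TypeVar

State = TypeVar("State")


def poll(read_state: Callable[[], State],
         on_change: Callable[[State], None],
         interval: float = 1.0,
         max_ticks: Optional[int] = None) -> None:
    """Call `read_state` every `interval` seconds; report only changed states."""
    previous = None
    has_previous = False
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        current = read_state()
        if not has_previous or current != previous:
            on_change(current)
            previous, has_previous = current, True
        ticks += 1
        # Skip the final sleep so a bounded run returns immediately.
        if max_ticks is None or ticks < max_ticks:
            time.sleep(interval)


# Stand-in for the real recognizer: replays a fixed sequence of table states.
frames = iter(["preflop", "preflop", "flop", "flop", "turn"])
seen = []
poll(lambda: next(frames), seen.append, interval=0.0, max_ticks=5)
# seen is now ["preflop", "flop", "turn"]
```

In a real run, `interval` would stay at its default of one second and `max_ticks` would be left as `None` so the loop keeps going.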

To give everything a nice API, while sticking to the philosophies of clean code and separation of concerns, we’re going to define two models that represent the poker game's information.
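
What such models might look like is sketched below in Python. This is an assumption about their shape, not the models from this project: the names `Suit`, `Card`, and `TableState`, the field names, and the choice of Texas Hold'em (two hole cards, up to five community cards) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Suit(Enum):
    SPADES = "s"
    HEARTS = "h"
    CLUBS = "c"
    DIAMONDS = "d"


@dataclass(frozen=True)
class Card:
    face: str   # one of "2".."10", "J", "Q", "K", "A"
    suit: Suit

    def __str__(self) -> str:
        return f"{self.face}{self.suit.value}"


@dataclass(frozen=True)
class TableState:
    hole_cards: Tuple[Card, ...]          # the two cards the user is holding
    board: Tuple[Card, ...] = ()          # zero to five community cards
    dealer_offset: Optional[int] = None   # seats between dealer button and user

    def __post_init__(self) -> None:
        # Assumption: Texas Hold'em, so exactly two hole cards.
        if len(self.hole_cards) != 2:
            raise ValueError("expected exactly two hole cards")
        if len(self.board) > 5:
            raise ValueError("the board holds at most five cards")


state = TableState(hole_cards=(Card("A", Suit.SPADES), Card("K", Suit.SPADES)))
```

Frozen dataclasses compare by value, so two captures of an unchanged table produce equal `TableState` objects.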

We now have a visual representation of the poker table and we have nice, defined models to handle the information. To extract information from the screenshots, we're going to use Machine Learning.

The first step is to train an ML model. To train it, we need to feed it as many examples as possible.

We want to be able to recognize the so-called Suit of a poker card, so we will take screenshots of the poker client's cards. We will need screenshots of as many Spades, Hearts, Clubs, and Diamonds as possible.

The Face, or value, of the card, such as a King or a Two, is printed as text: letters and numbers that a Text Recognizer is much better suited for.
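
Whatever the Text Recognizer returns still has to be mapped onto one of the thirteen faces. Below is a minimal Python sketch of that normalization step, assuming the recognizer hands back one short string per card. `parse_face` and `VALID_FACES` are hypothetical names, and accepting `T` as shorthand for Ten is an assumption rather than something a given poker client is known to print.

```python
from typing import Optional

VALID_FACES = ("2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A")


def parse_face(recognized_text: str) -> Optional[str]:
    """Normalize raw recognizer output to a card face, or None if unusable."""
    cleaned = recognized_text.strip().upper()
    if cleaned == "T":  # assumption: treat the common shorthand "T" as Ten
        cleaned = "10"
    return cleaned if cleaned in VALID_FACES else None
```

Returning `None` instead of guessing lets the caller skip a bad capture and simply wait for the next one.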

Screenshots of different spades cards
Screenshots of different clubs cards
Screenshots of different diamonds cards
Screenshots of different hearts cards

Surprisingly, Apple’s documentation on how to create an ML model is more than sufficient, and it took me less than 5 minutes to train the model with 100% accuracy.

I give my appreciation to Apple for this beautiful Framework. Thank you.

I performed a Google search to learn how to use the Text Recognizer, and the answer was quite surprising: it is very simple to use and works very well.

Our application now has everything it needs to accurately read all the cards of our online poker session, in real time.

The application recognized every single hand that it was intended to

Where to go from here?

Further problems we need to solve require the same techniques we have explored thus far. We need to know the user's table position. Is he the dealer, or is he the Big Blind?

My idea was to train the Image Recognizer on the possible table positions of the Dealer Button. I would modify the screenshots to cover up the cards and chips on the table, so the model doesn’t mistake different cards on the table for evidence of a specific table position.

We solely want the dealer button's position.
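
Once the button's seat is known, the user's position follows from simple modular arithmetic. The Python sketch below illustrates this under stated assumptions: seats are numbered clockwise from 0, every seat is occupied, and the function names are hypothetical. A real table with empty seats would need the list of active players instead.

```python
def seats_after_dealer(dealer_seat: int, user_seat: int, seat_count: int) -> int:
    """Number of seats the user sits clockwise after the dealer button."""
    if seat_count < 2:
        raise ValueError("a table needs at least two seats")
    if not (0 <= dealer_seat < seat_count and 0 <= user_seat < seat_count):
        raise ValueError("seat index out of range")
    return (user_seat - dealer_seat) % seat_count


def position_name(offset: int, seat_count: int) -> str:
    """Human-readable name for an offset returned by `seats_after_dealer`."""
    if seat_count == 2:
        # Heads-up: the dealer also posts the small blind.
        return "Dealer / Small Blind" if offset == 0 else "Big Blind"
    names = {0: "Dealer", 1: "Small Blind", 2: "Big Blind"}
    return names.get(offset, f"{offset} seats after the dealer")


# Hypothetical six-seat table: button on seat 4, user on seat 0.
offset = seats_after_dealer(dealer_seat=4, user_seat=0, seat_count=6)
label = position_name(offset, seat_count=6)
```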

Next, we'll need to read the community cards, called The Board. Then, we'll need every player's chip stack. Poker players use all these factors to make a mathematical decision for the next move.
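
One small, standard piece of that math is pot odds: the share of the final pot a player has to pay in order to call. The Python sketch below shows only this piece, as an illustration rather than a decision engine. The function names are hypothetical, `pot` is assumed to already include the opponent's bet, and `equity` (the estimated chance of winning the hand) is assumed to come from somewhere else.

```python
def pot_odds(pot: float, to_call: float) -> float:
    """Share of the final pot the caller contributes: call / (pot + call)."""
    if pot < 0 or to_call <= 0:
        raise ValueError("pot must be >= 0 and to_call must be > 0")
    return to_call / (pot + to_call)


def call_is_profitable(equity: float, pot: float, to_call: float) -> bool:
    """A call has non-negative expected value when equity >= pot odds."""
    if not 0.0 <= equity <= 1.0:
        raise ValueError("equity must be between 0 and 1")
    return equity >= pot_odds(pot, to_call)


# Hypothetical spot: 100 in the pot, opponent bets 50, so pot is 150 and
# calling costs 50. The caller needs to win at least 50 / 200 of the time.
required = pot_odds(pot=150, to_call=50)
```

The threshold follows from the expected value of a call, `equity * pot - (1 - equity) * to_call`, which is non-negative exactly when `equity >= to_call / (pot + to_call)`.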

As I said at the beginning, I love poker. That’s the reason why this article ends here. The purpose of the article was to document that I was able to prove to myself that I could create this. It is not my intention to provide a step-by-step tutorial.

My biggest surprise with this experiment was how simple it was to train and use an ML model. Thank you for spending your time reading about my journey. I hope you enjoyed it.