
As some of my followers and most of my friends know, I love poker. As an engineer, efficiency and automation are always at the forefront of my mind. So, when playing online poker, I always ask myself: how could I make this more efficient?
Obviously, as a lover of the game, I know that cheating, and especially defrauding my opponents, is highly unethical and simply wrong. I would never use such a bot, nor will I provide a full tutorial on how to build one.
Yet whenever I played, an inner restlessness, coupled with curiosity, kept resurfacing. I simply had to prove to myself that I could build such a poker bot if I wanted to.
As is so often the case in engineering, as well as in physics, ability equals causality.
Because we can
Planning
First, we need to make a list of our prerequisites. For a simple proof of concept, we'll need to satisfy the following requirements:
- Capture a poker table within the poker client
- Analyze the image to recognize the hands and the table position
- Calculate the poker math to decide whether to push, fold, or raise, and by how much (a minimal sketch follows this list).
- Provide a user interface to present the output to the user.
- Optionally, send trackpad signals to the operating system to automate the player's moves.
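To make the poker-math point a little more concrete, here is a minimal, illustrative sketch of a pot-odds based decision. It is not part of the proof of concept that follows; the function, the `Decision` type, and the raise sizing are assumptions of mine, included only to show the kind of calculation involved.

import Foundation

// A deliberately simple decision helper based on pot odds.
// `equity` is our estimated chance of winning the hand (0...1),
// `pot` is the money already in the middle, `toCall` is the amount needed to continue.
enum Decision {
    case fold
    case call
    case raise(amount: Int)
}

func decide(equity: Double, pot: Int, toCall: Int) -> Decision {
    // Pot odds: the share of the final pot that our call represents
    let potOdds = Double(toCall) / Double(pot + toCall)
    // If our equity does not cover the pot odds, folding is the sound move
    if equity < potOdds { return .fold }
    // With a comfortable edge, raise; the sizing here is a placeholder
    if equity > potOdds * 1.5 { return .raise(amount: pot) }
    // Otherwise, calling is fine
    return .call
}

For example, if there is 150 in the middle and we have to call 50, we need at least 50 / 200 = 25% equity to continue.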
1. Capturing a table
Poker sites, for obvious reasons, typically do not provide APIs. Therefore, there is no easy way for us to access table information, such as player positions, chip counts, or playing cards, from any online poker client.
My idea was to use image recognition as the input to our application: take screenshots of the desktop, then use a machine learning model to recognize what the user sees.
As a disclaimer, it is not my intention to write an application for use in production. Therefore, most of the values will not be dynamic but hard-coded. To save time, I'm going to hard-code everything to my specific environment.
I'm using my iMac, with the poker client pinned to the bottom-left corner of the screen and Xcode on the right side of it.
import Foundation
import SwiftUI
struct Canvas {
// The image for SwiftUI View
let image: Image
// The CGImage for further processing
let screenshot: CGImage
// Failable initializer
init?() {
// Reference for the number of displays we have
var displayCount: UInt32 = 0
// Updating displayCount as an inout variable
var result = CGGetActiveDisplayList(0, nil, &displayCount)
// If we encounter issues here, we want to fail
guard result == CGError.success else {
// MARK: - TODO, Handle appropriately
return nil
}
// Get references to our active displays
let activeDisplays = UnsafeMutablePointer<CGDirectDisplayID>
.allocate(capacity: Int(displayCount))
// Make sure the buffer is freed once we're done with it
defer { activeDisplays.deallocate() }
// Update our Core Graphics result object
result = CGGetActiveDisplayList(displayCount, activeDisplays, &displayCount)
// If we encounter issues here, we want to fail
guard result == CGError.success else {
// MARK: - TODO, Handle appropriately
return nil
}
// I only care about my iMac's display, so I hard-code it
let desktop: CGImage = CGDisplayCreateImage(activeDisplays[0])!
// Now I'm getting the hard-coded area my desktop is in
// This takes some trial and error
let area = Rect.getTable(in: desktop)
// Crop the desktop to remain with the Poker table
self.screenshot = desktop.cropping(to: area)!
// Lastly turn the image into an image we can display in SwiftUI
// The idea here is to get a visual representation.
// So we can check if `desktop` and `area` are proper values
self.image = Image(nsImage: self.screenshot.asNSImage()!)
}
}
// Rect is simply a lightweight type used to keep the code clean
enum Rect {
// This is our hard-coded location of the poker table on my desktop.
static func getTable(in screenshot: CGImage) -> CGRect {...}
}
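For reference, this is roughly what such a hard-coded helper can look like. The coordinates below are purely hypothetical; they depend on your display resolution and on where the poker client window sits, and have to be found by trial and error.

// Lives inside the Rect enum; the numbers are made-up examples
static func getTable(in screenshot: CGImage) -> CGRect {
    // The poker client is pinned to the bottom-left corner of the screen,
    // which corresponds to the lower-left area of the captured image
    // (CGImage cropping rects use a top-left origin)
    return CGRect(x: 0, y: screenshot.height / 2, width: 1600, height: screenshot.height / 2)
}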

The result is a capture of our poker table on our desktop
2. Recognizing hands and table positions
Now that we have a live capture of our table, we need to analyze the state of the game. We will write recognizers to read the cards the user is holding and the board on the table (the five community cards), and we will recognize the position at the table in relation to the dealer button.
Since we don't know when the situation on the table changes, we will also need to perform the recognition described above at a timed interval. I'm going to use one second as the repeat interval.
var body: some View {
VStack {
...
}
.frame(maxWidth: .infinity, maxHeight: .infinity)
// We need to start our runner within the body
// So we are able to update our immutable view attributes
.onAppear(perform: self.run)
}
// This is our runner that updates our view and kicks off new calculations
private func run() {
// For the current state of our POC,
// let's update once per second
Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
// First, let's get a new capture of the poker table
guard let canvas = Canvas() else {
// MARK: - TODO, Handle appropriately
return
}
// The image to display for us so we see what's going on
self.image = canvas.image
// The CGImage we want to use for further calculations
self.screenshot = canvas.screenshot
self.setWholeCards(in: self.screenshot!)
self.findDealer(in: self.screenshot!)
}
}
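The two snippets above reference a handful of properties on the surrounding SwiftUI view. That declaration isn't shown here, but it presumably looks something like the following sketch; the view name and the @State wrappers are my assumptions, while the property names are taken from the snippets.

import SwiftUI

struct ContentView: View {
    // The visual representation of the latest capture, shown for debugging
    @State private var image: Image?
    // The raw CGImage used for the card and dealer recognition
    @State private var screenshot: CGImage?
    // The game model that collects the recognized information
    @State private var game = Game()

    var body: some View {
        VStack {
            // Display the capture so we can verify the hard-coded crop area
            if let image = image {
                image
            }
        }
        .frame(maxWidth: .infinity, maxHeight: .infinity)
        // We need to start our runner within the body
        .onAppear(perform: run)
    }

    private func run() {
        // The timer-based runner shown above goes here
    }
}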
// We're now using the screenshot to analyze the table
private func setWholeCards(in frame: CGImage) {
// We're getting our left card.
// We're passing our hard-coded coordinates of the face and the suite.
// The face is the letter/number of our hole card.
// For example A/K/T/9/7/6/etc
// The suite is the picture of the card
// For example diamond or heart
// getLeftWholeCardFace and getLeftWholeCardSuite
// once again each just return hard-coded CGRects
getCard(
face: Rect.getLeftWholeCardFace(in: frame),
suite: Rect.getLeftWholeCardSuite(in: frame)) { (face, suite) in
// Once we have the face and the suite, we'll attach it to our game model
// I'll show the game model in a minute
game.setLeftWholeCard(face, suite)
}
// Repeat for the right card
getCard(
face: Rect.getRightWholeCardFace(in: frame),
suite: Rect.getRightWholeCardSuite(in: frame)) { (face, suite) in
// Same as above
...
}
}
// Type alias for better readability
typealias CardCompletion = ((face: String, suite: String)) -> Void
// This function wraps the getFace and getSuite methods to provide a simple API
private func getCard(face facePosition: CGRect,
suite suitePosition: CGRect,
onCompletion: @escaping CardCompletion) {
// getFace and getSuite will be explained shortly
getFace(at: facePosition) { face in
getSuite(at: suitePosition) { suite in
onCompletion((face, suite))
}
}
}
To give everything a nice API, while sticking to the principles of clean code and separation of concerns, we're going to define two models that represent the poker game's information.
typealias WholeCards = (left: Card, right: Card)
struct Game {
// The whole cards are your left and your right hole cards.
var wholeCards: WholeCards = (.none, .none) {
didSet {
// We're printing the new cards to debug the ML models.
print("New wholeCards: ", wholeCards)
}
}
// MARK: - Public
// Sets the left whole card for the player
mutating public func setLeftWholeCard(_ face: String, _ suite: String) {
// We first need to make sure that the ML model recognized valid elements.
// If we, for example, got a `D` as the face, we would not get a valid Face case.
guard let face = Card.Face(rawValue: face),
let suite = Card.Suite(rawValue: suite) else {
// MARK: - TODO, handle appropriately
return
}
// Initialize a new card for the Face/Suite
let card = Card(face, suite)
// Since we're using SwiftUI, we want to be as efficient as possible.
// Therefore, we only update the card if it actually changed.
guard wholeCards.left != card else { return }
wholeCards.left = card
}
// Same as above, just for the right whole card.
mutating public func setRightWholeCard(_ face: String, _ suite: String) {
...
}
}
struct Card: CustomStringConvertible, Equatable {
// Overriding the description to get a decent print log
var description: String {
// Example: the King of Hearts is `Kh`
return "\(face.rawValue)\(suite.rawValue)"
}
// All possible faces
enum Face: String {
case A = "A"
case K = "K"
case Q = "Q"
case J = "J"
case T = "10"
case Nine = "9"
case Eight = "8"
case Seven = "7"
case Six = "6"
case Five = "5"
case Four = "4"
case Three = "3"
case Two = "2"
case None
}
// All possible suites
enum Suite: String {
case Clubs = "c"
case Hearts = "h"
case Spades = "s"
case Diamonds = "d"
case None
}
let face: Face
let suite: Suite
init(_ face: Face, _ suite: Suite) {
self.face = face
self.suite = suite
}
// MARK: - Public
public func get(_ face: String, _ suite: String) -> Card {
return Card(
Face(rawValue: face)!,
Suite(rawValue: suite)!
)
}
// MARK: - Static
// For easy access in SwiftUI previews
static var none: Card {
return Card(
.None,
.None
)
}
}
We now have a visual representation of the poker table and we have nice, defined models to handle the information. To extract information from the screenshots, we're going to use Machine Learning.
The first step is to train an ML model. To train it, we need to feed it as many examples as possible.
We want to be able to recognize the so-called Suite of a poker card, so we will take screenshots of the poker client's cards. We will need screenshots of as many Spades, Hearts, Clubs, and Diamonds as possible.
The Face, or the value of the card, such as a King or a Two, is a typeface: letters and numbers that a Text Recognizer is much better suited for.

Screenshots of different spades cards

Screenshots of different clubs cards

Screenshots of different diamond cards

Screenshots of different hearts cards
Surprisingly, Apple’s documentation on how to create an ML model is more than sufficient, and it took me less than 5 minutes to train the model with 100% accuracy.
I give my appreciation to Apple for this beautiful Framework. Thank you.
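For anyone curious what that training looks like in code rather than in the Create ML app, the sketch below is roughly it. The folder layout (one subfolder per suite, named after the raw values used in the Suite enum, each filled with the screenshots above) and the file paths are assumptions on my part.

import CreateML
import Foundation

// Each subfolder of this directory is one label (c, h, s, d)
// and contains the screenshots of that suite's symbol
let trainingData = URL(fileURLWithPath: "/path/to/SuitesTrainingData")

// Train an image classifier on the labeled folders
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingData)
)

// Check how well the model did on the training and validation sets
print(classifier.trainingMetrics)
print(classifier.validationMetrics)

// Write the model to disk; dragging it into Xcode generates the `Suites` class
try classifier.write(to: URL(fileURLWithPath: "/path/to/Suites.mlmodel"))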
import Foundation
import Vision
struct ImageRecognizer {
static func get(from image: CGImage, onComplete: @escaping (String) -> Void) {
// Kick off a new ML Configuration.
// Here I'm not sure if it's a better idea
// to create one configurator for the whole class,
// or to create a new one every time we want to recognize stuff.
// Didn't spend enough time with it to care, honestly.
let configurator = MLModelConfiguration()
// Kicking off a new classifier for our Suites model
let classifier = try! Suites(configuration: configurator).model
// Getting a model
// MARK: - TODO, do not force unwrap
let model = try! VNCoreMLModel(for: classifier)
// And using it
handle(VNCoreMLRequest(model: model) { (finished, error) in
if let error = error {
// MARK: - TODO, Handle appropriately
return
}
guard let results = finished.results as? [VNClassificationObservation],
let first = results.first else {
// MARK: - TODO, Handle appropriately
return
}
onComplete(first.identifier)
}, for: image)
}
// MARK: - Private
// We have extracted the handler into its own method to clean up the code.
// Without it, we'd have a whole bunch of nesting.
private static func handle(_ request: VNCoreMLRequest, for image: CGImage) {
do {
try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
} catch {
fatalError(error.localizedDescription)
}
}
}
I performed a Google search to learn how to use the Text Recognizer, and the answer was quite surprising: it's absolutely simple and works very well.
import Foundation
import Vision
struct TextRecognizer {
static func get(from image: CGImage, onComplete: @escaping (String) -> Void) {
// We kick off a text recognition handler
handle(VNRecognizeTextRequest { (request, error) in
// First we need to make sure that the request returned results
guard let observations = request.results as? [VNRecognizedTextObservation] else {
// MARK: - TODO, handle appropriately
return
}
// We're going to iterate over all possible results
observations.forEach { observation in
// And for each result, we're only going to care for the top result
// Since our recognizer has to return at max one letter,
// this should be a fairly acceptable way of handling that.
observation.topCandidates(1).forEach { text in
// Lastly we're passing the String to our closure
onComplete(text.string)
}
}
}, for: image)
}
// MARK: - Private
// We have extracted the handler into its own method to clean up the code.
// Without it, we'd have a whole bunch of nesting.
private static func handle(_ request: VNRecognizeTextRequest, for image: CGImage) {
// Since we only need one letter, we can go with the highest accuracy
request.recognitionLevel = .accurate
// The language is en_GB
request.recognitionLanguages = ["en_GB"]
// I found that the letter Q wasn't always recognized well.
// Adding it to custom words solved the issue for good.
request.customWords = ["Q"]
do {
try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
} catch {
fatalError(error.localizedDescription)
}
}
}
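With both recognizers in place, the getFace and getSuite helpers that getCard relies on are little more than glue code: crop the screenshot to the hard-coded area and hand it to the right recognizer. Something along these lines is what they would look like; treat it as a sketch, since the error handling is simplified and these helpers are assumed to live on the view, next to getCard.

// Reads the face (A/K/Q/J/10/...) from the hard-coded face area
private func getFace(at position: CGRect, onCompletion: @escaping (String) -> Void) {
    // Crop the screenshot to the small area containing the face
    guard let face = self.screenshot?.cropping(to: position) else {
        // MARK: - TODO, handle appropriately
        return
    }
    // Let the Vision text recognizer read the letter/number
    TextRecognizer.get(from: face, onComplete: onCompletion)
}

// Classifies the suite from the hard-coded suite area
private func getSuite(at position: CGRect, onCompletion: @escaping (String) -> Void) {
    guard let suite = self.screenshot?.cropping(to: position) else {
        // MARK: - TODO, handle appropriately
        return
    }
    // Let the trained Core ML model classify the suite symbol
    ImageRecognizer.get(from: suite, onComplete: onCompletion)
}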
Our application now has everything it needs to accurately read all the cards of our online poker session, in real time.

Where to go from here?
Further problems we need to solve require the same techniques we have explored thus far. We need to know the user's table position: is he the dealer, or is he the big blind?
My idea was to train the Image Recognizer on the dealer button's possible table positions. I would modify the screenshot to paint over the cards and chips on the table, so the model doesn't misinterpret different cards on the table as signals for a specific table position.
We solely want the dealer button's position.
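To sketch what that could look like: before handing the capture to the classifier, we can simply paint solid rectangles over the card and chip areas, so the only informative pixels left are the dealer button. The helper below is an illustration of that idea; the areas themselves would again be hard-coded rects.

import CoreGraphics

// Paints solid rectangles over the given areas so the classifier
// only ever sees the felt and the dealer button
func mask(_ image: CGImage, areas: [CGRect]) -> CGImage? {
    guard let context = CGContext(
        data: nil,
        width: image.width,
        height: image.height,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }
    // Draw the original table capture
    context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
    // Cover the card and chip areas with a solid color
    // Note: CGContext uses a bottom-left origin, so these rects may need
    // their y-coordinates flipped compared to the cropping rects used earlier
    context.setFillColor(CGColor(red: 0, green: 0, blue: 0, alpha: 1))
    areas.forEach { context.fill($0) }
    return context.makeImage()
}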
Next, we’ll need to read the community cards, called The Board. Then, we’ll need every player’s chip stack. Poker players use all these factors to make a mathematical decision for the next move.
As I said at the beginning, I love poker. That's why this article ends here. The purpose of the article was to document that I was able to prove to myself that I could create this. It is not my intention to provide a step-by-step tutorial.
My biggest surprise with this experiment was how simple it was to train and use an ML model. Thank you for spending your time reading about my journey. I hope you enjoyed it.
Make sure to follow me on Twitter and say hi if you have feedback, questions, or requests for future articles like this.