
As some of my followers and most of my friends know, I love poker. As an engineer, efficiency and automation are always at the forefront of my mind. So, when playing online poker, I always ask myself: how could I make this more efficient?
Obviously, as a lover of the game, I know that cheating, and especially defrauding my opponents, is highly unethical and simply wrong. I would never use such a bot, nor will I provide a full tutorial on how to build one.
Yet whenever I played, an inner restlessness, coupled with curiosity, kept resurfacing. I simply had to prove to myself that I could build such a poker bot if I wanted to.
As is so often the case in engineering, as well as in physics, ability equals causality.
Because we can
Planning
First, we need to make a list of our prerequisites. For a simple proof of concept, we'll need to satisfy the following requirements:
- Capture a poker table within the poker client
- Analyze the image to recognize the hands and the table position
- Calculate the poker math to decide whether to push, fold, or raise, and by how much (a minimal sketch follows this list).
- Provide a user interface to present the output to the user.
- Optionally, send trackpad signals to the operating system to automate the player's moves.
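To make the poker-math point a little more concrete, here is a minimal, illustrative sketch of a pot-odds based decision. It is not part of the proof of concept that follows; the function, the `Decision` type, and the raise sizing are assumptions of mine, included only to show the kind of calculation involved.

import Foundation

// A deliberately simple decision helper based on pot odds.
// `equity` is our estimated chance of winning the hand (0...1),
// `pot` is the money already in the middle, `toCall` is the amount needed to continue.
enum Decision {
    case fold
    case call
    case raise(amount: Int)
}

func decide(equity: Double, pot: Int, toCall: Int) -> Decision {
    // Pot odds: the share of the final pot that our call represents
    let potOdds = Double(toCall) / Double(pot + toCall)
    // If our equity does not cover the pot odds, folding is the sound move
    if equity < potOdds { return .fold }
    // With a comfortable edge, raise; the sizing here is a placeholder
    if equity > potOdds * 1.5 { return .raise(amount: pot) }
    // Otherwise, calling is fine
    return .call
}

For example, if there is 150 in the middle and we have to call 50, we need at least 50 / 200 = 25% equity to continue.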
1. Capturing a table
Poker sites, for obvious reasons, typically do not provide APIs. Therefore, there is no easy way for us to access table information, such as player positions, chip counts, or playing cards, from any online poker client.
My idea was to use image recognition as the input to our application: take screenshots of the desktop, then use a machine learning model to recognize what the user sees.
As a disclaimer, it is not my intention to write an application for use in production. Therefore, most of the values will not be dynamic but hard-coded. To save time, I'm going to hard-code everything to my specific environment.
I'm using my iMac, with the poker client pinned to the bottom-left corner of the screen and Xcode on the right side of it.
import Foundation
import SwiftUI
struct Canvas {
// The image for SwiftUI View
let image: Image
// The CGImage for further processing
let screenshot: CGImage
// Failable initializer
init?() {
// Reference for the number of displays we have
var displayCount: UInt32 = 0
// Updating displayCount as an inout variable
var result = CGGetActiveDisplayList(0, nil, &displayCount)
// If we encounter issues here, we want to fail
guard result == CGError.success else {
// MARK: - TODO, Handle appropriately
return nil
}
// Get references to our active displays
let activeDisplays = UnsafeMutablePointer<CGDirectDisplayID>
.allocate(capacity: Int(displayCount))
// Make sure the buffer is freed once we're done with it
defer { activeDisplays.deallocate() }
// Update our Core Graphics result object
result = CGGetActiveDisplayList(displayCount, activeDisplays, &displayCount)
// If we encounter issues here, we want to fail
guard result == CGError.success else {
// MARK: - TODO, Handle appropriately
return nil
}
// I only care about my iMac's display, so I hard-code it
let desktop: CGImage = CGDisplayCreateImage(activeDisplays[0])!
// Now I'm getting the hard-coded area my desktop is in
// This takes some trial and error
let area = Rect.getTable(in: desktop)
// Crop the desktop to remain with the Poker table
self.screenshot = desktop.cropping(to: area)!
// Lastly turn the image into an image we can display in SwiftUI
// The idea here is to get a visual representation.
// So we can check if `desktop` and `area` are proper values
self.image = Image(nsImage: self.screenshot.asNSImage()!)
}
}
// Rect is simply a lightweight type used to keep the code clean
enum Rect {
// This is our hard-coded location of the poker table on my desktop.
static func getTable(in screenshot: CGImage) -> CGRect {...}
}
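For reference, this is roughly what such a hard-coded helper can look like. The coordinates below are purely hypothetical; they depend on your display resolution and on where the poker client window sits, and have to be found by trial and error.

// Lives inside the Rect enum; the numbers are made-up examples
static func getTable(in screenshot: CGImage) -> CGRect {
    // The poker client is pinned to the bottom-left corner of the screen,
    // which corresponds to the lower-left area of the captured image
    // (CGImage cropping rects use a top-left origin)
    return CGRect(x: 0, y: screenshot.height / 2, width: 1600, height: screenshot.height / 2)
}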

The result is a capture of our poker table on our desktop
2. Recognizing hands and table positions
Now that we have a live capture of our table, we need to analyze the state of the game. We will write recognizers to read the cards the user is holding and the board on the table (the five community cards), and we will recognize the position at the table in relation to the dealer button.
Since we don't know when the situation on the table changes, we will also need to perform the recognition described above at a timed interval. I'm going to use one second as the repeat interval.
var body: some View {
VStack {
...
}
.frame(maxWidth: .infinity, maxHeight: .infinity)
// We need to start our runner within the body
// So we are able to update our immutable view attributes
.onAppear(perform: self.run)
}
// This is our runner that updates our view and kicks off new calculations
private func run() {
// For the current state of our POC,
// let's update once per second
Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
// First, let's get a new capture of the poker table
guard let canvas = Canvas() else {
// MARK: - TODO, Handle appropriately
return
}
// The image to display for us so we see what's going on
self.image = canvas.image
// The CGImage we want to use for further calculations
self.screenshot = canvas.screenshot
self.setWholeCards(in: self.screenshot!)
self.findDealer(in: self.screenshot!)
}
}
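The two snippets above reference a handful of properties on the surrounding SwiftUI view. That declaration isn't shown here, but it presumably looks something like the following sketch; the view name and the @State wrappers are my assumptions, while the property names are taken from the snippets.

import SwiftUI

struct ContentView: View {
    // The visual representation of the latest capture, shown for debugging
    @State private var image: Image?
    // The raw CGImage used for the card and dealer recognition
    @State private var screenshot: CGImage?
    // The game model that collects the recognized information
    @State private var game = Game()

    var body: some View {
        VStack {
            // Display the capture so we can verify the hard-coded crop area
            if let image = image {
                image
            }
        }
        .frame(maxWidth: .infinity, maxHeight: .infinity)
        // We need to start our runner within the body
        .onAppear(perform: run)
    }

    private func run() {
        // The timer-based runner shown above goes here
    }
}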
// We're now using the screenshot to analyze the table
private func setWholeCards(in frame: CGImage) {
// We're getting our left card.
// We're passing our hard-coded coordinates of the face and the suite.
// The face is the letter/number of our hole card.
// For example A/K/T/9/7/6/etc
// The suite is the picture of the card
// For example diamond or heart
// getLeftWholeCardFace and getLeftWholeCardSuite
// once again each just return hard-coded CGRects
getCard(
face: Rect.getLeftWholeCardFace(in: frame),
suite: Rect.getLeftWholeCardSuite(in: frame)) { (face, suite) in
// Once we have the face and the suite, we'll attach it to our game model
// I'll show the game model in a minute
game.setLeftWholeCard(face, suite)
}
// Repeat for the right card
getCard(
face: Rect.getRightWholeCardFace(in: frame),
suite: Rect.getRightWholeCardSuite(in: frame)) { (face, suite) in
// Same as above
...
}
}
// Type alias for better readability
typealias CardCompletion = ((face: String, suite: String)) -> Void
// This function wraps the getFace and getSuite methods to provide a simple API
private func getCard(face facePosition: CGRect,
suite suitePosition: CGRect,
onCompletion: @escaping CardCompletion) {
// getFace and getSuite will be explained shortly
getFace(at: facePosition) { face in
getSuite(at: suitePosition) { suite in
onCompletion((face, suite))
}
}
}
To give everything a nice API, while sticking to the principles of clean code and separation of concerns, we're going to define two models that represent the poker game's information.
typealias WholeCards = (left: Card, right: Card)
struct Game {
// The whole cards are your left and your right hole cards.
var wholeCards: WholeCards = (.none, .none) {
didSet {
// We're printing the new cards to debug the ML models.
print("New wholeCards: ", wholeCards)
}
}
// MARK: - Public
// Sets the left whole card for the player
mutating public func setLeftWholeCard(_ face: String, _ suite: String) {
// We first need to make sure that the ML model recognized valid elements.
// If we, for example, got a `D` as the face, we would not get a valid Face case.
guard let face = Card.Face(rawValue: face),
let suite = Card.Suite(rawValue: suite) else {
// MARK: - TODO, handle appropriately
return
}
// Initialize a new card for the Face/Suite
let card = Card(face, suite)
// Since we're using SwiftUI, we want to be as efficient as possible.
// Therefore, we only update the card if it actually changed.
guard wholeCards.left != card else { return }
wholeCards.left = card
}
// Same as above, just for the right whole card.
mutating public func setRightWholeCard(_ face: String, _ suite: String) {
...
}
}
struct Card: CustomStringConvertible, Equatable {
// Overriding the description to get a decent print log
var description: String {
// Example: the King of Hearts is `Kh`
return "\(face.rawValue)\(suite.rawValue)"
}
// All possible faces
enum Face: String {
case A = "A"
case K = "K"
case Q = "Q"
case J = "J"
case T = "10"
case Nine = "9"
case Eight = "8"
case Seven = "7"
case Six = "6"
case Five = "5"
case Four = "4"
case Three = "3"
case Two = "2"
case None
}
// All possible suites
enum Suite: String {
case Clubs = "c"
case Hearts = "h"
case Spades = "s"
case Diamonds = "d"
case None
}
let face: Face
let suite: Suite
init(_ face: Face, _ suite: Suite) {
self.face = face
self.suite = suite
}
// MARK: - Public
public func get(_ face: String, _ suite: String) -> Card {
return Card(
Face(rawValue: face)!,
Suite(rawValue: suite)!
)
}
// MARK: - Static
// For easy access in SwiftUI previews
static var none: Card {
return Card(
.None,
.None
)
}
}
We now have a visual representation of the poker table and we have nice, defined models to handle the information. To extract information from the screenshots, we're going to use Machine Learning.
The first step is to train an ML model. To train it, we need to feed it as many examples as possible.
We want to be able to recognize the so-called Suite of a poker card, so we will take screenshots of the poker client's cards. We will need screenshots of as many Spades, Hearts, Clubs, and Diamonds as possible.
The Face, or the value of the card, such as a King or a Two, is a typeface: letters and numbers that a Text Recognizer is much better suited for.

Screenshots of different spades cards

Screenshots of different clubs cards

Screenshots of different diamond cards

Screenshots of different hearts cards
Surprisingly, Apple’s documentation on how to create an ML model is more than sufficient, and it took me less than 5 minutes to train the model with 100% accuracy.
I give my appreciation to Apple for this beautiful Framework. Thank you.
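For anyone curious what that training looks like in code rather than in the Create ML app, the sketch below is roughly it. The folder layout (one subfolder per suite, named after the raw values used in the Suite enum, each filled with the screenshots above) and the file paths are assumptions on my part.

import CreateML
import Foundation

// Each subfolder of this directory is one label (c, h, s, d)
// and contains the screenshots of that suite's symbol
let trainingData = URL(fileURLWithPath: "/path/to/SuitesTrainingData")

// Train an image classifier on the labeled folders
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingData)
)

// Check how well the model did on the training and validation sets
print(classifier.trainingMetrics)
print(classifier.validationMetrics)

// Write the model to disk; dragging it into Xcode generates the `Suites` class
try classifier.write(to: URL(fileURLWithPath: "/path/to/Suites.mlmodel"))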
import Foundation
import Vision
struct ImageRecognizer {
static func get(from image: CGImage, onComplete: @escaping (String) -> Void) {
// Kick off a new ML Configuration.
// Here I'm not sure if it's a better idea
// to create one configurator for the whole class,
// or to create a new one every time we want to recognize stuff.
// Didn't spend enough time with it to care, honestly.
let configurator = MLModelConfiguration()
// Kicking off a new classifier for our Suites model
let classifier = try! Suites(configuration: configurator).model
// Getting a model
// MARK: - TODO, do not force unwrap
let model = try! VNCoreMLModel(for: classifier)
// And using it
handle(VNCoreMLRequest(model: model) { (finished, error) in
if let error = error {
// MARK: - TODO, Handle appropriately
return
}
guard let results = finished.results as? [VNClassificationObservation],
let first = results.first else {
// MARK: - TODO, Handle appropriately
return
}
onComplete(first.identifier)
}, for: image)
}
// MARK: - Private
// We have extracted the handler into its own method to clean up the code.
// Without it, we'd have a whole bunch of nesting.
private static func handle(_ request: VNCoreMLRequest, for image: CGImage) {
do {
try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
} catch {
fatalError(error.localizedDescription)
}
}
}
I performed a Google search to learn how to use the Text Recognizer, and the answer was quite surprising: it's absolutely simple and works very well.
import Foundation
import Vision
struct TextRecognizer {
static func get(from image: CGImage, onComplete: @escaping (String) -> Void) {
// We kick off a text recognition handler
handle(VNRecognizeTextRequest { (request, error) in
// First we need to make sure that the request returned results
guard let observations = request.results as? [VNRecognizedTextObservation] else {
// MARK: - TODO, handle appropriately
return
}
// We're going to iterate over all possible results
observations.forEach { observation in
// And for each result, we're only going to care for the top result
// Since our recognizer has to return at max one letter,
// this should be a fairly acceptable way of handling that.
observation.topCandidates(1).forEach { text in
// Lastly we're passing the String to our closure
onComplete(text.string)
}
}
}, for: image)
}
// MARK: - Private
// We have extracted the handler into its own method to clean up the code.
// Without it, we'd have a whole bunch of nesting.
private static func handle(_ request: VNRecognizeTextRequest, for image: CGImage) {
// Since we only need one letter, we can go with the highest accuracy
request.recognitionLevel = .accurate
// The language is en_GB
request.recognitionLanguages = ["en_GB"]
// I found that the letter Q wasn't always recognized well.
// Adding it to custom words solved the issue for good.
request.customWords = ["Q"]
do {
try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
} catch {
fatalError(error.localizedDescription)
}
}
}
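With both recognizers in place, the getFace and getSuite helpers that getCard relies on are little more than glue code: crop the screenshot to the hard-coded area and hand it to the right recognizer. Something along these lines is what they would look like; treat it as a sketch, since the error handling is simplified and these helpers are assumed to live on the view, next to getCard.

// Reads the face (A/K/Q/J/10/...) from the hard-coded face area
private func getFace(at position: CGRect, onCompletion: @escaping (String) -> Void) {
    // Crop the screenshot to the small area containing the face
    guard let face = self.screenshot?.cropping(to: position) else {
        // MARK: - TODO, handle appropriately
        return
    }
    // Let the Vision text recognizer read the letter/number
    TextRecognizer.get(from: face, onComplete: onCompletion)
}

// Classifies the suite from the hard-coded suite area
private func getSuite(at position: CGRect, onCompletion: @escaping (String) -> Void) {
    guard let suite = self.screenshot?.cropping(to: position) else {
        // MARK: - TODO, handle appropriately
        return
    }
    // Let the trained Core ML model classify the suite symbol
    ImageRecognizer.get(from: suite, onComplete: onCompletion)
}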
Our application now has everything it needs to accurately read all the cards of our online poker session, in real time.

Where to go from here?
Further problems we need to solve require the same techniques we have explored thus far. We need to know the user's table position: is he the dealer, or is he the big blind?
My idea was to train the Image Recognizer on the dealer button's possible table positions. I would modify the screenshot to paint over the cards and chips on the table, so the model doesn't misinterpret different cards on the table as signals for a specific table position.
We solely want the dealer button's position.
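To sketch what that could look like: before handing the capture to the classifier, we can simply paint solid rectangles over the card and chip areas, so the only informative pixels left are the dealer button. The helper below is an illustration of that idea; the areas themselves would again be hard-coded rects.

import CoreGraphics

// Paints solid rectangles over the given areas so the classifier
// only ever sees the felt and the dealer button
func mask(_ image: CGImage, areas: [CGRect]) -> CGImage? {
    guard let context = CGContext(
        data: nil,
        width: image.width,
        height: image.height,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }
    // Draw the original table capture
    context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
    // Cover the card and chip areas with a solid color
    // Note: CGContext uses a bottom-left origin, so these rects may need
    // their y-coordinates flipped compared to the cropping rects used earlier
    context.setFillColor(CGColor(red: 0, green: 0, blue: 0, alpha: 1))
    areas.forEach { context.fill($0) }
    return context.makeImage()
}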
Next, we’ll need to read the community cards, called The Board. Then, we’ll need every player’s chip stack. Poker players use all these factors to make a mathematical decision for the next move.
As I said at the beginning, I love poker. That's why this article ends here. The purpose of the article was to document that I was able to prove to myself that I could create this. It is not my intention to provide a step-by-step tutorial.
My biggest surprise with this experiment was how simple it was to train and use an ML model. Thank you for spending your time reading about my journey. I hope you enjoyed it.
Make sure to follow me on Twitter and say hi if you have feedback, questions, or requests for future articles like this.