Based on Pluribus · Brown & Sandholm · Science 2019

Game-Theoretically
Optimal Poker AI

A complete GTO poker solver built from the ground up, following the architecture of Pluribus, the AI that beat elite professionals at six-player no-limit Hold'em. Open source. Browser-based. 75 ms average decision time on CPU.


Architecture

Five stages. One solver.

Built progressively from foundational game theory through neural network approximation to real-time search — every algorithm readable and documented.

CFR Convergence (live demo): Nash equilibrium → EV = 0
King (strongest) 99% · Queen (mid) 82% · Jack (weakest) 34% · Nash EV ≈ 0

Stage 1 + 2

🧮

Vanilla CFR + MCCFR

Counterfactual Regret Minimization on Leduc Hold'em: 216 information sets, converging to an approximate Nash equilibrium within 10,000 iterations (2.2 seconds). The external-sampling Monte Carlo variant (MCCFR) runs 1.9× faster than full tree traversal.

Zinkevich et al. 2007
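A full Leduc implementation is too long to show here, so the following is a minimal sketch of vanilla CFR on Kuhn poker (three cards, one betting round), a smaller stand-in game swapped in for illustration rather than taken from this project. It shows the two ingredients named above: regret matching at every information set and a full traversal of every deal on each iteration. `InfoSet`, `cfr`, and `train` are hypothetical names.

```python
import itertools

ACTIONS = "pb"  # p = check/fold, b = bet/call


class InfoSet:
    """Cumulative regrets and strategy weights for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def current_strategy(self, own_reach):
        # Regret matching: play each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]
        # The average strategy is weighted by the acting player's own reach probability.
        for a in range(2):
            self.strategy_sum[a] += own_reach * strategy[a]
        return strategy

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_utility(cards, history):
    """Utility for the player to act at `history`, or None if the hand continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history == "pp":                 # check-check: showdown for 1 chip
        return 1 if higher else -1
    if history[-1] == "p":              # "bp" or "pbp": the bettor's opponent folded
        return 1
    if history[-2:] == "bb":            # bet-call: showdown for 2 chips
        return 2 if higher else -2
    return None


def cfr(nodes, cards, history, reach0, reach1):
    """One vanilla CFR traversal; returns the value for the player to act."""
    utility = terminal_utility(cards, history)
    if utility is not None:
        return utility
    player = len(history) % 2
    key = f"{cards[player]}{history}"   # private card + public betting history
    if key not in nodes:
        nodes[key] = InfoSet()
    node = nodes[key]
    strategy = node.current_strategy(reach0 if player == 0 else reach1)

    action_values = [0.0, 0.0]
    node_value = 0.0
    for a, move in enumerate(ACTIONS):
        if player == 0:
            action_values[a] = -cfr(nodes, cards, history + move, reach0 * strategy[a], reach1)
        else:
            action_values[a] = -cfr(nodes, cards, history + move, reach0, reach1 * strategy[a])
        node_value += strategy[a] * action_values[a]

    # Counterfactual regret is weighted by the opponent's reach probability.
    opponent_reach = reach1 if player == 0 else reach0
    for a in range(2):
        node.regret_sum[a] += opponent_reach * (action_values[a] - node_value)
    return node_value


def train(iterations):
    nodes = {}
    deals = list(itertools.permutations([1, 2, 3], 2))  # 1 = Jack, 2 = Queen, 3 = King
    total = 0.0
    for _ in range(iterations):
        # Enumerate every deal; all six are equally likely, so chance weights cancel out.
        for cards in deals:
            total += cfr(nodes, cards, "", 1.0, 1.0)
    # Second value: first player's value averaged over all iterations and deals.
    return nodes, total / (iterations * len(deals))
```

Calling `train(10_000)` returns the learned information sets and the first player's average value, which should approach Kuhn poker's known game value of -1/18 ≈ -0.056 chips per hand. The average strategy, not the final iterate, is the part that converges to equilibrium.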

Stage 3

🗂️

Card Abstraction via EMD

k-means clustering with Earth Mover's Distance over equity histograms, which captures the strategic difference between made hands and draws rather than just mean equity. Bucket counts: 8 preflop, 12 flop, 12 turn, 8 river.

EMD clustering
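The sketch below, assuming each hand is summarized as a histogram over equally spaced equity bins on [0, 1], shows why EMD separates a made hand from a draw with the same mean equity. In one dimension, EMD reduces to the L1 distance between cumulative histograms times the bin width. The helper names are hypothetical, and the centroid update (mean histogram) is a common simplification that may differ from this project's clustering.

```python
import random
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two histograms over the same equally spaced equity bins on [0, 1].

    In 1-D, EMD equals the L1 distance between the CDFs multiplied by the bin width.
    """
    p_total, q_total = sum(p), sum(q)
    cdf_p = accumulate(x / p_total for x in p)
    cdf_q = accumulate(x / q_total for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q)) / len(p)


def mean_equity(hist):
    """Mean equity using bin centers; blind to the shape of the distribution."""
    n = len(hist)
    return sum(w * (i + 0.5) / n for i, w in enumerate(hist)) / sum(hist)


def nearest(hist, centroids):
    return min(range(len(centroids)), key=lambda c: emd_1d(hist, centroids[c]))


def kmeans_emd(histograms, k, iterations=20, seed=0):
    """k-means with EMD for assignment; centroids become the mean member histogram
    (a simplification: the mean does not exactly minimize total EMD)."""
    rng = random.Random(seed)
    centroids = [list(h) for h in rng.sample(histograms, k)]
    labels = [0] * len(histograms)
    for _ in range(iterations):
        labels = [nearest(h, centroids) for h in histograms]
        for c in range(k):
            members = [h for h, label in zip(histograms, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, with five bins, a hand whose equity sits entirely in the middle bin and a draw split evenly between the lowest and highest bins both have mean equity 0.5, yet `emd_1d` puts them 0.4 apart, so they land in different buckets.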

Stage 4

🧠

Deep CFR

Neural network approximation of counterfactual regret: 256-unit × 3-layer networks with LayerNorm, reservoir-sampled memory buffers (every sample seen so far is equally likely to be retained), and linear CFR weighting (samples from iteration t weighted by t) for 2× faster convergence.

Brown et al. 2019
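A minimal sketch of the two training details named above, assuming scalar regret targets for simplicity (Deep CFR predicts one regret per action). `ReservoirBuffer` implements standard reservoir sampling (Algorithm R), and `linear_cfr_loss` weights each sample by the CFR iteration on which it was collected. Both names are hypothetical, not this project's API.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity memory using reservoir sampling (Algorithm R).

    After n insertions, each of the n items is retained with probability capacity / n.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen, in a random slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, batch_size):
        """Uniform minibatch without replacement from the retained items."""
        return self.rng.sample(self.items, min(batch_size, len(self.items)))


def linear_cfr_loss(predictions, targets, iterations):
    """Mean squared error where a sample collected on CFR iteration t has weight t."""
    weighted = sum(t * (p - y) ** 2 for p, y, t in zip(predictions, targets, iterations))
    return weighted / sum(iterations)
```

Because later iterations get proportionally more weight, early samples (collected while regrets were still noisy) fade in influence without being discarded from the buffer.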

Stage 5

⚡

Real-Time Subgame Search

Depth-limited MCCFR at decision time, with the blueprint serving as a leaf-node oracle that supplies values at the depth limit. Blueprint bootstrapping stabilizes early search. 75 ms average decision time on CPU, using the technique behind Pluribus.

Pluribus technique
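A faithful depth-limited search needs a full game engine, so the sketch below covers only the time-budgeted solving loop, under loudly simplifying assumptions: the subgame is a single simultaneous decision, and its leaf values are a fixed table standing in for the blueprint's estimates at the depth limit. Regret-matching self-play runs until the wall-clock budget (75 ms by default, matching the figure above) expires, and the average strategies are returned. This is a toy, not MCCFR or this project's `RealTimeAgent`; all names are hypothetical.

```python
import time


def normalize(weights):
    total = sum(weights)
    n = len(weights)
    return [w / total for w in weights] if total > 0 else [1.0 / n] * n


def regret_matching(regrets):
    """Mix over actions in proportion to positive cumulative regret."""
    return normalize([max(r, 0.0) for r in regrets])


def solve_subgame(leaf_values, budget_s=0.075, max_iterations=None):
    """Self-play regret matching on a one-step zero-sum subgame until the
    wall-clock budget (or an optional iteration cap) is exhausted.

    leaf_values[i][j] is the searcher's value if it plays i and the opponent plays j.
    Here it is a plain table (hypothetical); in depth-limited search these numbers
    would be the blueprint's estimates at the depth limit.
    """
    rows, cols = len(leaf_values), len(leaf_values[0])
    row_regret, col_regret = [0.0] * rows, [0.0] * cols
    row_sum, col_sum = [0.0] * rows, [0.0] * cols
    deadline = time.perf_counter() + budget_s
    iterations = 0
    while time.perf_counter() < deadline:
        if max_iterations is not None and iterations >= max_iterations:
            break
        x, y = regret_matching(row_regret), regret_matching(col_regret)
        # Value of each pure action against the opponent's current mix.
        row_vals = [sum(leaf_values[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        col_vals = [-sum(leaf_values[i][j] * x[i] for i in range(rows)) for j in range(cols)]
        row_ev = sum(xi * v for xi, v in zip(x, row_vals))
        col_ev = sum(yj * v for yj, v in zip(y, col_vals))
        for i in range(rows):
            row_regret[i] += row_vals[i] - row_ev
            row_sum[i] += x[i]
        for j in range(cols):
            col_regret[j] += col_vals[j] - col_ev
            col_sum[j] += y[j]
        iterations += 1
    # The average strategies (not the last iterate) approach equilibrium.
    return normalize(row_sum), normalize(col_sum), iterations
```

For the table `[[2, -1], [-1, 1]]`, both average strategies approach (0.4, 0.6), the game's unique equilibrium; a tighter budget simply stops the averaging earlier and returns a coarser answer.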

Benchmarks

What the numbers show

Tournament evaluation across 300 duplicate hand pairs. Duplicate scoring controls for card luck: each deal is played twice, with the agents swapping seats.

Matchup	mBB / hand	95% CI (mBB)	Result
Blueprint vs Random	+28,403	±5,789	✓ Significant
Search vs Random	+28,134	±5,686	✓ Significant
Search vs Blueprint	+31,798	±5,615	✓ Significant

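mBB = milli-big-blinds per hand (1,000 mBB = 1 big blind). The first two rows measure each agent against a random baseline; the third is a direct head-to-head match. Real-time search beats blueprint-only play head-to-head, the core result from Pluribus. Against the random baseline the two agents are statistically indistinguishable, since their confidence intervals overlap.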
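The duplicate-scoring arithmetic can be sketched as follows, assuming each deal yields agent A's winnings in mBB from both seats and that the 95% interval is a normal approximation (mean ± 1.96 standard errors) over per-deal averages. The project's exact interval method is not stated, and `duplicate_result` is a hypothetical helper.

```python
import math
import statistics


def duplicate_result(pairs, z=1.96):
    """Score agent A from duplicate deal pairs.

    Each pair holds A's winnings (mBB) on the same cards from both seats.
    Averaging the two seats cancels most of the card luck; the interval is a
    normal approximation over the per-deal averages.
    """
    per_deal = [(seat_a + seat_b) / 2 for seat_a, seat_b in pairs]
    mean = statistics.mean(per_deal)
    half_width = z * statistics.stdev(per_deal) / math.sqrt(len(per_deal))
    significant = abs(mean) > half_width  # interval excludes zero
    return mean, half_width, significant
```

A matchup counts as significant when the interval excludes zero, that is, when the absolute mean exceeds the half-width reported in the 95% CI column.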

Quick Start

Up and running in minutes

terminal

# Clone and install
git clone https://github.com/griff-ui/poker-ai.git && cd poker-ai
pip install -r requirements.txt

# Stage 1+2: Leduc Hold'em CFR (~2 seconds)
python main.py --iterations 50000 --mode both

# Stage 4: Deep CFR training (~30 min on CPU)
python deep_cfr/run_convergence.py

# Stage 5: Tournament evaluation
python stage5/evaluate.py --hands 300

python — real-time search agent

from deep_cfr.game_engine import GameState, deal_hand
from deep_cfr.networks import DeepCFRPlayer, MAX_ACTIONS
from stage5.search import RealTimeAgent, SearchConfig, SearchMode

# Load the trained Deep CFR blueprint network for each seat.
players = [DeepCFRPlayer(p, GameState.feature_dim(), MAX_ACTIONS) for p in range(2)]
for p, player in enumerate(players):
    player.load(f'deep_cfr/checkpoints/player{p}_final')
    player.set_inference_mode()  # put the network in inference mode before acting

# Seat 0 refines the blueprint with depth-limited real-time search.
agent = RealTimeAgent(0, players, SearchConfig(mode=SearchMode.DEPTH_LIMITED))
action = agent.act(deal_hand())
print(f'GTO action: {action}')

Pricing

Two versions. One engine.

Choose the version that fits how you play or build.
