Each AI player receives a structured text prompt with the full game state on every turn. It must respond with a JSON object containing its chosen action, a bet amount (if raising), and a brief explanation of its reasoning. Here's a real example from a live arena game:
The arena lineup is configured on the server and may include Ollama Cloud and OpenRouter models. Each AI player makes an API call on every single turn, and a multi-player game can run hundreds of hands, so the lineup balances interesting play, reliability, and operating cost.
The current arena lineup:
These are general-purpose language models, not purpose-built poker engines. They receive the full game state and valid actions, but they can and do make mistakes: misreading the board, overvaluing weak hands, bluffing at terrible times, or hallucinating card combinations that don't actually exist.
The "reasoning" shown in the live review panel is the AI's own explanation of its decision. Sometimes it's genuinely insightful. Sometimes it's confidently wrong. That's part of what makes the arena fun to watch — each model has its own personality and blind spots.
Cost, latency, and variety. Larger models can be expensive and slower when every seat makes repeated API calls across long games. The current 8-player lineup favors a mix of capable models with different styles and failure modes.
The profiling system tracks these statistics over a rolling window of recent hands:
| Metric | What it measures |
|---|---|
| VPIP | Voluntarily Put $ In Pot — how often a player enters pots (not counting blinds) |
| PFR | Pre-Flop Raise — how often they raise before the flop |
| AF | Aggression Factor — ratio of postflop raises to postflop calls |
| 3-Bet % | How often they re-raise pre-flop |
| Fold to 3-Bet % | How often they fold when facing a re-raise |
| CBet % | Continuation Bet — how often they bet the flop after raising pre-flop |
| Fold to CBet % | How often they fold to a continuation bet |
| WTSD | Went to Showdown — how often they stay in the hand all the way to showdown |
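
As a rough illustration of how a rolling window over recent hands could produce a few of these numbers, here is a minimal Python sketch. The class names, the field names, and the 50-hand window are assumptions made for the example, not the arena's actual code, and WTSD is simplified to a share of all recorded hands.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class HandRecord:
    """One player's summary of one hand (hypothetical field names)."""
    voluntarily_entered: bool  # called or raised pre-flop, blinds excluded
    raised_preflop: bool
    postflop_raises: int
    postflop_calls: int
    went_to_showdown: bool


class RollingProfile:
    """Keeps only the most recent `window` hands and derives stats from them."""

    def __init__(self, window: int = 50):  # window size is an assumption
        self.hands = deque(maxlen=window)

    def record(self, hand: HandRecord) -> None:
        # When the deque is full, appending drops the oldest hand.
        self.hands.append(hand)

    def _pct(self, flag: str) -> float:
        """Percentage of recorded hands where the given boolean field is true."""
        if not self.hands:
            return 0.0
        hits = sum(1 for h in self.hands if getattr(h, flag))
        return 100.0 * hits / len(self.hands)

    def vpip(self) -> float:
        return self._pct("voluntarily_entered")

    def pfr(self) -> float:
        return self._pct("raised_preflop")

    def wtsd(self) -> float:
        return self._pct("went_to_showdown")

    def aggression_factor(self) -> float:
        raises = sum(h.postflop_raises for h in self.hands)
        calls = sum(h.postflop_calls for h in self.hands)
        # The ratio is undefined with zero calls; fall back to the raise count.
        return raises / calls if calls else float(raises)
```

Using a bounded `deque` means old hands age out on their own, so a model that changes its play mid-game is re-profiled without any explicit reset.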
These stats are combined to classify each player into a style archetype:
Players can also get tendency annotations like "Folds to pressure", "Showdown bound", "High CBet", or "Rarely CBets" based on extreme stat values.
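One way to picture annotations driven by extreme stat values is a small table of threshold rules. The sketch below is illustrative only: the cutoff numbers, the stat keys, and the choice of which stat drives each label are all assumptions, since the arena's real rules are not given here.

```python
from typing import Callable

# (label, stat key, predicate). Every threshold here is a placeholder,
# not the arena's real cutoff, and the stat keys are hypothetical.
RULES: list[tuple[str, str, Callable[[float], bool]]] = [
    ("Folds to pressure", "fold_to_cbet", lambda v: v >= 70),
    ("Showdown bound", "wtsd", lambda v: v >= 40),
    ("High CBet", "cbet", lambda v: v >= 75),
    ("Rarely CBets", "cbet", lambda v: v <= 25),
]


def tendency_annotations(stats: dict[str, float]) -> list[str]:
    """Return every label whose stat is present and passes its threshold."""
    return [
        label
        for label, key, is_extreme in RULES
        if key in stats and is_extreme(stats[key])
    ]
```

For example, `tendency_annotations({"cbet": 80, "wtsd": 10})` yields only `["High CBet"]` under these placeholder thresholds.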
There's a minimum 6-second turn timer enforced per AI move so the game doesn't fly by too fast to follow. The actual API call typically takes 1–3 seconds, and the remaining time is padded out so viewers can keep up with the action.
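The padding logic can be sketched in a few lines of Python. The function name `take_turn` and its injectable `sleep` and `clock` parameters are assumptions for the example; only the 6-second minimum comes from the description above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MIN_TURN_SECONDS = 6.0  # minimum turn length described above


def take_turn(
    decide: Callable[[], T],
    min_seconds: float = MIN_TURN_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Run the decision call, then wait out whatever is left of the minimum."""
    start = clock()
    result = decide()  # e.g. the model API call, typically 1-3 seconds
    remaining = min_seconds - (clock() - start)
    if remaining > 0:
        sleep(remaining)  # a slow call that already used the minimum is not padded
    return result
```

A monotonic clock is used so that system clock adjustments cannot shorten or lengthen the pause.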
The system has fallback parsing — if a model returns malformed JSON, it tries to extract the action from the raw text. If that also fails, the player defaults to checking (if possible) or folding.
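The three stages described above (strict JSON, text extraction, safe default) might look roughly like this in Python. The action names, the `"action"` and `"amount"` keys, and the function name are assumptions for illustration, not the arena's actual schema.

```python
import json
import re

ACTIONS = ("fold", "check", "call", "raise")  # assumed action names


def parse_action(raw: str, valid_actions: list[str]) -> dict:
    """Return {"action": ..., "amount": ...} from a model's raw reply."""
    # 1) Strict path: the reply is the JSON object that was requested.
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and data.get("action") in valid_actions:
            return {"action": data["action"], "amount": data.get("amount")}
    except json.JSONDecodeError:
        pass

    # 2) Fallback: take the first action word that appears in the raw text.
    match = re.search(r"\b(" + "|".join(ACTIONS) + r")\b", raw.lower())
    if match and match.group(1) in valid_actions:
        action = match.group(1)
        number = re.search(r"\d+", raw[match.end():])
        amount = int(number.group()) if number and action == "raise" else None
        return {"action": action, "amount": amount}

    # 3) Default: check if that is legal, otherwise fold.
    default = "check" if "check" in valid_actions else "fold"
    return {"action": default, "amount": None}
```

Taking the first action word is a deliberately crude heuristic; a reply like "I can't call, so I fold" would be read as a call, which is one reason the strict JSON path is tried first.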
API failures are tracked per-player and show up in the leaderboard as the "Failure Rate" column, so you can see which models are the most (or least) reliable.
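A per-player failure rate of this kind needs little more than two counters per player. The sketch below assumes the rate is failed calls divided by total calls, expressed as a percentage; the class and method names are hypothetical.

```python
from collections import defaultdict


class FailureTracker:
    """Counts API calls and failures per player (hypothetical helper)."""

    def __init__(self) -> None:
        self.calls: dict[str, int] = defaultdict(int)
        self.failures: dict[str, int] = defaultdict(int)

    def record(self, player: str, ok: bool) -> None:
        self.calls[player] += 1
        if not ok:
            self.failures[player] += 1

    def failure_rate(self, player: str) -> float:
        """Failed calls as a percentage of all calls; 0.0 if none recorded."""
        total = self.calls.get(player, 0)
        if total == 0:
            return 0.0
        return 100.0 * self.failures.get(player, 0) / total
```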